---
phase: 6
plan: 2
wave: 2
title: "Crawler ingest statistics API + frontend monitoring section (QUAL-06/07)"
depends_on:
  - "06-01-PLAN.md"
files_modified:
  - app/api/v1/job/job.py  # add GET /data/stats endpoint
  - web/src/views/cleaning/monitor.vue  # add crawler statistics section
  - web/src/api/index.js  # add getIngestStats API
autonomous: true
requirements:
  - QUAL-06
  - QUAL-07
---

# Phase 6 Plan 02: Crawler ingest statistics API + frontend monitoring (QUAL-06/07)

## Objective

### QUAL-07 status check (done)

`cleaning/monitor.vue` already includes:

- ✅ List of companies awaiting cleaning (queue table)
- ✅ Trigger cleaning
- ✅ View results

**QUAL-07 needs no further changes.**

### QUAL-06 gap

The current monitoring page only shows the state of the company cleaning queue. It **lacks real-time statistics on crawler job ingestion**:

- Most recent crawl time per platform (maximum ClickHouse `created_at`)
- Volume trend (daily ingest count over the last 7 days)
- Error status (failure and dedup statistics are not derived from ClickHouse for now; this can be extended later)

## Must Haves

- [ ] Backend adds a `GET /api/v1/job/data/stats` endpoint that accepts `platform` (optional) and `days` (default 7)
  - Returns, per platform: `total`, `today`, `last_ingest_at`, `daily_counts` (list)
- [ ] Frontend `monitor.vue` adds a "Crawler ingest" statistics section above the existing 4 metric-cards:
  - 3 platform cards, each showing total, today's count, and most recent crawl time
  - One volume trend table (last 7 days, one row per day, covering boss/qcwy/zhilian)
- [ ] `web/src/api/index.js` adds a `getIngestStats` function
- [ ] The frontend loads correctly under `pnpm dev` (or equivalent toolchain verification)
- [ ] Full pytest regression `pipenv run python -m pytest tests/` passes with no failures

---

## Wave 2 (depends on Plan 01)

### Task 2.1: Backend adds the GET /job/data/stats endpoint

- `app/api/v1/job/job.py` (currently 123 lines)
- `app/core/clickhouse.py` (how to obtain the client)

Append the endpoint to `job.py`:

```python
# Skip any of these imports that job.py already has.
from typing import Any, Dict, Optional

from fastapi import Depends, Query


@router.get("/data/stats", summary="Per-platform ingest statistics")
async def get_ingest_stats(
    platform: Optional[PlatformType] = None,
    days: int = Query(7, ge=1),  # reject zero or negative windows
    service: IngestService = Depends(get_ingest_service),
) -> Dict[str, Any]:
    """
    Query per-platform ClickHouse ingest statistics: total, today's count,
    most recent ingest time, and the daily trend over the last N days.
    """
    from app.core.clickhouse import clickhouse_manager

    client = await clickhouse_manager.get_client()
    platforms = [platform.value] if platform else ["boss", "qcwy", "zhilian"]
    # Table names come from this fixed map, never from user input.
    table_map = {"boss": "boss_job", "qcwy": "qcwy_job", "zhilian": "zhilian_job"}
    result: Dict[str, Any] = {}
    for p in platforms:
        table = f"job_data.{table_map[p]}"
        try:
            # Total rows
            r_total = await client.query(f"SELECT count() FROM {table}")
            total = r_total.result_rows[0][0] if r_total.result_rows else 0
            # Rows ingested today
            r_today = await client.query(
                f"SELECT count() FROM {table} WHERE toDate(created_at) = today()"
            )
            today = r_today.result_rows[0][0] if r_today.result_rows else 0
            # Most recent ingest time
            r_last = await client.query(f"SELECT max(created_at) FROM {table}")
            last_at = (
                str(r_last.result_rows[0][0])
                if r_last.result_rows and r_last.result_rows[0][0]
                else None
            )
            # Daily trend; "days - 1" keeps the window at exactly N calendar
            # days including today (days is a validated int, safe to inline).
            r_daily = await client.query(
                f"SELECT toDate(created_at) AS day, count() AS cnt "
                f"FROM {table} "
                f"WHERE created_at >= today() - {days - 1} "
                f"GROUP BY day ORDER BY day DESC"
            )
            daily_counts = [
                {"date": str(row[0]), "count": row[1]} for row in r_daily.result_rows
            ]
            result[p] = {
                "total": total,
                "today": today,
                "last_ingest_at": last_at,
                "daily_counts": daily_counts,
            }
        except Exception as e:
            # A failure on one platform must not hide the others.
            result[p] = {
                "error": str(e),
                "total": 0,
                "today": 0,
                "last_ingest_at": None,
                "daily_counts": [],
            }
    return {"code": 200, "data": result}
```

---

### Task 2.2: Frontend adds the getIngestStats API

In `web/src/api/index.js`, locate the existing API functions and append:

```js
getIngestStats: (params) => request.get('/job/data/stats', { params }),
```

---

### Task 2.3: Frontend monitor.vue adds the crawler statistics section

In `monitor.vue`, insert a new section **before** ``:

```html
{{ p.label }}
{{ p.total.toLocaleString() }}
Today +{{ p.today }} · Latest {{ p.last_ingest_at || '--' }}
Ingest trend, last 7 days
```

The corresponding `