docs(phase-6): add research and 2 plans for quality and frontend
parent
6f9d4df3e2
commit
c58c7ee5c2
92
.planning/phases/06-quality-frontend/06-01-PLAN.md
Normal file
@@ -0,0 +1,92 @@
---
phase: 6
plan: 1
wave: 1
title: "Unit tests for the three platforms' data-parsing functions (QUAL-02)"
depends_on: []
files_modified:
  - tests/ingest/test_configs_boss.py  # NEW
  - tests/ingest/test_configs_qcwy.py  # NEW
  - tests/ingest/test_configs_zhilian.py  # NEW
autonomous: true
requirements:
  - QUAL-02
---

# Phase 6 Plan 01: Unit tests for the three platforms' parsing functions (QUAL-02)

## Objective

Add unit tests for the three platforms' `_extract_*` and `_build_*_push` functions in `app/services/ingest/configs/`,
covering both the normal-field and missing-field scenarios.

The dedup logic tests (dedup.py) were completed in Phase 5 (6 tests); this plan only adds the parsing-function tests.

## Must Haves

- [ ] `tests/ingest/test_configs_boss.py`: 8 tests covering `_extract_job_id`, `_extract_company_name`, and `_build_boss_push`
- [ ] `tests/ingest/test_configs_qcwy.py`: 10 tests covering `_extract_job_id`, `_extract_update_dt`, `_extract_company_name`, and `_build_qcwy_push` (including the welfare-list scenario)
- [ ] `tests/ingest/test_configs_zhilian.py`: 9 tests covering `_extract_number`, `_extract_fpt`, `_extract_company_name`, and `_build_zhilian_push`
- [ ] `pipenv run python -m pytest tests/ingest/ -v --tb=short` is fully green (including the 6 existing dedup tests)
- [ ] `pipenv run python -m pytest tests/ -v` passes in full

---

## Wave 1

### Task 1.1: tests/ingest/test_configs_boss.py

**Test list:**
1. `test_extract_job_id_from_jobBaseInfoVO`: normal nested field
2. `test_extract_job_id_missing`: `jobBaseInfoVO` missing → None
3. `test_extract_company_name_from_name`: read directly from `data["name"]`
4. `test_extract_company_name_from_companyFullInfoVO`: nested field
5. `test_extract_company_name_missing` → None
6. `test_build_boss_push_full`: all fields present; verify `source_type="Boss直聘"` and that the url contains `encryptJobId`
7. `test_build_boss_push_partial`: missing fields do not raise; sensible fallback values are returned
8. `test_build_boss_push_none_data`: empty dict; key fields are None

---

### Task 1.2: tests/ingest/test_configs_qcwy.py

**Test list:**
1. `test_extract_job_id_normal`
2. `test_extract_job_id_missing` → None
3. `test_extract_update_dt_normal`
4. `test_extract_update_dt_missing` → None
5. `test_extract_company_name_from_companyName`
6. `test_extract_company_name_from_company_name_fallback`
7. `test_extract_company_name_missing` → None
8. `test_build_qcwy_push_welfare_list`: welfare is a list of objects; extract `chineseTitle`
9. `test_build_qcwy_push_welfare_string`: welfare is a string
10. `test_build_qcwy_push_partial`: missing fields → sensible fallback, `source_type="前程无忧"`

---

### Task 1.3: tests/ingest/test_configs_zhilian.py

**Test list:**
1. `test_extract_number_normal`
2. `test_extract_number_missing` → None
3. `test_extract_fpt_normal`
4. `test_extract_fpt_missing` → None
5. `test_extract_company_name_from_companyName`
6. `test_extract_company_name_from_name_fallback`
7. `test_extract_company_name_missing` → None
8. `test_build_zhilian_push_skill_labels`: `skillLabel` is a list; extract `value`
9. `test_build_zhilian_push_partial`: missing fields fall back, `source_type="智联招聘"`

---

## Verification

```bash
# Run the new tests
pipenv run python -m pytest tests/ingest/ -v --tb=short

# Full regression
pipenv run python -m pytest tests/ -v --tb=short
```

**Expected:** all 27 new tests pass (8 + 10 + 9; 33 in `tests/ingest/` once the 6 existing dedup tests are included), and the full suite of ≥ 130 tests is green.
191
.planning/phases/06-quality-frontend/06-02-PLAN.md
Normal file
@@ -0,0 +1,191 @@
---
phase: 6
plan: 2
wave: 2
title: "Crawler ingest stats API + frontend monitoring area (QUAL-06/07)"
depends_on:
  - "06-01-PLAN.md"
files_modified:
  - app/api/v1/job/job.py  # add GET /data/stats endpoint
  - web/src/views/cleaning/monitor.vue  # add crawler stats area
  - web/src/api/index.js  # add getIngestStats API
autonomous: true
requirements:
  - QUAL-06
  - QUAL-07
---

# Phase 6 Plan 02: Crawler ingest stats API + frontend monitoring (QUAL-06/07)

## Objective

### QUAL-07 status check (already complete)

`cleaning/monitor.vue` already includes:
- ✅ List of companies awaiting cleaning (queue table)
- ✅ Trigger cleaning
- ✅ View results

**QUAL-07 needs no further changes.**

### QUAL-06 gap

The existing monitoring page only shows the status of the company cleaning queue; **it lacks real-time statistics on crawler job ingestion**:
- Latest crawl time per platform (the maximum `created_at` in ClickHouse)
- Volume trend (daily ingest counts for the last 7 days)
- Error status (failure and dedup statistics are not sourced from ClickHouse for now; this can be extended later)

## Must Haves

- [ ] Backend: add a `GET /api/v1/job/data/stats` endpoint that accepts `platform` (optional) and `days` (default 7)
  - Returns, per platform: `total`, `today`, `last_ingest_at`, and `daily_counts` (a list)
- [ ] Frontend: `monitor.vue` gains a "crawler ingest" stats area above the existing 4 metric-cards:
  - 3 platform cards, each showing total, today's count, and latest crawl time
  - One volume-trend table (last 7 days, one row per day with boss/qcwy/zhilian columns)
- [ ] `web/src/api/index.js` gains a `getIngestStats` function
- [ ] The frontend loads correctly under `pnpm dev` (or an equivalent toolchain check)
- [ ] Full pytest regression `pipenv run python -m pytest tests/` has no failures

---

## Wave 2 (depends on Plan 01)

### Task 2.1: Backend: add the GET /job/data/stats endpoint

<read_first>
- `app/api/v1/job/job.py` (currently 123 lines)
- `app/core/clickhouse.py` (how to obtain the client)
</read_first>

<action>
Append the endpoint to `job.py`:

```python
# Assumes job.py already imports Optional, Dict, Any, and Depends, and defines
# router, PlatformType, IngestService, and get_ingest_service.
@router.get("/data/stats", summary="各平台入库统计")
async def get_ingest_stats(
    platform: Optional[PlatformType] = None,
    days: int = 7,
    service: IngestService = Depends(get_ingest_service),
) -> Dict[str, Any]:
    """
    Query per-platform ClickHouse ingest stats: total, today's count,
    latest ingest time, and the daily trend over the last N days.
    """
    from app.core.clickhouse import clickhouse_manager

    client = await clickhouse_manager.get_client()

    platforms = [platform.value] if platform else ["boss", "qcwy", "zhilian"]
    table_map = {"boss": "boss_job", "qcwy": "qcwy_job", "zhilian": "zhilian_job"}

    result = {}
    for p in platforms:
        # Table names come only from table_map, never from raw user input.
        table = f"job_data.{table_map[p]}"
        try:
            # Total rows
            r_total = await client.query(f"SELECT count() FROM {table}")
            total = r_total.result_rows[0][0] if r_total.result_rows else 0

            # Rows ingested today
            r_today = await client.query(
                f"SELECT count() FROM {table} WHERE toDate(created_at) = today()"
            )
            today = r_today.result_rows[0][0] if r_today.result_rows else 0

            # Latest ingest time
            r_last = await client.query(
                f"SELECT max(created_at) FROM {table}"
            )
            last_at = str(r_last.result_rows[0][0]) if r_last.result_rows and r_last.result_rows[0][0] else None

            # Daily trend. Note: ">= today() - days" spans today plus the
            # previous `days` calendar days.
            r_daily = await client.query(
                f"SELECT toDate(created_at) AS day, count() AS cnt "
                f"FROM {table} "
                f"WHERE created_at >= today() - {days} "
                f"GROUP BY day ORDER BY day DESC"
            )
            daily_counts = [{"date": str(row[0]), "count": row[1]} for row in r_daily.result_rows]

            result[p] = {
                "total": total,
                "today": today,
                "last_ingest_at": last_at,
                "daily_counts": daily_counts,
            }
        except Exception as e:
            # A failure on one platform is reported inline so the others still return.
            result[p] = {"error": str(e), "total": 0, "today": 0, "last_ingest_at": None, "daily_counts": []}

    return {"code": 200, "data": result}
```
</action>
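
One way the per-table query logic could be checked without a live ClickHouse instance is sketched below. Everything named here is hypothetical: `FakeClient` is a test double that only imitates an async `query()` returning an object with `result_rows`, and `collect_table_stats` is an illustrative extraction of the loop body, not something the plan requires.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence


class FakeClient:
    """Hypothetical test double: returns canned result rows in call order."""

    def __init__(self, responses: Sequence[List[tuple]]) -> None:
        self._responses = list(responses)
        self.queries: List[str] = []

    async def query(self, sql: str) -> SimpleNamespace:
        self.queries.append(sql)
        return SimpleNamespace(result_rows=self._responses.pop(0))


async def collect_table_stats(client: Any, table: str, days: int) -> Dict[str, Any]:
    """Illustrative helper mirroring the four per-table queries of the endpoint."""
    r_total = await client.query(f"SELECT count() FROM {table}")
    total = r_total.result_rows[0][0] if r_total.result_rows else 0

    r_today = await client.query(
        f"SELECT count() FROM {table} WHERE toDate(created_at) = today()"
    )
    today = r_today.result_rows[0][0] if r_today.result_rows else 0

    r_last = await client.query(f"SELECT max(created_at) FROM {table}")
    has_last = bool(r_last.result_rows) and bool(r_last.result_rows[0][0])
    last_at = str(r_last.result_rows[0][0]) if has_last else None

    r_daily = await client.query(
        f"SELECT toDate(created_at) AS day, count() AS cnt FROM {table} "
        f"WHERE created_at >= today() - {days} GROUP BY day ORDER BY day DESC"
    )
    daily = [{"date": str(day), "count": cnt} for day, cnt in r_daily.result_rows]

    return {"total": total, "today": today, "last_ingest_at": last_at, "daily_counts": daily}


# Usage: canned rows for the total, today, latest-time, and daily queries.
fake = FakeClient([
    [(120,)],
    [(5,)],
    [("2026-03-21 09:30:00",)],
    [("2026-03-21", 5), ("2026-03-20", 7)],
])
stats = asyncio.run(collect_table_stats(fake, "job_data.boss_job", days=7))
```

Pulling the loop body into a helper like this would keep the route handler thin and let the response shape be asserted in a plain unit test.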

---

### Task 2.2: Frontend: add the getIngestStats API

<action>
In `web/src/api/index.js`, locate the existing API functions and append:

```js
getIngestStats: (params) => request.get('/job/data/stats', { params }),
```
</action>

---

### Task 2.3: Frontend: add the crawler stats area to monitor.vue

<action>
In `monitor.vue`, insert a new section **before** `<section class="metric-grid">`:

```html
<!-- Crawler ingest stats -->
<section class="ingest-grid">
  <n-card
    v-for="p in ingestStats"
    :key="p.platform"
    :bordered="false"
    class="ingest-card"
  >
    <div class="ingest-platform-label">{{ p.label }}</div>
    <div class="ingest-total">{{ p.total.toLocaleString() }}</div>
    <div class="ingest-meta">
      今日 +{{ p.today }} · 最近 {{ p.last_ingest_at || '--' }}
    </div>
  </n-card>
  <n-card :bordered="false" class="ingest-trend-card">
    <div class="ingest-trend-title">近 7 天入库趋势</div>
    <n-data-table
      size="small"
      :columns="trendColumns"
      :data="trendRows"
      :pagination="false"
    />
  </n-card>
</section>
```

Add the following to the corresponding `<script setup>`:
- `const ingestStatsRaw = ref({})`
- `const fetchIngestStats = async ()` → calls `api.getIngestStats()`
- `const ingestStats = computed(...)` → formats the card data for the three platforms
- `const trendRows = computed(...)` → transposes into one row per date with boss/qcwy/zhilian columns
- `const trendColumns` → date column plus the three platform columns
- Call `fetchIngestStats()` from `refreshAll()`
</action>
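
The transposition behind `trendRows` is the least obvious step, so here is a language-neutral sketch of the logic written in Python; the real implementation would be a Vue `computed` in JavaScript. `build_trend_rows` is a hypothetical name, and filling missing days with 0 is an assumption rather than something the plan specifies.

```python
from typing import Any, Dict, List

PLATFORMS = ("boss", "qcwy", "zhilian")


def build_trend_rows(stats: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn per-platform daily_counts into one row per date (newest first)."""
    by_date: Dict[str, Dict[str, int]] = {}
    for platform in PLATFORMS:
        for item in stats.get(platform, {}).get("daily_counts", []):
            # Create the date row on first sight, defaulting every platform to 0.
            row = by_date.setdefault(item["date"], {p: 0 for p in PLATFORMS})
            row[platform] = item["count"]
    # ISO dates sort correctly as strings, so a reverse sort gives newest first.
    return [{"date": day, **by_date[day]} for day in sorted(by_date, reverse=True)]


# Usage with the response shape returned under "data" by the stats endpoint.
rows = build_trend_rows({
    "boss": {"daily_counts": [{"date": "2026-03-21", "count": 5},
                              {"date": "2026-03-20", "count": 7}]},
    "qcwy": {"daily_counts": [{"date": "2026-03-21", "count": 3}]},
})
```

Each resulting row has the keys `date`, `boss`, `qcwy`, and `zhilian`, which map one-to-one onto the columns described for `trendColumns`.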

---

## Verification

```bash
# Backend
pipenv run python -m pytest tests/ -v --tb=short

# Frontend (confirm pnpm is available and dependencies install without errors)
cd web && pnpm install 2>&1 | tail -3
```

Manual verification:
1. Start the frontend with `pnpm dev`
2. Open the "Cleaning Monitor" (清洗监控) page and confirm that three platform ingest cards and the 7-day trend table appear at the top
3. Data loads without errors, and the latest ingest time is displayed as a correctly formatted timestamp
102
.planning/phases/06-quality-frontend/06-RESEARCH.md
Normal file
@@ -0,0 +1,102 @@
# Phase 6: Quality & Frontend (Technical Research)

**Research date:** 2026-03-21
**Phase goal:** Test coverage for data parsing (QUAL-02) + improvements to the frontend monitoring and cleaning pages (QUAL-06/07)

---

## 1. QUAL-02: Unit tests for data parsing and dedup logic

### Current state

| Test module | File | Status |
|---------|------|------|
| dedup 30-day window | `tests/ingest/test_dedup.py` | ✅ Exists (6 tests added in Phase 5) |
| boss parsing functions | None | ❌ Missing |
| qcwy parsing functions | None | ❌ Missing |
| zhilian parsing functions | None | ❌ Missing |

### Functions to test (`app/services/ingest/configs/`)

**boss.py:**
- `_extract_job_id(data)` → `data["jobBaseInfoVO"]["jobId"]`
- `_extract_company_name(data)` → `data["name"]` or `data["companyFullInfoVO"]["name"]`
- `_build_boss_push(data)` → full push dict

**qcwy.py:**
- `_extract_job_id(data)` → `data["jobId"]`
- `_extract_update_dt(data)` → `data["updateDateTime"]`
- `_extract_company_name(data)` → `data["companyName"]` or `data["company_name"]`
- `_build_qcwy_push(data)` → full push dict (including welfare-list handling)

**zhilian.py:**
- `_extract_number(data)` → `data["number"]`
- `_extract_fpt(data)` → `data["firstPublishTime"]`
- `_extract_company_name(data)` → `data["companyName"]` or `data["name"]`
- `_build_zhilian_push(data)` → full push dict

### Test strategy
- Every function: a normal-field scenario plus a missing-field scenario (returns None)
- `_build_*_push`: key field mapping plus fallback for None values
- Total: roughly 24-30 tests

---

## 2. QUAL-06: Frontend crawler monitoring page

### Current state

The existing page (`web/src/views/cleaning/monitor.vue`) shows **company cleaning monitoring** (MySQL queue status),
**not** crawler job-ingest monitoring.

**Success-criteria requirements (gaps):**
- ❌ Latest crawl time per platform
- ❌ Volume trend (historical ingest-volume trend chart)
- ❌ Error status

### Data queryable from ClickHouse on the backend
- The `job_data.boss_job / qcwy_job / zhilian_job` tables have a `created_at` column
- Ingest counts can be aggregated per day for each platform over the last 7 days
- The `created_at` of the most recent row = latest crawl time

### Options
The `recruitment/` module already has data-view pages for the three platforms (boss/qcwy/zhilian index.vue),
so stat cards could be added at the top of the existing `recruitment/components/PlatformData.vue`:
- Total count, today's ingest count, latest crawl time

Alternatively, add a new **crawler data stats API** (backend) plus a crawler ingest stats area in `monitor.vue`.

**Decision (smaller change):**
1. Backend: add a `GET /job/data/stats?platform=boss&days=7` endpoint that returns:
   - `total`: total count
   - `today`: rows added today
   - `last_ingest_at`: latest ingest time
   - `daily_counts`: daily ingest counts for the last 7 days (used for the trend display)
2. Add crawler ingest count cards at the top of `monitor.vue` (before the existing metric-grid)

---

## 3. QUAL-07: Frontend data-cleaning management page

### Current state

`cleaning/monitor.vue` (1191 lines) already includes:
- ✅ List of companies awaiting cleaning (queue table)
- ✅ Trigger cleaning (batch execution by source)
- ✅ View results (JSON modal)

**Existing gaps (success criterion 4 is in fact already met):**
- ✅ View the list of companies awaiting cleaning → already present
- ✅ Trigger cleaning → already present
- ✅ View results → already present

**Conclusion:** QUAL-07 is essentially satisfied; what remains is mainly supplementary documentation or minor UI polish.

---

## 4. Plan breakdown

| Plan | Scope | Requirement |
|------|------|------|
| **Plan 01** | Unit tests for the three platforms' `_extract_*` / `_build_*_push` functions (about 25) | QUAL-02 |
| **Plan 02** | Backend crawler stats API + crawler ingest stats area in the frontend `monitor.vue` | QUAL-06/07 |