win
|
6c8eb00a50
|
feat(06): quality & frontend (QUAL-02, QUAL-06)
Plan 01 - QUAL-02: 三平台解析函数单元测试:
- tests/ingest/test_configs_boss.py: 10 个测试
(_extract_job_id, _extract_company_name, _build_boss_push)
- tests/ingest/test_configs_qcwy.py: 12 个测试
(_extract_job_id, _extract_update_dt, _extract_company_name, _build_qcwy_push)
- tests/ingest/test_configs_zhilian.py: 12 个测试
(_extract_number, _extract_fpt, _extract_company_name, _build_zhilian_push)
Plan 02 - QUAL-06: 爬虫入库统计 API + 前端监控区域:
- job.py: GET /job/data/stats 端点(总量/今日/最近入库时间/近7天趋势)
- web/src/api/index.js: getIngestStats() 方法
- monitoring.vue: 新增爬虫职位入库统计区域(三平台卡片 + 趋势表格)
- job.py: Optional 导入修复
QUAL-07: 确认 monitor.vue 已有完整清洗队列功能,无需改动
Full regression: 146 passed (112 existing + 34 new)
|
2026-03-21 22:56:24 +08:00 |
|
win
|
3d202c3486
|
feat(05): data pipeline optimization (DATA-01, DATA-04)
Plan 01 - DATA-01: 30-day window dedup fix:
- dedup.py: both single-field and double-field SQL queries now include
AND created_at > now() - INTERVAL 30 DAY
- tests/ingest/test_dedup.py: 6 mock tests validating 30-day window
Plan 02 - DATA-04: company vs search job channel separation:
- schemas/ingest.py: ChannelType.COMPANY = 'company'
- configs/boss.py: register channel='company' config
- configs/qcwy.py: register channel='company' config
- configs/zhilian.py: register channel='company' config
- company_jobs_sync.py: store_batch(..., 'mini', ...) → (..., 'company', ...)
DATA-02: confirmed already complete (job.py has /data/batch-async endpoint)
DATA-03: confirmed already complete (company_cleaner.py full pipeline)
Full regression: 112 passed (106 existing + 6 new)
|
2026-03-21 19:50:06 +08:00 |
|