win
6c8eb00a50
feat(06): quality & frontend (QUAL-02, QUAL-06)
Plan 01 - QUAL-02: unit tests for the parsing functions of all three platforms:
- tests/ingest/test_configs_boss.py: 10 tests
(_extract_job_id, _extract_company_name, _build_boss_push)
- tests/ingest/test_configs_qcwy.py: 12 tests
(_extract_job_id, _extract_update_dt, _extract_company_name, _build_qcwy_push)
- tests/ingest/test_configs_zhilian.py: 12 tests
(_extract_number, _extract_fpt, _extract_company_name, _build_zhilian_push)
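Plan 02 - QUAL-06: crawler ingest statistics API + frontend monitoring section:
- job.py: GET /job/data/stats endpoint (total / today / latest ingest time / 7-day trend)
- web/src/api/index.js: getIngestStats() method
- monitoring.vue: add a crawler job ingest statistics section (three platform cards + trend table)
- job.py: fix the Optional import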
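The four figures returned by the stats endpoint can be sketched as a pure function over ingest timestamps. This is an assumption-laden illustration: the function name `ingest_stats`, the response keys, and the choice to ignore time zones are all hypothetical, and the real endpoint presumably aggregates in SQL rather than in Python.

```python
from collections import Counter
from datetime import datetime, timedelta


def ingest_stats(timestamps, now):
    """Summarise ingest times: total, today, latest, and a 7-day daily trend."""
    today = now.date()
    per_day = Counter(ts.date() for ts in timestamps)
    # Last 7 calendar days including today, oldest first; missing days count 0.
    days = [today - timedelta(days=offset) for offset in range(6, -1, -1)]
    return {
        "total": len(timestamps),
        "today": per_day[today],
        "latest": max(timestamps) if timestamps else None,
        "trend": [{"date": d.isoformat(), "count": per_day[d]} for d in days],
    }
```

Emitting a row for every day, including days with zero ingests, keeps the frontend trend table at a fixed seven rows.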
QUAL-07: confirmed that monitor.vue already has the complete cleaning queue feature; no changes needed
Full regression: 146 passed (112 existing + 34 new)
2026-03-21 22:56:24 +08:00
win
3d202c3486
feat(05): data pipeline optimization (DATA-01, DATA-04)
Plan 01 - DATA-01: 30-day window dedup fix:
- dedup.py: both single-field and double-field SQL queries now include
AND created_at > now() - INTERVAL 30 DAY
- tests/ingest/test_dedup.py: 6 mock tests validating 30-day window
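The window condition can be illustrated in memory. This sketch only mirrors the `created_at > now() - INTERVAL 30 DAY` predicate; `is_duplicate` and the `last_seen` mapping are hypothetical names, not the contents of dedup.py.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)


def is_duplicate(last_seen, key, now):
    """True only if `key` was stored within the 30 days before `now`."""
    created_at = last_seen.get(key)
    # Strict comparison, matching the `>` in the SQL predicate.
    return created_at is not None and created_at > now - WINDOW
```

A record last stored 45 days ago is therefore treated as new and ingested again, while one stored 5 days ago is skipped.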
Plan 02 - DATA-04: company vs search job channel separation:
- schemas/ingest.py: ChannelType.COMPANY = 'company'
- configs/boss.py: register channel='company' config
- configs/qcwy.py: register channel='company' config
- configs/zhilian.py: register channel='company' config
- company_jobs_sync.py: store_batch(..., 'mini', ...) → (..., 'company', ...)
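One way the channel separation could be modelled is sketched below. Only `ChannelType.COMPANY = 'company'` and the platform names come from the commit; the `MINI` member, the `_CONFIGS` registry, and the `register`/`lookup` helpers are assumptions for illustration.

```python
from enum import Enum
from typing import Dict, Tuple


class ChannelType(str, Enum):
    MINI = "mini"        # assumed pre-existing member
    COMPANY = "company"  # value taken from the commit message


# Hypothetical registry keyed by (platform, channel).
_CONFIGS: Dict[Tuple[str, ChannelType], dict] = {}


def register(platform: str, channel: str, config: dict) -> None:
    # ChannelType(channel) raises ValueError for an unknown channel name.
    _CONFIGS[(platform, ChannelType(channel))] = config


def lookup(platform: str, channel: str) -> dict:
    return _CONFIGS[(platform, ChannelType(channel))]


for name in ("boss", "qcwy", "zhilian"):
    register(name, "company", {"source": name, "channel": "company"})
```

Keying on the pair means company jobs no longer share a config with the search channel, so a missing registration fails loudly with a `KeyError` instead of silently reusing another channel's config.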
DATA-02: confirmed already complete (job.py has /data/batch-async endpoint)
DATA-03: confirmed already complete (company_cleaner.py full pipeline)
Full regression: 112 passed (106 existing + 6 new)
2026-03-21 19:50:06 +08:00
win
8c2c2d29d7
feat(03): migrate job51+zhilian to crawler_core (ARCH-04/05)
job51 (spiderJobs/platforms/job51/):
- client.py: HTTPClient+Job51Sign from crawler_core
- api.py: ApiResult→Result, self._http→self.http_client, _request() POST overrides
- main.py: BaseFetcher/BaseSearcher from crawler_core
- sign.py: backward-compatible stub re-exporting crawler_core.qcwy.sign.Job51Sign
zhilian (spiderJobs/platforms/zhilian/):
- client.py: HTTPClient+ZhilianSign from crawler_core
- api.py: add _parse_zhilian_response (HTTP 200=success), add _parse()/_request()
to all classes (GET fetchers + POST searcher overrides)
- main.py: BaseFetcher/BaseSearcher from crawler_core
- sign.py: backward-compatible stub re-exporting crawler_core.zhilian.sign.ZhilianSign
tests: 34 new mock tests (17 job51 + 17 zhilian)
Full regression: 98 passed (job51:17 + zhilian:17 + boss:22 + crawler_core:41 + 1)
2026-03-21 19:18:22 +08:00
win
5bd44774b9
test(02-02): add Boss HTTP layer mock tests (QUAL-03)
- 22 tests covering SearchRecJobs, GetBrandDetail, SearchBrandJobs, GetJobDetail, BossClient
- Uses MagicMock (requests_go not compatible with respx)
- Covers success responses, HTTP errors, biz errors, Traceid injection
- All 22 tests pass (0.08s)
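A minimal sketch of the MagicMock approach, assuming a client object that exposes `get(url, params=...)` and responses with `raise_for_status()` and `json()`. The `fetch_job` function, the `/job/detail` path, and the `code`/`message`/`data` envelope are hypothetical and do not describe the real Boss API classes.

```python
from unittest.mock import MagicMock


class BizError(Exception):
    """Raised when the API answers HTTP 200 with a non-zero business code."""


def fetch_job(http, job_id):
    """Hypothetical fetcher: `http` is any object exposing get(url, params=...)."""
    resp = http.get("/job/detail", params={"id": job_id})
    resp.raise_for_status()  # surfaces HTTP-level errors
    body = resp.json()
    if body.get("code") != 0:
        raise BizError(body.get("message", "unknown"))
    return body["data"]


def make_http(payload):
    """Build a mock client whose get() returns a response with this JSON body."""
    http = MagicMock()
    http.get.return_value.json.return_value = payload
    return http
```

Because the mock replaces the client object itself, no network call is made and the test does not depend on which HTTP library sits underneath.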
2026-03-21 19:03:01 +08:00
win
333a6d155e
test(01-02): write sign algorithm unit tests for crawler_core
- Add tests/crawler_core/test_boss_sign.py: 13 tests for BossSign, _compute_checksum, _generate_uuid
- Add tests/crawler_core/test_qcwy_sign.py: 10 tests for Job51Sign and SIGN_KEY
- Add tests/crawler_core/test_zhilian_sign.py: 13 tests for ZhilianSign
- Add conftest.py at project root to add project root to sys.path
- Update pyproject.toml with [tool.pytest.ini_options] pythonpath config
- Fix crawler_core/__init__.py: wrap heavy-dep imports in try/except so sign subpackages are importable in lightweight envs without requests_go installed
- Remove tests/crawler_core/__init__.py to prevent namespace shadowing of crawler_core package
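The guarded-import idea can be sketched with a small helper. `optional_import` is a hypothetical name; the actual fix wraps the imports in `crawler_core/__init__.py` in try/except directly, and its exact form is not shown in this log.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


# Stdlib modules import normally; a missing heavy dependency yields None,
# so code that does not need it (such as the sign subpackages) still loads.
json_mod = optional_import("json")
missing = optional_import("surely_not_installed_dependency_xyz")
```

Callers that do need the heavy dependency can check for `None` and raise a clear error at use time rather than at package import time.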
2026-03-21 18:20:43 +08:00