win
3d202c3486
feat(05): data pipeline optimization (DATA-01, DATA-04)
Plan 01 - DATA-01: 30-day window dedup fix:
- dedup.py: both single-field and double-field SQL queries now include
AND created_at > now() - INTERVAL 30 DAY
- tests/ingest/test_dedup.py: 6 mock tests validating 30-day window
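A minimal sketch of how the 30-day window might be appended to both dedup queries. Only the window clause is taken from the commit message; the table name `jobs`, the field names, the `%s` placeholder style, and the helper `build_dedup_query` are hypothetical and are not the actual contents of dedup.py.

```python
from typing import Sequence

# Window clause quoted from the commit message (MySQL-style interval syntax).
WINDOW_CLAUSE = "created_at > now() - INTERVAL 30 DAY"


def build_dedup_query(table: str, fields: Sequence[str]) -> str:
    """Build a parameterised existence check over one or two dedup fields.

    Hypothetical helper: the real dedup.py may assemble its SQL differently.
    """
    if not 1 <= len(fields) <= 2:
        raise ValueError("expected one or two dedup fields")
    conditions = [f"{name} = %s" for name in fields]
    # The same window is added to the single-field and double-field variants.
    conditions.append(WINDOW_CLAUSE)
    return f"SELECT 1 FROM {table} WHERE " + " AND ".join(conditions) + " LIMIT 1"


single = build_dedup_query("jobs", ["job_id"])
double = build_dedup_query("jobs", ["job_id", "platform"])
```

With the window in place, a row older than 30 days no longer counts as a duplicate, so a re-crawled job can be ingested again.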
Plan 02 - DATA-04: company vs search job channel separation:
- schemas/ingest.py: ChannelType.COMPANY = 'company'
- configs/boss.py: register channel='company' config
- configs/qcwy.py: register channel='company' config
- configs/zhilian.py: register channel='company' config
- company_jobs_sync.py: store_batch(..., 'mini', ...) → (..., 'company', ...)
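The channel separation above could look roughly like the following sketch. `ChannelType.COMPANY = 'company'` and the three platform names come from the commit message; the `MINI` member, the registry, and the `register`/`lookup` helpers are assumptions made for illustration, not the project's actual code.

```python
from enum import Enum
from typing import Dict, Tuple


class ChannelType(str, Enum):
    # 'mini' is the value the old store_batch call passed; treating it as an
    # enum member is an assumption.
    MINI = "mini"
    # Added by this commit according to the message.
    COMPANY = "company"


# Hypothetical registry keyed by (platform, channel).
_CONFIGS: Dict[Tuple[str, ChannelType], dict] = {}


def register(platform: str, channel: ChannelType, config: dict) -> None:
    """Record the config for one platform/channel pair."""
    _CONFIGS[(platform, channel)] = config


def lookup(platform: str, channel: str) -> dict:
    """Resolve a raw channel string to its config; unknown channels raise ValueError."""
    return _CONFIGS[(platform, ChannelType(channel))]


# Each platform registers a dedicated 'company' channel config.
for platform in ("boss", "qcwy", "zhilian"):
    register(platform, ChannelType.COMPANY, {"platform": platform, "channel": "company"})
```

Keeping company jobs on their own channel value means they are no longer stored under the same key as search results.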
DATA-02: confirmed already complete (job.py has /data/batch-async endpoint)
DATA-03: confirmed already complete (company_cleaner.py full pipeline)
Full regression: 112 passed (106 existing + 6 new)
2026-03-21 19:50:06 +08:00
win
8c2c2d29d7
feat(03): migrate job51+zhilian to crawler_core (ARCH-04/05)
job51 (spiderJobs/platforms/job51/):
- client.py: HTTPClient+Job51Sign from crawler_core
- api.py: ApiResult→Result, self._http→self.http_client, _request() POST overrides
- main.py: BaseFetcher/BaseSearcher from crawler_core
- sign.py: backward-compatible stub re-exporting crawler_core.qcwy.sign.Job51Sign
zhilian (spiderJobs/platforms/zhilian/):
- client.py: HTTPClient+ZhilianSign from crawler_core
- api.py: add _parse_zhilian_response (HTTP 200=success), add _parse()/_request()
to all classes (GET fetchers + POST searcher overrides)
- main.py: BaseFetcher/BaseSearcher from crawler_core
- sign.py: backward-compatible stub re-exporting crawler_core.zhilian.sign.ZhilianSign
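The backward-compatible sign.py stubs mentioned for both platforms can be illustrated with a self-contained sketch. It simulates two modules in memory so it runs without the real packages; the module names `demo_core_sign` and `demo_legacy_sign` and the empty `Job51Sign` class are stand-ins, and the real stub is presumably a one-line re-export file.

```python
import sys
import types


class Job51Sign:
    """Stand-in for the class that now lives in the shared core package."""


# Simulate the new home of the class (stand-in for crawler_core.qcwy.sign).
core = types.ModuleType("demo_core_sign")
core.Job51Sign = Job51Sign
sys.modules["demo_core_sign"] = core

# Simulate the legacy sign.py: its whole body is a re-export of the core class.
legacy = types.ModuleType("demo_legacy_sign")
exec(
    "from demo_core_sign import Job51Sign\n"
    "__all__ = ['Job51Sign']\n",
    legacy.__dict__,
)
sys.modules["demo_legacy_sign"] = legacy

# Old import paths keep working and resolve to the very same class object.
from demo_legacy_sign import Job51Sign as LegacySign

same_object = LegacySign is core.Job51Sign
```

Because the stub re-exports rather than copies, callers on the old path and the new path share one implementation.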
tests: 34 new mock tests (17 job51 + 17 zhilian)
Full regression: 98 passed (job51:17 + zhilian:17 + boss:22 + crawler_core:41 + 1)
2026-03-21 19:18:22 +08:00
win
5bd44774b9
test(02-02): add Boss HTTP layer mock tests (QUAL-03)
- 22 tests covering SearchRecJobs, GetBrandDetail, SearchBrandJobs, GetJobDetail, BossClient
- Uses MagicMock because requests_go is not compatible with respx
- Covers success responses, HTTP errors, biz errors, Traceid injection
- All 22 tests pass (0.08s)
2026-03-21 19:03:01 +08:00
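The MagicMock approach used in commit 5bd44774b9 might be sketched as follows, covering a success response, an HTTP error, and a business error. `fetch_job_detail`, `ApiError`, the `/job/detail` path, and the `code`/`data` response shape are all hypothetical; the real Boss classes and their response format are not shown in the commit message.

```python
from unittest.mock import MagicMock


class ApiError(Exception):
    """Hypothetical error type raised for HTTP and business failures."""


def fetch_job_detail(http, job_id: str) -> dict:
    """Hypothetical fetcher: calls an injected HTTP client and unwraps the payload."""
    resp = http.get("/job/detail", params={"id": job_id})
    if resp.status_code != 200:
        raise ApiError(f"HTTP {resp.status_code}")
    body = resp.json()
    if body.get("code") != 0:
        raise ApiError(f"biz error {body.get('code')}")
    return body["data"]


def make_http(status_code: int, payload: dict) -> MagicMock:
    """Build a mock client whose get() returns a canned response."""
    http = MagicMock()
    http.get.return_value.status_code = status_code
    http.get.return_value.json.return_value = payload
    return http


# Success path: the mock stands in for the real transport entirely.
ok_http = make_http(200, {"code": 0, "data": {"id": "j1"}})
detail = fetch_job_detail(ok_http, "j1")
ok_http.get.assert_called_once_with("/job/detail", params={"id": "j1"})
```

Mocking at the client object rather than at the transport layer sidesteps the respx incompatibility, at the cost of not exercising the real request serialization.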
win
333a6d155e
test(01-02): write sign algorithm unit tests for crawler_core
- Add tests/crawler_core/test_boss_sign.py: 13 tests for BossSign, _compute_checksum, _generate_uuid
- Add tests/crawler_core/test_qcwy_sign.py: 10 tests for Job51Sign and SIGN_KEY
- Add tests/crawler_core/test_zhilian_sign.py: 13 tests for ZhilianSign
- Add conftest.py at the project root so that the root directory is on sys.path
- Update pyproject.toml with [tool.pytest.ini_options] pythonpath config
- Fix crawler_core/__init__.py: wrap heavy-dep imports in try/except so sign subpackages are importable in lightweight envs without requests_go installed
- Remove tests/crawler_core/__init__.py to prevent namespace shadowing of crawler_core package
2026-03-21 18:20:43 +08:00
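The guarded imports described for crawler_core/__init__.py in commit 333a6d155e could take a shape like the sketch below. The `optional_import` helper is hypothetical (the commit only says heavy-dependency imports are wrapped in try/except), and `json` stands in for a dependency that is installed.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None.

    Hypothetical helper: wraps the import in try/except so that a missing
    heavy dependency does not break importing the rest of the package.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# In a lightweight environment requests_go may be absent; the package still imports.
requests_go = optional_import("requests_go")
HAS_HTTP_STACK = requests_go is not None

# Pure-Python pieces (such as the sign subpackages) do not depend on the flag.
present = optional_import("json")
missing = optional_import("module_that_does_not_exist_xyz")
```

Code that needs the HTTP stack can check `HAS_HTTP_STACK` and fail with a clear message, while the sign tests run without it.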