26 Commits

Author SHA1 Message Date
win
00a727519f docs(phase-3): complete execution — 2/2 plans, 98 tests passing
- ARCH-04: job51 migrated to crawler_core (no old deps)
- ARCH-05: zhilian migrated to crawler_core (no old deps)
- 34 new mock tests (17 job51 + 17 zhilian)
- Added _parse_zhilian_response custom parser for zhilian API format
- Fixed POST Searcher _request() overrides for job51/zhilian
- Full regression: 98 passed in 0.12s
2026-03-21 19:19:17 +08:00
win
8c2c2d29d7 feat(03): migrate job51+zhilian to crawler_core (ARCH-04/05)
job51 (spiderJobs/platforms/job51/):
- client.py: HTTPClient+Job51Sign from crawler_core
- api.py: ApiResult→Result, self._http→self.http_client, _request() POST overrides
- main.py: BaseFetcher/BaseSearcher from crawler_core
- sign.py: backward-compatible stub re-exporting crawler_core.qcwy.sign.Job51Sign

zhilian (spiderJobs/platforms/zhilian/):
- client.py: HTTPClient+ZhilianSign from crawler_core
- api.py: add _parse_zhilian_response (HTTP 200=success), add _parse()/_request()
  to all classes (GET fetchers + POST searcher overrides)
- main.py: BaseFetcher/BaseSearcher from crawler_core
- sign.py: backward-compatible stub re-exporting crawler_core.zhilian.sign.ZhilianSign

tests: 34 new mock tests (17 job51 + 17 zhilian)
Full regression: 98 passed (job51:17 + zhilian:17 + boss:22 + crawler_core:41 + 1)
2026-03-21 19:18:22 +08:00
win
024c2bcd49 docs(phase-3): add research and 2 plans for job51+zhilian migration 2026-03-21 19:10:59 +08:00
win
f6913ffdde docs(phase-2): complete phase execution — 2/2 plans done, verification passed
- ARCH-03: Boss crawler migrated to crawler_core (no inline signatures or HTTP boilerplate)
- QUAL-03: 22 mock tests pass covering all Boss API classes
- Anti-crawl mechanisms preserved (TLS fingerprint, proxy rotation, 10s delay)
- Phase 1 regression: 41 tests still passing
2026-03-21 19:04:55 +08:00
win
5bd44774b9 test(02-02): add Boss HTTP layer mock tests (QUAL-03)
- 22 tests covering SearchRecJobs, GetBrandDetail, SearchBrandJobs, GetJobDetail, BossClient
- Uses MagicMock (requests_go not compatible with respx)
- Covers success responses, HTTP errors, biz errors, Traceid injection
- All 22 tests pass (0.08s)
2026-03-21 19:03:01 +08:00
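Note on 5bd44774b9: the MagicMock-based approach this commit describes (success responses, HTTP errors, biz errors) might look roughly like the sketch below. `JobDetailFetcher`, its `fetch()` method, the URL, and the `code`/`message`/`data` payload shape are hypothetical stand-ins, not the project's actual classes; only `unittest.mock.MagicMock` is taken as given.

```python
from unittest.mock import MagicMock


class JobDetailFetcher:
    """Hypothetical fetcher; the real Boss API classes may differ."""

    def __init__(self, http_client):
        self.http_client = http_client

    def fetch(self, job_id):
        resp = self.http_client.get(
            "https://example.invalid/job", params={"id": job_id}
        )
        if resp.status_code != 200:
            return {"success": False, "error": f"HTTP {resp.status_code}"}
        body = resp.json()
        # Assumed convention: a non-zero "code" signals a business error.
        if body.get("code") != 0:
            return {"success": False, "error": body.get("message", "biz error")}
        return {"success": True, "data": body.get("data")}


def make_client(status_code, payload):
    """Build a MagicMock that mimics an HTTP client with one canned response."""
    response = MagicMock()
    response.status_code = status_code
    response.json.return_value = payload
    client = MagicMock()
    client.get.return_value = response
    return client


def test_success():
    client = make_client(200, {"code": 0, "data": {"title": "Engineer"}})
    result = JobDetailFetcher(client).fetch("abc")
    assert result == {"success": True, "data": {"title": "Engineer"}}
    client.get.assert_called_once()


def test_http_error():
    result = JobDetailFetcher(make_client(403, {})).fetch("abc")
    assert result == {"success": False, "error": "HTTP 403"}


def test_biz_error():
    client = make_client(200, {"code": 37, "message": "blocked"})
    assert JobDetailFetcher(client).fetch("abc")["error"] == "blocked"
```

Because the client is injected, no network library has to be patched, which sidesteps the respx incompatibility the commit mentions.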
win
919ed9f799 docs(02-01): add plan execution summary 2026-03-21 19:00:58 +08:00
win
46883cef8a feat(02-01): migrate Boss spider layer from spiderJobs.core to crawler_core
- client.py: inherit crawler_core.http_client.HTTPClient, use crawler_core.boss.sign.BossSign
- api.py: use crawler_core.base.Result/BaseFetcher/BaseSearcher, fix self._http -> self.http_client
- main.py: import BaseFetcher/BaseSearcher and BossSign from crawler_core
- sign.py: replace with backward-compat stub re-exporting BossSign from crawler_core
Satisfies ARCH-03
2026-03-21 19:00:30 +08:00
win
b20f77fa19 docs(phase-2): add research and 2 plans for Boss crawler migration 2026-03-21 18:48:58 +08:00
win
76085ac403 docs(01-02): complete sign algorithms plan — SUMMARY, STATE, ROADMAP updates
- Create 01-02-SUMMARY.md with implementation details, test counts, deviation docs
- STATE.md: mark plan 02 complete, add decisions from plan 02, update position
- ROADMAP.md: mark phase 1 plans 2/2 complete, update progress table
- REQUIREMENTS.md: mark QUAL-01 complete (41 sign algorithm unit tests)
2026-03-21 18:27:35 +08:00
win
333a6d155e test(01-02): write sign algorithm unit tests for crawler_core
- Add tests/crawler_core/test_boss_sign.py: 13 tests for BossSign, _compute_checksum, _generate_uuid
- Add tests/crawler_core/test_qcwy_sign.py: 10 tests for Job51Sign and SIGN_KEY
- Add tests/crawler_core/test_zhilian_sign.py: 13 tests for ZhilianSign
- Add conftest.py at project root to add project root to sys.path
- Update pyproject.toml with [tool.pytest.ini_options] pythonpath config
- Fix crawler_core/__init__.py: wrap heavy-dep imports in try/except so sign subpackages are importable in lightweight envs without requests_go installed
- Remove tests/crawler_core/__init__.py to prevent namespace shadowing of crawler_core package
2026-03-21 18:20:43 +08:00
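Note on 333a6d155e: the try/except import guard mentioned for `crawler_core/__init__.py` could take a shape like the following. This is a minimal sketch, not the file's actual contents; the helper `optional_import` and the flag `HAS_HTTP` are hypothetical names.

```python
import importlib


def optional_import(name):
    """Return the named module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Heavy dependency: may be absent in lightweight environments.
requests_go = optional_import("requests_go")

# Hypothetical flag: code that needs HTTP can check this before use,
# while pure-stdlib subpackages (such as the sign modules) stay importable.
HAS_HTTP = requests_go is not None
```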
win
d7c8bec287 docs(01-01): complete crawler_core package plan — SUMMARY, STATE, ROADMAP updates
- Create 01-01-SUMMARY.md with implementation details and interface contracts
- STATE.md: advance to plan 2, record metrics, add decisions from plan 01
- ROADMAP.md: update phase 1 plan progress (1/2 plans complete)
- REQUIREMENTS.md: mark ARCH-01, ARCH-02, QUAL-04, QUAL-05 complete
- crawler_core/__init__.py: preserve linter-added try/except ImportError guard
2026-03-21 18:14:19 +08:00
win
04d6303da2 feat(01-01): create crawler_core/base.py with Result[T] and crawler_core/__init__.py
- Define generic Result[T] dataclass (7 fields: success, status_code, data, list, count, is_end_page, error)
- Port parse_response() from spiderJobs/core/base.py returning Result[Any]
- BaseFetcher: 4 template methods (_build_params, _parse required; _build_headers, _check_blocked optional)
- BaseSearcher: 4 template methods with load_all() paginator using stdlib logging
- crawler_core/__init__.py exports BaseFetcher, BaseSearcher, Result, HTTPClient, parse_response
- No ApiResult, no loguru, no spiderJobs/app imports
2026-03-21 18:10:40 +08:00
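Note on 04d6303da2: a generic `Result[T]` dataclass with the seven fields listed in this commit, plus a paginator in the spirit of `load_all()`, might be sketched as below. The field types and defaults, the standalone `load_all` signature, `max_pages`, and `fake_page` are assumptions for illustration; the real `BaseSearcher.load_all()` is a method and may behave differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    # Field names follow the commit message; types and defaults are assumed.
    success: bool = False
    status_code: int = 0
    data: Optional[T] = None
    list: List[T] = field(default_factory=list)
    count: int = 0
    is_end_page: bool = True
    error: Optional[str] = None


def load_all(fetch_page: Callable[[int], Result], max_pages: int = 50) -> List:
    """Collect items page by page until a failure or the last page."""
    items = []
    for page in range(1, max_pages + 1):
        result = fetch_page(page)
        if not result.success:
            break
        items.extend(result.list)
        if result.is_end_page:
            break
    return items


def fake_page(page: int, size: int = 2, total: int = 5) -> Result:
    """Hypothetical page source: serves the integers 0..total-1 in chunks."""
    start = (page - 1) * size
    chunk = [n for n in range(start, min(start + size, total))]
    return Result(
        success=True,
        status_code=200,
        list=chunk,
        count=total,
        is_end_page=start + size >= total,
    )
```

With these definitions, `load_all(fake_page)` walks three pages and returns `[0, 1, 2, 3, 4]`.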
win
ceb359d535 feat(01-01): create crawler_core/http_client.py with tenacity retry and stdlib logging
- Port HTTPClient from spiderJobs/core/http_client.py
- Add tenacity @retry decorator on post() and get() (3 attempts, min=10s wait)
- Use stdlib logging.getLogger('crawler_core.http_client') — no loguru
- No imports from spiderJobs.* or app.*
- TLS fingerprint and proxy logic preserved unchanged
2026-03-21 18:08:59 +08:00
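Note on ceb359d535: the retry policy described here (3 attempts, at least 10s between them) is implemented with tenacity in the commit. The sketch below is a plain-stdlib stand-in that illustrates the same policy without that dependency; it is not the project's decorator, and the `sleep` parameter and `flaky_get` are hypothetical, added so the wait can be stubbed out.

```python
import functools
import logging
import time

logger = logging.getLogger("crawler_core.http_client")


def retry(attempts=3, min_wait=10.0, sleep=time.sleep):
    """Retry the wrapped call up to `attempts` times, waiting between tries."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    logger.warning(
                        "attempt %d/%d failed: %s", attempt, attempts, exc
                    )
                    sleep(min_wait)

        return wrapper

    return decorator


calls = []
waits = []


# The sleep function is replaced with a recorder so the demo does not block.
@retry(attempts=3, min_wait=10.0, sleep=waits.append)
def flaky_get():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "ok"
```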
win
bd1e50e410 feat(01-02): port sign algorithms to crawler_core/ platform directories
- Add crawler_core/boss/sign.py: BossSign traceid generator (pure stdlib)
- Add crawler_core/qcwy/sign.py: Job51Sign HMAC-SHA256 signing (pure stdlib)
- Add crawler_core/zhilian/sign.py: ZhilianSign header/param signing (pure stdlib)
- Add __init__.py for all three crawler_core platform directories
- Updated module docstrings to reference crawler_core; all logic unchanged
- No imports from spiderJobs or app; no HTTP dependencies
2026-03-21 18:08:53 +08:00
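Note on bd1e50e410: HMAC-SHA256 signing with only the standard library, as this commit describes for `Job51Sign`, can be sketched as below. The key value, the choice of what string gets signed, and the function name `sign` are assumptions; the actual `SIGN_KEY` and message format live in `crawler_core/qcwy/sign.py` and are not reproduced here.

```python
import hashlib
import hmac

# Placeholder only: the real SIGN_KEY is not shown in the commit log.
SIGN_KEY = b"placeholder-key"


def sign(message: str, key: bytes = SIGN_KEY) -> str:
    """Return the hex-encoded HMAC-SHA256 digest of `message` under `key`."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumed usage: sign the request path plus query string.
signature = sign("/api/job/search?keyword=python&page=1")
```

Since the function is pure (no HTTP, no state), it can be unit-tested directly, which matches the "no HTTP, no mocks" testing approach planned for the sign modules.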
win
4932177f7c feat(01-01): create crawler_core package scaffold and pyproject.toml
- Create crawler_core/pyproject.toml with setuptools build config
- Add platform namespace __init__.py files for boss, qcwy, zhilian
- Add requests_go==1.0.9 and tenacity>=8.0 to Pipfile [packages]
- Add pytest, pytest-cov, pytest-anyio to Pipfile [dev-packages]
2026-03-21 18:07:54 +08:00
win
fe9a6d1403 docs(phase-1): create plans (2 plans, 2 waves) with checker revision 2026-03-21 17:53:13 +08:00
win
b27686a409 docs(01-shared-core): create phase 1 plans for crawler_core shared package
Plan 01-01 (Wave 1): Package scaffold with HTTPClient + tenacity retry (min=10s)
+ stdlib logging + BaseFetcher/BaseSearcher base classes + pyproject.toml.
Covers ARCH-01, ARCH-02, QUAL-04, QUAL-05.

Plan 01-02 (Wave 2): Sign algorithm migration (Boss/Job51/Zhilian) to
crawler_core/ + comprehensive unit tests — no HTTP, no mocks, pure functions.
Covers QUAL-01. 24+ test cases across 3 test files.

ROADMAP updated: Phase 1 now shows 2 concrete plans instead of TBD.
2026-03-21 17:45:14 +08:00
win
81b9305568 docs: gather Phase 1 context (shared core package) 2026-03-21 17:08:26 +08:00
win
44b5f390aa docs: create roadmap (6 phases) 2026-03-21 17:00:12 +08:00
win
5e9102148a docs: define v1 requirements 2026-03-21 16:39:05 +08:00
win
f3005ef525 docs: add research findings (stack, features, architecture, pitfalls, summary) 2026-03-21 16:36:37 +08:00
win
030da1ce53 chore: add project config 2026-03-21 16:19:53 +08:00
win
9166c4f7bc docs: initialize project 2026-03-21 16:17:12 +08:00
zfc
3d7e96845d up 2026-01-24 17:07:34 +08:00
duxin
7285475eb5 add time.sleep > 10 2026-01-20 15:42:47 +08:00
59bfefff0e feat: optimize company data deduplication logic, widen the check window to 90 days 2026-01-14 22:14:33 +08:00