17 Commits

Author SHA1 Message Date
win
333a6d155e test(01-02): write sign algorithm unit tests for crawler_core
- Add tests/crawler_core/test_boss_sign.py: 13 tests for BossSign, _compute_checksum, _generate_uuid
- Add tests/crawler_core/test_qcwy_sign.py: 10 tests for Job51Sign and SIGN_KEY
- Add tests/crawler_core/test_zhilian_sign.py: 13 tests for ZhilianSign
- Add conftest.py at project root to add project root to sys.path
- Update pyproject.toml with [tool.pytest.ini_options] pythonpath config
- Fix crawler_core/__init__.py: wrap heavy-dep imports in try/except so sign subpackages are importable in lightweight envs without requests_go installed
- Remove tests/crawler_core/__init__.py to prevent namespace shadowing of crawler_core package
2026-03-21 18:20:43 +08:00
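The try/except import guard described in this commit could look roughly like the sketch below. The helper name `optional_import` and the `HTTP_AVAILABLE` flag are hypothetical, introduced only for illustration; the actual layout of crawler_core/__init__.py is not shown in this log.

```python
import importlib
import logging

logger = logging.getLogger("crawler_core")


def optional_import(module_name: str):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper: it only illustrates guarding a heavy dependency
    so lightweight subpackages stay importable without it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logger.debug("optional dependency %s is not available", module_name)
        return None


# requests_go may be absent in a lightweight test environment.
requests_go = optional_import("requests_go")
HTTP_AVAILABLE = requests_go is not None  # hypothetical flag
```

With a guard like this, importing the sign subpackages does not fail when requests_go is missing; only code paths that actually need HTTP have to check the flag.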
win
d7c8bec287 docs(01-01): complete crawler_core package plan — SUMMARY, STATE, ROADMAP updates
- Create 01-01-SUMMARY.md with implementation details and interface contracts
- STATE.md: advance to plan 2, record metrics, add decisions from plan 01
- ROADMAP.md: update phase 1 plan progress (1/2 plans complete)
- REQUIREMENTS.md: mark ARCH-01, ARCH-02, QUAL-04, QUAL-05 complete
- crawler_core/__init__.py: preserve linter-added try/except ImportError guard
2026-03-21 18:14:19 +08:00
win
04d6303da2 feat(01-01): create crawler_core/base.py with Result[T] and crawler_core/__init__.py
- Define generic Result[T] dataclass (7 fields: success, status_code, data, list, count, is_end_page, error)
- Port parse_response() from spiderJobs/core/base.py returning Result[Any]
- BaseFetcher: 4 template methods (_build_params, _parse required; _build_headers, _check_blocked optional)
- BaseSearcher: 4 template methods with load_all() paginator using stdlib logging
- crawler_core/__init__.py exports BaseFetcher, BaseSearcher, Result, HTTPClient, parse_response
- No ApiResult, no loguru, no spiderJobs/app imports
2026-03-21 18:10:40 +08:00
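A minimal sketch of a generic Result[T] dataclass with the seven fields named in this commit. The field types and default values are assumptions; the commit message lists only the field names.

```python
from dataclasses import dataclass, field
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    """Uniform return value for fetchers and searchers (types assumed)."""

    success: bool = False
    status_code: int = 0
    data: Optional[T] = None                     # single-item payload
    list: List[T] = field(default_factory=list)  # page of items
    count: int = 0
    is_end_page: bool = False                    # True when pagination is exhausted
    error: Optional[str] = None


# Example: a successful page of two items.
page = Result[dict](success=True, status_code=200,
                    list=[{"id": 1}, {"id": 2}], count=2)
```

Using default_factory for the list field gives each instance its own list instead of sharing one mutable default.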
win
ceb359d535 feat(01-01): create crawler_core/http_client.py with tenacity retry and stdlib logging
- Port HTTPClient from spiderJobs/core/http_client.py
- Add tenacity @retry decorator on post() and get() (3 attempts, min=10s wait)
- Use stdlib logging.getLogger('crawler_core.http_client') — no loguru
- No imports from spiderJobs.* or app.*
- TLS fingerprint and proxy logic preserved unchanged
2026-03-21 18:08:59 +08:00
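The retry policy described here (3 attempts, at least 10 s between attempts) can be illustrated with a plain stdlib loop. This sketch stands in for the tenacity decorator and is not the commit's code; `call_with_retry` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func up to `attempts` times, sleeping between failed tries."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < attempts:
                sleep(wait_seconds)
    assert last_error is not None
    raise last_error
```

The `sleep` parameter is injectable so that tests can replace the real delay with a recorder.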
win
bd1e50e410 feat(01-02): port sign algorithms to crawler_core/ platform directories
- Add crawler_core/boss/sign.py: BossSign traceid generator (pure stdlib)
- Add crawler_core/qcwy/sign.py: Job51Sign HMAC-SHA256 signing (pure stdlib)
- Add crawler_core/zhilian/sign.py: ZhilianSign header/param signing (pure stdlib)
- Add __init__.py for all three crawler_core platform directories
- Updated module docstrings to reference crawler_core; all logic unchanged
- No imports from spiderJobs or app; no HTTP dependencies
2026-03-21 18:08:53 +08:00
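For context, HMAC-SHA256 signing with only the standard library can be sketched as follows. The function name, key, and message layout are placeholders; the real Job51Sign inputs and SIGN_KEY value are not shown in this log.

```python
import hashlib
import hmac


def hmac_sha256_hex(key: str, message: str) -> str:
    """Return the hex-encoded HMAC-SHA256 of message under key."""
    return hmac.new(
        key.encode("utf-8"),
        message.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def verify(key: str, message: str, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(hmac_sha256_hex(key, message), signature)


# Placeholder key and request string, for illustration only.
sig = hmac_sha256_hex("example-key", "/api/search?keyword=python&page=1")
```

Because the function is pure, it can be unit tested without HTTP or mocks, which matches the testing approach described for the sign modules.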
win
4932177f7c feat(01-01): create crawler_core package scaffold and pyproject.toml
- Create crawler_core/pyproject.toml with setuptools build config
- Add platform namespace __init__.py files for boss, qcwy, zhilian
- Add requests_go==1.0.9 and tenacity>=8.0 to Pipfile [packages]
- Add pytest, pytest-cov, pytest-anyio to Pipfile [dev-packages]
2026-03-21 18:07:54 +08:00
win
fe9a6d1403 docs(phase-1): create plans (2 plans, 2 waves) with checker revision 2026-03-21 17:53:13 +08:00
win
b27686a409 docs(01-shared-core): create phase 1 plans for crawler_core shared package
Plan 01-01 (Wave 1): Package scaffold with HTTPClient + tenacity retry (min=10s)
+ stdlib logging + BaseFetcher/BaseSearcher base classes + pyproject.toml.
Covers ARCH-01, ARCH-02, QUAL-04, QUAL-05.

Plan 01-02 (Wave 2): Sign algorithm migration (Boss/Job51/Zhilian) to
crawler_core/ + comprehensive unit tests — no HTTP, no mocks, pure functions.
Covers QUAL-01. 24+ test cases across 3 test files.

ROADMAP updated: Phase 1 now shows 2 concrete plans instead of TBD.
2026-03-21 17:45:14 +08:00
win
81b9305568 docs: gather Phase 1 context (shared core package) 2026-03-21 17:08:26 +08:00
win
44b5f390aa docs: create roadmap (6 phases) 2026-03-21 17:00:12 +08:00
win
5e9102148a docs: define v1 requirements 2026-03-21 16:39:05 +08:00
win
f3005ef525 docs: add research findings (stack, features, architecture, pitfalls, summary) 2026-03-21 16:36:37 +08:00
win
030da1ce53 chore: add project config 2026-03-21 16:19:53 +08:00
win
9166c4f7bc docs: initialize project 2026-03-21 16:17:12 +08:00
zfc
3d7e96845d up 2026-01-24 17:07:34 +08:00
duxin
7285475eb5 add time.sleep > 10 2026-01-20 15:42:47 +08:00
59bfefff0e feat: optimize company data deduplication logic, extend the check window to 90 days 2026-01-14 22:14:33 +08:00