win
04d6303da2
feat(01-01): create crawler_core/base.py with Result[T] and crawler_core/__init__.py
...
- Define generic Result[T] dataclass (7 fields: success, status_code, data, list, count, is_end_page, error)
- Port parse_response() from spiderJobs/core/base.py returning Result[Any]
- BaseFetcher: 4 template methods (_build_params, _parse required; _build_headers, _check_blocked optional)
- BaseSearcher: 4 template methods with load_all() paginator using stdlib logging
- crawler_core/__init__.py exports BaseFetcher, BaseSearcher, Result, HTTPClient, parse_response
- No ApiResult, no loguru, no spiderJobs/app imports
2026-03-21 18:10:40 +08:00
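A minimal sketch of how the pieces named in this commit might fit together. The seven `Result[T]` field names and the required/optional split of the `BaseFetcher` hooks come from the commit message. Everything else is assumed for illustration: the field types and defaults, the `fetch()` driver and its `send` transport, the `_fetch_page()` hook on `BaseSearcher`, and the stop rules in `load_all()`. It is not the actual crawler_core code.

```python
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("crawler_core.base")  # stdlib logging, no loguru


@dataclass
class Result(Generic[T]):
    # The seven field names come from the commit; types and defaults are assumed.
    success: bool = False
    status_code: int = 0
    data: Optional[T] = None
    list: List[Any] = field(default_factory=list)  # `list` here is still the builtin
    count: int = 0
    is_end_page: bool = False
    error: Optional[str] = None


class BaseFetcher(ABC):
    """Template method: subclasses must supply _build_params and _parse."""

    def _build_headers(self) -> dict:  # optional hook
        return {}

    def _check_blocked(self, raw: Any) -> bool:  # optional hook
        return False

    @abstractmethod
    def _build_params(self, **kwargs: Any) -> dict: ...

    @abstractmethod
    def _parse(self, raw: Any) -> Result[Any]: ...

    def fetch(self, send: Callable[[dict, dict], Any], **kwargs: Any) -> Result[Any]:
        # `send` is a hypothetical transport: (headers, params) -> raw response.
        raw = send(self._build_headers(), self._build_params(**kwargs))
        if self._check_blocked(raw):
            return Result(success=False, error="blocked")
        return self._parse(raw)


class BaseSearcher(ABC):
    """load_all() pages through results until is_end_page or a failed page."""

    max_pages = 50  # hypothetical safety cap

    @abstractmethod
    def _fetch_page(self, page: int) -> Result[Any]: ...  # hypothetical hook

    def load_all(self) -> List[Any]:
        items: List[Any] = []
        for page in range(1, self.max_pages + 1):
            result = self._fetch_page(page)
            if not result.success:
                logger.warning("page %d failed: %s", page, result.error)
                break
            items.extend(result.list)
            if result.is_end_page:
                break
        return items


class DemoSearcher(BaseSearcher):
    # Illustrative subclass: three pages of two items each, page 3 is the last.
    def _fetch_page(self, page: int) -> Result[Any]:
        return Result(success=True, list=[page * 10, page * 10 + 1], is_end_page=(page == 3))


if __name__ == "__main__":
    print(DemoSearcher().load_all())  # [10, 11, 20, 21, 30, 31]
```

Keeping the loop in the base class means each platform subclass only describes how to fetch and parse one page; pagination, stop conditions, and logging live in one place.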
win
ceb359d535
feat(01-01): create crawler_core/http_client.py with tenacity retry and stdlib logging
...
- Port HTTPClient from spiderJobs/core/http_client.py
- Add tenacity @retry decorator on post() and get() (3 attempts, min=10s wait)
- Use stdlib logging.getLogger('crawler_core.http_client') — no loguru
- No imports from spiderJobs.* or app.*
- TLS fingerprint and proxy logic preserved unchanged
2026-03-21 18:08:59 +08:00
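The retry policy in this commit (3 attempts, at least 10 s between them) is implemented with tenacity's `@retry` decorator. The sketch below approximates that policy in plain stdlib Python so the behaviour is easy to see and test. It is not tenacity. The fixed 10 s wait is an assumption, since the commit only states the minimum, and `call_with_retry` is a hypothetical helper name. Injecting `sleep` lets tests run without actually waiting.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("crawler_core.http_client")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    wait_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on any exception wait wait_s seconds and retry, up to `attempts` calls.

    The last exception is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            logger.warning(
                "attempt %d/%d failed: %s; retrying in %.1fs", attempt, attempts, exc, wait_s
            )
            sleep(wait_s)
    raise AssertionError("unreachable")
```

With the defaults, a call that keeps failing is tried three times with two 10 s pauses in between before the error propagates to the caller.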
win
bd1e50e410
feat(01-02): port sign algorithms to crawler_core/ platform directories
...
- Add crawler_core/boss/sign.py: BossSign traceid generator (pure stdlib)
- Add crawler_core/qcwy/sign.py: Job51Sign HMAC-SHA256 signing (pure stdlib)
- Add crawler_core/zhilian/sign.py: ZhilianSign header/param signing (pure stdlib)
- Add __init__.py for all three crawler_core platform directories
- Updated module docstrings to reference crawler_core; all logic unchanged
- No imports from spiderJobs or app; no HTTP dependencies
2026-03-21 18:08:53 +08:00
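This log describes the three sign modules only as pure-stdlib code, and their actual canonicalization rules and keys do not appear here. The following is a hypothetical HMAC-SHA256 signer in the spirit of Job51Sign. It illustrates why such code can be covered by plain unit tests with no HTTP and no mocks. The name `sign_params`, the key-sorted URL-encoded payload, and the secret are all assumptions.

```python
import hashlib
import hmac
from typing import Mapping
from urllib.parse import urlencode


def sign_params(params: Mapping[str, object], secret: bytes) -> str:
    """Hypothetical scheme: HMAC-SHA256 over the key-sorted, URL-encoded params.

    Returns the 64-character lowercase hex digest. Because the params are sorted
    first, the signature does not depend on dict insertion order.
    """
    canonical = urlencode(sorted(params.items()))  # e.g. "a=x&b=2"
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the function is deterministic, a test can pin it against an independently computed digest, for example `hmac.new(b"k", b"a=x&b=2", hashlib.sha256).hexdigest()` for the params `{"b": 2, "a": "x"}` and key `b"k"`.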
win
4932177f7c
feat(01-01): create crawler_core package scaffold and pyproject.toml
...
- Create crawler_core/pyproject.toml with setuptools build config
- Add platform namespace __init__.py files for boss, qcwy, zhilian
- Add requests_go==1.0.9 and tenacity>=8.0 to Pipfile [packages]
- Add pytest, pytest-cov, pytest-anyio to Pipfile [dev-packages]
2026-03-21 18:07:54 +08:00
win
fe9a6d1403
docs(phase-1): create plans (2 plans, 2 waves) with checker revision
2026-03-21 17:53:13 +08:00
win
b27686a409
docs(01-shared-core): create phase 1 plans for crawler_core shared package
...
Plan 01-01 (Wave 1): Package scaffold with HTTPClient + tenacity retry (min=10s)
+ stdlib logging + BaseFetcher/BaseSearcher base classes + pyproject.toml.
Covers ARCH-01, ARCH-02, QUAL-04, QUAL-05.
Plan 01-02 (Wave 2): Sign algorithm migration (Boss/Job51/Zhilian) to
crawler_core/ + comprehensive unit tests — no HTTP, no mocks, pure functions.
Covers QUAL-01. 24+ test cases across 3 test files.
ROADMAP updated: Phase 1 now shows 2 concrete plans instead of TBD.
2026-03-21 17:45:14 +08:00
win
81b9305568
docs: gather Phase 1 context (shared core package)
2026-03-21 17:08:26 +08:00
win
44b5f390aa
docs: create roadmap (6 phases)
2026-03-21 17:00:12 +08:00
win
5e9102148a
docs: define v1 requirements
2026-03-21 16:39:05 +08:00
win
f3005ef525
docs: add research findings (stack, features, architecture, pitfalls, summary)
2026-03-21 16:36:37 +08:00
win
030da1ce53
chore: add project config
2026-03-21 16:19:53 +08:00
win
9166c4f7bc
docs: initialize project
2026-03-21 16:17:12 +08:00
zfc
3d7e96845d
up
2026-01-24 17:07:34 +08:00
duxin
7285475eb5
add time.sleep > 10
2026-01-20 15:42:47 +08:00
59bfefff0e
feat: optimize company data deduplication logic, extend the check window to 90 days
2026-01-14 22:14:33 +08:00