docs: gather Phase 1 context (shared core package)
Commit 81b9305568 (parent 44b5f390aa)
New file: `.planning/phases/01-shared-core/1-CONTEXT.md` (100 lines)

# Phase 1: Shared Core Package - Context

**Gathered:** 2026-03-21
**Status:** Ready for planning

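<domain>
## Phase Boundary

Extract `crawler_core/` as a standalone, installable shared package containing the signing algorithms, the HTTP client wrapper, the base class definitions, and the data models. The later rewrites of all three platforms (Boss, 51job (前程无忧), and Zhilian (智联)) build on this package. This phase does not modify any platform-specific crawler implementation; the existing crawlers keep running unaffected.

</domain>
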
<decisions>
## Implementation Decisions

### Package Structure and Installation
- **D-01:** The package lives at the project root in `crawler_core/`, alongside `app/` and `spiderJobs/`
- **D-02:** Package metadata is managed in `pyproject.toml`, supporting `pip install -e ./crawler_core`
- **D-03:** Minimal dependency footprint: only `requests_go` plus the Python standard library; FastAPI, Tortoise, and loguru are not pulled in
- **D-04:** The package name is `crawler_core`

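### Base Class Interface Design
- **D-05:** Keep two base classes: `BaseFetcher` (fetches a single detail record) and `BaseSearcher` (list search)
- **D-06:** Four template methods: `_build_params()` (required), `_parse_response()` (required), `_build_headers()` (optional override), `_check_blocked()` (optional override; detects whether the request was blocked)
- **D-07:** Return values use a typed `Result[T]` dataclass, replacing the existing loosely typed `ApiResult`
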
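The following is a minimal sketch of how D-05 through D-07 could fit together; it is not the code in `spiderJobs/core/base.py`. The `Result` fields, the `fetch()` entry point, the injected `transport` callable, and `DemoFetcher` are all assumptions made for illustration, since the field design of `Result[T]` is still open in this phase.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    """Typed result wrapper. The field set here is an assumption."""
    success: bool
    data: Optional[T] = None
    error: Optional[str] = None


# Hypothetical transport signature: (params, headers) -> decoded JSON body.
Transport = Callable[[dict, dict], dict]


class BaseFetcher(ABC, Generic[T]):
    """Template-method base class with the four hooks named in D-06."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    @abstractmethod
    def _build_params(self, item_id: str) -> dict:
        """Required: build the request parameters."""

    @abstractmethod
    def _parse_response(self, body: dict) -> T:
        """Required: turn the response body into a typed value."""

    def _build_headers(self) -> dict:
        """Optional override: default request headers."""
        return {"Accept": "application/json"}

    def _check_blocked(self, body: dict) -> bool:
        """Optional override: report whether the platform blocked us."""
        return False

    def fetch(self, item_id: str) -> Result[T]:
        """Fixed skeleton: params -> request -> block check -> parse."""
        try:
            body = self._transport(self._build_params(item_id), self._build_headers())
            if self._check_blocked(body):
                return Result(success=False, error="blocked")
            return Result(success=True, data=self._parse_response(body))
        except Exception as exc:  # sketch only: collapse any failure into a Result
            return Result(success=False, error=str(exc))


class DemoFetcher(BaseFetcher[str]):
    """Hypothetical subclass, used only to exercise the skeleton."""

    def _build_params(self, item_id: str) -> dict:
        return {"id": item_id}

    def _parse_response(self, body: dict) -> str:
        return body["title"]

    def _check_blocked(self, body: dict) -> bool:
        return body.get("code") == 403  # made-up block marker
```

A subclass only supplies the two required hooks and, where needed, the optional ones; callers inspect `result.success` instead of probing loosely typed fields. A `BaseSearcher` could follow the same shape with a list-typed `data`.
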
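
### Core Module Layering
- **D-08:** Signing algorithms remain separate per-platform classes (BossSign, Job51Sign, ZhilianSign); a unified BaseSigner interface is not enforced
- **D-09:** The HTTP client is consolidated into a single `HTTPClient` class with built-in proxy, retry, and logging support
- **D-10:** Internal directories are organized by platform: `crawler_core/boss/`, `crawler_core/qcwy/`, `crawler_core/zhilian/`, plus `crawler_core/base/`
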
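To make D-09 concrete, here is a rough sketch of a single client class that combines proxy handling, retries, and logging. It does not reproduce `spiderJobs/core/http_client.py`: the constructor parameters, the exponential backoff, and the injected `transport` callable are assumptions, and the retry parameters are still undecided in this phase. In the real package the transport would presumably be a `requests_go` session carrying the existing TLS fingerprint configuration.

```python
import logging
import time
from typing import Any, Callable, Optional

logger = logging.getLogger("crawler_core.http")


class HTTPClient:
    """Sketch of one client with proxy, retry, and logging built in."""

    def __init__(
        self,
        transport: Callable[..., Any],   # hypothetical: (method, url, proxies=..., **kw)
        proxy: Optional[str] = None,
        max_retries: int = 3,            # assumed default
        backoff: float = 0.5,            # assumed base delay in seconds
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        self._transport = transport
        self._proxy = proxy
        self._max_retries = max_retries
        self._backoff = backoff
        self._sleep = sleep

    def request(self, method: str, url: str, **kwargs: Any) -> Any:
        """Send a request, retrying with exponential backoff on any exception."""
        proxies = {"http": self._proxy, "https": self._proxy} if self._proxy else None
        for attempt in range(1, self._max_retries + 1):
            try:
                return self._transport(method, url, proxies=proxies, **kwargs)
            except Exception as exc:  # sketch only: a real client would narrow this
                logger.warning(
                    "attempt %d/%d for %s failed: %s",
                    attempt, self._max_retries, url, exc,
                )
                if attempt == self._max_retries:
                    raise
                # Delay doubles each time: backoff, 2*backoff, 4*backoff, ...
                self._sleep(self._backoff * 2 ** (attempt - 1))
```

Injecting `transport` and `sleep` keeps the retry logic testable without network access or real delays, which fits the plan to add tests alongside the extraction.
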

### Claude's Discretion
- Exact `pyproject.toml` configuration (build-system, version number, etc.)
- Field design of the `Result[T]` dataclass
- Retry policy parameters for HTTPClient (attempt count, interval, etc.)
- Organization of the test files

</decisions>

<specifics>
## Specific Ideas

- The existing `spiderJobs/core/base.py` (154 lines) is more complete than `app/services/crawler/_base.py` (111 lines); use the former as the basis for extraction
- The existing `spiderJobs/core/http_client.py` already carries the `TLS_CHROME_LATEST + random_ja3=True` anti-bot configuration; reuse it as-is
- The signing modules already exist in 6 copies (`app/services/crawler/_*_sign.py` + `spiderJobs/platforms/*/sign.py`) with identical content; keep a single copy
- The "复制自" ("copied from") comment in `spiderJobs/platforms/boss/sign.py` already records which file is the source and which is the copy

</specifics>

<canonical_refs>
## Canonical References

### Architecture Research
- `.planning/research/ARCHITECTURE.md`: shared package extraction plan, sync/async boundary design, build order
- `.planning/research/STACK.md`: HTTP client choice (requests_go vs httpx), recommended test stack

### Risk Mitigation
- `.planning/research/PITFALLS.md`: Pitfall 1 (breaking live crawlers during the refactor), Pitfall 2 (copy-pasted signing code)

### Existing Code
- `spiderJobs/core/base.py`: existing BaseFetcher/BaseSearcher implementation, the basis for extraction
- `spiderJobs/core/http_client.py`: existing HTTPClient implementation with the TLS fingerprint configuration
- `spiderJobs/platforms/boss/sign.py`: reference implementation of the Boss signing algorithm
- `spiderJobs/platforms/job51/sign.py`: reference implementation of the Job51 signing algorithm
- `spiderJobs/platforms/zhilian/sign.py`: reference implementation of the Zhilian signing algorithm

</canonical_refs>

<code_context>
## Existing Code Insights

### Reusable Assets
- `spiderJobs/core/base.py`: BaseFetcher/BaseSearcher template-method base classes; migrate directly into crawler_core/base/
- `spiderJobs/core/http_client.py`: HTTPClient wrapper with TLS fingerprinting, proxy, and retry logic
- `spiderJobs/platforms/*/sign.py`: signing algorithms for the three platforms; pure functions that can be migrated directly and covered with tests

### Established Patterns
- Template method pattern: the base class calls `_build_params()` → HTTP request → `_parse_response()`
- ApiResult dataclass: fields `success`, `data`, `error`, `list`, `is_end_page`
- Signing classes are instantiated independently and injected into the Client/Fetcher

### Integration Points
- The Phase 2/3 platform rewrites will import from the package via `from crawler_core.boss import BossSign, BossClient`
- The Phase 4 backend facade will call crawler_core's synchronous methods through `asyncio.to_thread()`
- The existing `app/services/crawler/_*.py` files stay unchanged until Phase 4; the old code keeps running

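The sync/async boundary mentioned for Phase 4 could look roughly like the sketch below. `fetch_job_sync` is a hypothetical stand-in for a blocking `crawler_core` call, and the facade function names are likewise invented for illustration; only `asyncio.to_thread()` (available from Python 3.9) comes from the plan itself.

```python
import asyncio
import time
from typing import Dict, List


def fetch_job_sync(job_id: str) -> Dict[str, str]:
    """Hypothetical blocking call, standing in for a crawler_core fetcher."""
    time.sleep(0.05)  # simulates blocking network I/O
    return {"id": job_id, "title": f"job-{job_id}"}


async def fetch_job(job_id: str) -> Dict[str, str]:
    """Async facade: run the blocking call in a worker thread."""
    return await asyncio.to_thread(fetch_job_sync, job_id)


async def fetch_many(job_ids: List[str]) -> List[Dict[str, str]]:
    """Fan out several blocking calls without stalling the event loop."""
    return await asyncio.gather(*(fetch_job(j) for j in job_ids))
```

With this shape the shared package can stay synchronous and free of any async framework dependency, while the backend keeps its event loop responsive; `asyncio.gather` returns results in the order the inputs were given.
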

</code_context>

<deferred>
## Deferred Ideas

- Whether loguru should be an optional dependency of crawler_core: decide in Phase 5, when logging is unified
- Whether a unified BaseSigner interface is needed: come back and add it if Phase 2/3 shows the need

</deferred>

---

*Phase: 01-shared-core*
*Context gathered: 2026-03-21*