docs(phase-3): add research and 2 plans for job51+zhilian migration

win 2026-03-21 19:10:59 +08:00
parent f6913ffdde
commit 024c2bcd49
3 changed files with 719 additions and 0 deletions


@ -0,0 +1,243 @@
---
phase: 3
plan: 1
wave: 1
title: "Migrate the 51Job (job51) layer to crawler_core + mock tests"
depends_on: []
files_modified:
- spiderJobs/platforms/job51/client.py
- spiderJobs/platforms/job51/api.py
- spiderJobs/platforms/job51/main.py
- spiderJobs/platforms/job51/sign.py
- tests/job51/__init__.py
- tests/job51/test_job51_client.py
autonomous: true
requirements:
- ARCH-04
---
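# Phase 3 Plan 01: Migrate the 51Job (job51) layer to crawler_core + mock tests
## Objective
Switch `spiderJobs/platforms/job51/` from depending on `spiderJobs.core` (the old base classes) to depending on `crawler_core` (the new base classes),
and add the mock tests in `tests/job51/test_job51_client.py` to satisfy ARCH-04.
The migration mirrors the Phase 2 Boss migration exactly: 4 file modifications + sign.py converted to a stub + new tests.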
## Must Haves
- [ ] `client.py` inherits from `crawler_core.http_client.HTTPClient` and uses `crawler_core.qcwy.sign.Job51Sign`
- [ ] `api.py` uses `crawler_core.base.Result/BaseFetcher/BaseSearcher`; every `ApiResult` is replaced with `Result`
- [ ] The 2 occurrences of `self._http` in `api.py` are replaced with `self.http_client`
- [ ] The imports in `main.py` are updated to `crawler_core.base`
- [ ] `sign.py` becomes a backward-compatible stub (re-exporting `crawler_core.qcwy.sign.Job51Sign`)
- [ ] `python -c "from spiderJobs.platforms.job51.api import ...; from spiderJobs.platforms.job51.client import ..."` raises no ImportError
- [ ] `grep -rn "spiderJobs.core" spiderJobs/platforms/job51/{client,api,main,sign}.py` produces no output
- [ ] `tests/job51/__init__.py` exists
- [ ] `tests/job51/test_job51_client.py` exists
- [ ] `pytest tests/job51/ -v` passes in full (>= 12 tests)
---
## Wave 1
### Task 1.1: Update client.py
<read_first>
- `spiderJobs/platforms/job51/client.py` (current contents)
- `crawler_core/http_client.py` (target base class)
- `crawler_core/qcwy/sign.py` (new source of Job51Sign)
</read_first>
<action>
Modify `spiderJobs/platforms/job51/client.py`:
1. Change line 15 to:
```python
from crawler_core.http_client import HTTPClient
```
2. Change line 16 to:
```python
from crawler_core.qcwy.sign import Job51Sign
```
(This removes `from spiderJobs.core.http_client import HTTPClient` and `from spiderJobs.platforms.job51.sign import Job51Sign`.)
Everything else (JOB51_HEADERS, the Job51Client class body, the create_client function) stays unchanged.
</action>
<acceptance_criteria>
- `grep "from crawler_core.http_client import HTTPClient" spiderJobs/platforms/job51/client.py` produces output
- `grep "from crawler_core.qcwy.sign import Job51Sign" spiderJobs/platforms/job51/client.py` produces output
- `grep "spiderJobs.core" spiderJobs/platforms/job51/client.py` produces no output
- `python -c "from spiderJobs.platforms.job51.client import Job51Client, create_client"` raises no ImportError
</acceptance_criteria>
---
### Task 1.2: Update api.py
<read_first>
- `spiderJobs/platforms/job51/api.py` (full current contents, ~260 lines)
- `crawler_core/base.py` (Result, BaseFetcher, BaseSearcher interfaces)
</read_first>
<action>
Modify `spiderJobs/platforms/job51/api.py`:
1. Change line 14 to:
```python
from crawler_core.base import BaseFetcher, BaseSearcher, Result
```
(This removes `from spiderJobs.core.base import ApiResult, BaseFetcher, BaseSearcher`.)
2. Replace `ApiResult` with `Result` throughout the file (11 occurrences, covering function annotations and return statements).
3. Line 164: `http_code, data = self._http.get(endpoint)` → `http_code, data = self.http_client.get(endpoint)`
4. Line 208: `http_code, data = self._http.get(self.ENDPOINT, self._build_params())` → `http_code, data = self.http_client.get(self.ENDPOINT, self._build_params())`
The logic of `_parse_job51_response` (the check for status 1 and the resultbody parsing) is preserved entirely; only `ApiResult` is replaced with `Result`.
</action>
<acceptance_criteria>
- `grep "from crawler_core.base import" spiderJobs/platforms/job51/api.py` produces output
- `grep "ApiResult" spiderJobs/platforms/job51/api.py` produces no output
- `grep "self\._http" spiderJobs/platforms/job51/api.py` produces no output
- `python -c "from spiderJobs.platforms.job51.api import SearchRecommendJobs, GetJobDetail, GetCompanyDetail, SearchCompanyJobs"` raises no ImportError
</acceptance_criteria>
---
### Task 1.3: Update main.py
<read_first>
- `spiderJobs/platforms/job51/main.py` (current contents)
</read_first>
<action>
Modify `spiderJobs/platforms/job51/main.py`:
1. Change line 32 to:
```python
from crawler_core.base import BaseFetcher, BaseSearcher
```
(This removes `from spiderJobs.core.base import BaseFetcher, BaseSearcher`.)
Everything else stays unchanged.
</action>
<acceptance_criteria>
- `grep "from crawler_core.base import BaseFetcher, BaseSearcher" spiderJobs/platforms/job51/main.py` produces output
- `grep "spiderJobs.core" spiderJobs/platforms/job51/main.py` produces no output
</acceptance_criteria>
---
### Task 1.4: Turn sign.py into a backward-compatible stub
<read_first>
- `spiderJobs/platforms/job51/sign.py` (current contents)
- `crawler_core/qcwy/sign.py` (authoritative implementation)
</read_first>
<action>
Replace the entire contents of `spiderJobs/platforms/job51/sign.py` with:
```python
"""
Backward-compatible stub: 51Job (前程无忧) signing.
The implementation has moved to crawler_core.qcwy.sign.
Re-exports directly from crawler_core so downstream code does not hit ImportError.
"""
from crawler_core.qcwy.sign import Job51Sign  # noqa: F401
__all__ = ["Job51Sign"]
```
</action>
<acceptance_criteria>
- `grep "from crawler_core.qcwy.sign import Job51Sign" spiderJobs/platforms/job51/sign.py` produces output
- `python -c "from spiderJobs.platforms.job51.sign import Job51Sign; print(Job51Sign.generate_uuid())"` prints a UUID successfully
</acceptance_criteria>
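
A re-export stub keeps old import paths working because both paths bind the same class object. The sketch below illustrates that property with in-memory stand-in modules; `demo_core_sign` and `demo_legacy_sign` are hypothetical names used only for illustration, not the real `crawler_core` or `spiderJobs` packages.

```python
import sys
import types

# Stand-in for the authoritative module (hypothetical name, not crawler_core itself).
core = types.ModuleType("demo_core_sign")
exec(
    "class Job51Sign:\n"
    "    '''Placeholder class; the real implementation lives elsewhere.'''\n",
    core.__dict__,
)
sys.modules["demo_core_sign"] = core

# Stand-in for the legacy module, written the same way as the stub above.
legacy = types.ModuleType("demo_legacy_sign")
exec(
    "from demo_core_sign import Job51Sign  # noqa: F401\n"
    "__all__ = ['Job51Sign']\n",
    legacy.__dict__,
)
sys.modules["demo_legacy_sign"] = legacy

from demo_core_sign import Job51Sign as NewPath
from demo_legacy_sign import Job51Sign as OldPath

# Both import paths resolve to one class object, so isinstance checks keep working.
same_class = NewPath is OldPath
```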
---
### Task 1.5: Create tests/job51/__init__.py
<action>
Create `tests/job51/__init__.py` with the contents: `# tests/job51/`
</action>
<acceptance_criteria>
- `test -f tests/job51/__init__.py && echo "OK"` prints OK
</acceptance_criteria>
---
### Task 1.6: Write tests/job51/test_job51_client.py
<read_first>
- `spiderJobs/platforms/job51/api.py` (post-migration version)
- `spiderJobs/platforms/job51/client.py` (post-migration version)
- `crawler_core/qcwy/sign.py` (Job51Sign interface)
- `tests/boss/test_boss_client.py` (style reference)
</read_first>
<action>
Create `tests/job51/test_job51_client.py` with the following test groups:
1. **TestParseJob51Response (pure function)**
   - `test_http_error_returns_failure`: HTTP 500 → success=False
   - `test_status_zero_returns_failure`: status=0 → success=False
   - `test_status_one_with_resultbody_job_list`: status=1, resultbody.jobList.items → list parsed correctly
   - `test_status_one_no_items`: status=1, no items → success=True, list=[]
   - `test_non_dict_raw_returns_failure`: raw is not a dict → failure
2. **TestSearchRecommendJobs**
   - `test_search_success`: returns the job list normally
   - `test_search_http_error`: HTTP 403
3. **TestGetJobDetail**
   - `test_fetch_success`: returns data on success
   - `test_fetch_exception_handled`: `http_client.get` raises an exception → success=False
4. **TestGetCompanyDetail**
   - `test_fetch_success`: returns data on success
5. **TestJob51ClientHeaders**
   - `test_headers_contain_sign`: after a POST, `_job51_headers(sign="abc")["sign"]` == "abc"
   - `test_headers_uuid_format`: the uuid field has length >= 20
All tests mock http_client.get/post with `MagicMock()`; no network access is needed.
</action>
<acceptance_criteria>
- `test -f tests/job51/test_job51_client.py && echo "OK"` prints OK
- `pipenv run python -m pytest tests/job51/ -v` passes in full (>= 12 test cases)
</acceptance_criteria>
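
The test names above leave the fixture shapes open. As a rough sketch of how a few of them might look, the block below uses self-contained stand-ins: `Result`, `parse_job51_response`, and `fetch_job_detail` are hypothetical and only imitate the behavior described in this plan (status 1 means success, items live under `resultbody.jobList.items`, transport exceptions become a failed result). The field names, the endpoint string, and the exception type are assumptions; the real tests should import from `spiderJobs.platforms.job51.api` instead.

```python
from dataclasses import dataclass, field
from typing import Any
from unittest.mock import MagicMock


@dataclass
class Result:
    """Hypothetical stand-in for crawler_core.base.Result; field names are assumed."""
    success: bool
    status_code: int
    data: Any = None
    items: list = field(default_factory=list)


def parse_job51_response(http_code: int, raw: Any) -> Result:
    """Assumed rules: HTTP 200, a dict payload, and status 1 together mean success."""
    if http_code != 200 or not isinstance(raw, dict):
        return Result(success=False, status_code=http_code)
    if str(raw.get("status")) != "1":
        return Result(success=False, status_code=http_code, data=raw)
    body = raw.get("resultbody")
    job_list = body.get("jobList") if isinstance(body, dict) else None
    items = job_list.get("items") if isinstance(job_list, dict) else None
    return Result(success=True, status_code=http_code, data=body, items=list(items or []))


def fetch_job_detail(http_client: Any, endpoint: str) -> Result:
    """Hypothetical fetcher that turns transport exceptions into a failed Result."""
    try:
        http_code, raw = http_client.get(endpoint)
    except Exception:
        return Result(success=False, status_code=0)
    return parse_job51_response(http_code, raw)


def test_status_one_with_resultbody_job_list() -> None:
    raw = {"status": "1", "resultbody": {"jobList": {"items": [{"jobName": "demo"}]}}}
    result = parse_job51_response(200, raw)
    assert result.success is True
    assert result.items == [{"jobName": "demo"}]


def test_status_one_no_items() -> None:
    result = parse_job51_response(200, {"status": "1", "resultbody": {}})
    assert result.success is True and result.items == []


def test_non_dict_raw_returns_failure() -> None:
    assert parse_job51_response(200, "not a dict").success is False


def test_fetch_exception_handled() -> None:
    client = MagicMock()
    client.get.side_effect = ConnectionError("simulated network failure")
    assert fetch_job_detail(client, "/demo/job-detail").success is False
```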
---
## Verification
```bash
# 1. Verify that all job51 modules import correctly
pipenv run python -c "
from spiderJobs.platforms.job51.client import Job51Client, create_client
from spiderJobs.platforms.job51.api import SearchRecommendJobs, GetJobDetail, GetCompanyDetail
from spiderJobs.platforms.job51.main import main, create_searcher
from crawler_core.base import BaseFetcher, BaseSearcher
assert issubclass(SearchRecommendJobs, BaseSearcher)
print('✅ All job51 modules imported; inheritance is correct')
"
# 2. Confirm that no legacy dependencies remain
grep -rn "spiderJobs.core" spiderJobs/platforms/job51/client.py spiderJobs/platforms/job51/api.py spiderJobs/platforms/job51/main.py spiderJobs/platforms/job51/sign.py && echo "❌ Legacy dependencies remain" || echo "✅ No legacy dependencies"
# 3. Run the mock tests
pipenv run python -m pytest tests/job51/ -v
```
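
The grep checks in this plan can also be expressed as a small Python helper, which may be convenient inside a test suite. This is only a sketch: the function name `find_legacy_references` and the labels are made up for illustration, and the patterns simply mirror the grep expressions used in the acceptance criteria.

```python
import re

# Patterns mirror the grep checks in this plan; the labels are illustrative only.
LEGACY_PATTERNS = {
    "old base package": re.compile(r"spiderJobs\.core"),
    "old result type": re.compile(r"ApiResult"),
    "old client attribute": re.compile(r"self\._http"),
}


def find_legacy_references(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, label, stripped line) for each legacy pattern hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


before = (
    "from spiderJobs.core.base import ApiResult\n"
    "http_code, data = self._http.get(endpoint)\n"
)
after = (
    "from crawler_core.base import Result\n"
    "http_code, data = self.http_client.get(endpoint)\n"
)

hits_before = find_legacy_references(before)
hits_after = find_legacy_references(after)
```

In a real check, `source` would be the text of each migrated file, and an empty result corresponds to grep producing no output.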


@ -0,0 +1,369 @@
---
phase: 3
plan: 2
wave: 1
title: "Migrate the Zhilian Zhaopin (zhilian) layer to crawler_core + mock tests"
depends_on: []
files_modified:
- spiderJobs/platforms/zhilian/client.py
- spiderJobs/platforms/zhilian/api.py
- spiderJobs/platforms/zhilian/main.py
- spiderJobs/platforms/zhilian/sign.py
- tests/zhilian/__init__.py
- tests/zhilian/test_zhilian_client.py
autonomous: true
requirements:
- ARCH-05
---
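# Phase 3 Plan 02: Migrate the Zhilian Zhaopin (zhilian) layer to crawler_core + mock tests
## Objective
Switch `spiderJobs/platforms/zhilian/` from depending on `spiderJobs.core` (the old base classes) to depending on `crawler_core` (the new base classes),
and add the mock tests in `tests/zhilian/test_zhilian_client.py` to satisfy ARCH-05.
**Key differences from job51:**
- zhilian api.py uses the default `parse_response` (it has no custom `_parse_response` function), so there is no `ApiResult` to replace
- zhilian client.py must specifically preserve the `sign_headers()` and `sign_params()` interfaces of `ZhilianSign`
- `SearchCompanyPositions._build_params()` reaches the signer through `self._client.signer.sign_params()`, which the migration does not affect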
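
To make the last difference concrete, here is a minimal sketch of the attribute wiring, not the real class. `CompanyPositionsSketch`, its endpoint string, and the parameter names `companyId` and `pageIndex` are hypothetical; the only points taken from this plan are that the signer is reached through `self._client.signer` and that the request goes through `self.http_client`. The sketch also assumes both attributes refer to the same client object.

```python
from unittest.mock import MagicMock


class CompanyPositionsSketch:
    """Hypothetical stand-in for SearchCompanyPositions; only the wiring matters."""

    ENDPOINT = "/demo/company-positions"  # placeholder, not the real endpoint

    def __init__(self, company_id, client):
        self.company_id = company_id
        self._client = client      # path to the signer, unchanged by the migration
        self.http_client = client  # attribute name used after the migration (was self._http)

    def _build_params(self, page_index):
        params = {"companyId": self.company_id, "pageIndex": page_index}
        # Signing parameters come from the client's signer, not from the base class.
        params.update(self._client.signer.sign_params())
        return params

    def _request(self, page_index):
        return self.http_client.get(self.ENDPOINT, self._build_params(page_index))


client = MagicMock()
client.signer.sign_params.return_value = {"at": "", "rt": ""}
client.get.return_value = (200, {"data": {"list": []}})

status, payload = CompanyPositionsSketch("CZ123", client)._request(page_index=1)
```

Because only the `self._http` attribute is renamed, a mock that exposes both `signer` and `get` is enough to exercise this path.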
## Must Haves
- [ ] `client.py` inherits from `crawler_core.http_client.HTTPClient` and uses `crawler_core.zhilian.sign.ZhilianSign`
- [ ] `api.py` uses `crawler_core.base.BaseFetcher/BaseSearcher`
- [ ] The 1 occurrence of `self._http.get(` in `api.py` (line 200) is replaced with `self.http_client.get(`
- [ ] The imports in `main.py` are updated to `crawler_core.base`
- [ ] `sign.py` becomes a backward-compatible stub (re-exporting `crawler_core.zhilian.sign.ZhilianSign`)
- [ ] `grep -rn "spiderJobs.core" spiderJobs/platforms/zhilian/{client,api,main,sign}.py` produces no output
- [ ] `tests/zhilian/__init__.py` exists
- [ ] `tests/zhilian/test_zhilian_client.py` exists
- [ ] `pytest tests/zhilian/ -v` passes in full (>= 12 tests)
---
## Wave 1
### Task 2.1: Update client.py
<read_first>
- `spiderJobs/platforms/zhilian/client.py` (current contents)
- `crawler_core/http_client.py` (target base class)
- `crawler_core/zhilian/sign.py` (new source of ZhilianSign)
</read_first>
<action>
Modify `spiderJobs/platforms/zhilian/client.py`:
1. Change line 10 to:
```python
from crawler_core.http_client import HTTPClient
```
2. Change line 11 to:
```python
from crawler_core.zhilian.sign import ZhilianSign
```
(This removes `from spiderJobs.core.http_client import HTTPClient` and `from spiderJobs.platforms.zhilian.sign import ZhilianSign`.)
**Note:** the `ZhilianClient.get/post` methods override the parent class and call `self.signer.sign_headers(page_code)`. This is the ZhilianSign interface and is unaffected by the migration (the signatures are identical).
Everything else stays unchanged.
</action>
<acceptance_criteria>
- `grep "from crawler_core.http_client import HTTPClient" spiderJobs/platforms/zhilian/client.py` produces output
- `grep "from crawler_core.zhilian.sign import ZhilianSign" spiderJobs/platforms/zhilian/client.py` produces output
- `grep "spiderJobs.core" spiderJobs/platforms/zhilian/client.py` produces no output
- `python -c "from spiderJobs.platforms.zhilian.client import ZhilianClient"` raises no ImportError
</acceptance_criteria>
---
### Task 2.2: Update api.py
<read_first>
- `spiderJobs/platforms/zhilian/api.py` (full current contents, 229 lines)
- `crawler_core/base.py` (BaseFetcher, BaseSearcher interfaces)
</read_first>
<action>
Modify `spiderJobs/platforms/zhilian/api.py`:
1. Change line 10 to:
```python
from crawler_core.base import BaseFetcher, BaseSearcher
```
(This removes `from spiderJobs.core.base import BaseFetcher, BaseSearcher`.)
2. Change line 200 (`SearchCompanyPositions._request()`) to:
```python
return self.http_client.get(self.ENDPOINT, params)
```
(Previously `return self._http.get(self.ENDPOINT, params)`.)
**Note:** zhilian api.py has no ApiResult; it uses the default parser, so there is nothing to replace.
The `self._client.signer.sign_params()` call in `SearchCompanyPositions._build_params()` needs no change.
</action>
<acceptance_criteria>
- `grep "from crawler_core.base import BaseFetcher, BaseSearcher" spiderJobs/platforms/zhilian/api.py` produces output
- `grep "spiderJobs.core" spiderJobs/platforms/zhilian/api.py` produces no output
- `grep "self\._http" spiderJobs/platforms/zhilian/api.py` produces no output
- `python -c "from spiderJobs.platforms.zhilian.api import SearchPositions, GetPositionDetail, SearchCompanyPositions"` raises no ImportError
</acceptance_criteria>
---
### Task 2.3: Update main.py
<read_first>
- `spiderJobs/platforms/zhilian/main.py` (current contents, 113 lines)
</read_first>
<action>
Modify `spiderJobs/platforms/zhilian/main.py`:
1. Change line 32 to:
```python
from crawler_core.base import BaseFetcher, BaseSearcher
```
(This removes `from spiderJobs.core.base import BaseFetcher, BaseSearcher`.)
Everything else stays unchanged (main.py has no sign import; signing is injected automatically through ZhilianClient).
</action>
<acceptance_criteria>
- `grep "from crawler_core.base import BaseFetcher, BaseSearcher" spiderJobs/platforms/zhilian/main.py` produces output
- `grep "spiderJobs.core" spiderJobs/platforms/zhilian/main.py` produces no output
</acceptance_criteria>
---
### Task 2.4: Turn sign.py into a backward-compatible stub
<read_first>
- `spiderJobs/platforms/zhilian/sign.py` (current contents, an 87-line standalone implementation)
- `crawler_core/zhilian/sign.py` (authoritative implementation)
</read_first>
<action>
Replace the entire contents of `spiderJobs/platforms/zhilian/sign.py` with:
```python
"""
Backward-compatible stub: Zhilian Zhaopin (智联招聘) signing.
The implementation has moved to crawler_core.zhilian.sign.
Re-exports directly from crawler_core so downstream code does not hit ImportError.
"""
from crawler_core.zhilian.sign import ZhilianSign  # noqa: F401
__all__ = ["ZhilianSign"]
```
</action>
<acceptance_criteria>
- `grep "from crawler_core.zhilian.sign import ZhilianSign" spiderJobs/platforms/zhilian/sign.py` produces output
- `python -c "from spiderJobs.platforms.zhilian.sign import ZhilianSign; print(ZhilianSign().generate_uuid())"` prints a UUID successfully
</acceptance_criteria>
---
### Task 2.5: Create tests/zhilian/__init__.py
<action>
Create `tests/zhilian/__init__.py` with the contents: `# tests/zhilian/`
</action>
---
### Task 2.6: Write tests/zhilian/test_zhilian_client.py
<read_first>
- `spiderJobs/platforms/zhilian/api.py` (post-migration version)
- `spiderJobs/platforms/zhilian/client.py` (post-migration version)
- `crawler_core/zhilian/sign.py` (ZhilianSign interface)
- `tests/boss/test_boss_client.py` (style reference)
</read_first>
<action>
Create `tests/zhilian/test_zhilian_client.py` with the following tests:
```python
"""
Mock tests for the Zhilian Zhaopin HTTP layer (QUAL-03 / ARCH-05).
MagicMock stands in for the real HTTP client, so no network access is needed.
"""
from __future__ import annotations
from unittest.mock import MagicMock
from crawler_core.zhilian.sign import ZhilianSign
from spiderJobs.platforms.zhilian.api import (
    SearchPositions, GetPositionDetail, GetCompanyExtDetail,
    GetCompanyDetail, SearchCompanyPositions,
)
from spiderJobs.platforms.zhilian.client import ZhilianClient
from crawler_core.base import Result

# ── 1. SearchPositions (POST cgate) ─────────────────────
class TestSearchPositions:
    def _make_client(self, status_code=200, data=None):
        mock_client = MagicMock()
        mock_client.post.return_value = (status_code, data or {})
        return mock_client
    def test_search_success_returns_list(self):
        data = {
            "data": {"list": [{"title": "Python 工程师"}], "numFound": 1},
            "pageInfo": {"pageNum": 1, "pageSize": 15, "totalNum": 1}
        }
        searcher = SearchPositions(keyword="Python", city_code=538,
                                   client=self._make_client(200, data))
        result = searcher.search(page_index=1)
        assert result.success is True
    def test_search_http_error(self):
        searcher = SearchPositions(client=self._make_client(403, {}))
        result = searcher.search(page_index=1)
        assert result.success is False
        assert result.status_code == 403

# ── 2. GetPositionDetail (GET cgate) ────────────────────
class TestGetPositionDetail:
    def test_fetch_success(self):
        mock_client = MagicMock()
        mock_client.get.return_value = (200, {"data": {"jobName": "高级工程师"}})
        fetcher = GetPositionDetail(number="CC123456", client=mock_client)
        result = fetcher.fetch()
        assert result.success is True
    def test_fetch_404(self):
        mock_client = MagicMock()
        mock_client.get.return_value = (404, {})
        fetcher = GetPositionDetail(number="notexist", client=mock_client)
        result = fetcher.fetch()
        assert result.success is False
        assert result.status_code == 404

# ── 3. GetCompanyExtDetail (GET cgate) ──────────────────
class TestGetCompanyExtDetail:
    def test_fetch_success(self):
        mock_client = MagicMock()
        mock_client.get.return_value = (200, {"data": {"companyName": "测试公司"}})
        fetcher = GetCompanyExtDetail(
            company_name="测试公司", company_number="CZ123", client=mock_client)
        result = fetcher.fetch()
        assert result.success is True

# ── 4. GetCompanyDetail (GET cgate) ─────────────────────
class TestGetCompanyDetail:
    def test_fetch_success(self):
        mock_client = MagicMock()
        mock_client.get.return_value = (200, {"data": {"companyNumber": "CZ123"}})
        fetcher = GetCompanyDetail(number="CZ123", client=mock_client)
        result = fetcher.fetch()
        assert result.success is True
    def test_fetch_http_error(self):
        mock_client = MagicMock()
        mock_client.get.return_value = (500, {})
        fetcher = GetCompanyDetail(number="CZ123", client=mock_client)
        result = fetcher.fetch()
        assert result.success is False

# ── 5. SearchCompanyPositions (GET capi) ────────────────
class TestSearchCompanyPositions:
    def test_search_success(self):
        mock_signer = MagicMock(spec=ZhilianSign)
        mock_signer.sign_params.return_value = {"at": "", "rt": ""}
        mock_client = MagicMock()
        mock_client.signer = mock_signer
        mock_client.get.return_value = (200, {"data": {"list": [{"jobName": "测试岗位"}]},
                                              "pageInfo": {}})
        searcher = SearchCompanyPositions(company_id="CZ123", client=mock_client)
        result = searcher.search(page_index=1)
        assert result.success is True
        assert mock_signer.sign_params.called
    def test_search_http_error(self):
        mock_signer = MagicMock(spec=ZhilianSign)
        mock_signer.sign_params.return_value = {}
        mock_client = MagicMock()
        mock_client.signer = mock_signer
        mock_client.get.return_value = (403, {})
        searcher = SearchCompanyPositions(company_id="CZ123", client=mock_client)
        result = searcher.search(page_index=1)
        assert result.success is False

# ── 6. ZhilianClient: signature header injection ────────
class TestZhilianClientHeaders:
    def test_sign_headers_injects_at_rt(self):
        signer = ZhilianSign(at="mock_at", rt="mock_rt")
        client = ZhilianClient(signer=signer)
        headers = client.signer.sign_headers()
        assert headers["x-zp-at"] == "mock_at"
        assert headers["x-zp-rt"] == "mock_rt"
    def test_sign_headers_has_required_keys(self):
        client = ZhilianClient()
        headers = client.signer.sign_headers()
        for key in ["x-zp-at", "x-zp-rt", "x-zp-action-id", "x-zp-device-id"]:
            assert key in headers
    def test_default_signer_empty_tokens(self):
        client = ZhilianClient()
        headers = client.signer.sign_headers()
        assert headers["x-zp-at"] == ""
        assert headers["x-zp-rt"] == ""
```
</action>
<acceptance_criteria>
- `test -f tests/zhilian/test_zhilian_client.py && echo "OK"` prints OK
- `pipenv run python -m pytest tests/zhilian/ -v` passes in full (>= 12 tests)
</acceptance_criteria>
---
## Verification
```bash
# 1. Verify that all zhilian modules import correctly
pipenv run python -c "
from spiderJobs.platforms.zhilian.client import ZhilianClient, create_cgate_client, create_capi_client
from spiderJobs.platforms.zhilian.api import SearchPositions, GetPositionDetail, GetCompanyDetail, SearchCompanyPositions
from spiderJobs.platforms.zhilian.main import main, create_searcher
from crawler_core.base import BaseFetcher, BaseSearcher
assert issubclass(SearchPositions, BaseSearcher)
assert issubclass(GetPositionDetail, BaseFetcher)
print('✅ All zhilian modules imported; inheritance is correct')
"
# 2. Confirm that no legacy dependencies remain
grep -rn "spiderJobs.core" spiderJobs/platforms/zhilian/client.py spiderJobs/platforms/zhilian/api.py spiderJobs/platforms/zhilian/main.py spiderJobs/platforms/zhilian/sign.py && echo "❌ Legacy dependencies remain" || echo "✅ No legacy dependencies"
# 3. Run the mock tests
pipenv run python -m pytest tests/zhilian/ -v
# 4. Full regression across all three platforms
pipenv run python -m pytest tests/ -v --tb=short
```


@ -0,0 +1,107 @@
# Phase 3: 51Job & Zhilian Rewrite (Technical Research)
**Research date:** 2026-03-21
**Phase goal:** The 51Job and Zhilian Zhaopin crawlers run entirely on crawler_core, and all three platforms use the new base classes
---
## 1. Current State
### 1.1 Existing crawler_core foundation (completed in Phase 1)
| File | Contents |
|------|------|
| `crawler_core/qcwy/sign.py` | `Job51Sign.build_sign_path()`: HMAC-SHA256 signing |
| `crawler_core/zhilian/sign.py` | `ZhilianSign.sign_headers()/sign_params()`: Zhilian's multi-type signing |
| `crawler_core/http_client.py` | `HTTPClient`: TLS impersonation + proxy + tenacity retries |
| `crawler_core/base.py` | `Result[T]`, `BaseFetcher`, `BaseSearcher` |
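
For readers unfamiliar with the primitive named in the first row, the following is a generic sketch of HMAC-SHA256 over a request path using only the Python standard library. It is not the `Job51Sign.build_sign_path()` implementation: the function name `sign_path`, the secret, and the path are placeholders, and the real message layout and key are not described in this document.

```python
import hashlib
import hmac


def sign_path(path: str, secret: str) -> str:
    """Return the hex HMAC-SHA256 digest of a request path (generic illustration)."""
    return hmac.new(
        secret.encode("utf-8"),
        path.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


# Placeholder inputs; the real key and path format are assumptions left open here.
signature = sign_path("/demo/search?keyword=python", "demo-secret")
same_again = sign_path("/demo/search?keyword=python", "demo-secret")
other_path = sign_path("/demo/search?keyword=java", "demo-secret")

# Constant-time comparison is the usual way to check a signature.
matches = hmac.compare_digest(signature, same_again)
```

Keeping this logic in one shared module is what lets the platform packages drop their own `hmac` imports.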
### 1.2 51Job (job51) layer to migrate
All files already exist under `spiderJobs/platforms/job51/`:
| File | Old dependency | Migration target |
|------|--------|---------|
| `client.py` | `spiderJobs.core.http_client.HTTPClient` + `spiderJobs.platforms.job51.sign.Job51Sign` | → `crawler_core.http_client.HTTPClient` + `crawler_core.qcwy.sign.Job51Sign` |
| `api.py` | `spiderJobs.core.base.ApiResult/BaseFetcher/BaseSearcher` | → `crawler_core.base.Result/BaseFetcher/BaseSearcher` |
| `main.py` | `spiderJobs.core.base.BaseFetcher/BaseSearcher` | → `crawler_core.base.BaseFetcher/BaseSearcher` |
| `sign.py` | Standalone implementation (identical to crawler_core/qcwy/sign.py) | → Backward-compatible stub re-exporting `Job51Sign` |
**Specific changes in job51/api.py:**
- Line 14: replace the import
- Replace every `ApiResult` with `Result` (11 occurrences)
- Line 164: `self._http.get(endpoint)` → `self.http_client.get(endpoint)`
- Line 208: `self._http.get(self.ENDPOINT, ...)` → `self.http_client.get(self.ENDPOINT, ...)`
### 1.3 Zhilian Zhaopin (zhilian) layer to migrate
All files already exist under `spiderJobs/platforms/zhilian/`:
| File | Old dependency | Migration target |
|------|--------|---------|
| `client.py` | `spiderJobs.core.http_client.HTTPClient` + `spiderJobs.platforms.zhilian.sign.ZhilianSign` | → `crawler_core.http_client.HTTPClient` + `crawler_core.zhilian.sign.ZhilianSign` |
| `api.py` | `spiderJobs.core.base.BaseFetcher/BaseSearcher` | → `crawler_core.base.BaseFetcher/BaseSearcher` |
| `main.py` | `spiderJobs.core.base.BaseFetcher/BaseSearcher` | → `crawler_core.base.BaseFetcher/BaseSearcher` |
| `sign.py` | Standalone implementation (identical to crawler_core/zhilian/sign.py) | → Backward-compatible stub re-exporting `ZhilianSign` |
**Specific changes in zhilian/api.py:**
- Line 10: replace the import (there is no ApiResult; zhilian uses crawler_core's default parser and needs no custom _parse_response)
- Line 200: `return self._http.get(` → `return self.http_client.get(`
**Important difference:** line 184 of Zhilian's api.py, in `SearchCompanyPositions._build_params()`, calls `self._client.signer.sign_params()`. It reaches the signer indirectly through `self._client` (set to the ZhilianClient passed in), so the migration does not affect it (the attribute name is unchanged).
---
## 2. Workload Compared with Phase 2 (Boss)
| Dimension | Phase 2 Boss | Phase 3 job51 | Phase 3 zhilian |
|------|-------------|---------------|-----------------|
| `ApiResult` replacements | 11 | 11 | 0 (no custom parser) |
| `self._http` replacements | 3 | 2 | 1 |
| sign.py → stub | ✓ | ✓ | ✓ |
| client.py import | ✓ | ✓ | ✓ |
| api.py import | ✓ | ✓ | ✓ |
| main.py import | ✓ | ✓ | ✓ |
---
## 3. Mock Test Strategy
### 3.1 job51 tests (tests/job51/test_job51_client.py)
The job51 mock strategy is identical to Boss: mock `http_client` with `MagicMock()` and test:
- `_parse_job51_response` (pure function; covers success on status 1, failure on any other status, and HTTP errors)
- `SearchRecommendJobs.search()` (normal, HTTP error)
- `GetJobDetail.fetch()` (success, exception handling)
- `GetCompanyDetail.fetch()` (success)
- `Job51Client._job51_headers()` (sign injection)
### 3.2 zhilian tests (tests/zhilian/test_zhilian_client.py)
Zhilian requests go through POST (cgate) or GET (capi); they are mocked the same way:
- `SearchPositions.search()` (normal, error)
- `GetPositionDetail.fetch()` (success)
- `SearchCompanyPositions.search()` (success, specifically verifying that sign_params is called)
- `ZhilianClient.post/get` (verifying signature header injection)
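
One way to verify header injection through `get` itself, rather than by calling the signer directly, is to mock the underlying transport and inspect what it received. The sketch below uses hypothetical stand-ins (`SignerSketch`, `ClientSketch`, and a `transport` mock); how the real `ZhilianClient` hands headers to its parent class is an assumption here, and only the `x-zp-at` / `x-zp-rt` header names come from the planned tests.

```python
from unittest.mock import MagicMock


class SignerSketch:
    """Hypothetical signer exposing only the two token headers."""

    def __init__(self, at="", rt=""):
        self.at, self.rt = at, rt

    def sign_headers(self):
        return {"x-zp-at": self.at, "x-zp-rt": self.rt}


class ClientSketch:
    """Hypothetical client that adds signer headers to every GET before delegating."""

    def __init__(self, signer, transport):
        self.signer = signer
        self._transport = transport

    def get(self, endpoint, params=None):
        headers = self.signer.sign_headers()
        return self._transport.get(endpoint, params=params, headers=headers)


transport = MagicMock()
transport.get.return_value = (200, {})

client = ClientSketch(SignerSketch(at="mock_at", rt="mock_rt"), transport)
status, _ = client.get("/demo/endpoint")

# Inspect the keyword arguments the transport actually received.
sent_headers = transport.get.call_args.kwargs["headers"]
```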
---
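## 4. Verifying the Success Criteria
| Criterion | How it is verified |
|------|---------|
| job51 inherits from BaseFetcher/BaseSearcher | `issubclass()` assertion |
| zhilian inherits from BaseFetcher/BaseSearcher | `issubclass()` assertion |
| Neither platform has inline signing or HTTP boilerplate | `grep` finds no requests import and no hmac in client/api |
| Mock tests pass | `pytest tests/job51/ tests/zhilian/` |
| All three platforms share one code structure | Code review: the four-file client/api/sign/main layout |
---
## RESEARCH COMPLETE
**Phase 3 is ready for planning, split into 2 PLANs:**
- **Plan 01**: migrate the 51Job (job51) layer + mock tests
- **Plan 02**: migrate the Zhilian Zhaopin (zhilian) layer + mock tests
The two Plans have no dependency on each other and could in theory run in parallel, but running them sequentially is safer.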