| phase | plan | wave | title | depends_on | files_modified | autonomous | requirements |
|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | Migrate the Boss crawler layer to crawler_core | | | true | |
Phase 2 Plan 01: Migrate the Boss crawler layer to crawler_core
Objective
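Migrate client.py, api.py, and main.py under spiderJobs/platforms/boss/ from
spiderJobs.core (the old base classes) to crawler_core (the new base classes),
and remove the redundant signing implementation in the spiderJobs copy of sign.py (import it from crawler_core instead).
Once the migration is complete, the Boss crawler satisfies ARCH-03: it contains no inline signing or HTTP boilerplate code.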
Must Haves
- `spiderJobs/platforms/boss/client.py` inherits from `crawler_core.http_client.HTTPClient`
- `spiderJobs/platforms/boss/api.py` uses `Result`, `BaseFetcher`, and `BaseSearcher` from `crawler_core.base`
- Every `self._http` in `spiderJobs/platforms/boss/api.py` is replaced with `self.http_client`
- The imports in `spiderJobs/platforms/boss/main.py` are updated; behavior is unchanged
- `spiderJobs/platforms/boss/sign.py` re-exports from `crawler_core.boss.sign` (backward-compatibility layer)
- `python -m spiderJobs.platforms.boss.main` starts without an ImportError
Wave 1 (single wave; the tasks depend on each other and must run in order)
Task 1.1: Update client.py
<read_first>
- `spiderJobs/platforms/boss/client.py` (current full contents)
- `crawler_core/http_client.py` (interface of the target base class)
- `crawler_core/boss/sign.py` (new source of BossSign)
</read_first>
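- Change the imports on lines 10-11 to `from crawler_core.http_client import HTTPClient` and `from crawler_core.boss.sign import BossSign` (removing `from spiderJobs.core.http_client import HTTPClient` and `from spiderJobs.platforms.boss.sign import BossSign`).
- The `BossClient(HTTPClient)` inheritance stays as it is, and the class body needs no changes (the two HTTPClient interfaces are identical).
- The `create_client()` factory function needs no changes.

Note: BASE_URL, BOSS_HEADERS, and all method bodies stay unchanged.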
<acceptance_criteria>
- `grep "from crawler_core.http_client import HTTPClient" spiderJobs/platforms/boss/client.py` prints that line
- `grep "from crawler_core.boss.sign import BossSign" spiderJobs/platforms/boss/client.py` prints that line
- `grep "spiderJobs.core" spiderJobs/platforms/boss/client.py` prints nothing
- `python -c "from spiderJobs.platforms.boss.client import BossClient"` raises no ImportError
</acceptance_criteria>
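The grep checks above amount to "these lines must be present, this substring must be absent". A minimal Python sketch of the same idea follows, for cases where the check should run inside a test suite rather than a shell. The helper name `check_source` and the sample strings are hypothetical and not part of the plan. It matches fixed substrings, so it is slightly stricter than `grep`, which treats the dots in `spiderJobs.core` as regex wildcards.

```python
def check_source(source: str, required: list[str], forbidden: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    lines = source.splitlines()
    for needle in required:
        # Same intent as: grep "<needle>" file  -> must print a line
        if not any(needle in line for line in lines):
            problems.append(f"missing: {needle}")
    for needle in forbidden:
        # Same intent as: grep "<needle>" file  -> must print nothing
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                problems.append(f"line {lineno} still contains: {needle}")
    return problems


REQUIRED = [
    "from crawler_core.http_client import HTTPClient",
    "from crawler_core.boss.sign import BossSign",
]
FORBIDDEN = ["spiderJobs.core"]

# Made-up file headers, used only to exercise the helper.
migrated = (
    "from crawler_core.http_client import HTTPClient\n"
    "from crawler_core.boss.sign import BossSign\n"
)
legacy = "from spiderJobs.core.http_client import HTTPClient\n"

print(check_source(migrated, REQUIRED, FORBIDDEN))
print(check_source(legacy, REQUIRED, FORBIDDEN))
```

In practice the `source` argument would come from reading `spiderJobs/platforms/boss/client.py`; the importability check still needs the real `python -c` command.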
Task 1.2: Update api.py
<read_first>
- `spiderJobs/platforms/boss/api.py` (current full contents)
- `crawler_core/base.py` (interface definitions of Result, BaseFetcher, and BaseSearcher)
- `spiderJobs/platforms/boss/client.py` (migrated version, the output of Task 1.1)
</read_first>
- Change the import on line 15 to `from crawler_core.base import BaseFetcher, BaseSearcher, Result` (removing `from spiderJobs.core.base import ApiResult, BaseFetcher, BaseSearcher`).
- Replace `ApiResult` with `Result` throughout the file (it appears in the return type annotation and the body of `_parse_boss_response`).
- In `SearchRecJobs._request()`, change `return self._http.get(self.ENDPOINT, params)` to `return self.http_client.get(self.ENDPOINT, params)`.
- In `GetJobDetail.fetch()`, change `client: BossClient = self._http` to `client: BossClient = self.http_client`.
- In `SearchBrandJobs._request()`, change `return self._http.get(self.ENDPOINT, params)` to `return self.http_client.get(self.ENDPOINT, params)`.

The logic of `_parse_boss_response` (code/zpData parsing), all parameters, and the ENDPOINT strings stay unchanged.
<acceptance_criteria>
- `grep "from crawler_core.base import" spiderJobs/platforms/boss/api.py` prints that line
- `grep "ApiResult" spiderJobs/platforms/boss/api.py` prints nothing (every occurrence is now Result)
- `grep "spiderJobs.core" spiderJobs/platforms/boss/api.py` prints nothing
- `grep "self\._http" spiderJobs/platforms/boss/api.py` prints nothing (every occurrence is now self.http_client)
- `python -c "from spiderJobs.platforms.boss.api import SearchRecJobs, GetJobDetail, GetBrandDetail, SearchBrandJobs"` raises no ImportError
</acceptance_criteria>
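The three renames in Task 1.2 are mechanical, so they can be sketched as a small text transformation. The helper `migrate_api_source` and the sample snippet below are hypothetical illustrations; in particular, the `ApiResult(raw)` call is made up and does not reflect the real constructor. The word boundaries (`\b`) are there so that a longer identifier such as `self._http_timeout` would be left alone.

```python
import re

OLD_IMPORT = "from spiderJobs.core.base import ApiResult, BaseFetcher, BaseSearcher"
NEW_IMPORT = "from crawler_core.base import BaseFetcher, BaseSearcher, Result"


def migrate_api_source(source: str) -> str:
    """Apply the Task 1.2 renames to the text of api.py (illustrative only)."""
    # 1. Swap the base-class import.
    source = source.replace(OLD_IMPORT, NEW_IMPORT)
    # 2. ApiResult -> Result, whole identifiers only.
    source = re.sub(r"\bApiResult\b", "Result", source)
    # 3. self._http -> self.http_client, whole attribute name only.
    source = re.sub(r"\bself\._http\b", "self.http_client", source)
    return source


# Made-up excerpt in the shape the plan describes; not the real api.py.
before = '''\
from spiderJobs.core.base import ApiResult, BaseFetcher, BaseSearcher

def _parse_boss_response(raw) -> ApiResult:
    return ApiResult(raw)

class SearchRecJobs(BaseSearcher):
    def _request(self, params):
        return self._http.get(self.ENDPOINT, params)
'''

after = migrate_api_source(before)
print(after)
```

A scripted rewrite like this is only a convenience; the result should still be reviewed by hand and checked against the acceptance criteria above.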
Task 1.3: Update main.py
<read_first>
- `spiderJobs/platforms/boss/main.py` (current full contents)
- `crawler_core/base.py` (new BaseFetcher and BaseSearcher interfaces)
</read_first>
- Change the import on line 35 to `from crawler_core.base import BaseFetcher, BaseSearcher` (removing `from spiderJobs.core.base import BaseFetcher, BaseSearcher`).
- Change the import on line 38 to `from crawler_core.boss.sign import BossSign` (removing `from spiderJobs.platforms.boss.sign import BossSign`).
- Everything else (CITY_CODE_MAP, create_searcher, extract_company_id, create_company_fetcher, main) stays unchanged.
<acceptance_criteria>
- `grep "from crawler_core.base import BaseFetcher, BaseSearcher" spiderJobs/platforms/boss/main.py` prints that line
- `grep "from crawler_core.boss.sign import BossSign" spiderJobs/platforms/boss/main.py` prints that line
- `grep "spiderJobs.core" spiderJobs/platforms/boss/main.py` prints nothing
- `python -c "from spiderJobs.platforms.boss.main import main"` raises no ImportError
</acceptance_criteria>
Task 1.4: Turn the spiderJobs copy of sign.py into a backward-compatibility stub
<read_first>
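- `spiderJobs/platforms/boss/sign.py` (current full contents)
- `crawler_core/boss/sign.py` (authoritative implementation)
</read_first>
Replace the entire contents of `spiderJobs/platforms/boss/sign.py` with: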
"""
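Backward-compatibility stub: Boss Zhipin signing
The implementation has moved to crawler_core.boss.sign.
It is re-exported directly from crawler_core so that downstream code does not hit an ImportError.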
"""
from crawler_core.boss.sign import BossSign # noqa: F401
__all__ = ["BossSign"]
<acceptance_criteria>
- `cat spiderJobs/platforms/boss/sign.py` shows only the import and the `__all__` declaration, with no Boss signing algorithm implementation
- `grep "from crawler_core.boss.sign import BossSign" spiderJobs/platforms/boss/sign.py` prints that line
- `python -c "from spiderJobs.platforms.boss.sign import BossSign; print(BossSign.generate_traceid())"` prints a Traceid
</acceptance_criteria>
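The point of the stub is that the old and new import paths resolve to the very same class object, so existing callers keep working. The sketch below shows that property with two stand-in modules built in memory. The module names `demo_new_sign` and `demo_old_sign` and the placeholder `generate_traceid` body are assumptions made for illustration; the real BossSign algorithm is not reproduced here.

```python
import sys
import types

# Stand-in for crawler_core.boss.sign (hypothetical module name).
new_mod = types.ModuleType("demo_new_sign")


class BossSign:
    @staticmethod
    def generate_traceid() -> str:
        return "demo-traceid"  # placeholder value, not the real algorithm


new_mod.BossSign = BossSign
sys.modules["demo_new_sign"] = new_mod

# Stand-in for the stub file: one re-export line plus __all__.
stub_source = (
    "from demo_new_sign import BossSign  # noqa: F401\n"
    '__all__ = ["BossSign"]\n'
)
old_mod = types.ModuleType("demo_old_sign")
exec(stub_source, old_mod.__dict__)
sys.modules["demo_old_sign"] = old_mod

# Both import paths now yield the identical class object.
from demo_new_sign import BossSign as CoreBossSign
from demo_old_sign import BossSign as LegacyBossSign

print(LegacyBossSign is CoreBossSign)
print(LegacyBossSign.generate_traceid())
```

Because the stub re-exports rather than copies, any later fix to the authoritative implementation is picked up by legacy importers automatically.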
Verification
# Verify that every Boss-layer import works
python -c "
from spiderJobs.platforms.boss.client import BossClient, create_client
from spiderJobs.platforms.boss.api import SearchRecJobs, GetJobDetail, GetBrandDetail, SearchBrandJobs
from spiderJobs.platforms.boss.main import main, create_searcher
print('✅ All Boss modules imported successfully')
"
# Confirm that no old dependencies remain
grep -rn "spiderJobs.core" spiderJobs/platforms/boss/ && echo "❌ Old dependencies remain" || echo "✅ No old dependencies"
# Confirm that the sign stub works
python -c "from spiderJobs.platforms.boss.sign import BossSign; print('Traceid:', BossSign.generate_traceid())"
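The import checks above stop at the first failing module. As a possible complement, the sketch below tries each module in turn and collects every ImportError, which can make a half-finished migration easier to diagnose. The helper name `find_import_failures` is hypothetical; the module list is taken from the tasks in this plan.

```python
import importlib


def find_import_failures(module_names):
    """Try to import each module; return {module name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            failures[name] = str(exc)
    return failures


BOSS_MODULES = [
    "spiderJobs.platforms.boss.client",
    "spiderJobs.platforms.boss.api",
    "spiderJobs.platforms.boss.main",
    "spiderJobs.platforms.boss.sign",
]

if __name__ == "__main__":
    failures = find_import_failures(BOSS_MODULES)
    for name, message in failures.items():
        print(f"FAILED {name}: {message}")
    if not failures:
        print("All Boss modules imported successfully")
```

Run it from the repository root so that both `spiderJobs` and `crawler_core` are importable.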