
phase: 2
plan: 1
wave: 1
title: Migrate the Boss crawler layer to crawler_core
depends_on: []
files_modified:
  - spiderJobs/platforms/boss/client.py
  - spiderJobs/platforms/boss/api.py
  - spiderJobs/platforms/boss/main.py
  - spiderJobs/platforms/boss/sign.py
autonomous: true
requirements:
  - ARCH-03

Phase 2 Plan 01: Migrate the Boss Crawler Layer to crawler_core

Objective

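Switch client.py, api.py, and main.py under spiderJobs/platforms/boss/ from the old base classes in spiderJobs.core to the new base classes in crawler_core. Also retire the redundant signing implementation in the spiderJobs copy of sign.py, so that BossSign comes from crawler_core instead.

After the migration, the Boss crawler satisfies ARCH-03: it contains no inline signing or HTTP boilerplate code.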

Must Haves

  • spiderJobs/platforms/boss/client.py: BossClient inherits from crawler_core.http_client.HTTPClient
  • spiderJobs/platforms/boss/api.py uses Result, BaseFetcher, and BaseSearcher from crawler_core.base
  • In spiderJobs/platforms/boss/api.py, every self._http is replaced with self.http_client
  • spiderJobs/platforms/boss/main.py has updated imports and unchanged behavior
  • spiderJobs/platforms/boss/sign.py re-exports BossSign from crawler_core.boss.sign (backward-compatibility layer)
  • python -m spiderJobs.platforms.boss.main starts without an ImportError

Wave 1 (the only wave; tasks have sequential dependencies)

Task 1.1: Update client.py

<read_first>

  • spiderJobs/platforms/boss/client.py (full current contents)
  • crawler_core/http_client.py (target base-class interface)
  • crawler_core/boss/sign.py (new source of BossSign) </read_first>
Modify `spiderJobs/platforms/boss/client.py`:
  1. Change the imports on lines 10-11 to:

    from crawler_core.http_client import HTTPClient
    from crawler_core.boss.sign import BossSign
    

    (removing `from spiderJobs.core.http_client import HTTPClient` and `from spiderJobs.platforms.boss.sign import BossSign`)

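  2. Keep the BossClient(HTTPClient) inheritance as is. The class body needs no changes because the old and new HTTPClient interfaces are identical.

  3. The create_client() factory function needs no changes.

Note: BASE_URL, BOSS_HEADERS, and all method bodies remain unchanged.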
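Step 2 depends on the two HTTPClient classes exposing the same interface. One way to confirm this before switching base classes is to compare their public method signatures with `inspect`. The sketch below is minimal. `public_interface` and `interface_diff` are hypothetical helper names. `OldHTTPClient` and `NewHTTPClient` are stand-ins whose methods and parameters are assumptions, not the real classes. In practice you would pass `spiderJobs.core.http_client.HTTPClient` and `crawler_core.http_client.HTTPClient` instead.

```python
import inspect


def public_interface(cls: type) -> dict[str, str]:
    """Map each public callable on cls (plus __init__) to its signature text."""
    interface = {}
    for name, member in inspect.getmembers(cls, callable):
        if name.startswith("_") and name != "__init__":
            continue  # skip private helpers and other dunders
        try:
            interface[name] = str(inspect.signature(member))
        except (TypeError, ValueError):
            interface[name] = "<signature unavailable>"
    return interface


def interface_diff(old_cls: type, new_cls: type) -> dict[str, list[str]]:
    """Report names missing from, added to, or changed in new_cls."""
    old, new = public_interface(old_cls), public_interface(new_cls)
    return {
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


# Hypothetical stand-ins; the real HTTPClient methods may differ.
class OldHTTPClient:
    def __init__(self, base_url, headers=None):
        self.base_url = base_url
        self.headers = headers or {}

    def get(self, path, params=None):
        return {"path": path, "params": params}


class NewHTTPClient:
    def __init__(self, base_url, headers=None):
        self.base_url = base_url
        self.headers = headers or {}

    def get(self, path, params=None):
        return {"path": path, "params": params}


print(interface_diff(OldHTTPClient, NewHTTPClient))
# prints {'missing': [], 'added': [], 'changed': []}
```

Empty lists support the "no class-body changes" claim. Any name under `changed` or `missing` means BossClient needs a closer look before the base class is swapped.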

<acceptance_criteria>

  • grep "from crawler_core.http_client import HTTPClient" spiderJobs/platforms/boss/client.py prints that line
  • grep "from crawler_core.boss.sign import BossSign" spiderJobs/platforms/boss/client.py prints that line
  • grep "spiderJobs.core" spiderJobs/platforms/boss/client.py prints nothing
  • python -c "from spiderJobs.platforms.boss.client import BossClient" raises no ImportError </acceptance_criteria>
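The grep criteria above have two weaknesses: they treat "." as a wildcard, and they also match comments and strings. A stricter check parses the file and inspects only real import statements. The sketch below uses the standard `ast` module. `find_banned_imports` and `BANNED_PREFIXES` are hypothetical names. Relative imports (`from .x import y`) are deliberately ignored, which assumes the Boss modules use absolute imports, as every import in this plan does.

```python
import ast

# Module prefixes that Task 1.1 removes from client.py.
BANNED_PREFIXES = ("spiderJobs.core", "spiderJobs.platforms.boss.sign")


def find_banned_imports(
    source: str, banned: tuple[str, ...] = BANNED_PREFIXES
) -> list[tuple[int, str]]:
    """Return (line number, module) for each absolute import of a banned module."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            # Match the prefix itself or any submodule, but not e.g. "spiderJobs.corex".
            if any(module == p or module.startswith(p + ".") for p in banned):
                hits.append((node.lineno, module))
    return sorted(hits)


sample = '''\
# spiderJobs.core used to live here (comments are ignored)
from crawler_core.http_client import HTTPClient
from crawler_core.boss.sign import BossSign
from spiderJobs.core.http_client import HTTPClient as OldClient
'''
print(find_banned_imports(sample))  # [(4, 'spiderJobs.core.http_client')]
```

To check the real file, pass `pathlib.Path("spiderJobs/platforms/boss/client.py").read_text(encoding="utf-8")`. An empty list means the criterion holds.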

Task 1.2: Update api.py

<read_first>

  • spiderJobs/platforms/boss/api.py (full current contents)
  • crawler_core/base.py (Result, BaseFetcher, and BaseSearcher interface definitions)
  • spiderJobs/platforms/boss/client.py (migrated version produced by Task 1.1) </read_first>
Modify `spiderJobs/platforms/boss/api.py`:
  1. Change the import on line 15 to:

    from crawler_core.base import BaseFetcher, BaseSearcher, Result
    

    (removing `from spiderJobs.core.base import ApiResult, BaseFetcher, BaseSearcher`)

  2. Replace every occurrence of ApiResult with Result throughout the file (it appears in the return type annotation and the body of _parse_boss_response).

  3. In SearchRecJobs._request(), change:

    return self._http.get(self.ENDPOINT, params)
    

    to:

    return self.http_client.get(self.ENDPOINT, params)
    
  4. In GetJobDetail.fetch(), change:

    client: BossClient = self._http
    

    to:

    client: BossClient = self.http_client
    
  5. In SearchBrandJobs._request(), change:

    return self._http.get(self.ENDPOINT, params)
    

    to:

    return self.http_client.get(self.ENDPOINT, params)
    

The logic of _parse_boss_response (code/zpData parsing), all parameters, and the ENDPOINT strings remain unchanged.
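Steps 1-5 are mechanical text substitutions, so they can be scripted. The sketch below applies them with `re`. It assumes api.py follows the patterns quoted in the steps. `migrate_api_source` and `REWRITES` are hypothetical names, and the sample input is illustrative, not the real api.py. Word boundaries leave longer identifiers such as `self._http_client` or a hypothetical `BossApiResult` untouched. Review the resulting diff by hand either way.

```python
import re

# (pattern, replacement) pairs, applied in order.
REWRITES = [
    # Step 1: swap the legacy base-class import for the crawler_core one.
    # Assumes the old line imports exactly ApiResult, BaseFetcher, BaseSearcher.
    (re.compile(r"^from spiderJobs\.core\.base import .*$", re.MULTILINE),
     "from crawler_core.base import BaseFetcher, BaseSearcher, Result"),
    # Step 2: ApiResult -> Result (whole word only).
    (re.compile(r"\bApiResult\b"), "Result"),
    # Steps 3-5: self._http -> self.http_client (whole attribute only).
    (re.compile(r"\bself\._http\b"), "self.http_client"),
]


def migrate_api_source(source: str) -> str:
    """Apply the Task 1.2 rewrites to the text of api.py."""
    for pattern, replacement in REWRITES:
        source = pattern.sub(replacement, source)
    return source


# Illustrative input only; the real api.py is longer.
before = '''\
from spiderJobs.core.base import ApiResult, BaseFetcher, BaseSearcher

def _parse_boss_response(data) -> ApiResult:
    return ApiResult(data)

class SearchRecJobs(BaseSearcher):
    def _request(self, params):
        return self._http.get(self.ENDPOINT, params)
'''
print(migrate_api_source(before))
```

The import rewrite runs first so that step 2 never sees the old import line. The acceptance criteria below still apply to the rewritten file.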

<acceptance_criteria>

  • grep "from crawler_core.base import" spiderJobs/platforms/boss/api.py prints that line
  • grep "ApiResult" spiderJobs/platforms/boss/api.py prints nothing (every occurrence replaced with Result)
  • grep "spiderJobs.core" spiderJobs/platforms/boss/api.py prints nothing
  • grep "self\._http" spiderJobs/platforms/boss/api.py prints nothing (every occurrence replaced with self.http_client)
  • python -c "from spiderJobs.platforms.boss.api import SearchRecJobs, GetJobDetail, GetBrandDetail, SearchBrandJobs" raises no ImportError </acceptance_criteria>

Task 1.3: Update main.py

<read_first>

  • spiderJobs/platforms/boss/main.py (full current contents)
  • crawler_core/base.py (new BaseFetcher and BaseSearcher interfaces) </read_first>
Modify `spiderJobs/platforms/boss/main.py`:
  1. Change the import on line 35 to:

    from crawler_core.base import BaseFetcher, BaseSearcher
    

    (removing `from spiderJobs.core.base import BaseFetcher, BaseSearcher`)

  2. Change the import on line 38 to:

    from crawler_core.boss.sign import BossSign
    

    (removing `from spiderJobs.platforms.boss.sign import BossSign`)

  3. Everything else (CITY_CODE_MAP, create_searcher, extract_company_id, create_company_fetcher, main) remains unchanged.

<acceptance_criteria>

  • grep "from crawler_core.base import BaseFetcher, BaseSearcher" spiderJobs/platforms/boss/main.py prints that line
  • grep "from crawler_core.boss.sign import BossSign" spiderJobs/platforms/boss/main.py prints that line
  • grep "spiderJobs.core" spiderJobs/platforms/boss/main.py prints nothing
  • python -c "from spiderJobs.platforms.boss.main import main" raises no ImportError </acceptance_criteria>
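Each `python -c` check above stops at the first failing import. The sketch below uses only `importlib` to try every import and report all failures at once. `check_imports` and `BOSS_SPECS` are hypothetical names. The "module:attribute" list mirrors the acceptance criteria in this plan. Only ImportError (including ModuleNotFoundError) and AttributeError are collected; any other exception raised while a module loads still propagates.

```python
import importlib

# Hypothetical list mirroring this plan's acceptance criteria.
BOSS_SPECS = [
    "spiderJobs.platforms.boss.client:BossClient",
    "spiderJobs.platforms.boss.api:SearchRecJobs",
    "spiderJobs.platforms.boss.api:GetJobDetail",
    "spiderJobs.platforms.boss.api:GetBrandDetail",
    "spiderJobs.platforms.boss.api:SearchBrandJobs",
    "spiderJobs.platforms.boss.main:main",
    "spiderJobs.platforms.boss.sign:BossSign",
]


def check_imports(specs: list[str]) -> dict[str, str]:
    """Import every "module:attribute" spec; return {spec: error} for failures."""
    failures = {}
    for spec in specs:
        module_name, _, attr = spec.partition(":")
        try:
            module = importlib.import_module(module_name)
            if attr:
                getattr(module, attr)  # AttributeError if the name is gone
        except (ImportError, AttributeError) as exc:
            failures[spec] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    for spec, error in check_imports(BOSS_SPECS).items():
        print(f"FAIL {spec}: {error}")
```

Run it from the repository root. No output means every spec imported cleanly.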

Task 1.4: Turn the spiderJobs copy of sign.py into a backward-compatibility stub

<read_first>

  • spiderJobs/platforms/boss/sign.py (full current contents)
  • crawler_core/boss/sign.py (authoritative implementation) </read_first>
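Replace the entire contents of `spiderJobs/platforms/boss/sign.py` with the following backward-compatibility stub. It keeps the `BossSign` name available in case existing code still imports it directly: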
"""
Backward-compatibility stub: Boss直聘 (BOSS Zhipin) signing.

Moved to crawler_core.boss.sign.
Re-exports directly from crawler_core so downstream code does not hit an ImportError.
"""

from crawler_core.boss.sign import BossSign  # noqa: F401

__all__ = ["BossSign"]

<acceptance_criteria>

  • cat spiderJobs/platforms/boss/sign.py shows only the module docstring, the import, and the __all__ declaration, with no Boss signing algorithm implementation
  • grep "from crawler_core.boss.sign import BossSign" spiderJobs/platforms/boss/sign.py prints that line
  • python -c "from spiderJobs.platforms.boss.sign import BossSign; print(BossSign.generate_traceid())" successfully prints a traceid </acceptance_criteria>
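The stub exists so that both import paths yield the very same class object. A copied implementation would create a distinct class, which could break `isinstance` checks and any state shared at class level. The sketch below checks that property directly. `reexport_problems` is a hypothetical helper. The runnable example uses `json`, which re-exports `JSONDecoder` and `JSONDecodeError` from `json.decoder`, only so that the example works anywhere.

```python
import importlib

_MISSING = object()  # sentinel for "attribute not present"


def reexport_problems(stub_name: str, source_name: str, names: list[str]) -> list[str]:
    """List the ways in which stub_name fails to re-export names from source_name."""
    stub = importlib.import_module(stub_name)
    source = importlib.import_module(source_name)
    exported = set(getattr(stub, "__all__", ()))
    problems = []
    for name in names:
        stub_obj = getattr(stub, name, _MISSING)
        source_obj = getattr(source, name, _MISSING)
        if stub_obj is _MISSING or source_obj is _MISSING:
            problems.append(f"{name}: missing from one of the modules")
        elif stub_obj is not source_obj:
            problems.append(f"{name}: stub holds a different object (a copy?)")
        if name not in exported:
            problems.append(f"{name}: not listed in {stub_name}.__all__")
    return problems


print(reexport_problems("json", "json.decoder", ["JSONDecoder", "JSONDecodeError"]))  # []
```

For this task, `reexport_problems("spiderJobs.platforms.boss.sign", "crawler_core.boss.sign", ["BossSign"])` should return an empty list. That is a stronger guarantee than the grep criterion above, because it confirms object identity rather than the presence of a line of text.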

Verification

# Verify that all Boss-layer imports resolve
python -c "
from spiderJobs.platforms.boss.client import BossClient, create_client
from spiderJobs.platforms.boss.api import SearchRecJobs, GetJobDetail, GetBrandDetail, SearchBrandJobs
from spiderJobs.platforms.boss.main import main, create_searcher
print('✅ All Boss modules imported successfully')
"

# Confirm no legacy dependencies remain (search .py files only, so stale __pycache__ bytecode cannot produce false hits)
grep -rn --include='*.py' "spiderJobs.core" spiderJobs/platforms/boss/ && echo "❌ Legacy dependencies remain" || echo "✅ No legacy dependencies"

# Confirm the sign stub works
python -c "from spiderJobs.platforms.boss.sign import BossSign; print('Traceid:', BossSign.generate_traceid())"