
phase: 3
plan: 1
wave: 1
title: Migrate the 51Job (前程无忧) job51 layer to crawler_core + mock tests
depends_on: (none)
files_modified:
  • spiderJobs/platforms/job51/client.py
  • spiderJobs/platforms/job51/api.py
  • spiderJobs/platforms/job51/main.py
  • spiderJobs/platforms/job51/sign.py
  • tests/job51/__init__.py
  • tests/job51/test_job51_client.py
autonomous: true
requirements: ARCH-04

Phase 3 Plan 01: Migrate the 51Job (前程无忧) job51 layer to crawler_core + mock tests

Objective

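Switch `spiderJobs/platforms/job51/` from depending on `spiderJobs.core` (the old base classes) to depending on `crawler_core` (the new base classes), and add mock tests in `tests/job51/test_job51_client.py` to satisfy ARCH-04.

The migration mirrors the Phase 2 Boss migration exactly: 4 files modified + `sign.py` turned into a stub + new tests.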

Must Haves

  • `client.py` inherits from `crawler_core.http_client.HTTPClient` and uses `crawler_core.qcwy.sign.Job51Sign`
  • `api.py` uses `crawler_core.base.Result` / `BaseFetcher` / `BaseSearcher`; every `ApiResult` is replaced with `Result`
  • The 2 occurrences of `self._http` in `api.py` are replaced with `self.http_client`
  • `main.py` imports are updated to `crawler_core.base`
  • `sign.py` becomes a backward-compatible stub (re-exports `crawler_core.qcwy.sign.Job51Sign`)
  • `python -c "from spiderJobs.platforms.job51.api import ...; from spiderJobs.platforms.job51.client import ..."` raises no ImportError
  • `grep -rn "spiderJobs.core" spiderJobs/platforms/job51/{client,api,main,sign}.py` produces no output
  • `tests/job51/__init__.py` exists
  • `tests/job51/test_job51_client.py` exists
  • `pytest tests/job51/ -v` passes in full (>= 12 tests, matching the 12 cases listed in Task 1.6)

Wave 1

Task 1.1: Update client.py

<read_first>

  • spiderJobs/platforms/job51/client.py (current contents)
  • crawler_core/http_client.py (target base class)
  • crawler_core/qcwy/sign.py (new source of Job51Sign) </read_first>
Modify `spiderJobs/platforms/job51/client.py`:
  1. Change line 15 to:
    from crawler_core.http_client import HTTPClient
    
  2. Change line 16 to:
    from crawler_core.qcwy.sign import Job51Sign
    
    (Delete `from spiderJobs.core.http_client import HTTPClient` and `from spiderJobs.platforms.job51.sign import Job51Sign`.)

Everything else (`JOB51_HEADERS`, the `Job51Client` class body, the `create_client` function) stays unchanged.

<acceptance_criteria>

  • `grep "from crawler_core.http_client import HTTPClient" spiderJobs/platforms/job51/client.py` produces output
  • `grep "from crawler_core.qcwy.sign import Job51Sign" spiderJobs/platforms/job51/client.py` produces output
  • `grep "spiderJobs.core" spiderJobs/platforms/job51/client.py` produces no output
  • `python -c "from spiderJobs.platforms.job51.client import Job51Client, create_client"` raises no ImportError </acceptance_criteria>

Task 1.2: Update api.py

<read_first>

  • spiderJobs/platforms/job51/api.py (current full contents, ~260 lines)
  • crawler_core/base.py (Result, BaseFetcher, BaseSearcher interfaces) </read_first>
Modify `spiderJobs/platforms/job51/api.py`:
  1. Change line 14 to:

    from crawler_core.base import BaseFetcher, BaseSearcher, Result
    

    (Delete `from spiderJobs.core.base import ApiResult, BaseFetcher, BaseSearcher`.)

  2. Replace `ApiResult` with `Result` throughout the file (11 occurrences, including function annotations and return statements).

  3. Line 164: `http_code, data = self._http.get(endpoint)` → `http_code, data = self.http_client.get(endpoint)`

  4. Line 208: `http_code, data = self._http.get(self.ENDPOINT, self._build_params())` → `http_code, data = self.http_client.get(self.ENDPOINT, self._build_params())`

The `_parse_job51_response` logic (the check for status 1 and the `resultbody` parsing) is kept exactly as is; only `ApiResult` is replaced with `Result`.
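For orientation, here is a minimal sketch of the kind of parsing this task leaves untouched, written against the behaviours the Task 1.6 tests expect. The `Result` dataclass is a hypothetical stand-in for `crawler_core.base.Result` (its real fields may differ, for example the list field may be named `list` rather than `items`), and `parse_job51_response` is an illustration, not the project's actual `_parse_job51_response`.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Result:
    """Hypothetical stand-in for crawler_core.base.Result (real fields may differ)."""
    success: bool
    data: Any = None
    items: List[Any] = field(default_factory=list)
    message: str = ""


def parse_job51_response(http_code: int, raw: Any) -> Result:
    """Illustrative status/resultbody parser; not the project's implementation."""
    if http_code != 200:
        return Result(success=False, message=f"HTTP {http_code}")
    if not isinstance(raw, dict):
        return Result(success=False, message="response is not a JSON object")
    # Assumption: success is signalled by status 1, sent as an int or a string.
    if str(raw.get("status")) != "1":
        return Result(success=False, message=f"status={raw.get('status')!r}")
    body = raw.get("resultbody")
    if not isinstance(body, dict):
        body = {}
    job_list = body.get("jobList")
    # A missing or empty item list is still a successful response.
    items = (job_list.get("items") or []) if isinstance(job_list, dict) else []
    return Result(success=True, data=body, items=list(items))


ok = parse_job51_response(
    200, {"status": "1", "resultbody": {"jobList": {"items": [{"jobId": "1"}]}}}
)
print(ok.success, ok.items)
```

Because the function only maps its inputs to a `Result`, swapping `ApiResult` for `Result` should leave this kind of logic unaffected, provided the two classes accept the same constructor arguments.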

<acceptance_criteria>

  • `grep "from crawler_core.base import" spiderJobs/platforms/job51/api.py` produces output
  • `grep "ApiResult" spiderJobs/platforms/job51/api.py` produces no output
  • `grep "self\._http" spiderJobs/platforms/job51/api.py` produces no output
  • `python -c "from spiderJobs.platforms.job51.api import SearchRecommendJobs, GetJobDetail, GetCompanyDetail, SearchCompanyJobs"` raises no ImportError </acceptance_criteria>

Task 1.3: Update main.py

<read_first>

  • spiderJobs/platforms/job51/main.py (current contents) </read_first>
Modify `spiderJobs/platforms/job51/main.py`:
  1. Change line 32 to:
    from crawler_core.base import BaseFetcher, BaseSearcher
    
    (Delete `from spiderJobs.core.base import BaseFetcher, BaseSearcher`.)

Everything else stays unchanged.

<acceptance_criteria>

  • `grep "from crawler_core.base import BaseFetcher, BaseSearcher" spiderJobs/platforms/job51/main.py` produces output
  • `grep "spiderJobs.core" spiderJobs/platforms/job51/main.py` produces no output </acceptance_criteria>

Task 1.4: Turn sign.py into a backward-compatible stub

<read_first>

  • spiderJobs/platforms/job51/sign.py (current contents)
  • crawler_core/qcwy/sign.py (authoritative implementation) </read_first>
Replace the entire contents of `spiderJobs/platforms/job51/sign.py` with:
"""
Backward-compatible stub: 51Job (前程无忧) signing

Moved to crawler_core.qcwy.sign.
Re-exported directly from crawler_core so downstream code does not hit an ImportError.
"""

from crawler_core.qcwy.sign import Job51Sign  # noqa: F401

__all__ = ["Job51Sign"]

<acceptance_criteria>

  • `grep "from crawler_core.qcwy.sign import Job51Sign" spiderJobs/platforms/job51/sign.py` produces output
  • `python -c "from spiderJobs.platforms.job51.sign import Job51Sign; print(Job51Sign.generate_uuid())"` prints a UUID successfully </acceptance_criteria>
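The point of a re-export stub is that the old and new import paths hand back the very same class object, so `isinstance` checks and patches keep working for downstream code. The sketch below demonstrates this with two in-memory modules; `new_home_sign` and `legacy_sign` are hypothetical stand-ins for `crawler_core.qcwy.sign` and `spiderJobs.platforms.job51.sign`, and the empty `Job51Sign` class is a placeholder.

```python
import importlib
import sys
import types

# Hypothetical stand-in for the new home module (crawler_core.qcwy.sign).
new_home = types.ModuleType("new_home_sign")
exec("class Job51Sign:\n    pass\n", new_home.__dict__)
sys.modules["new_home_sign"] = new_home

# Hypothetical stand-in for the legacy path; its body is the same kind of re-export.
legacy = types.ModuleType("legacy_sign")
exec(
    "from new_home_sign import Job51Sign  # noqa: F401\n"
    "__all__ = ['Job51Sign']\n",
    legacy.__dict__,
)
sys.modules["legacy_sign"] = legacy

old_cls = importlib.import_module("legacy_sign").Job51Sign
new_cls = importlib.import_module("new_home_sign").Job51Sign
same_object = old_cls is new_cls
print(same_object)
```

In the real repository the equivalent check would import `Job51Sign` from both package paths and assert identity with `is`.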

Task 1.5: Create tests/job51/__init__.py

Create `tests/job51/__init__.py` with the content: `# tests/job51/`

<acceptance_criteria>

  • `test -f tests/job51/__init__.py && echo "OK"` prints OK </acceptance_criteria>

Task 1.6: Write tests/job51/test_job51_client.py

<read_first>

  • spiderJobs/platforms/job51/api.py (post-migration version)
  • spiderJobs/platforms/job51/client.py (post-migration version)
  • crawler_core/qcwy/sign.py (Job51Sign interface)
  • tests/boss/test_boss_client.py (style reference) </read_first>
Create `tests/job51/test_job51_client.py` with the following test groups:
  1. TestParseJob51Response (pure function)

    • test_http_error_returns_failure: HTTP 500 → success=False
    • test_status_zero_returns_failure: status=0 → success=False
    • test_status_one_with_resultbody_job_list: status=1, resultbody.jobList.items → list parsed correctly
    • test_status_one_no_items: status=1 with no items → success=True, list=[]
    • test_non_dict_raw_returns_failure: raw is not a dict → failure
  2. TestSearchRecommendJobs

    • test_search_success: returns the job list normally
    • test_search_http_error: HTTP 403
  3. TestGetJobDetail

    • test_fetch_success: returns data successfully
    • test_fetch_exception_handled: http_client.get raises an exception → success=False
  4. TestGetCompanyDetail

    • test_fetch_success: returns data successfully
  5. TestJob51ClientHeaders

    • test_headers_contain_sign: after a POST, _job51_headers(sign="abc")["sign"] == "abc"
    • test_headers_uuid_format: the uuid field has length >= 20

All tests use `MagicMock()` to mock `http_client.get`/`post`; no network access is needed.
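The mocking pattern described above can be sketched as follows. Only the `(http_code, data)` tuple returned by `http_client.get` comes from this plan; `JobDetailFetcher`, its `fetch` method, the endpoint string, and the `Result` dataclass are hypothetical stand-ins, so the real tests would import `GetJobDetail` and `Result` from the migrated modules instead.

```python
from dataclasses import dataclass
from typing import Any
from unittest.mock import MagicMock


@dataclass
class Result:
    """Hypothetical stand-in for crawler_core.base.Result."""
    success: bool
    data: Any = None


class JobDetailFetcher:
    """Hypothetical fetcher; the real GetJobDetail may look different."""
    ENDPOINT = "/hypothetical/job/detail"

    def __init__(self, http_client):
        self.http_client = http_client

    def fetch(self) -> Result:
        try:
            http_code, data = self.http_client.get(self.ENDPOINT)
        except Exception:
            # Any transport failure becomes a failed Result, never a raised error.
            return Result(success=False)
        return Result(success=(http_code == 200), data=data)


def test_fetch_success():
    http_client = MagicMock()
    http_client.get.return_value = (200, {"status": "1", "resultbody": {"jobId": "123"}})
    result = JobDetailFetcher(http_client).fetch()
    assert result.success is True
    assert result.data["resultbody"]["jobId"] == "123"
    http_client.get.assert_called_once_with(JobDetailFetcher.ENDPOINT)


def test_fetch_exception_handled():
    http_client = MagicMock()
    http_client.get.side_effect = ConnectionError("network down")
    result = JobDetailFetcher(http_client).fetch()
    assert result.success is False
```

Setting `return_value` covers the happy path and `side_effect` covers the exception path, which is enough to exercise both branches without any network access.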

<acceptance_criteria>

  • `test -f tests/job51/test_job51_client.py && echo "OK"` prints OK
  • `pipenv run python -m pytest tests/job51/ -v` passes in full (>= 12 test cases) </acceptance_criteria>

Verification

# 1. Verify that all job51 modules import correctly
pipenv run python -c "
from spiderJobs.platforms.job51.client import Job51Client, create_client
from spiderJobs.platforms.job51.api import SearchRecommendJobs, GetJobDetail, GetCompanyDetail
from spiderJobs.platforms.job51.main import main, create_searcher
from crawler_core.base import BaseFetcher, BaseSearcher
assert issubclass(SearchRecommendJobs, BaseSearcher)
print('✅ All job51 modules imported successfully; inheritance is correct')
"

# 2. Confirm no legacy dependencies remain
grep -rn "spiderJobs.core" spiderJobs/platforms/job51/client.py spiderJobs/platforms/job51/api.py spiderJobs/platforms/job51/main.py spiderJobs/platforms/job51/sign.py && echo "❌ Legacy dependencies remain" || echo "✅ No legacy dependencies"

# 3. Run the mock tests
pipenv run python -m pytest tests/job51/ -v
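If the legacy-dependency check from step 2 should also run inside the test suite, it could be expressed in Python roughly as below. `find_legacy_imports` is a hypothetical helper, not something the repository is known to contain, and it does a literal substring match, whereas the `grep` pattern treats `.` as "any character".

```python
from pathlib import Path

LEGACY_IMPORT = "spiderJobs.core"  # the old package the migration removes


def find_legacy_imports(paths, needle=LEGACY_IMPORT):
    """Return (path, line_number, line) for every line that contains `needle`."""
    hits = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

A test could then call `find_legacy_imports` on the four migrated files and assert that the returned list is empty.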