
phase: 4
plan: 1
wave: 1
title: Migrate facade-layer imports to spiderJobs.platforms.* + asyncio.to_thread bridge
depends_on: (none listed)
files_modified:
  - app/services/crawler/boss.py
  - app/services/crawler/qcwy.py
  - app/services/crawler/zhilian.py
autonomous: true
requirements:
  - ARCH-06
  - ARCH-07

Phase 4 Plan 01: Migrate facade-layer imports to spiderJobs.platforms.* + asyncio.to_thread bridge

Objective

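Switch the three facade files in app/services/crawler/ (boss.py, qcwy.py, zhilian.py) from importing the internal private copies (_boss_api.py, _boss_client.py, etc.) to importing spiderJobs.platforms.* directly (already built on crawler_core), satisfying ARCH-06/ARCH-07.

The public interface (set_proxy(), get_job_detail(), etc.) stays exactly the same.

In addition, give each Service async wrapper methods built on asyncio.to_thread() (ARCH-06).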
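The bridge pattern named in the objective can be sketched as follows. This is a minimal illustration, not the project's code: `DemoService` and the `heartbeat` task are hypothetical stand-ins, and `time.sleep` substitutes for a blocking HTTP call. It assumes Python 3.9 or later, where `asyncio.to_thread` is available.

```python
import asyncio
import time
from typing import Dict, Optional


class DemoService:
    """Hypothetical stand-in for a crawler facade (not the real BossService)."""

    def get_job_detail(self, job_id: str) -> Optional[Dict]:
        # Simulates a blocking request made by a synchronous client.
        time.sleep(0.05)
        return {"job_id": job_id}

    async def async_get_job_detail(self, job_id: str) -> Optional[Dict]:
        # Run the blocking method in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.get_job_detail, job_id)


async def main() -> Dict:
    service = DemoService()
    ticks = 0

    async def heartbeat() -> None:
        # Keeps counting while the blocking call runs in its worker thread.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    detail = await service.async_get_job_detail("demo-123")
    hb.cancel()
    return {"detail": detail, "ticks": ticks}


result = asyncio.run(main())
```

The synchronous method is left untouched, which is why the existing public interface does not need to change; the async wrapper only adds a second entry point.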

Must Haves

  • boss.py imports from spiderJobs.platforms.boss.{api,client,sign}
  • qcwy.py imports from spiderJobs.platforms.job51.{api,client}
  • zhilian.py imports from spiderJobs.platforms.zhilian.{api,client,sign}
  • Each of the three Services gains async methods (asyncio.to_thread wrappers)
  • python -c "from app.services.crawler.boss import BossService" raises no ImportError
  • pytest tests/ -v passes in full (no regressions)

Wave 1

Task 1.1: Update boss.py

<read_first>

  • app/services/crawler/boss.py (current contents, 116 lines)
  • spiderJobs/platforms/boss/api.py (exports GetBrandDetail/GetJobDetail/SearchBrandJobs/SearchRecJobs)
  • spiderJobs/platforms/boss/client.py (exports BossClient/create_client, includes batch())
  • spiderJobs/platforms/boss/sign.py (BossSign → crawler_core stub) </read_first>
Modify `app/services/crawler/boss.py`
  1. Replace the import block (lines 12-19) with:

    from spiderJobs.platforms.boss.api import (
        GetBrandDetail,
        GetJobDetail,
        SearchBrandJobs,
        SearchRecJobs,
    )
    from spiderJobs.platforms.boss.client import BossClient, create_client
    from spiderJobs.platforms.boss.sign import BossSign
    
  2. Append the async wrapper methods to the end of the BossService class:

    # ── asyncio.to_thread bridge (ARCH-06) ────────
    
    async def async_get_job_detail(
        self, job_id: str, lid: str = "", security_id: str = ""
    ) -> Optional[Dict]:
        import asyncio
        return await asyncio.to_thread(self.get_job_detail_by_id, job_id, lid, security_id)
    
    async def async_get_company_detail(self, company_id: str) -> Optional[Dict]:
        import asyncio
        return await asyncio.to_thread(self.get_company_detail_by_id, company_id)
    
    async def async_get_company_jobs(
        self, company_id: str, page: int = 1
    ) -> Optional[Dict]:
        import asyncio
        return await asyncio.to_thread(self.get_company_jobs_by_id, company_id, page)
    
    async def async_search_jobs(
        self, keyword: str, city_code: str = "101010100", page: int = 1
    ) -> Optional[Dict]:
        import asyncio
        return await asyncio.to_thread(self.search_jobs, keyword, city_code, page)
    

<acceptance_criteria>

  • grep "from spiderJobs.platforms.boss" app/services/crawler/boss.py produces output
  • grep "app.services.crawler._boss" app/services/crawler/boss.py produces no output
  • grep "asyncio.to_thread" app/services/crawler/boss.py produces output
  • pipenv run python -c "from app.services.crawler.boss import BossService" succeeds </acceptance_criteria>
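Once wrappers like these exist, a caller could fan out several lookups concurrently. The sketch below assumes a hypothetical `FakeBossService` whose sync method merely sleeps; `fetch_many` is likewise an illustrative helper, not part of the plan.

```python
import asyncio
import time
from typing import Dict, List, Optional


class FakeBossService:
    """Hypothetical stand-in; the real BossService talks to the network."""

    def get_job_detail_by_id(
        self, job_id: str, lid: str = "", security_id: str = ""
    ) -> Optional[Dict]:
        time.sleep(0.05)  # pretend this is a blocking request
        return {"job_id": job_id, "lid": lid}

    async def async_get_job_detail(
        self, job_id: str, lid: str = "", security_id: str = ""
    ) -> Optional[Dict]:
        return await asyncio.to_thread(
            self.get_job_detail_by_id, job_id, lid, security_id
        )


async def fetch_many(
    service: FakeBossService, job_ids: List[str]
) -> List[Optional[Dict]]:
    # gather() runs the wrappers concurrently and returns results in input order.
    return await asyncio.gather(
        *(service.async_get_job_detail(job_id) for job_id in job_ids)
    )


details = asyncio.run(fetch_many(FakeBossService(), ["a", "b", "c"]))
```

Each call occupies one thread of the event loop's default executor, so the achievable parallelism is bounded by that pool's size rather than by the number of coroutines.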

Task 1.2: Update qcwy.py

<read_first>

  • app/services/crawler/qcwy.py (current contents, 103 lines)
  • spiderJobs/platforms/job51/api.py (exports GetCompanyInfo/GetJobDetail/SearchCompanyJobs/SearchRecommendJobs)
  • spiderJobs/platforms/job51/client.py (exports Job51Client/create_client) </read_first>
Modify `app/services/crawler/qcwy.py`
  1. Replace the import block (lines 12-18) with:

    from spiderJobs.platforms.job51.api import (
        GetCompanyInfo,
        GetJobDetail,
        SearchCompanyJobs,
        SearchRecommendJobs,
    )
    from spiderJobs.platforms.job51.client import Job51Client, create_client
    
  2. Append the async wrapper methods to the end of the QcwyService class:

    # ── asyncio.to_thread bridge (ARCH-06) ────────
    
    async def async_get_job_detail(self, job_id: str) -> Dict:
        import asyncio
        return await asyncio.to_thread(self.get_job_detail, job_id)
    
    async def async_get_company_info(self, company_id: str) -> Dict:
        import asyncio
        return await asyncio.to_thread(self.get_company_info, company_id)
    
    async def async_get_company_jobs(
        self, company_id: str, page: int = 1, page_size: int = 30, **kwargs
    ) -> Dict:
        import asyncio
        return await asyncio.to_thread(
            self.get_company_jobs_by_id, company_id, page, page_size, **kwargs
        )
    
    async def async_search_jobs(
        self, keyword: str, job_area: str = "020000", page: int = 1
    ) -> List:
        import asyncio
        return await asyncio.to_thread(self.search_jobs, keyword, job_area, page)
    

<acceptance_criteria>

  • grep "from spiderJobs.platforms.job51" app/services/crawler/qcwy.py 有输出
  • grep "app.services.crawler._job51" app/services/crawler/qcwy.py 无输出
  • grep "asyncio.to_thread" app/services/crawler/qcwy.py 有输出
  • pipenv run python -c "from app.services.crawler.qcwy import QcwyService" 成功 </acceptance_criteria>
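`asyncio.to_thread` forwards keyword arguments as well as positional ones, so a wrapper that accepts `**kwargs` can pass them through to the sync method. A minimal sketch, assuming a hypothetical `FakeQcwyService` whose sync method accepts extra keyword filters (the `sort` filter is invented for illustration):

```python
import asyncio
from typing import Any, Dict


class FakeQcwyService:
    """Hypothetical stand-in; the sync signature here is an assumption."""

    def get_company_jobs_by_id(
        self, company_id: str, page: int = 1, page_size: int = 30, **kwargs: Any
    ) -> Dict:
        return {
            "company_id": company_id,
            "page": page,
            "page_size": page_size,
            "extra": kwargs,
        }

    async def async_get_company_jobs(
        self, company_id: str, page: int = 1, page_size: int = 30, **kwargs: Any
    ) -> Dict:
        # to_thread(func, /, *args, **kwargs) passes keyword arguments on to func.
        return await asyncio.to_thread(
            self.get_company_jobs_by_id, company_id, page, page_size, **kwargs
        )


result = asyncio.run(
    FakeQcwyService().async_get_company_jobs("c1", page=2, sort="newest")
)
```

If the real sync method does not accept extra keyword arguments, forwarding them raises a TypeError at call time, which is arguably easier to diagnose than silently dropping them.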

Task 1.3: Update zhilian.py

<read_first>

  • app/services/crawler/zhilian.py (current contents, 143 lines)
  • spiderJobs/platforms/zhilian/api.py (exports GetCompanyDetail/GetPositionDetail/SearchCompanyPositions/SearchPositions)
  • spiderJobs/platforms/zhilian/client.py (exports ZhilianClient/create_capi_client/create_cgate_client)
  • spiderJobs/platforms/zhilian/sign.py (ZhilianSign → crawler_core stub) </read_first>
Modify `app/services/crawler/zhilian.py`
  1. Replace the import block (lines 12-23) with:

    from spiderJobs.platforms.zhilian.api import (
        GetCompanyDetail,
        GetPositionDetail,
        SearchCompanyPositions,
        SearchPositions,
    )
    from spiderJobs.platforms.zhilian.client import (
        ZhilianClient,
        create_capi_client,
        create_cgate_client,
    )
    from spiderJobs.platforms.zhilian.sign import ZhilianSign
    
  2. Append the async wrapper methods to the end of the ZhilianService class:

    # ── asyncio.to_thread bridge (ARCH-06) ────────
    
    async def async_get_job_detail(self, job_number: str) -> Optional[Dict]:
        import asyncio
        return await asyncio.to_thread(self.get_job_detail, job_number)
    
    async def async_get_company_detail(self, company_number: str) -> Optional[Dict]:
        import asyncio
        return await asyncio.to_thread(self.get_company_detail, company_number)
    
    async def async_get_company_jobs(
        self, company_number: str, page_index: int = 1, page_size: int = 30,
        work_city: Optional[int] = None,
    ) -> Optional[Dict]:
        import asyncio
        return await asyncio.to_thread(
            self.get_company_jobs_by_id, company_number, page_index, page_size, work_city
        )
    
    async def async_search_jobs(
        self, city_id: int = 801, page_size: int = 15, page_index: int = 1,
        job_level3_code: Optional[str] = None,
    ) -> List:
        import asyncio
        return await asyncio.to_thread(
            self.search_jobs, city_id, page_size, page_index, job_level3_code
        )
    

<acceptance_criteria>

  • grep "from spiderJobs.platforms.zhilian" app/services/crawler/zhilian.py produces output
  • grep "app.services.crawler._zhilian" app/services/crawler/zhilian.py produces no output
  • grep "asyncio.to_thread" app/services/crawler/zhilian.py produces output
  • pipenv run python -c "from app.services.crawler.zhilian import ZhilianService" succeeds </acceptance_criteria>
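Because these wrappers pass arguments positionally, the order must match the sync method's parameter order. One way to guard against a mismatch without touching the network is a mock-based delegation check. The sketch below uses a hypothetical `FakeZhilianService`; the sync `search_jobs` signature and the city code 530 are assumptions made for illustration.

```python
import asyncio
from typing import List, Optional
from unittest import mock


class FakeZhilianService:
    """Hypothetical stand-in; the sync signature is assumed, not confirmed."""

    def search_jobs(
        self,
        city_id: int = 801,
        page_size: int = 15,
        page_index: int = 1,
        job_level3_code: Optional[str] = None,
    ) -> List:
        raise RuntimeError("network access is not expected in this sketch")

    async def async_search_jobs(
        self,
        city_id: int = 801,
        page_size: int = 15,
        page_index: int = 1,
        job_level3_code: Optional[str] = None,
    ) -> List:
        return await asyncio.to_thread(
            self.search_jobs, city_id, page_size, page_index, job_level3_code
        )


service = FakeZhilianService()
# Replace the sync method on this instance so no real request is made.
with mock.patch.object(service, "search_jobs", return_value=["job"]) as sync_mock:
    jobs = asyncio.run(service.async_search_jobs(city_id=530, page_index=2))

# The wrapper should hand the arguments over in the sync method's order.
sync_mock.assert_called_once_with(530, 15, 2, None)
```

The same check could be repeated for the other wrappers, with expected positional tuples taken from the actual sync signatures.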

Verification

# 1. Verify that the three facade modules import correctly
pipenv run python -c "
from app.services.crawler.boss import BossService
from app.services.crawler.qcwy import QcwyService
from app.services.crawler.zhilian import ZhilianService
print('✅ All three facade modules imported successfully')

# Verify that no old imports remain
import inspect
for svc in [BossService, QcwyService, ZhilianService]:
    src = inspect.getsourcefile(svc)
    with open(src, encoding='utf-8') as f:
        content = f.read()
    assert '_boss_' not in content and '_job51_' not in content and '_zhilian_' not in content, f'{src} still contains old imports'
print('✅ No old imports remain')

# Verify that the async methods exist
assert hasattr(BossService, 'async_get_job_detail')
assert hasattr(QcwyService, 'async_get_company_info')
assert hasattr(ZhilianService, 'async_get_company_detail')
print('✅ asyncio bridge methods exist')
"

# 2. Verify that no old imports remain
grep -rn "from app.services.crawler._" app/services/crawler/boss.py app/services/crawler/qcwy.py app/services/crawler/zhilian.py && echo "❌ Old imports remain" || echo "✅ No old imports"

# 3. Full regression run
pipenv run python -m pytest tests/ -v --tb=short
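A `hasattr` check only confirms that a name exists, not that it is a coroutine function. A stricter variant could use `inspect.iscoroutinefunction`. The sketch below is an assumption-laden illustration: `FakeService` and the helper `missing_async_wrappers` are hypothetical names, not part of the project.

```python
import asyncio
import inspect
from typing import Iterable, List


class FakeService:
    """Hypothetical stand-in with one sync method and one async wrapper."""

    def get_job_detail(self, job_id: str) -> dict:
        return {"job_id": job_id}

    async def async_get_job_detail(self, job_id: str) -> dict:
        return await asyncio.to_thread(self.get_job_detail, job_id)


def missing_async_wrappers(cls: type, names: Iterable[str]) -> List[str]:
    """Return the names that are absent from cls or are not coroutine functions."""
    return [
        name
        for name in names
        if not inspect.iscoroutinefunction(getattr(cls, name, None))
    ]


problems = missing_async_wrappers(
    FakeService,
    ["async_get_job_detail", "get_job_detail", "async_search_jobs"],
)
```

Here `get_job_detail` is flagged because it is synchronous and `async_search_jobs` because it does not exist on the stand-in class.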