JobData/01-01-PLAN.md at 024c2bcd496079ac3414f5a10fdfe9de2d337ec7

zfc/JobData


win fe9a6d1403 docs(phase-1): create plans (2 plans, 2 waves) with checker revision

2026-03-21 17:53:13 +08:00


phase: 01-shared-core
plan:
type: execute
wave:
depends_on:
files_modified: crawler_core/__init__.py, crawler_core/http_client.py, crawler_core/base.py, crawler_core/boss/__init__.py, crawler_core/qcwy/__init__.py, crawler_core/zhilian/__init__.py, crawler_core/pyproject.toml, Pipfile
autonomous: true
requirements: ARCH-01, ARCH-02, QUAL-04, QUAL-05
must_haves: truths, artifacts, key_links

`pip install -e ./crawler_core` succeeds without errors

`from crawler_core import BaseFetcher, BaseSearcher, Result, HTTPClient` imports cleanly

HTTPClient makes up to 3 attempts per request (stop_after_attempt(3), i.e. at most 2 retries) with exponential backoff (minimum 10s wait)

All HTTP errors are logged to stderr through stdlib loggers in the 'crawler_core.*' namespace (logging.getLogger); the loguru bridge is deferred to Phase 5

Old spiderJobs/ and jobs_spider/ code is NOT modified — feature flag isolation holds

path	provides	contains
crawler_core/pyproject.toml	Package metadata for editable install	name = "crawler_core"

path	provides	exports
crawler_core/__init__.py	Public API surface	BaseFetcher, BaseSearcher, Result, HTTPClient
crawler_core/http_client.py	TLS-fingerprinted HTTP client with retry and logging	HTTPClient
crawler_core/base.py	Template-method base classes with generic Result[T] return type	Result, BaseFetcher, BaseSearcher, parse_response

from	to	via
crawler_core/__init__.py	crawler_core/http_client.py	from crawler_core.http_client import HTTPClient
crawler_core/__init__.py	crawler_core/base.py	from crawler_core.base import BaseFetcher, BaseSearcher, Result
crawler_core/base.py	crawler_core/http_client.py	from crawler_core.http_client import HTTPClient

Create the crawler_core/ installable shared package with its core infrastructure: HTTP client with TLS fingerprint, retry logic, stdlib logging, and the BaseFetcher/BaseSearcher template-method base classes.

Purpose: This is the foundation everything else depends on. Once installed with pip install -e ./crawler_core, Phase 2/3 platform rewrites can import from it instead of copying code.

Output: A working Python package at crawler_core/ that installs cleanly and exposes BaseFetcher, BaseSearcher, Result[T], and HTTPClient.

<execution_context> @~~/.claude/get-shit-done/workflows/execute-plan.md @~~/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/phases/01-shared-core/1-CONTEXT.md

From spiderJobs/core/http_client.py:

class HTTPClient:
    def __init__(self, base_url, default_headers=None, proxy=None,
                 tunnel_proxy=None, proxy_pool=None, timeout=10): ...
    def _new_session(self) -> requests.Session: ...
    def _get_proxies(self) -> Optional[dict]: ...
    def _merge_headers(self, extra=None) -> dict: ...
    def post(self, path, body, headers=None) -> tuple[int, Any]: ...
    def get(self, path, params=None, headers=None) -> tuple[int, Any]: ...

From spiderJobs/core/base.py (reference only — DO NOT copy ApiResult; use Result[T] instead per D-07):

@dataclass
class ApiResult:   # <-- OLD: replaced by Result[T] in crawler_core/base.py
    success: bool
    status_code: int
    data: Any = None
    list: list[dict] = field(default_factory=list)
    count: int = 0
    is_end_page: bool = True
    error: Optional[str] = None

def parse_response(http_code: int, raw: Any) -> ApiResult: ...

class BaseFetcher:
    ENDPOINT: str = ""
    def __init__(self, http_client: HTTPClient): ...
    def _build_params(self) -> dict: raise NotImplementedError   # template method (required)
    def _parse(self, http_code, raw) -> ApiResult: ...
    def fetch(self) -> ApiResult: ...

class BaseSearcher:
    ENDPOINT: str = ""
    def __init__(self, page_size=15, http_client=None): ...
    def _build_params(self, page_index) -> dict: raise NotImplementedError   # template method (required)
    def _request(self, params) -> tuple[int, Any]: ...
    def _parse(self, http_code, raw) -> ApiResult: ...
    def search(self, page_index=1) -> ApiResult: ...
    def load_all(self, max_pages=10, on_page=None) -> list[dict]: ...

Task 1: Create crawler_core package scaffold and pyproject.toml

Read first:
- /Users/win/2025/AICoding/JobData/pyproject.toml (understand existing project config format)
- /Users/win/2025/AICoding/JobData/Pipfile (understand dependency structure to add entries)
- /Users/win/2025/AICoding/JobData/.planning/phases/01-shared-core/1-CONTEXT.md (decisions D-01 through D-04)

Files: crawler_core/pyproject.toml, crawler_core/boss/__init__.py, crawler_core/qcwy/__init__.py, crawler_core/zhilian/__init__.py, Pipfile

Create the crawler_core/ directory structure and configure it as an installable Python package.

Step 1: Create crawler_core/pyproject.toml

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "crawler_core"
version = "0.1.0"
description = "Shared crawler core — sign algorithms, HTTP client, base classes"
requires-python = ">=3.11"
dependencies = [
    "requests_go==1.0.9",
    "tenacity>=8.0",
]

[tool.setuptools.packages.find]
where = [".."]
include = ["crawler_core*"]

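NOTE: where = [".."] tells setuptools to search one level above the directory that holds this pyproject.toml (crawler_core/), i.e. the repo root, and include = ["crawler_core*"] restricts discovery to the crawler_core package found there. This is intended to make pip install -e ./crawler_core resolve correctly.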

Step 2: Create platform namespace __init__.py files (docstring only)

Create these three files with a single docstring only — NO imports, they are just namespace markers:

crawler_core/boss/__init__.py: """Boss直聘 platform module."""
crawler_core/qcwy/__init__.py: """前程无忧 (51Job) platform module."""
crawler_core/zhilian/__init__.py: """智联招聘 platform module."""
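
The three files above could be written by hand or scripted. The following is a minimal sketch of the scripted route, run against a temporary directory so it has no side effects. The helper name `write_platform_inits` and the `PLATFORM_DOCSTRINGS` mapping are hypothetical and exist only for this illustration.

```python
import tempfile
from pathlib import Path

# Docstrings as listed in Step 2, keyed by platform directory name.
PLATFORM_DOCSTRINGS = {
    "boss": '"""Boss直聘 platform module."""',
    "qcwy": '"""前程无忧 (51Job) platform module."""',
    "zhilian": '"""智联招聘 platform module."""',
}


def write_platform_inits(package_root: Path) -> list:
    """Create <package_root>/<platform>/__init__.py holding a docstring only."""
    written = []
    for name, docstring in PLATFORM_DOCSTRINGS.items():
        init_file = package_root / name / "__init__.py"
        init_file.parent.mkdir(parents=True, exist_ok=True)
        init_file.write_text(docstring + "\n", encoding="utf-8")
        written.append(init_file)
    return written


# Demonstration against a throwaway directory; for real use, pass the
# crawler_core/ directory of the repository instead.
with tempfile.TemporaryDirectory() as tmp:
    created = write_platform_inits(Path(tmp) / "crawler_core")
    names = [p.parent.name for p in created]
    first_file_text = created[0].read_text(encoding="utf-8")
```

Because each file contains nothing but a docstring, importing a platform package has no side effects and pulls in no dependencies.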

Step 3: Add dependencies to Pipfile

In the [packages] section (before [dev-packages]), add these two lines (after playwright = "==1.57.0"):

requests_go = "==1.0.9"
tenacity = ">=8.0"

In the [dev-packages] section, add:

pytest = ">=8.0"
pytest-cov = ">=4.0"
pytest-anyio = "*"

What NOT to do:

Do NOT create crawler_core/__init__.py in this task (Task 3 creates it)
Do NOT create crawler_core/http_client.py or crawler_core/base.py (Task 2 and 3)
Do NOT run pip install; just write the files

Verify:

python -c "import tomllib; d=tomllib.load(open('/Users/win/2025/AICoding/JobData/crawler_core/pyproject.toml','rb')); assert d['project']['name']=='crawler_core'; print('pyproject.toml OK')" && grep -q "requests_go" /Users/win/2025/AICoding/JobData/Pipfile && grep -q "tenacity" /Users/win/2025/AICoding/JobData/Pipfile && grep -q "pytest" /Users/win/2025/AICoding/JobData/Pipfile && echo "Pipfile OK"

<acceptance_criteria>
- crawler_core/pyproject.toml exists and contains name = "crawler_core", requires-python = ">=3.11", requests_go==1.0.9, tenacity>=8.0
- crawler_core/boss/__init__.py, crawler_core/qcwy/__init__.py, crawler_core/zhilian/__init__.py all exist (can be empty docstrings)
- Pipfile [packages] section contains requests_go = "==1.0.9" and tenacity = ">=8.0"
- Pipfile [dev-packages] section contains pytest, pytest-cov, pytest-anyio
- grep -c "requests_go" /Users/win/2025/AICoding/JobData/Pipfile outputs 1 (no duplicates)
</acceptance_criteria>

Done: Package directory structure created, pyproject.toml valid, dependencies declared in Pipfile.

Task 2: Create crawler_core/http_client.py with tenacity retry and logging

Read first:
- /Users/win/2025/AICoding/JobData/spiderJobs/core/http_client.py (source to port; read every line)
- /Users/win/2025/AICoding/JobData/.planning/research/STACK.md (tenacity config section, TLS fingerprint section)
- /Users/win/2025/AICoding/JobData/.planning/phases/01-shared-core/1-CONTEXT.md (D-03: no loguru, stdlib only; D-09: one HTTPClient class)

Files: crawler_core/http_client.py

Port `spiderJobs/core/http_client.py` to `crawler_core/http_client.py` with two additions: tenacity retry and stdlib logging.

The file must be exactly crawler_core/http_client.py — no subdirectory.

Imports to use (CRITICAL — per D-03, only requests_go + stdlib + tenacity):

from __future__ import annotations

import logging
import random
from typing import Any, Optional

import requests_go as requests
from requests_go.tls_config import TLS_CHROME_LATEST
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_random_exponential,
)

Logging setup (module-level, before the class):

logger = logging.getLogger("crawler_core.http_client")

This uses stdlib logging — NOT loguru (per D-03, loguru is excluded from crawler_core). Callers (app/services/crawler/) can configure loguru to bridge stdlib if desired.
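
As a rough illustration of the caller-side configuration mentioned above, the sketch below attaches a handler to the parent `crawler_core` logger using only the standard library. The in-memory stream and the format string are illustrative choices, not part of the plan. Records emitted by the `crawler_core.http_client` child logger reach the handler through ordinary logger propagation.

```python
import io
import logging

# Illustrative only: an in-memory stream stands in for stderr or a log file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

# Configure the parent logger once; child loggers propagate to it by default.
parent = logging.getLogger("crawler_core")
parent.setLevel(logging.DEBUG)
parent.addHandler(handler)

# Same logger name the plan uses at module level in http_client.py.
child = logging.getLogger("crawler_core.http_client")
child.warning("HTTP retry attempt=%d error=%s", 1, "timeout")

captured = stream.getvalue()
```

Keeping the library on plain `logging.getLogger` means the application decides where records go, which is what allows a loguru bridge to be added later without touching crawler_core.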

Class structure: Copy the full HTTPClient class from spiderJobs/core/http_client.py EXACTLY, then make these targeted changes:

Keep all existing methods unchanged: __init__, _new_session, _get_proxies, _merge_headers
Wrap post() with tenacity retry decorator:

@retry(
    stop=stop_after_attempt(3),
    wait=wait_random_exponential(multiplier=1, min=10, max=30),
    retry=retry_if_exception_type((ConnectionError, TimeoutError, OSError)),
    reraise=True,
    before_sleep=lambda retry_state: logger.warning(
        "HTTP retry attempt=%d url=%s error=%s",
        retry_state.attempt_number,
        retry_state.args[1] if retry_state.args else "unknown",
        retry_state.outcome.exception(),
    ),
)
def post(self, path: str, body: dict, headers: Optional[dict] = None) -> tuple[int, Any]:
    """Send a POST request."""
    # ... existing body unchanged ...
    logger.debug("POST %s%s", self.base_url, path)
    # existing try/finally logic unchanged

Wrap get() with the same tenacity retry policy (same stop, wait, and retry settings; its before_sleep message omits the url):

@retry(
    stop=stop_after_attempt(3),
    wait=wait_random_exponential(multiplier=1, min=10, max=30),
    retry=retry_if_exception_type((ConnectionError, TimeoutError, OSError)),
    reraise=True,
    before_sleep=lambda retry_state: logger.warning(
        "HTTP retry attempt=%d error=%s",
        retry_state.attempt_number,
        retry_state.outcome.exception(),
    ),
)
def get(self, path: str, params: Optional[dict] = None, headers: Optional[dict] = None) -> tuple[int, Any]:
    """Send a GET request."""
    logger.debug("GET %s%s", self.base_url, path)
    # ... existing body unchanged ...

Add module docstring at the top:

"""
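crawler_core.http_client: general-purpose HTTP client

Built on requests-go, with built-in Chrome TLS fingerprint impersonation (TLS_CHROME_LATEST + random_ja3=True).
Supports a fixed proxy IP, a tunnel proxy, and proxy-pool rotation.
Built-in tenacity retry (up to 3 attempts, exponential backoff, minimum 10-second interval).
Uses stdlib logging; callers can configure it via logging.getLogger('crawler_core').

Does not depend on application frameworks such as loguru / FastAPI / Tortoise-ORM.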
"""

Minimum 10 second wait is MANDATORY — min=10 in wait_random_exponential preserves the anti-detection delay requirement from STACK.md.
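
To see what these parameters imply in practice, here is a standard-library sketch of a floored, capped, randomized exponential wait. It is a hypothetical helper, not tenacity's code. It assumes the wait is drawn uniformly between the floor and the clamped exponential term, which is worth confirming against the installed tenacity version.

```python
import random


def bounded_backoff(attempt: int, multiplier: float = 1.0,
                    floor: float = 10.0, cap: float = 30.0) -> float:
    """Return a randomized wait in seconds for the given 1-based attempt.

    Hypothetical helper, not tenacity's implementation: the exponential
    term is clamped to [floor, cap], then a value is drawn uniformly
    between the floor and that clamped upper bound.
    """
    exponential = multiplier * (2 ** (attempt - 1))
    upper = max(floor, min(exponential, cap))
    return random.uniform(floor, upper)


# With multiplier=1 the exponential term stays below the 10 s floor until
# attempt 5 (2**4 = 16), so every wait within 3 attempts is exactly 10 s.
waits = [bounded_backoff(attempt) for attempt in (1, 2, 3)]
```

Under these assumptions the policy behaves like a fixed 10-second delay for the three attempts configured here. The exponential growth and the 30-second cap would only matter if the attempt limit were raised.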

Do NOT:

Change the proxy logic (keep tunnel_proxy / proxy_pool / fixed proxy logic identical)
Import loguru
Import anything from spiderJobs.* or app.*

Verify:

cd /Users/win/2025/AICoding/JobData && python -c "
import sys
sys.path.insert(0, '.')
from crawler_core.http_client import HTTPClient
import inspect, logging
src = inspect.getsource(HTTPClient.post)
assert 'retry' in src or '@retry' in dir(HTTPClient.post), 'tenacity decorator missing on post'
assert 'logger' in src or 'logging' in src, 'logging missing in post'
print('HTTPClient OK')
"

<acceptance_criteria>
- crawler_core/http_client.py exists and is importable: from crawler_core.http_client import HTTPClient succeeds (after adding the repo root to sys.path)
- File contains from tenacity import retry in imports
- File contains logger = logging.getLogger("crawler_core.http_client")
- File contains wait_random_exponential(multiplier=1, min=10, max=30) — exact values
- File contains stop_after_attempt(3)
- File does NOT contain import loguru or from loguru anywhere
- File does NOT contain from spiderJobs or from app anywhere
- File is under 200 lines (source is 155 lines + ~30 lines of additions)
- grep -c "from tenacity" /Users/win/2025/AICoding/JobData/crawler_core/http_client.py outputs 1
- grep "min=10" /Users/win/2025/AICoding/JobData/crawler_core/http_client.py has output
</acceptance_criteria>

Done: HTTPClient ported with retry (3 attempts, min=10s wait) and stdlib logging. No loguru. No spiderJobs imports.

Task 3: Create crawler_core/base.py with Result[T] and 4 template methods, plus crawler_core/__init__.py

Read first:
- /Users/win/2025/AICoding/JobData/spiderJobs/core/base.py (source to port; read every line)
- /Users/win/2025/AICoding/JobData/.planning/research/ARCHITECTURE.md (abstract base class hierarchy section)
- /Users/win/2025/AICoding/JobData/.planning/phases/01-shared-core/1-CONTEXT.md (D-05, D-06, D-07: base class interface decisions)

Files: crawler_core/base.py, crawler_core/__init__.py

Create `crawler_core/base.py` with the new `Result[T]` generic dataclass (replacing ApiResult per D-07) and four template methods (per D-06), then create the public `__init__.py`.

crawler_core/base.py:

Add module docstring at the top:

"""
crawler_core.base: shared base classes and data structures

Provides what all job platforms share: Result, BaseFetcher, BaseSearcher, parse_response
Does not depend on any platform-specific code.
"""

Step 1: Generic Result[T] dataclass (replaces ApiResult — per D-07)

from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Any, Generic, Optional, TypeVar

from crawler_core.http_client import HTTPClient

T = TypeVar("T")

_logger = logging.getLogger("crawler_core.base")


@dataclass
class Result(Generic[T]):
    """Typed result wrapper returned by all BaseFetcher and BaseSearcher methods.

    Replaces the untyped ApiResult. Callers annotate as Result[MyJobModel] etc.
    """
    success: bool
    status_code: int
    data: Optional[T] = None
    list: list[T] = field(default_factory=list)
    count: int = 0
    is_end_page: bool = True
    error: Optional[str] = None
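
A short usage sketch may help show what the type parameter buys callers. The `Job` model and the `to_result` helper below are hypothetical and exist only for this illustration. The `Result` fields are copied from the plan, except that `typing.List` is used so the snippet does not rely on `from __future__ import annotations`.

```python
from dataclasses import dataclass, field
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    # Field layout copied from the plan; typing.List keeps the annotation
    # valid even though the field itself is named "list".
    success: bool
    status_code: int
    data: Optional[T] = None
    list: List[T] = field(default_factory=list)
    count: int = 0
    is_end_page: bool = True
    error: Optional[str] = None


@dataclass
class Job:
    """Hypothetical item model, used only for this illustration."""
    job_id: str
    title: str


def to_result(rows: List[dict]) -> "Result[Job]":
    """Hypothetical helper that wraps raw rows in a typed Result."""
    jobs = [Job(job_id=str(r["id"]), title=r["title"]) for r in rows]
    return Result(success=True, status_code=200, list=jobs, count=len(jobs))


typed = to_result([{"id": 1, "title": "Backend Engineer"}])
empty = Result(success=False, status_code=500, error="server error")
```

With the annotation `Result[Job]`, a type checker can infer that `typed.list` holds `Job` instances, which the untyped `ApiResult` with `list[dict]` could not express.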

Step 2: parse_response — adapt from spiderJobs/core/base.py but return Result[Any]

Port parse_response(http_code, raw) from spiderJobs/core/base.py verbatim, changing only the return type annotation from ApiResult to Result[Any].

Step 3: BaseFetcher — 4 template methods (per D-06)

class BaseFetcher:
    """Template-method base class for single-item fetchers.

    Required overrides: _build_params(), _parse()
    Optional overrides: _build_headers(), _check_blocked()
    """
    ENDPOINT: str = ""

    def __init__(self, http_client: HTTPClient) -> None:
        self.http_client = http_client

    # --- Required template methods ---

    def _build_params(self) -> dict:
        """Build query/body parameters for the request. MUST be overridden."""
        raise NotImplementedError(f"{type(self).__name__} must implement _build_params()")

    def _parse(self, http_code: int, raw: Any) -> Result:
        """Parse the HTTP response into a Result. MUST be overridden."""
        raise NotImplementedError(f"{type(self).__name__} must implement _parse()")

    # --- Optional template methods ---

    def _build_headers(self) -> dict:
        """Build extra request headers. Override to add platform-specific headers.

        Default: returns {} (no extra headers beyond HTTPClient defaults).
        """
        return {}

    def _check_blocked(self, status_code: int, body: str) -> bool:
        """Detect platform-specific anti-crawl blocks.

        Override to inspect response body/status for block signals.
        Default: returns False (assume not blocked).
        """
        return False

    # --- Orchestration ---

    def fetch(self) -> Result:
        """Execute the fetch: build params → request → check blocked → parse."""
        params = self._build_params()
        extra_headers = self._build_headers()
        http_code, raw = self.http_client.get(
            self.ENDPOINT, params=params, headers=extra_headers or None
        )
        raw_str = str(raw) if not isinstance(raw, str) else raw
        if self._check_blocked(http_code, raw_str):
            return Result(success=False, status_code=http_code, error="blocked")
        return self._parse(http_code, raw)
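
The fetch() flow above can be exercised without any network access. The sketch below uses condensed copies of `Result` and `BaseFetcher` so it runs on its own. `StubClient`, `JobDetailFetcher`, the `/job/detail` endpoint, and the "captcha" block marker are all made up for the illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Result:
    # Trimmed stand-in for crawler_core.base.Result (only fields used here).
    success: bool
    status_code: int
    data: Any = None
    error: Optional[str] = None


class BaseFetcher:
    # Condensed copy of the plan's BaseFetcher orchestration.
    ENDPOINT = ""

    def __init__(self, http_client):
        self.http_client = http_client

    def _build_params(self) -> dict:
        raise NotImplementedError

    def _parse(self, http_code: int, raw: Any) -> Result:
        raise NotImplementedError

    def _build_headers(self) -> dict:
        return {}

    def _check_blocked(self, status_code: int, body: str) -> bool:
        return False

    def fetch(self) -> Result:
        params = self._build_params()
        http_code, raw = self.http_client.get(
            self.ENDPOINT, params=params, headers=self._build_headers() or None
        )
        raw_str = raw if isinstance(raw, str) else str(raw)
        if self._check_blocked(http_code, raw_str):
            return Result(success=False, status_code=http_code, error="blocked")
        return self._parse(http_code, raw)


class StubClient:
    """Hypothetical stand-in for HTTPClient that replays a canned response."""

    def __init__(self, status: int, payload: Any):
        self.status, self.payload = status, payload

    def get(self, path, params=None, headers=None):
        return self.status, self.payload


class JobDetailFetcher(BaseFetcher):
    """Hypothetical platform fetcher; endpoint and block marker are made up."""
    ENDPOINT = "/job/detail"

    def __init__(self, http_client, job_id: str):
        super().__init__(http_client)
        self.job_id = job_id

    def _build_params(self) -> dict:
        return {"id": self.job_id}

    def _check_blocked(self, status_code: int, body: str) -> bool:
        return status_code == 403 or "captcha" in body

    def _parse(self, http_code: int, raw: Any) -> Result:
        return Result(success=http_code == 200, status_code=http_code, data=raw)


ok = JobDetailFetcher(StubClient(200, {"title": "Engineer"}), "42").fetch()
blocked = JobDetailFetcher(StubClient(403, "captcha required"), "42").fetch()
```

The subclass only supplies parameters, a block check, and parsing. The ordering of those steps stays in the base class, which is the point of the template-method layout.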

Step 4: BaseSearcher — 4 template methods (per D-06)

class BaseSearcher:
    """Template-method base class for paginated list searchers.

    Required overrides: _build_params(), _parse()
    Optional overrides: _build_headers(), _check_blocked()
    """
    ENDPOINT: str = ""

    def __init__(self, page_size: int = 15, http_client: Optional[HTTPClient] = None) -> None:
        self.page_size = page_size
        self.http_client = http_client

    # --- Required template methods ---

    def _build_params(self, page_index: int) -> dict:
        """Build pagination query params. MUST be overridden."""
        raise NotImplementedError(f"{type(self).__name__} must implement _build_params()")

    def _parse(self, http_code: int, raw: Any) -> Result:
        """Parse the HTTP response into a Result. MUST be overridden."""
        raise NotImplementedError(f"{type(self).__name__} must implement _parse()")

    # --- Optional template methods ---

    def _build_headers(self) -> dict:
        """Build extra request headers. Override for platform-specific headers.

        Default: returns {} (no extra headers beyond HTTPClient defaults).
        """
        return {}

    def _check_blocked(self, status_code: int, body: str) -> bool:
        """Detect platform-specific anti-crawl blocks.

        Override to inspect response body/status for block signals.
        Default: returns False (assume not blocked).
        """
        return False

    # --- Orchestration ---

    def _request(self, params: dict) -> tuple[int, Any]:
        """Execute a single HTTP request. Uses _build_headers() for extra headers."""
        extra_headers = self._build_headers()
        return self.http_client.get(
            self.ENDPOINT, params=params, headers=extra_headers or None
        )

    def search(self, page_index: int = 1) -> Result:
        """Fetch a single page: build params → request → check blocked → parse."""
        params = self._build_params(page_index)
        http_code, raw = self._request(params)
        raw_str = str(raw) if not isinstance(raw, str) else raw
        if self._check_blocked(http_code, raw_str):
            return Result(success=False, status_code=http_code, error="blocked")
        return self._parse(http_code, raw)

    def load_all(self, max_pages: int = 10, on_page=None) -> list:
        """Iterate pages until is_end_page=True or max_pages reached."""
        all_items: list = []
        for page_index in range(1, max_pages + 1):
            result = self.search(page_index)
            if not result.success:
                _logger.warning("page %d failed: %s", page_index, result.error)
                break
            all_items.extend(result.list)
            if on_page:
                on_page(page_index, result)
            if result.is_end_page:
                break
        return all_items
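
The stopping rules of load_all() (a failed page, `is_end_page`, or `max_pages`) can be checked in isolation. The sketch below reproduces the loop as a free function that takes the search step as a callable, with logging omitted. The `Page` class, `PAGES` data, and `fake_search` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Page:
    # Trimmed stand-in for Result with only the fields load_all() reads.
    success: bool
    list: List[Any] = field(default_factory=list)
    is_end_page: bool = True
    error: Optional[str] = None


def load_all(search: Callable[[int], Page], max_pages: int = 10,
             on_page: Optional[Callable[[int, Page], None]] = None) -> list:
    """Same loop as BaseSearcher.load_all (logging omitted), with search() injected."""
    all_items: list = []
    for page_index in range(1, max_pages + 1):
        result = search(page_index)
        if not result.success:
            break  # stop on the first failed page, keep what was collected
        all_items.extend(result.list)
        if on_page:
            on_page(page_index, result)
        if result.is_end_page:
            break  # the platform reported the final page
    return all_items


# Hypothetical data source: three pages of two items, the last flagged as final.
PAGES = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}


def fake_search(page_index: int) -> Page:
    return Page(success=True, list=PAGES[page_index], is_end_page=page_index == 3)


seen: List[int] = []
items = load_all(fake_search, max_pages=10, on_page=lambda i, r: seen.append(i))
capped = load_all(fake_search, max_pages=2)
```

Note that `is_end_page` defaults to True, so a `_parse()` implementation that forgets to set it ends pagination after the first page.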

crawler_core/__init__.py:

"""
crawler_core: shared core package for the job crawlers

Install: pip install -e ./crawler_core
Usage: from crawler_core import BaseFetcher, BaseSearcher, Result, HTTPClient
"""

from crawler_core.base import Result, BaseFetcher, BaseSearcher, parse_response
from crawler_core.http_client import HTTPClient

__all__ = [
    "Result",
    "BaseFetcher",
    "BaseSearcher",
    "HTTPClient",
    "parse_response",
]

__version__ = "0.1.0"

Do NOT:

Keep the old ApiResult name anywhere in crawler_core (it's fully replaced by Result[T])
Import from spiderJobs.* or app.*
Import loguru
Add any platform-specific code to base.py or __init__.py

Verify:

cd /Users/win/2025/AICoding/JobData && python -c "
import sys
sys.path.insert(0, '.')
from crawler_core import BaseFetcher, BaseSearcher, Result, HTTPClient, parse_response
import dataclasses, typing
fields = {f.name for f in dataclasses.fields(Result)}
assert fields == {'success','status_code','data','list','count','is_end_page','error'}, f'Result fields wrong: {fields}'
assert hasattr(BaseFetcher, 'fetch'), 'BaseFetcher.fetch missing'
assert hasattr(BaseFetcher, '_build_headers'), 'BaseFetcher._build_headers missing'
assert hasattr(BaseFetcher, '_check_blocked'), 'BaseFetcher._check_blocked missing'
assert BaseFetcher._build_headers(object()) == {}, '_build_headers default must return {}'
assert BaseFetcher._check_blocked(object(), 200, '') == False, '_check_blocked default must return False'
assert hasattr(BaseSearcher, 'load_all'), 'BaseSearcher.load_all missing'
assert hasattr(BaseSearcher, '_build_headers'), 'BaseSearcher._build_headers missing'
assert hasattr(BaseSearcher, '_check_blocked'), 'BaseSearcher._check_blocked missing'
print('All imports OK, Result fields OK, 4 template methods verified')
"

<acceptance_criteria>
- from crawler_core import BaseFetcher, BaseSearcher, Result, HTTPClient succeeds (with repo root on sys.path)
- crawler_core/base.py defines Result as a generic dataclass using TypeVar and Generic[T]
- crawler_core/base.py does NOT contain ApiResult anywhere: grep "ApiResult" crawler_core/base.py returns empty
- crawler_core/base.py does NOT contain from spiderJobs anywhere: grep "from spiderJobs" crawler_core/base.py returns empty
- crawler_core/base.py does NOT contain print( anywhere: grep "print(" crawler_core/base.py returns empty
- BaseFetcher._build_headers(self) exists and returns {} by default
- BaseFetcher._check_blocked(self, status_code, body) exists and returns False by default
- BaseFetcher.fetch() calls _build_headers() and _check_blocked() in its implementation
- BaseSearcher._build_headers(self) exists and returns {} by default
- BaseSearcher._check_blocked(self, status_code, body) exists and returns False by default
- BaseSearcher.search() calls _check_blocked() in its implementation
- crawler_core/__init__.py exports Result (not ApiResult) in __all__
- crawler_core/__init__.py contains __version__ = "0.1.0"
- Result dataclass has exactly 7 fields: success, status_code, data, list, count, is_end_page, error
- BaseFetcher._build_params raises NotImplementedError
- BaseSearcher._build_params raises NotImplementedError
</acceptance_criteria>

Done: base.py uses Result[T] generic (no ApiResult), 4 template methods wired into fetch()/search(), __init__.py exports clean public API.

Run the full import chain to verify the package works end-to-end before moving to Plan 02:

cd /Users/win/2025/AICoding/JobData
python -c "
import sys
sys.path.insert(0, '.')
from crawler_core import BaseFetcher, BaseSearcher, Result, HTTPClient, parse_response

# Verify Result structure
r = Result(success=True, status_code=200)
assert r.success and r.list == [] and r.error is None

# Minimal subclass implementing the two required template methods
class TestFetcher(BaseFetcher):
    ENDPOINT = '/test'
    def _build_params(self):
        return {'q': 'test'}
    def _parse(self, http_code, raw):
        return Result(success=True, status_code=http_code)

# Verify the optional template methods return their defaults
tf = TestFetcher(http_client=None)
assert tf._build_headers() == {}, '_build_headers default failed'
assert tf._check_blocked(200, '') == False, '_check_blocked default failed'

# Verify parse_response with dict input
result = parse_response(200, {'statusCode': 200, 'data': {'list': [{'id': 1}], 'count': 1, 'isEndPage': False}})
assert result.success
assert result.list == [{'id': 1}]
assert not result.is_end_page

print('All verification checks passed')
"

Also confirm no cross-contamination:

grep -r "from spiderJobs" /Users/win/2025/AICoding/JobData/crawler_core/ && echo "FAIL: found spiderJobs import" || echo "OK: no spiderJobs imports"
grep -r "from app" /Users/win/2025/AICoding/JobData/crawler_core/ && echo "FAIL: found app import" || echo "OK: no app imports"
grep -r "loguru" /Users/win/2025/AICoding/JobData/crawler_core/ && echo "FAIL: found loguru" || echo "OK: no loguru"
grep -r "ApiResult" /Users/win/2025/AICoding/JobData/crawler_core/ && echo "FAIL: ApiResult still present" || echo "OK: ApiResult fully replaced by Result[T]"
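
Where grep is not available, the same cross-contamination check could be approximated in Python. This is only a sketch: `find_forbidden` is a hypothetical helper, and unlike `grep -r` it scans `*.py` files only. The demonstration runs against a throwaway directory rather than the real repository.

```python
import tempfile
from pathlib import Path

# Markers taken from the grep checks above.
FORBIDDEN = ("from spiderJobs", "from app", "loguru", "ApiResult")


def find_forbidden(root: Path, markers=FORBIDDEN) -> list:
    """Return (relative path, line number, marker) for every hit under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for marker in markers:
                if marker in line:
                    hits.append((path.relative_to(root).as_posix(), lineno, marker))
    return hits


# Demonstration on a temporary directory with one clean and one offending file.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp)
    (pkg / "clean.py").write_text("import logging\n", encoding="utf-8")
    (pkg / "dirty.py").write_text("import os\nfrom loguru import logger\n", encoding="utf-8")
    demo_hits = find_forbidden(pkg)
```

An empty result corresponds to the "OK" branch of each grep line above. Any hit names the file and line to fix.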

<success_criteria>

python -c "from crawler_core import BaseFetcher, BaseSearcher, Result, HTTPClient" exits 0 (with repo root on sys.path)
crawler_core/pyproject.toml passes python -c "import tomllib; tomllib.load(open('crawler_core/pyproject.toml','rb'))"
grep "requests_go" Pipfile has output — dependency declared
grep "tenacity" Pipfile has output — dependency declared
grep "pytest" Pipfile has output — dev dependency declared
grep -r "from spiderJobs" crawler_core/ has NO output
grep -r "loguru" crawler_core/ has NO output
grep "min=10" crawler_core/http_client.py has output — anti-detection delay preserved
grep -r "ApiResult" crawler_core/ has NO output — fully replaced by Result[T]
BaseFetcher._build_headers and BaseFetcher._check_blocked exist and are wired into fetch()
spiderJobs/ and jobs_spider/ directories are UNCHANGED (no files modified) </success_criteria>

After completion, create `.planning/phases/01-shared-core/01-01-SUMMARY.md` with:
- What was created (file list with line counts)
- Key decisions made (pyproject.toml structure, tenacity config values, logging approach)
- Interface contracts (the public exports from crawler_core/__init__.py, Result[T] field list, 4 template method signatures)
- Any deviations from this plan and why
