---
phase: 01-shared-core
plan: "01"
subsystem: crawler_core
tags: [python, package, http-client, base-classes, retry, tls-fingerprint]
dependency_graph:
  requires: []
  provides: [crawler_core]
  affects: [spiderJobs, app/services/crawler]
tech_stack:
  added: [requests_go==1.0.9, tenacity>=8.0]
  patterns: [template-method, generic-dataclass, stdlib-logging]
key_files:
  created:
    - crawler_core/pyproject.toml
    - crawler_core/__init__.py
    - crawler_core/http_client.py
    - crawler_core/base.py
    - crawler_core/boss/__init__.py
    - crawler_core/qcwy/__init__.py
    - crawler_core/zhilian/__init__.py
  modified:
    - Pipfile
decisions:
  - "pyproject.toml uses where=['..'] so pip install -e ./crawler_core resolves from repo root"
  - "tenacity wait_random_exponential(min=10, max=30) preserves mandatory 10s anti-detection delay"
  - "stdlib logging.getLogger('crawler_core.*') used — loguru bridge deferred to Phase 5"
  - "Result[T] generic dataclass replaces untyped ApiResult throughout"
  - "4 template methods: _build_params/_parse (required), _build_headers/_check_blocked (optional)"
metrics:
  duration: "~15 minutes"
  completed: "2026-03-21T10:10:47Z"
  tasks_completed: 3
  files_created: 8
  files_modified: 1
---

# Phase 01 Plan 01: crawler_core Shared Package — Summary

**One-liner:** Created `crawler_core` installable Python package with TLS-fingerprinted HTTPClient (tenacity retry, min=10s wait), generic `Result[T]` dataclass, and `BaseFetcher`/`BaseSearcher` template-method base classes using stdlib logging only.

## What Was Created

| File | Lines | Purpose |
|------|-------|---------|
| `crawler_core/pyproject.toml` | 17 | Package metadata for `pip install -e ./crawler_core` |
| `crawler_core/__init__.py` | 19 | Public API: exports BaseFetcher, BaseSearcher, Result, HTTPClient, parse_response |
| `crawler_core/http_client.py` | 191 | HTTPClient with Chrome TLS fingerprint, tenacity retry (3x, min=10s), stdlib logging |
| `crawler_core/base.py` | 207 | Result[T] generic dataclass, parse_response, BaseFetcher, BaseSearcher |
| `crawler_core/boss/__init__.py` | 1 | Platform namespace marker |
| `crawler_core/qcwy/__init__.py` | 1 | Platform namespace marker |
| `crawler_core/zhilian/__init__.py` | 1 | Platform namespace marker |
| `Pipfile` | +5 lines | Added requests_go, tenacity to [packages]; pytest, pytest-cov, pytest-anyio to [dev-packages] |

## Interface Contracts

**Public exports** (`from crawler_core import ...`):
- `Result` — generic dataclass with fields: `success`, `status_code`, `data`, `list`, `count`, `is_end_page`, `error`
- `BaseFetcher` — template-method base: `_build_params()` and `_parse()` required; `_build_headers()` and `_check_blocked()` optional
- `BaseSearcher` — template-method base with `load_all()` paginator; same 4 template methods
- `HTTPClient` — `post(path, body, headers)` and `get(path, params, headers)` with retry
- `parse_response(http_code, raw)` — default response parser returning `Result[Any]`

**Result[T] fields:**
```python
success: bool
status_code: int
data: Optional[T] = None
list: list[T] = field(default_factory=list)
count: int = 0
is_end_page: bool = True
error: Optional[str] = None
```

**tenacity retry config (identical on post and get):**
- `stop_after_attempt(3)` — max 3 attempts
- `wait_random_exponential(multiplier=1, min=10, max=30)` — anti-detection 10-30s wait
- `retry_if_exception_type((ConnectionError, TimeoutError, OSError))`
- `reraise=True` — propagates after all retries exhausted

## Decisions Made

| Decision | Rationale |
|----------|-----------|
| `pyproject.toml` uses `where=[".."]` | Allows `pip install -e ./crawler_core` from repo root to find `crawler_core` package |
| tenacity `min=10` preserved | STACK.md mandates minimum 10s anti-detection delay between retries |
| stdlib `logging` only | D-03 decision: no loguru in crawler_core; loguru bridge deferred to Phase 5 |
| `Result[T]` replaces `ApiResult` | D-07: typed generic eliminates untyped `Any` in return values |
| 4 template methods per base class | D-06: `_build_params`/`_parse` required, `_build_headers`/`_check_blocked` optional |
| `http_client` attribute name (not `_http`) | Consistent public attribute naming for subclass access |

## Commits

| Hash | Task | Description |
|------|------|-------------|
| `4932177` | Task 1 | Create crawler_core package scaffold and pyproject.toml |
| `ceb359d` | Task 2 | Create crawler_core/http_client.py with tenacity retry and stdlib logging |
| `04d6303` | Task 3 | Create crawler_core/base.py with Result[T] and crawler_core/__init__.py |

## Deviations from Plan

None — plan executed exactly as written.

The only minor adjustment: removed a docstring reference to `ApiResult` in `Result`'s docstring (the plan's acceptance criteria explicitly requires `grep "ApiResult" crawler_core/base.py` to return empty).

## Known Stubs

None — all files are fully implemented. The platform namespace `__init__.py` files are intentional empty stubs (just docstrings); they are namespace markers designed to be populated in Phase 2/3.

## Self-Check: PASSED

All created files exist:
- [x] `crawler_core/pyproject.toml` — FOUND
- [x] `crawler_core/__init__.py` — FOUND
- [x] `crawler_core/http_client.py` — FOUND
- [x] `crawler_core/base.py` — FOUND
- [x] `crawler_core/boss/__init__.py` — FOUND
- [x] `crawler_core/qcwy/__init__.py` — FOUND
- [x] `crawler_core/zhilian/__init__.py` — FOUND

All commits exist:
- [x] `4932177` — FOUND
- [x] `ceb359d` — FOUND
- [x] `04d6303` — FOUND

Full import verification passed:
```
from crawler_core import BaseFetcher, BaseSearcher, Result, HTTPClient, parse_response
→ All verification checks passed
```

Cross-contamination checks:
- `grep -r "from spiderJobs" crawler_core/` → OK: no spiderJobs imports
- `grep -r "from app" crawler_core/` → OK: no app imports
- `grep -rn "import loguru|from loguru" crawler_core/` → OK: no loguru imports
- `grep -r "ApiResult" crawler_core/` → OK: ApiResult fully replaced by Result[T]