win 00a727519f docs(phase-3): complete execution — 2/2 plans, 98 tests passing
- ARCH-04: job51 migrated to crawler_core (no old deps)
- ARCH-05: zhilian migrated to crawler_core (no old deps)
- 34 new mock tests (17 job51 + 17 zhilian)
- Added _parse_zhilian_response custom parser for zhilian API format
- Fixed POST Searcher _request() overrides for job51/zhilian
- Full regression: 98 passed in 0.12s
2026-03-21 19:19:17 +08:00

237 lines
7.5 KiB
Markdown

# Testing Patterns
**Analysis Date:** 2026-03-21
## Test Framework
**Runner:**
- `unittest` (Python standard library) — the only test framework currently in use
- No pytest, no pytest-anyio, no httpx.AsyncClient test harness configured
- No test-specific dependencies in `Pipfile` `[dev-packages]` (section is empty)
**Assertion Library:**
- `unittest.TestCase` assertion methods: `assertEqual`, `assertIsNone`, etc.
**Run Commands:**
```bash
python -m unittest tests/test_company_storage.py # Run a specific test file
python -m unittest tests/test_company_jobs_sync.py # Run a specific test file
python -m unittest discover tests/ # Discover and run all tests
```
**Coverage:**
- No coverage tool configured (no `.coveragerc`, no `pytest-cov`, no coverage requirement enforced)
## Test File Organization
**Location:**
- Separate `tests/` directory at project root: `/Users/win/2025/AICoding/JobData/tests/`
- NOT co-located with source modules
**Naming:**
- Pattern: `test_{module_name}.py`
- Examples: `test_company_storage.py`, `test_company_jobs_sync.py`
**Structure:**
```
tests/
├── test_company_jobs_sync.py # Tests for app/services/company_jobs_sync.py
└── test_company_storage.py # Tests for app/services/company_storage.py
```
## Test Structure
**Suite Organization:**
```python
import unittest
from app.services.company_storage import extract_company_fields, normalize_company_id
class CompanyStorageTests(unittest.TestCase):
def test_normalize_qcwy_company_id(self):
self.assertEqual(normalize_company_id("qcwy", "co123"), "123")
self.assertEqual(normalize_company_id("qcwy", "123"), "123")
self.assertEqual(normalize_company_id("boss", "co123"), "co123")
def test_extract_boss_fields(self):
payload = { ... }
result = extract_company_fields("boss", payload, "boss-1")
self.assertEqual(result["source_company_id"], "boss-1")
if __name__ == "__main__":
unittest.main()
```
**Patterns:**
- One `TestCase` class per test file
- Test methods named `test_{what_is_being_tested}_{condition}` (e.g., `test_extract_boss_fields`, `test_normalize_qcwy_company_id`)
- No setUp/tearDown methods — tests are stateless and self-contained
- No fixtures or factories; inline dicts serve as test data
- Tests verify pure functions only (no database, no HTTP, no async)
## Mocking
**Framework:** None currently used (no `unittest.mock`, no `pytest-mock`)
**Current approach:** Tests only cover pure, synchronous functions that have no external dependencies. All existing tests call functions that:
- Accept plain dicts as input
- Return plain dicts or strings as output
- Have no database, network, or filesystem side effects
**What to Mock (when tests are added):**
- `clickhouse_connect.driver.AsyncClient` for repository tests
- `BossToken.filter(...)` ORM queries for service tests that check token loading
- `httpx` / `requests` for crawler service tests
- `asyncio.to_thread(...)` calls wrapping synchronous crawlers
**What NOT to Mock:**
- Pure data transformation functions (`extract_company_fields`, `normalize_company_id`, `_pick_first`, `_clean_text`)
- Enum definitions and schema validation logic
- URL pattern matching logic in `CleaningService`
## Fixtures and Factories
**Test Data:**
- Inline dicts constructed within each test method — no shared fixtures
- Platform-specific payloads mirror real API response shapes:
```python
# Boss payload structure (from test_company_storage.py)
payload = {
"zpData": {
"brandComInfoVO": {
"encryptBrandId": "boss-1",
"brandName": "Boss公司",
"industryName": "互联网",
"stageName": "B轮",
},
"companyFullInfoVO": {
"name": "Boss公司",
"typeName": "民营",
"cityName": "上海",
},
}
}
# Qcwy payload structure
payload = {
"coinfo": {
"coid": "123",
"coname": "前程公司",
"cotype": "民营",
"cosize": "500-999人",
},
"financingStage": {"name": "未融资"},
}
# Zhilian payload structure
payload = {
"data": {
"companyBase": {
"companyNumber": "zl-1",
"companyName": "智联公司",
"companyTypeName": "上市公司",
}
}
}
```
**Location:** Inline within test methods; no shared fixture file or factory module.
## Coverage
**Requirements:** None enforced — no coverage target, no CI gates
**Current coverage estimation:**
- `app/services/company_storage.py` — partially covered (extract functions and normalize_company_id)
- `app/services/company_jobs_sync.py` — partially covered (_extract_boss_jobs, _extract_qcwy_jobs, _extract_zhilian_jobs)
- All other modules — ZERO test coverage
**View Coverage (manual):**
```bash
pip install coverage
coverage run -m unittest discover tests/
coverage report
coverage html # generates htmlcov/index.html
```
## Test Types
**Unit Tests:**
- The only type present
- Scope: pure synchronous functions with no external dependencies
- Files: `tests/test_company_storage.py`, `tests/test_company_jobs_sync.py`
**Integration Tests:**
- Not present
- Needed for: API endpoint testing using `httpx.AsyncClient` with `TestClient` or `anyio`
- Critical gaps: `app/api/v1/job/job.py`, `app/api/v1/analytics.py`, `app/api/v1/keyword/keyword.py`
**E2E Tests:**
- Not present
- No E2E framework configured
## Critical Gaps
**Untested service logic (highest priority):**
1. `app/services/cleaning.py:CleaningService.clean_target_auto()` — URL platform detection logic (zhipin.com vs 51job.com vs zhaopin.com)
2. `app/services/cleaning.py:CleaningService.process_single_item()` — multi-branch dispatch based on `clean_type` and `platform`
3. `app/services/ingest/service.py:IngestService.store_batch()` — deduplication and batch insert flow
4. `app/services/ingest/dedup.py` — deduplication filter logic
5. `app/controllers/keyword.py:KeywordController.get_available()` — priority scheduling logic (partial > failed > fresh)
6. `app/repositories/clickhouse_repo.py:JobAnalyticsRepo` — query building and result mapping
**Untested URL parsing (medium priority):**
7. `app/services/cleaning.py:_process_boss_url()` — regex extraction of job/company ID from Boss URLs
8. `app/services/cleaning.py:_process_qcwy_url()` — regex extraction for qcwy URLs
9. `app/services/cleaning.py:_process_zhilian_url()` — regex extraction for zhilian URLs
**Untested API endpoints:**
10. All routes in `app/api/v1/` — no integration tests exist for any HTTP endpoints
## Recommended Test Setup (when adding tests)
**For async tests, add to `Pipfile [dev-packages]`:**
```toml
[dev-packages]
pytest = "*"
pytest-asyncio = "*"
anyio = {extras = ["trio"]}
httpx = "*" # already in packages, for AsyncClient
```
**Async test pattern (recommended):**
```python
import pytest
from httpx import AsyncClient
from app import app # FastAPI app instance
@pytest.mark.asyncio
async def test_ingest_endpoint():
async with AsyncClient(app=app, base_url="http://test") as client:
response = await client.post("/api/v1/ingest/data/store", json={...})
assert response.status_code == 200
```
**Synchronous unit test pattern (existing):**
```python
import unittest
from app.services.company_storage import normalize_company_id
class MyTests(unittest.TestCase):
def test_normalize(self):
self.assertEqual(normalize_company_id("qcwy", "co123"), "123")
if __name__ == "__main__":
unittest.main()
```
---
*Testing analysis: 2026-03-21*