- ARCH-04: job51 migrated to crawler_core (no old deps) - ARCH-05: zhilian migrated to crawler_core (no old deps) - 34 new mock tests (17 job51 + 17 zhilian) - Added _parse_zhilian_response custom parser for zhilian API format - Fixed POST Searcher _request() overrides for job51/zhilian - Full regression: 98 passed in 0.12s
7.5 KiB
Testing Patterns
Analysis Date: 2026-03-21
Test Framework
Runner:
unittest(Python standard library) — the only test framework currently in use- No pytest, no pytest-anyio, no httpx.AsyncClient test harness configured
- No test-specific dependencies in
Pipfile[dev-packages](section is empty)
Assertion Library:
unittest.TestCaseassertion methods:assertEqual,assertIsNone, etc.
Run Commands:
python -m unittest tests/test_company_storage.py # Run a specific test file
python -m unittest tests/test_company_jobs_sync.py # Run a specific test file
python -m unittest discover tests/ # Discover and run all tests
Coverage:
- No coverage tool configured (no
.coveragerc, nopytest-cov, no coverage requirement enforced)
Test File Organization
Location:
- Separate
tests/directory at project root:/Users/win/2025/AICoding/JobData/tests/ - NOT co-located with source modules
Naming:
- Pattern:
test_{module_name}.py - Examples:
test_company_storage.py,test_company_jobs_sync.py
Structure:
tests/
├── test_company_jobs_sync.py # Tests for app/services/company_jobs_sync.py
└── test_company_storage.py # Tests for app/services/company_storage.py
Test Structure
Suite Organization:
import unittest
from app.services.company_storage import extract_company_fields, normalize_company_id
class CompanyStorageTests(unittest.TestCase):
def test_normalize_qcwy_company_id(self):
self.assertEqual(normalize_company_id("qcwy", "co123"), "123")
self.assertEqual(normalize_company_id("qcwy", "123"), "123")
self.assertEqual(normalize_company_id("boss", "co123"), "co123")
def test_extract_boss_fields(self):
payload = { ... }
result = extract_company_fields("boss", payload, "boss-1")
self.assertEqual(result["source_company_id"], "boss-1")
if __name__ == "__main__":
unittest.main()
Patterns:
- One
TestCaseclass per test file - Test methods named
test_{what_is_being_tested}_{condition}(e.g.,test_extract_boss_fields,test_normalize_qcwy_company_id) - No setUp/tearDown methods — tests are stateless and self-contained
- No fixtures or factories; inline dicts serve as test data
- Tests verify pure functions only (no database, no HTTP, no async)
Mocking
Framework: None currently used (no unittest.mock, no pytest-mock)
Current approach: Tests only cover pure, synchronous functions that have no external dependencies. All existing tests call functions that:
- Accept plain dicts as input
- Return plain dicts or strings as output
- Have no database, network, or filesystem side effects
What to Mock (when tests are added):
clickhouse_connect.driver.AsyncClientfor repository testsBossToken.filter(...)ORM queries for service tests that check token loadinghttpx/requestsfor crawler service testsasyncio.to_thread(...)calls wrapping synchronous crawlers
What NOT to Mock:
- Pure data transformation functions (
extract_company_fields,normalize_company_id,_pick_first,_clean_text) - Enum definitions and schema validation logic
- URL pattern matching logic in
CleaningService
Fixtures and Factories
Test Data:
- Inline dicts constructed within each test method — no shared fixtures
- Platform-specific payloads mirror real API response shapes:
# Boss payload structure (from test_company_storage.py)
payload = {
"zpData": {
"brandComInfoVO": {
"encryptBrandId": "boss-1",
"brandName": "Boss公司",
"industryName": "互联网",
"stageName": "B轮",
},
"companyFullInfoVO": {
"name": "Boss公司",
"typeName": "民营",
"cityName": "上海",
},
}
}
# Qcwy payload structure
payload = {
"coinfo": {
"coid": "123",
"coname": "前程公司",
"cotype": "民营",
"cosize": "500-999人",
},
"financingStage": {"name": "未融资"},
}
# Zhilian payload structure
payload = {
"data": {
"companyBase": {
"companyNumber": "zl-1",
"companyName": "智联公司",
"companyTypeName": "上市公司",
}
}
}
Location: Inline within test methods; no shared fixture file or factory module.
Coverage
Requirements: None enforced — no coverage target, no CI gates
Current coverage estimation:
app/services/company_storage.py— partially covered (extract functions and normalize_company_id)app/services/company_jobs_sync.py— partially covered (_extract_boss_jobs, _extract_qcwy_jobs, _extract_zhilian_jobs)- All other modules — ZERO test coverage
View Coverage (manual):
pip install coverage
coverage run -m unittest discover tests/
coverage report
coverage html # generates htmlcov/index.html
Test Types
Unit Tests:
- The only type present
- Scope: pure synchronous functions with no external dependencies
- Files:
tests/test_company_storage.py,tests/test_company_jobs_sync.py
Integration Tests:
- Not present
- Needed for: API endpoint testing using
httpx.AsyncClientwithTestClientoranyio - Critical gaps:
app/api/v1/job/job.py,app/api/v1/analytics.py,app/api/v1/keyword/keyword.py
E2E Tests:
- Not present
- No E2E framework configured
Critical Gaps
Untested service logic (highest priority):
app/services/cleaning.py:CleaningService.clean_target_auto()— URL platform detection logic (zhipin.com vs 51job.com vs zhaopin.com)app/services/cleaning.py:CleaningService.process_single_item()— multi-branch dispatch based onclean_typeandplatformapp/services/ingest/service.py:IngestService.store_batch()— deduplication and batch insert flowapp/services/ingest/dedup.py— deduplication filter logicapp/controllers/keyword.py:KeywordController.get_available()— priority scheduling logic (partial > failed > fresh)app/repositories/clickhouse_repo.py:JobAnalyticsRepo— query building and result mapping
Untested URL parsing (medium priority):
app/services/cleaning.py:_process_boss_url()— regex extraction of job/company ID from Boss URLsapp/services/cleaning.py:_process_qcwy_url()— regex extraction for qcwy URLsapp/services/cleaning.py:_process_zhilian_url()— regex extraction for zhilian URLs
Untested API endpoints:
- All routes in
app/api/v1/— no integration tests exist for any HTTP endpoints
Recommended Test Setup (when adding tests)
For async tests, add to Pipfile [dev-packages]:
[dev-packages]
pytest = "*"
pytest-asyncio = "*"
anyio = {extras = ["trio"]}
httpx = "*" # already in packages, for AsyncClient
Async test pattern (recommended):
import pytest
from httpx import AsyncClient
from app import app # FastAPI app instance
@pytest.mark.asyncio
async def test_ingest_endpoint():
async with AsyncClient(app=app, base_url="http://test") as client:
response = await client.post("/api/v1/ingest/data/store", json={...})
assert response.status_code == 200
Synchronous unit test pattern (existing):
import unittest
from app.services.company_storage import normalize_company_id
class MyTests(unittest.TestCase):
def test_normalize(self):
self.assertEqual(normalize_company_id("qcwy", "co123"), "123")
if __name__ == "__main__":
unittest.main()
Testing analysis: 2026-03-21