Testing Patterns

Analysis Date: 2026-03-21

Test Framework

Runner:

  • unittest (Python standard library) — the only test framework currently in use
  • No pytest, no async test plugin (pytest-asyncio or anyio's pytest plugin), no httpx.AsyncClient test harness configured
  • No test-specific dependencies in Pipfile [dev-packages] (section is empty)

Assertion Library:

  • unittest.TestCase assertion methods: assertEqual, assertIsNone, etc.

Run Commands:

python -m unittest tests/test_company_storage.py     # Run a specific test file
python -m unittest tests/test_company_jobs_sync.py   # Run a specific test file
python -m unittest discover tests/                   # Discover and run all tests

Coverage:

  • No coverage tool configured (no .coveragerc, no pytest-cov, no coverage requirement enforced)

Test File Organization

Location:

  • Separate tests/ directory at project root: /Users/win/2025/AICoding/JobData/tests/
  • NOT co-located with source modules

Naming:

  • Pattern: test_{module_name}.py
  • Examples: test_company_storage.py, test_company_jobs_sync.py

Structure:

tests/
├── test_company_jobs_sync.py   # Tests for app/services/company_jobs_sync.py
└── test_company_storage.py     # Tests for app/services/company_storage.py

Test Structure

Suite Organization:

import unittest

from app.services.company_storage import extract_company_fields, normalize_company_id


class CompanyStorageTests(unittest.TestCase):
    def test_normalize_qcwy_company_id(self):
        self.assertEqual(normalize_company_id("qcwy", "co123"), "123")
        self.assertEqual(normalize_company_id("qcwy", "123"), "123")
        self.assertEqual(normalize_company_id("boss", "co123"), "co123")

    def test_extract_boss_fields(self):
        payload = { ... }
        result = extract_company_fields("boss", payload, "boss-1")
        self.assertEqual(result["source_company_id"], "boss-1")


if __name__ == "__main__":
    unittest.main()

Patterns:

  • One TestCase class per test file
  • Test methods named test_{what_is_being_tested}, optionally followed by _{condition} (e.g., test_extract_boss_fields, test_normalize_qcwy_company_id)
  • No setUp/tearDown methods — tests are stateless and self-contained
  • No fixtures or factories; inline dicts serve as test data
  • Tests verify pure functions only (no database, no HTTP, no async)

Mocking

Framework: None currently used (no unittest.mock, no pytest-mock)

Current approach: Tests only cover pure, synchronous functions that have no external dependencies. All existing tests call functions that:

  • Accept plain dicts as input
  • Return plain dicts or strings as output
  • Have no database, network, or filesystem side effects

What to Mock (when tests are added):

  • clickhouse_connect.driver.AsyncClient for repository tests
  • BossToken.filter(...) ORM queries for service tests that check token loading
  • httpx / requests for crawler service tests
  • asyncio.to_thread(...) calls wrapping synchronous crawlers
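
A minimal sketch of how two of the targets listed above could be mocked with the standard library alone. The names count_jobs, crawl_sync and crawl are hypothetical stand-ins, not project code, and the query text and table name are assumptions for illustration only. The sketch replaces the ClickHouse client with an AsyncMock and patches asyncio.to_thread so that the synchronous crawler never runs.

```python
import asyncio
import unittest
from unittest.mock import AsyncMock, MagicMock, patch


# Hypothetical repository function (not project code): awaits client.query()
# and reads the first cell of the result rows.
async def count_jobs(client, platform):
    result = await client.query(
        "SELECT count() FROM jobs WHERE platform = {p:String}",
        parameters={"p": platform},
    )
    return result.result_rows[0][0]


# Hypothetical synchronous crawler; the real one would hit the network.
def crawl_sync(keyword):
    raise RuntimeError("network access is not allowed in unit tests")


# Hypothetical async service wrapper around the synchronous crawler.
async def crawl(keyword):
    return await asyncio.to_thread(crawl_sync, keyword)


class MockingSketchTests(unittest.IsolatedAsyncioTestCase):
    async def test_repo_with_mocked_client(self):
        # AsyncMock makes client.query awaitable; the return value mimics
        # an object exposing result_rows.
        client = AsyncMock()
        client.query.return_value = MagicMock(result_rows=[(42,)])
        self.assertEqual(await count_jobs(client, "boss"), 42)
        client.query.assert_awaited_once()

    async def test_service_with_patched_to_thread(self):
        # Replace asyncio.to_thread so crawl_sync is never executed.
        fake = AsyncMock(return_value=[{"id": "1"}])
        with patch("asyncio.to_thread", new=fake):
            self.assertEqual(await crawl("python"), [{"id": "1"}])
        fake.assert_awaited_once_with(crawl_sync, "python")
```

Saved as a file under tests/, this would run with the existing python -m unittest commands. IsolatedAsyncioTestCase needs Python 3.8+ and asyncio.to_thread needs Python 3.9+.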

What NOT to Mock:

  • Pure data transformation functions (extract_company_fields, normalize_company_id, _pick_first, _clean_text)
  • Enum definitions and schema validation logic
  • URL pattern matching logic in CleaningService

Fixtures and Factories

Test Data:

  • Inline dicts constructed within each test method — no shared fixtures
  • Platform-specific payloads mirror real API response shapes:
# Boss payload structure (from test_company_storage.py)
payload = {
    "zpData": {
        "brandComInfoVO": {
            "encryptBrandId": "boss-1",
            "brandName": "Boss公司",
            "industryName": "互联网",
            "stageName": "B轮",
        },
        "companyFullInfoVO": {
            "name": "Boss公司",
            "typeName": "民营",
            "cityName": "上海",
        },
    }
}

# Qcwy payload structure
payload = {
    "coinfo": {
        "coid": "123",
        "coname": "前程公司",
        "cotype": "民营",
        "cosize": "500-999人",
    },
    "financingStage": {"name": "未融资"},
}

# Zhilian payload structure
payload = {
    "data": {
        "companyBase": {
            "companyNumber": "zl-1",
            "companyName": "智联公司",
            "companyTypeName": "上市公司",
        }
    }
}

Location: Inline within test methods; no shared fixture file or factory module.
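
As an illustration of how inline payloads like the ones above can drive a table-style test without shared fixtures, here is a small sketch. The dig helper is hypothetical (it is not one of the project's functions) and the payloads are trimmed to the ID fields shown in the examples above.

```python
import unittest


def dig(payload, *keys):
    """Hypothetical helper: walk nested dicts, returning None on a missing key."""
    current = payload
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


class PayloadShapeTests(unittest.TestCase):
    def test_company_id_paths(self):
        # (platform, inline payload, path to the company ID, expected value)
        cases = [
            ("boss",
             {"zpData": {"brandComInfoVO": {"encryptBrandId": "boss-1"}}},
             ("zpData", "brandComInfoVO", "encryptBrandId"), "boss-1"),
            ("qcwy",
             {"coinfo": {"coid": "123"}},
             ("coinfo", "coid"), "123"),
            ("zhilian",
             {"data": {"companyBase": {"companyNumber": "zl-1"}}},
             ("data", "companyBase", "companyNumber"), "zl-1"),
        ]
        for platform, payload, path, expected in cases:
            # subTest reports each platform separately on failure.
            with self.subTest(platform=platform):
                self.assertEqual(dig(payload, *path), expected)

    def test_missing_key_returns_none(self):
        self.assertIsNone(dig({"coinfo": {}}, "coinfo", "coid"))
        self.assertIsNone(dig({}, "data", "companyBase"))
```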

Coverage

Requirements: None enforced — no coverage target, no CI gates

Estimated current coverage:

  • app/services/company_storage.py — partially covered (extract functions and normalize_company_id)
  • app/services/company_jobs_sync.py — partially covered (_extract_boss_jobs, _extract_qcwy_jobs, _extract_zhilian_jobs)
  • All other modules — ZERO test coverage

View Coverage (manual):

pip install coverage
coverage run -m unittest discover tests/
coverage report
coverage html  # generates htmlcov/index.html

Test Types

Unit Tests:

  • The only type present
  • Scope: pure synchronous functions with no external dependencies
  • Files: tests/test_company_storage.py, tests/test_company_jobs_sync.py

Integration Tests:

  • Not present
  • Needed for: API endpoint testing, using either httpx.AsyncClient (async, driven by pytest-asyncio or anyio) or FastAPI's TestClient (synchronous)
  • Critical gaps: app/api/v1/job/job.py, app/api/v1/analytics.py, app/api/v1/keyword/keyword.py

E2E Tests:

  • Not present
  • No E2E framework configured

Critical Gaps

Untested service logic (highest priority):

  1. app/services/cleaning.py:CleaningService.clean_target_auto() — URL platform detection logic (zhipin.com vs 51job.com vs zhaopin.com)
  2. app/services/cleaning.py:CleaningService.process_single_item() — multi-branch dispatch based on clean_type and platform
  3. app/services/ingest/service.py:IngestService.store_batch() — deduplication and batch insert flow
  4. app/services/ingest/dedup.py — deduplication filter logic
  5. app/controllers/keyword.py:KeywordController.get_available() — priority scheduling logic (partial > failed > fresh)
  6. app/repositories/clickhouse_repo.py:JobAnalyticsRepo — query building and result mapping
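
The priority ordering named in item 5 (partial > failed > fresh) is the kind of rule that is easy to pin down in a pure-function test. The sketch below uses a hypothetical pick_next function and a hypothetical status field. The actual signature and data access of KeywordController.get_available() are not shown in this document, so this only illustrates the ordering rule.

```python
import unittest

# Hypothetical ranking: a lower rank is served first.
PRIORITY = {"partial": 0, "failed": 1, "fresh": 2}


def pick_next(keywords):
    """Return the keyword dict with the highest priority, or None if empty."""
    if not keywords:
        return None
    return min(keywords, key=lambda kw: PRIORITY[kw["status"]])


class KeywordPriorityTests(unittest.TestCase):
    def test_partial_beats_failed_and_fresh(self):
        keywords = [
            {"word": "java", "status": "fresh"},
            {"word": "go", "status": "failed"},
            {"word": "python", "status": "partial"},
        ]
        self.assertEqual(pick_next(keywords)["word"], "python")

    def test_failed_beats_fresh(self):
        keywords = [
            {"word": "java", "status": "fresh"},
            {"word": "go", "status": "failed"},
        ]
        self.assertEqual(pick_next(keywords)["word"], "go")

    def test_empty_list_returns_none(self):
        self.assertIsNone(pick_next([]))
```

If the real controller mixes this ordering with database queries, extracting the ranking into a pure function like this would let it be tested in the same style as the existing suites.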

Untested URL parsing (medium priority):

  1. app/services/cleaning.py:_process_boss_url() — regex extraction of job/company ID from Boss URLs
  2. app/services/cleaning.py:_process_qcwy_url() — regex extraction for qcwy URLs
  3. app/services/cleaning.py:_process_zhilian_url() — regex extraction for zhilian URLs
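
A test for the regex extraction above could look roughly like the following. Both the pattern and the URL shape are assumptions made for illustration, and extract_boss_job_id is a hypothetical stand-in for _process_boss_url(), whose real pattern and return type are not shown here.

```python
import re
import unittest

# Assumed Boss job URL shape: .../job_detail/<id>.html
BOSS_JOB_RE = re.compile(r"zhipin\.com/job_detail/([A-Za-z0-9_~-]+)\.html")


def extract_boss_job_id(url):
    """Hypothetical extractor: return the job ID, or None if the URL does not match."""
    match = BOSS_JOB_RE.search(url)
    return match.group(1) if match else None


class BossUrlParsingTests(unittest.TestCase):
    def test_extracts_id_and_ignores_query_string(self):
        url = "https://www.zhipin.com/job_detail/abc123~.html?ka=search"
        self.assertEqual(extract_boss_job_id(url), "abc123~")

    def test_rejects_non_matching_urls(self):
        bad_urls = [
            "https://jobs.51job.com/shanghai/123.html",  # other platform
            "https://www.zhipin.com/job_detail/",        # ID missing
            "",                                          # empty input
        ]
        for url in bad_urls:
            with self.subTest(url=url):
                self.assertIsNone(extract_boss_job_id(url))
```

The same structure (one positive case, a list of negative cases under subTest) would carry over to the qcwy and zhilian parsers once their expected URL shapes are confirmed.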

Untested API endpoints:

  1. All routes in app/api/v1/ — no integration tests exist for any HTTP endpoints

For async tests, add to Pipfile [dev-packages]:

[dev-packages]
pytest = "*"
pytest-asyncio = "*"
anyio = {extras = ["trio"], version = "*"}
httpx = "*"  # already in [packages]; needed here for AsyncClient

Async test pattern (recommended):

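import pytest
from httpx import ASGITransport, AsyncClient
from app import app  # FastAPI app instance

@pytest.mark.asyncio
async def test_ingest_endpoint():
    # httpx 0.28 removed the AsyncClient(app=...) shortcut; route requests
    # to the ASGI app through an explicit transport instead.
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        # {...} stands for the request payload, which is elided here.
        response = await client.post("/api/v1/ingest/data/store", json={...})
    assert response.status_code == 200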

Synchronous unit test pattern (existing):

import unittest
from app.services.company_storage import normalize_company_id

class MyTests(unittest.TestCase):
    def test_normalize(self):
        self.assertEqual(normalize_company_id("qcwy", "co123"), "123")

if __name__ == "__main__":
    unittest.main()

Testing analysis: 2026-03-21