JobData/.planning/codebase/CONVENTIONS.md

Coding Conventions

Analysis Date: 2026-03-21

Naming Patterns

Files:

  • Snake_case for all Python files: company_storage.py, company_cleaner.py, clickhouse_repo.py
  • Private/internal modules prefixed with underscore: _base.py, _boss_api.py, _boss_client.py, _boss_sign.py, _http_client.py
  • Platform-named service files: boss.py, qcwy.py, zhilian.py under app/services/crawler/
  • Router files named after domain: keyword.py, analytics.py, cleaning.py

Classes:

  • PascalCase throughout: CleaningService, KeywordController, ClickHouseBaseRepo, JobAnalyticsRepo
  • Services: {Domain}Service, e.g. BossService, QcwyService, ZhilianService, IngestService, AnalyticsService
  • Controllers: {Domain}Controller, e.g. KeywordController
  • Repos: {Domain}Repo or {Domain}BaseRepo, e.g. ClickHouseBaseRepo, JobAnalyticsRepo
  • Models (Tortoise ORM): {Platform}{Entity}, e.g. BossKeyword, QcwyCompany, ZhilianCompany
  • Schemas (Pydantic): {Entity}Base, {Entity}Create, {Entity}Update, {Entity}Out — see app/schemas/keyword.py

Functions and Methods:

  • Snake_case for all functions and methods: get_available, report_page_progress, store_batch, build_insert_row
  • Private helpers prefixed with underscore: _apply_proxy, _ensure_boss_token_loaded, _pick_first, _nested_get, _clean_text, _model_for_source
  • Async dependency factories follow pattern get_{service/controller}(): get_ingest_service, get_analytics_service, get_keyword_controller

Variables:

  • Snake_case: data_list, platform_type, check_duplicate, page_size
  • Module-level constants: UPPER_SNAKE_CASE — COMPANY_SOURCES, QUEUE_TERMINAL_STATUSES
  • Class-level constants: UPPER_SNAKE_CASE prefixed with an underscore, e.g. _TOKEN_REFRESH_INTERVAL = 3600

Types and Enums:

  • Enums use PascalCase class names and UPPER_SNAKE_CASE member names: PlatformType.BOSS, ChannelType.MINI, DataType.JOB
  • Enum values are lowercase strings matching URL slugs: "boss", "mini", "job" — see app/schemas/ingest.py
  • Enums inherit from (str, Enum) enabling direct string comparison
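
The split between member names and values is easiest to see in a small sketch. The class name and the BOSS member come from the examples above; the QCWY and ZHILIAN members are assumptions inferred from the platform names used elsewhere in this document, not a copy of app/schemas/ingest.py.

```python
import json
from enum import Enum


class PlatformType(str, Enum):
    """UPPER_SNAKE_CASE member names, lowercase string values (URL slugs)."""

    BOSS = "boss"
    QCWY = "qcwy"        # assumed member, not confirmed by the source
    ZHILIAN = "zhilian"  # assumed member, not confirmed by the source


# The str mixin lets a member compare equal to its raw slug.
is_boss = PlatformType.BOSS == "boss"

# A slug taken from a URL path resolves back to the member by value.
resolved = PlatformType("zhilian")

# Members serialize as plain strings without a custom JSON encoder.
payload = json.dumps({"platform": PlatformType.BOSS})
```

Comparing against the raw slug works because each member is itself a str instance. Prefer .value over str() when the slug is needed explicitly, since the str() output of mixed-in enums differs between Python versions.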

Code Style

Formatting:

  • Tool: black v24.10.0
  • Line length: 120 characters (set in pyproject.toml [tool.black] and [tool.ruff])
  • Target Python versions: black targets 3.10 and 3.11, while the Pipfile specifies 3.13

Linting:

  • Tool: ruff v0.9.1 (configured in pyproject.toml)
  • Ignored rules: F403 (star imports), F405 (may be undefined from star import)
  • Star imports from internal modules are allowed (used in app/models/__init__.py, app/services/ingest/__init__.py)

Import Sorting:

  • Tool: isort v5.13.2
  • No explicit isort config found; follows default ordering

Import Organization

Order:

  1. Standard library (from __future__, os, re, typing, datetime, json)
  2. Third-party (fastapi, pydantic, tortoise, loguru, clickhouse_connect)
  3. Internal app imports (from app.core., from app.models., from app.services., from app.schemas.)

Example from app/api/v1/analytics.py:

from typing import Optional
from datetime import datetime, date, timezone
from zoneinfo import ZoneInfo

from fastapi import APIRouter, Depends, Query
from app.core.clickhouse import clickhouse_manager
from app.services.analytics_service import AnalyticsService
from app.schemas.analytics import JobStatisticsResponse

Path Aliases:

  • None; all imports use full app. prefix paths
  • from app.log import logger is the canonical loguru import path

Star Imports:

  • Used only in __init__.py re-export files: from .admin import * in app/models/__init__.py
  • # noqa: F401, F403 comments suppress lint warnings for intentional star imports

Error Handling

Patterns:

  • Services return Dict[str, Any] result objects with "success", "code", "message" fields instead of raising exceptions to callers
  • Controllers return dict with "code": 200/400/404 and "message" for all outcomes
  • API route handlers do NOT use try/except — they rely on services returning structured results
  • Service methods wrap low-level calls in try/except Exception as e, log the error, then return False or an error dict

Service-level error handling example (app/services/cleaning.py):

except Exception as e:
    logger.error(f"Error processing item {target}: {e}")
    return {
        "success": False,
        "target": target,
        "error": str(e),
        "storage_status": "error",
        "remote_sent": False
    }

Repository-level: ClickHouseBaseRepo does not swallow exceptions; they propagate to the service layer.
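
The division of labor between the two layers can be sketched as follows. This is a minimal illustration, not code from the repository: FakeRepo, FakeIngestService, and the 400 code are hypothetical, and the standard logging module stands in for the project's loguru logger so the block runs without extra dependencies.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger("app")  # stand-in for the project's loguru logger


class FakeRepo:
    """Hypothetical repository: it does not catch its own errors."""

    def insert_rows(self, rows: List[dict]) -> int:
        if not rows:
            raise ValueError("no rows to insert")
        return len(rows)


class FakeIngestService:
    """Hypothetical service: converts repository errors into a result dict."""

    def __init__(self, repo: FakeRepo) -> None:
        self.repo = repo

    def store_batch(self, rows: List[dict]) -> Dict[str, Any]:
        try:
            inserted = self.repo.insert_rows(rows)
        except Exception as e:
            # Log, then report the failure in the return value instead of raising.
            logger.error(f"Batch insert failed: {e}")
            return {"success": False, "code": 400, "message": str(e)}
        return {"success": True, "code": 200, "message": "ok", "data": {"inserted": inserted}}


service = FakeIngestService(FakeRepo())
ok_result = service.store_batch([{"id": 1}])
error_result = service.store_batch([])
```

Because the service always returns a dict, a route handler can pass the result straight through without its own try/except, which matches the pattern described above.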

Auth exceptions: app/core/dependency.py raises HTTPException(status_code=401/403) directly — the standard FastAPI pattern for auth failures.

Logging

Framework: loguru v0.7.3

Import: from app.log import logger (centralized re-export) or from loguru import logger (direct)

Patterns:

  • logger.info(f"...") for normal operation events
  • logger.warning(f"...") for non-fatal recoverable issues (e.g., token not found, API soft failures)
  • logger.error(f"...") for caught exceptions and operation failures
  • F-string interpolation used consistently for message formatting
  • No structured fields (no logger.bind() usage observed)

Example:

logger.info(f"获取招聘详情: {job_id}")  # "Fetching job detail: {job_id}"
logger.warning(f"Boss get_job_detail failed: {result.error}")
logger.error(f"批量插入失败: {e}")  # "Batch insert failed: {e}"

API Response Format

Two response styles coexist:

Style 1 — Direct dict return (most routes in new modules like app/api/v1/job/job.py, app/api/v1/analytics.py):

return {"code": 200, "data": result, "message": "ok"}

Style 2 — JSONResponse subclasses (older RBAC routes, defined in app/schemas/base.py):

Success(code=200, msg="OK", data=data)
Fail(code=400, msg="error message")
SuccessExtra(code=200, data=data, total=100, page=1, page_size=20)

Paginated responses include: code, data (list), total, page, page_size
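
A paginated Style 1 envelope could be assembled along these lines. Both helpers are hypothetical, not functions from the codebase, and the sketch assumes pages are 1-indexed and that the "message" key from Style 1 is carried along.

```python
from typing import Any, Dict, List


def paginated(items: List[Any], total: int, page: int = 1, page_size: int = 20) -> Dict[str, Any]:
    """Build a Style 1 paginated envelope (hypothetical helper)."""
    return {
        "code": 200,
        "message": "ok",
        "data": items,
        "total": total,
        "page": page,
        "page_size": page_size,
    }


def slice_page(rows: List[Any], page: int, page_size: int) -> List[Any]:
    """Return the rows belonging to a 1-indexed page."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]


rows = [{"id": i} for i in range(1, 46)]  # 45 fake records
response = paginated(slice_page(rows, page=3, page_size=20), total=len(rows), page=3, page_size=20)
```

Here total reports the size of the full result set, not the length of the current page, so a client can work out the page count itself.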

Comments

When to Comment:

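  • Docstrings on public methods describing purpose, not implementation: """获取可用关键词,优先返回断点续爬和失败重试的关键词""" ("Get available keywords, returning resumable and failed-retry keywords first")
  • Inline comments for priority logic and algorithm steps: # 优先级 1: 断点续爬 (partial) ("Priority 1: resume an interrupted crawl (partial)")
  • Module-level docstrings for context: """Boss直聘 Service — 基于新算法文件的封装""" ("Boss Zhipin Service: a wrapper built on the new algorithm files")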
  • # noqa comments for intentional lint suppressions

JSDoc/TSDoc:

  • Not applicable (Python backend)
  • Docstrings are brief single-line or short multi-line Chinese descriptions

Function Design

Size: Functions tend to be 10-50 lines; service methods like process_single_item in app/services/cleaning.py grow to ~70 lines due to multi-platform dispatch

Parameters:

  • Keyword arguments with defaults preferred for optional params
  • Pydantic schemas used for HTTP request bodies (never raw dicts from router params)
  • Optional[str] with = None default for optional parameters

Return Values:

  • Services return Dict[str, Any] with consistent keys (code, message, data)
  • Private helpers return Optional[T] or primitive types
  • Async functions return awaitable results (no mixing of sync/async)

Module Design

Exports:

  • app/models/__init__.py uses from .{module} import * to flatten model imports
  • Router modules export a single named router variable: router = APIRouter(...) or {domain}_router = APIRouter(...)
  • Service classes are imported directly by name

Barrel Files (__init__.py):

  • app/models/__init__.py — re-exports all model classes
  • app/services/ingest/__init__.py — re-exports IngestService and config registrations
  • app/api/v1/__init__.py — aggregates all routers into v1_router

Dependency Injection:

  • FastAPI Depends() used for service/controller instantiation in route handlers
  • Dependency factory functions named get_{service}() and defined in the same file as the router
  • Shared auth dependencies: DependAuth, DependPermission in app/core/dependency.py

Tortoise ORM Model Conventions

Base class: All models inherit from app/models/base.py:BaseModel (which extends tortoise.models.Model with id = BigIntField(pk=True))

Timestamp mixin: TimestampMixin adds created_at (auto_now_add) and updated_at (auto_now) — applied via multiple inheritance

Abstract base models: Platform variants use abstract base + concrete subclasses:

from tortoise.models import Model


class BaseKeyword(Model):
    class Meta:
        abstract = True  # no table is created for the abstract base


class BossKeyword(BaseKeyword):
    class Meta:
        table = "boss_keyword"

Field descriptions: All fields include description= parameter for documentation

Pydantic Schema Conventions

  • All schemas inherit from pydantic.BaseModel
  • All fields use Field(...) with description= for documentation
  • Enums inherit from (str, Enum) for JSON serialization compatibility
  • Output schemas include class Config: from_attributes = True to support ORM mode
  • Validation patterns use Field(..., pattern="^(boss|qcwy|zhilian)$") for enum-like string fields
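
The schema conventions above could combine as in this sketch. It assumes Pydantic v2, since from_attributes and pattern= are v2 names. The field names and the validate_create helper are hypothetical, not taken from app/schemas/keyword.py.

```python
from pydantic import BaseModel, Field, ValidationError


class KeywordBase(BaseModel):
    keyword: str = Field(..., description="Search keyword")
    # Enum-like string field restricted by a regex pattern.
    platform: str = Field(..., pattern="^(boss|qcwy|zhilian)$", description="Source platform")


class KeywordCreate(KeywordBase):
    """Request body schema; inherits every field from the base."""


class KeywordOut(KeywordBase):
    id: int = Field(..., description="Primary key")

    class Config:
        # Allows building the schema from ORM objects via attribute access.
        from_attributes = True


def validate_create(payload: dict) -> bool:
    """Hypothetical helper: report whether a payload passes validation."""
    try:
        KeywordCreate(**payload)
        return True
    except ValidationError:
        return False


accepted = validate_create({"keyword": "python", "platform": "boss"})
rejected = validate_create({"keyword": "python", "platform": "lagou"})
```

With from_attributes enabled, KeywordOut.model_validate(orm_row) reads fields from an object's attributes, so a Tortoise model instance can be converted without first turning it into a dict.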
