- ARCH-04: job51 migrated to crawler_core (no old deps) - ARCH-05: zhilian migrated to crawler_core (no old deps) - 34 new mock tests (17 job51 + 17 zhilian) - Added _parse_zhilian_response custom parser for zhilian API format - Fixed POST Searcher _request() overrides for job51/zhilian - Full regression: 98 passed in 0.12s
9.0 KiB
Coding Conventions
Analysis Date: 2026-03-21
Naming Patterns
Files:
- Snake_case for all Python files:
company_storage.py,company_cleaner.py,clickhouse_repo.py - Private/internal modules prefixed with underscore:
_base.py,_boss_api.py,_boss_client.py,_boss_sign.py,_http_client.py - Platform-named service files:
boss.py,qcwy.py,zhilian.pyunderapp/services/crawler/ - Router files named after domain:
keyword.py,analytics.py,cleaning.py
Classes:
- PascalCase throughout:
CleaningService,KeywordController,ClickHouseBaseRepo,JobAnalyticsRepo - Services:
{Domain}Service—BossService,QcwyService,ZhilianService,IngestService,AnalyticsService - Controllers:
{Domain}Controller—KeywordController - Repos:
{Domain}Repoor{Domain}BaseRepo—ClickHouseBaseRepo,JobAnalyticsRepo - Models (Tortoise ORM):
{Platform}{Entity}—BossKeyword,QcwyCompany,ZhilianCompany - Schemas (Pydantic):
{Entity}Base,{Entity}Create,{Entity}Update,{Entity}Out— seeapp/schemas/keyword.py
Functions and Methods:
- Snake_case for all functions and methods:
get_available,report_page_progress,store_batch,build_insert_row - Private helpers prefixed with underscore:
_apply_proxy,_ensure_boss_token_loaded,_pick_first,_nested_get,_clean_text,_model_for_source - Async dependency factories follow pattern
get_{service/controller}():get_ingest_service,get_analytics_service,get_keyword_controller
Variables:
- Snake_case:
data_list,platform_type,check_duplicate,page_size - Module-level constants: UPPER_SNAKE_CASE —
COMPANY_SOURCES,QUEUE_TERMINAL_STATUSES - Class-level constants: UPPER_SNAKE_CASE prefixed
_—_TOKEN_REFRESH_INTERVAL = 3600
Types and Enums:
- Enums use PascalCase class name, UPPER_SNAKE_CASE values:
PlatformType.BOSS,ChannelType.MINI,DataType.JOB - Enum values are lowercase strings matching URL slugs:
"boss","mini","job"— seeapp/schemas/ingest.py - Enums inherit from
(str, Enum)enabling direct string comparison
Code Style
Formatting:
- Tool:
blackv24.10.0 - Line length: 120 characters (set in
pyproject.toml[tool.black]and[tool.ruff]) - Target Python versions: 3.10, 3.11 (black), 3.13 (Pipfile)
Linting:
- Tool:
ruffv0.9.1 (configured inpyproject.toml) - Ignored rules:
F403(star imports),F405(may be undefined from star import) - Star imports from internal modules are allowed (used in
app/models/__init__.py,app/services/ingest/__init__.py)
Import Sorting:
- Tool:
isortv5.13.2 - No explicit isort config found; follows default ordering
Import Organization
Order:
- Standard library (
from __future__,os,re,typing,datetime,json) - Third-party (
fastapi,pydantic,tortoise,loguru,clickhouse_connect) - Internal app imports (
from app.core.,from app.models.,from app.services.,from app.schemas.)
Example from app/api/v1/analytics.py:
from typing import Optional
from datetime import datetime, date, timezone
from zoneinfo import ZoneInfo
from fastapi import APIRouter, Depends, Query
from app.core.clickhouse import clickhouse_manager
from app.services.analytics_service import AnalyticsService
from app.schemas.analytics import JobStatisticsResponse
Path Aliases:
- None; all imports use full
app.prefix paths from app.log import loggeris the canonical loguru import path
Star Imports:
- Used only in
__init__.pyre-export files:from .admin import *inapp/models/__init__.py # noqa: F401, F403comments suppress lint warnings for intentional star imports
Error Handling
Patterns:
- Services return
Dict[str, Any]result objects with"success","code","message"fields instead of raising exceptions to callers - Controllers return dict with
"code": 200/400/404and"message"for all outcomes - API route handlers do NOT use try/except — they rely on services returning structured results
- Service methods wrap low-level calls in
try/except Exception as eand log then returnFalseor error dict
Service-level error handling example (app/services/cleaning.py):
except Exception as e:
logger.error(f"Error processing item {target}: {e}")
return {
"success": False,
"target": target,
"error": str(e),
"storage_status": "error",
"remote_sent": False
}
Repository-level: ClickHouseBaseRepo does not swallow exceptions; they propagate to the service layer.
Auth exceptions: app/core/dependency.py raises HTTPException(status_code=401/403) directly — the standard FastAPI pattern for auth failures.
Logging
Framework: loguru v0.7.3
Import: from app.log import logger (centralized re-export) or from loguru import logger (direct)
Patterns:
logger.info(f"...")for normal operation eventslogger.warning(f"...")for non-fatal recoverable issues (e.g., token not found, API soft failures)logger.error(f"...")for caught exceptions and operation failures- F-string interpolation used consistently for message formatting
- No structured fields (no
logger.bind()usage observed)
Example:
logger.info(f"获取招聘详情: {job_id}")
logger.warning(f"Boss get_job_detail failed: {result.error}")
logger.error(f"批量插入失败: {e}")
API Response Format
Two response styles coexist:
Style 1 — Direct dict return (most routes in new modules like app/api/v1/job/job.py, app/api/v1/analytics.py):
return {"code": 200, "data": result, "message": "ok"}
Style 2 — JSONResponse subclasses (older RBAC routes, defined in app/schemas/base.py):
Success(code=200, msg="OK", data=data)
Fail(code=400, msg="error message")
SuccessExtra(code=200, data=data, total=100, page=1, page_size=20)
Paginated responses include: code, data (list), total, page, page_size
Comments
When to Comment:
- Docstrings on public methods describing purpose, not implementation:
"""获取可用关键词,优先返回断点续爬和失败重试的关键词""" - Inline comments for priority logic and algorithm steps:
# 优先级 1: 断点续爬 (partial) - Module-level docstrings for context:
"""Boss直聘 Service — 基于新算法文件的封装""" # noqacomments for intentional lint suppressions
JSDoc/TSDoc:
- Not applicable (Python backend)
- Docstrings are brief single-line or short multi-line Chinese descriptions
Function Design
Size: Functions tend to be 10-50 lines; service methods like process_single_item in app/services/cleaning.py grow to ~70 lines due to multi-platform dispatch
Parameters:
- Keyword arguments with defaults preferred for optional params
- Pydantic schemas used for HTTP request bodies (never raw dicts from router params)
Optional[str]with= Nonedefault for optional parameters
Return Values:
- Services return
Dict[str, Any]with consistent keys (code,message,data) - Private helpers return
Optional[T]or primitive types - Async functions return awaitable results (no mixing of sync/async)
Module Design
Exports:
app/models/__init__.pyusesfrom .{module} import *to flatten model imports- Router modules export a single named router variable:
router = APIRouter(...)or{domain}_router = APIRouter(...) - Service classes are imported directly by name
Barrel Files (__init__.py):
app/models/__init__.py— re-exports all model classesapp/services/ingest/__init__.py— re-exportsIngestServiceand config registrationsapp/api/v1/__init__.py— aggregates all routers intov1_router
Dependency Injection:
- FastAPI
Depends()used for service/controller instantiation in route handlers - Dependency factory functions named
get_{service}()and defined in the same file as the router - Shared auth dependencies:
DependAuth,DependPermissioninapp/core/dependency.py
Tortoise ORM Model Conventions
Base class: All models inherit from app/models/base.py:BaseModel (which extends tortoise.models.Model with id = BigIntField(pk=True))
Timestamp mixin: TimestampMixin adds created_at (auto_now_add) and updated_at (auto_now) — applied via multiple inheritance
Abstract base models: Platform variants use abstract base + concrete subclasses:
class BaseKeyword(Model): # abstract = True in Meta
...
class BossKeyword(BaseKeyword):
class Meta:
table = "boss_keyword"
Field descriptions: All fields include description= parameter for documentation
Pydantic Schema Conventions
- All schemas inherit from
pydantic.BaseModel - All fields use
Field(...)withdescription=for documentation - Enums inherit from
(str, Enum)for JSON serialization compatibility - Output schemas include
class Config: from_attributes = Trueto support ORM mode - Validation patterns use
Field(..., pattern="^(boss|qcwy|zhilian)$")for enum-like string fields
Convention analysis: 2026-03-21