# Coding Conventions
**Analysis Date:** 2026-03-21
## Naming Patterns
**Files:**
- Snake_case for all Python files: `company_storage.py`, `company_cleaner.py`, `clickhouse_repo.py`
- Private/internal modules prefixed with underscore: `_base.py`, `_boss_api.py`, `_boss_client.py`, `_boss_sign.py`, `_http_client.py`
- Platform-named service files: `boss.py`, `qcwy.py`, `zhilian.py` under `app/services/crawler/`
- Router files named after domain: `keyword.py`, `analytics.py`, `cleaning.py`
**Classes:**
- PascalCase throughout: `CleaningService`, `KeywordController`, `ClickHouseBaseRepo`, `JobAnalyticsRepo`
- Services: `{Domain}Service`, e.g. `BossService`, `QcwyService`, `ZhilianService`, `IngestService`, `AnalyticsService`
- Controllers: `{Domain}Controller`, e.g. `KeywordController`
- Repos: `{Domain}Repo` or `{Domain}BaseRepo`, e.g. `ClickHouseBaseRepo`, `JobAnalyticsRepo`
- Models (Tortoise ORM): `{Platform}{Entity}`, e.g. `BossKeyword`, `QcwyCompany`, `ZhilianCompany`
- Schemas (Pydantic): `{Entity}Base`, `{Entity}Create`, `{Entity}Update`, `{Entity}Out` — see `app/schemas/keyword.py`
**Functions and Methods:**
- Snake_case for all functions and methods: `get_available`, `report_page_progress`, `store_batch`, `build_insert_row`
- Private helpers prefixed with underscore: `_apply_proxy`, `_ensure_boss_token_loaded`, `_pick_first`, `_nested_get`, `_clean_text`, `_model_for_source`
- Async dependency factories follow pattern `get_{service/controller}()`: `get_ingest_service`, `get_analytics_service`, `get_keyword_controller`
**Variables:**
- Snake_case: `data_list`, `platform_type`, `check_duplicate`, `page_size`
- Module-level constants: UPPER_SNAKE_CASE — `COMPANY_SOURCES`, `QUEUE_TERMINAL_STATUSES`
- Class-level constants: UPPER_SNAKE_CASE prefixed with `_`, e.g. `_TOKEN_REFRESH_INTERVAL = 3600`
**Types and Enums:**
- Enums use a PascalCase class name and UPPER_SNAKE_CASE member names: `PlatformType.BOSS`, `ChannelType.MINI`, `DataType.JOB`
- Member values are lowercase strings matching URL slugs: `"boss"`, `"mini"`, `"job"` (see `app/schemas/ingest.py`)
- Enums inherit from `(str, Enum)` enabling direct string comparison
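
A minimal sketch of the enum convention described above. The `QCWY` and `ZHILIAN` members are assumptions for illustration; the real definitions live in `app/schemas/ingest.py` and may differ.

```python
from enum import Enum


class PlatformType(str, Enum):
    """Illustrative enum; only BOSS is confirmed by this document."""

    BOSS = "boss"
    QCWY = "qcwy"        # assumed member
    ZHILIAN = "zhilian"  # assumed member


# Because the enum mixes in str, members compare equal to plain strings.
is_boss = PlatformType.BOSS == "boss"

# Lookup by value turns a URL slug into a member.
member = PlatformType("zhilian")

# .value returns the raw slug.
slug = PlatformType.QCWY.value
```

The `str` mixin is what lets a path or query parameter such as `"boss"` be compared against a member without calling `.value` first.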
## Code Style
**Formatting:**
- Tool: `black` v24.10.0
- Line length: 120 characters (set in `pyproject.toml` `[tool.black]` and `[tool.ruff]`)
- Target Python versions: 3.10 and 3.11 for `black`; the Pipfile specifies 3.13
**Linting:**
- Tool: `ruff` v0.9.1 (configured in `pyproject.toml`)
- Ignored rules: `F403` (star imports), `F405` (may be undefined from star import)
- Star imports from internal modules are allowed (used in `app/models/__init__.py`, `app/services/ingest/__init__.py`)
**Import Sorting:**
- Tool: `isort` v5.13.2
- No explicit isort config found; follows default ordering
## Import Organization
**Order:**
1. Standard library (`from __future__`, `os`, `re`, `typing`, `datetime`, `json`)
2. Third-party (`fastapi`, `pydantic`, `tortoise`, `loguru`, `clickhouse_connect`)
3. Internal app imports (`from app.core.`, `from app.models.`, `from app.services.`, `from app.schemas.`)
**Example from `app/api/v1/analytics.py`:**
```python
from typing import Optional
from datetime import datetime, date, timezone
from zoneinfo import ZoneInfo
from fastapi import APIRouter, Depends, Query
from app.core.clickhouse import clickhouse_manager
from app.services.analytics_service import AnalyticsService
from app.schemas.analytics import JobStatisticsResponse
```
**Path Aliases:**
- None; all imports use full `app.` prefix paths
- `from app.log import logger` is the canonical loguru import path
**Star Imports:**
- Used only in `__init__.py` re-export files: `from .admin import *` in `app/models/__init__.py`
- `# noqa: F401, F403` comments suppress lint warnings for intentional star imports
## Error Handling
**Patterns:**
- Services return `Dict[str, Any]` result objects with `"success"`, `"code"`, `"message"` fields instead of raising exceptions to callers
- Controllers return dict with `"code": 200/400/404` and `"message"` for all outcomes
- API route handlers do NOT use try/except — they rely on services returning structured results
- Service methods wrap low-level calls in `try/except Exception as e`, log the error, and return `False` or an error dict
**Service-level error handling example** (`app/services/cleaning.py`):
```python
except Exception as e:
    logger.error(f"Error processing item {target}: {e}")
    return {
        "success": False,
        "target": target,
        "error": str(e),
        "storage_status": "error",
        "remote_sent": False
    }
```
**Repository-level:** `ClickHouseBaseRepo` does not swallow exceptions; they propagate to the service layer.
**Auth exceptions:** `app/core/dependency.py` raises `HTTPException(status_code=401/403)` directly — the standard FastAPI pattern for auth failures.
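
The layering above (repositories raise, services convert to result dicts) can be sketched as follows. `FakeRepo`, `ExampleService`, and the `400` error code are hypothetical and only illustrate the pattern; the standard `logging` module stands in for the project's loguru logger.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)  # stand-in for the loguru logger


class FakeRepo:
    """Hypothetical repository: exceptions propagate to the caller."""

    def insert(self, row: Dict[str, Any]) -> int:
        if "id" not in row:
            raise ValueError("row is missing 'id'")
        return 1  # number of rows inserted


class ExampleService:
    """Hypothetical service: exceptions become structured result dicts."""

    def __init__(self, repo: FakeRepo) -> None:
        self.repo = repo

    def store(self, row: Dict[str, Any]) -> Dict[str, Any]:
        try:
            inserted = self.repo.insert(row)
        except Exception as e:
            logger.error(f"insert failed: {e}")
            return {"success": False, "code": 400, "message": str(e)}
        return {"success": True, "code": 200, "message": "ok", "data": inserted}


service = ExampleService(FakeRepo())
ok = service.store({"id": 1})
failed = service.store({})
```

A route handler built on such a service can return the dict as-is, which is why the handlers themselves carry no `try/except`.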
## Logging
**Framework:** `loguru` v0.7.3
**Import:** `from app.log import logger` (centralized re-export) or `from loguru import logger` (direct)
**Patterns:**
- `logger.info(f"...")` for normal operation events
- `logger.warning(f"...")` for non-fatal recoverable issues (e.g., token not found, API soft failures)
- `logger.error(f"...")` for caught exceptions and operation failures
- F-string interpolation used consistently for message formatting
- No structured fields (no `logger.bind()` usage observed)
**Example:**
```python
logger.info(f"获取招聘详情: {job_id}")  # "Fetching job detail"
logger.warning(f"Boss get_job_detail failed: {result.error}")
logger.error(f"批量插入失败: {e}")  # "Batch insert failed"
```
## API Response Format
**Two response styles coexist:**
**Style 1 — Direct dict return** (most routes in new modules like `app/api/v1/job/job.py`, `app/api/v1/analytics.py`):
```python
return {"code": 200, "data": result, "message": "ok"}
```
**Style 2 — JSONResponse subclasses** (older RBAC routes, defined in `app/schemas/base.py`):
```python
Success(code=200, msg="OK", data=data)
Fail(code=400, msg="error message")
SuccessExtra(code=200, data=data, total=100, page=1, page_size=20)
```
**Paginated responses** include: `code`, `data` (list), `total`, `page`, `page_size`
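
A minimal sketch of a Style 1 paginated response with the keys listed above. The `paginated` helper is hypothetical and not taken from the codebase.

```python
from typing import Any, Dict, List


def paginated(items: List[Any], total: int, page: int, page_size: int) -> Dict[str, Any]:
    """Build a paginated response dict (hypothetical helper)."""
    return {
        "code": 200,
        "data": items,
        "total": total,
        "page": page,
        "page_size": page_size,
    }


rows = list(range(45))                # pretend result set
page, page_size = 3, 20
start = (page - 1) * page_size        # pages are 1-indexed
response = paginated(
    rows[start:start + page_size],
    total=len(rows),
    page=page,
    page_size=page_size,
)
```

Here `total` is the size of the full result set, while `data` holds only the rows for the requested page.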
## Comments
**When to Comment:**
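- Docstrings on public methods describing purpose, not implementation: `"""获取可用关键词,优先返回断点续爬和失败重试的关键词"""` ("Get available keywords, prioritizing keywords from resumable crawls and failed retries")
- Inline comments for priority logic and algorithm steps: `# 优先级 1: 断点续爬 (partial)` ("Priority 1: resume interrupted crawl (partial)")
- Module-level docstrings for context: `"""Boss直聘 Service — 基于新算法文件的封装"""` ("Boss Zhipin Service: wrapper around the new algorithm files")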
- `# noqa` comments for intentional lint suppressions
**JSDoc/TSDoc:**
- Not applicable (Python backend)
- Docstrings are brief single-line or short multi-line Chinese descriptions
## Function Design
**Size:** Functions tend to be 10-50 lines; service methods like `process_single_item` in `app/services/cleaning.py` grow to ~70 lines due to multi-platform dispatch
**Parameters:**
- Keyword arguments with defaults preferred for optional params
- Pydantic schemas used for HTTP request bodies (never raw dicts from router params)
- `Optional[str]` with `= None` default for optional parameters
**Return Values:**
- Services return `Dict[str, Any]` with consistent keys (`code`, `message`, `data`)
- Private helpers return `Optional[T]` or primitive types
- Async functions return awaitable results (no mixing of sync/async)
## Module Design
**Exports:**
- `app/models/__init__.py` uses `from .{module} import *` to flatten model imports
- Router modules export a single named router variable: `router = APIRouter(...)` or `{domain}_router = APIRouter(...)`
- Service classes are imported directly by name
**Barrel Files (`__init__.py`):**
- `app/models/__init__.py` — re-exports all model classes
- `app/services/ingest/__init__.py` — re-exports `IngestService` and config registrations
- `app/api/v1/__init__.py` — aggregates all routers into `v1_router`
**Dependency Injection:**
- FastAPI `Depends()` used for service/controller instantiation in route handlers
- Dependency factory functions named `get_{service}()` and defined in the same file as the router
- Shared auth dependencies: `DependAuth`, `DependPermission` in `app/core/dependency.py`
## Tortoise ORM Model Conventions
**Base class:** All models inherit from `app/models/base.py:BaseModel` (which extends `tortoise.models.Model` with `id = BigIntField(pk=True)`)
**Timestamp mixin:** `TimestampMixin` adds `created_at` (auto_now_add) and `updated_at` (auto_now) — applied via multiple inheritance
**Abstract base models:** Platform variants use abstract base + concrete subclasses:
```python
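from tortoise.models import Model


class BaseKeyword(Model):
    # Shared keyword fields go here (elided).
    class Meta:
        abstract = True  # no table is created for the base class


class BossKeyword(BaseKeyword):
    class Meta:
        table = "boss_keyword"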
**Field descriptions:** All fields include `description=` parameter for documentation
## Pydantic Schema Conventions
- All schemas inherit from `pydantic.BaseModel`
- All fields use `Field(...)` with `description=` for documentation
- Enums inherit from `(str, Enum)` for JSON serialization compatibility
- Output schemas include `class Config: from_attributes = True` to support ORM mode
- Validation patterns use `Field(..., pattern="^(boss|qcwy|zhilian)$")` for enum-like string fields
---
*Convention analysis: 2026-03-21*