# Technology Stack
**Analysis Date:** 2026-03-21
## Languages
**Primary:**
- Python 3.13 - Backend API, crawlers, ECS pipeline scripts
- JavaScript (ES Modules) - Vue 3 frontend (TypeScript is installed, but the source in `.vue` and `.js` files is plain JavaScript)
**Secondary:**
- TypeScript 5.1.6 - Type definitions referenced in frontend toolchain only
## Runtime
**Environment:**
- Python 3.13 (`Pipfile` requires `>=3.13`; the Dockerfile builds on `python:3.11-slim-bullseye`, so the container runtime does not match the Pipfile requirement)
- Node.js 18 (Dockerfile `node:18-alpine` for frontend build stage)
**Package Manager:**
- Python: `pipenv` (development), `pip` / `requirements.txt` (Docker production)
- Frontend: `pnpm`
- Lockfiles: `Pipfile.lock` (Python), `web/pnpm-lock.yaml` (frontend), `uv.lock` (for `uv`)
## Frameworks
**Core Backend:**
- FastAPI 0.111.0 - REST API framework (`app/__init__.py` factory via `create_app()`)
- Starlette 0.37.2 - ASGI underpinning (middleware, static files)
- Uvicorn 0.34.0 - ASGI server (20 workers default, configurable via `UVICORN_WORKERS`)
- Uvloop 0.21.0 - High-performance event loop (non-Windows only)
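
The worker count and event-loop choice above can be sketched as a small helper that assembles keyword arguments for `uvicorn.run()`. This is only an illustration using the standard library: `build_server_options` is a hypothetical name, the `0.0.0.0` host default is an assumption, and the project's real startup code is not shown here.

```python
import os
import sys


def build_server_options(env=None, platform=None):
    """Collect keyword arguments that could be passed to uvicorn.run().

    `env` and `platform` are injectable so the logic can be tested without
    touching the real process environment.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    return {
        "host": env.get("APP_HOST", "0.0.0.0"),  # assumed default, not from the source
        "port": int(env.get("APP_PORT", "9999")),
        "workers": int(env.get("UVICORN_WORKERS", "20")),
        # uvloop does not support Windows, so use the asyncio loop there.
        "loop": "asyncio" if platform.startswith("win") else "uvloop",
    }
```

With these options, a launch call might look like `uvicorn.run("module:app", **build_server_options())`, where the import string is a placeholder for the application's actual entry point.
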
**ORM / Database:**
- Tortoise-ORM 0.23.0 - Async ORM for MySQL (`app/models/`)
- Aerich 0.8.1 - Database migrations for Tortoise-ORM (`migrations/` directory)
- aiomysql - Async MySQL driver (Tortoise backend)
- clickhouse-connect 0.7.19 - ClickHouse async client (`app/core/clickhouse.py`)
- asyncpg - Async PostgreSQL driver (listed in Pipfile but not actively used)
**Validation / Serialization:**
- Pydantic 2.10.5 - Request/response schemas (`app/schemas/`)
- pydantic-settings 2.7.1 - Settings management via env vars (`app/settings/config.py`)
- orjson 3.10.14 - Fast JSON serialization
- ujson 5.10.0 - Alternative JSON library
**Authentication / Security:**
- PyJWT 2.10.1 - JWT token generation/verification (HS256, 7-day expiry)
- passlib 1.7.4 - Password hashing
- argon2-cffi 23.1.0 - Argon2 password hasher
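
To make the "HS256, 7-day expiry" setting concrete, here is a minimal standard-library sketch of how such a token is signed and checked. The backend uses PyJWT for this; the functions below (`encode_token`, `decode_token`) are hypothetical stand-ins written only to show the mechanics, and are not the project's code.

```python
import base64
import hashlib
import hmac
import json
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # expiry window in seconds


def _b64url(data):
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def encode_token(payload, secret, now=None):
    """Return a compact HS256 token whose `exp` claim is seven days after `now`."""
    now = time.time() if now is None else now
    claims = {**payload, "exp": int(now) + SEVEN_DAYS}
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_token(token, secret, now=None):
    """Verify signature and expiry; raise ValueError on any failure."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = _sign(f"{header_b64}.{claims_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```

In practice the same behaviour comes from PyJWT's `jwt.encode(..., algorithm="HS256")` and `jwt.decode(..., algorithms=["HS256"])`, which also validate the `exp` claim.
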
**Scheduling:**
- APScheduler - Async job scheduler (`app/core/scheduler.py`, 6 registered cron tasks)
**HTTP Clients:**
- httpx 0.28.1 - Async HTTP client (backend service-to-service calls)
- requests - Sync HTTP client (spider scripts in `jobs_spider/`)
**Logging:**
- loguru 0.7.3 - Structured logging (unified throughout backend and spiders)
**Core Frontend:**
- Vue 3.3.4 - SPA framework (`web/src/main.js`)
- Vue Router 4.2.4 - Client-side routing (`web/src/router/index.js`)
- Pinia 2.1.6 - State management (`web/src/store/`)
- Naive UI 2.34.4 - Component library (admin UI)
- ECharts 6.0.0 - Data visualization charts (`web/src/views/analytics/`)
- axios 1.4.0 - HTTP client (`web/src/utils/http/`)
- vue-i18n 9 - Internationalization
**Build / Dev Tools:**
- Vite 4.4.6 - Frontend bundler (`web/vite.config.js`)
- UnoCSS 66.5.10 - Atomic CSS engine
- unplugin-auto-import 20.3.0 - Auto-imports for Vue APIs
- unplugin-vue-components 30.0.0 - Auto-imports for components
- @iconify/vue + @iconify/json - Icon library
**Spider-specific:**
- playwright 1.57.0 - Browser automation (anti-detection in crawlers)
- PyExecJS 1.5.1 - Execute JavaScript signing algorithms from Python (`jobs_spider/boss/`)
- PySocks - SOCKS proxy support for spider requests
- tenacity - Retry logic in crawlers
- pandas + openpyxl - Data processing and Excel export
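
As a rough picture of the retry behaviour that `tenacity` provides to the crawlers, the following standard-library decorator retries a call with exponential backoff. It is a simplified stand-in, not tenacity's API and not the project's code; the `retry` name, its parameters, and the injectable `sleep` are assumptions made for illustration.

```python
import functools
import time


def retry(attempts=3, base_delay=0.5, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, doubling the delay after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

With tenacity itself, a comparable policy would be expressed through its `retry` decorator combined with `stop_after_attempt` and `wait_exponential`.
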
**Cloud:**
- alibabacloud_ecs20140526 - Alibaba Cloud ECS SDK (`ecs_full_pipeline.py`)
- alibabacloud_credentials - AliCloud credential management
**Code Quality:**
- ruff 0.9.1 - Python linter (`pyproject.toml` config: line-length 120, ignores F403/F405)
- black 24.10.0 - Python formatter (line-length 120, target py310/py311)
- isort 5.13.2 - Import sorter
- ESLint 8.46.0 - Frontend linter (`@zclzone` + `@unocss` rule sets)
- prettier - Frontend formatter
## Key Dependencies
**Critical:**
- `clickhouse-connect==0.7.19` - Backs all analytics and job data storage; without it no job data can be read or written
- `tortoise-orm==0.23.0` - All business data (users, roles, keywords, tokens); paired with `aerich` for migrations
- `fastapi==0.111.0` - API layer; version-pinned for stability
- `APScheduler` - 6 scheduled tasks including ECS pipeline and IP alerting
- `alibabacloud_ecs20140526` - ECS node management; required for crawler scaling
**Infrastructure:**
- `uvicorn==0.34.0` + `uvloop==0.21.0` - Production ASGI server stack
- `pydantic==2.10.5` - All input validation; v2 API used throughout
- `loguru==0.7.3` - Unified logging across all modules
- `redis` - Optional distributed lock backend (`app/core/locks.py`; falls back to file locks if Redis unavailable)
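
The Redis-or-file fallback described for `app/core/locks.py` might look roughly like the sketch below. The contents of that module are not shown in this document, so every name here (`FileLock`, `make_lock`, `redis_lock_factory`) is hypothetical, and stale-lock cleanup is deliberately left out.

```python
import os
import tempfile


class FileLock:
    """Lock backed by exclusive creation of a lock file."""

    def __init__(self, name, directory=None):
        self.path = os.path.join(directory or tempfile.gettempdir(), f"{name}.lock")
        self._held = False

    def acquire(self):
        try:
            # O_EXCL makes creation fail if another holder already made the file.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        self._held = True
        return True

    def release(self):
        if self._held:
            os.remove(self.path)
            self._held = False


def make_lock(name, redis_lock_factory=None, directory=None):
    """Prefer a Redis-backed lock; fall back to a file lock if that fails."""
    if redis_lock_factory is not None:
        try:
            return redis_lock_factory(name)
        except Exception:
            pass  # Redis unreachable or misconfigured: use the file lock below
    return FileLock(name, directory)
```

A file lock only coordinates processes that share a filesystem, which is why Redis is the better choice once tasks run on more than one host.
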
## Configuration
**Environment:**
- All settings are defined in `app/settings/config.py` as a `BaseSettings` class from `pydantic-settings`
- Environment variables override defaults at startup
- No `.env` file is committed to the repository; variables are expected to be set at the OS or container level
- Key variables: `APP_HOST`, `APP_PORT`, `UVICORN_WORKERS`, `CLICKHOUSE_HOST`, `CLICKHOUSE_USER`, `CLICKHOUSE_PASS`, `SECRET_KEY`, `SMTP_HOST`, `SMTP_USER`, `SMTP_PASS`, `REDIS_HOST`, `REPORT_ENDPOINT`
- **Security warning**: `config.py` contains hardcoded default values for MySQL, ClickHouse, and SMTP credentials; must be overridden in production
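
The override behaviour and the security warning above can be illustrated with a standard-library stand-in for `BaseSettings`. This is a sketch, not the contents of `config.py`: the default values are placeholders, and `from_env` and `insecure_defaults` are hypothetical helpers.

```python
import os
from dataclasses import dataclass, fields

# Fields that must never keep their built-in default in production.
SENSITIVE = {"CLICKHOUSE_PASS", "SECRET_KEY"}


@dataclass(frozen=True)
class Settings:
    APP_HOST: str = "0.0.0.0"        # placeholder default
    APP_PORT: int = 9999
    UVICORN_WORKERS: int = 20
    CLICKHOUSE_PASS: str = "change-me"  # placeholder, not a real credential
    SECRET_KEY: str = "change-me"       # placeholder, not a real key

    @classmethod
    def from_env(cls, env=None):
        """Build settings, letting environment variables override defaults."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            if f.name in env:
                # Coerce the string from the environment to the default's type.
                overrides[f.name] = type(f.default)(env[f.name])
        return cls(**overrides)

    def insecure_defaults(self):
        """Names of sensitive fields still carrying their hardcoded default."""
        return [
            f.name
            for f in fields(self)
            if f.name in SENSITIVE and getattr(self, f.name) == f.default
        ]
```

A startup check along the lines of `if Settings.from_env().insecure_defaults(): ...` is one way to refuse to boot in production while hardcoded credentials are still in effect.
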
**Build:**
- `pyproject.toml` - Python project metadata, black/ruff tool config
- `Pipfile` / `Pipfile.lock` - Development dependency management
- `requirements.txt` - Production pip install (used in Dockerfile)
- `web/vite.config.js` - Vite build config with proxy support via `VITE_USE_PROXY` env var
- `Dockerfile` - Multi-stage build: Node 18 (frontend) + Python 3.11 (backend) + nginx
## Platform Requirements
**Development:**
- Python 3.13+ (Pipfile requirement)
- Node.js 18+ with pnpm
- MySQL server (Tortoise-ORM connection)
- ClickHouse server (analytics data)
- Optional: Redis (distributed locking upgrade)
**Production:**
- Docker-based deployment (see `Dockerfile`, `deploy/entrypoint.sh`, `deploy/web.conf`)
- Nginx serves frontend static files and proxies API requests
- Alibaba Cloud ECS (cn-shanghai-b zone) for crawler nodes
- Ports: 80 (nginx), 9999 (direct access to the FastAPI backend)
---
*Stack analysis: 2026-03-21*