# Technology Stack

**Analysis Date:** 2026-03-21

## Languages

**Primary:**
- Python 3.13 - Backend API, crawlers, ECS pipeline scripts
- JavaScript (ES Modules) - Vue3 frontend (no strict TypeScript; TS installed but JS used in `.vue` and `.js` files)

**Secondary:**
- TypeScript 5.1.6 - Type definitions referenced in frontend toolchain only

## Runtime

**Environment:**
- Python 3.13 (required: `>=3.13` per `Pipfile`; Dockerfile uses `python:3.11-slim-bullseye` for container build)
- Node.js 18 (Dockerfile `node:18-alpine` for frontend build stage)

**Package Manager:**
- Python: `pipenv` (development), `pip` / `requirements.txt` (Docker production)
- Frontend: `pnpm` (lockfile: `web/pnpm-lock.yaml` present)
- Lockfiles: `Pipfile.lock` (Python), `web/pnpm-lock.yaml` (frontend), `uv.lock` (uv-compatible)

## Frameworks

**Core Backend:**
- FastAPI 0.111.0 - REST API framework (`app/__init__.py` factory via `create_app()`)
- Starlette 0.37.2 - ASGI underpinning (middleware, static files)
- Uvicorn 0.34.0 - ASGI server (20 workers default, configurable via `UVICORN_WORKERS`)
- Uvloop 0.21.0 - High-performance event loop (non-Windows only)

**ORM / Database:**
- Tortoise-ORM 0.23.0 - Async ORM for MySQL (`app/models/`)
- Aerich 0.8.1 - Database migrations for Tortoise-ORM (`migrations/` directory)
- aiomysql - Async MySQL driver (Tortoise backend)
- clickhouse-connect 0.7.19 - ClickHouse async client (`app/core/clickhouse.py`)
- asyncpg - Async PostgreSQL driver (listed in Pipfile but not actively used)

**Validation / Serialization:**
- Pydantic 2.10.5 - Request/response schemas (`app/schemas/`)
- pydantic-settings 2.7.1 - Settings management via env vars (`app/settings/config.py`)
- orjson 3.10.14 - Fast JSON serialization
- ujson 5.10.0 - Alternative JSON library

**Authentication / Security:**
- PyJWT 2.10.1 - JWT token generation/verification (HS256, 7-day expiry)
- passlib 1.7.4 - Password hashing
- argon2-cffi 23.1.0 - Argon2 password hasher
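
As an illustration of what "HS256, 7-day expiry" implies, the following is a minimal standard-library sketch of issuing and verifying such a token. It is not the project's code and not PyJWT's API; the backend itself relies on PyJWT. The function names `issue_token` and `verify_token` and the `user_id` claim are hypothetical, chosen only for this example. The `now` parameter exists so the expiry check can be exercised without waiting.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SEVEN_DAYS = 7 * 24 * 60 * 60  # expiry window stated in the stack notes


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWT strips."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> str:
    """HMAC-SHA256 over the header.payload string (this is what HS256 means)."""
    digest = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(user_id: int, secret: str, now: Optional[float] = None) -> str:
    """Build an HS256-signed token that expires seven days after `now`."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    # "user_id" is an assumed claim name; "exp" is the standard expiry claim.
    payload = {"user_id": user_id, "exp": issued_at + SEVEN_DAYS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if the signature matches and the token has not expired."""
    signing_input, _, signature = token.rpartition(".")
    expected = _sign(signing_input, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```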

**Scheduling:**
- APScheduler - Async job scheduler (`app/core/scheduler.py`, 6 registered cron tasks)

**HTTP Clients:**
- httpx 0.28.1 - Async HTTP client (backend service-to-service calls)
- requests - Sync HTTP client (spider scripts in `jobs_spider/`)

**Logging:**
- loguru 0.7.3 - Structured logging (unified throughout backend and spiders)

**Core Frontend:**
- Vue 3.3.4 - SPA framework (`web/src/main.js`)
- Vue Router 4.2.4 - Client-side routing (`web/src/router/index.js`)
- Pinia 2.1.6 - State management (`web/src/store/`)
- Naive UI 2.34.4 - Component library (admin UI)
- ECharts 6.0.0 - Data visualization charts (`web/src/views/analytics/`)
- axios 1.4.0 - HTTP client (`web/src/utils/http/`)
- vue-i18n 9 - Internationalization

**Build / Dev Tools:**
- Vite 4.4.6 - Frontend bundler (`web/vite.config.js`)
- UnoCSS 66.5.10 - Atomic CSS engine
- unplugin-auto-import 20.3.0 - Auto-imports for Vue APIs
- unplugin-vue-components 30.0.0 - Auto-imports for components
- @iconify/vue + @iconify/json - Icon library

**Spider-specific:**
- playwright 1.57.0 - Browser automation (anti-detection in crawlers)
- PyExecJS 1.5.1 - Execute JavaScript signing algorithms from Python (`jobs_spider/boss/`)
- PySocks - SOCKS proxy support for spider requests
- tenacity - Retry logic in crawlers
- pandas + openpyxl - Data processing and Excel export

**Cloud:**
- alibabacloud_ecs20140526 - Alibaba Cloud ECS SDK (`ecs_full_pipeline.py`)
- alibabacloud_credentials - AliCloud credential management

**Code Quality:**
- ruff 0.9.1 - Python linter (`pyproject.toml` config: line-length 120, ignores F403/F405)
- black 24.10.0 - Python formatter (line-length 120, target py310/py311)
- isort 5.13.2 - Import sorter
- ESLint 8.46.0 - Frontend linter (`@zclzone` + `@unocss` rule sets)
- prettier - Frontend formatter

## Key Dependencies

**Critical:**
- `clickhouse-connect==0.7.19` - All analytics and job data storage; loss means no data read/write
- `tortoise-orm==0.23.0` - All business data (users, roles, keywords, tokens); paired with `aerich` for migrations
- `fastapi==0.111.0` - API layer; version-pinned for stability
- `APScheduler` - 6 scheduled tasks including ECS pipeline and IP alerting
- `alibabacloud_ecs20140526` - ECS node management; required for crawler scaling

**Infrastructure:**
- `uvicorn==0.34.0` + `uvloop==0.21.0` - Production ASGI server stack
- `pydantic==2.10.5` - All input validation; v2 API used throughout
- `loguru==0.7.3` - Unified logging across all modules
- `redis` - Optional distributed lock backend (`app/core/locks.py`; falls back to file locks if Redis unavailable)
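
The Redis-with-file-fallback behaviour noted for `app/core/locks.py` could look roughly like the sketch below. This is an assumption about the general shape, not a copy of that module: `task_lock`, the `lock:` key prefix, and the `.lock` file naming are all hypothetical. The only library call assumed is redis-py's `set(name, value, nx=True, ex=...)`, which sets a key only if it does not already exist; `client` can be any object exposing that method.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Any, Iterator, Optional


def _try_redis_lock(client: Any, name: str, ttl_seconds: int) -> Optional[bool]:
    """Return True/False if Redis answered, or None if Redis is unavailable."""
    if client is None:
        return None
    try:
        # SET key value NX EX ttl succeeds only when the key is absent.
        return bool(client.set(f"lock:{name}", "1", nx=True, ex=ttl_seconds))
    except Exception:  # connection refused, timeout, and similar failures
        return None


def _try_file_lock(path: str) -> bool:
    """Atomically create the lock file; fail if another holder already did."""
    try:
        os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
        return True
    except FileExistsError:
        return False


@contextmanager
def task_lock(
    name: str,
    client: Any = None,
    ttl_seconds: int = 300,
    lock_dir: Optional[str] = None,
) -> Iterator[bool]:
    """Yield True if the lock was acquired, False if someone else holds it."""
    path = os.path.join(lock_dir or tempfile.gettempdir(), f"{name}.lock")
    via_redis = _try_redis_lock(client, name, ttl_seconds)
    # Fall back to a local file lock only when Redis gave no answer at all.
    acquired = via_redis if via_redis is not None else _try_file_lock(path)
    try:
        yield acquired
    finally:
        if acquired:
            if via_redis:
                # Simplification: a production lock would store a unique token
                # and delete the key only if it still holds that token.
                client.delete(f"lock:{name}")
            else:
                os.remove(path)
```

A scheduled task would wrap its body in `with task_lock("ecs_pipeline", client) as held:` and return early when `held` is false. Note that a file lock only coordinates processes on one host, so the fallback gives weaker guarantees than Redis when several nodes run the same task.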

## Configuration

**Environment:**
- All settings in `app/settings/config.py` via `pydantic-settings.BaseSettings`
- Environment variables override defaults at startup
- No `.env` file detected (not committed); variables set at OS/container level
- Key variables: `APP_HOST`, `APP_PORT`, `UVICORN_WORKERS`, `CLICKHOUSE_HOST`, `CLICKHOUSE_USER`, `CLICKHOUSE_PASS`, `SECRET_KEY`, `SMTP_HOST`, `SMTP_USER`, `SMTP_PASS`, `REDIS_HOST`, `REPORT_ENDPOINT`
- **Security warning**: `config.py` contains hardcoded default values for MySQL, ClickHouse, and SMTP credentials; must be overridden in production

**Build:**
- `pyproject.toml` - Python project metadata, black/ruff tool config
- `Pipfile` / `Pipfile.lock` - Development dependency management
- `requirements.txt` - Production pip install (used in Dockerfile)
- `web/vite.config.js` - Vite build config with proxy support via `VITE_USE_PROXY` env var
- `Dockerfile` - Multi-stage build: Node 18 (frontend) + Python 3.11 (backend) + nginx
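
To make the "environment variables override defaults" rule under **Environment** concrete, here is a small standard-library sketch. It is a stand-in for the idea only: the project uses `pydantic-settings`, whose `BaseSettings` performs this lookup and type coercion itself. The `Settings` fields are a subset of the key variables listed above. The default values are assumptions, except the worker count of 20 and the backend port 9999, which this document mentions elsewhere. Refusing to start without `SECRET_KEY` is one possible way to act on the security warning, not a description of what `config.py` does.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    APP_HOST: str = "0.0.0.0"          # assumed default
    APP_PORT: int = 9999               # backend port listed under Production
    UVICORN_WORKERS: int = 20          # default worker count from the stack notes
    CLICKHOUSE_HOST: str = "localhost"  # assumed default
    SECRET_KEY: str = ""               # deliberately empty: must come from the environment


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from defaults, letting any matching environment variable win."""
    overrides = {
        f.name: type(f.default)(env[f.name])  # cast the string to the default's type
        for f in fields(Settings)
        if f.name in env
    }
    settings = Settings(**overrides)
    if not settings.SECRET_KEY:
        # Fail fast instead of silently running with a baked-in secret.
        raise ValueError("SECRET_KEY must be set in the environment")
    return settings
```

For example, starting the container with `UVICORN_WORKERS=4` and `SECRET_KEY` set would yield four workers while `APP_PORT` keeps its default.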

## Platform Requirements

**Development:**
- Python 3.13+ (Pipfile requirement)
- Node.js 18+ with pnpm
- MySQL server (Tortoise-ORM connection)
- ClickHouse server (analytics data)
- Optional: Redis (distributed locking upgrade)

**Production:**
- Docker-based deployment (see `Dockerfile`, `deploy/entrypoint.sh`, `deploy/web.conf`)
- Nginx serves frontend static files and proxies API requests
- Alibaba Cloud ECS (cn-shanghai-b zone) for crawler nodes
- Ports: 80 (nginx), 9999 (FastAPI backend direct)

---

*Stack analysis: 2026-03-21*