| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 01-shared-core | 02 | execute | 2 | | | true | | |
Purpose: QUAL-01 requires unit test coverage for core signing algorithms. Tests validate that the ported algorithms match the originals. Pure functions mean no HTTP mocking needed.
Output: Three sign.py files in crawler_core/{boss,qcwy,zhilian}/ and three test files in tests/crawler_core/ — all tests pass with pytest tests/crawler_core/.
<execution_context>
@/.claude/get-shit-done/workflows/execute-plan.md
@/.claude/get-shit-done/templates/summary.md
</execution_context>
From spiderJobs/platforms/boss/sign.py:
# Module-level constants:
_CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Module-level private functions:
def _to_u32(n: int) -> int: ... # Masks to 32-bit unsigned
def _compute_checksum(uuid_str: str) -> str: ... # 3-char checksum from 19-char uuid
def _generate_uuid() -> str: ... # 13-char hex timestamp + 6-char base62 random
class BossSign:
    def __init__(self, *, mpt: str = "", wt2: str = ""): ...
    @staticmethod
    def generate_traceid(prefix: str = "M-W") -> str: ...
    # Result format: "{prefix}{19-char uuid}{3-char checksum}" e.g. "M-W0019d0a8af5f32gtVvnD4M"
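The traceid layout described above can be illustrated by splitting the example value into its parts. This is only a sketch: `split_traceid` and `TRACEID_RE` are hypothetical names, not part of `BossSign`, and the pattern assumes a 3-character prefix such as the default "M-W".

```python
import re

# Layout taken from the interface notes: 3-char prefix, 13 hex chars
# (timestamp), 6 base62 chars (random), 3 base62 chars (checksum).
# Assumption: the prefix is exactly 3 characters long.
TRACEID_RE = re.compile(
    r"^(?P<prefix>.{3})"
    r"(?P<ts>[0-9a-f]{13})"
    r"(?P<rand>[0-9a-zA-Z]{6})"
    r"(?P<checksum>[0-9a-zA-Z]{3})$"
)


def split_traceid(traceid: str) -> dict:
    """Hypothetical helper: split a traceid into its documented parts."""
    match = TRACEID_RE.match(traceid)
    if match is None:
        raise ValueError(f"not a valid traceid: {traceid!r}")
    return match.groupdict()


# The example value from the interface notes above.
parts = split_traceid("M-W0019d0a8af5f32gtVvnD4M")
print(parts)
```

A helper like this is useful only for debugging; the tests in this plan assert the same layout directly with `re.match`.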
From spiderJobs/platforms/job51/sign.py:
SIGN_KEY = "abfc8f9dcf8c3f3d8aa294ac5f2cf2cc7767e5592590f39c3f503271dd68562b"
class Job51Sign:
    def __init__(self, *, sign_key: str = SIGN_KEY): ...
    @staticmethod
    def generate_uuid() -> str: ...  # 13-char timestamp + 10-char random int
    def build_sign_path(
        self,
        endpoint: str,
        method: str = "GET",
        params: dict | None = None,
        body: dict | None = None,
    ) -> tuple[str, str]: ...  # Returns (url_path, sign_hex)
    # url_path format: "/{endpoint}?api_key=51job&timestamp={ts}[&param=val...]"
    # sign_hex: HMAC-SHA256 hex string, 64 chars
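To see why `sign_hex` is always 64 characters, the following sketch computes an HMAC-SHA256 hex digest with the standard library. It is an illustration under assumptions: `hmac_sha256_hex`, `demo_key`, and `demo_path` are hypothetical, and the exact message that `Job51Sign.build_sign_path` signs is defined by the source file, not by this example.

```python
import hashlib
import hmac


def hmac_sha256_hex(key: str, message: str) -> str:
    """Return the HMAC-SHA256 of message under key as lowercase hex."""
    digest = hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical inputs for illustration only; the real key is SIGN_KEY and
# the real message is whatever build_sign_path assembles.
demo_key = "k" * 64
demo_path = "/open/test?api_key=51job&timestamp=1700000000"
signature = hmac_sha256_hex(demo_key, demo_path)
print(len(signature), signature[:8])
```

SHA-256 produces 32 bytes, and each byte is rendered as two hex characters, which is where the 64-character length asserted in the tests comes from.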
From spiderJobs/platforms/zhilian/sign.py:
class ZhilianSign:
    def __init__(self, *, at="", rt="", device_id=None, version="4.1.259",
                 channel="wxxiaochengxu", platform="12"): ...
    @staticmethod
    def generate_uuid() -> str: ...  # UUID4-format with uppercase hex
    def sign_headers(self, page_code: str = "0") -> dict: ...
    # Returns dict with keys: x-zp-at, x-zp-rt, x-zp-action-id, x-zp-page-code,
    # x-zp-version, x-zp-channel, x-zp-platform, x-zp-device-id, x-zp-business-system
    def sign_params(self) -> dict: ...
    # Returns dict with keys: at, rt, channel, platform, version, d
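The "UUID4-format with uppercase hex" shape can be reproduced with the standard `uuid` module. This is a sketch of the format only: `uppercase_uuid4` is a hypothetical helper, and `ZhilianSign.generate_uuid` may well build its value differently (for example with `random`).

```python
import re
import uuid

# Same pattern the Zhilian tests in this plan use for device_id and action_id.
UUID4_UPPER_RE = re.compile(
    r"^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$"
)


def uppercase_uuid4() -> str:
    """Hypothetical stand-in: a random UUID4 rendered in uppercase hex."""
    return str(uuid.uuid4()).upper()


value = uppercase_uuid4()
print(value, bool(UUID4_UPPER_RE.match(value)))
```

The version digit "4" sits at index 14 of the 36-character string, which is what `test_uuid_version_4` checks.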
Task 1: Port sign algorithms to crawler_core/ platform directories
- /Users/win/2025/AICoding/JobData/spiderJobs/platforms/boss/sign.py (source — read every line before writing)
- /Users/win/2025/AICoding/JobData/spiderJobs/platforms/job51/sign.py (source — read every line before writing)
- /Users/win/2025/AICoding/JobData/spiderJobs/platforms/zhilian/sign.py (source — read every line before writing)
- /Users/win/2025/AICoding/JobData/crawler_core/boss/__init__.py (confirm it exists from Plan 01)
crawler_core/boss/sign.py
crawler_core/qcwy/sign.py
crawler_core/zhilian/sign.py
Copy the three sign algorithm files to their new locations under crawler_core/, making only one change per file: update the module docstring to reference crawler_core.
crawler_core/boss/sign.py:
Copy spiderJobs/platforms/boss/sign.py verbatim. Update the module docstring first line to:
Boss直聘 Traceid 生成算法 (crawler_core)
No other changes. ALL private functions (_to_u32, _compute_checksum, _generate_uuid) and the BossSign class stay exactly as-is.
crawler_core/qcwy/sign.py:
Copy spiderJobs/platforms/job51/sign.py verbatim. Update the module docstring first line to:
前程无忧 (51Job) 签名算法 (crawler_core)
No other changes. SIGN_KEY constant and Job51Sign class stay exactly as-is. The import json inside build_sign_path stays inside the method (do not hoist it).
crawler_core/zhilian/sign.py:
Copy spiderJobs/platforms/zhilian/sign.py verbatim. Update the module docstring first line to:
智联招聘签名算法 (crawler_core)
No other changes. ZhilianSign class stays exactly as-is.
Verification after writing each file:
- No file imports from spiderJobs.*
- No file imports from app.*
- No file uses requests or any HTTP library
- Each file is entirely self-contained with only stdlib imports (random, time, math, hmac, hashlib, json, urllib.parse)
DO NOT delete or modify:
- spiderJobs/platforms/boss/sign.py
- spiderJobs/platforms/job51/sign.py
- spiderJobs/platforms/zhilian/sign.py
These old files remain in place until Phase 4 cleanup.
cd /Users/win/2025/AICoding/JobData && python -c "
import sys
sys.path.insert(0, '.')
import re
from crawler_core.boss.sign import BossSign, _compute_checksum, _generate_uuid
from crawler_core.qcwy.sign import Job51Sign, SIGN_KEY
from crawler_core.zhilian.sign import ZhilianSign
# Boss
tid = BossSign.generate_traceid()
assert re.match(r'^M-W[0-9a-f]{13}[0-9a-zA-Z]{6}[0-9a-zA-Z]{3}$', tid), f'Traceid format wrong: {tid}'
print(f'BossSign OK: {tid}')
# Job51
signer = Job51Sign()
path, sign = signer.build_sign_path('open/test', 'GET', params={'key': 'val'})
assert path.startswith('/open/test?api_key=51job&timestamp='), f'Path format wrong: {path}'
assert len(sign) == 64, f'Sign length wrong: {len(sign)}'
print(f'Job51Sign OK: sign={sign[:8]}...')
# Zhilian
zs = ZhilianSign(at='token123', rt='refresh456')
headers = zs.sign_headers()
assert 'x-zp-device-id' in headers, 'device-id missing'
assert headers['x-zp-business-system'] == '73', 'business-system wrong'
assert len(headers) == 9, f'Expected 9 header keys, got {len(headers)}'
params = zs.sign_params()
# No double quotes here: they would terminate the shell string passed to python -c.
assert params['at'] == 'token123', 'at field wrong: %r' % params['at']
assert len(params) == 6, f'Expected 6 param keys, got {len(params)}'
print('ZhilianSign OK')
print('All sign algorithms imported and validated')
"
<acceptance_criteria>
- crawler_core/boss/sign.py exists: ls crawler_core/boss/sign.py exits 0
- crawler_core/qcwy/sign.py exists: ls crawler_core/qcwy/sign.py exits 0
- crawler_core/zhilian/sign.py exists: ls crawler_core/zhilian/sign.py exits 0
- grep -r "from spiderJobs" crawler_core/ returns empty (no cross-imports)
- grep -r "import requests" crawler_core/boss/sign.py crawler_core/qcwy/sign.py crawler_core/zhilian/sign.py returns empty (no HTTP imports in sign files)
- BossSign.generate_traceid() returns a string of length 25 (3-char prefix + 19-char uuid + 3-char checksum)
- Job51Sign().build_sign_path("test", "GET") returns tuple where [1] is 64-char string
- ZhilianSign().sign_headers() returns dict with x-zp-business-system == "73"
- Old files UNCHANGED: diff spiderJobs/platforms/boss/sign.py crawler_core/boss/sign.py shows only docstring difference
</acceptance_criteria>
Three sign.py files in crawler_core/ — pure functions, no HTTP, no cross-imports from app or spiderJobs.
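The "only docstring difference" criterion can also be checked programmatically. The sketch below uses `difflib.ndiff` from the standard library; `changed_lines` and the two sample strings are hypothetical and stand in for the contents of an old and a new sign.py.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') lines between two texts."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical file contents: only the docstring line differs.
old_source = '"""Boss sign (spiderJobs)"""\nX = 1\n'
new_source = '"""Boss sign (crawler_core)"""\nX = 1\n'

changed = changed_lines(old_source, new_source)
print(changed)
```

For a verbatim copy with one edited docstring line, the result should contain exactly one removed and one added line; anything more means the port changed code.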
tests/crawler_core/__init__.py: Empty file (just creates the package).
Important: Add sys.path setup to each test file since crawler_core is not yet pip-installed:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..'))
Place this at the TOP of each test file before any crawler_core imports.
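To make clear what the two `'..'` components do, this sketch resolves the path for a test file two levels below the repository root. The `repo` directory name is a placeholder, not a real path from this project.

```python
import os

# Placeholder layout: repo/tests/crawler_core/test_boss_sign.py
test_file = os.path.join("repo", "tests", "crawler_core", "test_boss_sign.py")

# Same expression as in the test files, normalised for readability.
repo_root = os.path.normpath(
    os.path.join(os.path.dirname(test_file), "..", "..")
)
print(repo_root)
```

Going up two levels from tests/crawler_core/ lands on the repository root, which is the directory that contains the crawler_core package.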
tests/crawler_core/test_boss_sign.py:
"""Unit tests for crawler_core.boss.sign — BossSign and helper functions.

All tests are pure function assertions: no HTTP, no network, no mocks.
"""
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..'))

import re
import pytest
from crawler_core.boss.sign import BossSign, _compute_checksum, _generate_uuid, _CHARS


class TestBossSignGenerateTraceid:
    def test_traceid_format(self):
        tid = BossSign.generate_traceid()
        assert re.match(r'^M-W[0-9a-f]{13}[0-9a-zA-Z]{6}[0-9a-zA-Z]{3}$', tid), \
            f"Traceid format wrong: {tid}"

    def test_traceid_length(self):
        tid = BossSign.generate_traceid()
        assert len(tid) == 25, f"Expected 25 chars, got {len(tid)}: {tid}"

    def test_traceid_custom_prefix(self):
        tid = BossSign.generate_traceid(prefix="X-Y")
        assert tid.startswith("X-Y"), f"Expected X-Y prefix, got: {tid}"

    def test_traceid_uniqueness(self):
        t1 = BossSign.generate_traceid()
        t2 = BossSign.generate_traceid()
        assert t1 != t2, "Two calls should return different traceids"

    def test_bosssign_init_defaults(self):
        sign = BossSign()
        assert sign.mpt == ""
        assert sign.wt2 == ""

    def test_bosssign_init_with_tokens(self):
        sign = BossSign(mpt="mpt_token", wt2="wt2_token")
        assert sign.mpt == "mpt_token"
        assert sign.wt2 == "wt2_token"


class TestComputeChecksum:
    def test_checksum_length(self):
        checksum = _compute_checksum("1234567890abc456789")  # 19 chars
        assert len(checksum) == 3, f"Expected 3 chars, got {len(checksum)}"

    def test_checksum_chars_in_base62(self):
        checksum = _compute_checksum("1234567890abc456789")
        for ch in checksum:
            assert ch in _CHARS, f"Char {ch!r} not in base62 set"

    def test_checksum_deterministic(self):
        uuid_str = "1234567890abc456789"
        c1 = _compute_checksum(uuid_str)
        c2 = _compute_checksum(uuid_str)
        assert c1 == c2, "Same input must produce same checksum"

    def test_checksum_differs_for_different_input(self):
        # Different inputs should (almost always) produce different checksums
        c1 = _compute_checksum("1234567890abc456789")
        c2 = _compute_checksum("9876543210xyz456789")
        # Not guaranteed to differ but extremely likely
        # We test at least that they are valid 3-char strings
        assert len(c1) == 3 and len(c2) == 3


class TestGenerateUuid:
    def test_generate_uuid_length(self):
        uuid = _generate_uuid()
        assert len(uuid) == 19, f"Expected 19 chars, got {len(uuid)}: {uuid}"

    def test_generate_uuid_hex_prefix(self):
        uuid = _generate_uuid()
        hex_part = uuid[:13]
        assert re.match(r'^[0-9a-f]{13}$', hex_part), \
            f"First 13 chars should be hex: {hex_part}"

    def test_generate_uuid_base62_suffix(self):
        uuid = _generate_uuid()
        rand_part = uuid[13:]
        for ch in rand_part:
            assert ch in _CHARS, f"Char {ch!r} in random suffix not in base62"
tests/crawler_core/test_qcwy_sign.py:
"""Unit tests for crawler_core.qcwy.sign — Job51Sign.

All tests are pure function assertions: no HTTP, no network, no mocks.
"""
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..'))

import re
import pytest
from crawler_core.qcwy.sign import Job51Sign, SIGN_KEY


class TestJob51SignInit:
    def test_default_sign_key(self):
        signer = Job51Sign()
        assert signer.sign_key == SIGN_KEY
        assert len(SIGN_KEY) == 64  # 64-char hex key

    def test_custom_sign_key(self):
        custom_key = "a" * 64
        signer = Job51Sign(sign_key=custom_key)
        assert signer.sign_key == custom_key


class TestJob51SignBuildSignPath:
    def setup_method(self):
        self.signer = Job51Sign()

    def test_returns_tuple_of_two_strings(self):
        result = self.signer.build_sign_path("open/test")
        assert isinstance(result, tuple)
        assert len(result) == 2
        assert all(isinstance(s, str) for s in result)

    def test_get_path_format(self):
        path, sign = self.signer.build_sign_path("open/test", "GET")
        assert path.startswith("/open/test?api_key=51job&timestamp="), \
            f"Path format wrong: {path}"

    def test_sign_hex_length(self):
        _, sign = self.signer.build_sign_path("open/test")
        assert len(sign) == 64, f"Sign should be 64-char hex, got {len(sign)}: {sign}"

    def test_sign_hex_format(self):
        _, sign = self.signer.build_sign_path("open/test")
        assert re.match(r'^[0-9a-f]{64}$', sign), f"Sign not hex: {sign}"

    def test_get_vs_post_different_sign(self):
        _, get_sign = self.signer.build_sign_path("open/test", "GET")
        _, post_sign = self.signer.build_sign_path("open/test", "POST", body={"k": "v"})
        assert get_sign != post_sign, "GET and POST should produce different signatures"

    def test_get_with_params_includes_params_in_path(self):
        path, _ = self.signer.build_sign_path("open/test", "GET", params={"city": "shanghai"})
        assert "city" in path and "shanghai" in path, \
            f"Params should appear in path: {path}"

    def test_sign_key_in_path(self):
        path, _ = self.signer.build_sign_path("open/jobs")
        assert "api_key=51job" in path, f"api_key=51job missing from path: {path}"


class TestJob51SignGenerateUuid:
    def test_generate_uuid_is_string(self):
        uuid = Job51Sign.generate_uuid()
        assert isinstance(uuid, str)

    def test_generate_uuid_length(self):
        uuid = Job51Sign.generate_uuid()
        # 13-char ms timestamp + 10-char random int = 23 chars
        assert len(uuid) == 23, f"Expected 23 chars, got {len(uuid)}: {uuid}"

    def test_generate_uuid_numeric(self):
        uuid = Job51Sign.generate_uuid()
        assert uuid.isdigit(), f"UUID should be all digits: {uuid}"
tests/crawler_core/test_zhilian_sign.py:
"""Unit tests for crawler_core.zhilian.sign — ZhilianSign.

All tests are pure function assertions: no HTTP, no network, no mocks.
"""
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', '..'))

import re
import pytest
from crawler_core.zhilian.sign import ZhilianSign

EXPECTED_HEADER_KEYS = {
    "x-zp-at", "x-zp-rt", "x-zp-action-id", "x-zp-page-code",
    "x-zp-version", "x-zp-channel", "x-zp-platform", "x-zp-device-id",
    "x-zp-business-system",
}
EXPECTED_PARAM_KEYS = {"at", "rt", "channel", "platform", "version", "d"}


class TestZhilianSignInit:
    def test_defaults(self):
        sign = ZhilianSign()
        assert sign.at == ""
        assert sign.rt == ""
        assert sign.version == "4.1.259"
        assert sign.channel == "wxxiaochengxu"
        assert sign.platform == "12"
        assert sign.device_id  # auto-generated, not empty

    def test_custom_tokens(self):
        sign = ZhilianSign(at="at_token", rt="rt_token")
        assert sign.at == "at_token"
        assert sign.rt == "rt_token"

    def test_custom_device_id(self):
        sign = ZhilianSign(device_id="CUSTOM-DEVICE-ID")
        assert sign.device_id == "CUSTOM-DEVICE-ID"

    def test_auto_device_id_is_uuid4_format(self):
        sign = ZhilianSign()
        uuid_pattern = r'^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$'
        assert re.match(uuid_pattern, sign.device_id), \
            f"device_id not UUID4 format: {sign.device_id}"


class TestZhilianSignHeaders:
    def setup_method(self):
        self.sign = ZhilianSign(at="at123", rt="rt456")

    def test_keys_exactly_nine(self):
        headers = self.sign.sign_headers()
        assert set(headers.keys()) == EXPECTED_HEADER_KEYS, \
            f"Header keys wrong: {set(headers.keys())}"

    def test_business_system_is_73(self):
        headers = self.sign.sign_headers()
        assert headers["x-zp-business-system"] == "73"

    def test_tokens_reflected(self):
        headers = self.sign.sign_headers()
        assert headers["x-zp-at"] == "at123"
        assert headers["x-zp-rt"] == "rt456"

    def test_action_id_is_uuid4_format(self):
        headers = self.sign.sign_headers()
        action_id = headers["x-zp-action-id"]
        uuid_pattern = r'^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$'
        assert re.match(uuid_pattern, action_id), \
            f"action_id not UUID4 format: {action_id}"

    def test_action_id_unique_per_call(self):
        h1 = self.sign.sign_headers()
        h2 = self.sign.sign_headers()
        assert h1["x-zp-action-id"] != h2["x-zp-action-id"], \
            "action_id must be freshly generated on each call"

    def test_device_id_in_headers(self):
        headers = self.sign.sign_headers()
        assert headers["x-zp-device-id"] == self.sign.device_id


class TestZhilianSignParams:
    def setup_method(self):
        self.sign = ZhilianSign(at="at789", rt="rt012", device_id="DEV-ID")

    def test_keys_exactly_six(self):
        params = self.sign.sign_params()
        assert set(params.keys()) == EXPECTED_PARAM_KEYS, \
            f"Param keys wrong: {set(params.keys())}"

    def test_device_id_as_d(self):
        params = self.sign.sign_params()
        assert params["d"] == "DEV-ID"

    def test_tokens_reflected(self):
        params = self.sign.sign_params()
        assert params["at"] == "at789"
        assert params["rt"] == "rt012"


class TestZhilianGenerateUuid:
    def test_uuid4_format(self):
        uuid = ZhilianSign.generate_uuid()
        uuid_pattern = r'^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$'
        assert re.match(uuid_pattern, uuid), \
            f"UUID not UUID4 format: {uuid}"

    def test_uuid_length(self):
        uuid = ZhilianSign.generate_uuid()
        assert len(uuid) == 36, f"Expected 36 chars, got {len(uuid)}"

    def test_uuid_version_4(self):
        uuid = ZhilianSign.generate_uuid()
        assert uuid[14] == "4", f"Version digit should be 4, got: {uuid[14]}"
After creating all files, run the tests to verify they pass. If any test fails due to a mismatch between the test expectation and the actual sign algorithm behavior, fix the TEST (not the sign algorithm) to match the actual behavior — the sign algorithms are the ground truth.
cd /Users/win/2025/AICoding/JobData && python -m pytest tests/crawler_core/ -v --tb=short 2>&1 | tail -30
<acceptance_criteria>
- tests/crawler_core/__init__.py exists (even if empty)
- tests/crawler_core/test_boss_sign.py exists with at least 8 test functions
- tests/crawler_core/test_qcwy_sign.py exists with at least 7 test functions
- tests/crawler_core/test_zhilian_sign.py exists with at least 9 test functions
- python -m pytest tests/crawler_core/ -v exits 0 — ALL tests pass
- python -m pytest tests/crawler_core/ -v output contains "passed" and zero "failed" or "error"
- No test in any file makes HTTP requests or reads from files (pure function tests only)
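- grep -rE --include="*.py" "^(import|from) +(requests|httpx|mock|unittest\.mock)" tests/crawler_core/ returns empty (no HTTP or mocking imports; the pattern is anchored to import statements because the test docstrings contain the words "no mocks", which a bare word search would match)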
- Test count: python -m pytest tests/crawler_core/ --collect-only -q reports at least 24 tests collected
</acceptance_criteria>
Three test files cover all sign algorithm edge cases. pytest tests/crawler_core/ exits 0. No network access required.
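The "at least N test functions" criteria can be checked without running pytest. The sketch below counts `test_` functions with the standard `ast` module; `count_test_functions` and the `sample` source are hypothetical, and in practice the source would be read from each test file.

```python
import ast


def count_test_functions(source: str) -> int:
    """Count functions and methods whose name starts with 'test_'."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    )


# Hypothetical sample: two tests and one helper that must not be counted.
sample = '''
class TestA:
    def test_one(self):
        pass

    def helper(self):
        pass


def test_two():
    pass
'''

count = count_test_functions(sample)
print(count)
```

This counts definitions only; pytest's own `--collect-only -q` remains the authoritative number because it also accounts for parametrisation and collection rules.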
cd /Users/win/2025/AICoding/JobData
# 1. All sign algorithms importable from crawler_core
python -c "
import sys
sys.path.insert(0, '.')
from crawler_core.boss.sign import BossSign
from crawler_core.qcwy.sign import Job51Sign
from crawler_core.zhilian.sign import ZhilianSign
print('All sign imports OK')
"
# 2. All tests pass
python -m pytest tests/crawler_core/ -v
# 3. No cross-contamination
grep -r "from spiderJobs" crawler_core/ && echo "FAIL" || echo "OK: no spiderJobs imports"
grep -r "import requests" crawler_core/boss/sign.py crawler_core/qcwy/sign.py crawler_core/zhilian/sign.py && echo "FAIL" || echo "OK: sign files have no HTTP imports"
# 4. Old files untouched
git diff --quiet -- spiderJobs/platforms/ && echo "OK: spiderJobs untouched" || echo "FAIL: spiderJobs modified"
<success_criteria>
- python -m pytest tests/crawler_core/ -v exits 0 with at least 24 tests collected and passing
- from crawler_core.boss.sign import BossSign succeeds (with repo root on sys.path)
- from crawler_core.qcwy.sign import Job51Sign succeeds
- from crawler_core.zhilian.sign import ZhilianSign succeeds
- grep -r "from spiderJobs" crawler_core/ returns empty
- grep -r "import requests" crawler_core/boss/sign.py crawler_core/qcwy/sign.py crawler_core/zhilian/sign.py returns empty
- git diff --name-only spiderJobs/ returns empty (old files untouched)
- All three sign.py files in crawler_core/ are under 120 lines each (no bloat)
</success_criteria>