c2c8783fee
Phase-5 split of agent_compliance_check_routes.py — the 2700-line
monolith was decomposed into 19 modules in compliance/api/agent_check/:
- Phase A-F: resolve / profile+check / banner+TCF / vendors raw+finalize /
HTML blocks top+mid+bot / email / persist
- Helpers: _constants, _helpers, _fetch, _discovery, _single_check
- Schemas + State + thin _orchestrator
A1 Native ZIP attachment in _phase_e_email: evidence_zip_builder.py bundles
slices + manifest.json + audit_metadata.json (SHA256 per slice +
build_sha + source_url). smtp_sender.py extended with an attachments parameter.
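A minimal sketch of how such a bundle could be assembled with the standard library. The function name `build_evidence_zip`, the `slices/` prefix and the JSON layout are assumptions for illustration, not the actual evidence_zip_builder.py API:

```python
from __future__ import annotations

import hashlib
import io
import json
import zipfile


def build_evidence_zip(slices: dict[str, bytes], build_sha: str, source_url: str) -> bytes:
    """Bundle evidence slices with a manifest and audit metadata (hypothetical layout)."""
    # Manifest: which slice files the archive contains.
    manifest = {"files": sorted(slices)}
    # Audit metadata: one SHA256 per slice plus build and source info.
    audit_metadata = {
        "build_sha": build_sha,
        "source_url": source_url,
        "slices": {name: hashlib.sha256(data).hexdigest() for name, data in slices.items()},
    }

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in slices.items():
            zf.writestr(f"slices/{name}", data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        zf.writestr("audit_metadata.json", json.dumps(audit_metadata, indent=2))
    return buf.getvalue()
```

The returned bytes could then be handed to a mail sender as a single attachment; how smtp_sender.py actually accepts attachments is not shown here.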
B1 COOKIE-CONSENT-UX-001 (Mobile Reachability): consent_reachability_check.py
parses footer anchors, classifies intent (reopen_cmp / info_only /
browser_deflect) + target (same_page_cmp / new_tab / external).
_b1_wiring.py fetches the homepage with an iPhone UA + renders an
Art. 7(3) severity-coloured block.
B3 TH-RETENTION (cross-document retention period): retention_comparator.py
compares DSI claim ↔ cookie-table duration ↔ actual Max-Age/expires
with 5% tolerance + severity hierarchy (dsi_under_actual HIGH,
table_under_actual HIGH, dsi_vs_table MEDIUM, actual_under_table LOW
with Safari-ITP hint). _b3_wiring.py + Top-10 mismatches table in mail.
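A rough sketch of the three-way comparison, assuming all durations are normalised to seconds and that the 5% tolerance is relative to the larger of the two values. The function names and the exact meaning of each finding code are assumptions; only the codes and severities come from the list above:

```python
from __future__ import annotations

TOLERANCE = 0.05  # 5% tolerance, as stated above


def _within(a: float, b: float, tol: float = TOLERANCE) -> bool:
    """True when a and b differ by at most tol relative to the larger value."""
    return abs(a - b) <= tol * max(a, b)


def compare_retention(
    dsi_s: float | None, table_s: float | None, actual_s: float | None
) -> list[tuple[str, str]]:
    """Return (code, severity) findings; None means the source gave no duration."""
    findings: list[tuple[str, str]] = []
    # Documented claim is shorter than what the cookie really lives: HIGH.
    if dsi_s is not None and actual_s is not None:
        if not _within(dsi_s, actual_s) and dsi_s < actual_s:
            findings.append(("dsi_under_actual", "HIGH"))
    if table_s is not None and actual_s is not None and not _within(table_s, actual_s):
        if table_s < actual_s:
            findings.append(("table_under_actual", "HIGH"))
        else:
            # Cookie lives shorter than documented, e.g. capped by Safari ITP.
            findings.append(("actual_under_table", "LOW"))
    # The two documents disagree with each other: MEDIUM.
    if dsi_s is not None and table_s is not None and not _within(dsi_s, table_s):
        findings.append(("dsi_vs_table", "MEDIUM"))
    return findings
```

A cookie for which one or more sources are missing simply yields fewer findings, which is one plausible way to end up with "?" rows in the mismatch table.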
Side-effects:
- Fixed silent UnboundLocalError in original Step 5 (gf_one_pager used
audit_quality_findings before declaration, caught by surrounding
except → block never rendered). New _phase_d3_blocks_bot.py runs
audit-quality FIRST.
- agent_compliance_check_routes.py removed from loc-exceptions.txt
("Phase 5 split target" — done).
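The UnboundLocalError failure mode described above can be reproduced in a few lines. This is an illustrative reduction with made-up function names and a made-up finding, not the original Step 5 code:

```python
def render_blocks_buggy() -> list[str]:
    blocks = []
    try:
        # The name is assigned later in this function, so Python treats it as
        # local; reading it here raises UnboundLocalError.
        blocks.append(f"one-pager: {len(audit_quality_findings)} findings")
        audit_quality_findings = ["example-finding"]
    except Exception:
        # The broad except swallows the error, so the block silently never renders.
        pass
    return blocks


def render_blocks_fixed() -> list[str]:
    blocks = []
    # Run audit-quality FIRST, then render the blocks that depend on it.
    audit_quality_findings = ["example-finding"]
    blocks.append(f"one-pager: {len(audit_quality_findings)} findings")
    return blocks
```

The buggy variant returns an empty list without logging anything, which is why the missing block went unnoticed.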
Tests: 55/55 green (B1 22 + B3 27 + saving_scan 6).
E2E: smoke against Elli DSE+Cookie produced HIGH/missing B1 finding,
TH-RETENTION table (17 cookies / 3 ✓ / 3 ✗ / 11 ?), evidence-zip
with 2 slices + manifest + audit_metadata (12089B, SHA256-chained,
source verified), email sent (attachments=1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
59 lines
1.9 KiB
Python
"""Shared state for the compliance-check pipeline.

The 7-step pipeline accumulates ~60 named values that flow across
phases (doc_entries, profile, results, banner_result, cmp_vendors,
scorecard, HTML blocks, …). Rather than threading 60 parameters
through each function, we pass one mutable `CheckState` dict.

Phases read what they need with `state[key]` and write their outputs
with `state[key] = value`. This is intentionally untyped: enforcing
strict typing would require freezing the schema before all phases
landed, and the report-building phase routinely adds new optional
keys (P1, P10, P50, P59b, P82, P103, P104, P106, …).

`new_state(check_id, req)` initialises the dict with the few
keys that must exist from the start.
"""

from __future__ import annotations


def new_state(check_id: str, req) -> dict:
    """Create a fresh state dict for a check run.

    Pre-populates a few keys that downstream phases assume exist
    (e.g. `cmp_vendors` defaulting to `[]`).
    """
    return {
        "check_id": check_id,
        "req": req,
        # Phase-1 outputs
        "doc_texts": {},
        "doc_entries": [],
        "url_text_cache": {},
        "pasted_table_vendors": [],
        "placement_findings": [],
        # Phase-2/3/4 outputs
        "profile": None,
        "profile_dict": {},
        "results": [],
        "total_findings": 0,
        "business_scope": set(),
        "banner_result": None,
        "banner_url": "",
        "tcf_vendors": [],
        "vvt_entries": [],
        "extracted_profile": {},
        # Phase-5 outputs
        "cmp_vendors": [],
        "cookie_audit": {},
        "cookie_evidence_slices": None,
        "cookie_evidence_meta": None,
        "scorecard": {},
        "full_html": "",
        "audit_quality_findings": [],
        # Phase-6/7 outputs
        "email_result": {"status": "skipped"},
        "site_name": "",
    }
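A small sketch of how phases might read from and write to such a state dict. The two phase functions, the vendor name and the trimmed-down stand-in state are hypothetical and exist only to show the read/write pattern described in the module docstring:

```python
def phase_collect_vendors(state: dict) -> None:
    """Hypothetical phase: appends to a key that is pre-populated with []."""
    state["cmp_vendors"].append("Example Analytics")


def phase_build_report(state: dict) -> None:
    """Hypothetical phase: reads earlier outputs and writes its own."""
    vendors = state["cmp_vendors"]
    items = "".join(f"<li>{v}</li>" for v in vendors)
    state["full_html"] = f"<ul>{items}</ul>"
    # Report-building phases may add new optional keys on the fly.
    state["P1"] = {"vendor_count": len(vendors)}


# Stand-in for the dict a state factory would return (only the keys used here).
state = {"check_id": "demo", "req": None, "cmp_vendors": [], "full_html": ""}
for phase in (phase_collect_vendors, phase_build_report):
    phase(state)

# Optional keys are best read defensively, since not every run sets them.
vendor_count = state.get("P1", {}).get("vendor_count", 0)
```

Because `cmp_vendors` exists from the start, the first phase can append without checking for the key, while optional keys such as `P1` are read with `.get()`.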