breakpilot-compliance/backend-compliance/compliance/api/agent_check/_state.py
Benjamin Admin c2c8783fee refactor(agent-check): split routes file (2692→347 LOC) + wire B1/B3/A1 [guardrail-change]
Phase-5 split of agent_compliance_check_routes.py — the 2700-line
monolith was decomposed into 19 modules in compliance/api/agent_check/:

  - Phase A-F: resolve / profile+check / banner+TCF / vendors raw+finalize /
    HTML blocks top+mid+bot / email / persist
  - Helpers: _constants, _helpers, _fetch, _discovery, _single_check
  - Schemas + State + thin _orchestrator

A1 native ZIP attachment in _phase_e_email: evidence_zip_builder.py
bundles slices + manifest.json + audit_metadata.json (SHA256 per slice +
build_sha + source_url). smtp_sender.py gains an attachments parameter.
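
A minimal sketch of how such a bundle could be assembled, using only the standard library. The function name `build_evidence_zip`, its signature, and the JSON field layout are assumptions for illustration; they are not taken from evidence_zip_builder.py.

```python
import hashlib
import io
import json
import zipfile


def build_evidence_zip(slices: dict[str, bytes], build_sha: str, source_url: str) -> bytes:
    """Bundle slices with a manifest and audit metadata (hypothetical layout)."""
    # One SHA-256 digest per slice, keyed by the slice's file name.
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in slices.items()}
    manifest = {"slices": sorted(slices)}
    audit_metadata = {
        "build_sha": build_sha,
        "source_url": source_url,
        "sha256": digests,
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in slices.items():
            zf.writestr(name, data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        zf.writestr("audit_metadata.json", json.dumps(audit_metadata, indent=2))
    return buf.getvalue()
```

The returned bytes could then be handed to a mail sender as a single attachment; a recipient can recompute each digest to verify the slices.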

B1 COOKIE-CONSENT-UX-001 (Mobile Reachability): consent_reachability_check.py
parses footer anchors, classifies intent (reopen_cmp / info_only /
browser_deflect) + target (same_page_cmp / new_tab / external).
_b1_wiring.py fetches the homepage with an iPhone user agent and renders
an Art. 7(3) severity-coloured block.
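
A rough sketch of the footer-anchor classification, assuming simple keyword and attribute heuristics. The class `FooterAnchors`, the function `classify`, and every rule in it are hypothetical; the real rules in consent_reachability_check.py are not shown in this commit message.

```python
from html.parser import HTMLParser


class FooterAnchors(HTMLParser):
    """Collect (attrs, text) for every <a> inside a <footer> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_footer = False
        self.current = None  # attrs of the <a> currently being read, if any
        self.text = ""
        self.anchors: list[tuple[dict, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "footer":
            self.in_footer = True
        elif tag == "a" and self.in_footer:
            self.current = dict(attrs)
            self.text = ""

    def handle_data(self, data):
        if self.current is not None:
            self.text += data

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            self.anchors.append((self.current, self.text.strip()))
            self.current = None
        elif tag == "footer":
            self.in_footer = False


def classify(attrs: dict, text: str) -> tuple[str, str]:
    """Return (intent, target) using assumed heuristics."""
    href = (attrs.get("href") or "").lower()
    label = text.lower()
    if "browser" in label:
        intent = "browser_deflect"  # sends the user to browser settings
    elif href.startswith(("#", "javascript:")) or "onclick" in attrs:
        intent = "reopen_cmp"  # looks like it re-opens the consent dialog
    else:
        intent = "info_only"
    if href.startswith(("http://", "https://", "//")):
        target = "external"
    elif attrs.get("target") == "_blank":
        target = "new_tab"
    else:
        target = "same_page_cmp"
    return intent, target
```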

B3 TH-RETENTION (cross-document retention period): retention_comparator.py
compares DSI claim ↔ cookie-table duration ↔ actual Max-Age/expires
with 5% tolerance + severity hierarchy (dsi_under_actual HIGH,
table_under_actual HIGH, dsi_vs_table MEDIUM, actual_under_table LOW
with a Safari ITP hint). _b3_wiring.py + Top-10 mismatches table in mail.
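
The severity hierarchy above can be sketched as a three-way comparison. This is an illustration only: `classify_retention`, the use of days as the unit, and the way the 5% tolerance is applied (relative to the longer value) are assumptions, not the actual retention_comparator.py logic.

```python
from typing import Optional

TOLERANCE = 0.05  # 5% relative tolerance, as stated in the commit message


def _shorter(a: float, b: float) -> bool:
    """True if a is shorter than b by more than the tolerance."""
    return a < b * (1 - TOLERANCE)


def classify_retention(
    dsi_days: Optional[float],
    table_days: Optional[float],
    actual_days: Optional[float],
) -> list[tuple[str, str]]:
    """Return (mismatch_kind, severity) pairs; missing values are skipped."""
    findings = []
    if dsi_days is not None and actual_days is not None:
        if _shorter(dsi_days, actual_days):
            findings.append(("dsi_under_actual", "HIGH"))
    if table_days is not None and actual_days is not None:
        if _shorter(table_days, actual_days):
            findings.append(("table_under_actual", "HIGH"))
        elif _shorter(actual_days, table_days):
            # Browsers such as Safari (ITP) may cap cookie lifetimes.
            findings.append(("actual_under_table", "LOW"))
    if dsi_days is not None and table_days is not None:
        if _shorter(dsi_days, table_days) or _shorter(table_days, dsi_days):
            findings.append(("dsi_vs_table", "MEDIUM"))
    return findings
```

Under these assumptions, a cookie documented as 30 days in both places but set for 365 days yields the two HIGH findings, while values within 5% of each other yield none.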

Side-effects:
- Fixed silent UnboundLocalError in original Step 5 (gf_one_pager used
  audit_quality_findings before declaration, caught by surrounding
  except → block never rendered). New _phase_d3_blocks_bot.py runs
  audit-quality FIRST.
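
The failure mode described above can be reproduced in a few lines. The function names and the sample finding are hypothetical; only the pattern (a local read before assignment inside a broad `except`) mirrors the description.

```python
def render_blocks_buggy() -> list[str]:
    blocks = []
    try:
        # The local is read before it is assigned, which raises
        # UnboundLocalError; the broad except swallows it silently.
        blocks.append(f"one-pager: {len(audit_quality_findings)} findings")
        audit_quality_findings = ["missing legal basis"]
    except Exception:
        pass
    return blocks


def render_blocks_fixed() -> list[str]:
    blocks = []
    # Compute the audit-quality findings first, then render the block.
    audit_quality_findings = ["missing legal basis"]
    blocks.append(f"one-pager: {len(audit_quality_findings)} findings")
    return blocks
```

The buggy variant returns an empty list without any error being reported, which matches the "block never rendered" symptom.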
- agent_compliance_check_routes.py removed from loc-exceptions.txt
  ("Phase 5 split target" — done).

Tests: 55/55 green (B1 22 + B3 27 + saving_scan 6).
E2E: smoke against Elli DSE+Cookie produced HIGH/missing B1 finding,
TH-RETENTION table (17 cookies / 3 ✓ / 3 ✗ / 11 ?), evidence-zip
with 2 slices + manifest + audit_metadata (12089B, SHA256-chained,
source verified), email sent (attachments=1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 14:47:25 +02:00

59 lines
1.9 KiB
Python

"""Shared state for the compliance-check pipeline.

The 7-step pipeline accumulates ~60 named values that flow across
phases (doc_entries, profile, results, banner_result, cmp_vendors,
scorecard, HTML blocks, …). Rather than threading 60 parameters
through each function, we pass one mutable `CheckState` dict.

Phases read what they need with `state[key]` and write their outputs
with `state[key] = value`. This is intentionally untyped: enforcing
strict typing would require freezing the schema before all phases
landed, and the report-building phase routinely adds new optional
keys (P1, P10, P50, P59b, P82, P103, P104, P106, …).

`new_state(check_id, req)` initialises the dict with the few
keys that must exist from the start.
"""

from __future__ import annotations


def new_state(check_id: str, req) -> dict:
    """Create a fresh state dict for a check run.

    Pre-populates a few keys that downstream phases assume exist
    (e.g. `cmp_vendors` defaulting to `[]`).
    """
    return {
        "check_id": check_id,
        "req": req,
        # Phase-1 outputs
        "doc_texts": {},
        "doc_entries": [],
        "url_text_cache": {},
        "pasted_table_vendors": [],
        "placement_findings": [],
        # Phase-2/3/4 outputs
        "profile": None,
        "profile_dict": {},
        "results": [],
        "total_findings": 0,
        "business_scope": set(),
        "banner_result": None,
        "banner_url": "",
        "tcf_vendors": [],
        "vvt_entries": [],
        "extracted_profile": {},
        # Phase-5 outputs
        "cmp_vendors": [],
        "cookie_audit": {},
        "cookie_evidence_slices": None,
        "cookie_evidence_meta": None,
        "scorecard": {},
        "full_html": "",
        "audit_quality_findings": [],
        # Phase-6/7 outputs
        "email_result": {"status": "skipped"},
        "site_name": "",
    }
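
The read/write convention from the module docstring can be sketched as follows. The two phase functions, the `p104_block` key, and the trimmed initialiser are hypothetical stand-ins; the real phases live in the other agent_check modules and are not shown here.

```python
def new_state(check_id: str, req) -> dict:
    # Trimmed stand-in for the real initialiser, kept small for the example.
    return {"check_id": check_id, "req": req, "cmp_vendors": [], "scorecard": {}}


def phase_vendors(state: dict) -> None:
    """Hypothetical phase: writes its output back into the shared state."""
    state["cmp_vendors"] = ["ExampleVendor"]


def phase_scorecard(state: dict) -> None:
    """Hypothetical phase: reads an earlier phase's output."""
    state["scorecard"] = {"vendor_count": len(state["cmp_vendors"])}
    # Optional keys may be added late by report-building code.
    state["p104_block"] = "<div></div>"


state = new_state("check-1", req=None)
for phase in (phase_vendors, phase_scorecard):
    phase(state)
```

Because optional keys are not pre-populated, a reader that may run before the writer is safer using `state.get(key)` than `state[key]`.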