Files
breakpilot-compliance/backend-compliance/compliance/api/agent_check/_b6b7b8_wiring.py
T
Benjamin Admin d0e3621192 feat(audit): V2 mail render + 5 new findings (B4/B5/B6/B7/B8) + LLM-Plausibility-Phase
Mail Render V2 (compliance/services/mail_render_v2/) — 11-Modul-Subpackage
das einen einheitlichen Audit-Mail-Output erzeugt mit:
  - Header + KPI-Kacheln (Score / Findings / Docs / Vendors)
  - TOC + Sprung-Links
  - 3-Bucket-Trennung: Kritische Befunde / Manuelle Prüfung / Interne Reminder
  - Cookie-Inventar (Name·Vendor·Kategorie·Speicherdauer·Löschfrist·Sitzland·Quelle·Status)
  - Sofortmaßnahmen-Aggregator ("Sitzland ergänzen für 11 Cookies")
  - 24 Legacy-Wrappers — alle alten build_*_html in V2-Sections
  - Scope-Filter: FIN/GOV/MED/INS/EDU/LEG aus Berichten wenn nicht relevant
  - Hint/Action-Dedup: keine doppelten Sätze pro Card mehr
Aktiviert via env MAIL_RENDER_V2=true (Default: legacy renderer).

5 neue deterministische Findings als Phase D-2b/B4/B5/B6/B7/B8:

  B4 vendor_consistency_check — Cross-Doc-Provider-Widerspruch
     (Elli: DSE nennt Vertex AI für Chatbot, /de/cookies nennt Iadvize → HIGH).
     6 Service-Types: chatbot/analytics/tag_manager/pixel/cdn/cmp.

  B5 ai_act_transparency_check — AI Act Art. 50 Transparenzpflicht
     (Elli: Vertex AI vorhanden ohne Pre-Chat-Disclosure → HIGH).
     Plus B5-Erweiterung: Rechtsgrundlage Art-6-Abs-1-lit-f bei AI → MED
     (Einwilligung empfehlen).

  B6 cross_doc_dpo_check — DPO in DSE genannt, nicht im Impressum (LOW).

  B7 doc_staleness_check — Datum-Extraktion aus DSE/AGB/Nutzungsbedingungen.
     Cap: AGB/NB 3y, DSE 2y. Älter → MEDIUM (Elli NB Stand 2018 → HIGH).

  B8 cmp_fingerprint_check — Banner detected, aber CMP-Provider generic
     (kein Usercentrics/OneTrust/Cookiebot/etc → MED).

  B3-Erweiterung detect_intra_doc_contradictions — Widersprüchliche
     Speicherdauer im SELBEN Doc (Elli: Logfile 7d vs 30d → HIGH).

LLM-Plausibility-Phase (Phase D-2b, finding_plausibility_check.py):
  - Läuft AFTER MC pipeline, BEFORE D3 render
  - Prompt mit Beispiel-IDs + 3-Phase-Mapping: exact-ID / position-fallback /
    fuzzy-tail-match
  - Stempelt llm_title / llm_severity / llm_recommendation / llm_drop auf
    jeden FAIL CheckItem
  - V2-Render zeigt "🤖 LLM-Plausibility:" Box pro Finding wenn gestempelt
  - KNOWN ISSUE: qwen3:30b-a3b liefert oft empty content auf format='json' +
    8000-char-excerpt prompts. Pipeline läuft mit stamped=0 weiter. Task #16.

Coverage gegen Elli Ground Truth (zeroclaw/docs/ground-truth/elli_eco_2026-06-06.json,
13 expected findings via WebFetch-Agent-Crawl):
  - 4/4 HIGH-Findings ✓ (COOKIE-CONSENT-UX-001 + WIDERRUFSBELEHRUNG-001 +
    VENDOR-CONSISTENCY-001 + AI-ACT-TRANSPARENCY-001)
  - 4/6 MEDIUM ✓
  - 2/3 LOW ✓
  - Total: 10/13 = 77% (Sprung von 4/13 = 31%)

Restliche 3 Gaps als Task #17: IMPRESSUM-001 (multi-entity USt-IdNr),
TRANSFER-001 (Vendor-Mechanismus DPF/SCC), TH-RETENTION-002 (AI-Retention
pro Datenkategorie).

V2-Mail-Preview in Mailpit: 'v2all@local.test' Subject '[V2 ALL] ELLI'.
Backend healthy, B1+B3+B4+B5+B6+B7+B8 alle live im Orchestrator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 21:19:49 +02:00

98 lines
3.6 KiB
Python

"""B6 / B7 / B8 wiring — DPO-cross-doc, doc-staleness, CMP-fingerprint.
Three small, deterministic checks added after B5. Each writes one or
more findings into `state["extra_findings"]` and a tiny HTML block
into `state["extra_findings_html"]` that the V2 renderer concatenates
between B5 (AI-Act) and the legacy section block.
"""
from __future__ import annotations
import html
import logging
from compliance.services.cmp_fingerprint_check import check_cmp_fingerprint
from compliance.services.cross_doc_dpo_check import check_dpo_cross_doc
from compliance.services.doc_staleness_check import check_staleness
logger = logging.getLogger(__name__)
def run_b6b7b8(state: dict) -> None:
findings: list[dict] = []
dpo = check_dpo_cross_doc(state)
if dpo:
findings.append(dpo)
stale = check_staleness(state)
findings.extend(stale)
cmp = check_cmp_fingerprint(state)
if cmp:
findings.append(cmp)
state["extra_findings"] = findings
if findings:
state["extra_findings_html"] = _render(findings)
logger.info(
"B6/B7/B8 extra: %d findings (DPO=%d, staleness=%d, CMP=%d)",
len(findings), 1 if dpo else 0, len(stale), 1 if cmp else 0,
)
def _render(findings: list[dict]) -> str:
cards = []
for f in findings:
sev = (f.get("severity") or "").upper()
color = "#dc2626" if sev == "HIGH" else (
"#f59e0b" if sev == "MEDIUM" else "#64748b"
)
evidence_html = ""
if f.get("evidence_dse"):
evidence_html = (
"<div style='font-size:12px;color:#475569;margin-top:6px;'>"
f"<em>In DSE: {html.escape(', '.join(f['evidence_dse']))}</em>"
"</div>"
)
if f.get("doc_date"):
evidence_html = (
"<div style='font-size:12px;color:#475569;margin-top:6px;'>"
f"<em>Stand: {html.escape(f['doc_date'])} "
f"({f.get('age_years','?')} Jahre alt, Cap "
f"{f.get('threshold_years','?')} Jahre)</em>"
"</div>"
)
if f.get("detected_provider"):
evidence_html = (
"<div style='font-size:12px;color:#475569;margin-top:6px;'>"
f"<em>Erkannter Provider: "
f"{html.escape(f['detected_provider'])}</em>"
"</div>"
)
cards.append(
f"<div style='margin:12px 0;padding:14px;background:#fff;"
f"border-left:3px solid {color};border-radius:4px;'>"
f"<div style='font-weight:600;color:{color};font-size:14px;'>"
f"{sev} · {html.escape(f.get('check_id') or '')}</div>"
f"<div style='font-size:14px;margin-top:4px;'>"
f"<strong>{html.escape(f.get('title') or '')}</strong></div>"
f"<div style='font-size:12px;color:#64748b;margin-top:2px;'>"
f"{html.escape(f.get('norm') or '')}</div>"
f"{evidence_html}"
f"<div style='font-size:13px;margin-top:8px;background:#dcfce7;"
f"padding:8px 10px;border-radius:4px;'>"
f"<strong>→ Empfehlung:</strong> "
f"{html.escape(f.get('action') or '')}</div>"
"</div>"
)
return (
"<div style='margin:24px 0;padding:16px;border-left:4px solid #f59e0b;"
"background:#fffbeb;border-radius:4px;'>"
"<h2 style='margin:0 0 8px;color:#92400e;font-size:16px;'>"
"📌 Zusätzliche Cross-Doc-Befunde (DPO / Staleness / CMP-Fingerprint)"
"</h2>"
+ "".join(cards) +
"</div>"
)