feat(audit-pipeline): sharpen P72-v2 heuristic + P80 mini replay endpoint

P72-v2 MC scope classifier, heuristic v2: v1 put 79% of entries into the 'other' bucket (patterns too strict). v2 covers considerably more:
- DSE: Art. 13/14 + data subject rights (Art. 15-22) + DSB (data protection officer) + supervisory authority + retention period + special categories
- TOM: Art. 32 + encryption/backup/pseudonymisation + access control + ISO 27001 + BSI-Grundschutz + audit log
- cookie_richtlinie: tracking pixels + web storage + GA/Matomo/Hotjar/Pixel/GTM
- process: VVT (Art. 30) + DSFA (Art. 35) + data breaches (Art. 33/34) + HinSchG + trainings + deletion concept

The script `backfill_mc_scope_v2.py` re-classifies ONLY the 'other' bucket (specific v1 buckets stay untouched).

P80 mini replay endpoint (v1):
  POST /compliance-check/snapshots/{id}/replay?recipient=foo@bar.com&dry_run=false
Loads the snapshot and renders the mail with the CURRENT render code (P63-P67, P59b/P61/P62). Sends a [REPLAY]-prefixed mail or returns only HTML stats (dry_run).

Effect: 7 min re-scan -> 2-5 s for mail layout iterations.

v2 (later): MC scorecard with the current scope_doc_type filter over the snapshot; requires refactoring _run_compliance_check.

Also a bugfix: GET /snapshots/{id} now raises HTTPException instead of returning a tuple (FastAPI returned the tuple as a JSON array).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
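For reference, a minimal sketch of how a client could build the replay request described above. The path and the `recipient` / `dry_run` query parameters come from the commit message; the helper name `build_replay_request`, the base URL, and the snapshot id are assumptions made for illustration, and any router prefix in front of `/compliance-check` is not covered. The request is only constructed, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_replay_request(base_url, snapshot_id, recipient=None, dry_run=False):
    """Build (but do not send) the POST request for the replay endpoint.

    Hypothetical helper: only the path and the two query parameters are
    taken from the commit message; everything else is an assumption.
    """
    params = {"dry_run": "true" if dry_run else "false"}
    if recipient:
        # urlencode percent-encodes the address ("@" becomes "%40").
        params["recipient"] = recipient
    path = f"/compliance-check/snapshots/{quote(snapshot_id, safe='')}/replay"
    url = f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
    return Request(url, method="POST")


# Placeholder host and snapshot id, not real values.
req = build_replay_request(
    "https://audit.example.invalid", "1234abcd", "foo@bar.com", dry_run=True
)
print(req.get_method(), req.full_url)
```

With `dry_run=True` the endpoint is expected to return only the HTML stats, which makes this the safer default while iterating on the mail layout.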
2026-05-21 10:21:56 +02:00
parent cde670617e
commit 4946571863
3 changed files with 431 additions and 1 deletions
@@ -0,0 +1,147 @@
+"""
+P80: replay pipeline (mini version v1).
+
+Loads a persisted snapshot and re-renders the audit mail with the
+CURRENT mail render code. Useful for:
+  * testing mail layout changes (P63-P67, P82 one-pager, P84 diff mode)
+  * adjusting action recipes
+  * iterating on the disclaimer text
+  * tuning the pattern-notice logic
+
+NOT included (coming in v2):
+  * MC scorecard re-run with the current scope_doc_type filter (P72);
+    requires refactoring the MC pipeline out of _run_compliance_check
+  * vendor redundancy analysis re-run
+
+Effect of v1: 7 min re-scan -> 2-5 s for mail layout iterations.
+Effect of v2 (later): also usable for MC filter tests.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from sqlalchemy.orm import Session
+
+from compliance.services.check_snapshot import load_snapshot
+
+logger = logging.getLogger(__name__)
+
+
+def replay_from_snapshot(
+    db: Session,
+    snapshot_id: str,
+    recipient: str | None = None,
+    dry_run: bool = False,
+) -> dict:
+    """Replay audit mail render from snapshot.
+
+    Args:
+        db: SQLAlchemy session
+        snapshot_id: UUID of snapshot to replay
+        recipient: Override email recipient. None = skip send.
+        dry_run: If True, render HTML but do not send mail.
+
+    Returns:
+        Dict with snapshot_id, check_id, site_domain, html_size, sections, mail_sent, preview; or {"error", "snapshot_id"} if the snapshot is missing.
+    """
+    snap = load_snapshot(db, snapshot_id)
+    if not snap:
+        return {"error": "snapshot not found", "snapshot_id": snapshot_id}
+
+    doc_entries = snap.get("doc_entries") or []
+    banner_result = snap.get("banner_result") or {}
+    profile_dict = snap.get("profile") or {}
+    cmp_vendors = snap.get("cmp_vendors") or []
+    site_label = snap.get("site_label") or snap.get("site_domain")
+
+    # Reconstruct doc_texts mapping (was the input to mail-render)
+    doc_texts: dict[str, str] = {}
+    for e in doc_entries:
+        dt = e.get("doc_type", "")
+        txt = (e.get("full_text") or e.get("text_preview") or "").strip()
+        if dt and txt:
+            doc_texts[dt] = txt
+
+    # Build results list mock (just enough for mail-render)
+    from compliance.services.doc_checks.runner import DocCheckResult
+
+    def _dict_to_result(d: dict) -> Any:
+        """Best-effort reconstruction. Snapshot didn't persist DocCheckResult
+        so we fake minimal fields. For real MC-replay (v2) we'd re-run the
+        check_document_completeness function against the snapshot text."""
+        return type("R", (), {
+            "doc_type": d.get("doc_type", "other"),
+            "label": d.get("doc_type", "Dokument"),
+            "completeness_pct": d.get("completeness_pct", 0),
+            "correctness_pct": d.get("correctness_pct"),
+            "checks": [],
+            "error": d.get("error", ""),
+        })()
+
+    results = [_dict_to_result(e) for e in doc_entries]
+
+    # Render mail sections
+    section_sizes: dict[str, int] = {}
+    parts: list[str] = []
+
+    try:
+        from compliance.api.agent_doc_check_critical import build_critical_findings_html
+        critical_html = build_critical_findings_html(banner_result, None, results) or ""
+        parts.append(critical_html)
+        section_sizes["critical"] = len(critical_html)
+    except Exception as e:
+        logger.warning("Replay: critical-block failed: %s", e)
+
+    try:
+        from compliance.api.scope_disclaimer import build_scope_disclaimer_html
+        disclaimer = build_scope_disclaimer_html()
+        parts.append(disclaimer)
+        section_sizes["disclaimer"] = len(disclaimer)
+    except Exception as e:
+        logger.warning("Replay: disclaimer failed: %s", e)
+
+    try:
+        from compliance.api.agent_doc_check_banner import build_banner_deep_html
+        banner_html = build_banner_deep_html(banner_result) or ""
+        parts.append(banner_html)
+        section_sizes["banner"] = len(banner_html)
+    except Exception as e:
+        logger.warning("Replay: banner-block failed: %s", e)
+
+    try:
+        from compliance.api.vvt_table_renderer import build_vvt_table_html
+        vvt_html = build_vvt_table_html(cmp_vendors) or ""
+        parts.append(vvt_html)
+        section_sizes["vvt"] = len(vvt_html)
+    except Exception as e:
+        logger.warning("Replay: vvt failed: %s", e)
+
+    full_html = "".join(parts)
+
+    result = {
+        "snapshot_id": snapshot_id,
+        "check_id":    snap.get("check_id"),
+        "site_domain": snap.get("site_domain"),
+        "html_size":   len(full_html),
+        "sections":    section_sizes,
+        "mail_sent":   False,
+        "preview":     full_html[:500] + "..." if len(full_html) > 500 else full_html,
+    }
+
+    if recipient and not dry_run:
+        try:
+            from compliance.services.email_sender import send_email
+            email_res = send_email(
+                recipient=recipient,
+                subject=f"[REPLAY] {site_label} (Snapshot {snapshot_id[:8]})",
+                body_html=full_html,
+            )
+            result["mail_sent"] = (email_res.get("status") == "sent")
+            result["mail_status"] = email_res.get("status")
+        except Exception as e:
+            logger.warning("Replay: mail send failed: %s", e)
+            result["mail_send_error"] = str(e)[:200]
+
+    return result
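The two reconstruction steps inside `replay_from_snapshot` (rebuilding `doc_texts` and faking minimal result objects) can be tried in isolation. The sketch below mirrors that logic in standalone form; the function names `rebuild_doc_texts` and `entry_to_result` and the sample entries are made up for illustration, and `SimpleNamespace` is swapped in for the `type("R", (), {...})()` construct used in the diff.

```python
from types import SimpleNamespace


def rebuild_doc_texts(doc_entries):
    """Map doc_type -> text, preferring full_text over text_preview.

    Entries without a doc_type or with blank text are skipped; a later
    entry with the same doc_type overwrites an earlier one.
    """
    doc_texts = {}
    for entry in doc_entries:
        doc_type = entry.get("doc_type", "")
        text = (entry.get("full_text") or entry.get("text_preview") or "").strip()
        if doc_type and text:
            doc_texts[doc_type] = text
    return doc_texts


def entry_to_result(entry):
    """Minimal stand-in for a check result, with the same fields as the diff."""
    return SimpleNamespace(
        doc_type=entry.get("doc_type", "other"),
        label=entry.get("doc_type", "Dokument"),
        completeness_pct=entry.get("completeness_pct", 0),
        correctness_pct=entry.get("correctness_pct"),
        checks=[],
        error=entry.get("error", ""),
    )


# Made-up snapshot entries, not real data.
entries = [
    {"doc_type": "dse", "full_text": "  Full privacy notice text  ", "completeness_pct": 80},
    {"doc_type": "tom", "text_preview": "Preview only"},
    {"doc_type": "impressum", "full_text": "   "},
]
print(rebuild_doc_texts(entries))
print([r.completeness_pct for r in map(entry_to_result, entries)])
```

Note that the stand-in objects carry an empty `checks` list, so any mail section that depends on individual check results will render without them until the v2 replay re-runs the checks against the snapshot text.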