feat(agent): MC scorecard + audit drill-down + tenant trend (A1-A6)

Now that all 1874 MCs run per check (Task #30 cap removal), the report was
about to drown in noise. This commit adds the full aggregation, persistence
and drill-down stack so that each MC is actionable, not just counted.

A1  mc_scorecard.py (new):
      build_scorecard(checks)  -> per-regulation PASS/FAIL/SKIP + severity
      top_fails(checks, n)     -> the N most severe failed MCs
      full_audit_records(...)  -> flat rows ready for the sidecar SQLite

A2  Email rendering: agent_doc_check_scorecard.py (new) builds an HTML
    scorecard table (regulation × passed/failed/HIGH/MEDIUM/score) shown
    near the top of the email, after the summary, scanned-URL and profile
    blocks. agent_doc_check_report._render_document now collapses the
    500-MC L2 forest into an 'X/Y bestanden (Z Fail)' ("X/Y passed,
    Z failed") summary plus a top-10 fails block per doc; the old verbose
    render is gone.

A3  compliance_audit_log.py (new): sidecar SQLite at
    /data/compliance_audits.db, kept separate from the compliance Postgres
    schema to comply with the no-new-migrations rule in CLAUDE.md:
      check_runs(check_id, ts, tenant_id, site_name, base_domain, doc_count,
                 scorecard json, vvt_summary json)
      mc_results(check_id, doc_type, mc_id, label, passed, skipped,
                 severity, regulation, matched_text, hint)
    The route persists every run after the email has been sent.
    docker-compose.yml adds the compliance-audit volume + env.

A4  backfill_mc_regulation_llm.py (new): Qwen-based backfill that tags the
    1636 MCs the regex pass couldn't classify. Batches of 25, format=json,
    output constrained to the canonical regulation list. Run manually:
      docker exec bp-compliance-backend python3 \
          /app/scripts/backfill_mc_regulation_llm.py [--dry-run]

A5  Admin audit tab: GET /api/compliance/agent/audit/<check_id>, proxied
    via /api/sdk/v1/agent/audit/<id>. The new page /sdk/agent/audit/[checkId]
    renders the scorecard plus a filterable MC table (status / doc_type /
    regulation; expandable rows with matched_text + hint).
    ComplianceCheckTab now shows a 'Voll-Audit oeffnen' ("open full audit")
    link.

A6  Trend per tenant: GET /api/compliance/agent/audit/tenant/<id> returns
    recent runs.
    The email scorecard shows per-regulation delta badges ('(+12%)',
    '(-3%)', i.e. percentage-point change) against the previous run for the
    same tenant + base_domain. The lookup is a single SQLite query.

Plumbing:
  - rag_document_checker.py: the SELECT now includes 'article'; MC results
    carry 'regulation' + 'article' through to CheckItem.
  - agent_doc_check_routes.CheckItem: the schema gains regulation + article
    fields (default '') so old clients still parse.
  - agent_compliance_check_routes: the response gains 'check_id' so the
    frontend can build the audit link.
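mc_scorecard.py is listed under A1 but its source is not part of the hunks below. A minimal sketch of build_scorecard and top_fails follows, assuming (not confirmed by this commit) that pct is the pass rate over non-skipped MCs, that the severity counts cover failed MCs only, and that severities rank CRITICAL > HIGH > MEDIUM > LOW. The output shape mirrors what build_scorecard_html reads ("by_regulation" rows plus "totals"); SEVERITY_RANK and the "Unclassified" bucket are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical severity ranking (lower = more severe); the real module may differ.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def _pct(passed: int, failed: int) -> int:
    """Pass rate over applicable (non-skipped) MCs, as an integer percent."""
    applicable = passed + failed
    return round(100 * passed / applicable) if applicable else 0


def build_scorecard(checks: list[dict]) -> dict:
    """Aggregate MC check dicts into per-regulation PASS/FAIL/SKIP counts."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for c in checks:
        groups[c.get("regulation") or "Unclassified"].append(c)

    rows = []
    for reg, items in sorted(groups.items()):
        skipped = sum(1 for c in items if c.get("skipped"))
        passed = sum(1 for c in items if c.get("passed") and not c.get("skipped"))
        failed = len(items) - passed - skipped
        # Severity histogram over failed MCs only (feeds the HIGH/MEDIUM columns).
        severity = Counter(
            c.get("severity") or "MEDIUM"
            for c in items
            if not c.get("passed") and not c.get("skipped")
        )
        rows.append({"regulation": reg, "passed": passed, "failed": failed,
                     "skipped": skipped, "severity": dict(severity),
                     "pct": _pct(passed, failed)})

    passed = sum(r["passed"] for r in rows)
    failed = sum(r["failed"] for r in rows)
    skipped = sum(r["skipped"] for r in rows)
    totals = {"passed": passed, "failed": failed, "skipped": skipped,
              "total": passed + failed + skipped, "pct": _pct(passed, failed)}
    return {"by_regulation": rows, "totals": totals}


def top_fails(checks: list[dict], n: int = 10) -> list[dict]:
    """Return the n most severe failed (non-skipped) MCs."""
    fails = [c for c in checks if not c.get("passed") and not c.get("skipped")]
    # list.sort is stable, so MCs of equal severity keep their check order.
    fails.sort(key=lambda c: SEVERITY_RANK.get(c.get("severity") or "MEDIUM", 99))
    return fails[:n]
```

Grouping by regulation first and deriving the totals from the per-regulation rows keeps the overall percentage consistent with the table rows by construction.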
2026-05-17 13:45:58 +02:00
parent 6d29191e9b
commit 6ed30dae5b
12 changed files with 1159 additions and 10 deletions
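The A4 backfill script is not among the shown hunks either. As a rough sketch of the batching-plus-constraint idea (25 MCs per request, answers outside the canonical list rejected), assuming the model call is injected as a plain callable that returns an already JSON-decoded {mc_id: regulation} object: `ask_model`, the `mc_id` key and the three-entry regulation set are hypothetical, and no real Qwen/Ollama client API is implied.

```python
from typing import Callable, Iterator

# Illustrative subset only; the real canonical regulation list is not shown.
CANONICAL_REGULATIONS = {"DSGVO", "TDDDG", "BGB"}
BATCH_SIZE = 25


def batched(items: list[dict], size: int = BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def tag_regulations(
    mcs: list[dict],
    ask_model: Callable[[list[dict]], dict[str, str]],
) -> dict[str, str]:
    """Map mc_id -> regulation, keeping only answers from the canonical list.

    `ask_model` (hypothetical) receives one batch and returns a decoded
    {mc_id: regulation} object, e.g. from a model run with format=json.
    """
    tagged: dict[str, str] = {}
    for batch in batched(mcs):
        wanted = {mc["mc_id"] for mc in batch}
        answer = ask_model(batch)
        for mc_id, reg in answer.items():
            # Drop ids the model invented and labels outside the allowed set.
            if mc_id in wanted and reg in CANONICAL_REGULATIONS:
                tagged[mc_id] = reg
    return tagged
```

Validating against the set after decoding means a malformed or hallucinated label simply leaves that MC untagged for a later run instead of writing a non-canonical value.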
@@ -428,10 +428,50 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
        scanned_html = build_scanned_urls_html(doc_entries)
        providers_html = build_provider_list_html(banner_result, vvt_entries)
        vvt_html = build_vvt_table_html(cmp_vendors)
+
+        # MC scorecard aggregated across ALL docs in this run (DSGVO/TDDDG/
+        # BGB/...). Placed ahead of the provider list, VVT table and per-doc
+        # report so the GF (managing director) sees it before drilling down.
+        from compliance.services.mc_scorecard import build_scorecard
+        from .agent_doc_check_scorecard import build_scorecard_html
+        all_mc_checks: list[dict] = []
+        for r in results:
+            for c in r.checks:
+                if c.id.startswith("mc-"):
+                    all_mc_checks.append({
+                        "id": c.id, "label": c.label, "passed": c.passed,
+                        "severity": c.severity, "skipped": c.skipped,
+                        "regulation": c.regulation,
+                    })
+        scorecard = build_scorecard(all_mc_checks) if all_mc_checks else {}
+        # Trend: load previous scorecard for the same tenant + domain so the
+        # email can show delta indicators (A6).
+        prev_scorecard: dict | None = None
+        if scorecard:
+            try:
+                from compliance.services.compliance_audit_log import (
+                    list_runs_for_tenant,
+                )
+                tenant_id_for_trend = req.recipient or ""
+                base_domain_for_trend = _extract_domain(doc_entries) or ""
+                prev_runs = list_runs_for_tenant(
+                    tenant_id_for_trend,
+                    base_domain=base_domain_for_trend,
+                    limit=1,
+                )
+                if prev_runs:
+                    prev_scorecard = prev_runs[0].get("scorecard")
+            except Exception as e:
+                logger.debug("trend lookup skipped: %s", e)
+        scorecard_html = (
+            build_scorecard_html(scorecard, previous_scorecard=prev_scorecard)
+            if scorecard else ""
+        )
+
        report_html = build_html_report(results, None)
        profile_html = _build_profile_html(profile)
        full_html = (
-            summary_html + scanned_html + profile_html
+            summary_html + scanned_html + profile_html + scorecard_html
            + providers_html + vvt_html + report_html
        )

@@ -452,6 +492,7 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):

        # Step 7: Store result
        response = {
+            "check_id": check_id,
            "results": [_result_to_dict(r) for r in results],
            "business_profile": profile_dict,
            "extracted_profile": extracted_profile,
@@ -474,6 +515,45 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
        _compliance_check_jobs[check_id]["progress"] = "Fertig"
        _compliance_check_jobs[check_id]["progress_pct"] = 100

+        # Persist to sidecar SQLite audit log — enables /audit endpoints
+        # (A5 admin tab) and trend view (A6). Best-effort; failures here
+        # do not affect the user-facing response.
+        try:
+            from compliance.services.compliance_audit_log import record_check_run
+            from compliance.services.mc_scorecard import full_audit_records
+            audit_rows: list[dict] = []
+            for r in results:
+                doc_mc = [c for c in r.checks if c.id.startswith("mc-")]
+                audit_rows.extend(full_audit_records(
+                    [{"id": c.id, "label": c.label, "passed": c.passed,
+                      "severity": c.severity, "skipped": c.skipped,
+                      "regulation": c.regulation, "matched_text": c.matched_text,
+                      "hint": c.hint, "level": c.level}
+                     for c in doc_mc],
+                    check_id=check_id,
+                    doc_type=r.doc_type,
+                ))
+            record_check_run(
+                check_id=check_id,
+                tenant_id=req.recipient or "",
+                site_name=site_name,
+                base_domain=domain or "",
+                doc_count=doc_count,
+                scorecard=scorecard,
+                vvt_summary={
+                    "total": len(cmp_vendors),
+                    "internal": sum(1 for v in cmp_vendors
+                                    if (v.get("recipient_type") or "").upper()
+                                    in ("INTERNAL", "GROUP_COMPANY")),
+                    "external": sum(1 for v in cmp_vendors
+                                    if (v.get("recipient_type") or "").upper()
+                                    in ("PROCESSOR", "CONTROLLER")),
+                },
+                mc_records=audit_rows,
+            )
+        except Exception as e:
+            logger.warning("Audit persistence skipped: %s", e)
+
    except Exception as e:
        logger.error("Compliance check %s failed: %s", check_id, e, exc_info=True)
        _compliance_check_jobs[check_id]["status"] = "failed"
@@ -1060,3 +1140,51 @@ def _build_profile_html(profile) -> str:

 # Cross-check extracted to compliance.services.banner_cookie_cross_check
 from compliance.services.banner_cookie_cross_check import cross_check_banner_vs_cookie as _cross_check_banner_vs_cookie
+
+
+# ── Admin: audit drill-down (A5) + trend view (A6) ──────────────────
+
+@router.get("/audit/{check_id}")
+async def audit_drill_down(
+    check_id: str,
+    doc_type: str = "",
+    regulation: str = "",
+    only_failed: bool = False,
+):
+    """Return scorecard + filterable MC results for a single check run.
+
+    Frontend uses this to render the /sdk/agent/audit/<check_id> view.
+    """
+    from compliance.services.compliance_audit_log import (
+        get_check_run, list_mc_results,
+    )
+    run = get_check_run(check_id)
+    if not run:
+        return {"check_id": check_id, "found": False}
+    rows = list_mc_results(
+        check_id,
+        doc_type=doc_type or None,
+        regulation=regulation or None,
+        only_failed=only_failed,
+    )
+    return {
+        "check_id": check_id,
+        "found": True,
+        "run": run,
+        "mc_count": len(rows),
+        "results": rows,
+    }
+
+
+@router.get("/audit/tenant/{tenant_id}")
+async def audit_tenant_history(
+    tenant_id: str,
+    base_domain: str = "",
+    limit: int = 30,
+):
+    """Tenant-level history for the trend view (A6)."""
+    from compliance.services.compliance_audit_log import list_runs_for_tenant
+    runs = list_runs_for_tenant(
+        tenant_id, base_domain=base_domain or None, limit=limit,
+    )
+    return {"tenant_id": tenant_id, "count": len(runs), "runs": runs}
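The endpoints above delegate to compliance_audit_log, whose source is not part of the shown hunks. A minimal sketch of its write path and trend read using the standard sqlite3 module and the check_runs/mc_results columns listed under A3. Unlike the real module (which targets /data/compliance_audits.db), these hypothetical versions take an explicit connection so they run against ":memory:"; the column types, the ts default and the ORDER BY clause are assumptions.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS check_runs (
    check_id TEXT PRIMARY KEY, ts TEXT DEFAULT CURRENT_TIMESTAMP,
    tenant_id TEXT, site_name TEXT, base_domain TEXT, doc_count INTEGER,
    scorecard TEXT, vvt_summary TEXT
);
CREATE TABLE IF NOT EXISTS mc_results (
    check_id TEXT, doc_type TEXT, mc_id TEXT, label TEXT,
    passed INTEGER, skipped INTEGER, severity TEXT, regulation TEXT,
    matched_text TEXT, hint TEXT
);
"""


def record_check_run(conn, *, check_id, tenant_id, site_name, base_domain,
                     doc_count, scorecard, vvt_summary, mc_records):
    """Insert one run plus its MC rows in a single transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO check_runs (check_id, tenant_id, site_name, base_domain,"
            " doc_count, scorecard, vvt_summary) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (check_id, tenant_id, site_name, base_domain, doc_count,
             json.dumps(scorecard), json.dumps(vvt_summary)),
        )
        # Named placeholders: each record dict must provide these keys.
        conn.executemany(
            "INSERT INTO mc_results VALUES (:check_id, :doc_type, :mc_id, :label,"
            " :passed, :skipped, :severity, :regulation, :matched_text, :hint)",
            mc_records,
        )


def list_runs_for_tenant(conn, tenant_id, base_domain=None, limit=30):
    """Newest-first runs for a tenant, optionally narrowed to one domain."""
    sql = "SELECT * FROM check_runs WHERE tenant_id = ?"
    params: list = [tenant_id]
    if base_domain is not None:
        sql += " AND base_domain = ?"
        params.append(base_domain)
    # CURRENT_TIMESTAMP has 1 s resolution, so rowid breaks ties.
    sql += " ORDER BY ts DESC, rowid DESC LIMIT ?"
    params.append(limit)
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    runs = [dict(zip(cols, row)) for row in cur.fetchall()]
    for run in runs:
        run["scorecard"] = json.loads(run["scorecard"] or "{}")
        run["vvt_summary"] = json.loads(run["vvt_summary"] or "{}")
    return runs
```

In the diff above, the trend lookup in _run_compliance_check runs before record_check_run, so a newest-first query with limit=1 returns the previous run rather than the current one.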
@@ -245,6 +245,38 @@ def _render_document(html: list[str], r: DocCheckResult) -> None:
        html.append('<div style="padding:8px 16px 12px">')
        for c in l1_checks:
            _render_l1_check(html, c, l2_by_parent.get(c.id, []))
+
+        # Master-Control aggregation: with 1874 MCs evaluated per run,
+        # rendering every L2 check inline produces ~600 rows per doc and
+        # makes the email unreadable. Show only top-N severe fails plus a
+        # one-line summary. Full results live in /sdk/agent/audit/<id>.
+        from compliance.api.agent_doc_check_scorecard import build_top_fails_html
+        from compliance.services.mc_scorecard import top_fails
+
+        mc_results = [
+            {"id": c.id, "label": c.label, "passed": c.passed,
+             "severity": c.severity, "skipped": c.skipped, "hint": c.hint,
+             "regulation": c.regulation}
+            for c in r.checks
+            if c.id.startswith("mc-")
+        ]
+        if mc_results:
+            n_total = len(mc_results)
+            n_passed = sum(1 for x in mc_results if x["passed"])
+            n_skipped = sum(1 for x in mc_results if x["skipped"])
+            n_failed = n_total - n_passed - n_skipped
+            html.append(
+                f'<div style="margin-top:12px;padding-top:8px;'
+                f'border-top:1px solid #e5e7eb;font-size:11px;color:#475569">'
+                f'<strong>Master-Controls:</strong> {n_passed}/'
+                f'{n_total - n_skipped} bestanden '
+                f'<span style="color:#dc2626">({n_failed} Fail)</span>'
+                f'{f" + {n_skipped} nicht anwendbar" if n_skipped else ""}.'
+                f'</div>'
+            )
+            top = top_fails(mc_results, n=10)
+            html.append(build_top_fails_html(top, r.label))
+
        if r.word_count:
            html.append(
                f'<div style="font-size:11px;color:#9ca3af;margin-top:8px;'
@@ -53,6 +53,10 @@ class CheckItem(BaseModel):
    parent: str | None = None
    skipped: bool = False
    hint: str = ""
+    # Regulation + article are filled for MC-sourced items (e.g. 'DSGVO'
+    # + 'Art. 13 Abs. 1 lit. a'). Used by the mc_scorecard aggregator.
+    regulation: str = ""
+    article: str = ""


 class DocCheckResult(BaseModel):
@@ -0,0 +1,137 @@
+"""
+Email rendering for the Master-Control scorecard + top-fails summary.
+
+With all 1874 MCs now evaluated per run (#30 cap removed), the report
+must summarise rather than dump everything. This module produces:
+
+  - build_scorecard_html(scorecard, previous_scorecard=None): compact
+    regulation-by-regulation table (with optional trend deltas) for the email
+  - build_top_fails_html(fails, doc_label): top-N severe MC fails as a
+    compact card list under each document
+"""
+
+from __future__ import annotations
+
+
+def build_scorecard_html(
+    scorecard: dict,
+    previous_scorecard: dict | None = None,
+) -> str:
+    """Render the MC scorecard as an HTML table.
+
+    Expects the dict returned by mc_scorecard.build_scorecard. When
+    `previous_scorecard` is passed (the run right before this one for
+    the same tenant + domain), each row shows a delta indicator
+    ('+12%', '-3%') so the DSB (DPO) sees the direction of travel at a glance.
+    """
+    if not scorecard:
+        return ""
+    rows = scorecard.get("by_regulation") or []
+    totals = scorecard.get("totals") or {}
+    if not rows:
+        return ""
+
+    prev_by_reg: dict[str, int] = {}
+    prev_total_pct: int | None = None
+    if previous_scorecard:
+        prev_total_pct = int((previous_scorecard.get("totals") or {}).get("pct") or 0)
+        for r in (previous_scorecard.get("by_regulation") or []):
+            prev_by_reg[r.get("regulation", "")] = int(r.get("pct", 0))
+
+    overall_pct = int(totals.get("pct", 0))
+    overall_color = ("#16a34a" if overall_pct >= 80 else
+                     "#d97706" if overall_pct >= 50 else "#dc2626")
+    trend_str = _delta_badge(overall_pct, prev_total_pct) if prev_total_pct is not None else ""
+
+    head = (
+        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
+        'max-width:700px;margin:0 auto 16px;padding:12px 16px;'
+        'background:#f0f9ff;border:1px solid #bae6fd;border-radius:8px">'
+        '<h3 style="margin:0 0 6px;font-size:14px;color:#0369a1">'
+        'MC-Scorecard (Pflichtangaben aus Master-Controls)</h3>'
+        f'<p style="margin:0 0 10px;font-size:11px;color:#475569">'
+        f'<strong style="color:{overall_color};font-size:13px">{overall_pct}%</strong>{trend_str} '
+        f'gesamt &middot; {totals.get("passed", 0)} bestanden, '
+        f'{totals.get("failed", 0)} nicht bestanden '
+        f'({totals.get("total", 0)} MCs ausgewertet, '
+        f'{totals.get("skipped", 0)} nicht anwendbar).</p>'
+    )
+
+    body = [
+        '<table style="width:100%;border-collapse:collapse;font-size:11px">'
+        '<thead><tr style="background:#dbeafe;color:#1e40af;text-align:left">'
+        '<th style="padding:5px 8px">Regulation</th>'
+        '<th style="padding:5px 8px;text-align:center">Bestanden</th>'
+        '<th style="padding:5px 8px;text-align:center">Fail</th>'
+        '<th style="padding:5px 8px;text-align:center">HIGH</th>'
+        '<th style="padding:5px 8px;text-align:center">MEDIUM</th>'
+        '<th style="padding:5px 8px;text-align:right">Score</th>'
+        '</tr></thead><tbody>'
+    ]
+    for r in rows:
+        pct = int(r.get("pct", 0))
+        color = ("#16a34a" if pct >= 80 else
+                 "#d97706" if pct >= 50 else "#dc2626")
+        sev = r.get("severity") or {}
+        prev = prev_by_reg.get(r["regulation"])
+        trend = _delta_badge(pct, prev)
+        body.append(
+            f'<tr style="border-top:1px solid #bfdbfe">'
+            f'<td style="padding:5px 8px;color:#1e293b">{r["regulation"]}</td>'
+            f'<td style="padding:5px 8px;text-align:center;color:#16a34a">'
+            f'{r["passed"]}</td>'
+            f'<td style="padding:5px 8px;text-align:center;color:#dc2626">'
+            f'{r["failed"]}</td>'
+            f'<td style="padding:5px 8px;text-align:center;color:#dc2626">'
+            f'{sev.get("HIGH", 0) + sev.get("CRITICAL", 0)}</td>'
+            f'<td style="padding:5px 8px;text-align:center;color:#d97706">'
+            f'{sev.get("MEDIUM", 0)}</td>'
+            f'<td style="padding:5px 8px;text-align:right;font-weight:600;'
+            f'color:{color}">{pct}%{trend}</td>'
+            f'</tr>'
+        )
+    body.append('</tbody></table></div>')
+    return head + "".join(body)
+
+
+def _delta_badge(current: int, previous: int | None) -> str:
+    """Render a ±N% badge (percentage points); empty if equal or unknown."""
+    if previous is None or previous == current:
+        return ""
+    delta = current - previous
+    if delta > 0:
+        color, sign = "#16a34a", "+"
+    else:
+        color, sign = "#dc2626", ""
+    return (f' <span style="font-size:10px;color:{color};font-weight:500">'
+            f'({sign}{delta}%)</span>')
+
+
+def build_top_fails_html(fails: list[dict], doc_label: str) -> str:
+    """Render top-N severe MC fails as a compact card list per document."""
+    if not fails:
+        return ""
+    out = [
+        '<div style="margin:10px 0 8px;padding:10px 12px;'
+        'background:#fef2f2;border-left:3px solid #fca5a5;border-radius:4px">'
+        f'<strong style="font-size:12px;color:#991b1b">'
+        f'Top-Auffaelligkeiten in {doc_label} ({len(fails)})</strong>'
+        '<ul style="margin:6px 0 0 18px;padding:0;font-size:11px;'
+        'color:#7f1d1d">'
+    ]
+    for f in fails:
+        sev = f.get("severity") or "MEDIUM"
+        reg = f.get("regulation") or ""
+        reg_str = f' <span style="color:#94a3b8">[{reg}]</span>' if reg else ""
+        label = f.get("label") or "Unnamed"
+        hint = (f.get("hint") or "")[:200]
+        out.append(
+            f'<li style="margin-bottom:4px">'
+            f'<span style="color:#dc2626;font-weight:600">{sev}</span>'
+            f'{reg_str} &mdash; {label}'
+            + (f'<div style="font-size:10px;color:#94a3b8;margin-top:1px">'
+               f'{hint}</div>' if hint else "")
+            + '</li>'
+        )
+    out.append('</ul></div>')
+    return "".join(out)
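The '(+12%)' badges are percentage-point differences between the integer pct values, not relative changes: a regulation going from 60% to 72% shows (+12%). A hypothetical helper that returns the same deltas as data (for example for tests or a client of the A6 trend endpoint), following the rules build_scorecard_html and _delta_badge apply: no entry when the previous value is unknown or unchanged.

```python
from __future__ import annotations


def scorecard_deltas(current: dict, previous: dict | None) -> dict[str, int]:
    """Percentage-point change per regulation, as _delta_badge would show it.

    Regulations missing from the previous run, or unchanged, are omitted,
    mirroring the badge rendering (no badge in those cases).
    """
    if not previous:
        return {}
    prev = {r.get("regulation", ""): int(r.get("pct", 0))
            for r in previous.get("by_regulation") or []}
    deltas: dict[str, int] = {}
    for row in current.get("by_regulation") or []:
        before = prev.get(row["regulation"])
        now = int(row.get("pct", 0))
        if before is not None and before != now:
            deltas[row["regulation"]] = now - before
    return deltas
```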