feat(agent): MC scorecard + audit drill-down + tenant trend (A1-A6)

Now that all 1874 MCs run per check (Task #30 cap removal), the report
was about to drown in noise. This commit adds the full aggregation /
persistence / drill-down stack so each MC is actionable, not just
counted.

A1  mc_scorecard.py (new):
      build_scorecard(checks)  -> per-regulation PASS/FAIL/SKIP + severity
      top_fails(checks, n)     -> N most severe failed MCs
      full_audit_records(...)  -> flat rows ready for sidecar SQLite

A2  Email rendering: agent_doc_check_scorecard.py (new) builds an HTML
    scorecard table (regulation × passed/failed/HIGH/MEDIUM/score) shown
    at the top of the email. agent_doc_check_report._render_document now
    collapses the 500-MC L2 forest into an 'X/Y bestanden (Z Fail)'
    summary plus a top-10 fails block per doc; the old verbose render
    is gone.

A3  compliance_audit_log.py (new): sidecar SQLite at
    /data/compliance_audits.db (separate from the compliance Postgres
    schema to comply with the no-new-migrations rule in CLAUDE.md):
      check_runs(check_id, ts, tenant_id, site_name, base_domain,
                 doc_count, scorecard json, vvt_summary json)
      mc_results(check_id, doc_type, mc_id, label, passed, skipped,
                 severity, regulation, matched_text, hint)
    Route persists every run after the email is sent.
    docker-compose.yml adds compliance-audit volume + env.

A4  backfill_mc_regulation_llm.py (new): Qwen-tagged backfill for the
    1636 MCs the regex pass couldn't classify. Batches of 25,
    format=json, output constrained to the canonical regulation list.
    Run manually:
      docker exec bp-compliance-backend python3 \
        /app/scripts/backfill_mc_regulation_llm.py [--dry-run]

A5  Admin audit tab: GET /api/compliance/agent/audit/<check_id>,
    proxied via /api/sdk/v1/agent/audit/<id>. New page
    /sdk/agent/audit/[checkId] renders the scorecard + a filterable MC
    table (status / doc_type / regulation, expandable rows with
    matched_text + hint). ComplianceCheckTab now shows a
    'Voll-Audit oeffnen' link.

A6  Trend per tenant: GET /api/compliance/agent/audit/tenant/<id>
    returns recent runs. Email scorecard shows per-regulation delta
    badges ('(+12%)', '(-3%)') compared with the previous run for the
    same tenant + base_domain. Lookup is one SQLite query.

Plumbing:
  - rag_document_checker.py: SELECT now includes 'article'; MC results
    carry 'regulation' + 'article' through to CheckItem.
  - agent_doc_check_routes.CheckItem schema gains regulation + article
    fields (defaults '') so old clients still parse.
  - agent_compliance_check_routes: response gains 'check_id' so the
    frontend can build the audit link.
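
For reviewers, here is a rough sketch of how the A6 delta lookup could work against the A3 sidecar table. It is not the code in compliance_audit_log.py. The check_runs column names come from the message above; the column types, the scorecard JSON shape ({regulation: {"score": percent}}), and the helper names previous_scorecard and delta_badge are assumptions made for illustration.

```python
import json
import sqlite3

# Column names follow the commit message; the types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS check_runs (
    check_id    TEXT PRIMARY KEY,
    ts          TEXT NOT NULL,      -- ISO-8601, so string order == time order
    tenant_id   TEXT NOT NULL,
    site_name   TEXT,
    base_domain TEXT NOT NULL,
    doc_count   INTEGER,
    scorecard   TEXT,               -- JSON blob
    vvt_summary TEXT                -- JSON blob
);
"""


def previous_scorecard(conn, tenant_id, base_domain, before_ts):
    """Return the scorecard of the latest earlier run, or None (hypothetical helper)."""
    row = conn.execute(
        "SELECT scorecard FROM check_runs "
        "WHERE tenant_id = ? AND base_domain = ? AND ts < ? "
        "ORDER BY ts DESC LIMIT 1",
        (tenant_id, base_domain, before_ts),
    ).fetchone()
    return json.loads(row[0]) if row else None


def delta_badge(current_score, previous_score):
    """Format the change in percentage points, e.g. '(+12%)' (hypothetical helper)."""
    if previous_score is None:
        return ""  # first run for this tenant + domain: nothing to compare
    delta = round(current_score - previous_score)
    return f"({delta:+d}%)"


conn = sqlite3.connect(":memory:")  # the real sidecar lives at /data/compliance_audits.db
conn.executescript(SCHEMA)
runs = [
    ("run-1", "2026-05-10T09:00:00", "t1", "Example", "example.com", 3,
     json.dumps({"DSGVO": {"score": 60}}), "{}"),
    ("run-2", "2026-05-17T09:00:00", "t1", "Example", "example.com", 3,
     json.dumps({"DSGVO": {"score": 72}}), "{}"),
]
conn.executemany("INSERT INTO check_runs VALUES (?, ?, ?, ?, ?, ?, ?, ?)", runs)

prev = previous_scorecard(conn, "t1", "example.com", "2026-05-17T09:00:00")
badge = delta_badge(72, prev["DSGVO"]["score"])
print(badge)  # (+12%)
```

Storing ts as ISO-8601 text lets a single ORDER BY ts DESC LIMIT 1 find the previous run, which matches the "one SQLite query" note above. A real implementation may order by a different column or exclude the current check_id explicitly.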
2026-05-17 13:45:58 +02:00
parent 6d29191e9b
commit 6ed30dae5b
12 changed files with 1159 additions and 10 deletions
@@ -245,6 +245,38 @@ def _render_document(html: list[str], r: DocCheckResult) -> None:
        html.append('<div style="padding:8px 16px 12px">')
        for c in l1_checks:
            _render_l1_check(html, c, l2_by_parent.get(c.id, []))
+
+        # Master-Control aggregation: with 1874 MCs evaluated per run,
+        # rendering every L2 check inline produces ~600 rows per doc and
+        # makes the email unreadable. Show only top-N severe fails plus a
+        # one-line summary. Full results live in /sdk/agent/audit/<id>.
+        from compliance.api.agent_doc_check_scorecard import build_top_fails_html
+        from compliance.services.mc_scorecard import top_fails
+
+        mc_results = [
+            {"id": c.id, "label": c.label, "passed": c.passed,
+             "severity": c.severity, "skipped": c.skipped, "hint": c.hint,
+             "regulation": c.regulation}
+            for c in r.checks
+            if c.id.startswith("mc-")
+        ]
+        if mc_results:
+            n_total = len(mc_results)
+            n_passed = sum(1 for x in mc_results if x["passed"])
+            n_skipped = sum(1 for x in mc_results if x["skipped"])
+            n_failed = n_total - n_passed - n_skipped
+            html.append(
+                f'<div style="margin-top:12px;padding-top:8px;'
+                f'border-top:1px solid #e5e7eb;font-size:11px;color:#475569">'
+                f'<strong>Master-Controls:</strong> {n_passed}/'
+                f'{n_total - n_skipped} bestanden '
+                f'<span style="color:#dc2626">({n_failed} Fail)</span>'
+                f'{f" + {n_skipped} nicht anwendbar" if n_skipped else ""}.'
+                f'</div>'
+            )
+            top = top_fails(mc_results, n=10)
+            html.append(build_top_fails_html(top, r.label))
+
        if r.word_count:
            html.append(
                f'<div style="font-size:11px;color:#9ca3af;margin-top:8px;'