feat(agent): MC scorecard + audit drill-down + tenant trend (A1-A6)

Now that all 1874 Master-Controls (MCs) run on every check (Task #30
cap removal), the report was about to drown in noise. This commit adds
the full aggregation / persistence / drill-down stack so each MC is
actionable, not just counted.

A1 mc_scorecard.py (new):
  build_scorecard(checks)    -> per-regulation PASS/FAIL/SKIP + severity
  top_fails(checks, n)       -> N most severe failed MCs
  full_audit_records(...)    -> flat rows ready for sidecar SQLite
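
  Rough sketch of the aggregation described above, not the committed
  code. The function names come from this commit; the dict keys mirror
  the mc_results dicts built in _render_document. The severity ranking
  and the "UNCLASSIFIED" bucket are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed ordering (lower = more severe); the real module may rank differently.
SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def build_scorecard(checks):
    """Count PASS/FAIL/SKIP per regulation, plus HIGH/MEDIUM among the fails."""
    card = defaultdict(lambda: {"passed": 0, "failed": 0, "skipped": 0,
                                "HIGH": 0, "MEDIUM": 0})
    for c in checks:
        # Hypothetical bucket name for MCs without a regulation tag.
        row = card[c.get("regulation") or "UNCLASSIFIED"]
        if c.get("skipped"):
            row["skipped"] += 1
        elif c.get("passed"):
            row["passed"] += 1
        else:
            row["failed"] += 1
            sev = (c.get("severity") or "").upper()
            if sev in ("HIGH", "MEDIUM"):
                row[sev] += 1
    return dict(card)


def top_fails(checks, n=10):
    """Return the n most severe failed (not skipped) checks."""
    fails = [c for c in checks if not c.get("passed") and not c.get("skipped")]
    # sorted() is stable, so input order is kept within one severity level.
    fails = sorted(
        fails,
        key=lambda c: SEVERITY_RANK.get((c.get("severity") or "").upper(), 99),
    )
    return fails[:n]
```

  Skipped MCs are counted separately so they affect neither the pass
  nor the fail totals.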

A2 Email rendering:
  agent_doc_check_scorecard.py (new) builds an HTML scorecard table
  (regulation × passed/failed/HIGH/MEDIUM/score) shown at the top of
  the email. agent_doc_check_report._render_document now collapses
  the 500-MC L2 forest into an 'X/Y bestanden (Z Fail)' ("X/Y passed")
  summary plus a top-10 fails block per doc; the old verbose render
  is gone.

A3 compliance_audit_log.py (new) — sidecar SQLite at
  /data/compliance_audits.db (separate from compliance Postgres
  schema to comply with the no-new-migrations rule in CLAUDE.md):
    check_runs(check_id, ts, tenant_id, site_name, base_domain,
               doc_count, scorecard json, vvt_summary json)
    mc_results(check_id, doc_type, mc_id, label, passed, skipped,
               severity, regulation, matched_text, hint)
  Route persists every run after the email is sent.
  docker-compose.yml adds compliance-audit volume + env.
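
  The two tables above could be created and filled roughly like this.
  Table and column names are taken from this message; the column
  types, the primary key, the index and the helper names
  (open_audit_db, persist_run) are assumptions, not the contents of
  compliance_audit_log.py.

```python
import json
import sqlite3
import time


def open_audit_db(path=":memory:"):
    """Open the sidecar DB and create both tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS check_runs (
            check_id TEXT PRIMARY KEY, ts REAL, tenant_id TEXT,
            site_name TEXT, base_domain TEXT, doc_count INTEGER,
            scorecard TEXT, vvt_summary TEXT
        );
        CREATE TABLE IF NOT EXISTS mc_results (
            check_id TEXT, doc_type TEXT, mc_id TEXT, label TEXT,
            passed INTEGER, skipped INTEGER, severity TEXT,
            regulation TEXT, matched_text TEXT, hint TEXT
        );
        CREATE INDEX IF NOT EXISTS idx_mc_results_check
            ON mc_results(check_id);
    """)
    return conn


def persist_run(conn, run, rows):
    """Write one check run and its per-MC rows in a single transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO check_runs VALUES (?,?,?,?,?,?,?,?)",
            (run["check_id"], run.get("ts", time.time()), run["tenant_id"],
             run["site_name"], run["base_domain"], run["doc_count"],
             json.dumps(run["scorecard"]), json.dumps(run["vvt_summary"])),
        )
        conn.executemany(
            "INSERT INTO mc_results VALUES (?,?,?,?,?,?,?,?,?,?)",
            [(run["check_id"], r["doc_type"], r["mc_id"], r["label"],
              int(r["passed"]), int(r["skipped"]), r["severity"],
              r["regulation"], r["matched_text"], r["hint"]) for r in rows],
        )
```

  The JSON columns are stored as TEXT and decoded by the reader.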

A4 backfill_mc_regulation_llm.py (new) — Qwen-tagged backfill for
  the 1636 MCs the regex pass couldn't classify. Batches of 25,
  format=json, output constrained to the canonical regulation list.
  Run manually: docker exec bp-compliance-backend python3 \
                 /app/scripts/backfill_mc_regulation_llm.py [--dry-run]
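
  Sketch of the batching and output constraint described above. The
  model call itself is left out. The helper names, the
  {mc_id: regulation} reply shape and the placeholder regulation names
  are assumptions; the real canonical list lives in the script.

```python
import json

# Placeholder values; the actual canonical regulation list is not shown here.
CANONICAL = {"REG_A", "REG_B", "REG_C"}


def batched(items, size=25):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def parse_tags(raw_json, batch_ids, allowed=CANONICAL):
    """Keep only tags for MCs in this batch whose value is an allowed regulation.

    Assumes the model answers with a JSON object {mc_id: regulation}.
    Anything unparseable or off-list is dropped rather than guessed.
    """
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    ids = set(batch_ids)
    return {
        mc_id: reg
        for mc_id, reg in data.items()
        if mc_id in ids and isinstance(reg, str) and reg in allowed
    }
```

  Filtering on the batch ids also guards against the model inventing
  MC ids that were never sent.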

A5 Admin audit tab — GET /api/compliance/agent/audit/<check_id>
  proxied via /api/sdk/v1/agent/audit/<id>. New page
  /sdk/agent/audit/[checkId] renders scorecard + filterable MC table
  (status / doc_type / regulation, expandable rows with matched_text
  + hint). ComplianceCheckTab now shows a 'Voll-Audit oeffnen'
  ("open full audit") link.

A6 Trend per tenant — GET /api/compliance/agent/audit/tenant/<id>
  returns recent runs. Email scorecard shows per-regulation delta
  badges ('(+12%)', '(-3%)') compared with the previous run for the
  same tenant + base_domain. Lookup is one SQLite query.
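
  The previous-run lookup and the badge formatting might look like the
  following. Function names, the ts comparison and the rounding to
  whole percent are assumptions; only the check_runs columns and the
  badge format come from this message.

```python
import sqlite3


def previous_run_scorecard_json(conn, tenant_id, base_domain, before_ts):
    """Return the scorecard JSON of the latest earlier run, or None."""
    row = conn.execute(
        "SELECT scorecard FROM check_runs "
        "WHERE tenant_id = ? AND base_domain = ? AND ts < ? "
        "ORDER BY ts DESC LIMIT 1",
        (tenant_id, base_domain, before_ts),
    ).fetchone()
    return row[0] if row else None


def delta_badge(current_pct, previous_pct):
    """Format the score change as '(+12%)' / '(-3%)'; empty if no change."""
    if previous_pct is None:
        return ""  # first run for this tenant + domain: nothing to compare
    diff = round(current_pct - previous_pct)
    if diff == 0:
        return ""
    return f"({diff:+d}%)"
```

  Usage: delta_badge(72, 60) gives '(+12%)' and delta_badge(57, 60)
  gives '(-3%)'.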

Plumbing:
  rag_document_checker.py — SELECT now includes 'article'; MC results
    carry 'regulation' + 'article' through to CheckItem.
  agent_doc_check_routes.CheckItem schema gains regulation + article
    fields (defaults '') so old clients still parse.
  agent_compliance_check_routes — response gains 'check_id' so the
    frontend can build the audit link.
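
  Why the '' defaults keep older callers working, shown with a plain
  dataclass. CheckItemSketch is a hypothetical stand-in; the real
  CheckItem schema and its other fields are not reproduced here.

```python
from dataclasses import asdict, dataclass


@dataclass
class CheckItemSketch:
    # Pre-existing fields (subset, assumed for illustration).
    id: str
    label: str
    passed: bool
    # New optional fields: defaulting to '' means code that never sets
    # them still constructs a valid item.
    regulation: str = ""
    article: str = ""


# A caller written before this commit, unaware of the new fields.
legacy = CheckItemSketch(id="mc-1", label="example", passed=True)
payload = asdict(legacy)
```

  The serialized payload always carries both keys, so consumers can
  read them without checking for presence.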
Benjamin Admin
2026-05-17 13:45:58 +02:00
parent 6d29191e9b
commit 6ed30dae5b
12 changed files with 1159 additions and 10 deletions
@@ -245,6 +245,38 @@ def _render_document(html: list[str], r: DocCheckResult) -> None:
    html.append('<div style="padding:8px 16px 12px">')
    for c in l1_checks:
        _render_l1_check(html, c, l2_by_parent.get(c.id, []))
    # Master-Control aggregation: with 1874 MCs evaluated per run,
    # rendering every L2 check inline produces ~600 rows per doc and
    # makes the email unreadable. Show only top-N severe fails plus a
    # one-line summary. Full results live in /sdk/agent/audit/<id>.
    from compliance.api.agent_doc_check_scorecard import build_top_fails_html
    from compliance.services.mc_scorecard import top_fails
    mc_results = [
        {"id": c.id, "label": c.label, "passed": c.passed,
         "severity": c.severity, "skipped": c.skipped, "hint": c.hint,
         "regulation": c.regulation}
        for c in r.checks
        if c.id.startswith("mc-")
    ]
    if mc_results:
        n_total = len(mc_results)
        n_passed = sum(1 for x in mc_results if x["passed"])
        n_skipped = sum(1 for x in mc_results if x["skipped"])
        n_failed = n_total - n_passed - n_skipped
        html.append(
            f'<div style="margin-top:12px;padding-top:8px;'
            f'border-top:1px solid #e5e7eb;font-size:11px;color:#475569">'
            f'<strong>Master-Controls:</strong> {n_passed}/'
            f'{n_total - n_skipped} bestanden '
            f'<span style="color:#dc2626">({n_failed} Fail)</span>'
            f'{f" + {n_skipped} nicht anwendbar" if n_skipped else ""}.'
            f'</div>'
        )
        top = top_fails(mc_results, n=10)
        html.append(build_top_fails_html(top, r.label))
    if r.word_count:
        html.append(
            f'<div style="font-size:11px;color:#9ca3af;margin-top:8px;'