Now that all 1874 MCs run per check (Task #30 cap removal), the report
was about to drown in noise. This commit adds the full aggregation /
persistence / drill-down stack so each MC is actionable, not just
counted.
A1 mc_scorecard.py (new):
build_scorecard(checks) -> per-regulation PASS/FAIL/SKIP + severity
top_fails(checks, n) -> N most severe failed MCs
full_audit_records(...) -> flat rows ready for sidecar SQLite
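A minimal sketch of the aggregation described above, not the actual mc_scorecard.py. The dict keys (regulation, passed, skipped, severity), the "UNCLASSIFIED" bucket and the severity ranking are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed ordering; the real severity scale may differ.
_SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def build_scorecard(checks):
    """Aggregate MC results into per-regulation PASS/FAIL/SKIP counts."""
    card = defaultdict(
        lambda: {"passed": 0, "failed": 0, "skipped": 0, "high": 0, "medium": 0}
    )
    for c in checks:
        row = card[c.get("regulation") or "UNCLASSIFIED"]
        if c.get("skipped"):
            row["skipped"] += 1
        elif c.get("passed"):
            row["passed"] += 1
        else:
            row["failed"] += 1
            sev = (c.get("severity") or "").upper()
            if sev == "HIGH":
                row["high"] += 1
            elif sev == "MEDIUM":
                row["medium"] += 1
    for row in card.values():
        evaluated = row["passed"] + row["failed"]  # skipped MCs do not count
        row["score"] = round(100 * row["passed"] / evaluated) if evaluated else None
    return dict(card)


def top_fails(checks, n=10):
    """Return the n most severe failed (not skipped) MCs."""
    fails = [c for c in checks if not c.get("passed") and not c.get("skipped")]
    return sorted(
        fails, key=lambda c: _SEVERITY_RANK.get((c.get("severity") or "").upper(), 3)
    )[:n]
```

Skipped MCs are left out of the score denominator here so that a regulation with many skips is not penalised; whether the real scorecard does the same is not stated in this commit.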
A2 Email rendering:
agent_doc_check_scorecard.py (new) builds an HTML scorecard table
(regulation × passed/failed/HIGH/MEDIUM/score) shown at the top of
the email. agent_doc_check_report._render_document now collapses
the 500-MC L2 forest into an 'X/Y bestanden (Z Fail)' ("X/Y passed")
summary plus a top-10 fails block per doc; the old verbose render is gone.
A3 compliance_audit_log.py (new) — sidecar SQLite at
/data/compliance_audits.db (separate from compliance Postgres
schema to comply with the no-new-migrations rule in CLAUDE.md):
check_runs(check_id, ts, tenant_id, site_name, base_domain,
doc_count, scorecard json, vvt_summary json)
mc_results(check_id, doc_type, mc_id, label, passed, skipped,
severity, regulation, matched_text, hint)
Route persists every run after the email is sent.
docker-compose.yml adds compliance-audit volume + env.
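The two sidecar tables could look roughly like this. Column names come from the list above; the column types, the primary key and the persist_run helper are assumptions, not the shipped compliance_audit_log.py.

```python
import json
import sqlite3

# Column names follow the commit message; types and keys are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS check_runs (
    check_id    TEXT PRIMARY KEY,
    ts          TEXT NOT NULL,
    tenant_id   TEXT,
    site_name   TEXT,
    base_domain TEXT,
    doc_count   INTEGER,
    scorecard   TEXT,
    vvt_summary TEXT
);
CREATE TABLE IF NOT EXISTS mc_results (
    check_id TEXT NOT NULL REFERENCES check_runs(check_id),
    doc_type TEXT, mc_id TEXT, label TEXT,
    passed INTEGER, skipped INTEGER,
    severity TEXT, regulation TEXT, matched_text TEXT, hint TEXT
);
"""


def persist_run(conn, run, rows):
    """Store one run and its MC rows in a single transaction (hypothetical helper)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO check_runs VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (run["check_id"], run["ts"], run["tenant_id"], run["site_name"],
             run["base_domain"], run["doc_count"],
             json.dumps(run["scorecard"]), json.dumps(run["vvt_summary"])),
        )
        conn.executemany(
            "INSERT INTO mc_results VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
            [(run["check_id"], r["doc_type"], r["mc_id"], r["label"],
              int(r["passed"]), int(r["skipped"]), r["severity"],
              r["regulation"], r["matched_text"], r["hint"]) for r in rows],
        )
```

The JSON columns are stored as TEXT, which keeps the sidecar free of SQLite extensions.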
A4 backfill_mc_regulation_llm.py (new) — Qwen-tagged backfill for
the 1636 MCs the regex pass couldn't classify. Batches of 25,
format=json, output constrained to the canonical regulation list.
Run manually: docker exec bp-compliance-backend python3 \
/app/scripts/backfill_mc_regulation_llm.py [--dry-run]
A5 Admin audit tab — GET /api/compliance/agent/audit/<check_id>
proxied via /api/sdk/v1/agent/audit/<id>. New page
/sdk/agent/audit/[checkId] renders scorecard + filterable MC table
(status / doc_type / regulation, expandable rows with matched_text
+ hint). ComplianceCheckTab now shows a 'Voll-Audit oeffnen' (open full audit) link.
A6 Trend per tenant — GET /api/compliance/agent/audit/tenant/<id>
returns recent runs. Email scorecard shows per-regulation delta
badges ('(+12%)', '(-3%)') compared with the previous run for the
same tenant + base_domain. Lookup is one SQLite query.
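A sketch of the single-query lookup and the badge formatting, under two assumptions: the stored scorecard JSON is a flat mapping of regulation to score in percent, and the delta is a percentage-point difference. The function names are hypothetical.

```python
import json
import sqlite3


def previous_scores(conn, tenant_id, base_domain, before_ts):
    """Fetch the scorecard of the latest earlier run for tenant + base_domain."""
    row = conn.execute(
        "SELECT scorecard FROM check_runs "
        "WHERE tenant_id = ? AND base_domain = ? AND ts < ? "
        "ORDER BY ts DESC LIMIT 1",
        (tenant_id, base_domain, before_ts),
    ).fetchone()
    return json.loads(row[0]) if row else {}


def delta_badge(current, previous):
    """Format the change versus the previous run, e.g. '(+12%)' or '(-3%)'."""
    if current is None or previous is None:
        return ""  # first run or regulation not seen before
    diff = round(current - previous)
    return f"({diff:+d}%)" if diff else ""
```

ISO-8601 timestamps stored as TEXT sort lexicographically, so the `ts < ?` comparison works without date parsing.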
Plumbing:
rag_document_checker.py — SELECT now includes 'article'; MC results
carry 'regulation' + 'article' through to CheckItem.
agent_doc_check_routes.CheckItem schema gains regulation + article
fields (defaults '') so old clients still parse.
agent_compliance_check_routes — response gains 'check_id' so the
frontend can build the audit link.
User: 'we created 1800 MCs only to use 10% of them; that is
nonsense'. Fixed all 6 gaps from the audit.
#1 max_controls=0 (was 20):
- agent_compliance_check_routes _check_single: passes max_controls=0 to
check_document_with_controls -> ALL MCs evaluated per doc_type.
- 8 doc_types now use 1874 MCs instead of 160 (roughly 12x coverage).
- Regex matching is cheap (<1s per doc); LLM-enrich cap of 10 stays.
#2 LLM-verify fixed:
- llm_verify.py was getting 0/N parsed. Causes: qwen3 thinking-mode
wrapped output in <think>...</think>, /api/generate doesn't enforce
JSON, prompt didn't handle code-fence wrappers.
- Now uses /api/chat with format='json' (forces valid JSON).
- _parse_batch_response strips <think> tags, accepts {results:[...]}
AND bare [...], adds richer regex-fallback parse, logs raw head on
total parse failure for diagnosis.
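The tolerant parsing described above might be sketched as follows. This is an illustration, not the real _parse_batch_response: the fallback here only looks for the first JSON array and is simpler than the "richer regex-fallback" the commit mentions, and logging is omitted.

```python
import json
import re

_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Leading/trailing Markdown code fence, optionally tagged "json".
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)


def parse_batch_response(raw):
    """Return the list of result objects, or [] if nothing parseable is found."""
    text = _THINK_RE.sub("", raw).strip()
    text = _FENCE_RE.sub("", text).strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Simplified fallback: take the outermost [...] span in the text.
        match = re.search(r"\[.*\]", text, re.DOTALL)
        if not match:
            return []
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            return []
    if isinstance(data, dict):
        data = data.get("results", [])  # accept {results: [...]}
    return data if isinstance(data, list) else []  # and bare [...]
```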
#3 Loeschkonzept checklist (new):
- doc_checks/loeschkonzept_checks.py — 9 L1 + 7 L2 checks per DIN 66398
+ Art. 5(1)(e)/17/32 DSGVO: scope+responsibility, data categories,
retention periods, legal basis refs (HGB/AO/BGB), deletion trigger,
deletion process+technical+systems, deletion proof, exceptions +
Art. 18 lock, review cycle, DSGVO references.
- Registered in runner.py for loeschkonzept/loeschung/loeschfristen.
#4 regulation backfill script:
- backend-compliance/scripts/backfill_mc_regulation.py — regex-detects
DSGVO/TDDDG/TMG/BGB/HGB/AO/MStV/UWG/VSBG/PAngV/GwG/BDSG/EU-VO
references in MC title+question+pass_criteria, UPDATEs regulation +
article fields.
- Idempotent (only NULL rows), --dry-run flag, batched 200/UPDATE.
- Run inside container: docker exec bp-compliance-backend python3 \
/app/scripts/backfill_mc_regulation.py
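A rough sketch of the detection step, assuming simple word-boundary patterns. The real script's patterns, the EU-VO handling (left out here) and the row shape are not shown in this commit, so everything below is illustrative.

```python
import re

_REGULATIONS = ("DSGVO", "TDDDG", "TMG", "BGB", "HGB", "AO", "MStV",
                "UWG", "VSBG", "PAngV", "GwG", "BDSG")
_REG_RE = re.compile(r"\b(" + "|".join(_REGULATIONS) + r")\b")
_ART_RE = re.compile(r"(?:Art\.|§)\s*\d+[a-z]?")


def detect_regulation(text):
    """Return (regulation, article) for the first references found, else None."""
    reg = _REG_RE.search(text)
    art = _ART_RE.search(text)
    return (reg.group(1) if reg else None, art.group(0) if art else None)


def plan_updates(rows):
    """Build (id, regulation, article) updates; rows already tagged are skipped."""
    updates = []
    for row in rows:
        if row.get("regulation") is not None:  # idempotent: only NULL rows
            continue
        text = " ".join(
            str(row.get(key) or "") for key in ("title", "question", "pass_criteria")
        )
        reg, art = detect_regulation(text)
        if reg:
            updates.append((row["id"], reg, art or ""))
    return updates
```

Because tagged rows are skipped, running the plan twice over the same data yields no further updates, which is the idempotency property the commit describes.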
#5 MC alias-fallback:
- rag_document_checker._MC_ALIAS_FALLBACK maps doc_types without own
MCs to a related set: nutzungsbedingungen->agb, social_media->dse,
sub_processor/scc/tom_annex->avv, loeschfristen->loeschkonzept,
eu_institution/dsb->dse.
- _load_controls retries with the alias when the primary query
returns 0 rows.
- 14 additional doc_types now get MC coverage transparently.
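The retry logic can be illustrated like this. The mapping reproduces the aliases listed above; `query` is a hypothetical callable standing in for the real (probably async) DB query, so this is a sketch rather than the actual _load_controls.

```python
_MC_ALIAS_FALLBACK = {
    "nutzungsbedingungen": "agb",
    "social_media": "dse",
    "sub_processor": "avv",
    "scc": "avv",
    "tom_annex": "avv",
    "loeschfristen": "loeschkonzept",
    "eu_institution": "dse",
    "dsb": "dse",
}


def load_controls(doc_type, query):
    """Load MCs for doc_type; retry once with the alias if none are found."""
    rows = query(doc_type)
    if not rows:
        alias = _MC_ALIAS_FALLBACK.get(doc_type)
        if alias:
            rows = query(alias)
    return rows
```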
#6 cross-domain auto-discovery:
- _autodiscover_missing builds a crawl plan: primary submitted base
+ up to 2 related domains sharing the owner SLD (e.g. BMW Group:
bmw.de + bmwgroup.com + bmwgroup.jobs).
- Detection: regex over submitted texts for https?://...<owner>...
hostnames distinct from the primary base.
- Each crawled base contributes documents + cmp_payloads to the
discovery pool.
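A simplified sketch of the detection step. It treats any hostname containing the owner label as related, which is looser than a proper SLD comparison; the function name and parameters are hypothetical.

```python
import re
from urllib.parse import urlparse

_URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def related_bases(texts, primary_base, owner, limit=2):
    """Collect up to `limit` hostnames that mention `owner` and differ from the primary base."""
    found = []
    for text in texts:
        for url in _URL_RE.findall(text):
            try:
                host = (urlparse(url).hostname or "").lower()
            except ValueError:  # malformed URL, e.g. broken IPv6 brackets
                continue
            if host.startswith("www."):
                host = host[4:]
            if owner in host and host != primary_base and host not in found:
                found.append(host)
    return found[:limit]
```

With the BMW example from above, `related_bases(texts, "bmw.de", "bmw")` would pick up bmwgroup.com and bmwgroup.jobs if both appear in the submitted texts.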
Net effect for BMW: 1874 MCs evaluated (90 from cookie alone, was
20), Loeschkonzept mandatory disclosures can now be graded, LLM
overturns false regex FAILs, Joint-Controller policies on bmwgroup.jobs
(Social Media) are now discoverable. The same wins will apply to the
CRA-Compliance check.
New agent architecture for intelligent MC evaluation:
agent_tools.py (367 LOC):
- 5 tools in OpenAI function-calling format
- query_controls: async DB query for MCs by doc_type
- evaluate_controls_batch: deterministic keyword matching
- search_document: text search with context
- get_document_stats: word count, sections, language
- submit_results: finalize check results
compliance_agent.py (398 LOC):
- ComplianceAgent class with agent loop
- 3 LLM providers: Ollama, OpenAI-compatible (OVH), Anthropic
- Tool call dispatch + result collection
- System prompt for systematic compliance analysis
- run_compliance_check() convenience function
Hybrid mode:
- COMPLIANCE_USE_AGENT=false (default): deterministic regex
- COMPLIANCE_USE_AGENT=true: LLM agent with tool calling
- Agent fallback to regex if LLM unavailable
Works with Qwen 35B (Ollama), Qwen 120B (OVH vLLM), Claude.
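The hybrid switch could be wired up roughly as below. Only the COMPLIANCE_USE_AGENT variable comes from this commit; the callables and the choice of ConnectionError as the "LLM unavailable" signal are assumptions.

```python
import os


def use_agent(env=None):
    """True only when COMPLIANCE_USE_AGENT is explicitly set to 'true'."""
    env = os.environ if env is None else env
    return env.get("COMPLIANCE_USE_AGENT", "false").strip().lower() == "true"


def run_check(document, regex_check, agent_check, env=None):
    """Run the agent when enabled; fall back to the regex path if it is unreachable."""
    if use_agent(env):
        try:
            return agent_check(document)
        except ConnectionError:
            pass  # LLM unavailable: fall through to the deterministic path
    return regex_check(document)
```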
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deterministic pass/fail stays unchanged. After keyword checking,
ONE batched LLM call enriches the 10 most severe FAILs with
context-specific recommendations based on the actual document.
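Example: If the document uses Google Analytics but lacks a transfer
mechanism → LLM generates: "Sie nutzen Google Analytics (USA).
Ergaenzen Sie einen Verweis auf das EU-US Data Privacy Framework
und pruefen Sie die DPF-Zertifizierung unter dataprivacyframework.gov."
(English: "You use Google Analytics (USA). Add a reference to the
EU-US Data Privacy Framework and verify the DPF certification at
dataprivacyframework.gov.")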
- Pass/fail: deterministic (keyword matching, reproducible)
- Hint enrichment: LLM (contextual, one call for all fails)
- Temperature 0.3 for consistency
- Graceful fallback if Ollama unavailable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced LLM-based MC verification with deterministic keyword matching:
- Extracts keywords from pass_criteria/fail_criteria
- Matches against document text via regex (case-insensitive)
- PASS if >= 60% of criteria keywords found AND no fail_criteria triggered
- Same text + same MCs = same result every time
Checks ALL MCs for the doc_type (max_controls=0):
- DSE: all 571 controls checked in <1 second
- Impressum: all 75 controls
- Cookie: all 381 controls
No LLM calls needed — purely deterministic keyword matching.
Bigram extraction for compound terms (e.g. "standardvertragsklauseln").
Stop word filtering for German legal text.
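A minimal sketch of the 60% rule, with several assumptions: keywords are pooled across all pass_criteria, a fail criterion counts as triggered when all of its keywords occur, matching is plain lowercase substring search instead of the regex matching used in the real code, bigram extraction is omitted, and the stop-word list is a small illustrative subset.

```python
import re

# Small illustrative subset; the real German stop-word list is not shown.
_STOP_WORDS = {"der", "die", "das", "und", "oder", "ein", "eine", "ist", "sind",
               "mit", "von", "den", "dem", "des", "auf", "fuer", "für", "wird"}


def keywords(criterion):
    """Lowercased tokens longer than 2 characters that are not stop words."""
    words = re.findall(r"[a-zäöüß0-9]+", criterion.lower())
    return [w for w in words if len(w) > 2 and w not in _STOP_WORDS]


def _all_present(kws, doc):
    return bool(kws) and all(k in doc for k in kws)


def evaluate(pass_criteria, fail_criteria, text, threshold=0.6):
    """PASS if >= threshold of pass keywords occur and no fail criterion triggers."""
    doc = text.lower()
    pass_kws = {k for c in pass_criteria for k in keywords(c)}
    if not pass_kws:
        return False
    hits = sum(1 for k in pass_kws if k in doc)
    fail_hit = any(_all_present(keywords(c), doc) for c in fail_criteria)
    return hits / len(pass_kws) >= threshold and not fail_hit
```

No randomness or model call is involved, so the same text and the same MCs always yield the same verdict.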
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrote rag_document_checker.py to use the doc_check_controls table
instead of the generic canonical_controls. Each MC has:
- check_question: binary YES/NO for LLM
- pass_criteria: JSONB list of concrete requirements
- fail_criteria: JSONB list of common mistakes
Flow: Regex checks (fast) → LLM verify FAILs → MC deep check (15 per doc)
MC results appear as additional L2 checks in the report.
Coverage: 571 DSE, 381 Cookie, 309 Loeschkonzept, 153 Widerruf,
147 DSFA, 125 AVV, 113 AGB, 75 Impressum = 1874 total.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Only use controls whose test_procedure mentions document-type-specific terms:
- DSI: test_procedure must contain 'datenschutzerkl' or 'art. 13/14'
- Cookie: must contain 'cookie', 'einwilligung', 'consent'
- Impressum: must contain 'impressum'
This filters out internal governance controls (Datenmodelle, Infrastruktur)
that are irrelevant for public document checks.
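The filter might look like this. It assumes that any one of the listed terms is enough to keep a control and that 'art. 13/14' stands for the two literals 'art. 13' and 'art. 14'; both readings are guesses, as is the dict shape of a control.

```python
# Terms per document type, as listed above (interpretation assumed).
_DOC_TERMS = {
    "dsi": ("datenschutzerkl", "art. 13", "art. 14"),
    "cookie": ("cookie", "einwilligung", "consent"),
    "impressum": ("impressum",),
}


def relevant_controls(controls, doc_type):
    """Keep controls whose test_procedure mentions a term for this doc_type."""
    terms = _DOC_TERMS.get(doc_type, ())
    kept = []
    for control in controls:
        procedure = (control.get("test_procedure") or "").lower()
        if any(term in procedure for term in terms):
            kept.append(control)
    return kept
```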
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete rewrite of rag_document_checker.py:
- Queries canonical_controls table (294K controls, 10K data_protection)
- Filters by category + title keywords per document type
- Uses test_procedure field as actual check instructions
- Regex pre-check extracts key terms from procedure → fast match
- LLM fallback only for regex misses (saves tokens)
- /no_think prefix for direct JSON output
SQL approach advantages:
- Structured data with test_procedure, pass_criteria, fail_criteria
- Category filtering (data_protection, compliance, governance)
- No Qdrant API key issues
- Controls are actual check criteria, not general legal texts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLM returns {fulfilled: true} instead of {"fulfilled": true}.
Now fixes unquoted keys, True→true, and falls back to text-based
boolean extraction when JSON parsing fails entirely.
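A sketch of the repair chain, not the committed code. The regexes are deliberately naive and could also rewrite text inside string values; the name parse_lenient and the yes/no handling in the last step are assumptions.

```python
import json
import re


def parse_lenient(raw):
    """Parse near-JSON LLM output; return a dict, or None if nothing is recoverable."""
    text = raw.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Quote bare keys: {fulfilled: true} -> {"fulfilled": true}
    fixed = re.sub(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)', r'\1"\2"\3', text)
    # Python-style literals -> JSON literals
    fixed = re.sub(r"\bTrue\b", "true", fixed)
    fixed = re.sub(r"\bFalse\b", "false", fixed)
    fixed = re.sub(r"\bNone\b", "null", fixed)
    try:
        return json.loads(fixed)
    except json.JSONDecodeError:
        pass
    # Last resort: pull the boolean out of free text.
    match = re.search(r"fulfilled\W+(true|false|yes|no)", text, re.IGNORECASE)
    if match:
        return {"fulfilled": match.group(1).lower() in ("true", "yes")}
    return None
```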
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Qwen 3.5 uses all tokens for thinking, leaving response empty.
Using /no_think prefix to get direct JSON output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Qwen 3.5 with latest Ollama returns structured thinking in separate
'thinking' field, leaving 'response' empty. Now checks both fields.
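Checking both fields can be as small as the helper below. The field names come from the message above; the function name and the dict-shaped payload are assumptions.

```python
def extract_llm_text(payload):
    """Prefer 'response'; fall back to 'thinking' when the response is empty."""
    for field in ("response", "thinking"):
        value = payload.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""
```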
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces scroll+filter approach with proper semantic search:
1. Embed query via bp-core-embedding-service (bge-m3, 1024 dim)
2. Vector search in Qdrant (bp_compliance_datenschutz + bp_compliance_gesetze)
3. Sort by cosine similarity score
4. No API key needed — local Qdrant on Mac Mini
Falls back gracefully: SDK first, then semantic Qdrant, then empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go SDK points to external Qdrant (qdrant-dev.breakpilot.ai) with expired API key.
Fallback: search directly in local Qdrant (bp-core-qdrant:6333) which has
all collections: bp_compliance_datenschutz, bp_compliance_gesetze, atomic_controls_dedup.
Search strategy:
1. Try Go SDK RAG endpoint (preferred, has embedding-based search)
2. Fallback: Qdrant scroll with text-based regulation filter
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New module: rag_document_checker.py
- Searches RAG (Qdrant) for controls relevant to document type
- Filters by regulation (DSGVO Art.13, TDDDG §25, BGB §355 etc.)
- LLM (Qwen 3.5:35b) verifies each control against document text
- Returns fulfilled/missing with evidence text + severity
- Supports: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept
Integration in doc-check endpoint:
- Regex checklist runs first (fast, deterministic)
- RAG checks run after (semantic, catches what regex misses)
- Both results combined in single response
LLM prompt returns JSON: {fulfilled, evidence, issue, severity}
Think-tags stripped, JSON extracted from response.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>