User: 'we created 1800 MCs only to use 10% of them, that is
nonsense'. Fixed all 6 gaps from the audit.
#1 max_controls=0 (was 20):
- agent_compliance_check_routes _check_single: passes max_controls=0 to
check_document_with_controls -> ALL MCs evaluated per doc_type.
- 8 doc_types now use 1874 MCs instead of 160 (~12x coverage).
- Regex matching is cheap (<1s per doc); LLM-enrich cap of 10 stays.
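A minimal sketch of the cap semantics described in #1, assuming that
max_controls=0 means "no cap". The helper name select_controls is
hypothetical and is not the actual check_document_with_controls code.

```python
def select_controls(controls, max_controls=0):
    """Return the controls to evaluate for one document.

    Assumption for this sketch: 0 (or any non-positive value) disables
    the cap, so every control for the doc_type is evaluated.
    """
    controls = list(controls)
    if max_controls <= 0:
        return controls
    return controls[:max_controls]
```

With the old default of 20, a doc_type with 90 controls would be cut to
20; with 0 all 90 are returned.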
#2 LLM-verify fixed:
- llm_verify.py was getting 0/N parsed. Causes: qwen3 thinking-mode
wrapped output in <think>...</think>, /api/generate doesn't enforce
JSON, prompt didn't handle code-fence wrappers.
- Now uses /api/chat with format='json' (forces valid JSON).
- _parse_batch_response strips <think> tags, accepts {results:[...]}
AND bare [...], adds richer regex-fallback parse, logs raw head on
total parse failure for diagnosis.
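The tolerant parsing in #2 could look roughly like the sketch below. It
is an illustration under assumptions, not the real
_parse_batch_response: the function name, the logger, and the shape of
the result items are hypothetical, and the fallback here is a single
regex for the first [...] span.

```python
import json
import logging
import re

log = logging.getLogger("llm_verify_sketch")  # hypothetical logger name

_FENCE = "`" * 3  # code-fence marker, built indirectly to keep this block readable
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
_FENCE_RE = re.compile(r"^" + _FENCE + r"(?:json)?\s*|\s*" + _FENCE + r"$", re.IGNORECASE)
_ARRAY_RE = re.compile(r"\[.*\]", re.DOTALL)


def parse_batch_response(raw):
    """Return a list of result items, or [] when nothing can be parsed."""
    # Drop thinking-mode output and code-fence wrappers first.
    text = _THINK_RE.sub("", raw).strip()
    text = _FENCE_RE.sub("", text).strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fallback: try the first [...] span embedded in surrounding prose.
        match = _ARRAY_RE.search(text)
        try:
            data = json.loads(match.group(0)) if match else None
        except json.JSONDecodeError:
            data = None
    # Accept both {"results": [...]} and a bare [...].
    if isinstance(data, dict):
        data = data.get("results")
    if not isinstance(data, list):
        log.warning("batch parse failed, raw head: %r", raw[:200])
        return []
    return data
```

Logging only the first 200 characters keeps the log readable while
still showing which wrapper the model produced.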
#3 Loeschkonzept checklist (new):
- doc_checks/loeschkonzept_checks.py — 9 L1 + 7 L2 checks per DIN 66398
+ Art. 5(1)(e)/17/32 DSGVO: scope+responsibility, data categories,
retention periods, legal basis refs (HGB/AO/BGB), deletion trigger,
deletion process+technical+systems, deletion proof, exceptions +
Art. 18 lock, review cycle, DSGVO references.
- runner.py registered for loeschkonzept/loeschung/loeschfristen.
#4 regulation backfill script:
- backend-compliance/scripts/backfill_mc_regulation.py — regex-detects
DSGVO/TDDDG/TMG/BGB/HGB/AO/MStV/UWG/VSBG/PAngV/GwG/BDSG/EU-VO
references in MC title+question+pass_criteria, UPDATEs regulation +
article fields.
- Idempotent (only NULL rows), --dry-run flag, batched 200/UPDATE.
- Run inside container: docker exec bp-compliance-backend python3 \
/app/scripts/backfill_mc_regulation.py
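For illustration, the detection and batching in #4 might be sketched as
follows. This is not the script itself: the function names and the
patterns are assumptions, EU-VO detection is left out, and the article
match is simply the first "Art. N" or "§ N" found in the text.

```python
import re

# Abbreviations taken from the list above; the pattern itself is an assumption.
_REGULATIONS = ("DSGVO", "TDDDG", "TMG", "BGB", "HGB", "AO", "MStV",
                "UWG", "VSBG", "PAngV", "GwG", "BDSG")
_REG_RE = re.compile(r"\b(" + "|".join(_REGULATIONS) + r")\b")
_ARTICLE_RE = re.compile(r"(?:Art\.|§)\s*\d+[a-z]?")


def detect_regulation(*fields):
    """Return (regulation, article) found in title/question/pass_criteria."""
    text = " ".join(f for f in fields if f)
    reg = _REG_RE.search(text)
    if not reg:
        return None, None
    art = _ARTICLE_RE.search(text)
    return reg.group(1), (art.group(0) if art else None)


def batched(rows, size=200):
    """Yield slices of at most `size` rows, one slice per UPDATE."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Restricting the input to rows where regulation is NULL is what makes a
re-run a no-op; that filter would live in the SELECT, which is not
shown here.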
#5 MC alias-fallback:
- rag_document_checker._MC_ALIAS_FALLBACK maps doc_types without own
MCs to a related set: nutzungsbedingungen->agb, social_media->dse,
sub_processor/scc/tom_annex->avv, loeschfristen->loeschkonzept,
eu_institution/dsb->dse.
- _load_controls retries with the alias when the primary query
returns 0 rows.
- 14 additional doc_types now get MC coverage transparently.
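The retry in #5 can be sketched like this. The mapping mirrors the
pairs listed above (reading "a/b->c" as both a and b mapping to c);
load_controls and the query callable are hypothetical stand-ins for
_load_controls and its database query.

```python
_MC_ALIAS_FALLBACK = {
    "nutzungsbedingungen": "agb",
    "social_media": "dse",
    "sub_processor": "avv",
    "scc": "avv",
    "tom_annex": "avv",
    "loeschfristen": "loeschkonzept",
    "eu_institution": "dse",
    "dsb": "dse",
}


def load_controls(doc_type, query):
    """Load MCs for doc_type; retry once with the alias if none exist.

    `query` is any callable mapping a doc_type to a list of rows.
    """
    rows = query(doc_type)
    if not rows:
        alias = _MC_ALIAS_FALLBACK.get(doc_type)
        if alias:
            rows = query(alias)
    return rows
```

Because the alias is only consulted on an empty result, doc_types that
later get their own MCs stop using the fallback without a code change.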
#6 cross-domain auto-discovery:
- _autodiscover_missing builds a crawl plan: primary submitted base
+ up to 2 related domains sharing the owner SLD (e.g. BMW Group:
bmw.de + bmwgroup.com + bmwgroup.jobs).
- Detection: regex over submitted texts for https?://...<owner>...
hostnames distinct from the primary base.
- Each crawled base contributes documents + cmp_payloads to the
discovery pool.
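A rough sketch of the detection step in #6, assuming the owner label is
matched as a substring of the hostname. The function name
related_bases, its parameters, and the URL pattern are hypothetical;
the real _autodiscover_missing may differ.

```python
import re
from urllib.parse import urlsplit

_URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def _host(url):
    """Lower-cased hostname without a leading 'www.', or '' if unparsable."""
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""
    return host[4:] if host.startswith("www.") else host


def related_bases(texts, primary_base, owner, limit=2):
    """Return up to `limit` extra bases whose hostname contains `owner`."""
    primary = _host(primary_base)
    found = []
    for text in texts:
        for url in _URL_RE.findall(text):
            # Trim sentence punctuation that the URL pattern picks up.
            host = _host(url.rstrip(".,;"))
            if not host or host == primary or owner.lower() not in host:
                continue
            base = "https://" + host
            if base not in found:
                found.append(base)
    return found[:limit]
```

The crawl plan would then be the primary base followed by these
results. Substring matching is deliberately loose and can pick up
unrelated hosts that merely contain the owner string.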
Net effect for BMW: 1874 MCs evaluated (90 from cookie alone, was
20), Loeschkonzept mandatory disclosures (Pflichtangaben) can now be
graded, LLM overturns false regex FAILs, Joint-Controller policies on
bmwgroup.jobs (Social Media) are now discoverable. The same wins will
apply to the CRA-Compliance check.