Clean-Room derivation of 195 controls from BSI QUAIDAL (10 criteria + 15
building blocks + 30 measures + 140 metrics) for EU AI Act Art. 10
training-data quality compliance.
- ingest_bsi_quaidal.py parses YAML frontmatter into a structural index
(no protected prose stored on disk).
- derive_quaidal_mcs.py rewrites each entry via local LLM (qwen3.5:35b-a3b)
with a hard 4-gram plagiarism gate < 20%; achieved mean overlap 0.5%.
- Migration 011 adds compliance.derived_controls table with full source
provenance (framework, section, url, commit SHA, license note).
- apply_quaidal_to_db.py UPSERTs YAML into DB.
- Source repo (legal-sources/bsi-quaidal/) gitignored.
Same pattern as IACE module DIN-reference handling: name the norm and
section, never quote.
Backed by BSI license clarification 2026-05: § 5 UrhG anwendbar,
share:true im Frontmatter; Clean-Room derivation is the safe path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>