# Canonical Control Library (CP-CLIB)

Independently formulated security controls based on open knowledge (OWASP, NIST, ENISA). The taxonomy is independent and makes no reference to proprietary frameworks.

**Prefix:** `CP-CLIB` · **Frontend:** `https://macmini:3007/sdk/control-library`
**Provenance Wiki:** `https://macmini:3007/sdk/control-provenance`
**Proxy:** `/api/sdk/v1/canonical` → `backend-compliance:8002/api/v1/canonical/...`

---

## Motivation

We need a system that extracts **independent, legally defensible controls** from various security guidelines without using proprietary text in the product.

### Core Principles

1. **Independent taxonomy:** own domain IDs (AUTH, NET, SUP, etc.) and own ID format (`DOMAIN-NNN`)
2. **Open-source anchoring:** every control has at least 1 open anchor (OWASP/NIST/ENISA)
3. **Strict source separation:** protected sources are used internally for analysis only, never in the product
4. **Automated checks:** Too-Close detector + No-Leak scanner in CI/CD

---

## Legal Basis

| Law | Relevance |
|-----|-----------|
| UrhG §44b | Text and data mining; copies must be deleted |
| UrhG §23 | Sufficient distance from the original work |
| BSI terms of use | Commercial use only with consent |

---

## Domains (Independent Taxonomy)

| Domain | Name | Description |
|--------|------|-------------|
| AUTH | Identity & Access Management | Authentication, MFA, token management |
| NET | Network & Transport Security | TLS, certificates, network hardening |
| SUP | Software Supply Chain | Signing, SBOM, dependency scanning |
| LOG | Security Operations & Logging | Privacy-aware logging, SIEM |
| WEB | Web Application Security | Admin flows, account recovery |
| DATA | Data Governance & Classification | Data classification, protective measures |
| CRYP | Cryptographic Operations | Key management, rotation, HSM |
| REL | Release & Change Governance | Change impact assessment, security review |

!!! warning "No BSI nomenclature"
    The domains deliberately do NOT use BSI identifiers (O.Auth_*, O.Netz_*). The ID format `DOMAIN-NNN` is a common, non-proprietary convention.

---

## Data Model (Migration 044)

```mermaid
erDiagram
    canonical_control_licenses ||--o{ canonical_control_sources : "has"
    canonical_control_frameworks ||--o{ canonical_controls : "contains"
    canonical_controls ||--o{ canonical_control_mappings : "has"
    canonical_control_sources ||--o{ canonical_control_mappings : "references"

    canonical_control_licenses {
        varchar license_id PK
        varchar name
        varchar commercial_use
        boolean deletion_required
    }
    canonical_control_sources {
        uuid id PK
        varchar source_id UK
        varchar title
        boolean allowed_ship_in_product
    }
    canonical_control_frameworks {
        uuid id PK
        varchar framework_id UK
        varchar name
        varchar version
    }
    canonical_controls {
        uuid id PK
        uuid framework_id FK
        varchar control_id
        varchar severity
        jsonb open_anchors
    }
    canonical_control_mappings {
        uuid id PK
        uuid control_id FK
        uuid source_id FK
        varchar mapping_type
        varchar attribution_class
    }
```

### Tables

| Table | Purpose | Shippable in product? |
|-------|---------|-----------------------|
| `canonical_control_licenses` | License metadata | Yes (read-only) |
| `canonical_control_sources` | Source register | **No** (internal only) |
| `canonical_control_frameworks` | Framework registry | Yes |
| `canonical_controls` | The controls themselves | Yes |
| `canonical_control_mappings` | Provenance trail | **No** (audit only) |

---

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/v1/canonical/frameworks` | All frameworks |
| `GET` | `/v1/canonical/frameworks/{id}` | Framework details |
| `GET` | `/v1/canonical/frameworks/{id}/controls` | Controls of a framework |
| `GET` | `/v1/canonical/controls` | All controls (filters: `severity`, `domain`, `release_state`) |
| `GET` | `/v1/canonical/controls/{control_id}` | Single control (e.g. AUTH-001) |
| `GET` | `/v1/canonical/sources` | Source register with permissions |
| `GET` | `/v1/canonical/licenses` | License matrix |
| `POST` | `/v1/canonical/controls/{id}/similarity-check` | Too-Close check |

### Example: Fetch a Control

```bash
curl -s https://macmini:8002/api/v1/canonical/controls/AUTH-001 | jq
```

### Example: Similarity Check

```bash
curl -X POST https://macmini:8002/api/v1/canonical/controls/AUTH-001/similarity-check \
  -H 'Content-Type: application/json' \
  -d '{
    "source_text": "Die Anwendung muss MFA implementieren.",
    "candidate_text": "Privileged accounts require multi-factor authentication."
  }' | jq
```

**Response:**

```json
{
  "max_exact_run": 0,
  "token_overlap": 0.0714,
  "ngram_jaccard": 0.0323,
  "embedding_cosine": 0.0,
  "lcs_ratio": 0.0714,
  "status": "PASS",
  "details": {
    "max_exact_run": "PASS",
    "token_overlap": "PASS",
    "ngram_jaccard": "PASS",
    "embedding_cosine": "PASS",
    "lcs_ratio": "PASS"
  }
}
```

---

## Too-Close Detector

Five metrics with thresholds:

| Metric | Warn | Fail | Description |
|--------|------|------|-------------|
| Exact Phrase | ≥8 tokens | ≥12 tokens | Longest identical token sequence |
| Token Overlap | ≥0.20 | ≥0.30 | Jaccard index of the token sets |
| 3-Gram Jaccard | ≥0.10 | ≥0.18 | Character-string similarity |
| Embedding Cosine | ≥0.86 | ≥0.92 | Semantic similarity (bge-m3) |
| LCS Ratio | ≥0.35 | ≥0.50 | Longest common subsequence |

**Decision logic:**

- **PASS:** no fail and at most 1 warn
- **WARN:** at most 2 warns and no fail → human review
- **FAIL:** any fail → blocked, rewording required

---

## License Gate

Every source has defined permissions:

| Usage | Column | Example: OWASP | Example: BSI |
|-------|--------|----------------|--------------|
| Analysis | `allowed_analysis` | ✅ | ✅ |
| Store excerpt | `allowed_store_excerpt` | ✅ | ❌ |
| Ship embeddings | `allowed_ship_embeddings` | ✅ | ❌ |
| Ship in product | `allowed_ship_in_product` | ✅ | ❌ |

---

## CI/CD Validation

The validator (`scripts/validate-controls.py`) checks the following on every commit:

1. **Schema validation:** all required fields, ID format, severity
2. **No-Leak scanner:** regex against BSI patterns (`O.Auth_*`, `TR-03161`, etc.)
3. **Open anchor check:** every control has ≥1 open anchor
4. **Taxonomy check:** no BSI-style ID prefixes
5. **Evidence structure:** all evidence items have `type` + `description`

---

## Frontend

### Control Library Browser (`/sdk/control-library`)

- Framework info with version and description
- Filterable control table (domain, severity, free text)
- Detail view with objective, rationale, requirements, test procedures, and evidence
- **Open-source references** displayed prominently (green box)
- Tags and scope information

### Control Provenance Wiki (`/sdk/control-provenance`)

- Documentation of the methodology
- Explanation of the independent taxonomy
- List of open reference sources
- Protected sources and the separation principle
- **Live data:** license matrix and source register from the database

---

## Files

| File | Type | Description |
|------|------|-------------|
| `backend-compliance/migrations/044_canonical_control_library.sql` | SQL | 5 tables + seed data |
| `backend-compliance/compliance/api/canonical_control_routes.py` | Python | REST API (8 endpoints) |
| `backend-compliance/compliance/services/license_gate.py` | Python | License gate logic |
| `backend-compliance/compliance/services/similarity_detector.py` | Python | Too-Close detector (5 metrics) |
| `ai-compliance-sdk/policies/canonical_controls_v1.json` | JSON | 10 seed controls, 39 open anchors |
| `ai-compliance-sdk/internal/ucca/canonical_control_loader.go` | Go | Control loader with multi-index |
| `admin-compliance/app/sdk/control-library/page.tsx` | TSX | Control Library Browser |
| `admin-compliance/app/sdk/control-provenance/page.tsx` | TSX | Provenance Wiki |
| `admin-compliance/app/api/sdk/v1/canonical/route.ts` | TS | Next.js API proxy |
| `scripts/validate-controls.py` | Python | CI/CD validator |

---

## Tests

| File | Language | Tests |
|------|----------|-------|
| `ai-compliance-sdk/internal/ucca/canonical_control_loader_test.go` | Go | 8 tests |
| `backend-compliance/compliance/tests/test_similarity_detector.py` | Python | 19 tests |
| `backend-compliance/tests/test_canonical_control_routes.py` | Python | 14 tests |
| `backend-compliance/tests/test_license_gate.py` | Python | 12 tests |
| `backend-compliance/tests/test_validate_controls.py` | Python | 14 tests |
| **Total** | | **67 tests** |
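
As a rough illustration of the Too-Close detector's thresholds and decision logic, the following Python sketch classifies each metric and derives an overall status. It is not the code in `similarity_detector.py`. The function names are hypothetical, and the mapping of table rows to the JSON keys of the example response (for instance Exact Phrase to `max_exact_run`) is an assumption. The documented rules do not say what happens with three or more warnings, so the sketch assumes that any result with two or more warnings and no fail is a WARN.

```python
# Warn and fail thresholds per metric, taken from the threshold table.
# Key names follow the example similarity-check response (assumed mapping).
THRESHOLDS = {
    "max_exact_run": (8, 12),
    "token_overlap": (0.20, 0.30),
    "ngram_jaccard": (0.10, 0.18),
    "embedding_cosine": (0.86, 0.92),
    "lcs_ratio": (0.35, 0.50),
}


def classify_metric(name: str, value: float) -> str:
    """Return PASS, WARN, or FAIL for a single metric value."""
    warn, fail = THRESHOLDS[name]
    if value >= fail:
        return "FAIL"
    if value >= warn:
        return "WARN"
    return "PASS"


def overall_status(metrics: dict) -> dict:
    """Combine per-metric results into one report (hypothetical helper)."""
    details = {name: classify_metric(name, value) for name, value in metrics.items()}
    fails = sum(1 for state in details.values() if state == "FAIL")
    warns = sum(1 for state in details.values() if state == "WARN")
    if fails:
        status = "FAIL"  # any fail blocks the candidate text
    elif warns >= 2:
        status = "WARN"  # assumption: 2 or more warns go to human review
    else:
        status = "PASS"  # no fail and at most 1 warn
    return {**metrics, "status": status, "details": details}


if __name__ == "__main__":
    # Metric values from the example response in this document.
    sample = {
        "max_exact_run": 0,
        "token_overlap": 0.0714,
        "ngram_jaccard": 0.0323,
        "embedding_cosine": 0.0,
        "lcs_ratio": 0.0714,
    }
    print(overall_status(sample)["status"])  # PASS
```

Keeping the thresholds in one table-like dictionary makes it easy to compare the code against the documented values during review.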
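
The ID-format and No-Leak checks listed under CI/CD Validation could look roughly like the sketch below. It does not reproduce `scripts/validate-controls.py`. The regular expressions are assumptions derived from the two patterns named in this document (`O.Auth_*`, `TR-03161`), from the domain table, and from reading `NNN` as exactly three digits. The function names are hypothetical.

```python
import re

# Domains from the taxonomy table; NNN is assumed to mean exactly three digits.
CONTROL_ID_RE = re.compile(r"^(AUTH|NET|SUP|LOG|WEB|DATA|CRYP|REL)-\d{3}$")

# Assumed leak patterns, based only on the examples named in this document.
LEAK_PATTERNS = [
    re.compile(r"\bO\.[A-Z][A-Za-z]*_\w*"),  # BSI-style identifiers such as O.Auth_1
    re.compile(r"\bTR-03161\b"),             # reference to the technical guideline
]


def is_valid_control_id(control_id: str) -> bool:
    """Check a control ID against the DOMAIN-NNN format."""
    return CONTROL_ID_RE.fullmatch(control_id) is not None


def find_leaks(text: str) -> list:
    """Return every substring that matches one of the leak patterns."""
    hits = []
    for pattern in LEAK_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits


if __name__ == "__main__":
    print(is_valid_control_id("AUTH-001"))            # True
    print(find_leaks("See O.Auth_3 and TR-03161."))   # ['O.Auth_3', 'TR-03161']
```

In a CI job, a non-empty result from `find_leaks` for any control field would be treated as a failure.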
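
The License Gate permission matrix can be modeled as a small lookup, sketched here under stated assumptions. The four field names are the column names from the License Gate table, and the two example records mirror its OWASP and BSI columns. The class `SourcePermissions` and the function `is_allowed` are hypothetical and are not taken from `license_gate.py`.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SourcePermissions:
    """Permission flags of one source; field names match the table columns."""
    allowed_analysis: bool
    allowed_store_excerpt: bool
    allowed_ship_embeddings: bool
    allowed_ship_in_product: bool


# Example rows mirroring the OWASP and BSI columns of the permission table.
OWASP_EXAMPLE = SourcePermissions(True, True, True, True)
BSI_EXAMPLE = SourcePermissions(True, False, False, False)


def is_allowed(permissions: SourcePermissions, usage: str) -> bool:
    """Return whether a usage such as 'ship_in_product' is permitted."""
    column = f"allowed_{usage}"
    known = {field.name for field in fields(permissions)}
    if column not in known:
        # Unknown usages are rejected loudly instead of silently allowed.
        raise ValueError(f"unknown usage: {usage}")
    return getattr(permissions, column)


if __name__ == "__main__":
    print(is_allowed(BSI_EXAMPLE, "analysis"))         # True
    print(is_allowed(BSI_EXAMPLE, "ship_in_product"))  # False
```

Raising on unknown usages is a deliberate choice in this sketch: a gate that defaults to "allowed" for a misspelled usage would defeat its purpose.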