breakpilot-compliance/docs-src/services/sdk-modules/canonical-control-library.md
Benjamin Admin 050f353192
feat(canonical-controls): Canonical Control Library, legally robust security controls
Independently worded security controls with their own taxonomy
and open-source anchoring (OWASP, NIST, ENISA). No BSI nomenclature.

- Migration 044: 5 DB tables (frameworks, controls, sources, licenses, mappings)
- 10 seed controls with 39 open-source references
- License Gate: per-source permission check (analysis/excerpt/embeddings/product)
- Too-close detector: 5 metrics (exact-phrase, token-overlap, ngram, embedding, LCS)
- REST API: 8 endpoints under /v1/canonical/
- Go loader with multi-index (ID, domain, severity, framework)
- Frontend: Control Library Browser + Provenance Wiki
- CI/CD: validate-controls.py job (schema, no-leak, open-anchors)
- 67 tests (8 Go + 59 Python), all PASS
- MkDocs documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 19:55:06 +01:00


# Canonical Control Library (CP-CLIB)
Independently worded security controls based on openly available knowledge (OWASP, NIST, ENISA).
Independent taxonomy with no reference to proprietary frameworks.

**Prefix:** `CP-CLIB` · **Frontend:** `https://macmini:3007/sdk/control-library`

**Provenance Wiki:** `https://macmini:3007/sdk/control-provenance`

**Proxy:** `/api/sdk/v1/canonical` → `backend-compliance:8002/api/v1/canonical/...`

---
## Motivation
We need a system that extracts **independently worded, legally defensible controls** from various security guidelines without using proprietary text in the product.

### Core principles

1. **Independent taxonomy**: our own domain IDs (AUTH, NET, SUP, etc.) and our own ID format (`DOMAIN-NNN`)
2. **Open-source anchoring**: every control has at least one open anchor (OWASP/NIST/ENISA)
3. **Strict source separation**: protected sources are used only internally for analysis, never in the product
4. **Automated checks**: too-close detector + no-leak scanner in CI/CD

---
## Legal basis

| Basis | Relevance |
|-------|-----------|
| UrhG §44b | Text and data mining: copies must be deleted |
| UrhG §23 | Sufficient distance from the original work |
| BSI terms of use | Commercial use only with consent |

---
## Domains (independent taxonomy)

| Domain | Name | Description |
|--------|------|-------------|
| AUTH | Identity & Access Management | Authentication, MFA, token management |
| NET | Network & Transport Security | TLS, certificates, network hardening |
| SUP | Software Supply Chain | Signing, SBOM, dependency scanning |
| LOG | Security Operations & Logging | Privacy-aware logging, SIEM |
| WEB | Web Application Security | Admin flows, account recovery |
| DATA | Data Governance & Classification | Data classification, protective measures |
| CRYP | Cryptographic Operations | Key management, rotation, HSM |
| REL | Release & Change Governance | Change impact assessment, security review |

!!! warning "No BSI nomenclature"
    The domains deliberately do NOT use BSI identifiers (`O.Auth_*`, `O.Netz_*`).
    The ID format `DOMAIN-NNN` is a common, non-proprietary convention.

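
A minimal Python sketch of how the `DOMAIN-NNN` format and the domain list could be checked. The regular expression, the `DOMAINS` set and the function `is_valid_control_id` are illustrative assumptions, not code from the repository; in particular, the sketch assumes that `NNN` means exactly three ASCII digits. A side effect of checking against a closed domain list is that BSI-style identifiers such as `O.Auth_01` are rejected by the format check alone.

```python
import re

# Domain prefixes from the taxonomy table (independent taxonomy).
DOMAINS = frozenset({"AUTH", "NET", "SUP", "LOG", "WEB", "DATA", "CRYP", "REL"})

# Assumption: NNN means exactly three ASCII digits, e.g. AUTH-001.
CONTROL_ID_RE = re.compile(r"([A-Z]+)-([0-9]{3})")


def is_valid_control_id(control_id: str) -> bool:
    """Return True if control_id has the form DOMAIN-NNN with a known domain."""
    match = CONTROL_ID_RE.fullmatch(control_id)
    return match is not None and match.group(1) in DOMAINS


if __name__ == "__main__":
    for candidate in ["AUTH-001", "CRYP-042", "O.Auth_01", "FOO-001", "NET-1"]:
        print(candidate, is_valid_control_id(candidate))
```
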
---
## Data model (migration 044)

```mermaid
erDiagram
    canonical_control_licenses ||--o{ canonical_control_sources : "has"
    canonical_control_frameworks ||--o{ canonical_controls : "contains"
    canonical_controls ||--o{ canonical_control_mappings : "has"
    canonical_control_sources ||--o{ canonical_control_mappings : "references"
    canonical_control_licenses {
        varchar license_id PK
        varchar name
        varchar commercial_use
        boolean deletion_required
    }
    canonical_control_sources {
        uuid id PK
        varchar source_id UK
        varchar title
        boolean allowed_ship_in_product
    }
    canonical_control_frameworks {
        uuid id PK
        varchar framework_id UK
        varchar name
        varchar version
    }
    canonical_controls {
        uuid id PK
        uuid framework_id FK
        varchar control_id
        varchar severity
        jsonb open_anchors
    }
    canonical_control_mappings {
        uuid id PK
        uuid control_id FK
        uuid source_id FK
        varchar mapping_type
        varchar attribution_class
    }
```

### Tables

| Table | Purpose | Shippable in product? |
|-------|---------|-----------------------|
| `canonical_control_licenses` | License metadata | Yes (read-only) |
| `canonical_control_sources` | Source register | **No** (internal only) |
| `canonical_control_frameworks` | Framework registry | Yes |
| `canonical_controls` | The controls themselves | Yes |
| `canonical_control_mappings` | Provenance trail | **No** (audit only) |

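
The "Shippable in product?" column can be read as an allow-list for exports. A minimal Python sketch, assuming a hypothetical export payload that maps table names to lists of row dicts; `SHIPPABLE_TABLES` and `filter_for_product` are illustrative names, not part of the codebase. The filter denies by default, so a table added later is never shipped by accident until it is explicitly allowed.

```python
from typing import Any

Rows = list[dict[str, Any]]

# Tables marked "Yes" above; sources and mappings stay internal.
SHIPPABLE_TABLES = frozenset({
    "canonical_control_licenses",    # read-only in the product
    "canonical_control_frameworks",
    "canonical_controls",
})


def filter_for_product(export: dict[str, Rows]) -> dict[str, Rows]:
    """Keep only tables that may leave the internal environment.

    Hypothetical helper: unknown tables are dropped as well (deny by default).
    """
    return {table: rows for table, rows in export.items() if table in SHIPPABLE_TABLES}


if __name__ == "__main__":
    payload = {
        "canonical_controls": [{"control_id": "AUTH-001"}],
        "canonical_control_mappings": [],
    }
    print(sorted(filter_for_product(payload)))
```
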
---
## API endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/v1/canonical/frameworks` | All frameworks |
| `GET` | `/v1/canonical/frameworks/{id}` | Framework details |
| `GET` | `/v1/canonical/frameworks/{id}/controls` | Controls of a framework |
| `GET` | `/v1/canonical/controls` | All controls (filters: `severity`, `domain`, `release_state`) |
| `GET` | `/v1/canonical/controls/{control_id}` | Single control (e.g. AUTH-001) |
| `GET` | `/v1/canonical/sources` | Source register with permissions |
| `GET` | `/v1/canonical/licenses` | License matrix |
| `POST` | `/v1/canonical/controls/{id}/similarity-check` | Too-close check |

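
The controls list endpoint accepts the filters `severity`, `domain` and `release_state`. A small standard-library sketch for building such request URLs, assuming the filters are plain query parameters (the table does not specify their encoding) and reusing the backend base URL from the curl examples on this page. `controls_url` and `control_url` are hypothetical helper names; the resulting URL can be passed to any HTTP client.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Base URL as used in the curl examples on this page.
BASE_URL = "https://macmini:8002/api/v1/canonical"


def controls_url(severity: Optional[str] = None,
                 domain: Optional[str] = None,
                 release_state: Optional[str] = None) -> str:
    """URL for GET /v1/canonical/controls; unset filters are omitted.

    Assumption: filters are sent as plain query parameters.
    """
    filters = (("severity", severity), ("domain", domain), ("release_state", release_state))
    params = {key: value for key, value in filters if value is not None}
    url = f"{BASE_URL}/controls"
    return f"{url}?{urlencode(params)}" if params else url


def control_url(control_id: str) -> str:
    """URL for a single control, e.g. AUTH-001 (path segment is percent-encoded)."""
    return f"{BASE_URL}/controls/{quote(control_id, safe='')}"


if __name__ == "__main__":
    print(controls_url(domain="AUTH"))
    print(control_url("AUTH-001"))
```
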
### Example: fetch a control

```bash
curl -s https://macmini:8002/api/v1/canonical/controls/AUTH-001 | jq .
```

### Example: similarity check

```bash
curl -s -X POST https://macmini:8002/api/v1/canonical/controls/AUTH-001/similarity-check \
  -H 'Content-Type: application/json' \
  -d '{
    "source_text": "Die Anwendung muss MFA implementieren.",
    "candidate_text": "Privileged accounts require multi-factor authentication."
  }' | jq .
```
**Response:**
```json
{
  "max_exact_run": 0,
  "token_overlap": 0.0714,
  "ngram_jaccard": 0.0323,
  "embedding_cosine": 0.0,
  "lcs_ratio": 0.0714,
  "status": "PASS",
  "details": {
    "max_exact_run": "PASS",
    "token_overlap": "PASS",
    "ngram_jaccard": "PASS",
    "embedding_cosine": "PASS",
    "lcs_ratio": "PASS"
  }
}
```
---
## Too-close detector

Five metrics with thresholds:

| Metric | Warn | Fail | Description |
|--------|------|------|-------------|
| Exact Phrase | ≥8 tokens | ≥12 tokens | Longest identical token sequence |
| Token Overlap | ≥0.20 | ≥0.30 | Jaccard index of the token sets |
| 3-Gram Jaccard | ≥0.10 | ≥0.18 | Character-string similarity |
| Embedding Cosine | ≥0.86 | ≥0.92 | Semantic similarity (bge-m3) |
| LCS Ratio | ≥0.35 | ≥0.50 | Longest common subsequence |

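
A minimal Python sketch of the five metrics, for orientation only; it is not the code in `similarity_detector.py`. Several choices here are assumptions: tokenization as lowercase `\w+` words, character 3-grams over the normalized text, a token-level LCS divided by the longer sequence, and cosine similarity over precomputed embedding vectors (computing bge-m3 embeddings is out of scope). The values it produces will therefore not necessarily match the API response shown above.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens; the real tokenizer is not documented.
    return re.findall(r"\w+", text.lower())


def max_exact_run(a: list[str], b: list[str]) -> int:
    """Longest identical contiguous token sequence (longest common substring DP)."""
    best, prev = 0, [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def jaccard(x: set, y: set) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def token_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard index of the two token sets."""
    return jaccard(set(a), set(b))


def char_ngrams(text: str, n: int = 3) -> set[str]:
    s = " ".join(tokenize(text))  # assumption: normalize case and whitespace first
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_jaccard(text_a: str, text_b: str) -> float:
    return jaccard(char_ngrams(text_a), char_ngrams(text_b))


def lcs_ratio(a: list[str], b: list[str]) -> float:
    """Token-level LCS length divided by the longer sequence (assumed normalization)."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = prev[j - 1] + 1 if a[i - 1] == b[j - 1] else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1] / max(len(a), len(b))


def embedding_cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two precomputed embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```
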

**Decision logic:**

- **PASS**: no fail and at most 1 warn
- **WARN**: at most 2 warns and no fail → human review
- **FAIL**: any fail → blocked, rewording required

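
The thresholds and decision rules can be combined into a small classifier. This Python sketch takes already computed metric values keyed like the API response; `THRESHOLDS` and `classify` are hypothetical names. The rules above do not say what happens with three or more warnings and no fail; this sketch assumes a conservative FAIL in that case, which is a design choice rather than documented behavior.

```python
# Thresholds from the table: (warn, fail); a value >= threshold triggers it.
THRESHOLDS = {
    "max_exact_run": (8, 12),
    "token_overlap": (0.20, 0.30),
    "ngram_jaccard": (0.10, 0.18),
    "embedding_cosine": (0.86, 0.92),
    "lcs_ratio": (0.35, 0.50),
}


def classify(metrics: dict[str, float]) -> tuple[str, dict[str, str]]:
    """Return the overall status and the per-metric verdicts."""
    details: dict[str, str] = {}
    for name, (warn, fail) in THRESHOLDS.items():
        value = metrics[name]
        details[name] = "FAIL" if value >= fail else "WARN" if value >= warn else "PASS"

    fails = sum(v == "FAIL" for v in details.values())
    warns = sum(v == "WARN" for v in details.values())
    if fails == 0 and warns <= 1:
        status = "PASS"
    elif fails == 0 and warns == 2:
        status = "WARN"  # route to human review
    else:
        # Any fail blocks. Assumption: 3+ warnings are not covered by the
        # documented rules; this sketch blocks them conservatively as well.
        status = "FAIL"
    return status, details


if __name__ == "__main__":
    example = {"max_exact_run": 0, "token_overlap": 0.0714, "ngram_jaccard": 0.0323,
               "embedding_cosine": 0.0, "lcs_ratio": 0.0714}
    print(classify(example)[0])  # PASS
```
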
---
## License Gate
Each source has defined permissions:

| Type of use | Column | Example: OWASP | Example: BSI |
|-------------|--------|----------------|--------------|
| Analysis | `allowed_analysis` | ✅ | ✅ |
| Store excerpt | `allowed_store_excerpt` | ✅ | ❌ |
| Ship embeddings | `allowed_ship_embeddings` | ✅ | ❌ |
| Ship in product | `allowed_ship_in_product` | ✅ | ❌ |

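
A minimal Python sketch of such a permission check, assuming each source row is loaded into a simple record holding the four boolean columns. `SourcePermissions`, `Usage` and `is_allowed` are hypothetical names and do not describe the API of `license_gate.py`; the two example rows only mirror the OWASP and BSI columns of the table, and their source IDs are made up. Every column defaults to `False`, so a missing permission is treated as a denial.

```python
from dataclasses import dataclass
from enum import Enum


class Usage(Enum):
    # Values mirror the column names in the permission table.
    ANALYSIS = "allowed_analysis"
    STORE_EXCERPT = "allowed_store_excerpt"
    SHIP_EMBEDDINGS = "allowed_ship_embeddings"
    SHIP_IN_PRODUCT = "allowed_ship_in_product"


@dataclass(frozen=True)
class SourcePermissions:
    source_id: str
    allowed_analysis: bool = False
    allowed_store_excerpt: bool = False
    allowed_ship_embeddings: bool = False
    allowed_ship_in_product: bool = False


def is_allowed(source: SourcePermissions, usage: Usage) -> bool:
    """Deny by default: only an explicit True in the matching column permits the use."""
    return getattr(source, usage.value) is True


# Example rows mirroring the table (source IDs are hypothetical).
OWASP = SourcePermissions("owasp-example", True, True, True, True)
BSI = SourcePermissions("bsi-example", allowed_analysis=True)
```
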
---
## CI/CD Validation
On every commit, the validator (`scripts/validate-controls.py`) checks:

1. **Schema validation**: all required fields, ID format, severity
2. **No-leak scanner**: regex matching against BSI patterns (`O.Auth_*`, `TR-03161`, etc.)
3. **Open anchor check**: every control has at least one open anchor
4. **Taxonomy check**: no BSI-style ID prefixes
5. **Evidence structure**: every evidence item has a `type` and a `description`

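
The no-leak, open-anchor and evidence checks can be sketched in Python as follows. This is not `scripts/validate-controls.py`: the regular expressions are derived only from the two example patterns in the list, and the control shape (`control_id`, `open_anchors` from the data model, plus an assumed `evidence` list) is hypothetical. Scanning the serialized JSON of the whole control catches leaks in nested fields without walking the structure by hand.

```python
import json
import re

# Assumed leak patterns, derived from the examples O.Auth_* and TR-03161 only.
LEAK_PATTERNS = [
    re.compile(r"\bO\.[A-Z][A-Za-z]*_\w*"),  # BSI-style objective IDs such as O.Auth_1
    re.compile(r"\bTR-03161\b"),
]


def check_control(control: dict) -> list[str]:
    """Return error messages for one control (empty list means OK).

    Assumed shape: {"control_id": ..., "open_anchors": [...],
                    "evidence": [{"type": ..., "description": ...}], ...}
    """
    errors: list[str] = []
    cid = control.get("control_id", "<missing id>")

    # No-leak scan over the whole serialized control, including nested text.
    text = json.dumps(control, ensure_ascii=False)
    for pattern in LEAK_PATTERNS:
        if pattern.search(text):
            errors.append(f"{cid}: matches leak pattern {pattern.pattern!r}")

    # Open anchor check: at least one open anchor.
    if not control.get("open_anchors"):
        errors.append(f"{cid}: no open anchor")

    # Evidence structure: every item needs a type and a description.
    for i, item in enumerate(control.get("evidence", [])):
        if not isinstance(item, dict) or not item.get("type") or not item.get("description"):
            errors.append(f"{cid}: evidence[{i}] lacks type or description")
    return errors
```
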
---
## Frontend
### Control Library Browser (`/sdk/control-library`)
- Framework info with version and description
- Filterable control table (domain, severity, free text)
- Detail view with objective, rationale, requirements, test procedure and evidence
- **Open-source references** shown prominently (green box)
- Tags and scope information

### Control Provenance Wiki (`/sdk/control-provenance`)

- Documentation of the methodology
- Explanation of the independent taxonomy
- List of the open reference sources
- Protected sources and the separation principle
- **Live data:** license matrix and source register from the database

---
## Files

| File | Type | Description |
|------|------|-------------|
| `backend-compliance/migrations/044_canonical_control_library.sql` | SQL | 5 tables + seed data |
| `backend-compliance/compliance/api/canonical_control_routes.py` | Python | REST API (8 endpoints) |
| `backend-compliance/compliance/services/license_gate.py` | Python | License gate logic |
| `backend-compliance/compliance/services/similarity_detector.py` | Python | Too-close detector (5 metrics) |
| `ai-compliance-sdk/policies/canonical_controls_v1.json` | JSON | 10 seed controls, 39 open anchors |
| `ai-compliance-sdk/internal/ucca/canonical_control_loader.go` | Go | Control loader with multi-index |
| `admin-compliance/app/sdk/control-library/page.tsx` | TSX | Control Library Browser |
| `admin-compliance/app/sdk/control-provenance/page.tsx` | TSX | Provenance Wiki |
| `admin-compliance/app/api/sdk/v1/canonical/route.ts` | TS | Next.js API proxy |
| `scripts/validate-controls.py` | Python | CI/CD validator |

---
## Tests
| File | Language | Tests |
|------|----------|-------|
| `ai-compliance-sdk/internal/ucca/canonical_control_loader_test.go` | Go | 8 tests |
| `backend-compliance/compliance/tests/test_similarity_detector.py` | Python | 19 tests |
| `backend-compliance/tests/test_canonical_control_routes.py` | Python | 14 tests |
| `backend-compliance/tests/test_license_gate.py` | Python | 12 tests |
| `backend-compliance/tests/test_validate_controls.py` | Python | 14 tests |
| **Total** | | **67 tests** |