# Canonical Control Library (CP-CLIB)

Independently formulated security controls based on open knowledge (OWASP, NIST, ENISA).
Independent taxonomy with no ties to proprietary frameworks.

**Prefix:** `CP-CLIB` · **Frontend:** `https://macmini:3007/sdk/control-library`
**Provenance Wiki:** `https://macmini:3007/sdk/control-provenance`
**Proxy:** `/api/sdk/v1/canonical` → `backend-compliance:8002/api/compliance/v1/canonical/...`

---

## Motivation

We need a system that extracts **independent, legally defensible controls** from a range of security guidelines without using proprietary text in the product.

### Core principles

1. **Independent taxonomy:** own domain IDs (AUTH, NET, SUP, etc.) and own ID format (`DOMAIN-NNN`)
2. **Open-source anchoring:** every control has at least one open anchor (OWASP/NIST/ENISA)
3. **Strict source separation:** protected sources are used internally for analysis only, never in the product
4. **Automated checks:** too-close detector and no-leak scanner in CI/CD

---

## Legal basis

| Law | Relevance |
|-----|-----------|
| UrhG §44b | Text and data mining; copies must be deleted |
| UrhG §23 | Sufficient distance from the original work |
| BSI terms of use | Commercial use only with consent |

---

## Domains (independent taxonomy)

| Domain | Name | Description |
|--------|------|-------------|
| AUTH | Identity & Access Management | Authentication, MFA, token management |
| NET | Network & Transport Security | TLS, certificates, network hardening |
| SUP | Software Supply Chain | Signing, SBOM, dependency scanning |
| LOG | Security Operations & Logging | Privacy-aware logging, SIEM |
| WEB | Web Application Security | Admin flows, account recovery |
| DATA | Data Governance & Classification | Data classification, protective measures |
| CRYP | Cryptographic Operations | Key management, rotation, HSM |
| REL | Release & Change Governance | Change impact assessment, security review |

!!! warning "No BSI nomenclature"
    The domains deliberately do NOT use BSI identifiers (`O.Auth_*`, `O.Netz_*`).
    The ID format `DOMAIN-NNN` is a common, non-proprietary convention.

---

## Data model (Migration 044)

```mermaid
erDiagram
    canonical_control_licenses ||--o{ canonical_control_sources : "has"
    canonical_control_frameworks ||--o{ canonical_controls : "contains"
    canonical_controls ||--o{ canonical_control_mappings : "has"
    canonical_control_sources ||--o{ canonical_control_mappings : "references"

    canonical_control_licenses {
        varchar license_id PK
        varchar name
        varchar commercial_use
        boolean deletion_required
    }
    canonical_control_sources {
        uuid id PK
        varchar source_id UK
        varchar title
        boolean allowed_ship_in_product
    }
    canonical_control_frameworks {
        uuid id PK
        varchar framework_id UK
        varchar name
        varchar version
    }
    canonical_controls {
        uuid id PK
        uuid framework_id FK
        varchar control_id
        varchar severity
        jsonb open_anchors
    }
    canonical_control_mappings {
        uuid id PK
        uuid control_id FK
        uuid source_id FK
        varchar mapping_type
        varchar attribution_class
    }
```

### Tables

| Table | Purpose | Shippable in product? |
|-------|---------|-----------------------|
| `canonical_control_licenses` | License metadata | Yes (read-only) |
| `canonical_control_sources` | Source register | **No** (internal only) |
| `canonical_control_frameworks` | Framework registry | Yes |
| `canonical_controls` | The controls themselves | Yes |
| `canonical_control_mappings` | Provenance trail | **No** (audit only) |

---

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/v1/canonical/frameworks` | All frameworks |
| `GET` | `/v1/canonical/frameworks/{id}` | Framework details |
| `GET` | `/v1/canonical/frameworks/{id}/controls` | Controls of one framework |
| `GET` | `/v1/canonical/controls` | All controls (filters: `severity`, `domain`, `release_state`) |
| `GET` | `/v1/canonical/controls/{control_id}` | Single control (e.g. AUTH-001) |
| `GET` | `/v1/canonical/sources` | Source register with permissions |
| `GET` | `/v1/canonical/licenses` | License matrix |
| `POST` | `/v1/canonical/controls/{id}/similarity-check` | Too-close check |

### Example: fetch a control

```bash
curl -s https://macmini:8002/api/compliance/v1/canonical/controls/AUTH-001 | jq
```

### Example: similarity check

```bash
curl -X POST https://macmini:8002/api/compliance/v1/canonical/controls/AUTH-001/similarity-check \
  -H 'Content-Type: application/json' \
  -d '{
    "source_text": "Die Anwendung muss MFA implementieren.",
    "candidate_text": "Privileged accounts require multi-factor authentication."
  }' | jq
```

**Response:**

```json
{
  "max_exact_run": 0,
  "token_overlap": 0.0714,
  "ngram_jaccard": 0.0323,
  "embedding_cosine": 0.0,
  "lcs_ratio": 0.0714,
  "status": "PASS",
  "details": {
    "max_exact_run": "PASS",
    "token_overlap": "PASS",
    "ngram_jaccard": "PASS",
    "embedding_cosine": "PASS",
    "lcs_ratio": "PASS"
  }
}
```

---

## Too-Close Detector

Five metrics with thresholds:

| Metric | Warn | Fail | Description |
|--------|------|------|-------------|
| Exact Phrase | ≥8 tokens | ≥12 tokens | Longest identical token sequence |
| Token Overlap | ≥0.20 | ≥0.30 | Jaccard index of the token sets |
| 3-Gram Jaccard | ≥0.10 | ≥0.18 | String similarity |
| Embedding Cosine | ≥0.86 | ≥0.92 | Semantic similarity (bge-m3) |
| LCS Ratio | ≥0.35 | ≥0.50 | Longest common subsequence |

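
To make the lexical metrics concrete, here is a minimal sketch of four of the five (the embedding cosine needs a model and is left out). It is not the code in `similarity_detector.py`: the function names are hypothetical, and whitespace tokenization, lowercasing, character-level 3-grams, and normalizing the LCS by the longer text are all assumptions.

```python
def tokens(text: str) -> list[str]:
    """Assumed tokenization: lowercase, split on whitespace."""
    return text.lower().split()


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def token_overlap(source: str, candidate: str) -> float:
    return jaccard(set(tokens(source)), set(tokens(candidate)))


def ngram_jaccard(source: str, candidate: str, n: int = 3) -> float:
    """Jaccard index over character n-grams (character level is an assumption)."""
    def grams(text: str) -> set[str]:
        s = text.lower()
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    return jaccard(grams(source), grams(candidate))


def max_exact_run(source: str, candidate: str) -> int:
    """Length in tokens of the longest identical contiguous token sequence."""
    a, b = tokens(source), tokens(candidate)
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def lcs_ratio(source: str, candidate: str) -> float:
    """Token-level longest common subsequence divided by the longer token count."""
    a, b = tokens(source), tokens(candidate)
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))
```
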
**Decision logic:**

- **PASS:** no fail and at most 1 warn
- **WARN:** at most 2 warns, no fail → human review
- **FAIL:** any fail → block, rewording required

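
The thresholds and the decision rule can be sketched as follows. The metric keys follow the JSON response of the similarity check; `grade` and `decide` are hypothetical names. Two points are assumptions, because the rule does not settle them: a single warning yields PASS (the rules are checked in the order FAIL, PASS, WARN), and three or more warnings are also treated as WARN.

```python
# Thresholds per metric as (warn, fail), taken from the metric table.
THRESHOLDS = {
    "max_exact_run": (8, 12),
    "token_overlap": (0.20, 0.30),
    "ngram_jaccard": (0.10, 0.18),
    "embedding_cosine": (0.86, 0.92),
    "lcs_ratio": (0.35, 0.50),
}


def grade(metric: str, value: float) -> str:
    """Grade one metric value against its warn and fail thresholds."""
    warn, fail = THRESHOLDS[metric]
    if value >= fail:
        return "FAIL"
    if value >= warn:
        return "WARN"
    return "PASS"


def decide(scores: dict[str, float]) -> dict:
    """Combine per-metric grades into an overall status."""
    details = {metric: grade(metric, value) for metric, value in scores.items()}
    fails = sum(1 for g in details.values() if g == "FAIL")
    warns = sum(1 for g in details.values() if g == "WARN")
    if fails:
        status = "FAIL"   # any fail blocks the control
    elif warns <= 1:
        status = "PASS"   # assumption: a single warning still passes
    else:
        status = "WARN"   # assumption: 3+ warnings also go to human review
    return {**scores, "status": status, "details": details}
```
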
---

## License Gate

Every source has defined permissions:

| Usage type | Column | Example OWASP | Example BSI |
|------------|--------|---------------|-------------|
| Analysis | `allowed_analysis` | ✅ | ✅ |
| Store excerpt | `allowed_store_excerpt` | ✅ | ❌ |
| Ship embeddings | `allowed_ship_embeddings` | ✅ | ❌ |
| Ship in product | `allowed_ship_in_product` | ✅ | ❌ |

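
A gate over these columns might look like the sketch below. The column names come from the table; the `SourcePermissions` class, the `check_usage` helper, and the two `source_id` values are hypothetical and do not reflect the actual code in `license_gate.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePermissions:
    source_id: str
    allowed_analysis: bool
    allowed_store_excerpt: bool
    allowed_ship_embeddings: bool
    allowed_ship_in_product: bool


# Example rows mirroring the OWASP and BSI columns of the table.
# The source_id values are made up for illustration.
OWASP = SourcePermissions("OWASP_ASVS", True, True, True, True)
BSI = SourcePermissions("BSI_TR", True, False, False, False)


def check_usage(source: SourcePermissions, usage: str) -> bool:
    """Return True if `usage` (e.g. "ship_in_product") is permitted for the source."""
    column = f"allowed_{usage}"
    if column not in SourcePermissions.__dataclass_fields__:
        raise ValueError(f"unknown usage type: {usage}")
    return getattr(source, column)
```
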
---

## CI/CD Validation

The validator (`scripts/validate-controls.py`) checks on every commit:

1. **Schema validation:** all required fields, ID format, severity
2. **No-leak scanner:** regexes against BSI patterns (`O.Auth_*`, `TR-03161`, etc.)
3. **Open anchor check:** every control has ≥1 open anchor
4. **Taxonomy check:** no BSI-style ID prefixes
5. **Evidence structure:** every evidence item has `type` + `description`

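
As a rough illustration of what such checks could look like (not the contents of `scripts/validate-controls.py`), the sketch below validates one control given as a dict. `control_id` and `open_anchors` appear in the data model; the `evidence` key, the exact regexes, and the 2 to 5 letter reading of `DOMAIN-NNN` are assumptions.

```python
import re

ID_FORMAT = re.compile(r"^[A-Z]{2,5}-\d{3}$")  # assumed reading of DOMAIN-NNN
# Only the two patterns named in the list; a real scanner would carry more.
LEAK_PATTERNS = [re.compile(r"\bO\.[A-Z][a-z]+_\w*"), re.compile(r"\bTR-03161\b")]


def _strings(value):
    """Yield every string inside nested dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from _strings(v)


def validate_control(control: dict) -> list[str]:
    """Return a list of violations; an empty list means the control passes."""
    errors = []
    cid = control.get("control_id", "")
    if not ID_FORMAT.match(cid):
        errors.append(f"{cid!r}: ID does not match DOMAIN-NNN")
    if not control.get("open_anchors"):
        errors.append(f"{cid}: no open anchor")
    for item in control.get("evidence", []):
        if not item.get("type") or not item.get("description"):
            errors.append(f"{cid}: evidence item lacks type or description")
    # No-leak scan over all string values of the control.
    text = " ".join(_strings(control))
    for pattern in LEAK_PATTERNS:
        if pattern.search(text):
            errors.append(f"{cid}: matches protected pattern {pattern.pattern}")
    return errors
```
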
---

## Frontend

### Control Library Browser (`/sdk/control-library`)

- Framework info with version and description
- Filterable control table (domain, severity, free text)
- Detail view with objective, rationale, requirements, test procedure, evidence
- **Open-source references** displayed prominently (green box)
- Tags and scope information

### Control Provenance Wiki (`/sdk/control-provenance`)

- Documentation of the methodology
- Explanation of the independent taxonomy
- List of open reference sources
- Protected sources and the separation principle
- **Live data:** license matrix and source register from the database

---

## Control Generator Pipeline

Automatic generation of controls from the entire RAG corpus (170,000+ chunks from laws, regulations, and standards).

### 8-stage pipeline

```mermaid
flowchart TD
    A[1. RAG Scroll] -->|All chunks| B[2. Prefilter - Local LLM]
    B -->|Irrelevant| C[Mark as processed]
    B -->|Relevant| D[3. License Classify]
    D -->|Rule 1/2| E[4a. Structure - Anthropic]
    D -->|Rule 3| F[4b. LLM Reform - Anthropic]
    E --> G[5. Harmonization - Embeddings]
    F --> G
    G -->|Duplicate| H[Store as duplicate]
    G -->|New| I[6. Anchor Search]
    I --> J[7. Store Control]
    J --> K[8. Mark Processed]
```

### Stage 1: RAG Scroll (complete)

Scrolls through **ALL** chunks in all RAG collections using the Qdrant Scroll API.
There is no limit: every chunk is processed so that no legal requirement is overlooked.

Chunks that have already been processed are skipped based on their SHA-256 hash (`canonical_processed_chunks`).

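
The hash-based skip can be illustrated with a small sketch. The in-memory set stands in for the `canonical_processed_chunks` table; hashing the UTF-8 encoded chunk text and the function names are assumptions.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """SHA-256 hex digest of the chunk text (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def unprocessed(chunks, processed_hashes: set[str]):
    """Yield (hash, chunk) for every chunk whose hash is not yet recorded."""
    for chunk in chunks:
        digest = chunk_hash(chunk)
        if digest in processed_hashes:
            continue  # already handled in an earlier run
        yield digest, chunk
```

Because the key is a content hash rather than a chunk ID, a chunk is reprocessed only if its text changes.
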
### Stage 2: Local LLM prefilter (Qwen 30B)

**Cost optimization:** Before a chunk is sent to the Anthropic API, the local Qwen model (`qwen3:30b-a3b` on the Mac Mini) checks whether the chunk contains a concrete requirement.

- **Relevant:** obligations ("muss", "soll"), technical measures, data protection requirements
- **Irrelevant:** definitions, tables of contents, glossaries of terms, transitional provisions

Irrelevant chunks are marked as `prefilter_skip` and are never processed again.
This saves more than 50% of the Anthropic API costs.

### Stage 3: License classification (3-rule system)

| Rule | License | Original text allowed? | Example |
|------|---------|------------------------|---------|
| **Rule 1** (free_use) | EU law, NIST, German law | Yes | DSGVO, BDSG, NIS2 |
| **Rule 2** (citation_required) | CC-BY, CC-BY-SA | Yes, with citation | OWASP ASVS |
| **Rule 3** (restricted) | Proprietary | No, full reformulation | BSI TR-03161 |

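
A minimal sketch of how a license could be mapped to one of the three rules. The license labels in `LICENSE_RULES` are hypothetical placeholders (the real values would come from `canonical_control_licenses`), and falling back to Rule 3 for unknown licenses is an assumed, conservative default rather than documented behavior.

```python
from enum import Enum


class Rule(Enum):
    FREE_USE = 1           # original text may be used
    CITATION_REQUIRED = 2  # original text may be used with citation
    RESTRICTED = 3         # full reformulation, no original text


# Hypothetical license labels for illustration only.
LICENSE_RULES = {
    "EU_LAW": Rule.FREE_USE,
    "DE_LAW": Rule.FREE_USE,
    "NIST_PUBLIC_DOMAIN": Rule.FREE_USE,
    "CC-BY": Rule.CITATION_REQUIRED,
    "CC-BY-SA": Rule.CITATION_REQUIRED,
}


def classify(license_id: str) -> Rule:
    """Map a license to a rule; unknown licenses get the most restrictive rule."""
    return LICENSE_RULES.get(license_id, Rule.RESTRICTED)


def may_keep_original(rule: Rule) -> bool:
    """Rules 1 and 2 keep the original text, Rule 3 does not."""
    return rule in (Rule.FREE_USE, Rule.CITATION_REQUIRED)
```
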
### Stage 4a/4b: Structuring / reformulation

- **Rule 1+2:** Anthropic structures the original text into the control format (title, objective, requirements)
- **Rule 3:** Anthropic reformulates completely, with no original text and no source names

### Stage 5: Harmonization (embedding-based)

Uses bge-m3 embeddings (cosine similarity > 0.85) to check whether a similar control already exists.
Embeddings are preloaded in batches (32 texts per request) for performance.

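
The duplicate check can be sketched in plain Python as below. The 0.85 threshold and the batch size of 32 come from the text; the function names and the in-memory dict of existing vectors are assumptions, and real bge-m3 vectors would come from the embedding service rather than being written by hand.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def batches(items: list, size: int = 32):
    """Split texts into request-sized batches (32 texts per request)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def find_duplicate(candidate: list[float], existing: dict[str, list[float]],
                   threshold: float = 0.85):
    """Return the ID of the most similar existing control above the threshold, else None."""
    best_id, best_score = None, threshold
    for control_id, vector in existing.items():
        score = cosine(candidate, vector)
        if score > best_score:
            best_id, best_score = control_id, score
    return best_id
```
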
### Stages 6-8: Anchor Search, Store, Mark Processed

- **Anchor Search:** finds open-source references (OWASP, NIST, ENISA)
- **Store:** persists the control with `verification_method` and `category`
- **Mark Processed:** marks **EVERY** chunk as processed (including on skip, error, or duplicate)

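
One way to guarantee that every chunk ends up marked, whatever happens in between, is a `try`/`finally` wrapper. This is a sketch of that pattern, not the actual `control_generator.py` code: `process_chunk`, `handler`, and `mark_processed` are hypothetical names, and apart from `prefilter_skip` the outcome labels are made up.

```python
def process_chunk(chunk_id: str, handler, mark_processed) -> str:
    """Run `handler` on a chunk and always record an outcome for it."""
    outcome = "error"
    try:
        # handler returns a label such as "stored", "duplicate" or "prefilter_skip"
        outcome = handler(chunk_id)
    except Exception:
        outcome = "error"  # a failing chunk is still marked, so it is not retried forever
    finally:
        mark_processed(chunk_id, outcome)
    return outcome
```
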
### Automatic classification

The following are assigned automatically during generation:

**Verification Method:**

| Method | Description |
|--------|-------------|
| `code_review` | Verifiable in the source code |
| `document` | Document or process evidence |
| `tool` | Tool-based check |
| `hybrid` | Combination of several methods |

**Category** (17 thematic categories):
encryption, authentication, network, data_protection, logging, incident,
continuity, compliance, supply_chain, physical, personnel, application,
system, risk, governance, hardware, identity

### Configuration

| ENV variable | Default | Description |
|--------------|---------|-------------|
| `ANTHROPIC_API_KEY` | (none) | API key for Anthropic Claude |
| `CONTROL_GEN_ANTHROPIC_MODEL` | `claude-sonnet-4-6` | Anthropic model used for formulation |
| `OLLAMA_URL` | `http://host.docker.internal:11434` | Local Ollama server (prefilter) |
| `CONTROL_GEN_OLLAMA_MODEL` | `qwen3:30b-a3b` | Local LLM for the prefilter |
| `CONTROL_GEN_LLM_TIMEOUT` | `120` | Timeout in seconds |

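
Reading these variables with their documented defaults could look like this sketch. The variable names and defaults are taken from the table; the `GeneratorConfig` class and `load_config` function are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GeneratorConfig:
    anthropic_api_key: Optional[str]  # no default; None if unset
    anthropic_model: str
    ollama_url: str
    ollama_model: str
    llm_timeout: int  # seconds


def load_config(env: Mapping[str, str] = os.environ) -> GeneratorConfig:
    """Build the config from environment variables, applying the table's defaults."""
    return GeneratorConfig(
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),
        anthropic_model=env.get("CONTROL_GEN_ANTHROPIC_MODEL", "claude-sonnet-4-6"),
        ollama_url=env.get("OLLAMA_URL", "http://host.docker.internal:11434"),
        ollama_model=env.get("CONTROL_GEN_OLLAMA_MODEL", "qwen3:30b-a3b"),
        llm_timeout=int(env.get("CONTROL_GEN_LLM_TIMEOUT", "120")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.
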
### Architecture decision: legal references

Controls derive from two sources:

1. **Direct legal obligations (Rule 1):** e.g. DSGVO Art. 32 mandates "technical and organizational measures". These controls carry a `source_citation` with the exact legal reference and the original text.

2. **Implicit implementation via best practices (Rule 2/3):** e.g. OWASP ASVS V2.7 requires MFA. That is not a legal obligation, but a best practice for meeting NIS2 Art. 21 or DSGVO Art. 32. These controls carry open-source references (anchors).

**In the frontend:**

- Rule 1/2 controls show a blue "Gesetzliche Grundlage" (legal basis) box with law, article, and link
- Rule 3 controls show a note that they implement laws implicitly, with a pointer to the references

### API

```bash
# Start a job (runs in the background)
curl -X POST https://macmini:8002/api/compliance/v1/canonical/generate \
  -H 'Content-Type: application/json' \
  -H 'X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000' \
  -d '{"collections": ["bp_compliance_gesetze"]}'

# Query job status
curl https://macmini:8002/api/compliance/v1/canonical/generate/jobs \
  -H 'X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000'
```

### RAG Collections

| Collection | Contents | Expected rule |
|------------|----------|---------------|
| `bp_compliance_gesetze` | German laws (BDSG, TTDSG, TKG, etc.) | Rule 1 |
| `bp_compliance_recht` | EU regulations and directives (DSGVO, NIS2, AI Act, etc.) | Rule 1 |
| `bp_compliance_datenschutz` | Data protection guidelines | Rule 1/2 |
| `bp_compliance_ce` | CE/safety standards | Rule 1/2/3 |
| `bp_dsfa_corpus` | DSFA corpus (data protection impact assessments) | Rule 1/2 |
| `bp_legal_templates` | Legal templates | Rule 1 |

---

## Files

| File | Type | Description |
|------|------|-------------|
| `backend-compliance/migrations/044_canonical_control_library.sql` | SQL | 5 tables + seed data |
| `backend-compliance/migrations/047_verification_method_category.sql` | SQL | `verification_method` + `category` fields |
| `backend-compliance/compliance/api/canonical_control_routes.py` | Python | REST API (8+ endpoints) |
| `backend-compliance/compliance/api/control_generator_routes.py` | Python | Generator API (start/status/jobs) |
| `backend-compliance/compliance/services/control_generator.py` | Python | 8-stage pipeline |
| `backend-compliance/compliance/services/license_gate.py` | Python | License gate logic |
| `backend-compliance/compliance/services/similarity_detector.py` | Python | Too-close detector (5 metrics) |
| `backend-compliance/compliance/services/rag_client.py` | Python | RAG client (search + scroll) |
| `ai-compliance-sdk/internal/ucca/legal_rag.go` | Go | RAG search + scroll (Qdrant) |
| `ai-compliance-sdk/internal/api/handlers/rag_handlers.go` | Go | RAG HTTP handlers |
| `ai-compliance-sdk/policies/canonical_controls_v1.json` | JSON | 10 seed controls, 39 open anchors |
| `ai-compliance-sdk/internal/ucca/canonical_control_loader.go` | Go | Control loader with multi-index |
| `admin-compliance/app/sdk/control-library/page.tsx` | TSX | Control Library Browser |
| `admin-compliance/app/sdk/control-provenance/page.tsx` | TSX | Provenance Wiki |
| `admin-compliance/app/api/sdk/v1/canonical/route.ts` | TS | Next.js API proxy |
| `scripts/validate-controls.py` | Python | CI/CD validator |

---

## Tests

| File | Language | Tests |
|------|----------|-------|
| `ai-compliance-sdk/internal/ucca/canonical_control_loader_test.go` | Go | 8 tests |
| `backend-compliance/compliance/tests/test_similarity_detector.py` | Python | 19 tests |
| `backend-compliance/tests/test_canonical_control_routes.py` | Python | 14 tests |
| `backend-compliance/tests/test_license_gate.py` | Python | 12 tests |
| `backend-compliance/tests/test_validate_controls.py` | Python | 14 tests |
| **Total** | | **67 tests** |