feat: add compliance modules 2-5 (dashboard, security templates, process manager, evidence collector)
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 32s
CI/CD / test-python-backend-compliance (push) Successful in 34s
CI/CD / test-python-document-crawler (push) Successful in 23s
CI/CD / test-python-dsms-gateway (push) Successful in 21s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Successful in 2s
Module 2: Extended Compliance Dashboard with roadmap, module-status, next-actions, snapshots, score-history
Module 3: 7 German security document templates (IT-Sicherheitskonzept, Datenschutz, Backup, Logging, Incident-Response, Zugriff, Risikomanagement)
Module 4: Compliance Process Manager with CRUD, complete/skip/seed, ~50 seed tasks, 3-tab UI
Module 5: Evidence Collector Extended with automated checks, control-mapping, coverage report, 4-tab UI

Also includes: canonical control library enhancements (verification method, categories, dedup), control generator improvements, RAG client extensions

52 tests pass, frontend builds clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -222,14 +222,149 @@ The validator (`scripts/validate-controls.py`) checks on every commit:

---

## Control Generator Pipeline

Automatically generates controls from the entire RAG corpus (170,000+ chunks from laws, regulations, and standards).

### 8-Stage Pipeline

```mermaid
flowchart TD
    A[1. RAG Scroll] -->|All chunks| B[2. Prefilter - Local LLM]
    B -->|Irrelevant| C[Mark as processed]
    B -->|Relevant| D[3. License Classify]
    D -->|Rule 1/2| E[4a. Structure - Anthropic]
    D -->|Rule 3| F[4b. LLM Reform - Anthropic]
    E --> G[5. Harmonization - Embeddings]
    F --> G
    G -->|Duplicate| H[Store as duplicate]
    G -->|New| I[6. Anchor Search]
    I --> J[7. Store Control]
    J --> K[8. Mark Processed]
```

### Stage 1: RAG Scroll (Exhaustive)

Scrolls through **ALL** chunks in all RAG collections using the Qdrant Scroll API.
There is no limit: every chunk is processed so that no legal requirement is missed.

Chunks that have already been processed are skipped based on their SHA-256 hash (`canonical_processed_chunks`).

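The hash-based skip can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `chunk_hash`, `unprocessed_chunks`, and the in-memory `processed` set are hypothetical stand-ins for whatever is stored in `canonical_processed_chunks`, and hashing the UTF-8 chunk text is an assumption.

```python
import hashlib
from typing import Iterable, Iterator


def chunk_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a chunk's text (assumed UTF-8)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def unprocessed_chunks(chunks: Iterable[str], processed: set[str]) -> Iterator[str]:
    """Yield only the chunks whose hash is not yet in the processed set."""
    for text in chunks:
        if chunk_hash(text) not in processed:
            yield text


# Hypothetical usage: one chunk was already handled in an earlier run.
processed = {chunk_hash("Art. 32 DSGVO ...")}
todo = list(unprocessed_chunks(["Art. 32 DSGVO ...", "Art. 21 NIS2 ..."], processed))
```

Keying on a content hash rather than a chunk ID means a re-ingested but unchanged chunk is still recognised as processed.
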
### Stage 2: Local LLM Prefilter (Qwen 30B)

**Cost optimization:** Before a chunk is sent to the Anthropic API, the local Qwen model (`qwen3:30b-a3b` on a Mac Mini) checks whether the chunk contains a concrete requirement.

- **Relevant:** obligations ("muss", "soll"), technical measures, data protection requirements
- **Irrelevant:** definitions, tables of contents, definitions of terms, transitional provisions

Irrelevant chunks are marked as `prefilter_skip` and never processed again.
This saves more than 50% of the Anthropic API costs.

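A rough sketch of the prefilter decision is shown below. The prompt wording, the YES/NO protocol, and the names `is_relevant`, `prefilter_status`, and `fake_llm` are assumptions made for illustration; only the `prefilter_skip` status comes from the description above. The model call is injected as a plain callable so the sketch runs without a local LLM server.

```python
from typing import Callable

# Hypothetical prompt; the real prefilter prompt is not shown in this document.
PREFILTER_PROMPT = (
    "Does the following text contain a concrete requirement "
    "(an obligation, a technical measure, or a data protection rule)? "
    "Answer only YES or NO.\n\n{chunk}"
)


def is_relevant(chunk: str, ask_llm: Callable[[str], str]) -> bool:
    """Ask the local model for a YES/NO verdict on a single chunk."""
    answer = ask_llm(PREFILTER_PROMPT.format(chunk=chunk))
    return answer.strip().upper().startswith("YES")


def prefilter_status(chunk: str, ask_llm: Callable[[str], str]) -> str:
    """Map the verdict to a status; "relevant" is an assumed name."""
    return "relevant" if is_relevant(chunk, ask_llm) else "prefilter_skip"


def fake_llm(prompt: str) -> str:
    """Stand-in for the local model: treats "muss" as a sign of an obligation."""
    return "YES" if "muss" in prompt else "NO"
```

Only chunks with a `relevant` verdict would go on to the paid API in this sketch; everything else is recorded once and skipped on later runs.
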
### Stage 3: License Classification (3-Rule System)

| Rule | License / source | Original text allowed? | Example |
|------|------------------|------------------------|---------|
| **Rule 1** (free_use) | EU laws, NIST, German laws | Yes | DSGVO, BDSG, NIS2 |
| **Rule 2** (citation_required) | CC-BY, CC-BY-SA | Yes, with citation | OWASP ASVS |
| **Rule 3** (restricted) | Proprietary | No, full reformulation required | BSI TR-03161 |

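The three rules can be modelled as a small lookup, sketched below. The `LicenseRule` type, the `SOURCE_RULES` mapping (built only from the examples in the table), and the fallback to Rule 3 for unknown sources are assumptions for illustration, not the logic in `license_gate.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseRule:
    rule: int
    name: str
    original_allowed: bool
    citation_required: bool


RULES = {
    1: LicenseRule(1, "free_use", True, False),
    2: LicenseRule(2, "citation_required", True, True),
    3: LicenseRule(3, "restricted", False, False),
}

# Hypothetical source-to-rule mapping, taken from the table's example column.
SOURCE_RULES = {
    "DSGVO": 1,
    "BDSG": 1,
    "NIS2": 1,
    "OWASP ASVS": 2,
    "BSI TR-03161": 3,
}


def classify(source: str) -> LicenseRule:
    """Look up the rule for a source; unknown sources fall back to Rule 3 (assumed)."""
    return RULES[SOURCE_RULES.get(source, 3)]
```

Defaulting to the most restrictive rule is a conservative choice: an unclassified source is never quoted verbatim by accident.
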
### Stage 4a/4b: Structuring / Reformulation

- **Rule 1+2:** Anthropic structures the original text into control format (title, objective, requirements)
- **Rule 3:** Anthropic reformulates completely: no original text, no source names

### Stage 5: Harmonization (Embedding-Based)

Uses bge-m3 embeddings (cosine similarity > 0.85) to check whether a similar control already exists.
Embeddings are preloaded in batches (32 texts per request) to maximize performance.

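The duplicate check and the batching can be sketched in plain Python as follows. The threshold and batch size come from the text above; the function names are hypothetical, the embeddings are assumed to be plain lists of floats, and the call to the embedding model itself is left out.

```python
import math
from typing import Iterator, Sequence

SIMILARITY_THRESHOLD = 0.85  # from the text above
BATCH_SIZE = 32              # texts per embedding request, from the text above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(candidate: Sequence[float], existing: Sequence[Sequence[float]]) -> bool:
    """True if any existing control embedding exceeds the similarity threshold."""
    return any(cosine_similarity(candidate, e) > SIMILARITY_THRESHOLD for e in existing)


def batches(texts: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Split texts into consecutive batches of at most `size` items."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]
```

With 70 texts, `batches` yields groups of 32, 32, and 6, so three embedding requests replace 70 individual ones.
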
### Stages 6-8: Anchor Search, Store, Mark Processed

- **Anchor Search:** finds open-source references (OWASP, NIST, ENISA)
- **Store:** persists the control with `verification_method` and `category`
- **Mark Processed:** marks **EVERY** chunk as processed (including on skip, error, or duplicate)

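One way to guarantee that every chunk is marked, whatever the outcome, is a `try`/`finally` wrapper, sketched below. The `process_chunk` function, its callback parameters, and the status strings `stored` and `error` are hypothetical; the real pipeline's bookkeeping is not shown in this document.

```python
from typing import Callable


def process_chunk(
    chunk_id: str,
    handle: Callable[[str], str],
    mark_processed: Callable[[str, str], None],
) -> str:
    """Run `handle` and record the outcome, even if it raises."""
    status = "error"
    try:
        status = handle(chunk_id)  # e.g. "stored", "duplicate", "prefilter_skip"
    except Exception:
        status = "error"  # recorded, not re-raised; a real pipeline would also log this
    finally:
        mark_processed(chunk_id, status)
    return status


# Hypothetical usage with a dict as the "processed" store.
def failing(chunk_id: str) -> str:
    raise RuntimeError("LLM timeout")


seen: dict[str, str] = {}
process_chunk("c1", lambda c: "stored", seen.__setitem__)
process_chunk("c2", failing, seen.__setitem__)
```

Recording failures as processed trades retries for progress: a chunk that always fails cannot stall a run over the whole corpus.
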
### Automatic Classification

The following are assigned automatically during generation:

**Verification Method** (method of evidence):

| Method | Description |
|--------|-------------|
| `code_review` | Verifiable in the source code |
| `document` | Document/process evidence |
| `tool` | Tool-based check |
| `hybrid` | Combination of several methods |

**Category** (17 thematic categories):
encryption, authentication, network, data_protection, logging, incident,
continuity, compliance, supply_chain, physical, personnel, application,
system, risk, governance, hardware, identity

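Because both fields are closed value sets, they can be checked before a control is stored. The sketch below uses the values listed above; `validate_classification` is a hypothetical helper, not a function from the codebase.

```python
VERIFICATION_METHODS = frozenset({"code_review", "document", "tool", "hybrid"})

CATEGORIES = frozenset({
    "encryption", "authentication", "network", "data_protection", "logging",
    "incident", "continuity", "compliance", "supply_chain", "physical",
    "personnel", "application", "system", "risk", "governance", "hardware",
    "identity",
})


def validate_classification(verification_method: str, category: str) -> None:
    """Raise ValueError if either field is outside its allowed value set."""
    if verification_method not in VERIFICATION_METHODS:
        raise ValueError(f"unknown verification_method: {verification_method!r}")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
```
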
### Configuration

| ENV variable | Default | Description |
|--------------|---------|-------------|
| `ANTHROPIC_API_KEY` | (none) | API key for Anthropic Claude |
| `CONTROL_GEN_ANTHROPIC_MODEL` | `claude-sonnet-4-6` | Anthropic model used for formulation |
| `OLLAMA_URL` | `http://host.docker.internal:11434` | Local Ollama server (prefilter) |
| `CONTROL_GEN_OLLAMA_MODEL` | `qwen3:30b-a3b` | Local LLM for the prefilter |
| `CONTROL_GEN_LLM_TIMEOUT` | `120` | Timeout in seconds |

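Reading these variables with their documented defaults might look like the sketch below. The `GeneratorConfig` dataclass and `load_config` function are hypothetical names; only the variable names and default values are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GeneratorConfig:
    anthropic_api_key: Optional[str]
    anthropic_model: str
    ollama_url: str
    ollama_model: str
    llm_timeout: int  # seconds


def load_config(env: Mapping[str, str] = os.environ) -> GeneratorConfig:
    """Build the config from an environment mapping, applying the documented defaults."""
    return GeneratorConfig(
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),
        anthropic_model=env.get("CONTROL_GEN_ANTHROPIC_MODEL", "claude-sonnet-4-6"),
        ollama_url=env.get("OLLAMA_URL", "http://host.docker.internal:11434"),
        ollama_model=env.get("CONTROL_GEN_OLLAMA_MODEL", "qwen3:30b-a3b"),
        llm_timeout=int(env.get("CONTROL_GEN_LLM_TIMEOUT", "120")),
    )
```

Passing the environment as a parameter keeps the function easy to test: `load_config({})` returns the defaults without touching the real process environment.
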
### Architecture Decision: Legal References

Controls are derived from two sources:

1. **Direct legal obligations (Rule 1):** e.g. DSGVO (GDPR) Art. 32 mandates "technical and organisational measures". These controls carry a `source_citation` with the exact legal reference and the original text.

2. **Implicit implementation via best practices (Rule 2/3):** e.g. OWASP ASVS V2.7 requires MFA. This is not a legal obligation, but a best practice for meeting NIS2 Art. 21 or DSGVO Art. 32. These controls carry open-source references (anchors).

**In the frontend:**
- Rule 1/2 controls show a blue "Gesetzliche Grundlage" (legal basis) box with the law, article, and link
- Rule 3 controls show a note that they implement laws implicitly, with a pointer to the references

### API

```bash
# Start a job (runs in the background)
curl -X POST https://macmini:8002/api/compliance/v1/canonical/generate \
  -H 'Content-Type: application/json' \
  -H 'X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000' \
  -d '{"collections": ["bp_compliance_gesetze"]}'

# Query the job status
curl https://macmini:8002/api/compliance/v1/canonical/generate/jobs \
  -H 'X-Tenant-ID: 550e8400-e29b-41d4-a716-446655440000'
```

### RAG Collections

| Collection | Contents | Expected rule |
|------------|----------|---------------|
| `bp_compliance_gesetze` | German laws (BDSG, TTDSG, TKG, etc.) | Rule 1 |
| `bp_compliance_recht` | EU regulations (DSGVO, NIS2, AI Act, etc.) | Rule 1 |
| `bp_compliance_datenschutz` | Data protection guidelines | Rule 1/2 |
| `bp_compliance_ce` | CE and safety/security standards | Rule 1/2/3 |
| `bp_dsfa_corpus` | DSFA corpus (data protection impact assessments) | Rule 1/2 |
| `bp_legal_templates` | Legal templates | Rule 1 |

---

## Files

| File | Type | Description |
|------|------|-------------|
| `backend-compliance/migrations/044_canonical_control_library.sql` | SQL | 5 tables + seed data |
| `backend-compliance/migrations/047_verification_method_category.sql` | SQL | `verification_method` + `category` fields |
| `backend-compliance/compliance/api/canonical_control_routes.py` | Python | REST API (8+ endpoints) |
| `backend-compliance/compliance/api/control_generator_routes.py` | Python | Generator API (start/status/jobs) |
| `backend-compliance/compliance/services/control_generator.py` | Python | 8-stage pipeline |
| `backend-compliance/compliance/services/license_gate.py` | Python | License gate logic |
| `backend-compliance/compliance/services/similarity_detector.py` | Python | Too-close detector (5 metrics) |
| `backend-compliance/compliance/services/rag_client.py` | Python | RAG client (search + scroll) |
| `ai-compliance-sdk/internal/ucca/legal_rag.go` | Go | RAG search + scroll (Qdrant) |
| `ai-compliance-sdk/internal/api/handlers/rag_handlers.go` | Go | RAG HTTP handlers |
| `ai-compliance-sdk/policies/canonical_controls_v1.json` | JSON | 10 seed controls, 39 open anchors |
| `ai-compliance-sdk/internal/ucca/canonical_control_loader.go` | Go | Control loader with multi-index |
| `admin-compliance/app/sdk/control-library/page.tsx` | TSX | Control Library browser |
