feat(multi-layer): complete Multi-Layer Control Architecture (Phases 1-8 + Pass 0)
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 47s
CI/CD / test-python-backend-compliance (push) Successful in 33s
CI/CD / test-python-document-crawler (push) Successful in 24s
CI/CD / test-python-dsms-gateway (push) Successful in 18s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Has been skipped
Implements the full Multi-Layer Control Architecture for migrating ~25,000 Rich Controls into atomic, deduplicated Master Controls with full traceability.

Architecture: Legal Source → Obligation → Control Pattern → Master Control → Customer Instance

New services:
- ObligationExtractor: 3-tier extraction (exact → embedding → LLM)
- PatternMatcher: 2-tier matching (keyword + embedding + domain-bonus)
- ControlComposer: Pattern + Obligation → Master Control
- PipelineAdapter: Pipeline integration + Migration Passes 1-5
- DecompositionPass: Pass 0a/0b, Rich Control → atomic Controls
- CrosswalkRoutes: 15 API endpoints under /v1/canonical/

New DB schema:
- Migration 060: obligation_extractions, control_patterns, crosswalk_matrix
- Migration 061: obligation_candidates, parent_control_uuid tracking

Pattern Library: 50 YAML patterns (30 core + 20 IT-security)
Go SDK: Pattern loader with YAML validation and indexing
Documentation: MkDocs updated with full architecture overview

500 Python tests passing across all components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -707,3 +707,258 @@ The generator tests cover the following areas:
- **`TestAnchorFinder`** (2 tests): RAG search filters out Rule 3 sources, web search detects frameworks
- **`TestPipelineMocked`** (5 tests): end-to-end with mocks: license classification, Rule 3 blocking,
  hash deduplication, config defaults (`batch_size: 5`), Rule 1 citation generation

---
## Multi-Layer Control Architecture

Extends the existing pipeline with a 5-layer model:

```
Legal Source → Obligation → Control Pattern → Master Control → Customer Instance
```

### Architecture Overview

| Layer | Asset | Description |
|-------|-------|-------------|
| 1: Legal Sources | Qdrant (5 collections, 105K+ chunks) | Raw RAG data |
| 2: Obligations | v2 framework (325 obligations, 9 regulations) | Legal obligations |
| 3: Control Patterns | 50 YAML patterns (30 core + 20 IT security) | Implementation patterns |
| 4: Master Controls | canonical_controls (atomic controls after dedup) | Canonical controls |
| 5: Customer Instance | TOM controls + gap mapping | Customer-specific |

### Control Levels

| Level | Description | Use |
|-------|-------------|-----|
| **Rich Controls** | Narrative, explanatory, context-rich (~25,000) | Training, audit questions, action plans |
| **Atomic Controls** | 1 obligation = 1 control (after decomposition + dedup) | System audits, code checks, gap analysis, traceability |
### Pipeline Extension (10 Stages)

```
Stage 1:  RAG SCAN            (unchanged)
Stage 2:  LICENSE CLASSIFY    (unchanged)
Stage 3:  PREFILTER           (unchanged)
Stage 4:  OBLIGATION EXTRACT  (NEW: 3-tier, exact → embedding → LLM)
Stage 5:  PATTERN MATCH       (NEW: keyword + embedding + domain bonus)
Stage 6:  CONTROL COMPOSE     (NEW: pattern + obligation → control)
Stage 7:  HARMONIZE           (unchanged)
Stage 8:  ANCHOR SEARCH       (unchanged)
Stage 9:  STORE + CROSSWALK   (extended: crosswalk matrix)
Stage 10: MARK PROCESSED      (unchanged)
```

---
### Obligation Extractor (Stage 4)

3-tier extraction (fastest tier first):

| Tier | Method | Latency | Hit rate |
|------|--------|---------|----------|
| 1 | Exact match (regulation_code + article → obligation_id) | <1ms | ~40% |
| 2 | Embedding match (cosine similarity > 0.80 against 325 obligations) | ~50ms | ~30% |
| 3 | LLM extraction (local Ollama, fallback only) | ~2s | ~25% |

**File:** `compliance/services/obligation_extractor.py`
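
The tier order in the Obligation Extractor table can be read as a simple cascade. The following is a minimal sketch under stated assumptions, not the code in `obligation_extractor.py`: the function name `extract_obligation`, the `Extraction` type, and the two callables standing in for the embedding lookup and the Ollama call are all hypothetical. Only the tier order and the 0.80 cosine threshold come from the table. Each tier runs only when the cheaper one before it yields nothing, so the slow LLM call is paid only for chunks the first two tiers miss.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Extraction:
    obligation_id: str
    tier: str  # "exact", "embedding", or "llm"


def extract_obligation(
    regulation_code: str,
    article: str,
    text: str,
    exact_index: dict[tuple[str, str], str],
    embedding_match: Callable[[str], Optional[tuple[str, float]]],
    llm_extract: Callable[[str], Optional[str]],
    threshold: float = 0.80,
) -> Optional[Extraction]:
    """Try the three tiers in order of cost and return the first hit."""
    # Tier 1: deterministic lookup on (regulation_code, article).
    obligation_id = exact_index.get((regulation_code, article))
    if obligation_id is not None:
        return Extraction(obligation_id, "exact")

    # Tier 2: nearest obligation by cosine similarity (hypothetical callable
    # returning (obligation_id, similarity) or None).
    hit = embedding_match(text)
    if hit is not None and hit[1] > threshold:
        return Extraction(hit[0], "embedding")

    # Tier 3: LLM fallback (hypothetical callable returning an id or None).
    obligation_id = llm_extract(text)
    if obligation_id is not None:
        return Extraction(obligation_id, "llm")

    return None
```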
### Pattern Library (Stage 5)

50 YAML-based control patterns in 16 domains:

| File | Patterns | Domains |
|------|----------|---------|
| `core_patterns.yaml` | 30 | AUTH, CRYP, NET, DATA, LOG, ACC, SEC, INC, COMP, GOV, RES |
| `domain_it_security.yaml` | 20 | SEC, NET, AUTH, LOG, CRYP |

**Pattern ID format:** `CP-{DOMAIN}-{NNN}` (e.g. `CP-AUTH-001`)

**Matching:** 2-tier (keyword index + embedding), domain bonus (+0.10)

**Files:**
- `ai-compliance-sdk/policies/control_patterns/core_patterns.yaml`
- `ai-compliance-sdk/policies/control_patterns/domain_it_security.yaml`
- `compliance/services/pattern_matcher.py`
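
To make the pattern ID format and the domain bonus concrete, here is a rough sketch of the keyword tier only; the embedding tier is left out. It is an illustration under assumptions, not the logic in `pattern_matcher.py`: the scoring formula (fraction of matched keywords), the function names, and the idea of passing a `domain_hint` are hypothetical. The `CP-{DOMAIN}-{NNN}` format and the +0.10 bonus are taken from the Pattern Library description.

```python
from __future__ import annotations

import re
from typing import Optional

# CP-{DOMAIN}-{NNN}, e.g. CP-AUTH-001
PATTERN_ID_RE = re.compile(r"^CP-([A-Z]+)-(\d{3})$")
DOMAIN_BONUS = 0.10


def pattern_domain(pattern_id: str) -> str:
    """Return the DOMAIN part of a pattern ID, or raise on a malformed ID."""
    match = PATTERN_ID_RE.match(pattern_id)
    if match is None:
        raise ValueError(f"invalid pattern id: {pattern_id!r}")
    return match.group(1)


def keyword_score(text: str, keywords: list[str]) -> float:
    """Fraction of the pattern's keywords found in the text (assumed scoring)."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in lowered)
    return hits / len(keywords)


def best_pattern(
    text: str,
    patterns: dict[str, list[str]],
    domain_hint: Optional[str] = None,
) -> tuple[Optional[str], float]:
    """Pick the highest-scoring pattern; matching domains get the bonus."""
    best_id, best_score = None, 0.0
    for pattern_id, keywords in patterns.items():
        score = keyword_score(text, keywords)
        if score > 0 and domain_hint == pattern_domain(pattern_id):
            score += DOMAIN_BONUS
        if score > best_score:
            best_id, best_score = pattern_id, score
    return best_id, best_score
```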
### Control Composer (Stage 6)

Three composition modes:

| Mode | When | Quality |
|------|------|---------|
| Pattern-guided | Pattern found, LLM responds | High |
| Template-only | LLM error, but pattern available | Medium |
| Fallback | No pattern match | Basic |

**File:** `compliance/services/control_composer.py`

---
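For the three Control Composer modes, the decision reduces to two questions: was a pattern matched, and did the LLM respond? A small sketch, assuming a hypothetical `select_mode` helper and `Mode` enum (neither name is taken from `control_composer.py`):

```python
from enum import Enum


class Mode(Enum):
    PATTERN_GUIDED = "pattern_guided"  # quality: high
    TEMPLATE_ONLY = "template_only"    # quality: medium
    FALLBACK = "fallback"              # quality: basic


def select_mode(pattern_found: bool, llm_ok: bool) -> Mode:
    """Map the two conditions from the mode table to a composition mode."""
    if not pattern_found:
        # Without a pattern match there is no template to fall back on.
        return Mode.FALLBACK
    return Mode.PATTERN_GUIDED if llm_ok else Mode.TEMPLATE_ONLY
```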
### Decomposition Pass (Pass 0)

Splits Rich Controls into atomic controls. Runs BEFORE Migration Passes 1-5.

#### Pass 0a: Obligation Extraction

Extracts individual normative obligations from a Rich Control via LLM.

**6 Guardrails:**

1. Normative statements only (signal words such as "müssen", "sicherzustellen", "verpflichtet", ...)
2. One main verb per obligation
3. Testing obligations are kept separate
4. Reporting obligations are kept separate
5. Do not decompose down to the evidence level
6. Always preserve the parent link

**Quality Gate:** Each candidate is checked against 6 criteria:

- `has_normative_signal`: normative language signal detected
- `single_action`: only one action
- `not_rationale`: not a mere rationale
- `not_evidence_only`: not a pure evidence fragment
- `min_length`: minimum length reached
- `has_parent_link`: reference to the Rich Control

Critical checks: `has_normative_signal`, `not_evidence_only`, `min_length`, `has_parent_link`
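
The split between the six criteria and the four critical checks can be sketched as follows. This is a hedged illustration, not the code in `decomposition_pass.py`: the individual heuristics (the regex, the `MIN_LENGTH` value, the prefix checks) are placeholders invented for the example, and it assumes that a candidate is rejected only when a critical check fails, while the two non-critical flags are merely recorded. Only the flag names and the list of critical checks come from the Quality Gate description.

```python
from __future__ import annotations

import re
from typing import Optional

CRITICAL_CHECKS = (
    "has_normative_signal",
    "not_evidence_only",
    "min_length",
    "has_parent_link",
)

# Placeholder heuristics for German source text (assumptions, not the real rules).
NORMATIVE_RE = re.compile(r"\b(muss|müssen|sicherzustellen|verpflichtet)\b", re.IGNORECASE)
MIN_LENGTH = 20


def quality_flags(text: str, parent_control_uuid: Optional[str]) -> dict[str, bool]:
    """Evaluate all six criteria for one obligation candidate."""
    stripped = text.strip()
    lowered = stripped.lower()
    return {
        "has_normative_signal": bool(NORMATIVE_RE.search(stripped)),
        "single_action": " und " not in lowered,  # crude: "und" suggests two actions
        "not_rationale": not lowered.startswith(("weil", "da ")),
        "not_evidence_only": not lowered.startswith(("nachweis", "protokoll")),
        "min_length": len(stripped) >= MIN_LENGTH,
        "has_parent_link": bool(parent_control_uuid),
    }


def passes_gate(flags: dict[str, bool]) -> bool:
    """A candidate passes when every critical check is true."""
    return all(flags[name] for name in CRITICAL_CHECKS)
```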
#### Pass 0b: Atomic Control Composition

Creates an atomic control from each validated obligation candidate
(LLM-assisted with template fallback).

**File:** `compliance/services/decomposition_pass.py`

---
### Migration Passes (1-5)

Non-destructive passes for existing controls:

| Pass | Description | Method |
|------|-------------|--------|
| 1 | Obligation Linkage | source_citation → article → obligation_id (deterministic) |
| 2 | Pattern Classification | Keyword matching against the pattern library |
| 3 | Quality Triage | Categorization: review / needs_obligation / needs_pattern / legacy_unlinked |
| 4 | Crosswalk Backfill | crosswalk_matrix rows for linked controls |
| 5 | Deduplication | Same obligation_id + pattern_id → mark as duplicate |

**File:** `compliance/services/pipeline_adapter.py`

---
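As an illustration of Migration Pass 5, the rule "same obligation_id + pattern_id → mark as duplicate" can be sketched in a non-destructive way: nothing is deleted, later controls only receive a pointer to the first control with the same key. The `Control` dataclass and its `duplicate_of` field are hypothetical and do not reflect the actual schema or the code in `pipeline_adapter.py`; skipping controls that lack either ID is likewise an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Control:
    control_id: str
    obligation_id: Optional[str]
    pattern_id: Optional[str]
    duplicate_of: Optional[str] = None  # hypothetical marker field


def mark_duplicates(controls: list[Control]) -> int:
    """Mark controls sharing (obligation_id, pattern_id) with an earlier one.

    Returns the number of controls marked. Controls without both IDs are
    skipped, since they cannot be compared on this key.
    """
    first_seen: dict[tuple[str, str], str] = {}
    marked = 0
    for control in controls:
        if control.obligation_id is None or control.pattern_id is None:
            continue
        key = (control.obligation_id, control.pattern_id)
        if key in first_seen:
            control.duplicate_of = first_seen[key]
            marked += 1
        else:
            first_seen[key] = control.control_id
    return marked
```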
### Crosswalk Matrix

The "golden thread" from law to implementation:

```
Regulation → Article → Obligation → Pattern → Master Control → TOM
```

An atomic control can be required by **multiple laws** at the same time.
The crosswalk matrix models this N:M relationship.

---
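The N:M relationship of the crosswalk matrix can be pictured as flat rows, one per regulation/control pairing, that are grouped on demand. The sketch below is an assumption-laden illustration: the `CrosswalkRow` fields mirror the chain Regulation → Article → Obligation → Pattern → Master Control, but the real `crosswalk_matrix` columns are not shown in this document, and any IDs used with it are made up.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class CrosswalkRow(NamedTuple):
    regulation: str
    article: str
    obligation_id: str
    pattern_id: str
    control_id: str


def regulations_by_control(rows: list[CrosswalkRow]) -> dict[str, set[str]]:
    """Group rows so each control maps to every regulation that requires it."""
    result: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        result[row.control_id].add(row.regulation)
    return dict(result)


def controls_by_regulation(rows: list[CrosswalkRow]) -> dict[str, set[str]]:
    """The reverse direction: each regulation maps to the controls it requires."""
    result: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        result[row.regulation].add(row.control_id)
    return dict(result)
```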
### DB Schema (Migrations 060 + 061)

**Migration 060:** Multi-layer base tables

| Table / Column | Description |
|----------------|-------------|
| `obligation_extractions` | Chunk→obligation links (3-tier tracking) |
| `control_patterns` | DB mirror of the YAML patterns for SQL queries |
| `crosswalk_matrix` | Golden thread: Regulation→Obligation→Pattern→Control |
| `canonical_controls.pattern_id` | Pattern assignment (new column) |
| `canonical_controls.obligation_ids` | Obligation IDs as a JSONB array (new column) |

**Migration 061:** Decomposition tables

| Table / Column | Description |
|----------------|-------------|
| `obligation_candidates` | Atomic obligations extracted from Rich Controls |
| `canonical_controls.parent_control_uuid` | Self-reference to the Rich Control (new column) |
| `canonical_controls.decomposition_method` | Decomposition method (new column) |

---
### API Endpoints (Crosswalk Routes)

All endpoints live under `/api/compliance/v1/canonical/`:

#### Pattern Library

| Method | Path | Description |
|--------|------|-------------|
| GET | `/patterns` | All patterns (filters: domain, category, tag) |
| GET | `/patterns/{pattern_id}` | Single pattern with details |
| GET | `/patterns/{pattern_id}/controls` | Controls derived from a pattern |

#### Obligation Extraction

| Method | Path | Description |
|--------|------|-------------|
| POST | `/obligations/extract` | Extract an obligation from text + match a pattern |

#### Crosswalk Matrix

| Method | Path | Description |
|--------|------|-------------|
| GET | `/crosswalk` | Query (filters: regulation, article, obligation, pattern) |
| GET | `/crosswalk/stats` | Coverage statistics |

#### Migration + Decomposition

| Method | Path | Description |
|--------|------|-------------|
| POST | `/migrate/decompose` | Pass 0a: obligation extraction from Rich Controls |
| POST | `/migrate/compose-atomic` | Pass 0b: atomic control composition |
| POST | `/migrate/link-obligations` | Pass 1: obligation linkage |
| POST | `/migrate/classify-patterns` | Pass 2: pattern classification |
| POST | `/migrate/triage` | Pass 3: quality triage |
| POST | `/migrate/backfill-crosswalk` | Pass 4: crosswalk backfill |
| POST | `/migrate/deduplicate` | Pass 5: deduplication |
| GET | `/migrate/status` | Migration progress |
| GET | `/migrate/decomposition-status` | Decomposition progress |

**Route file:** `compliance/api/crosswalk_routes.py`

---
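The migration endpoints listed under "Migration + Decomposition" are meant to be called in pass order (0a, 0b, then 1 through 5). The sketch below only assembles that sequence as `(pass, method, url)` tuples; it sends nothing. The host value is up to the caller, and request bodies and response shapes are not documented here, so they are deliberately left out. The helper name `migration_requests` is hypothetical.

```python
from __future__ import annotations

BASE_PATH = "/api/compliance/v1/canonical"

# Pass label and path, in the order the passes are intended to run.
MIGRATION_STEPS = [
    ("0a", "/migrate/decompose"),
    ("0b", "/migrate/compose-atomic"),
    ("1", "/migrate/link-obligations"),
    ("2", "/migrate/classify-patterns"),
    ("3", "/migrate/triage"),
    ("4", "/migrate/backfill-crosswalk"),
    ("5", "/migrate/deduplicate"),
]


def migration_requests(host: str) -> list[tuple[str, str, str]]:
    """Return (pass, HTTP method, full URL) for each migration step, in order."""
    root = host.rstrip("/") + BASE_PATH
    return [(label, "POST", root + path) for label, path in MIGRATION_STEPS]
```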
### Multi-Layer Tests

| File | Tests | Focus |
|------|-------|-------|
| `tests/test_obligation_extractor.py` | 107 | 3-tier extraction, helpers, regex |
| `tests/test_pattern_matcher.py` | 72 | Keyword index, embedding, domain affinity |
| `tests/test_control_composer.py` | 54 | Composition, templates, license rules |
| `tests/test_pipeline_adapter.py` | 36 | Pipeline integration, 5 migration passes |
| `tests/test_crosswalk_routes.py` | 57 | 15 API endpoints, Pydantic models |
| `tests/test_decomposition_pass.py` | 68 | Pass 0a/0b, quality gate, 6 guardrails |
| `tests/test_migration_060.py` | 12 | Schema validation |
| `tests/test_control_patterns.py` | 18 | YAML validation, pattern schema |
| **Total Multi-Layer** | **424** | |
### Planned Migration Flow

```
Rich Controls (~25,000, release_state=raw)
    ↓
Pass 0a: Obligation Extraction (LLM + Quality Gate)
    ↓
Pass 0b: Atomic Control Composition (LLM + Template Fallback)
    ↓
Pass 1: Obligation Linking (deterministic)
    ↓
Pass 2: Pattern Classification (Keyword + Embedding)
    ↓
Pass 3: Quality Triage
    ↓
Pass 4: Crosswalk Backfill
    ↓
Pass 5: Dedup / Merge
    ↓
Master Controls (~15,000-20,000 with full traceability)
```