docs: comprehensive session handover — Blocks F+G complete, next: MC quality refinement
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,69 +1,107 @@
|
|||||||
# Session Instructions: G-pre1 Object Normalization
|
# Session Instructions: Master Control Quality + Regulation-Source Split
|
||||||
|
|
||||||
**Date:** 2026-05-05
|
**Date:** 2026-05-06
|
||||||
**For:** Next Claude session
|
**For:** Next Claude session
|
||||||
**Repo:** breakpilot-core (~/Projekte/breakpilot-core)
|
**Repo:** breakpilot-core (~/Projekte/breakpilot-core)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## NEXT STEP: G-pre1 - Hierarchical Topic Clustering
|
## NEXT STEP: Split the 25 large Master Controls
|
||||||
|
|
||||||
### Analysis Result (2026-05-05)
|
### Problem
|
||||||
|
|
||||||
|
25 Master Controls are too generic (>200 Atomic Controls per MC). They are built on generic security-domain keywords such as "monitoring", "encryption", and "personal_data". Embedding clustering alone is not enough: the controls all deal with "monitoring", but for different regulations (DSGVO (GDPR), NIS2, NIST, BSI, etc.).
|
||||||
|
|
||||||
|
### The 25 Affected MCs
|
||||||
|
|
||||||
|
| MC-ID | Name | Controls | Problem |
|
||||||
|
|-------|------|----------|---------|
|
||||||
|
| MC-8292 | monitoring | 6,157 | Everything from video to vulnerability |
|
||||||
|
| MC-2260 | procedure | 4,176 | Generic |
|
||||||
|
| MC-8302 | alerting | 3,126 | Reporting obligations of all laws mixed together |
|
||||||
|
| MC-8306 | personal_data | 3,057 | DSGVO + NIS2 + AT/CH mixed together |
|
||||||
|
| MC-8312 | training | 2,572 | |
|
||||||
|
| MC-7932 | certificate_management | 2,350 | |
|
||||||
|
| MC-8317 | incident | 2,288 | |
|
||||||
|
| MC-8329 | encryption | 1,790 | |
|
||||||
|
| MC-8333 | audit_logging | 1,645 | |
|
||||||
|
| MC-8321 | policy | 1,463 | |
|
||||||
|
| MC-8325 | patch_management | 1,155 | |
|
||||||
|
| MC-8338 | network_security | 1,071 | |
|
||||||
|
| ... | (13 more) | 200-960 | |
|
||||||
|
|
||||||
|
### Solution: Regulation-Source Split
|
||||||
|
|
||||||
|
Instead of clustering by embedding similarity alone, split by **regulation source**:
|
||||||
|
|
||||||
```
|
```
|
||||||
Unique raw objects: 183,058
|
MC "encryption" (1,790 controls)
|
||||||
After normalize_object(): 144,151 (only 21% reduction)
|
→ encryption_dsgvo (DSGVO Art. 32, ~200)
|
||||||
Singletons: 144,117 (99.98% are unique!)
|
→ encryption_nis2 (NIS2 Art. 21, ~150)
|
||||||
Groups with 2+ members: 34
|
→ encryption_nist (NIST SC-13, ~300)
|
||||||
|
→ encryption_bsi (BSI, ~200)
|
||||||
|
→ encryption_owasp (OWASP, ~100)
|
||||||
|
→ encryption_other (~840)
|
||||||
```
|
```
|
||||||
|
|
||||||
**Finding:** The problem is NOT "identical objects under different names" but "144k granular objects that have to be grouped into higher-level topics."
|
### Script Approach
|
||||||
|
|
||||||
### New Approach: Hierarchical Topic Clustering
|
```python
|
||||||
|
# For each of the 25 large MCs:
|
||||||
|
# 1. Fetch all member controls together with source_citation->>'source'
|
||||||
|
# 2. Group them by source (regulation)
|
||||||
|
# 3. Create one sub-MC per regulation
|
||||||
|
# 4. Controls without a source → "general" sub-MC
|
||||||
|
```
|
||||||
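
The four steps in the script outline could be sketched roughly as follows. This is a minimal in-memory sketch under stated assumptions, not the actual script: the function name `split_by_source`, the `control_id` field, the input shape, and the sub-MC naming scheme `<mc_name>_<source>` are hypothetical. Only the `source_citation` → `source` lookup and the "general" bucket come from the outline.

```python
from collections import defaultdict


def split_by_source(mc_name, controls):
    """Group the member controls of one Master Control by regulation source.

    `controls` is assumed to be a list of dicts shaped like
    {"control_id": ..., "source_citation": {"source": ...}} (hypothetical shape).
    Controls without a source land in the "general" sub-MC.
    """
    sub_mcs = defaultdict(list)
    for control in controls:
        citation = control.get("source_citation") or {}
        source = (citation.get("source") or "").strip()
        # Turn the source into a name suffix, e.g. "NIS2" -> "nis2".
        suffix = source.lower().replace(" ", "_") if source else "general"
        sub_mcs[f"{mc_name}_{suffix}"].append(control["control_id"])
    return dict(sub_mcs)


# Made-up sample members, for illustration only.
members = [
    {"control_id": "C-1", "source_citation": {"source": "DSGVO"}},
    {"control_id": "C-2", "source_citation": {"source": "NIS2"}},
    {"control_id": "C-3", "source_citation": {"source": "DSGVO"}},
    {"control_id": "C-4", "source_citation": None},
]
result = split_by_source("encryption", members)
```

In a real run the members would come from the database rather than a literal list, and the resulting groups would still need to be written back as new Master Controls.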
|
|
||||||
Instead of 1:1 synonym matching, we need:
|
### Quality Requirement (IMPORTANT!)
|
||||||
1. Define a **topic hierarchy** (e.g. "Authentication & Access" → password, mfa, session, rbac)
|
|
||||||
2. **Embedding-based assignment** of each object to a topic
|
|
||||||
3. **Qdrant-based** (no full distance matrix needed in RAM)
|
|
||||||
4. Possibly sampling + Mini-Batch K-Means instead of DBSCAN
|
|
||||||
|
|
||||||
### Memory Problem
|
**Only "very good" is acceptable.** Mid-sized MCs (30-100 controls) are already excellent:
|
||||||
- 144k × 144k distance matrix = ~83 GB RAM → not feasible
|
- MC-1082 (data_retention_policies, 52) → perfectly coherent
|
||||||
- Alternative: Qdrant nearest-neighbor search per object (O(n) queries instead of O(n²) pairwise distances)
|
- MC-5477 (austausch_von_cybersicherheitsinformationen, 5) → perfect
|
||||||
- Or: Mini-Batch K-Means with k=20,000 on a 144k × 1024 matrix (~600 MB, feasible)
|
|
||||||
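
The memory figures above can be reproduced with a quick back-of-the-envelope calculation. The sketch assumes dense matrices of 4-byte float32 values and uses the normalized object count of 144,151. The dtype and the helper name `matrix_bytes` are assumptions, not taken from the analysis script.

```python
def matrix_bytes(rows, cols, bytes_per_value=4):
    """Size in bytes of a dense rows x cols matrix (float32 by default)."""
    return rows * cols * bytes_per_value


n_objects = 144_151   # objects after normalize_object()
embedding_dim = 1024  # embedding width from the K-Means estimate

distance_gb = matrix_bytes(n_objects, n_objects) / 1e9
embeddings_mb = matrix_bytes(n_objects, embedding_dim) / 1e6

print(f"full distance matrix: ~{distance_gb:.0f} GB")
print(f"embedding matrix:     ~{embeddings_mb:.0f} MB")
```

Under these assumptions the full pairwise matrix comes out at roughly 83 GB and the embedding matrix at roughly 590 MB, in line with the ~83 GB and ~600 MB estimates.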
|
|
||||||
### Analysis Script Available
|
Goal: ALL MCs should reach this quality. No MC with more than 100 controls.
|
||||||
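
One way to track the goal stated above is to flag every MC whose member count exceeds the limit. This is a minimal sketch, assuming member counts are already available as a name-to-count mapping. The function name `oversized_mcs` and the input shape are hypothetical, and the sample counts are the ones quoted in this document.

```python
MAX_CONTROLS_PER_MC = 100  # target from the quality requirement


def oversized_mcs(member_counts, limit=MAX_CONTROLS_PER_MC):
    """Return (name, count) pairs above the limit, largest first."""
    over = [(name, n) for name, n in member_counts.items() if n > limit]
    return sorted(over, key=lambda item: item[1], reverse=True)


counts = {
    "monitoring": 6157,
    "encryption": 1790,
    "data_retention_policies": 52,
    "austausch_von_cybersicherheitsinformationen": 5,
}
flagged = oversized_mcs(counts)
```

Running the same check again after a split shows whether any sub-MC still needs further subdivision.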
- `control-pipeline/scripts/gpre1_analyze.py` (local, not committed)
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## SESSION 03-05 MAY 2026: DONE
|
## SESSION 03-06 MAY 2026: FULLY DONE
|
||||||
|
|
||||||
### Block F (Hardcoded Knowledge → DB): COMPLETE ✅
|
### Block F (Hardcoded Knowledge → DB)
|
||||||
- F1: regulation_registry (223 entries)
|
- F1: regulation_registry (223 entries) ✅
|
||||||
- F2: action_types (34) + action_synonyms (368)
|
- F2: action_types (34) + action_synonyms (368) ✅
|
||||||
- F3: object_synonyms (320)
|
- F3: object_synonyms (320) ✅
|
||||||
- F4: LLM Enrichment (+468 new synonyms via Ollama)
|
- F4: LLM Enrichment (+468 synonyms via Ollama) ✅
|
||||||
- F5: Validation (8 tests) + dicts kept as fallback
|
- F5: Validation (8 tests, dicts as fallback) ✅
|
||||||
- 454 pipeline tests pass, 0 regressions
|
|
||||||
|
|
||||||
### Control Generation Pipeline: COMPLETE ✅
|
### Control Generation Pipeline
|
||||||
- 1,599 Rich Controls generated from E-Block chunks (~$17 Anthropic)
|
- 1,599 Rich Controls from E-Block chunks (~$17 Anthropic)
|
||||||
- 11,522 obligations extracted (Pass 0a, ~$4 Anthropic)
|
- 11,522 obligations (Pass 0a, ~$4)
|
||||||
- 1,147 Atomic Controls composed (Pass 0b, ~$4.60 Anthropic)
|
- 1,147 Atomic Controls (Pass 0b, ~$4.60)
|
||||||
- **Total cost: ~$25.60**
|
- **Total cost: ~$25.60**
|
||||||
|
|
||||||
### Production Sync: COMPLETE ✅
|
### Production Sync
|
||||||
- 2,625 new controls synced to production (ON CONFLICT DO NOTHING)
|
- 2,625 controls + 11,522 obligations synced to production
|
||||||
- 11,522 obligations synced to production
|
- Production: 294,027 controls total
|
||||||
- Production: 294,027 controls total (previously 291,402)
|
- Backups: local + production, on the MacBook
|
||||||
- Backups on the MacBook: compressed (30 MB) + plain SQL (1.3 GB)
|
|
||||||
|
### Block G-pre (Master Controls)
|
||||||
|
- G-pre1: 144k objects → 7,753 groups (K-Means k=5000 + sub-clusters + refinement)
|
||||||
|
- G-pre2: 5,329 Master Controls, 172,504+ members
|
||||||
|
- G-pre3: Master Control API (list, stats, detail)
|
||||||
|
- **Quality:** Small and mid-sized MCs are excellent; the 25 large MCs need the Regulation-Source Split
|
||||||
|
|
||||||
|
### Block G (Compliance Execution Layer)
|
||||||
|
- G1: Decision Trace (decision_traces table + 6 API endpoints) ✅
|
||||||
|
- G2: Compliance Commit Ledger (compliance_commits + 5 Endpoints) ✅
|
||||||
|
- G3: Full Decision Memory (decision_events + Timeline + 4 Endpoints) ✅
|
||||||
|
- G4: Pre-Deployment Enforcement (deployment_checks + Override + 4 Endpoints) ✅
|
||||||
|
|
||||||
### Infrastructure
|
### Infrastructure
|
||||||
- Vault CPU fix committed (marker file + idempotent checks)
|
- Vault CPU fix committed (marker file + idempotent checks)
|
||||||
- Pass 0a endpoint registered in the Core Control-Pipeline
|
- Pass 0a endpoint registered in the Core Control-Pipeline
|
||||||
- 61 new regulation_ids inserted into regulation_registry
|
- Gitea timezone fix (docker-compose.yml)
|
||||||
- Containers bp-core-vault, bp-lehrer-opensearch, fewo-finance-agent stopped (to save CPU)
|
- 61 new regulation_ids in regulation_registry
|
||||||
|
- Container cleanup (fewo-finance-agent, mediaanalysisd)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -71,21 +109,67 @@ Statt 1:1 Synonym-Matching brauchen wir:
|
|||||||
|
|
||||||
| Table | Rows | Migration |
|
| Table | Rows | Migration |
|
||||||
|---------|------|-----------|
|
|---------|------|-----------|
|
||||||
| compliance.regulation_registry | 223 | 002_regulation_registry.sql |
|
| compliance.regulation_registry | 223 | 002 |
|
||||||
| compliance.action_types | 34 | 003_action_object_ontology.sql |
|
| compliance.action_types | 34 | 003 |
|
||||||
| compliance.action_synonyms | 368 | 003_action_object_ontology.sql |
|
| compliance.action_synonyms | 368 | 003 |
|
||||||
| compliance.object_synonyms | 320 | 003_action_object_ontology.sql |
|
| compliance.object_synonyms | 320 | 003 |
|
||||||
|
| compliance.object_groups | 7,753 | 004 |
|
||||||
|
| compliance.master_controls | 5,329 | 005 |
|
||||||
|
| compliance.master_control_members | ~170k | 005 |
|
||||||
|
| compliance.decision_traces | 0 (Schema ready) | 006 |
|
||||||
|
| compliance.compliance_commits | 0 (Schema ready) | 007 |
|
||||||
|
| compliance.decision_events | 0 (Schema ready) | 008 |
|
||||||
|
| compliance.deployment_checks | 0 (Schema ready) | 009 |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## STOPPED CONTAINERS (restart if needed)
|
## API Endpoints (Core Control-Pipeline, Port 8098)
|
||||||
|
|
||||||
|
### Existing
|
||||||
|
- `/v1/canonical/generate/*` - Control Generation Pipeline
|
||||||
|
- `/v1/canonical/generate/run-pass0a` - Pass 0a (NEW in this session)
|
||||||
|
- `/v1/canonical/generate/submit-pass0b` - Pass 0b Batch API
|
||||||
|
|
||||||
|
### New (this session)
|
||||||
|
- `/v1/master-controls` - G-pre3: list, stats, detail
|
||||||
|
- `/v1/decision-traces` - G1: CRUD + stats
|
||||||
|
- `/v1/controls/{id}/full-trace` - G1: full chain
|
||||||
|
- `/v1/compliance-commits` - G2: Commit Ledger
|
||||||
|
- `/v1/decision-events` - G3: lifecycle events + timeline
|
||||||
|
- `/v1/deployment-checks` - G4: pre-deploy gate + override
|
||||||
|
|
||||||
|
### API Access (IMPORTANT)
|
||||||
```bash
|
```bash
|
||||||
ssh macmini "/usr/local/bin/docker start bp-core-vault bp-lehrer-opensearch"
|
# Only via docker exec (port 8098 is blocked by document-crawler)
|
||||||
# fewo-finance-agent: not our container, do not start
|
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline curl -sf http://127.0.0.1:8098/..."
|
||||||
```
|
```
|
||||||
|
|
||||||
**Vault:** Start it only after the fix (marker file) has been deployed, otherwise it enters a CPU loop.
|
---
|
||||||
|
|
||||||
|
## BACKUPS (on MacBook)
|
||||||
|
|
||||||
|
| File | Contents | Size |
|
||||||
|
|-------|--------|---------|
|
||||||
|
| controls_backup_20260505.csv | 1,599 new controls | 7.2 MB |
|
||||||
|
| obligations_backup_20260505.csv | 11,522 obligations | 6.2 MB |
|
||||||
|
| production_backup_20260505.dump | Production, compressed | 30 MB |
|
||||||
|
| production_backup_20260505_plain.sql | Production, plain SQL | 1.3 GB |
|
||||||
|
| local_backup_20260506.dump | Local DB, compressed | ~30 MB |
|
||||||
|
| production_backup_20260506.dump | Production, compressed | ~30 MB |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## STOPPED CONTAINERS
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Vault: start only after the fix is deployed (marker file required)
|
||||||
|
ssh macmini "/usr/local/bin/docker start bp-core-vault"
|
||||||
|
|
||||||
|
# OpenSearch: start as needed
|
||||||
|
ssh macmini "/usr/local/bin/docker start bp-lehrer-opensearch"
|
||||||
|
|
||||||
|
# fewo-finance-agent: not our container, do not start
|
||||||
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -98,11 +182,13 @@ PYTHONPATH=control-pipeline python3 -m pytest control-pipeline/tests/ -v
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## API Access (IMPORTANT)
|
## OPEN ITEMS FOR OTHER SESSIONS
|
||||||
|
|
||||||
- **Control-Pipeline:** Reachable only via docker exec (port 8098 is blocked by document-crawler)
|
1. The **Qdrant API key** for production (qdrant-dev.breakpilot.ai) is invalid (401). It must be renewed in Coolify.
|
||||||
```bash
|
2. **DSI check false positives**: Controls mix internal governance with external DSI requirements. Fix: use only controls with an Art. 13/14 reference for DSI checks.
|
||||||
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline curl -sf http://127.0.0.1:8098/..."
|
3. Disable **Spotlight + mediaanalysisd** on the Mac Mini (requires sudo):
|
||||||
```
|
```bash
|
||||||
- **Compliance Backend:** Points to the PRODUCTION DB (not the local one!)
|
sudo mdutil -a -i off
|
||||||
- **Pass 0a endpoint:** `/v1/canonical/generate/run-pass0a` (on the Core Pipeline, port 8098)
|
sudo launchctl disable system/com.apple.mediaanalysisd
|
||||||
|
```
|
||||||
|
4. **Production DB sync** for the new G-Block tables (decision_traces, compliance_commits, decision_events, deployment_checks) is still pending: the tables are empty, and the schema must be deployed to production.
|
||||||
|
|||||||