# Session Instructions: Block F - Hardcoded Knowledge Migration

**Date:** 2026-05-03
**For:** Next Claude session
**Repo:** breakpilot-core (~/Projekte/breakpilot-core)

---

## NEXT STEP: Block F1 - Regulation Registry

### What to do

1. **DB table** `compliance.regulation_registry`: create it (migration script)
2. **Migrate data** from `control_generator.py` (135 entries) + `source_type_classification.py` (58 entries)
3. **Auto-create** entries in the RAG service on document upload (status='needs_review')
4. **Backend API** in the breakpilot-compliance backend (GET/POST/PUT /v1/regulations)
5. **Frontend** in the breakpilot-compliance Admin under `/sdk/regulation-registry` (between roadmap and isms)
6. **Sync-check** script (weekly: Qdrant regulation_ids vs. DB)
7. **Switch the code over** in control_generator.py (dict → DB query with cache)

### Frontend requirements (breakpilot-compliance Admin, port 3007)

- NAV position: between `/sdk/roadmap` and `/sdk/isms`
- Table with all regulations (sortable, filterable)
- Status badge: "Needs Review" (yellow), "Active" (green), "Deprecated" (gray)
- Counter in the NAV for unreviewed entries
- Inline edit: license_rule, jurisdiction, source_type, names
- "Approve" button → status='active'
- Discrepancy view: regulation_ids that exist in Qdrant but not in the DB

### Critical files

| Repo | File | Action |
|------|------|--------|
| core | `control-pipeline/services/control_generator.py` lines 75-236 | EDIT: dict → DB |
| core | `control-pipeline/data/source_type_classification.py` | DELETE (after migration) |
| core | `rag-service/api/documents.py` | EDIT: auto-create on upload |
| compliance | `backend-compliance/compliance/api/regulations.py` | NEW: API endpoints |
| compliance | `admin-compliance/app/sdk/regulation-registry/` | NEW: frontend page |

### DB schema

```sql
-- gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension.
CREATE TABLE compliance.regulation_registry (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    regulation_id VARCHAR(100) UNIQUE NOT NULL,
    regulation_name_de TEXT,
    regulation_name_en TEXT,
    regulation_short VARCHAR(50),
    license_rule INTEGER NOT NULL DEFAULT 1 CHECK (license_rule IN (1, 2, 3)),
    license_type VARCHAR(50),
    source_type VARCHAR(20) NOT NULL DEFAULT 'law',
    jurisdiction VARCHAR(10),
    category VARCHAR(50),
    celex VARCHAR(20),
    url TEXT,
    -- 'needs_review' | 'active' | 'deprecated' (matches the status badges above)
    status VARCHAR(20) NOT NULL DEFAULT 'needs_review',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- DEFAULT only applies on INSERT; updates must set updated_at explicitly (or via a trigger)
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_reg_registry_status ON compliance.regulation_registry(status);
CREATE INDEX idx_reg_registry_jurisdiction ON compliance.regulation_registry(jurisdiction);
```

---

## OVERALL PLAN Block F (4 days)

| Phase | What | Effort | Status |
|-------|------|--------|--------|
| F1 | Regulation Registry (DB + API + frontend + auto-create) | 1 day | 🔥 NEXT |
| F2 | Action types + synonyms → DB | 1 day | Pending |
| F3 | Object synonyms → DB | 0.5 days | Pending |
| F4 | LLM synonym enrichment | 1 day | Pending |
| F5 | Validation + cleanup | 0.5 days | Pending |

---

## DONE IN SESSION 2026-05-02/03

- Block D5+: NIST/ENISA PDF quality (0% → 45%)
- Block D6: Citation backfill (3,651 controls)
- Block E2: 8 German (DE) laws (1,629 chunks)
- Block E3: 5 EU regulations (1,057 chunks)
- Block E4: GoBD, BAIT, VAIT (144 chunks)
- Block E6: 3 CH + 4 AT laws (3,881 chunks)
- Block E7: 9 court rulings as full text (709 chunks total)
  - Schrems II: 154, BVerfG Datenanalyse: 161, DSK OH Telemedien: 119
  - Meta: 101, BAG Zeiterfassung: 48, Planet49: 42, SCHUFA: 41
  - Schadenersatz: 29, Google Fonts: 14
- Infra: Qdrant snapshot, upload-before-delete, 99 tests

**Total new chunks this session: ~25,000+**

---

## TESTS

```bash
# Run from the repo root (~/Projekte/breakpilot-core)

# Embedding service (99 tests); the subshell keeps the cd from affecting the next command
(cd embedding-service && python3 -m pytest test_chunking.py test_d4_bgb.py test_nist_normalization.py -v)

# Control pipeline (387 tests)
PYTHONPATH=control-pipeline python3 -m pytest control-pipeline/tests/ -v

# Qdrant snapshot (runs on the macmini host)
ssh macmini "cd ~/Projekte/breakpilot-core && bash scripts/qdrant-snapshot.sh"
```

---

## PLAN FILE

Block F detailed plan: `/Users/benjaminadmin/.claude/plans/humming-nibbling-sonnet.md`
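Step 7 of Block F1 replaces the hardcoded dict in `control_generator.py` with a DB query plus cache. A minimal sketch of such a cache, assuming the generator only needs `license_rule`, `source_type` and `jurisdiction` per `regulation_id` and that a time-based refresh (TTL) is acceptable. `RegulationInfo`, `RegulationRegistryCache`, `REGISTRY_QUERY`, the 300-second TTL and the `status = 'active'` filter are hypothetical choices, not existing code. The loader is injected so the cache can be tested without a database.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class RegulationInfo:
    """Hypothetical subset of compliance.regulation_registry used by the generator."""
    regulation_id: str
    license_rule: int
    source_type: str
    jurisdiction: Optional[str]


# Hypothetical query; whether needs_review rows should be visible is an open assumption.
REGISTRY_QUERY = (
    "SELECT regulation_id, license_rule, source_type, jurisdiction "
    "FROM compliance.regulation_registry WHERE status = 'active'"
)


class RegulationRegistryCache:
    """Keeps the registry in memory and reloads it once ttl_seconds have passed."""

    def __init__(
        self,
        loader: Callable[[], Iterable[RegulationInfo]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # e.g. runs REGISTRY_QUERY against Postgres
        self._ttl = ttl_seconds
        self._clock = clock  # injectable for tests
        self._entries: Dict[str, RegulationInfo] = {}
        self._loaded_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # Build the new dict before swapping: if the loader raises, the previous
            # entries stay untouched and the next lookup retries the load.
            fresh = {info.regulation_id: info for info in self._loader()}
            self._entries = fresh
            self._loaded_at = now

    def get(self, regulation_id: str) -> Optional[RegulationInfo]:
        """Replacement for the old dict lookup: returns None for unknown IDs."""
        self._refresh_if_stale()
        return self._entries.get(regulation_id)

    def invalidate(self) -> None:
        """Force a reload on the next lookup, e.g. right after a migration run."""
        self._loaded_at = None
```

In production the loader would execute `REGISTRY_QUERY` on a DB connection (for example a psycopg cursor) and map each row to `RegulationInfo`. Reloading the whole table is cheap at the current size (the two hardcoded sources hold 135 and 58 entries) and keeps each lookup a plain dict access, as in the current hardcoded version.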
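For step 3 of Block F1 (auto-create on document upload), the RAG service only needs an idempotent insert: a new `regulation_id` gets a row with `status='needs_review'`, while an existing row, possibly already reviewed and edited, stays untouched. A minimal sketch, assuming a DB-API cursor with `%s` placeholders (as in psycopg) and that the upload handler in `rag-service/api/documents.py` knows the `regulation_id` and optionally a German name. `ensure_registry_entry`, `AUTO_CREATE_SQL` and the `name_de` parameter are hypothetical. All other columns fall back to the schema defaults (`license_rule=1`, `source_type='law'`), which is what the review step is meant to correct.

```python
from typing import Any, Optional, Protocol, Sequence

# ON CONFLICT ... DO NOTHING relies on the UNIQUE constraint on regulation_id and
# never overwrites a row that has already been reviewed or edited.
AUTO_CREATE_SQL = (
    "INSERT INTO compliance.regulation_registry "
    "(regulation_id, regulation_name_de, status) "
    "VALUES (%s, %s, 'needs_review') "
    "ON CONFLICT (regulation_id) DO NOTHING"
)


class Cursor(Protocol):
    """The part of a DB-API 2.0 cursor this helper uses."""

    rowcount: int

    def execute(self, query: str, params: Sequence[Any]) -> Any: ...


def ensure_registry_entry(
    cur: Cursor, regulation_id: str, name_de: Optional[str] = None
) -> bool:
    """Create a needs_review entry if none exists; True means a row was inserted."""
    if not regulation_id:
        raise ValueError("regulation_id must not be empty")
    cur.execute(AUTO_CREATE_SQL, (regulation_id, name_de))
    # For INSERT ... ON CONFLICT DO NOTHING, rowcount is 1 on insert and 0 on conflict.
    return cur.rowcount == 1
```

The return value could feed a log line; committing the transaction is left to the caller so the insert can share a transaction with the upload itself.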
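The weekly sync check in step 6 of Block F1, and the discrepancy view in the frontend, boil down to a set comparison between the `regulation_id` values found in Qdrant payloads and the rows in `compliance.regulation_registry`. A minimal sketch of that comparison, assuming the IDs have already been collected (for Qdrant, for example by paging through the collection with the Python client's `scroll` method and reading a `regulation_id` payload field; both are assumptions about how the chunks are stored). `compare_regulation_ids` and `SyncReport` are hypothetical names, and reporting the reverse direction (registered but without chunks) goes beyond what the plan asks for.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SyncReport:
    missing_in_db: List[str]  # in Qdrant payloads, no registry row (frontend discrepancy list)
    missing_in_qdrant: List[str]  # registry row without any chunks in Qdrant

    @property
    def in_sync(self) -> bool:
        return not self.missing_in_db and not self.missing_in_qdrant


def compare_regulation_ids(
    qdrant_ids: Iterable[str], db_ids: Iterable[str]
) -> SyncReport:
    """Compare both ID sources; duplicates (one ID per chunk) and empty IDs are ignored."""
    in_qdrant = {rid for rid in qdrant_ids if rid}
    in_db = {rid for rid in db_ids if rid}
    return SyncReport(
        missing_in_db=sorted(in_qdrant - in_db),
        missing_in_qdrant=sorted(in_db - in_qdrant),
    )
```

A non-empty `missing_in_db` list may also hint that the upload path bypassed the auto-create hook from step 3, for example for chunks ingested before that hook existed.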