docs: session handover — F1-F3 done, control generation running
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,115 +1,127 @@
# Session Instructions: Block F (Hardcoded Knowledge Migration)

# Session Instructions: Control Generation + Block F Remainder

**Date:** 2026-05-03

**Date:** 2026-05-04

**For:** Next Claude session

**Repo:** breakpilot-core (~/Projekte/breakpilot-core)

---

## RUNNING JOB (check before starting this session!)

### Control Generation Job `60190756-b660-4b03-869a-fa1076394cca`

Started on 2026-05-04 at ~00:30. Processes the new DE/CH/AT laws from `bp_compliance_gesetze`.

```bash
# Check status:
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf http://127.0.0.1:8002/api/compliance/v1/canonical/generate/status/60190756-b660-4b03-869a-fa1076394cca"

# Processed stats:
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf http://127.0.0.1:8002/api/compliance/v1/canonical/generate/processed-stats"
```

## NEXT STEP: Block F1 (Regulation Registry)

### What to do

1. **DB table:** create `compliance.regulation_registry` (migration script)
2. **Migrate data** from `control_generator.py` (135 entries) + `source_type_classification.py` (58 entries)
3. **Auto-create** entries in the RAG service on document upload (status='needs_review')
4. **Backend API** in the breakpilot-compliance backend (GET/POST/PUT /v1/regulations)
5. **Frontend** in the breakpilot-compliance admin under `/sdk/regulation-registry` (between roadmap and isms)
6. **Sync-check** script (weekly: Qdrant regulation_ids vs. DB)
7. **Switch the code** in control_generator.py (dict → DB query with cache)

### Frontend requirements (breakpilot-compliance admin, port 3007)

- NAV position: between `/sdk/roadmap` and `/sdk/isms`
- Table of all regulations (sortable, filterable)
- Status badge: "Needs Review" (yellow), "Active" (green), "Deprecated" (gray)
- Counter in the NAV for unreviewed entries
- Inline edit: license_rule, jurisdiction, source_type, names
- "Approve" button → status='active'
- Discrepancy view: regulation_ids that are in Qdrant but not in the DB
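The weekly sync check (step 6) and the discrepancy view both come down to a set comparison. One side is the `regulation_id`s found in Qdrant, and the other is the rows in `compliance.regulation_registry`. The following is a minimal sketch that assumes both ID lists have already been fetched; the Qdrant scroll and the SQL query are left out. `registry_discrepancies` and its result keys are hypothetical names. Reporting the reverse direction (registered but absent from Qdrant) is an addition for illustration only.

```python
from typing import Iterable


def registry_discrepancies(
    qdrant_ids: Iterable[str], registry_ids: Iterable[str]
) -> dict[str, list[str]]:
    """Compare regulation_ids seen in Qdrant with rows in the registry.

    Hypothetical helper for the weekly sync check; fetching the IDs from
    Qdrant and Postgres is left to the caller.
    """
    in_qdrant = set(qdrant_ids)
    in_db = set(registry_ids)
    return {
        # Indexed in Qdrant but unknown to the registry: what the
        # discrepancy view should list (and candidates for needs_review).
        "missing_in_db": sorted(in_qdrant - in_db),
        # Registered but without chunks in Qdrant (illustrative extra check).
        "missing_in_qdrant": sorted(in_db - in_qdrant),
    }


if __name__ == "__main__":
    report = registry_discrepancies(
        qdrant_ids=["dsgvo_2016", "nis2_2022", "ogh_6ob70_24y"],
        registry_ids=["dsgvo_2016", "nis2_2022", "cra_2024"],
    )
    print(report)
```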

### Critical files

| Repo | File | Action |
|------|------|--------|
| core | `control-pipeline/services/control_generator.py` lines 75-236 | EDIT: dict → DB |
| core | `control-pipeline/data/source_type_classification.py` | DELETE (after migration) |
| core | `rag-service/api/documents.py` | EDIT: auto-create on upload |
| compliance | `backend-compliance/compliance/api/regulations.py` | NEW: API endpoints |
| compliance | `admin-compliance/app/sdk/regulation-registry/` | NEW: frontend page |
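
Step 7 and the `control_generator.py` row call for replacing the hardcoded dict with a cached DB query. The Block F summary further down records a 5-minute TTL with a dict fallback. The class below is a minimal sketch of that pattern under assumptions, not the project's code. `TTLRegistryCache`, `loader`, and `fallback` are hypothetical names, and `loader` stands in for a query against `compliance.regulation_registry`. Falling back per ID (and not only when the DB is down) is one possible reading of "dict fallback".

```python
import time
from typing import Callable, Mapping, Optional


class TTLRegistryCache:
    """Serve regulation metadata from a DB-loaded dict, refreshed every `ttl` seconds.

    Hypothetical sketch: `loader` stands in for the registry query,
    `fallback` for the old hardcoded dict.
    """

    def __init__(
        self,
        loader: Callable[[], Mapping[str, dict]],
        fallback: Mapping[str, dict],
        ttl: float = 300.0,  # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._fallback = dict(fallback)
        self._ttl = ttl
        self._clock = clock
        self._data: Optional[dict] = None
        self._loaded_at = 0.0

    def _current(self) -> dict:
        now = self._clock()
        if self._data is None or now - self._loaded_at >= self._ttl:
            try:
                self._data = dict(self._loader())
            except Exception:
                # DB unreachable: keep the last good snapshot, or start from the
                # fallback dict. The next attempt happens after another TTL window.
                if self._data is None:
                    self._data = dict(self._fallback)
            self._loaded_at = now
        return self._data

    def get(self, regulation_id: str) -> Optional[dict]:
        # IDs the DB does not (yet) know are still resolved from the fallback dict.
        return self._current().get(regulation_id, self._fallback.get(regulation_id))
```

Using `time.monotonic` keeps the TTL immune to wall-clock changes. Injecting `clock` makes the expiry testable without sleeping.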

### DB schema

```sql
CREATE TABLE compliance.regulation_registry (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    regulation_id       VARCHAR(100) UNIQUE NOT NULL,
    regulation_name_de  TEXT,
    regulation_name_en  TEXT,
    regulation_short    VARCHAR(50),
    license_rule        INTEGER NOT NULL DEFAULT 1 CHECK (license_rule IN (1, 2, 3)),
    license_type        VARCHAR(50),
    source_type         VARCHAR(20) NOT NULL DEFAULT 'law',
    jurisdiction        VARCHAR(10),
    category            VARCHAR(50),
    celex               VARCHAR(20),
    url                 TEXT,
    -- 'needs_review' on auto-create, 'active' after approval in the admin UI
    status              VARCHAR(20) NOT NULL DEFAULT 'needs_review',
    created_at          TIMESTAMPTZ DEFAULT NOW(),
    updated_at          TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_reg_registry_status ON compliance.regulation_registry(status);
CREATE INDEX idx_reg_registry_jurisdiction ON compliance.regulation_registry(jurisdiction);
```

**IMPORTANT:** Access the API only via `docker exec` (the nginx HTTPS proxy is slow and times out):

```bash
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate ..."
```
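
To check the running job repeatedly without retyping the status command, you can wrap it in a small poll loop. The sketch below assumes Python 3.9+ on the machine you run it from. It builds the same `ssh macmini ... docker exec ... curl` call as the status check for the running job and parses the response as JSON. The `status` field and its terminal values (`completed`, `failed`) are assumptions, because the response format is not documented in this handover. `status_command`, `fetch_via_ssh`, and `wait_for_job` are hypothetical names. `fetch` and `sleep` are injectable so that the loop can be exercised without SSH access.

```python
import json
import subprocess
import time
from typing import Callable

BASE = "http://127.0.0.1:8002/api/compliance/v1/canonical/generate"


def status_command(job_id: str) -> list[str]:
    """argv for the status check, mirroring the documented ssh/docker exec call."""
    remote = (
        "/usr/local/bin/docker exec bp-compliance-backend "
        f"curl -sf {BASE}/status/{job_id}"
    )
    return ["ssh", "macmini", remote]


def fetch_via_ssh(job_id: str) -> dict:
    # check=True raises if ssh or curl -f fails (e.g. HTTP error status).
    out = subprocess.run(
        status_command(job_id), capture_output=True, text=True, check=True, timeout=90
    ).stdout
    return json.loads(out)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_via_ssh,
    interval: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    terminal: frozenset = frozenset({"completed", "failed"}),  # assumed values
) -> dict:
    """Poll until the (assumed) `status` field reaches a terminal value."""
    while True:
        state = fetch(job_id)
        if state.get("status") in terminal:
            return state
        sleep(interval)
```

For example, `wait_for_job("60190756-b660-4b03-869a-fa1076394cca", interval=600)` would poll every 10 minutes. Adjust `terminal` once the real status values are known.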

---

## OVERALL PLAN Block F (4 days)

| Phase | What | Effort | Status |
|-------|------|--------|--------|
| F1 | Regulation Registry (DB + API + frontend + auto-create) | 1 day | 🔥 NEXT |
| F2 | Action types + synonyms → DB | 1 day | Pending |
| F3 | Object synonyms → DB | 0.5 day | Pending |
| F4 | LLM synonym enrichment | 1 day | Pending |
| F5 | Validation + cleanup | 0.5 day | Pending |

## NEXT STEPS (in this order!)

### 1. Control generation for the remaining collections

Once Job 1 (bp_compliance_gesetze) has finished, start:

**Job 2: bp_compliance_ce** (EU regulations, ~20k chunks)

```bash
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate \
  -H 'Content-Type: application/json' \
  -d '{
    \"collections\": [\"bp_compliance_ce\"],
    \"max_chunks\": 2000,
    \"max_controls\": 500,
    \"batch_size\": 5,
    \"skip_web_search\": true,
    \"regulation_filter\": [\"dsgvo_2016\",\"nis2_2022\",\"cra_2024\",\"ai_act_2024\",\"dsa_2022\",\"dma_2022\",\"dga_2022\",\"dora_2022\",\"dataact_2023\",\"dpf_2023\",\"dsm_2019\",\"gpsr_2023\",\"eprivacy_2002\",\"ecommerce_2000\",\"machinery_2023\",\"eu_mdr_2017\",\"ifrs_2023\",\"amlr_2024\",\"digital_content_2019\",\"omnibus_2019\",\"csrd_2022\",\"csddd_2024\",\"eu_taxonomy_2020\",\"eidas_2_0_2024\",\"pay_transparency_2023\",\"fda_human_factors\",\"eu_machinery_guide_2006_42\"]
  }'"
```

**Job 3: bp_compliance_datenschutz** (~4k chunks)

```bash
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate \
  -H 'Content-Type: application/json' \
  -d '{
    \"collections\": [\"bp_compliance_datenschutz\"],
    \"max_chunks\": 2000,
    \"max_controls\": 500,
    \"batch_size\": 5,
    \"skip_web_search\": true,
    \"regulation_filter\": [\"dsk_oh_telemedien_2022\",\"edpb_gl_7_2020\",\"bverfg_1bvr1547_19_datenanalyse\",\"eugh_c_252_21_meta\",\"eugh_c_300_21_schadenersatz\",\"eugh_c_311_18_schrems_ii\",\"eugh_c_634_21_schufa\",\"eugh_c_673_17_planet49\",\"lg_muc_google_fonts\",\"bgh_art82_2024_218\",\"bgh_i_zr_7_16\",\"bgh_vi_zr_396_24\",\"bvge_2024_iv_2\",\"ogh_6ob102_24d\",\"ogh_6ob70_24y\"]
  }'"
```
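
Jobs 2 and 3 differ only in `collections` and `regulation_filter`. Escaping JSON by hand inside a double-quoted `ssh` argument is error-prone. One alternative is to build the payload as a dict and let `json.dumps` plus `shlex.quote` handle the quoting for the remote shell. The sketch assumes Python 3.9+ on the local machine. `generate_command` is a hypothetical helper, and its fixed values copy the ones used in the two commands above.

```python
import json
import shlex

URL = "http://127.0.0.1:8002/api/compliance/v1/canonical/generate"


def generate_command(collection: str, regulation_filter: list[str]) -> list[str]:
    """Build the ssh argv for a generation job with a safely quoted JSON body."""
    payload = {
        "collections": [collection],
        "max_chunks": 2000,
        "max_controls": 500,
        "batch_size": 5,
        "skip_web_search": True,
        # Mandatory: only regulations that have not been processed yet.
        "regulation_filter": regulation_filter,
    }
    remote = " ".join(
        [
            "/usr/local/bin/docker exec bp-compliance-backend",
            "curl -sf --max-time 60 -X POST",
            URL,
            "-H",
            shlex.quote("Content-Type: application/json"),
            "-d",
            shlex.quote(json.dumps(payload)),
        ]
    )
    # ssh hands `remote` to the remote shell, which strips the shlex quoting.
    return ["ssh", "macmini", remote]


if __name__ == "__main__":
    cmd = generate_command("bp_compliance_datenschutz", ["edpb_gl_7_2020", "lg_muc_google_fonts"])
    print(shlex.join(cmd))  # paste-able form; or pass `cmd` to subprocess.run
```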

**CAUTION:** `regulation_filter` is MANDATORY so that regulations that were already processed (bgb_komplett etc.) are not processed a second time! The old chunks were re-ingested, which gave them new hashes, so the pipeline would see them as "unprocessed".
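
The caution above can be illustrated with a toy model. Assume that the pipeline marks chunks as processed by a hash of their text; the actual hash inputs are not documented here. Under that assumption, any change introduced by re-ingestion (even a whitespace difference) yields a new hash, so the chunk looks unprocessed again. `Chunk`, `chunk_hash`, and `select_unprocessed` are hypothetical names for this sketch.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Chunk:
    regulation_id: str
    text: str


def chunk_hash(chunk: Chunk) -> str:
    # Assumption: a chunk counts as processed if the hash of its text is recorded.
    return hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()


def select_unprocessed(
    chunks: Iterable[Chunk],
    processed_hashes: set[str],
    regulation_filter: Optional[Iterable[str]] = None,
) -> list[Chunk]:
    """Return the chunks that would be sent to generation.

    A re-ingested chunk whose text changed gets a new hash and is selected
    again; restricting to `regulation_filter` keeps it out.
    """
    allowed = set(regulation_filter) if regulation_filter is not None else None
    return [
        c
        for c in chunks
        if chunk_hash(c) not in processed_hashes
        and (allowed is None or c.regulation_id in allowed)
    ]


if __name__ == "__main__":
    before = Chunk("bgb_komplett", "§ 312k  example text")  # original ingestion
    processed = {chunk_hash(before)}
    after = Chunk("bgb_komplett", "§ 312k example text")  # re-ingested: new hash
    new_law = Chunk("ogh_6ob70_24y", "judgment text")
    print(len(select_unprocessed([after, new_law], processed)))  # 2: BGB chunk reappears
    print(len(select_unprocessed([after, new_law], processed, ["ogh_6ob70_24y"])))  # 1
```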

### 2. Pass 0b (Anthropic Batch API, ~$50)

Only AFTER all 3 generation jobs have finished:

```bash
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate/submit-pass0b \
  -H 'Content-Type: application/json' \
  -d '{\"limit\": 500, \"batch_size\": 10}'"
```

### 3. Block F remainder (after the pipeline)

| Phase | What | Status |
|-------|------|--------|
| F1 | Regulation Registry → DB | ✅ 162 entries |
| F2 | ACTION_TYPES + synonyms → DB | ✅ 34 types + 145 synonyms |
| F3 | Object synonyms → DB | ✅ 75 synonyms |
| F4 | LLM synonym enrichment | Pending |
| F5 | Validation + cleanup | Pending |

---

## SESSION 02-03.05.2026 DONE

- Block D5+: NIST/ENISA PDF quality (0% → 45%)
- Block D6: citation backfill (3,651 controls)
- Block E2: 8 German (DE) laws (1,629 chunks)
- Block E3: 5 EU regulations (1,057 chunks)
- Block E4: GoBD, BAIT, VAIT (144 chunks)
- Block E6: 3 CH + 4 AT laws (3,881 chunks)
- Block E7: 9 court decisions as full text (709 chunks total)
  - Schrems II: 154, BVerfG Datenanalyse: 161, DSK OH Telemedien: 119
  - Meta: 101, BAG Zeiterfassung: 48, Planet49: 42, SCHUFA: 41
  - Schadenersatz: 29, Google Fonts: 14
- Infra: Qdrant snapshot, upload-before-delete, 99 tests

**Total new chunks this session: ~25,000+**

## SESSION 03-04.05.2026 DONE

### Block F (Hardcoded Knowledge Migration)

- F1: `regulation_registry` table + 162 entries migrated + 34 tests
- F2: `action_types` (34) + `action_synonyms` (145) + tests
- F3: `object_synonyms` (75) + tests
- All 3 use a DB-backed cache (5 min TTL) with a dict fallback
- 446 tests pass, 0 regressions

### D5 + E1c verification

- D5 re-ingestion: COMPLETE (419/423 docs; the 4 NIST "errors" are duplicates of `.txt` files)
- E1c BGB: § 312k PRESENT, 93% section coverage, 3,053 chunks
- Section metadata: gesetze=83%, ce=52%, datenschutz=50%

### Control generation started

- Job 1 (bp_compliance_gesetze, DE/CH/AT laws) running since ~00:30
- 61 new regulation_ids identified (not in canonical_processed_chunks)
- `regulation_filter` prevents re-ingested documents from being processed twice

---

## DB tables (Block F)

| Table | Rows | Migration |
|-------|------|-----------|
| compliance.regulation_registry | 162 | 002_regulation_registry.sql |
| compliance.action_types | 34 | 003_action_object_ontology.sql |
| compliance.action_synonyms | 145 | 003_action_object_ontology.sql |
| compliance.object_synonyms | 75 | 003_action_object_ontology.sql |
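
F2 and F3 move the action/object vocabulary into `action_types`, `action_synonyms`, and `object_synonyms`. This handover does not describe how the pipeline queries these tables. As an illustration only, assume that each synonym row maps a surface term to a canonical key. Lookups can then normalize the term and resolve it through a dict built from the rows. `normalize`, `build_index`, `resolve`, and the example keys are hypothetical.

```python
import unicodedata
from typing import Iterable, Optional


def normalize(term: str) -> str:
    """NFC-normalize, case-fold and trim so 'Löschen' and ' löschen ' compare equal."""
    return unicodedata.normalize("NFC", term).casefold().strip()


def build_index(
    canonical_keys: Iterable[str], synonyms: Iterable[tuple[str, str]]
) -> dict[str, str]:
    """Map normalized terms to canonical keys.

    canonical_keys: e.g. rows of action_types; synonyms: (term, canonical_key)
    pairs, e.g. rows of action_synonyms (column layout assumed).
    """
    index = {normalize(key): key for key in canonical_keys}
    for term, key in synonyms:
        # setdefault: a synonym never overrides a canonical key or an earlier synonym.
        index.setdefault(normalize(term), key)
    return index


def resolve(term: str, index: dict[str, str]) -> Optional[str]:
    """Canonical key for `term`, or None if it is neither a key nor a known synonym."""
    return index.get(normalize(term))


if __name__ == "__main__":
    idx = build_index(["delete", "notify"], [("löschen", "delete"), ("benachrichtigen", "notify")])
    print(resolve("Löschen", idx))  # delete
```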

---

## TESTS

```bash
# Control pipeline (446 tests; previously 387)
PYTHONPATH=control-pipeline python3 -m pytest control-pipeline/tests/ -v

# Qdrant snapshot
ssh macmini "cd ~/Projekte/breakpilot-core && bash scripts/qdrant-snapshot.sh"

# Embedding service (99 tests); the subshell leaves the current directory unchanged
(cd embedding-service && python3 -m pytest test_chunking.py test_d4_bgb.py test_nist_normalization.py -v)
```

---

## PLAN FILE

Block F detailed plan: `/Users/benjaminadmin/.claude/plans/humming-nibbling-sonnet.md`