# Session Instructions: Control Generation + Block F Remainder

**Date:** 2026-05-04
**For:** Next Claude session
**Repo:** breakpilot-core (~/Projekte/breakpilot-core)

---

## RUNNING JOB (check before this session!)

### Control Generation Job `60190756-b660-4b03-869a-fa1076394cca`

Started on 2026-05-04 at ~00:30. Processes new DE/CH/AT laws from `bp_compliance_gesetze`.

```bash
# Check status:
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf http://127.0.0.1:8002/api/compliance/v1/canonical/generate/status/60190756-b660-4b03-869a-fa1076394cca"

# Processed stats:
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf http://127.0.0.1:8002/api/compliance/v1/canonical/generate/processed-stats"
```

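
Checking the status by hand can be wrapped in a small polling loop. The Python sketch below is one hedged way to do that. It reuses the `ssh` / `docker exec` / `curl` status call from above and assumes the endpoint returns JSON. The shape of that response is not documented here, so the `looks_finished` predicate and the `status` field it inspects are assumptions that need to be adjusted to the real payload; `wait_for_job` and `fetch_status_via_ssh` are hypothetical helper names.

```python
import json
import subprocess
import time
from typing import Any, Callable, Dict, Optional

JOB_ID = "60190756-b660-4b03-869a-fa1076394cca"
STATUS_URL = (
    "http://127.0.0.1:8002/api/compliance/v1/canonical/generate/status/" + JOB_ID
)


def fetch_status_via_ssh() -> Dict[str, Any]:
    """Run the status curl inside the backend container and parse its output as JSON."""
    remote = f"/usr/local/bin/docker exec bp-compliance-backend curl -sf {STATUS_URL}"
    result = subprocess.run(
        ["ssh", "macmini", remote], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def wait_for_job(
    fetch: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    interval_s: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Poll until is_done(status) is true; return that status, or None on timeout."""
    for attempt in range(max_polls):
        status = fetch()
        if is_done(status):
            return status
        if attempt < max_polls - 1:
            sleep(interval_s)
    return None


def looks_finished(status: Dict[str, Any]) -> bool:
    # ASSUMPTION: the response carries a "status" field with these terminal values.
    return status.get("status") in {"completed", "failed"}
```

A call such as `wait_for_job(fetch_status_via_ssh, looks_finished)` would then block until the job reports a terminal state or the poll budget runs out.
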
**IMPORTANT:** Access the API only via `docker exec` (the nginx HTTPS proxy is slow and times out):
```bash
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate ..."
```

---

## NEXT STEPS (in this order!)

### 1. Control Generation for the remaining collections

Once Job 1 (bp_compliance_gesetze) has finished, start:

**Job 2: bp_compliance_ce** (EU regulations, ~20k chunks)
```bash
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate \
  -H 'Content-Type: application/json' \
  -d '{
    \"collections\": [\"bp_compliance_ce\"],
    \"max_chunks\": 2000,
    \"max_controls\": 500,
    \"batch_size\": 5,
    \"skip_web_search\": true,
    \"regulation_filter\": [\"dsgvo_2016\",\"nis2_2022\",\"cra_2024\",\"ai_act_2024\",\"dsa_2022\",\"dma_2022\",\"dga_2022\",\"dora_2022\",\"dataact_2023\",\"dpf_2023\",\"dsm_2019\",\"gpsr_2023\",\"eprivacy_2002\",\"ecommerce_2000\",\"machinery_2023\",\"eu_mdr_2017\",\"ifrs_2023\",\"amlr_2024\",\"digital_content_2019\",\"omnibus_2019\",\"csrd_2022\",\"csddd_2024\",\"eu_taxonomy_2020\",\"eidas_2_0_2024\",\"pay_transparency_2023\",\"fda_human_factors\",\"eu_machinery_guide_2006_42\"]
  }'"
```

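
The hand-escaped JSON in the command above is easy to get wrong. As a hedged alternative, the request could be assembled in Python: `json.dumps` produces the body and `shlex.quote` protects each argument for the remote shell (assumed to be POSIX-compatible). The helper names `build_generate_payload` and `build_ssh_argv` are hypothetical; the parameter values are copied from the command above.

```python
import json
import shlex
from typing import List

GENERATE_URL = "http://127.0.0.1:8002/api/compliance/v1/canonical/generate"


def build_generate_payload(collection: str, regulation_filter: List[str]) -> str:
    """Serialize the request body with the same limits used in this document."""
    return json.dumps(
        {
            "collections": [collection],
            "max_chunks": 2000,
            "max_controls": 500,
            "batch_size": 5,
            "skip_web_search": True,
            "regulation_filter": regulation_filter,
        }
    )


def build_ssh_argv(payload: str) -> List[str]:
    """Return an argv list for subprocess.run; no local shell is involved."""
    remote_parts = [
        "/usr/local/bin/docker", "exec", "bp-compliance-backend",
        "curl", "-sf", "--max-time", "60", "-X", "POST", GENERATE_URL,
        "-H", "Content-Type: application/json",
        "-d", payload,
    ]
    # ssh hands the remote command to a shell on macmini, so quote every part.
    remote_command = " ".join(shlex.quote(part) for part in remote_parts)
    return ["ssh", "macmini", remote_command]


if __name__ == "__main__":
    # Shortened filter for illustration; use the full list from the command above.
    body = build_generate_payload("bp_compliance_ce", ["dsgvo_2016", "nis2_2022"])
    print(build_ssh_argv(body))
```

Passing the resulting list to `subprocess.run(argv, check=True)` avoids a local shell entirely, so only one level of quoting (for the remote side) has to be correct.
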
**Job 3: bp_compliance_datenschutz** (~4k chunks)
```bash
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate \
  -H 'Content-Type: application/json' \
  -d '{
    \"collections\": [\"bp_compliance_datenschutz\"],
    \"max_chunks\": 2000,
    \"max_controls\": 500,
    \"batch_size\": 5,
    \"skip_web_search\": true,
    \"regulation_filter\": [\"dsk_oh_telemedien_2022\",\"edpb_gl_7_2020\",\"bverfg_1bvr1547_19_datenanalyse\",\"eugh_c_252_21_meta\",\"eugh_c_300_21_schadenersatz\",\"eugh_c_311_18_schrems_ii\",\"eugh_c_634_21_schufa\",\"eugh_c_673_17_planet49\",\"lg_muc_google_fonts\",\"bgh_art82_2024_218\",\"bgh_i_zr_7_16\",\"bgh_vi_zr_396_24\",\"bvge_2024_iv_2\",\"ogh_6ob102_24d\",\"ogh_6ob70_24y\"]
  }'"
```

**CAUTION:** `regulation_filter` is MANDATORY so that already-processed regulations (bgb_komplett etc.) are not processed twice! Old chunks were re-ingested → new hashes → the pipeline would treat them as "unprocessed".

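
Why the filter matters can be illustrated with a small Python sketch. It assumes, purely for illustration, that the pipeline tracks processed chunks by a hash of their text and that each chunk carries a `regulation_id`. The `Chunk` type, `chunk_hash`, and `select_unprocessed` are hypothetical names, not the pipeline's real API, and the chunk texts are placeholders.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Chunk:
    regulation_id: str
    text: str


def chunk_hash(chunk: Chunk) -> str:
    # ASSUMPTION: processed chunks are tracked by a hash of their text.
    return hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()


def select_unprocessed(
    chunks: Iterable[Chunk],
    processed_hashes: Set[str],
    regulation_filter: Optional[Set[str]] = None,
) -> List[Chunk]:
    """Return chunks with an unknown hash, optionally limited to given regulations."""
    selected = []
    for chunk in chunks:
        if regulation_filter is not None and chunk.regulation_id not in regulation_filter:
            continue
        if chunk_hash(chunk) not in processed_hashes:
            selected.append(chunk)
    return selected


# A chunk was processed earlier, then re-ingested with different chunk boundaries.
old = Chunk("bgb_komplett", "placeholder text, old chunk boundaries")
reingested = Chunk("bgb_komplett", "placeholder text, new chunk boundaries")
new_law = Chunk("nis2_2022", "placeholder text of a new regulation")
processed = {chunk_hash(old)}

without_filter = select_unprocessed([reingested, new_law], processed)
with_filter = select_unprocessed([reingested, new_law], processed, {"nis2_2022"})
print(len(without_filter), len(with_filter))  # 2 1
```

Without the filter, the re-ingested chunk looks new because its hash changed; with the filter, only the genuinely new regulation is selected.
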
### 2. Pass 0b (Anthropic Batch API, ~$50)

Only AFTER all 3 generation jobs have finished:
```bash
ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate/submit-pass0b \
  -H 'Content-Type: application/json' \
  -d '{\"limit\": 500, \"batch_size\": 10}'"
```

### 3. Block F Remainder (after the pipeline)

| Phase | What | Status |
|-------|------|--------|
| F1 | Regulation Registry → DB | ✅ 162 entries |
| F2 | ACTION_TYPES + synonyms → DB | ✅ 34 types + 145 synonyms |
| F3 | Object Synonyms → DB | ✅ 75 synonyms |
| F4 | LLM Synonym Enrichment | Pending |
| F5 | Validation + Cleanup | Pending |

---

## SESSION 2026-05-03 to 2026-05-04: DONE

### Block F (Hardcoded Knowledge Migration)
- F1: regulation_registry table + 162 entries migrated + 34 tests
- F2: action_types (34) + action_synonyms (145) + tests
- F3: object_synonyms (75) + tests
- All 3 with a DB-backed cache (5 min TTL) + dict fallback
- 446 tests pass, 0 regressions

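
The "DB-backed cache (5 min TTL) + dict fallback" pattern mentioned in the list above could look roughly like the following Python sketch. The class name `CachedLookup`, the `loader` callable, and the injected `clock` are assumptions for illustration; the actual implementation in the repository may differ.

```python
import time
from typing import Callable, Dict, Optional


class CachedLookup:
    """Serve a mapping from a loader, refreshed after a TTL, with a static fallback."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, str]],
        fallback: Dict[str, str],
        ttl_seconds: float = 300.0,  # 5 min TTL, as noted above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._fallback = dict(fallback)
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Optional[Dict[str, str]] = None
        self._loaded_at = 0.0

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._data is not None and now - self._loaded_at < self._ttl:
            return  # cached snapshot is still fresh
        try:
            # The loader stands in for a query against the DB table.
            self._data = dict(self._loader())
        except Exception:
            # DB unavailable: keep the last good snapshot, else the hardcoded dict.
            if self._data is None:
                self._data = dict(self._fallback)
        # Also set on failure, so the loader is retried only after another TTL.
        self._loaded_at = now

    def get(self, key: str) -> Optional[str]:
        self._refresh_if_stale()
        assert self._data is not None
        return self._data.get(key)
```

Injecting the clock keeps the expiry logic testable without sleeping, and falling back to the in-code dict means lookups keep working when the database is unreachable.
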
### D5 + E1c Verification
- D5 re-ingestion: COMPLETE (419/423 docs; the 4 NIST "errors" are duplicates of .txt files)
- E1c BGB: § 312k PRESENT, 93% section coverage, 3053 chunks
- Section metadata: gesetze=83%, ce=52%, datenschutz=50%

### Control Generation started
- Job 1 (bp_compliance_gesetze, DE/CH/AT laws) has been running since ~00:30
- 61 new regulation_ids identified (not in canonical_processed_chunks)
- regulation_filter prevents re-ingested documents from being processed twice

---

## DB Tables (Block F)

| Table | Rows | Migration |
|-------|------|-----------|
| compliance.regulation_registry | 162 | 002_regulation_registry.sql |
| compliance.action_types | 34 | 003_action_object_ontology.sql |
| compliance.action_synonyms | 145 | 003_action_object_ontology.sql |
| compliance.object_synonyms | 75 | 003_action_object_ontology.sql |

---

## TESTS

```bash
# Pipeline (446 tests)
PYTHONPATH=control-pipeline python3 -m pytest control-pipeline/tests/ -v

# Embedding service (99 tests)
cd embedding-service && python3 -m pytest test_chunking.py test_d4_bgb.py test_nist_normalization.py -v
```