# Session Handover: MC Quality + Gap Analysis + RAG Ingestion
**Date:** 2026-05-07 to 2026-05-11 (5-day marathon)
**Repo:** breakpilot-core + breakpilot-compliance
---
## DONE
### Master Control Quality Overhaul (Core)
- **74.5% → 92.8% accuracy** (13,588 MCs, 83,073 members)
- Phase 0: Quality audit with Claude Sonnet (~$3)
- Phase 1: Ontology expanded from 31 to 74 tokens, plus LLM prompt fix
- Phase 2: 174K controls reclassified via Haiku (10 batches, ~$50)
- Phase 2b: Generic tokens fixed (documentation/procedure → actual topics, $7.54)
- Phase 2c: L2 subtopics (2 rounds, 172K controls, ~$32)
- Phase 2d: Bad subtopics fixed (`stakeholder_*`, ~$0.50)
- Phase 3: Re-clustering, K=18704
- Phase 4: gpre2 Direct MC (13,588 MCs)
- Phase 6: Golden dataset (20 controls) + 8 quality tests (all green)
- **Production sync:** MCs + members + hints + doc_check_controls
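### doc_check_controls (Core → Production)
- **1,874 controls** across 8 document types: privacy policy (DSE), cookie policy, legal notice (Impressum), terms and conditions (AGB), withdrawal policy (Widerruf), DPIA (DSFA), data processing agreement (AVV), deletion concept (Löschkonzept)
- Each has a check_question, pass_criteria, and fail_criteria
- Table `compliance.doc_check_controls` exists locally and in Production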
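
A minimal sketch of how one such record could be represented and sanity-checked, assuming only the three fields named above. The `doc_type` field, the class name, and the sample values are hypothetical and not taken from the actual `compliance.doc_check_controls` schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DocCheckControl:
    """One doc-check record (illustrative shape, not the real schema)."""
    doc_type: str        # hypothetical column, e.g. "Impressum"
    check_question: str
    pass_criteria: str
    fail_criteria: str


def missing_fields(control: DocCheckControl) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(control) if not getattr(control, f.name).strip()]


# Invented sample rows, for illustration only.
complete = DocCheckControl(
    doc_type="Impressum",
    check_question="Does the legal notice name a responsible contact?",
    pass_criteria="A name and a postal address are present.",
    fail_criteria="No contact details are given.",
)
incomplete = DocCheckControl("Cookie", "Is consent requested before tracking?", "", " ")

print(missing_fields(complete))    # []
print(missing_fields(incomplete))  # ['pass_criteria', 'fail_criteria']
```

A check of this kind could run before a production sync to catch controls that lack a pass or fail criterion.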
### RAG Ingestion (Core)
- **126 BAuA PDFs** (TRBS/TRGS/ASR): 27,664 chunks → `bp_compliance_ce`
- **OSHA Technical Manual** (23 chapters): 7,241 chunks → `bp_compliance_ce`
- **OSHA 1910 Subpart O** (full text): 745 chunks
- **CJEU case C-588/21 P**: 216 chunks
- **EU 2018/1725**: 842 chunks → `bp_compliance`
- **CE obligations extracted:** 6,141 obligations → `/tmp/ce_obligations_v2.json`
- Playwright crawlers built for BAuA + OSHA
### Gap Analysis Engine (Compliance)
- **12 regulations** classified automatically (CRA, AI Act, NIS2, GDPR, MiCA, PSD2, AML, etc.)
- **Current-state (IST) assessment:** CE marking, applied standards, existing processes, IACE project link
- **Standard→control mapping:** 20 standards → MC topic coverage
- **Priority engine:** Severity × Deadline × Dependency
- **5 industry templates:** IoT, Exchange, Cobot, SaaS, Medical
- **Frontend:** 2-step wizard (product + current state) + dashboard with traffic-light status
- **API:** 8 endpoints under `/sdk/v1/gap/`
- **Persistent projects:** save and reopen
- **Tested:** SmartFactory Gateway → 5 regulations, 500 gaps
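
The priority engine combines severity, deadline, and dependency as a product. A minimal sketch of that product follows; the 1 to 5 severity scale, the deadline buckets, the `1 + blocked_gaps` dependency factor, and the sample gaps are all assumptions made for illustration, not the engine's actual weights.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Gap:
    name: str
    severity: int       # assumed scale: 1 (low) to 5 (critical)
    deadline: date      # regulatory deadline for closing the gap
    blocked_gaps: int   # assumed: number of other gaps that depend on this one


def deadline_factor(deadline: date, today: date) -> float:
    """Assumed buckets: overdue = 3.0, due within 180 days = 2.0, otherwise 1.0."""
    days_left = (deadline - today).days
    if days_left <= 0:
        return 3.0
    if days_left <= 180:
        return 2.0
    return 1.0


def priority(gap: Gap, today: date) -> float:
    """Severity x deadline x dependency, as a plain product."""
    return gap.severity * deadline_factor(gap.deadline, today) * (1 + gap.blocked_gaps)


today = date(2026, 5, 11)
gaps = [
    Gap("gap-a", severity=5, deadline=date(2027, 12, 1), blocked_gaps=0),
    Gap("gap-b", severity=3, deadline=date(2026, 9, 1), blocked_gaps=1),
]
ranked = sorted(gaps, key=lambda g: priority(g, today), reverse=True)
print([(g.name, priority(g, today)) for g in ranked])
# [('gap-b', 12.0), ('gap-a', 5.0)]
```

With a product, a moderately severe gap that is due soon and blocks other work can outrank a critical gap with a distant deadline, as the two sample gaps show.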
### Tenant Document Upload API (Core)
- `POST/GET/DELETE /api/v1/tenant/documents`
- Tenant-isolated Qdrant collections
- Code complete but not deployed (requires a RAG Service rebuild)
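
Tenant isolation at the collection level can be sketched as one collection per tenant, with the collection name derived from the tenant ID. The `tenant_docs_` prefix, the allowed ID pattern, and the function name below are hypothetical; the actual service may use a different scheme.

```python
import re

# Hypothetical naming convention: one vector collection per tenant.
COLLECTION_PREFIX = "tenant_docs_"
TENANT_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def collection_for_tenant(tenant_id: str) -> str:
    """Map a tenant ID to its own collection name, rejecting unsafe IDs."""
    normalized = tenant_id.strip().lower()
    if not TENANT_ID_PATTERN.fullmatch(normalized):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return COLLECTION_PREFIX + normalized


print(collection_for_tenant("Acme-GmbH"))  # tenant_docs_acme-gmbh
```

Rejecting IDs that do not match the pattern keeps a crafted ID from addressing another tenant's collection.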
### Master Controls Browser (Compliance)
- **New page** `/sdk/master-controls`, reusing the Control Library UI
- Sidebar entry between Control Library and Provenance
- 13,588 MCs with all filters, pagination, and click-through detail
- Connects to the Production DB
---
## DB Tables (new/changed)
| Table | Repo | Rows (local) | Rows (Production) |
|-------|------|--------------|-------------------|
| compliance.master_controls | Core | 13,588 | 13,588 |
| compliance.master_control_members | Core | 83,073 | 83,073 |
| compliance.object_ontology | Core | 74 | 74 |
| compliance.object_groups | Core | 16,683 | — |
| compliance.doc_check_controls | Core | 1,874 | 1,874 |
| compliance.gap_projects | Compliance | 1 | 0 |
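
The row counts above can be compared programmatically to spot tables that differ between local and Production. This sketch hard-codes the numbers from the table; `None` stands for the production value that is not reported, and the helper name is hypothetical.

```python
# (local, production) row counts copied from the table; None = not reported.
row_counts = {
    "compliance.master_controls": (13588, 13588),
    "compliance.master_control_members": (83073, 83073),
    "compliance.object_ontology": (74, 74),
    "compliance.object_groups": (16683, None),
    "compliance.doc_check_controls": (1874, 1874),
    "compliance.gap_projects": (1, 0),
}


def out_of_sync(counts: dict) -> list:
    """Return the tables whose local and production counts differ."""
    return [table for table, (local, prod) in counts.items() if local != prod]


print(out_of_sync(row_counts))
# ['compliance.object_groups', 'compliance.gap_projects']
```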
---
## OPEN / NEXT SESSION
1. **Orca deploy fix:** Production does not deploy automatically (webhook + docker pull problem)
2. **Gap analysis v2, current-state assessment:** frontend step 2 deployed, backend deployed, but blocked by Orca
3. **Tenant Document Upload:** deploy (requires a RAG Service rebuild)
4. **Push the compliance repo to gitea:** git currently reports "Everything up-to-date"; Orca must be redeployed manually
5. **Extend the MC browser:** improve the detail view with member controls
---
## BACKUPS (on MacBook)
| File | Contents |
|------|----------|
| `backup_pre_gpre3_20260510.dump` | Before the gpre3 live run (171 MB) |
| `backup_session_end_20260511.dump` | End of session |
| `production_backup_20260508.dump` | Production after Phase 2 |
| `gpre0_checkpoints_backup_20260508/` | 10 corrections JSON files |
---
## API Costs (Anthropic)
| Phase | Model | Cost |
|-------|-------|------|
| Phase 0: Quality Audit | Sonnet | $2.92 |
| Phase 0b: Quality Audit v2 | Sonnet | $5.93 |
| Phase 2: 174K reclassification | Haiku | ~$50 |
| Phase 2b: Generic Token Fix | Haiku | $7.54 |
| Phase 2c: Subtopics R1 | Haiku | $20.22 |
| Phase 2c: Subtopics R2 | Haiku | $12.03 |
| Phase 2d: Bad Subtopics | Haiku | ~$0.50 |
| 5K test run | Sonnet | $5.32 |
| doc_check_controls | Haiku | ~$5 |
| **Total** | | **~$110** |
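
The total can be re-derived from the individual rows. In this sketch the approximate ("~") values are taken at face value, so the result is only as exact as those estimates; the sum comes to $109.46, consistent with the rounded ~$110.

```python
from decimal import Decimal

# (phase, model, cost in USD) copied from the table above.
costs = [
    ("Phase 0: Quality Audit", "Sonnet", Decimal("2.92")),
    ("Phase 0b: Quality Audit v2", "Sonnet", Decimal("5.93")),
    ("Phase 2: 174K reclassification", "Haiku", Decimal("50.00")),
    ("Phase 2b: Generic Token Fix", "Haiku", Decimal("7.54")),
    ("Phase 2c: Subtopics R1", "Haiku", Decimal("20.22")),
    ("Phase 2c: Subtopics R2", "Haiku", Decimal("12.03")),
    ("Phase 2d: Bad Subtopics", "Haiku", Decimal("0.50")),
    ("5K test run", "Sonnet", Decimal("5.32")),
    ("doc_check_controls", "Haiku", Decimal("5.00")),
]

total = sum((cost for _, _, cost in costs), Decimal("0"))

# Split the spend by model.
by_model = {}
for _, model, cost in costs:
    by_model[model] = by_model.get(model, Decimal("0")) + cost

print(total)               # 109.46
print(by_model["Sonnet"])  # 14.17
print(by_model["Haiku"])   # 95.29
```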
---
## STRATEGIC DECISIONS (in Memory)
1. **3 use cases:** gap analysis (prio 1), vendor risk (prio 2), Web3/crypto as a vertical (prio 3)
2. **No reproduction of standards:** obligation extraction instead of ISO texts (legally safe)
3. **Regulatory Ingestion Engine:** BAuA/OSHA crawlers as a template for automated source feeds
4. **CE compliance crossover:** IACE × Master Controls for trigger-based compliance hints