# Using the Controls: A Guide for Other Sessions

**As of:** 2026-05-07, updated continuously
**Repo:** breakpilot-core (~/Projekte/breakpilot-core)

---

## What are the controls?

174,497 atomic compliance controls are stored in the database. Each control is a **single verifiable requirement** derived from a legal source (GDPR (DSGVO), NIS2, NIST, AI Act, etc.).

### Example

```
Control-ID: AUTH-2956-A14
Title:      "Verify the implementation of multi-factor authentication"
Objective:  "Ensure that MFA is implemented correctly..."
Merge-Key:  "verify:multi_factor_auth:testing"
Severity:   high
```

## Where are the controls stored?

### Database (PostgreSQL on the Mac Mini)

```sql
-- Query all controls
SELECT id, control_id, title, objective, severity,
       source_citation,  -- legal source (JSON)
       generation_metadata->>'merge_group_hint' AS merge_key
FROM compliance.canonical_controls
WHERE release_state NOT IN ('deprecated', 'rejected');
```

**Connection:**
```bash
# From the MacBook (interactive psql; ssh -t allocates the TTY that docker exec -it needs):
ssh -t macmini "/usr/local/bin/docker exec -it bp-core-postgres psql -U breakpilot -d breakpilot_db"

# Or via the control-pipeline container:
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline curl -sf http://127.0.0.1:8098/..."
```

### API (port 8098, reachable only via docker exec)

```bash
# List master controls
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline \
  curl -sf 'http://127.0.0.1:8098/v1/master-controls?limit=50&sort=total_controls'"

# Master control detail with all members
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline \
  curl -sf 'http://127.0.0.1:8098/v1/master-controls/MC-8292'"
```

## Structure of the controls

### merge_group_hint (key field!)

Every control has a `merge_group_hint` in the format `action:object:phase`:

```
implement:encryption:implementation
define:access_control:definition
monitor:network_security:monitoring
report:supervisory_authority:reporting
```

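
A merge key can be split into its three fields with plain shell parameter expansion. The following is a minimal sketch: `split_merge_key` is a hypothetical helper name, not something from the repo, and it assumes that none of the three fields contains a colon itself.

```sh
# Hypothetical helper: split "action:object:phase" into its three fields.
# The function body is a subshell ( ... ), so its variables do not leak and
# "exit" only leaves the function.
split_merge_key() (
    key=$1
    case $key in
        *:*:*:*) echo "too many fields: $key" >&2; exit 1 ;;
        *:*:*)   ;;  # exactly two colons, as expected
        *)       echo "not action:object:phase: $key" >&2; exit 1 ;;
    esac
    action=${key%%:*}   # text before the first colon
    rest=${key#*:}      # text after the first colon
    object=${rest%%:*}
    phase=${rest#*:}
    printf 'action=%s object=%s phase=%s\n' "$action" "$object" "$phase"
)

split_merge_key 'verify:multi_factor_auth:testing'
# prints: action=verify object=multi_factor_auth phase=testing
```
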
**74 canonical object tokens** (as of 2026-05-07):

| Category | Tokens |
|-----------|--------|
| **Security** | multi_factor_auth, password_policy, credentials, session_management, privileged_access, access_control, encryption, transport_encryption, key_management, certificate_management, network_security, network_segmentation, firewall, vpn, remote_access, monitoring, audit_logging, siem, alerting, compliance_audit, vulnerability, patch_management, backup, disaster_recovery, physical_security, secure_development, api_security, input_validation, container_security, logging_configuration |
| **Data Protection** | personal_data, sensitive_data, health_data, consent, data_subject_rights, data_retention, data_transfer, data_breach_notification, dpia, data_processing_agreement, privacy_by_design, data_processing_register, data_classification, cookie_consent, video_surveillance |
| **Governance** | policy, procedure, process, training, awareness, incident, risk_management, third_party_management, change_management, documentation, records_management, compliance_reporting, asset_management, human_resources_security |
| **Regulatory** | supervisory_authority, certification, product_safety, ai_system, financial_reporting, aml, whistleblowing, consumer_protection, ecommerce, telecommunications, medical_device, payment_services, critical_infrastructure, supply_chain_due_diligence, sustainability_reporting |

### Legal sources (source_citation)

Only the **parent controls** (not the atomic ones!) carry a `source_citation`:

```sql
-- Find controls with a legal source
SELECT cc.control_id, cc.title,
       pc.source_citation->>'source' AS regulation,
       pc.source_citation->>'article' AS article
FROM compliance.canonical_controls cc
JOIN compliance.canonical_controls pc ON pc.id = cc.parent_control_uuid
WHERE pc.source_citation IS NOT NULL
  AND pc.source_citation->>'source' LIKE '%DSGVO%';
```

There are 148 distinct legal sources (DSGVO, NIS2, NIST, OWASP, BSI, TKG, etc.).

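
Filters that compare `source` with `=` need the exact stored string, so it can help to list the stored values first. The sketch below only prints such a query; `sources_sql` is a hypothetical helper name, and the column alias `child_controls` is chosen here for illustration.

```sh
# Hypothetical helper: print a query that lists every distinct legal source
# found on parent controls, with the number of child controls that link to it.
sources_sql() {
    cat <<'SQL'
SELECT pc.source_citation->>'source' AS regulation,
       count(*)                      AS child_controls
FROM compliance.canonical_controls cc
JOIN compliance.canonical_controls pc ON pc.id = cc.parent_control_uuid
WHERE pc.source_citation IS NOT NULL
  AND cc.release_state NOT IN ('deprecated', 'rejected')
GROUP BY 1
ORDER BY 2 DESC;
SQL
}

# Show the generated SQL. To run it, pipe it into psql; docker exec needs -i
# so that psql can read the query from stdin:
#   sources_sql | ssh macmini "/usr/local/bin/docker exec -i bp-core-postgres psql -U breakpilot -d breakpilot_db"
sources_sql
```
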
## Filtering controls (use cases)

### Example: All DSGVO (GDPR) Art. 13 controls (for the DSI review)

```sql
SELECT cc.control_id, cc.title, cc.objective,
       cc.generation_metadata->>'merge_group_hint' AS merge_key,
       pc.source_citation->>'article' AS article
FROM compliance.canonical_controls cc
JOIN compliance.canonical_controls pc ON pc.id = cc.parent_control_uuid
WHERE pc.source_citation->>'source' = 'DSGVO (EU) 2016/679'
  AND pc.source_citation->>'article' LIKE '%13%'
  AND cc.release_state NOT IN ('deprecated', 'rejected')
ORDER BY cc.control_id;
```

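
A pattern such as `'%13%'` matches every article string that contains the digits 13, including 113 or 130. For the GDPR, which has 99 articles, this is mostly harmless, but for other sources a stricter match may be needed. One option is a regular expression that requires non-digits around the number; in PostgreSQL such an expression can be used with the `~` operator in place of `LIKE`. The sketch below only demonstrates the expression with `grep -E`. The helper name `matches_article` and the sample strings are made up, and the actual format of the `article` field is an assumption.

```sh
# Hypothetical helper: succeed if the article string contains the given
# number as a standalone number, not as part of a longer number.
matches_article() (
    article=$1
    number=$2
    printf '%s\n' "$article" | grep -Eq "(^|[^0-9])${number}([^0-9]|\$)"
)

# Made-up sample values; the real content of source_citation->>'article'
# may be formatted differently.
for a in 'Art. 13' 'Art. 13 Abs. 1' 'Art. 113' 'Art. 130'; do
    if matches_article "$a" 13; then
        echo "match:    $a"
    else
        echo "no match: $a"
    fi
done
```

With these samples, the first two strings match and the last two do not.
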
### Example: All encryption controls

```sql
SELECT control_id, title, objective
FROM compliance.canonical_controls
WHERE generation_metadata->>'merge_group_hint' LIKE '%:encryption:%'
  AND release_state NOT IN ('deprecated', 'rejected');
```

### Example: Filtering controls by object token

```sql
-- All controls on a given topic
SELECT control_id, title,
       generation_metadata->>'merge_group_hint' AS merge_key
FROM compliance.canonical_controls
WHERE generation_metadata->>'merge_group_hint' LIKE '%:data_retention:%'
  AND release_state NOT IN ('deprecated', 'rejected');
```

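
In a `LIKE` pattern the underscore matches any single character, so `'%:data_retention:%'` is slightly looser than it looks. An exact comparison of the object field is possible with PostgreSQL's `split_part`. The sketch below generates such a query for a given token; `object_filter_sql` is a hypothetical helper name, and the token check assumes that object tokens consist only of lowercase letters, digits, and underscores, as in the token table.

```sh
# Hypothetical helper: print a query that selects controls whose object field
# (second part of merge_group_hint) equals the given token exactly.
object_filter_sql() (
    token=$1
    # Accept only lowercase letters, digits, and underscores so that the
    # token can be embedded in the SQL text safely.
    case $token in
        ''|*[!a-z0-9_]*) echo "invalid object token: $token" >&2; exit 1 ;;
    esac
    cat <<SQL
SELECT control_id, title,
       generation_metadata->>'merge_group_hint' AS merge_key
FROM compliance.canonical_controls
WHERE split_part(generation_metadata->>'merge_group_hint', ':', 2) = '$token'
  AND release_state NOT IN ('deprecated', 'rejected');
SQL
)

# Show the generated SQL for one token.
object_filter_sql data_retention
```
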
## Important tables

| Table | Rows | Description |
|-------|------|-------------|
| `compliance.canonical_controls` | ~294K | All controls (rich + atomic) |
| `compliance.master_controls` | ~5,329 | Grouped master controls |
| `compliance.master_control_members` | ~172K | Mapping control → MC |
| `compliance.object_ontology` | 74 | Canonical object definitions |
| `compliance.regulation_registry` | 223 | Registry of legal sources |

## What is happening right now (2026-05-07)

**Phase 2 is running:** All 174K controls are being re-classified with Claude Haiku. The `merge_group_hint` values are being normalized from free-form LLM objects to the 74 canonical tokens. Afterwards:
- Phase 3: Re-clustering (gpre1 with K=20000)
- Phase 4: New master controls (gpre2)
- Phase 5: Regulation-source split (gpre3)

**DO NOT MODIFY:** The `canonical_controls`, `master_controls`, and `object_ontology` tables are being actively modified.

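
One way to guard against accidental writes while these tables are being processed is to open psql sessions that default to read-only transactions. The sketch below passes PostgreSQL's `default_transaction_read_only` setting through the `PGOPTIONS` environment variable. `bp_psql_ro` and the `BP_SSH` override are hypothetical names introduced here. This is a safety net, not access control: a session can still switch the setting off explicitly.

```sh
# Hypothetical helper: run SQL from stdin in a psql session whose
# transactions are read-only by default. Set BP_SSH=echo for a dry run that
# prints the ssh arguments instead of connecting.
bp_psql_ro() (
    remote_cmd="/usr/local/bin/docker exec -i"
    remote_cmd="$remote_cmd -e PGOPTIONS='-c default_transaction_read_only=on'"
    remote_cmd="$remote_cmd bp-core-postgres psql -U breakpilot -d breakpilot_db"
    ${BP_SSH:-ssh} macmini "$remote_cmd"
)

# Dry run: show what would be sent to the Mac Mini.
BP_SSH=echo bp_psql_ro

# Real use (not executed here):
#   echo 'SELECT count(*) FROM compliance.canonical_controls;' | bp_psql_ro
```
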
## DB access quick reference

```bash
# Quick query (one-liner)
ssh macmini "/usr/local/bin/docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db -c \"SELECT count(*) FROM compliance.canonical_controls\""

# Interactive session (ssh -t allocates the TTY that docker exec -it needs)
ssh -t macmini "/usr/local/bin/docker exec -it bp-core-postgres psql -U breakpilot -d breakpilot_db"
```