Phase 0: Quality Audit script (Claude Sonnet, 1750 samples)
Phase 1: Object ontology expanded 31 → 74 tokens with descriptions + boundaries
Phase 2: 174K controls re-classified via Haiku (10 batches, $50)
- Generic tokens removed (documentation, procedure, process)
- L2 sub-topics added (108K + 64K controls)
- Bad subtopics fixed (stakeholder_*, escalation fragments)
Phase 3: Re-clustering K=18704 (37K objects → 16.7K groups)
Phase 4: Direct MC generation from canonical tokens (gpre2_direct_mc.py)
Phase 5: Regulation-source split (gpre3, dry-run tested)

New features:
- Tenant-isolated document upload API (rag-service)
- BAuA crawler (Playwright, 131 PDFs downloaded)
- OSHA Technical Manual crawler (23 chapters)
- CE obligation extractor (6141 obligations from Qdrant)

RAG ingestion:
- 126 BAuA PDFs (TRBS/TRGS/ASR): 27,664 chunks
- OSHA Technical Manual: 7,241 chunks
- OSHA 1910 Subpart O (full): 745 chunks
- EuGH C-588/21 P: 216 chunks
- EU 2018/1725: 842 chunks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Using the Controls: Guide for Other Sessions
Last updated: 2026-05-07 (updated continuously)
Repo: breakpilot-core (~/Projekte/breakpilot-core)
What are the controls?
The database holds 174,497 atomic compliance controls. Each control is a single verifiable requirement from a legal source (DSGVO, NIS2, NIST, AI Act, etc.).
Example
Control-ID: AUTH-2956-A14
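Title: "Implementierung von Multi-Faktor-Authentifizierung prüfen" (Verify the implementation of multi-factor authentication)
Objective: "Sicherstellen, dass MFA korrekt implementiert ist..." (Ensure that MFA is implemented correctly...)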
Merge-Key: "verify:multi_factor_auth:testing"
Severity: high
Where are the controls stored?
Database (PostgreSQL on the Mac Mini)
-- Query all controls
SELECT id, control_id, title, objective, severity,
source_citation, -- legal source (JSON)
generation_metadata->>'merge_group_hint' AS merge_key
FROM compliance.canonical_controls
WHERE release_state NOT IN ('deprecated', 'rejected');
Connection:
# From the MacBook:
ssh macmini "/usr/local/bin/docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db"
# Or via the control-pipeline container:
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline curl -sf http://127.0.0.1:8098/..."
API (port 8098, reachable only via docker exec)
# List master controls
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline \
curl -sf 'http://127.0.0.1:8098/v1/master-controls?limit=50&sort=total_controls'"
# Master control detail with all members
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline \
curl -sf 'http://127.0.0.1:8098/v1/master-controls/MC-8292'"
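When these calls are scripted, the nested quoting (local shell, remote shell, URL with `&`) is easy to get wrong. The following is a minimal sketch of assembling the same command in Python. The helper name `build_api_command` is hypothetical; the host alias, Docker path, container name, and endpoints are taken from the commands above.

```python
import shlex
from urllib.parse import urlencode

# Values copied from the shell commands above.
DOCKER = "/usr/local/bin/docker"
CONTAINER = "bp-core-control-pipeline"
BASE_URL = "http://127.0.0.1:8098"


def build_api_command(path, **query):
    """Return an argv list for subprocess that curls the API via docker exec.

    Hypothetical helper: it only builds the command and does not run it.
    """
    url = f"{BASE_URL}{path}"
    if query:
        url += "?" + urlencode(query)
    # shlex.quote protects characters such as '&' and '?' from the remote shell.
    remote = f"{DOCKER} exec {CONTAINER} curl -sf {shlex.quote(url)}"
    return ["ssh", "macmini", remote]
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. Passing a list avoids a local shell entirely, so only the remote shell's quoting has to be handled.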
Structure of the controls
merge_group_hint (key field!)
Every control has a merge_group_hint in the format action:object:phase, for example:
implement:encryption:implementation
define:access_control:definition
monitor:network_security:monitoring
report:supervisory_authority:reporting
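A consuming script will usually want the three parts separately. A minimal sketch, assuming every hint has exactly three non-empty, colon-separated parts as in the examples above; `MergeKey` and `parse_merge_key` are illustrative names, not part of the repo:

```python
from typing import NamedTuple


class MergeKey(NamedTuple):
    action: str
    object_token: str
    phase: str


def parse_merge_key(hint: str) -> MergeKey:
    """Split an 'action:object:phase' hint into its three parts."""
    parts = hint.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected action:object:phase, got {hint!r}")
    return MergeKey(*parts)
```

For instance, `parse_merge_key("verify:multi_factor_auth:testing").object_token` yields `multi_factor_auth`. Malformed hints raise a `ValueError` rather than being silently accepted.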
74 canonical object tokens (as of 2026-05-07):
| Category | Tokens |
|---|---|
| Security | multi_factor_auth, password_policy, credentials, session_management, privileged_access, access_control, encryption, transport_encryption, key_management, certificate_management, network_security, network_segmentation, firewall, vpn, remote_access, monitoring, audit_logging, siem, alerting, compliance_audit, vulnerability, patch_management, backup, disaster_recovery, physical_security, secure_development, api_security, input_validation, container_security, logging_configuration |
| Data Protection | personal_data, sensitive_data, health_data, consent, data_subject_rights, data_retention, data_transfer, data_breach_notification, dpia, data_processing_agreement, privacy_by_design, data_processing_register, data_classification, cookie_consent, video_surveillance |
| Governance | policy, procedure, process, training, awareness, incident, risk_management, third_party_management, change_management, documentation, records_management, compliance_reporting, asset_management, human_resources_security |
| Regulatory | supervisory_authority, certification, product_safety, ai_system, financial_reporting, aml, whistleblowing, consumer_protection, ecommerce, telecommunications, medical_device, payment_services, critical_infrastructure, supply_chain_due_diligence, sustainability_reporting |
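To check whether a merge key already uses one of these tokens, the table can be mirrored in a small lookup. This is a sketch only: the authoritative list is the compliance.object_ontology table, the set below is copied from the table above and may go stale, and `has_canonical_object` is a hypothetical helper name.

```python
# Copied from the category table above; compliance.object_ontology is authoritative.
CANONICAL_OBJECT_TOKENS = frozenset("""
multi_factor_auth password_policy credentials session_management privileged_access
access_control encryption transport_encryption key_management certificate_management
network_security network_segmentation firewall vpn remote_access monitoring
audit_logging siem alerting compliance_audit vulnerability patch_management backup
disaster_recovery physical_security secure_development api_security input_validation
container_security logging_configuration
personal_data sensitive_data health_data consent data_subject_rights data_retention
data_transfer data_breach_notification dpia data_processing_agreement privacy_by_design
data_processing_register data_classification cookie_consent video_surveillance
policy procedure process training awareness incident risk_management
third_party_management change_management documentation records_management
compliance_reporting asset_management human_resources_security
supervisory_authority certification product_safety ai_system financial_reporting aml
whistleblowing consumer_protection ecommerce telecommunications medical_device
payment_services critical_infrastructure supply_chain_due_diligence
sustainability_reporting
""".split())


def has_canonical_object(merge_key: str) -> bool:
    """True if the middle part of 'action:object:phase' is a canonical token."""
    parts = merge_key.split(":")
    return len(parts) == 3 and parts[1] in CANONICAL_OBJECT_TOKENS
```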
Legal sources (source_citation)
Only the parent controls (not the atomic ones!) carry a source_citation:
-- Find controls with a legal source
SELECT cc.control_id, cc.title,
pc.source_citation->>'source' AS regulation,
pc.source_citation->>'article' AS article
FROM compliance.canonical_controls cc
JOIN compliance.canonical_controls pc ON pc.id = cc.parent_control_uuid
WHERE pc.source_citation IS NOT NULL
AND pc.source_citation->>'source' LIKE '%DSGVO%';
148 distinct legal sources (DSGVO, NIS2, NIST, OWASP, BSI, TKG, etc.)
Filtering controls (use cases)
Example: all DSGVO Art. 13 controls (for DSI review)
SELECT cc.control_id, cc.title, cc.objective,
cc.generation_metadata->>'merge_group_hint' AS merge_key,
pc.source_citation->>'article' AS article
FROM compliance.canonical_controls cc
JOIN compliance.canonical_controls pc ON pc.id = cc.parent_control_uuid
WHERE pc.source_citation->>'source' = 'DSGVO (EU) 2016/679'
AND pc.source_citation->>'article' LIKE '%13%'
AND cc.release_state NOT IN ('deprecated', 'rejected')
ORDER BY cc.control_id;
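The `LIKE '%13%'` filter matches any article string that contains "13", which would include values such as "Art. 130" or a paragraph number 13 if they occur. A stricter client-side check can be sketched as follows. It assumes the article field is written like "Art. 13 ..." or "Artikel 13 ..."; the actual format is not shown here, and `cites_article` is a hypothetical helper.

```python
import re


def cites_article(article_field: str, number: int) -> bool:
    """True if the text cites exactly this article number (assumed 'Art. N' style)."""
    # (?!\d) rejects longer numbers such as 130 when looking for 13.
    pattern = rf"\bArt(?:ikel|icle)?\.?\s*{number}(?!\d)"
    return re.search(pattern, article_field, flags=re.IGNORECASE) is not None
```

Applied to the rows returned by the query above, this keeps only those rows whose article value names Art. 13 itself.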
Example: all encryption controls
SELECT control_id, title, objective
FROM compliance.canonical_controls
WHERE generation_metadata->>'merge_group_hint' LIKE '%:encryption:%'
AND release_state NOT IN ('deprecated', 'rejected');
Example: filter controls by object token
-- All controls on a given topic
SELECT control_id, title,
generation_metadata->>'merge_group_hint' AS merge_key
FROM compliance.canonical_controls
WHERE generation_metadata->>'merge_group_hint' LIKE '%:data_retention:%'
AND release_state NOT IN ('deprecated', 'rejected');
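One detail when building such patterns programmatically: in SQL `LIKE`, `_` matches any single character and `%` any sequence, so tokens containing underscores should be escaped for an exact match. A minimal sketch, relying on backslash being PostgreSQL's default `LIKE` escape character; `like_pattern_for_object` is a hypothetical name.

```python
def like_pattern_for_object(token: str) -> str:
    """Build a LIKE pattern matching ':<token>:' literally inside a merge key."""
    escaped = (
        token.replace("\\", "\\\\")  # escape the escape character first
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return f"%:{escaped}:%"
```

The pattern is meant to be passed as a bound parameter, for example `... LIKE %s` with a driver such as psycopg, rather than interpolated into the SQL string.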
Key tables
| Table | Rows | Description |
|---|---|---|
| compliance.canonical_controls | ~294K | All controls (rich + atomic) |
| compliance.master_controls | ~5,329 | Grouped master controls |
| compliance.master_control_members | ~172K | Control → MC mapping |
| compliance.object_ontology | 74 | Canonical object definitions |
| compliance.regulation_registry | 223 | Legal source registry |
What is happening right now (2026-05-07)
Phase 2 is running: all 174K controls are being re-classified with Claude Haiku. The merge_group_hint values are being normalized from free-form LLM objects to the 74 canonical tokens. After that:
- Phase 3: Re-clustering (gpre1 with K=20000)
- Phase 4: New master controls (gpre2)
- Phase 5: Regulation-source split (gpre3)
DO NOT MODIFY: the canonical_controls, master_controls, and object_ontology tables are being actively worked on.
DB access quick reference
# Quick query (one-liner)
ssh macmini "/usr/local/bin/docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db -c \"SELECT count(*) FROM compliance.canonical_controls\""
# Interactive session (ssh -t allocates the TTY that docker exec -it needs)
ssh -t macmini "/usr/local/bin/docker exec -it bp-core-postgres psql -U breakpilot -d breakpilot_db"