breakpilot-core/control-pipeline/INSTRUCTION-controls-fuer-andere-sessions.md
Benjamin Admin 8510af46eb feat(pipeline): MC Quality Overhaul — 74.5% → 92.8% accuracy, 5.3K → 13.6K MCs
Phase 0: Quality Audit script (Claude Sonnet, 1750 samples)
Phase 1: Object ontology expanded 31 → 74 tokens with descriptions + boundaries
Phase 2: 174K controls re-classified via Haiku (10 batches, $50)
  - Generic tokens removed (documentation, procedure, process)
  - L2 sub-topics added (108K + 64K controls)
  - Bad subtopics fixed (stakeholder_*, escalation fragments)
Phase 3: Re-clustering K=18704 (37K objects → 16.7K groups)
Phase 4: Direct MC generation from canonical tokens (gpre2_direct_mc.py)
Phase 5: Regulation-source split (gpre3, dry-run tested)

New features:
- Tenant-isolated document upload API (rag-service)
- BAuA crawler (Playwright, 131 PDFs downloaded)
- OSHA Technical Manual crawler (23 chapters)
- CE obligation extractor (6141 obligations from Qdrant)

RAG ingestion:
- 126 BAuA PDFs (TRBS/TRGS/ASR): 27,664 chunks
- OSHA Technical Manual: 7,241 chunks
- OSHA 1910 Subpart O (full): 745 chunks
- EuGH C-588/21 P: 216 chunks
- EU 2018/1725: 842 chunks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-10 15:08:15 +02:00


Using the Controls: Instructions for Other Sessions

As of: 2026-05-07, updated continuously
Repo: breakpilot-core (~/Projekte/breakpilot-core)


What are the controls?

The database holds 174,497 atomic compliance controls. Each control is a single verifiable requirement derived from a legal source (GDPR/DSGVO, NIS2, NIST, AI Act, etc.).

Example

Control ID:  AUTH-2956-A14
Title:       "Implementierung von Multi-Faktor-Authentifizierung prüfen"
Objective:   "Sicherstellen, dass MFA korrekt implementiert ist..."
Merge key:   "verify:multi_factor_auth:testing"
Severity:    high

Where are the controls stored?

Database (PostgreSQL on the Mac Mini)

-- Query all active controls (deprecated and rejected ones are excluded)
SELECT id, control_id, title, objective, severity,
       source_citation,           -- legal source (JSON)
       generation_metadata->>'merge_group_hint' AS merge_key
FROM compliance.canonical_controls
WHERE release_state NOT IN ('deprecated', 'rejected');

Connection:

# From the MacBook:
ssh macmini "/usr/local/bin/docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db"

# Or via the control-pipeline container:
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline curl -sf http://127.0.0.1:8098/..."

API (port 8098, reachable only via docker exec)

# List master controls
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline \
  curl -sf 'http://127.0.0.1:8098/v1/master-controls?limit=50&sort=total_controls'"

# Master control detail with all members
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline \
  curl -sf 'http://127.0.0.1:8098/v1/master-controls/MC-8292'"
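For scripted access, the ssh, docker exec, and curl chain above can be assembled programmatically. The following is a minimal sketch, not part of the repo: the helper name `build_api_command` is hypothetical, and the host alias, Docker path, container name, and port are taken from the commands above.

```python
import shlex
from urllib.parse import urlencode

# Values copied from the shell examples in this document.
SSH_HOST = "macmini"
DOCKER = "/usr/local/bin/docker"
CONTAINER = "bp-core-control-pipeline"
BASE_URL = "http://127.0.0.1:8098"


def build_api_command(path, params=None):
    """Return an argv list that runs curl inside the pipeline container via ssh.

    Hypothetical helper: it only builds the command, it does not execute it.
    """
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    # Quote every part for the remote shell so '?' and '&' in the URL survive.
    remote_parts = [DOCKER, "exec", CONTAINER, "curl", "-sf", url]
    remote = " ".join(shlex.quote(part) for part in remote_parts)
    return ["ssh", SSH_HOST, remote]


cmd = build_api_command("/v1/master-controls", {"limit": 50, "sort": "total_controls"})
print(cmd[2])
```

The resulting list could be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. Quoting with `shlex.quote` avoids the nested quote escaping that hand-written ssh commands need.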

Structure of the controls

merge_group_hint (key field!)

Every control has a merge_group_hint in the format action:object:phase:

implement:encryption:implementation
define:access_control:definition
monitor:network_security:monitoring
report:supervisory_authority:reporting
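Because the merge key is a plain colon-separated triple, it is easy to split on the client side. The sketch below assumes that none of the three parts contains a colon itself; the names `MergeKey` and `parse_merge_key` are illustrative and do not come from the pipeline code.

```python
from typing import NamedTuple


class MergeKey(NamedTuple):
    """The three parts of a merge_group_hint (illustrative type)."""
    action: str
    object_token: str
    phase: str


def parse_merge_key(hint: str) -> MergeKey:
    """Split 'action:object:phase' into its parts, rejecting malformed input."""
    parts = hint.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected action:object:phase, got {hint!r}")
    return MergeKey(*parts)


key = parse_merge_key("report:supervisory_authority:reporting")
print(key.action, key.object_token, key.phase)
```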

74 canonical object tokens (as of 2026-05-07):

| Category | Tokens |
|---|---|
| Security | multi_factor_auth, password_policy, credentials, session_management, privileged_access, access_control, encryption, transport_encryption, key_management, certificate_management, network_security, network_segmentation, firewall, vpn, remote_access, monitoring, audit_logging, siem, alerting, compliance_audit, vulnerability, patch_management, backup, disaster_recovery, physical_security, secure_development, api_security, input_validation, container_security, logging_configuration |
| Data Protection | personal_data, sensitive_data, health_data, consent, data_subject_rights, data_retention, data_transfer, data_breach_notification, dpia, data_processing_agreement, privacy_by_design, data_processing_register, data_classification, cookie_consent, video_surveillance |
| Governance | policy, procedure, process, training, awareness, incident, risk_management, third_party_management, change_management, documentation, records_management, compliance_reporting, asset_management, human_resources_security |
| Regulatory | supervisory_authority, certification, product_safety, ai_system, financial_reporting, aml, whistleblowing, consumer_protection, ecommerce, telecommunications, medical_device, payment_services, critical_infrastructure, supply_chain_due_diligence, sustainability_reporting |
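A session that works with merge keys may want to check whether an object token is one of the 74 canonical ones. The following sketch hard-codes the list above as a snapshot; the authoritative source is the compliance.object_ontology table, and the names `CANONICAL_TOKENS` and `category_of` are hypothetical.

```python
# Snapshot of the token list above (2026-05-07); the database is authoritative.
CANONICAL_TOKENS = {
    "Security": (
        "multi_factor_auth password_policy credentials session_management "
        "privileged_access access_control encryption transport_encryption "
        "key_management certificate_management network_security "
        "network_segmentation firewall vpn remote_access monitoring "
        "audit_logging siem alerting compliance_audit vulnerability "
        "patch_management backup disaster_recovery physical_security "
        "secure_development api_security input_validation container_security "
        "logging_configuration"
    ).split(),
    "Data Protection": (
        "personal_data sensitive_data health_data consent data_subject_rights "
        "data_retention data_transfer data_breach_notification dpia "
        "data_processing_agreement privacy_by_design data_processing_register "
        "data_classification cookie_consent video_surveillance"
    ).split(),
    "Governance": (
        "policy procedure process training awareness incident risk_management "
        "third_party_management change_management documentation "
        "records_management compliance_reporting asset_management "
        "human_resources_security"
    ).split(),
    "Regulatory": (
        "supervisory_authority certification product_safety ai_system "
        "financial_reporting aml whistleblowing consumer_protection ecommerce "
        "telecommunications medical_device payment_services "
        "critical_infrastructure supply_chain_due_diligence "
        "sustainability_reporting"
    ).split(),
}


def category_of(token):
    """Return the category of a canonical token, or None if it is not canonical."""
    for category, tokens in CANONICAL_TOKENS.items():
        if token in tokens:
            return category
    return None


total = sum(len(tokens) for tokens in CANONICAL_TOKENS.values())
print(total, category_of("dpia"), category_of("some_free_form_object"))
```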

Legal sources (source_citation)

Only the parent controls (not the atomic ones!) carry a source_citation:

-- Find controls with a legal source
SELECT cc.control_id, cc.title,
       pc.source_citation->>'source' AS regulation,
       pc.source_citation->>'article' AS article
FROM compliance.canonical_controls cc
JOIN compliance.canonical_controls pc ON pc.id = cc.parent_control_uuid
WHERE pc.source_citation IS NOT NULL
  AND pc.source_citation->>'source' LIKE '%DSGVO%';

148 distinct legal sources (GDPR/DSGVO, NIS2, NIST, OWASP, BSI, TKG, etc.)

Filtering controls (use cases)

Example: all GDPR (DSGVO) Art. 13 controls (for the DSI review)

SELECT cc.control_id, cc.title, cc.objective,
       cc.generation_metadata->>'merge_group_hint' AS merge_key,
       pc.source_citation->>'article' AS article
FROM compliance.canonical_controls cc
JOIN compliance.canonical_controls pc ON pc.id = cc.parent_control_uuid
WHERE pc.source_citation->>'source' = 'DSGVO (EU) 2016/679'
  AND pc.source_citation->>'article' LIKE '%13%'
  AND cc.release_state NOT IN ('deprecated', 'rejected')
ORDER BY cc.control_id;
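Note that `LIKE '%13%'` matches any article string that contains the digits 13, so values such as "130" or "113" from other sources would also pass. If exact article matching matters, a client-side post-filter is one option. The sketch below assumes the article field is free text such as "Art. 13" or "Art. 13 Abs. 1"; that format is an assumption, and `mentions_article` is a hypothetical helper.

```python
import re


def mentions_article(article_text: str, number: int) -> bool:
    """True if the text contains the number as a standalone run of digits.

    Assumes free-text article fields such as 'Art. 13 Abs. 1' (not verified
    against the actual data).
    """
    pattern = rf"(?<!\d){number}(?!\d)"
    return re.search(pattern, article_text) is not None


rows = ["Art. 13", "Art. 13 Abs. 1", "Art. 130", "Art. 113"]
print([r for r in rows if mentions_article(r, 13)])
```

The lookarounds reject digits directly before or after the number, which a plain substring match cannot do.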

Example: all encryption controls

SELECT control_id, title, objective
FROM compliance.canonical_controls
WHERE generation_metadata->>'merge_group_hint' LIKE '%:encryption:%'
  AND release_state NOT IN ('deprecated', 'rejected');

Example: filter controls by object token

-- All controls on a given topic
SELECT control_id, title,
       generation_metadata->>'merge_group_hint' AS merge_key
FROM compliance.canonical_controls
WHERE generation_metadata->>'merge_group_hint' LIKE '%:data_retention:%'
  AND release_state NOT IN ('deprecated', 'rejected');
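The two queries above differ only in the object token, so the pattern can be generated. In LIKE patterns the underscore is a single-character wildcard, which is harmless for the current tokens but can be escaped for strictness. A minimal sketch follows, assuming PostgreSQL's default backslash escape character and a driver with `%s` placeholders such as psycopg; the helper name `like_pattern_for_token` is hypothetical.

```python
# Parameterized version of the query above; %s is the driver placeholder
# (assumes a DB-API driver such as psycopg, which is not imported here).
QUERY = """
SELECT control_id, title,
       generation_metadata->>'merge_group_hint' AS merge_key
FROM compliance.canonical_controls
WHERE generation_metadata->>'merge_group_hint' LIKE %s
  AND release_state NOT IN ('deprecated', 'rejected')
"""


def like_pattern_for_token(token: str, escape: str = "\\") -> str:
    """Build '%:<token>:%' with LIKE wildcards inside the token escaped."""
    escaped = (
        token.replace(escape, escape * 2)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
    return f"%:{escaped}:%"


pattern = like_pattern_for_token("data_retention")
print(pattern)
```

With a driver, the call would look like `cursor.execute(QUERY, (pattern,))`. The surrounding colons ensure that `encryption` does not also match `transport_encryption`.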

Important tables

| Table | Rows | Description |
|---|---|---|
| compliance.canonical_controls | ~294K | All controls (rich + atomic) |
| compliance.master_controls | ~5,329 | Grouped master controls |
| compliance.master_control_members | ~172K | Mapping control → MC |
| compliance.object_ontology | 74 | Canonical object definitions |
| compliance.regulation_registry | 223 | Register of legal sources |

What is happening right now (2026-05-07)

Phase 2 is running: all 174K controls are being re-classified with Claude Haiku. The merge_group_hint values are being normalized from free-form LLM objects to the 74 canonical tokens. After that:

  • Phase 3: Re-clustering (gpre1 with K=20000)
  • Phase 4: New master controls (gpre2)
  • Phase 5: Regulation-source split (gpre3)

DO NOT MODIFY: the canonical_controls, master_controls, and object_ontology tables are being actively modified.

DB access quick reference

# Quick query (one line)
ssh macmini "/usr/local/bin/docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db -c \"SELECT count(*) FROM compliance.canonical_controls\""

# Interactive session (ssh -t allocates the TTY that docker exec -it requires)
ssh -t macmini "/usr/local/bin/docker exec -it bp-core-postgres psql -U breakpilot -d breakpilot_db"