Benjamin Admin 8a44e67293 feat(compliance-check): unlock all 1874 MCs + close gap-table items
User: 'wir haben 1800 MCs erstellt um sie zu 10% zu nutzen — das ist
Schwachsinn'. Fixed all 6 gaps from the audit.

#1 max_controls=0 (was 20):
- agent_compliance_check_routes _check_single: passes max_controls=0 to
  check_document_with_controls -> ALL MCs evaluated per doc_type.
- 8 doc_types now use 1874 MCs instead of 160 (10x coverage).
- Regex matching is cheap (<1s per doc); LLM-enrich cap of 10 stays.

#2 LLM-verify fixed:
- llm_verify.py was getting 0/N parsed. Causes: qwen3 thinking-mode
  wrapped output in <think>...</think>, /api/generate doesn't enforce
  JSON, prompt didn't handle code-fence wrappers.
- Now uses /api/chat with format='json' (forces valid JSON).
- _parse_batch_response strips <think> tags, accepts {results:[...]}
  AND bare [...], adds richer regex-fallback parse, logs raw head on
  total parse failure for diagnosis.

#3 Loeschkonzept checklist (new):
- doc_checks/loeschkonzept_checks.py — 9 L1 + 7 L2 checks per DIN 66398
  + Art. 5(1)(e)/17/32 DSGVO: scope+responsibility, data categories,
  retention periods, legal basis refs (HGB/AO/BGB), deletion trigger,
  deletion process+technical+systems, deletion proof, exceptions +
  Art. 18 lock, review cycle, DSGVO references.
- runner.py registered for loeschkonzept/loeschung/loeschfristen.

#4 regulation backfill script:
- backend-compliance/scripts/backfill_mc_regulation.py — regex-detects
  DSGVO/TDDDG/TMG/BGB/HGB/AO/MStV/UWG/VSBG/PAngV/GwG/BDSG/EU-VO
  references in MC title+question+pass_criteria, UPDATEs regulation +
  article fields.
- Idempotent (only NULL rows), --dry-run flag, batched 200/UPDATE.
- Run inside container: docker exec bp-compliance-backend python3 \
    /app/scripts/backfill_mc_regulation.py

#5 MC alias-fallback:
- rag_document_checker._MC_ALIAS_FALLBACK maps doc_types without own
  MCs to a related set: nutzungsbedingungen->agb, social_media->dse,
  sub_processor/scc/tom_annex->avv, loeschfristen->loeschkonzept,
  eu_institution/dsb->dse.
- _load_controls retries with the alias when the primary query
  returns 0 rows.
- 14 additional doc_types now get MC coverage transparently.

#6 cross-domain auto-discovery:
- _autodiscover_missing builds a crawl plan: primary submitted base
  + up to 2 related domains sharing the owner SLD (e.g. BMW Group:
  bmw.de + bmwgroup.com + bmwgroup.jobs).
- Detection: regex over submitted texts for https?://...<owner>...
  hostnames distinct from the primary base.
- Each crawled base contributes documents + cmp_payloads to the
  discovery pool.

Net effect for BMW: 1874 MCs evaluated (90 from cookie alone, was
20), Loeschkonzept Pflichtangaben benoten-bar, LLM overturns false
regex FAILs, Joint-Controller policies on bmwgroup.jobs (Social
Media) jetzt entdeckbar. Same wins will apply to CRA-Compliance check.
2026-05-17 13:07:50 +02:00

breakpilot-compliance

DSGVO/AI-Act compliance platform — 10 services, Go · Python · TypeScript

CI Go Python Node.js TypeScript FastAPI DSGVO AI Act LOC guard Services


Overview

breakpilot-compliance is a multi-tenant DSGVO/EU AI Act compliance platform that provides an SDK for consent management, data subject requests (DSR), audit logging, iACE impact assessments, and document archival. It ships as 10 containerised services covering an admin dashboard, a developer portal, a Python/FastAPI backend, a Go AI compliance engine, TTS, and a decentralised document store on IPFS. Every service is deployed automatically via Gitea Actions → Orca on every push to main.


Architecture

Service Tech Port Container
admin-compliance Next.js 15 3007 bp-compliance-admin
backend-compliance Python / FastAPI 0.123 8002 bp-compliance-backend
ai-compliance-sdk Go 1.24 / Gin 8093 bp-compliance-ai-sdk
developer-portal Next.js 15 3006 bp-compliance-developer-portal
breakpilot-compliance-sdk TypeScript SDK (React/Vue/Angular/vanilla)
consent-sdk JS/TS Consent SDK
compliance-tts-service Python / Piper TTS 8095 bp-compliance-tts
document-crawler Python / FastAPI 8098 bp-compliance-document-crawler
dsms-gateway Python / FastAPI / IPFS 8082 bp-compliance-dsms-gateway
dsms-node IPFS Kubo v0.24.0 bp-compliance-dsms-node

All containers share the external breakpilot-network Docker network and depend on breakpilot-core (Valkey, Vault, RAG service, Nginx reverse proxy).


Quick Start

Prerequisites: Docker, Go 1.24+, Python 3.12+, Node.js 20+

git clone ssh://git@gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-compliance.git
cd breakpilot-compliance

# Copy and populate secrets (never commit .env)
cp .env.example .env

# Start all services
docker compose up -d

For the Orca/Hetzner production target (x86_64), use the override:

docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d

Development Workflow

Use feature branches off main. Supported prefixes: feat/, feature/, hotfix/.

git checkout main && git pull origin main
git checkout -b feat/my-change
# ... make changes ...
git push origin feat/my-change
# Open a PR → squash merge to main

Push to main triggers:

  1. Gitea Actions — lint → test → validate (see CI Pipeline below)
  2. Orca — automatic build + deploy (~3 min total)

Monitor status: https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions


CI Pipeline

Defined in .gitea/workflows/ci.yaml.

Job What it checks
loc-budget All source files ≤ 500 LOC; soft target 300
guardrail-integrity Commits touching guardrail files carry [guardrail-change]
go-lint golangci-lint on ai-compliance-sdk/
python-lint ruff + mypy on Python services
nodejs-lint tsc --noEmit + ESLint on Next.js services
test-go-ai-compliance go test ./... in ai-compliance-sdk/
test-python-backend-compliance pytest in backend-compliance/
test-python-document-crawler pytest in document-crawler/
test-python-dsms-gateway pytest test_main.py in dsms-gateway/
sbom-scan License + vulnerability scan via syft + grype
validate-canonical-controls OpenAPI contract baseline diff

File Budget

Limit Value How to check
Soft target 300 LOC bash scripts/check-loc.sh
Hard cap 500 LOC Same; also enforced by PreToolUse hook + git pre-commit + CI
Exceptions .claude/rules/loc-exceptions.txt Require written rationale + [guardrail-change] commit marker

The .claude/settings.json PreToolUse hook blocks Claude Code from writing or editing files that would exceed the hard cap. The git pre-commit hook re-checks. CI is the final gate.


URL
Admin dashboard https://admin-dev.breakpilot.ai
Developer portal https://developers-dev.breakpilot.ai
Backend API https://api-dev.breakpilot.ai
AI SDK API https://sdk-dev.breakpilot.ai
Gitea repo https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance
Gitea Actions https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions
S
Description
No description provided
Readme 25 MiB
Languages
TypeScript 43.1%
Python 30.8%
Go 23.5%
Shell 1.2%
PLpgSQL 0.8%
Other 0.3%