Smoke test against www.elli.eco exposed 3 bugs that the
synthetic tests could not catch: real-world texts contain
abbreviations, HTML-stripping artifacts, and different phrasings.
B9 Multi-Entity-Impressum: previously 13 "entities" instead of 2.
- Block boundaries are now HRB-anchor-based (each HRB entry
marks one entity). More robust than the legal-form anchor, which
over-matched on "Programmierung der Webseite Acme GmbH".
- _NAME_BLOCKLIST against 11 typical false positives
(programmierung, webseite, umsatzsteueridentifik, ...).
- _LEADING_NOISE_RE strips email-TLD artifacts ("eco "),
German articles ("Die "), and URL fragments.
- _USTID_PAT now also catches the long form
("Umsatzsteueridentifikationsnummer der … ist DE…") via a
second pattern alternative with a [\s\S]{0,80}? bridge.
- Dedup of identical entity names: multiple mentions in one doc
count as ONE entity.
- Fallback to the old legal-form anchor when no HRBs are present
(e.g. an e.V., which is not subject to commercial-register entry).
B14 Retention-Conflict: anchor list extended.
- "protokolldat" / "protokollierung der zugriffe" /
"zugriffsdat" / "zugriffsprotokoll" as additional
logfile anchors (the wording Elli's actual DSE, i.e. its
privacy policy, uses instead of "Logfile").
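
A rough sketch of how such anchors could feed a retention-conflict check is shown below. Only the four new anchor strings and "logfile" come from the text above; the sentence splitting, `RETENTION_RE`, and both function names are assumptions made for illustration.

```python
import re

# "logfile" plus the four new anchors; the repo's full anchor list is not shown here.
LOGFILE_ANCHORS = (
    "logfile",
    "protokolldat",
    "protokollierung der zugriffe",
    "zugriffsdat",
    "zugriffsprotokoll",
)

# Hypothetical retention pattern: a number followed by a German time-unit stem.
RETENTION_RE = re.compile(r"(\d+)\s*(tag|woche|monat|jahr)")


def log_retention_values(text: str) -> set[tuple[int, str]]:
    """Collect distinct retention periods from sentences that mention a log anchor."""
    values: set[tuple[int, str]] = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        if any(anchor in lowered for anchor in LOGFILE_ANCHORS):
            for number, unit in RETENTION_RE.findall(lowered):
                values.add((int(number), unit))
    return values


def has_retention_conflict(text: str) -> bool:
    """A conflict exists when more than one retention value is stated for logs."""
    return len(log_retention_values(text)) > 1
```

Because the anchors are substring stems, "protokolldat" also matches inflected forms such as "Protokolldaten".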
B15 AI-Legal-Basis: no code fix. Elli's current DSE no longer
mentions any LLM provider; the GT anchor (2026-06-06) has been
outdated since then. 0 findings is correct for the current state.
Tests: 3 new real-world regression tests in
test_impressum_multi_entity_check.py::TestRealWorldElliPattern.
Combined: 75/75 green.
Real-world smoke against Elli (HTTP→text via crude strip):
B9: Entities 13→2 ✓, IMPRESSUM-MULTI-UST_ID → VW ✓
B13: 1 finding (b2c_strong) ✓
B14: 0 (Elli currently has only ONE retention value for logs)
B15: 0 (LLM not mentioned, correct)
B16: 3 findings (impressum/dse/cookie standard-slug breaks) ✓
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
breakpilot-compliance
DSGVO/AI-Act compliance platform — 10 services, Go · Python · TypeScript
Overview
breakpilot-compliance is a multi-tenant DSGVO/EU AI Act compliance platform that provides an SDK for consent management, data subject requests (DSR), audit logging, iACE impact assessments, and document archival. It ships as 10 containerised services covering an admin dashboard, a developer portal, a Python/FastAPI backend, a Go AI compliance engine, TTS, and a decentralised document store on IPFS. Every service is deployed automatically via Gitea Actions → Orca on every push to main.
Architecture
| Service | Tech | Port | Container |
|---|---|---|---|
| admin-compliance | Next.js 15 | 3007 | bp-compliance-admin |
| backend-compliance | Python / FastAPI 0.123 | 8002 | bp-compliance-backend |
| ai-compliance-sdk | Go 1.24 / Gin | 8093 | bp-compliance-ai-sdk |
| developer-portal | Next.js 15 | 3006 | bp-compliance-developer-portal |
| breakpilot-compliance-sdk | TypeScript SDK (React/Vue/Angular/vanilla) | — | — |
| consent-sdk | JS/TS Consent SDK | — | — |
| compliance-tts-service | Python / Piper TTS | 8095 | bp-compliance-tts |
| document-crawler | Python / FastAPI | 8098 | bp-compliance-document-crawler |
| dsms-gateway | Python / FastAPI / IPFS | 8082 | bp-compliance-dsms-gateway |
| dsms-node | IPFS Kubo v0.24.0 | — | bp-compliance-dsms-node |
All containers share the external breakpilot-network Docker network and depend on breakpilot-core (Valkey, Vault, RAG service, Nginx reverse proxy).
Quick Start
Prerequisites: Docker, Go 1.24+, Python 3.12+, Node.js 20+
git clone ssh://git@gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-compliance.git
cd breakpilot-compliance
# Copy and populate secrets (never commit .env)
cp .env.example .env
# Start all services
docker compose up -d
For the Orca/Hetzner production target (x86_64), use the override:
docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d
Development Workflow
Use feature branches off main. Supported prefixes: feat/, feature/, hotfix/.
git checkout main && git pull origin main
git checkout -b feat/my-change
# ... make changes ...
git push origin feat/my-change
# Open a PR → squash merge to main
Push to main triggers:
- Gitea Actions — lint → test → validate (see CI Pipeline below)
- Orca — automatic build + deploy (~3 min total)
Monitor status: https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions
CI Pipeline
Defined in .gitea/workflows/ci.yaml.
| Job | What it checks |
|---|---|
| loc-budget | All source files ≤ 500 LOC; soft target 300 |
| guardrail-integrity | Commits touching guardrail files carry [guardrail-change] |
| go-lint | golangci-lint on ai-compliance-sdk/ |
| python-lint | ruff + mypy on Python services |
| nodejs-lint | tsc --noEmit + ESLint on Next.js services |
| test-go-ai-compliance | go test ./... in ai-compliance-sdk/ |
| test-python-backend-compliance | pytest in backend-compliance/ |
| test-python-document-crawler | pytest in document-crawler/ |
| test-python-dsms-gateway | pytest test_main.py in dsms-gateway/ |
| sbom-scan | License + vulnerability scan via syft + grype |
| validate-canonical-controls | OpenAPI contract baseline diff |
File Budget
| Limit | Value | How to check |
|---|---|---|
| Soft target | 300 LOC | bash scripts/check-loc.sh |
| Hard cap | 500 LOC | Same; also enforced by PreToolUse hook + git pre-commit + CI |
| Exceptions | .claude/rules/loc-exceptions.txt | Require written rationale + [guardrail-change] commit marker |
The .claude/settings.json PreToolUse hook blocks Claude Code from writing or editing files that would exceed the hard cap. The git pre-commit hook re-checks. CI is the final gate.
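
The budget logic itself is small. The following Python sketch shows the idea under stated assumptions: the real check lives in scripts/check-loc.sh, and the function names, the "ok"/"warn"/"fail"/"exempt" labels, and counting every physical line as LOC are illustrative choices, not the script's actual behaviour.

```python
from pathlib import Path

SOFT_TARGET = 300  # soft target from the table above
HARD_CAP = 500     # hard cap from the table above


def classify(line_count: int) -> str:
    """Map a line count onto the budget: ok, warn (over soft target), fail (over hard cap)."""
    if line_count > HARD_CAP:
        return "fail"
    if line_count > SOFT_TARGET:
        return "warn"
    return "ok"


def check_files(paths, exceptions=frozenset()) -> dict[str, str]:
    """Classify each file; paths listed in `exceptions` are reported as exempt."""
    report: dict[str, str] = {}
    for path in map(Path, paths):
        key = path.as_posix()
        if key in exceptions:
            report[key] = "exempt"  # assumed to mirror entries in loc-exceptions.txt
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            report[key] = classify(sum(1 for _ in handle))
    return report
```

A file with exactly 500 lines still passes the hard cap here, matching the "≤ 500 LOC" wording of the loc-budget job.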
Links
| Resource | URL |
|---|---|
| Admin dashboard | https://admin-dev.breakpilot.ai |
| Developer portal | https://developers-dev.breakpilot.ai |
| Backend API | https://api-dev.breakpilot.ai |
| AI SDK API | https://sdk-dev.breakpilot.ai |
| Gitea repo | https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance |
| Gitea Actions | https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions |