Benjamin Admin 662327e8b4
CI / nodejs-build (push) Successful in 2m47s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
feat(compliance-check): MC-Classification + Embedding + Vendor-Redundanz + Action-Recipes + Borlabs-Features
Massiv-Update auf Basis BMW-Test-Iterationen (v1→v9):

Core Compliance-Check
- Sonnet check_type Klassifikation: text/process/review fuer alle 1874 MCs
  in compliance.doc_check_controls (script + Sidecar /data/mc_classification.db).
  rag_document_checker filtert auf check_type='text' fuer doc_check.
  Plus fits_doc_type-Audit (v2) + ui_only-Audit fuer DSA/E-Commerce-MCs in
  falscher doc_type-Schublade.
- scope_requires-Filter: biometric/ai_decision/child_targeting MCs werden
  per business_profile gefiltert (FRT skipped fuer BMW etc.).
- Embedding-Match (BGE-M3) als Phase-3 nach Regex-Match:
  Per-doc_type-Threshold-Override (impressum 0.50, dse/cookie 0.60),
  Short-Field-Rescue (15-Wort-Chunks) fuer Pflichtfelder im Impressum.
  Title+check_question als Embedding-Input fuer mehr Kontext.
- Cookie-Text-Routing: consent-tester gibt cmp_cookie_text aus dem
  CMP-Reconstruct zurueck, Backend bevorzugt das gegen DOM-Extraction
  wenn richer (BMW 1824 vs 600 Worte).

Vendor-Redundanz + EU-Alternativen + Cost-Saving
- vendor_redundancy.analyze() — funktionale Kategorisierung der CMP-Vendors,
  Detektion von Mehrfach-Anbietern pro Kategorie, EU-Alternative-Lookup
  (Matomo, IONOS, HERE, Friendly Captcha, Smart AdServer, ...).
- vendor_cost_estimator: Tier-Inferenz aus Cookie-Footprint (Cookie-Anzahl
  + Premium-Feature-Cookies + Third-Party-Quote → starter/professional/
  enterprise/premier).
- Self-Service-Werbung (Google/Meta/Pinterest/...) = 0 Lizenz-Kosten
  (nur Media-Spend, separat). DSP-Plattformen behalten enge Range.
- Tier-aware Saving-Range: bei Enterprise/Premier nutzen wir den
  oberen 40-100%-Band der Listpreise, nicht starter→premier.
- Multi-Function-Tools (Matomo Pro, SAP CX, IONOS Cloud, Userlike, Smart
  AdServer, HERE Maps, Vimeo Pro, LamaPoll) — ein Tool ersetzt mehrere
  Kategorien gleichzeitig.

Cookie-Wissens-DB + Funktionale Klassifikation
- cookie_knowledge_db: 50 kuratierte Top-Cookies (Google/Meta/Adobe/MS/...)
  mit vendor, exact_purpose, data_collected, IAB-TCF-IDs, reid_risk,
  schrems_ii_status, EuGH-Urteile, EU-Alternative.
- cookie_function_classifier: pro Cookie funktionale Rolle (tracking_id,
  ad_pixel, session_id, ab_test, csrf, ...) + blocking_impact.

Country-Inferenz aus Rechtsform
- cookie_link_validator: Country-Field wird aus Vendor-Name abgeleitet
  (A/S=DK, GmbH=DE, Inc=US, B.V.=NL, ...) plus Vendor-Lookup-Table.
  Reduziert false-positive no_country-Flags bei eindeutig-EU-Vendors
  (Adform DK, Pinterest IE).

Action-Recipes + Doc-Anchor-Locator
- finding_action_recipes: pro Finding-Typ (no_cookies_listed, no_country,
  broken_opt_out, "Auftragsverarbeiter erwaehnen", "Art. 22 Profiling",
  ...) eine strukturierte Anweisung mit what/why/fix_text/where/example.
  Zum 1:1-Einfuegen in Kunden-Dokumente.
- doc_anchor_locator: Embedding-basiert (BGE-M3 cosine) — sucht den
  passenden Absatz im existierenden Kundendokument fuer jeden Finding.
  Per-Run Thread-Local-Cache. Fallback: keyword-Match.
- Email-Rendering integriert Recipe + Anchor pro Doc-Pruefungs-Fail
  + Vendor-Flag-Liste mit aufklappbarer Action-Liste.
- Score-Erklaerung pro Vendor-Zeile (3/5-Untertitel + Tooltip).

Migration-Pipeline (Compliance-Check -> Customer Banner/Documents)
- migration_to_banner.py: Vendor-Liste -> CookieBannerConfig mit
  4 Kategorien + Review-Flags.
- migration_to_document.py: Vendor-Liste -> Cookie-Policy + VVT-Register
  + Privacy-Policy-Pre-Fills.
- agent_migration_routes: 3 Preview-Endpoints (banner-preview,
  document-preview, summary). Persistierung der cmp_vendors in
  /data/compliance_audits.db check_payloads-Tabelle.

Borlabs-Parity Cookie-Banner-Features
- Consent-Historie im Banner: window.bpShowConsentHistory() + localStorage.
- Content-Blocker: cookie-banner-content-blocker.ts — YouTube/Maps/Video
  Placeholder bis Einwilligung.
- Google Consent Mode v2 erweitert: wait_for_update + region=EEA/CH/GB.
- Consent-Log Export (CSV/JSON) per einwilligungen_export_routes.

Bug-Fixes
- canonical_control_routes: _jsonish-Helper fuer string-typed jsonb,
  similar-controls-Endpoint mit _has_embedding_col()-Cache (kein 500 mehr).
- Control-Library Frontend: defensive .map-Coercer in 2 Detail-Views.
- Embedding-Service-Batching (32er Batches statt 165 in einem Call).
- KeyError 'control_id' in MC-Result-Aggregation (defensive .get).
- Master-Controls-Klick-Through von /sdk/master-controls auf
  /sdk/control-library?control=<id> mit URL-Param-Auto-Open.
- Dockerfile: /data pre-chowned auf appuser (Audit-DB-Schreibrecht).
- Cookie-Text-Routing-Bug (cmp_reconstructed > DOM-extraction).
- doc_type-aware MC-Filter (statt all-text-MCs).
- Master-Contract-Dedup (60 BMW-Internal-Eintraege = 1 Adobe-Vertrag).
- A3-v2-Audit hat 24 UI-Sprache-MCs als 'process' reklassifiziert.

Tests
- test_migration_mappers.py (9 Tests)
- test_migration_endpoints.py (4 Tests)

Skripte (one-shot)
- classify_mc_check_type.py (v1) + _v2 (PK=control_id,doc_type)
- audit_mc_doctype_fit.py (v1 fits) + _v2 (ui_only + scope_requires)

BMW-Run-Bilanz v1 (broken) -> v9 (alle Fixes):
  DSE     7,5% -> 81-83%
  Impressum 4%   -> 100% (6 echte MCs alle erfuellt)
  Cookie  0%    -> 79-83% (CMP-Text-Routing + Embedding)
  Plus: 10 Konsolidierungs-Kategorien, geschaetzte Saving 200k-3M / Jahr
  Plus: Action-Recipes + Doc-Anchors fuer jeden Fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:30:08 +02:00

breakpilot-compliance

DSGVO/AI-Act compliance platform — 10 services, Go · Python · TypeScript

CI Go Python Node.js TypeScript FastAPI DSGVO AI Act LOC guard Services


Overview

breakpilot-compliance is a multi-tenant DSGVO/EU AI Act compliance platform that provides an SDK for consent management, data subject requests (DSR), audit logging, iACE impact assessments, and document archival. It ships as 10 containerised services covering an admin dashboard, a developer portal, a Python/FastAPI backend, a Go AI compliance engine, TTS, and a decentralised document store on IPFS. Every service is deployed automatically via Gitea Actions → Orca on every push to main.


Architecture

Service Tech Port Container
admin-compliance Next.js 15 3007 bp-compliance-admin
backend-compliance Python / FastAPI 0.123 8002 bp-compliance-backend
ai-compliance-sdk Go 1.24 / Gin 8093 bp-compliance-ai-sdk
developer-portal Next.js 15 3006 bp-compliance-developer-portal
breakpilot-compliance-sdk TypeScript SDK (React/Vue/Angular/vanilla)
consent-sdk JS/TS Consent SDK
compliance-tts-service Python / Piper TTS 8095 bp-compliance-tts
document-crawler Python / FastAPI 8098 bp-compliance-document-crawler
dsms-gateway Python / FastAPI / IPFS 8082 bp-compliance-dsms-gateway
dsms-node IPFS Kubo v0.24.0 bp-compliance-dsms-node

All containers share the external breakpilot-network Docker network and depend on breakpilot-core (Valkey, Vault, RAG service, Nginx reverse proxy).


Quick Start

Prerequisites: Docker, Go 1.24+, Python 3.12+, Node.js 20+

git clone ssh://git@gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-compliance.git
cd breakpilot-compliance

# Copy and populate secrets (never commit .env)
cp .env.example .env

# Start all services
docker compose up -d

For the Orca/Hetzner production target (x86_64), use the override:

docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d

Development Workflow

Use feature branches off main. Supported prefixes: feat/, feature/, hotfix/.

git checkout main && git pull origin main
git checkout -b feat/my-change
# ... make changes ...
git push origin feat/my-change
# Open a PR → squash merge to main

Push to main triggers:

  1. Gitea Actions — lint → test → validate (see CI Pipeline below)
  2. Orca — automatic build + deploy (~3 min total)

Monitor status: https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions


CI Pipeline

Defined in .gitea/workflows/ci.yaml.

Job What it checks
loc-budget All source files ≤ 500 LOC; soft target 300
guardrail-integrity Commits touching guardrail files carry [guardrail-change]
go-lint golangci-lint on ai-compliance-sdk/
python-lint ruff + mypy on Python services
nodejs-lint tsc --noEmit + ESLint on Next.js services
test-go-ai-compliance go test ./... in ai-compliance-sdk/
test-python-backend-compliance pytest in backend-compliance/
test-python-document-crawler pytest in document-crawler/
test-python-dsms-gateway pytest test_main.py in dsms-gateway/
sbom-scan License + vulnerability scan via syft + grype
validate-canonical-controls OpenAPI contract baseline diff

File Budget

Limit Value How to check
Soft target 300 LOC bash scripts/check-loc.sh
Hard cap 500 LOC Same; also enforced by PreToolUse hook + git pre-commit + CI
Exceptions .claude/rules/loc-exceptions.txt Require written rationale + [guardrail-change] commit marker

The .claude/settings.json PreToolUse hook blocks Claude Code from writing or editing files that would exceed the hard cap. The git pre-commit hook re-checks. CI is the final gate.


URL
Admin dashboard https://admin-dev.breakpilot.ai
Developer portal https://developers-dev.breakpilot.ai
Backend API https://api-dev.breakpilot.ai
AI SDK API https://sdk-dev.breakpilot.ai
Gitea repo https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance
Gitea Actions https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions
S
Description
No description provided
Readme 25 MiB
Languages
TypeScript 43.1%
Python 30.8%
Go 23.5%
Shell 1.2%
PLpgSQL 0.8%
Other 0.3%