Benjamin Admin 07e392913f feat(knowledge-intake): classify a document + assess its impact before extraction

Phase A1. The real knowledge production is not writing — it is TARGETED UPDATING: when 20 documents
arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake
classifies a new document (no content extraction) and intersects its signals with an index of the
existing knowledge to emit a Knowledge Package (an impact analysis).

- compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios,
  obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic,
  NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns /
  reference scenarios / (injected) obligations, whether it is a new domain, and a triage level
  (HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation.
- ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package ->
  Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review.
- reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high,
  14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section
  lives in _helpers.py to keep generate.py under the 500-LOC budget.
- Known refinement surfaced by intake: regulation-ID normalization (e.g. "CRA" vs "Cyber Resilience Act").

10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-27 13:58:59 +02:00
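
The intake step described in the commit message above (classify a document, intersect its signals with an index of existing knowledge, emit a triage level) can be illustrated with a minimal sketch. This is not the code in compliance/knowledge_intake/: the names DocSignals, ImpactPackage, assess_impact, the index shape, and the high_threshold rule are all assumptions made for illustration. Only the four triage levels and the three example outcomes come from the commit message.

```python
from dataclasses import dataclass
from enum import Enum


class Triage(Enum):
    HIGH = "high"
    LOW = "low"
    NONE = "none"
    NEW_DOMAIN = "new_domain"


@dataclass(frozen=True)
class DocSignals:
    """Hypothetical descriptor: classification signals only, no extracted content."""
    title: str
    regulation_ids: frozenset[str] = frozenset()
    topics: frozenset[str] = frozenset()


@dataclass(frozen=True)
class ImpactPackage:
    """Hypothetical stand-in for the Knowledge Package."""
    triage: Triage
    affected: dict[str, list[str]]


# Assumed index shape: kind -> item name -> signals the item is tagged with.
Index = dict[str, dict[str, set[str]]]


def assess_impact(doc: DocSignals, index: Index, high_threshold: int = 3) -> ImpactPackage:
    """Deterministic set intersection; the threshold rule is an assumption."""
    signals = doc.regulation_ids | doc.topics
    affected = {
        kind: sorted(name for name, tags in items.items() if tags & signals)
        for kind, items in index.items()
    }
    hits = sum(len(names) for names in affected.values())
    if hits >= high_threshold:
        triage = Triage.HIGH
    elif hits > 0:
        triage = Triage.LOW
    elif doc.regulation_ids:
        # Names a regulation, but nothing in the index knows about it.
        triage = Triage.NEW_DOMAIN
    else:
        triage = Triage.NONE
    return ImpactPackage(triage, affected)


# Made-up index entries and documents, mirroring the three example outcomes.
index: Index = {
    "capabilities": {
        "sbom_generation": {"CRA", "sbom"},
        "vulnerability_handling": {"CRA"},
    },
    "playbooks": {"cra_readiness": {"CRA"}},
}
cra_faq = DocSignals("CRA SBOM FAQ", frozenset({"CRA"}), frozenset({"sbom"}))
env_guidance = DocSignals("Environmental guidance", frozenset({"ENV-EXAMPLE"}))
blog = DocSignals("Marketing blog", topics=frozenset({"marketing"}))
unnormalized = DocSignals("CRA memo", frozenset({"Cyber Resilience Act"}))
```

With this toy index, cra_faq triages as HIGH, env_guidance as NEW_DOMAIN, and blog as NONE. The unnormalized document is misread as NEW_DOMAIN because "Cyber Resilience Act" never matches the "CRA" tag, which is the kind of regulation-ID normalization gap the commit message mentions.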

.claude

feat: wire breakpilot-compliance to Infisical for local dev

2026-06-22 21:00:58 +02:00

.gitea/workflows

ci(go-lint): golangci-lint v1.64.8 (go1.24) + new-from-merge-base (#32)

2026-06-23 10:58:48 +00:00

.woodpecker

…

admin-compliance

feat(ai-sdk): legal-corpus coverage + Phase-2 citation-graph assessment (#33)

2026-06-24 06:37:22 +00:00

ai-compliance-sdk

feat(ai-sdk): coverage blind-spot proposer (P2 slice 6, type 4)

2026-06-26 10:27:01 +02:00

backend-compliance

feat(knowledge-intake): classify a document + assess its impact before extraction

2026-06-27 13:58:59 +02:00

breakpilot-compliance-sdk

…

compliance-tts-service

feat(p83): wire BUILD_SHA through all Dockerfiles + compose + CI check

2026-05-22 18:29:03 +02:00

consent-sdk

…

consent-tester

fix(consent-history): banner_provider as fallback for the CMP (#62)

2026-06-13 17:03:44 +02:00

design/redesign

feat(redesign): design tokens + level 2 "Cyber meets Safety" (additive)

2026-06-18 16:49:04 +02:00

developer-portal

feat(p83): wire BUILD_SHA through all Dockerfiles + compose + CI check

2026-05-22 18:29:03 +02:00

docs-site

feat(p83): wire BUILD_SHA through all Dockerfiles + compose + CI check

2026-05-22 18:29:03 +02:00

docs-src

feat(knowledge-intake): classify a document + assess its impact before extraction

2026-06-27 13:58:59 +02:00

document-crawler

feat(p83): wire BUILD_SHA through all Dockerfiles + compose + CI check

2026-05-22 18:29:03 +02:00

dsms-gateway

feat(iace): DSMS-CID badge in the tech-file export + aggregated bulk diff

2026-06-09 09:07:20 +02:00

dsms-node

feat(p83): wire BUILD_SHA through all Dockerfiles + compose + CI check

2026-05-22 18:29:03 +02:00

obligations

feat: ownership conflict #1 RESOLVED (capability = shared node) + Reasoning#1 re-link

2026-06-26 21:54:07 +02:00

scripts

feat: Capability Registry v1 API contract (#59) + ownership model finalized

2026-06-26 10:35:49 +02:00

zeroclaw

feat(platform): live-wire AGB v2 + DSE v3 + Architektur-Tab (#29)

2026-06-21 12:58:26 +00:00

.env.example

…

.env.orca.example

chore: replace all Coolify references with Orca

2026-04-19 16:33:56 +02:00

.gitignore

…

.gitleaks.toml

feat(platform): live-wire AGB v2 + DSE v3 + Architektur-Tab (#29)

2026-06-21 12:58:26 +00:00

.infisical.json

feat: wire breakpilot-compliance to Infisical for local dev

2026-06-22 21:00:58 +02:00

AGENTS.go.md

fix: resolve CI failures in Python tests and admin-compliance build

2026-04-19 16:41:39 +02:00

AGENTS.python.md

fix: resolve CI failures in Python tests and admin-compliance build

2026-04-19 16:41:39 +02:00

AGENTS.typescript.md

docs(agents): require build + lint + test locally before pushing [guardrail-change]

2026-04-19 16:38:21 +02:00

CONTRIBUTING.md

chore: replace all Coolify references with Orca

2026-04-19 16:33:56 +02:00

docker-compose.hetzner.yml

feat(mcp): HTTP+Bearer CRA MCP server for the repo scanner + finding adapter

2026-06-15 18:30:47 +02:00

docker-compose.orca.yml

chore: replace all Coolify references with Orca

2026-04-19 16:33:56 +02:00

docker-compose.yml

feat(cra): pull flow, fetch findings from the scanner MCP + assess them

2026-06-15 19:05:44 +02:00

dse_criteria_backup.json

feat(platform): live-wire AGB v2 + DSE v3 + Architektur-Tab (#29)

2026-06-21 12:58:26 +00:00

dse_criteria_changelog.json

feat(platform): live-wire AGB v2 + DSE v3 + Architektur-Tab (#29)

2026-06-21 12:58:26 +00:00

INFISICAL_SETUP.md

feat: wire breakpilot-compliance to Infisical for local dev

2026-06-22 21:00:58 +02:00

Makefile

feat: wire breakpilot-compliance to Infisical for local dev

2026-06-22 21:00:58 +02:00

mkdocs.yml

docs(architecture): RAG retrieval engine architecture set (01-09)

2026-06-25 09:25:22 +02:00

README.md

feat: wire breakpilot-compliance to Infisical for local dev

2026-06-22 21:00:58 +02:00

REFACTOR_PLAYBOOK.md

…

README.md

breakpilot-compliance

DSGVO/AI-Act compliance platform — 10 services, Go · Python · TypeScript

Overview

breakpilot-compliance is a multi-tenant DSGVO (GDPR) / EU AI Act compliance platform that provides an SDK for consent management, data subject requests (DSR), audit logging, iACE impact assessments, and document archival. It ships as 10 containerised services covering an admin dashboard, a developer portal, a Python/FastAPI backend, a Go AI compliance engine, TTS, and a decentralised document store on IPFS. Every service is deployed automatically via Gitea Actions → Orca on every push to main.

Architecture

Service	Tech	Port	Container
admin-compliance	Next.js 15	3007	bp-compliance-admin
backend-compliance	Python / FastAPI 0.123	8002	bp-compliance-backend
ai-compliance-sdk	Go 1.24 / Gin	8093	bp-compliance-ai-sdk
developer-portal	Next.js 15	3006	bp-compliance-developer-portal
breakpilot-compliance-sdk	TypeScript SDK (React/Vue/Angular/vanilla)	—	—
consent-sdk	JS/TS Consent SDK	—	—
compliance-tts-service	Python / Piper TTS	8095	bp-compliance-tts
document-crawler	Python / FastAPI	8098	bp-compliance-document-crawler
dsms-gateway	Python / FastAPI / IPFS	8082	bp-compliance-dsms-gateway
dsms-node	IPFS Kubo v0.24.0	—	bp-compliance-dsms-node

All containers share the external breakpilot-network Docker network and depend on breakpilot-core (Valkey, Vault, RAG service, Nginx reverse proxy).

Quick Start

Prerequisites: Docker, Go 1.24+, Python 3.12+, Node.js 20+, Infisical CLI

git clone ssh://git@gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-compliance.git
cd breakpilot-compliance

# One-time per machine: log in to the self-hosted Infisical instance
infisical login --domain https://secrets.meghsakha.com

# Start the full stack with secrets injected from Infisical (env=dev)
make dev

Secrets are pulled from Infisical (secrets.meghsakha.com) at runtime; .env files are not used. See INFISICAL_SETUP.md for full onboarding, and make help for the rest of the targets (dev-build, dev-down, secrets, secrets-set).

For the Orca/Hetzner production target (x86_64), use the override:

make dev ENV=prod  # or:
infisical run --env=prod -- docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d

Development Workflow

Use feature branches off main. Supported prefixes: feat/, feature/, hotfix/.

git checkout main && git pull origin main
git checkout -b feat/my-change
# ... make changes ...
git push origin feat/my-change
# Open a PR → squash merge to main

Push to main triggers:

Gitea Actions — lint → test → validate (see CI Pipeline below)
Orca — automatic build + deploy (~3 min total)

Monitor status: https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions

CI Pipeline

Defined in .gitea/workflows/ci.yaml.

Job	What it checks
`loc-budget`	All source files ≤ 500 LOC; soft target 300
`guardrail-integrity`	Commits touching guardrail files carry `[guardrail-change]`
`go-lint`	`golangci-lint` on `ai-compliance-sdk/`
`python-lint`	`ruff` + `mypy` on Python services
`nodejs-lint`	`tsc --noEmit` + ESLint on Next.js services
`test-go-ai-compliance`	`go test ./...` in `ai-compliance-sdk/`
`test-python-backend-compliance`	`pytest` in `backend-compliance/`
`test-python-document-crawler`	`pytest` in `document-crawler/`
`test-python-dsms-gateway`	`pytest test_main.py` in `dsms-gateway/`
`sbom-scan`	License + vulnerability scan via `syft` + `grype`
`validate-canonical-controls`	OpenAPI contract baseline diff
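
The `guardrail-integrity` rule in the table above (commits touching guardrail files must carry `[guardrail-change]`) can be sketched as a pure function. This is an illustration only, not the check in .gitea/workflows/ci.yaml: the GUARDRAIL_PATTERNS list is a guess at which paths count as guardrail files, and the function names are hypothetical.

```python
from fnmatch import fnmatch

MARKER = "[guardrail-change]"

# Assumption: which paths count as guardrail files is not stated in the README.
GUARDRAIL_PATTERNS = (".claude/*", "scripts/check-loc.sh", "AGENTS.*.md")


def guardrail_paths(changed_paths, patterns=GUARDRAIL_PATTERNS):
    """Return the changed paths that match any guardrail pattern."""
    return [p for p in changed_paths if any(fnmatch(p, pat) for pat in patterns)]


def commit_is_allowed(message, changed_paths, patterns=GUARDRAIL_PATTERNS):
    """A commit passes if it touches no guardrail file or carries the marker."""
    return not guardrail_paths(changed_paths, patterns) or MARKER in message
```

For example, a commit that edits .claude/settings.json without the marker would be rejected, while the same change with `[guardrail-change]` in its message would pass.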

File Budget

Limit	Value	How to check
Soft target	300 LOC	`bash scripts/check-loc.sh`
Hard cap	500 LOC	Same; also enforced by `PreToolUse` hook + git pre-commit + CI
Exceptions	`.claude/rules/loc-exceptions.txt`	Require written rationale + `[guardrail-change]` commit marker

The .claude/settings.json PreToolUse hook blocks Claude Code from writing or editing files that would exceed the hard cap. The git pre-commit hook re-checks. CI is the final gate.
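
The budget rule can be summarised in a few lines. The sketch below is not scripts/check-loc.sh; it assumes LOC means raw line count (blank lines and comments included) and that a file listed as an exception is downgraded from a failure to a warning. Both are assumptions, and the function names are hypothetical.

```python
SOFT_TARGET = 300  # from the File Budget table
HARD_CAP = 500     # from the File Budget table


def count_loc(text: str) -> int:
    """Assumption: every line counts, including blanks and comments."""
    return len(text.splitlines())


def classify_loc(line_count: int, exempt: bool = False) -> str:
    """Return "ok", "warn" (over the soft target) or "fail" (over the hard cap)."""
    if line_count > HARD_CAP and not exempt:
        return "fail"
    if line_count > SOFT_TARGET:
        return "warn"
    return "ok"
```

Under these assumptions a 500-line file still passes with a warning, a 501-line file fails, and a 501-line file listed as an exception only warns.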

Links

Resource	URL
Admin dashboard	https://admin-dev.breakpilot.ai
Developer portal	https://developers-dev.breakpilot.ai
Backend API	https://api-dev.breakpilot.ai
AI SDK API	https://sdk-dev.breakpilot.ai
Gitea repo	https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance
Gitea Actions	https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions

Languages

TypeScript 39.8%

Python 35.3%

Go 22.4%

Shell 1.1%

PLpgSQL 0.7%

Other 0.4%