Files

Benjamin Admin 07e392913f feat(knowledge-intake): classify a document + assess its impact before extraction

Phase A1. The real knowledge production is not writing — it is TARGETED UPDATING: when 20 documents
arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake
classifies a new document (no content extraction) and intersects its signals with an index of the
existing knowledge to emit a Knowledge Package (an impact analysis).

- compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios,
  obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic,
  NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns /
  reference scenarios / (injected) obligations, whether it is a new domain, and a triage level
  (HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation.
- ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package ->
  Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review.
- reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high,
  14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section
  lives in _helpers.py to keep generate.py under the 500-LOC budget.
- Honest known refinement surfaced by intake: regulation-ID normalization (CRA vs Cyber Resilience Act).
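A minimal sketch of the classify-then-intersect step described above, under stated assumptions. Only the four triage levels and the CRA alias problem come from the commit message. The shapes of `DocumentDescriptor`, `KnowledgeIndex` and `KnowledgePackage`, the `ALIASES` table, and the `high_threshold` cut-off are hypothetical and do not reflect the actual compliance/knowledge_intake code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Triage(Enum):
    HIGH = "high"
    LOW = "low"
    NONE = "none"
    NEW_DOMAIN = "new_domain"


# Hypothetical alias table for regulation-ID normalization.
ALIASES = {"cyber resilience act": "CRA", "cra": "CRA"}


def normalize_regulation(name: str) -> str:
    """Map a free-form regulation name to a canonical ID (assumed scheme)."""
    key = name.strip().lower()
    return ALIASES.get(key, name.strip())


@dataclass(frozen=True)
class DocumentDescriptor:
    # Metadata-only signals; the document content itself is never read.
    title: str
    regulations: frozenset[str]


@dataclass
class KnowledgeIndex:
    # Canonical regulation ID -> names of knowledge artifacts that cite it.
    by_regulation: dict[str, set[str]] = field(default_factory=dict)


@dataclass(frozen=True)
class KnowledgePackage:
    triage: Triage
    affected: tuple[str, ...]


def assess_impact(doc: DocumentDescriptor, index: KnowledgeIndex,
                  high_threshold: int = 3) -> KnowledgePackage:
    """Deterministically intersect the document's signals with the index."""
    regs = {normalize_regulation(r) for r in doc.regulations}
    if not regs:
        # No regulatory signal at all: ignorable.
        return KnowledgePackage(Triage.NONE, ())
    known = {r for r in regs if r in index.by_regulation}
    if not known:
        # Signals exist, but nothing in the index covers them yet.
        return KnowledgePackage(Triage.NEW_DOMAIN, ())
    affected = sorted(set().union(*(index.by_regulation[r] for r in known)))
    level = Triage.HIGH if len(affected) >= high_threshold else Triage.LOW
    return KnowledgePackage(level, tuple(affected))
```

With an index that maps `"CRA"` to three artifacts, a descriptor tagged "Cyber Resilience Act" resolves to the same entry as one tagged "CRA" and triages as HIGH, while a descriptor with no regulation signals triages as NONE.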

10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-27 13:58:59 +02:00

auth

…

classroom_engine

…

compliance

feat(knowledge-intake): classify a document + assess its impact before extraction

2026-06-27 13:58:59 +02:00

knowledge

feat(playbook): Implementation Playbooks: the advisor ("Berater") renderer ("how do I get there?")

2026-06-27 10:38:13 +02:00

middleware

…

migrations

fix(db): dedupe doc_check_controls 3x + unique constraint

2026-06-20 14:25:03 +02:00

reference_scenarios

feat(knowledge-intake): classify a document + assess its impact before extraction

2026-06-27 13:58:59 +02:00

scripts

chore(cra): align CRA module to the dev/demo tenant + demo-customer seed script

2026-06-14 15:52:49 +02:00

services

…

templates/gdpr

…

tests

feat(knowledge-intake): classify a document + assess its impact before extraction

2026-06-27 13:58:59 +02:00

consent_admin_api.py

…

consent_api.py

…

consent_client.py

…

database.py

…

Dockerfile

feat(audit): screenshot + Tesseract OCR cookie extraction as vendor source C

2026-05-22 23:22:35 +02:00

gdpr_api.py

…

gdpr_export_service.py

…

main.py

feat(sdk): customer documents + CRA reporting, screening removed from the frontend

2026-06-17 21:21:28 +02:00

migration_runner.py

…

mypy.ini

…

PHASE1_RUNBOOK.md

…

README.md

…

requirements-reranker.txt

…

requirements.txt

feat(cra): standalone CRA finding->Annex I risk mapper + MCP interface

2026-06-13 20:22:34 +02:00

README.md

backend-compliance

Python/FastAPI service implementing the DSGVO (GDPR) compliance API: DSR (data subject requests), DSFA (data protection impact assessments), consent, controls, risks, evidence, audit, vendor management, ISMS, change requests, document generation.

Port: 8002 (container: bp-compliance-backend)

Stack: Python 3.12, FastAPI, SQLAlchemy 2.x, Alembic, Keycloak auth.

Architecture

compliance/
├── api/            # Routers (thin, ≤30 LOC per handler)
├── services/       # Business logic
├── repositories/   # DB access
├── domain/         # Value objects, domain errors
├── schemas/        # Pydantic models, split per domain
└── db/models/      # SQLAlchemy ORM, one module per aggregate

The service follows this layered target structure but not all files are fully refactored yet. Phase 1 backlog is tracked in .claude/rules/loc-exceptions.txt (27 backend-compliance files currently excepted).

See ../AGENTS.python.md for the full convention and ../.claude/rules/architecture.md for the non-negotiable rules.

Run locally

cd backend-compliance
pip install -r requirements.txt
export COMPLIANCE_DATABASE_URL=...  # Postgres (Hetzner or local)
uvicorn main:app --reload --port 8002

Tests

pytest compliance/tests/ -v
pytest --cov=compliance --cov-report=term-missing

Layout: tests/unit/, tests/integration/, tests/contracts/. Contract tests diff /openapi.json against tests/contracts/openapi.baseline.json.
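As a rough illustration of what a contract diff can check, the sketch below compares the operations declared in two OpenAPI documents and reports those that disappeared. It is an assumption about the general approach, not the repository's actual contract test, which may also compare schemas, parameters and response codes. `operations` and `removed_operations` are hypothetical helper names.

```python
# HTTP methods that may appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def operations(spec: dict) -> set[tuple[str, str]]:
    """Return every (METHOD, path) pair declared in an OpenAPI document."""
    ops = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Skip non-method keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                ops.add((key.upper(), path))
    return ops


def removed_operations(baseline: dict, current: dict) -> list[tuple[str, str]]:
    """Operations present in the baseline but missing from the current spec."""
    return sorted(operations(baseline) - operations(current))
```

In a test, `baseline` would be loaded from tests/contracts/openapi.baseline.json with `json.load`, `current` would be the parsed body of /openapi.json, and the assertion would be that `removed_operations(baseline, current)` is empty.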

Public API surface

404+ endpoints across /api/v1/*. Grouped by domain: ai, audit, consent, dsfa, dsr, gdpr, vendor, evidence, change-requests, generation, projects, company-profile, isms. Every path is a contract — see the "Public endpoints" rule in the root CLAUDE.md.

Environment

| Var | Purpose |
|---|---|
| `COMPLIANCE_DATABASE_URL` | Postgres DSN, `sslmode=require` |
| `KEYCLOAK_*` | Auth verification |
| `QDRANT_URL`, `QDRANT_API_KEY` | Vector search |
| `CORE_VALKEY_URL` | Session cache |
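A startup check along these lines can catch a misconfigured environment early. This is a sketch under assumptions: the variable names and the `sslmode=require` requirement come from the table above, but which variables are strictly required, and the helper names `check_database_url` and `check_environment`, are illustrative only.

```python
import os
from urllib.parse import parse_qs, urlsplit


def check_database_url(url: str) -> list[str]:
    """Return a list of problems found in a Postgres DSN (empty if none)."""
    problems = []
    parts = urlsplit(url)
    # Accepts "postgres", "postgresql" and driver variants like "postgresql+psycopg2".
    if not parts.scheme.startswith("postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    sslmode = parse_qs(parts.query).get("sslmode", [""])[0]
    if sslmode != "require":
        problems.append("sslmode=require is missing")
    return problems


def check_environment(env=os.environ) -> list[str]:
    """Check the variables from the table; treating them as required is an assumption."""
    problems = []
    url = env.get("COMPLIANCE_DATABASE_URL")
    if not url:
        problems.append("COMPLIANCE_DATABASE_URL is not set")
    else:
        problems.extend(check_database_url(url))
    for name in ("QDRANT_URL", "CORE_VALKEY_URL"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems
```

Passing the environment as a parameter keeps the check testable with a plain dict instead of mutating `os.environ`.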

Don't touch

Database schema, __tablename__, column names, existing migrations under migrations/. See root CLAUDE.md rule 3.