Files

Benjamin Admin 7df15010ff feat(onboarding): Observation Log — append-only JSONL calibration store (Task 59b/c v1)

Per the user's decision (2026-06-28): observations are CALIBRATION data for the knowledge base, NOT
business data and NOT product-DB data. So they live with the other versioned knowledge artifacts as an
append-only JSONL log under knowledge/observations/ — NO migration, NO DB. (A real persistence layer is
only warranted once thousands of onboardings exist; not before.)

  - ObservationRecord = Observation + log metadata (observation_id, timestamp [caller-stamped, no hidden
    clock], customer_archetype [anonymised — NEVER a real name], evidence, provenance, knowledge_version).
  - append_observation() writes one JSON line; append-only, lines are never rewritten. A later review is a
    NEW line with the same observation_id; load_observations(reconcile=True) keeps the latest per id.
  - load_observations() reads a single .jsonl or a directory of monthly .jsonl files.
  - aggregate_by_hypothesis() (59c) -> per-hypothesis distribution + confidence, COMPUTED from the log
    (computed-not-stored); the review gate (reviewed-only) is enforced in empirical_distribution/confidence.
  - review_queue() -> the unreviewed worklist. Observation -> Review -> Accepted -> recompute, never
    Observation -> confidence++. Nothing is ever written back to a hypothesis.
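
The append and reconcile behaviour described in the list above can be sketched as follows. This is a minimal illustration, not the module's actual code: the signatures, the plain-dict record shape, the `reviewed` field used in the usage note, and the "later line wins, files read in name order" rule are all assumptions. Only the function names and `observation_id` come from the commit message.

```python
import json
from pathlib import Path
from typing import Any


def append_observation(log_path: Path, record: dict[str, Any]) -> None:
    """Append one record as a single JSON line; existing lines are never touched."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_observations(path: Path, reconcile: bool = False) -> list[dict[str, Any]]:
    """Read one .jsonl file, or every *.jsonl file in a directory in name order."""
    files = sorted(path.glob("*.jsonl")) if path.is_dir() else [path]
    records: list[dict[str, Any]] = []
    for file in files:
        with file.open(encoding="utf-8") as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    if not reconcile:
        return records
    # Assumption: "latest" means the line that appears last in read order.
    latest: dict[str, dict[str, Any]] = {}
    for rec in records:
        latest[rec["observation_id"]] = rec
    return list(latest.values())
```

Under these assumptions, a review is recorded by appending a second line with the same `observation_id` (for example with a hypothetical `reviewed: true` field); the raw log keeps both lines, while `load_observations(path, reconcile=True)` returns only the later one.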

You can `rm` the log and recompute, `git diff` it over months, or rebuild confidence under a new policy —
fully consistent with computed-not-stored and the product/knowledge data separation.
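
To illustrate computed-not-stored, here is a hedged sketch of recomputing per-hypothesis statistics from already-loaded records. The field names `hypothesis_id`, `outcome`, and `review_status` are hypothetical, and the commit does not state the confidence policy, so this sketch returns only the sample size and the empirical distribution. The real module enforces the review gate inside `empirical_distribution`/`confidence`, which are not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Any


def aggregate_by_hypothesis(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Per-hypothesis outcome distribution, counting accepted reviews only."""
    counts: dict[str, Counter[str]] = defaultdict(Counter)
    for rec in records:
        # Review gate (assumed field): unreviewed records never reach the statistics.
        if rec.get("review_status") != "accepted":
            continue
        counts[rec["hypothesis_id"]][rec["outcome"]] += 1
    result: dict[str, dict[str, Any]] = {}
    for hypothesis_id, counter in counts.items():
        total = sum(counter.values())
        result[hypothesis_id] = {
            "n": total,
            "distribution": {outcome: n / total for outcome, n in counter.items()},
        }
    return result


def review_queue(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """The unreviewed worklist: records that carry no review decision yet."""
    return [rec for rec in records if rec.get("review_status") is None]
```

Because the result is derived purely from the records passed in, deleting and rebuilding the log, or swapping in a different policy, only changes what this function returns; nothing is written back to a hypothesis.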

Non-runtime (module + tests only, no endpoint) -> origin/main, NO dev deploy. 5 new tests (append-only,
review supersession, review-gate statistics, queue, monthly-file load); 27 onboarding tests pass, mypy
--strict clean (9 modules), check-loc 0. 59d (surface computed confidence at runtime) stays a later step.

2026-06-28 16:29:54 +02:00

auth

…

classroom_engine

…

compliance

feat(onboarding): Observation Log — append-only JSONL calibration store (Task 59b/c v1)

2026-06-28 16:29:54 +02:00

knowledge

feat(onboarding): Observation Log — append-only JSONL calibration store (Task 59b/c v1)

2026-06-28 16:29:54 +02:00

middleware

…

migrations

fix(db): dedupe doc_check_controls 3x + unique constraint

2026-06-20 14:25:03 +02:00

reference_scenarios

feat: Signal Producer interface + Normalizer — one signal language for all sources (before #58)

2026-06-28 14:49:57 +02:00

scripts

chore(cra): align CRA module to the dev/demo tenant + demo-customer seed script

2026-06-14 15:52:49 +02:00

services

…

templates/gdpr

…

tests

feat(onboarding): Observation Log — append-only JSONL calibration store (Task 59b/c v1)

2026-06-28 16:29:54 +02:00

consent_admin_api.py

…

consent_api.py

…

consent_client.py

…

database.py

…

Dockerfile

feat(audit): screenshot + Tesseract OCR cookie extract as vendor source C

2026-05-22 23:22:35 +02:00

gdpr_api.py

…

gdpr_export_service.py

…

main.py

feat(sdk): customer documents + CRA reporting, screening removed from frontend

2026-06-17 21:21:28 +02:00

migration_runner.py

…

mypy.ini

…

PHASE1_RUNBOOK.md

…

README.md

…

requirements-reranker.txt

…

requirements.txt

feat(cra): standalone CRA finding->Annex I risk mapper + MCP interface

2026-06-13 20:22:34 +02:00

README.md

backend-compliance

Python/FastAPI service implementing the GDPR (DSGVO) compliance API: DSR, DSFA, consent, controls, risks, evidence, audit, vendor management, ISMS, change requests, document generation.

Port: 8002 (container: bp-compliance-backend)

Stack: Python 3.12, FastAPI, SQLAlchemy 2.x, Alembic, Keycloak auth.

Architecture

compliance/
├── api/            # Routers (thin, ≤30 LOC per handler)
├── services/       # Business logic
├── repositories/   # DB access
├── domain/         # Value objects, domain errors
├── schemas/        # Pydantic models, split per domain
└── db/models/      # SQLAlchemy ORM, one module per aggregate

The service targets this layered structure, but not every file has been refactored yet. The Phase 1 backlog is tracked in .claude/rules/loc-exceptions.txt (27 backend-compliance files currently excepted).

See ../AGENTS.python.md for the full convention and ../.claude/rules/architecture.md for the non-negotiable rules.

Run locally

cd backend-compliance
pip install -r requirements.txt
export COMPLIANCE_DATABASE_URL=...  # Postgres (Hetzner or local)
uvicorn main:app --reload --port 8002

Tests

pytest compliance/tests/ -v
pytest --cov=compliance --cov-report=term-missing

Layout: tests/unit/, tests/integration/, tests/contracts/. Contract tests diff /openapi.json against tests/contracts/openapi.baseline.json.
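
As a rough illustration of what such a contract check can look like, the sketch below compares two already-parsed OpenAPI documents and reports operations that exist in the baseline but not in the current schema. The helper name `removed_operations` is hypothetical and the repository's actual contract tests may compare more than paths and methods.

```python
from typing import Any

# Operation keys defined for an OpenAPI path item; other keys
# (e.g. "parameters", "summary") are not operations and are skipped.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def removed_operations(baseline: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """Return "METHOD /path" entries present in baseline but missing in current."""
    removed: list[str] = []
    current_paths = current.get("paths", {})
    for path, item in baseline.get("paths", {}).items():
        for method in item:
            if method not in HTTP_METHODS:
                continue
            if method not in current_paths.get(path, {}):
                removed.append(f"{method.upper()} {path}")
    return sorted(removed)
```

A test built on this would load tests/contracts/openapi.baseline.json with `json.load`, fetch the live /openapi.json, and assert that `removed_operations(baseline, current)` is empty, so that adding endpoints passes while removing or renaming one fails.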

Public API surface

404+ endpoints across /api/v1/*. Grouped by domain: ai, audit, consent, dsfa, dsr, gdpr, vendor, evidence, change-requests, generation, projects, company-profile, isms. Every path is a contract — see the "Public endpoints" rule in the root CLAUDE.md.

Environment

| Var | Purpose |
|---|---|
| `COMPLIANCE_DATABASE_URL` | Postgres DSN, `sslmode=require` |
| `KEYCLOAK_*` | Auth verification |
| `QDRANT_URL`, `QDRANT_API_KEY` | Vector search |
| `CORE_VALKEY_URL` | Session cache |
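
A start-up check for these variables might look like the sketch below. It is an assumption-laden illustration: the `Settings` class and `load_settings` helper are hypothetical, treating every listed variable as mandatory is a guess, and the `KEYCLOAK_*` variables are left out because their exact names are not given here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = (
    "COMPLIANCE_DATABASE_URL",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "CORE_VALKEY_URL",
)


@dataclass(frozen=True)
class Settings:
    database_url: str
    qdrant_url: str
    qdrant_api_key: str
    valkey_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables and fail fast if one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    dsn = env["COMPLIANCE_DATABASE_URL"]
    # Assumption: the sslmode=require note in the table is enforced as a hard check.
    if "sslmode=require" not in dsn:
        raise RuntimeError("COMPLIANCE_DATABASE_URL must include sslmode=require")
    return Settings(
        database_url=dsn,
        qdrant_url=env["QDRANT_URL"],
        qdrant_api_key=env["QDRANT_API_KEY"],
        valkey_url=env["CORE_VALKEY_URL"],
    )
```

Passing the mapping in explicitly, rather than reading `os.environ` inside the function body only, keeps the check easy to exercise in unit tests without mutating the process environment.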

Don't touch

Database schema, __tablename__, column names, existing migrations under migrations/. See root CLAUDE.md rule 3.