Files
breakpilot-compliance/backend-compliance
Benjamin Admin c39787ad96 fix(onboarding): separate observation vs requirement signals — a demanded SBOM is not a present SBOM
Semantic correction of the knowledge base BEFORE the empirical loop (#59) is built — otherwise the
Observation Store would learn from already-misclassified signals. The Silent Pass conflated two kinds of
signal into one: an OBSERVATION ("I saw an SBOM in the repo") and a REQUIREMENT ("a tender DEMANDS an
SBOM"). They were aliased to the same canonical id, so a tender clause read as "SBOM already present" and
suppressed the very question that should have been asked.

Fix — make the kind explicit and authoritative (no new architecture, data + thin wiring):
  - `kind` ∈ {observation, requirement} on ProducedSignal (producer may declare) and on the canonical
    SignalVocabularyEntry (AUTHORITATIVE — a mislabelled producer cannot collapse the two).
  - Vocabulary split: sbom_file_found → sbom_present (obs) + sbom_required (req);
    security_txt_or_cvd_policy → cvd_policy_present (obs) + psirt_required (req); add signed_updates_required.
    requirement signals are intentionally UNMAPPED in intake_signal_map (they describe a target, not state).
  - silent_intake() consumes ONLY kind==observation; requirement signals are preserved in
    `requirements_seen` (visible/auditable) but NEVER become a detected capability.
  - normalize_signals() stamps the vocabulary's kind onto every IntakeSignal; unknown ids still pass through.

This is the same Observation-vs-Requirement split the Requirements Verification Platform rests on:
observations are reality, requirements are targets, and their comparison is the delta. A tender / OEM spec /
law now produces requirement signals; scanners / repos / documents produce observation signals.

Tests: rewrote the two test_signal_producer cases that previously ASSERTED the bug (tender == repo) to pin
the correct split; regression — `requires_sbom` yields no capability + stays in requirements_seen while
`cyclonedx_found` still detects sbom_creation; endpoint-level regression that a tender requirement does not
auto-detect and the gap stays asked; vocabulary-kind-overrides-mislabelled-producer. 25 onboarding tests
pass, mypy --strict clean, demo runs, check-loc 0. Runtime effect → deploy + smoke. (Fix A; partial-vs-
detected decoupling follows as Fix B before #59.)
2026-06-28 15:52:50 +02:00
..

backend-compliance

Python/FastAPI service implementing the DSGVO compliance API: DSR, DSFA, consent, controls, risks, evidence, audit, vendor management, ISMS, change requests, document generation.

Port: 8002 (container: bp-compliance-backend) Stack: Python 3.12, FastAPI, SQLAlchemy 2.x, Alembic, Keycloak auth.

Architecture

compliance/
├── api/            # Routers (thin, ≤30 LOC per handler)
├── services/       # Business logic
├── repositories/   # DB access
├── domain/         # Value objects, domain errors
├── schemas/        # Pydantic models, split per domain
└── db/models/      # SQLAlchemy ORM, one module per aggregate

The service follows this layered target structure but not all files are fully refactored yet. Phase 1 backlog is tracked in .claude/rules/loc-exceptions.txt (27 backend-compliance files currently excepted).

See ../AGENTS.python.md for the full convention and ../.claude/rules/architecture.md for the non-negotiable rules.

Run locally

cd backend-compliance
pip install -r requirements.txt
export COMPLIANCE_DATABASE_URL=...  # Postgres (Hetzner or local)
uvicorn main:app --reload --port 8002

Tests

pytest compliance/tests/ -v
pytest --cov=compliance --cov-report=term-missing

Layout: tests/unit/, tests/integration/, tests/contracts/. Contract tests diff /openapi.json against tests/contracts/openapi.baseline.json.

Public API surface

404+ endpoints across /api/v1/*. Grouped by domain: ai, audit, consent, dsfa, dsr, gdpr, vendor, evidence, change-requests, generation, projects, company-profile, isms. Every path is a contract — see the "Public endpoints" rule in the root CLAUDE.md.

Environment

Var Purpose
COMPLIANCE_DATABASE_URL Postgres DSN, sslmode=require
KEYCLOAK_* Auth verification
QDRANT_URL, QDRANT_API_KEY Vector search
CORE_VALKEY_URL Session cache

Don't touch

Database schema, __tablename__, column names, existing migrations under migrations/. See root CLAUDE.md rule 3.