Files

Benjamin Admin 873997c13b feat(vvt): V3 — LLM vendor extraction fallback for unknown CMPs

When the cookie text has no captured CMP payload (long-tail sites that
don't use ePaaS/OneTrust/Cookiebot/etc.) we now fall back to a Qwen → OVH
LLM cascade to extract a structured vendor list from the policy text.

New module backend/compliance/services/vendor_llm_extractor.py:
- extract_vendors_via_llm(cookie_text): runs Qwen first (local Ollama),
  then OVH if Qwen returns nothing usable.
- System prompt instructs the model to return STRICT JSON only:
  {vendors: [{name, country, purpose, category, opt_out_url,
   privacy_policy_url, persistence, cookies: [...]}]}
- Lenient JSON parser tolerates code-fences, prose wrappers, dict vs list.
- _normalize() caps array sizes (80 vendors, 30 cookies each), validates
  URLs (must be http(s)), trims fields to reasonable lengths.
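
A minimal sketch of what the lenient parser and _normalize() step described above might look like. The function names parse_lenient and normalize, the helper names, and the MAX_FIELD_LEN value are assumptions for illustration; only the 80/30 caps and the http(s) rule come from the commit message.

```python
import json
import re
from urllib.parse import urlparse

MAX_VENDORS = 80     # cap stated in the commit message
MAX_COOKIES = 30     # cap stated in the commit message
MAX_FIELD_LEN = 300  # assumption: the real trim length is not given

# Matches a fenced block, with or without a "json" language tag.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def parse_lenient(raw: str) -> list:
    """Return vendor dicts from a model reply, or [] if nothing parses."""
    text = raw.strip()
    fenced = _FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1).strip()
    # Try the text as-is, then the outermost {...} and [...] spans,
    # which covers replies that wrap the JSON in prose.
    candidates = [text]
    for open_ch, close_ch in (("{", "}"), ("[", "]")):
        start, end = text.find(open_ch), text.rfind(close_ch)
        if 0 <= start < end:
            candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            data = data.get("vendors")  # dict form: {"vendors": [...]}
        if isinstance(data, list):      # list form: [...]
            return [v for v in data if isinstance(v, dict)]
    return []


def _clip(value, limit: int = MAX_FIELD_LEN) -> str:
    return "" if value is None else str(value).strip()[:limit]


def _http_url(value) -> str:
    url = _clip(value, 2000)
    parsed = urlparse(url)
    return url if parsed.scheme in ("http", "https") and parsed.netloc else ""


def normalize(vendors: list) -> list:
    """Cap list sizes, drop non-http(s) URLs, trim free-text fields."""
    out = []
    for vendor in vendors[:MAX_VENDORS]:
        if not isinstance(vendor, dict):
            continue
        name = _clip(vendor.get("name"))
        if not name:
            continue  # a vendor without a name is not usable
        cookies = vendor.get("cookies")
        if not isinstance(cookies, list):
            cookies = []
        out.append({
            "name": name,
            "country": _clip(vendor.get("country")),
            "purpose": _clip(vendor.get("purpose")),
            "category": _clip(vendor.get("category")),
            "persistence": _clip(vendor.get("persistence")),
            "opt_out_url": _http_url(vendor.get("opt_out_url")),
            "privacy_policy_url": _http_url(vendor.get("privacy_policy_url")),
            "cookies": [_clip(c) for c in cookies[:MAX_COOKIES]
                        if isinstance(c, str)],
        })
    return out
```

In this sketch an invalid URL becomes an empty string rather than dropping the whole vendor, so a vendor with a bad opt-out link still reaches the table; the real module may choose differently.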

Route integration (agent_compliance_check_routes.py):
- After named-CMP extract: if cmp_vendors is empty AND the cookie text
  has ≥500 words (otherwise it's likely navigation chrome), invoke the
  LLM extractor. Progress message 'Vendor-Liste per LLM extrahieren...'.
- Vendors then run through the same validate_vendor_urls + score_vendors
  pipeline → VVT table rendered identically regardless of source.
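
The gate described above could be sketched as follows. The names should_use_llm and resolve_vendors and the whitespace-based word count are assumptions; only the 500-word threshold, the empty-cmp_vendors condition, and the progress message come from the commit message.

```python
from typing import Callable, Optional

MIN_WORDS = 500  # threshold stated in the commit message


def should_use_llm(cmp_vendors: list, cookie_text: str) -> bool:
    """True when no CMP vendors were captured and the text is long enough."""
    # Assumption: words are counted by splitting on whitespace.
    return not cmp_vendors and len(cookie_text.split()) >= MIN_WORDS


def resolve_vendors(
    cmp_vendors: list,
    cookie_text: str,
    extract: Callable[[str], list],
    progress: Optional[Callable[[str], None]] = None,
) -> list:
    """Return CMP vendors if present, otherwise try the LLM extractor."""
    if should_use_llm(cmp_vendors, cookie_text):
        if progress is not None:
            progress("Vendor-Liste per LLM extrahieren...")
        return extract(cookie_text)
    return cmp_vendors
```

Whatever list this returns would then go through the same validate_vendor_urls and score_vendors steps, which is what keeps the rendered table independent of the source.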

docker-compose.yml: backend-compliance gains OLLAMA_URL, CMP_LLM_MODEL,
OVH_LLM_URL/KEY/MODEL env vars (same names as consent-tester so the
configuration is unified).
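
For illustration, the new variables might be read like this. The LlmSettings class, the load_llm_settings function, and the empty-string defaults are hypothetical, and the sketch assumes that "OVH_LLM_URL/KEY/MODEL" expands to OVH_LLM_URL, OVH_LLM_KEY, and OVH_LLM_MODEL.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LlmSettings:
    ollama_url: str
    cmp_llm_model: str
    ovh_llm_url: str
    ovh_llm_key: str
    ovh_llm_model: str

    @property
    def ovh_enabled(self) -> bool:
        # Assumption: the OVH step is skipped unless URL and key are both set.
        return bool(self.ovh_llm_url and self.ovh_llm_key)


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LlmSettings:
    """Read the LLM variables from an environment-like mapping."""
    return LlmSettings(
        ollama_url=env.get("OLLAMA_URL", "").rstrip("/"),
        cmp_llm_model=env.get("CMP_LLM_MODEL", ""),
        ovh_llm_url=env.get("OVH_LLM_URL", "").rstrip("/"),
        ovh_llm_key=env.get("OVH_LLM_KEY", ""),
        ovh_llm_model=env.get("OVH_LLM_MODEL", ""),
    )
```

Passing the mapping as a parameter keeps the loader testable without touching the process environment.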

This closes the 'every site eventually gets a VVT table' goal:
- Known CMP → V1/V2 structured extraction (fast, exact)
- Unknown CMP → V3 LLM extraction (slow, best-effort)
- No text at all → no vendors, but other compliance checks still run.
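
The three outcomes above, together with the Qwen-then-OVH cascade, can be sketched as one dispatch function. The name pick_vendors, the source labels, and the provider callables are hypothetical stand-ins; the real route also applies the word-count gate before calling the extractor.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], list]]


def pick_vendors(
    cmp_vendors: list,
    cookie_text: str,
    providers: Sequence[Provider],
) -> Tuple[str, List[dict]]:
    """Return (source, vendors) following the known/unknown/no-text tiers."""
    if cmp_vendors:
        return "cmp", cmp_vendors      # V1/V2: structured extraction
    if not cookie_text.strip():
        return "none", []              # no text: other checks still run
    for name, provider in providers:   # V3: try each LLM in order
        try:
            vendors = provider(cookie_text)
        except Exception:
            continue                   # a failing provider falls through
        if vendors:
            return "llm:" + name, vendors
    return "none", []
```

A call such as pick_vendors([], text, [("qwen", call_qwen), ("ovh", call_ovh)]) would only reach the second provider when the first raises or returns an empty list; call_qwen and call_ovh are placeholders for the actual client functions.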

2026-05-17 09:55:42 +02:00

auth

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

classroom_engine

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

compliance

feat(vvt): V3 — LLM vendor extraction fallback for unknown CMPs

2026-05-17 09:55:42 +02:00

middleware

fix(quality): Ruff/CVE/TS fixes, 104 new tests, complexity refactoring

2026-03-07 19:00:33 +01:00

migrations

feat(cmp): Phase 2 — script blocking + cookie tracking

2026-05-11 22:52:26 +02:00

scripts

feat: add policy library with 29 German policy templates

2026-03-14 22:37:33 +01:00

services

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

templates/gdpr

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

tests

feat: Cookie-Banner ↔ Backend Integration (DSR, Retention, Consent Proof)

2026-05-02 19:52:04 +02:00

consent_admin_api.py

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

consent_api.py

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

consent_client.py

refactor: phase 0 guardrails + phase 1 step 2 (models.py split)

2026-04-07 13:18:29 +02:00

database.py

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

Dockerfile

fix(docker): make torch/sentence-transformers optional to unblock builds

2026-03-21 15:06:51 +01:00

gdpr_api.py

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

gdpr_export_service.py

Initial commit: breakpilot-compliance - Compliance SDK Platform

2026-02-11 23:47:28 +01:00

main.py

feat(vendor-assessment): AVV/SCC/TOM/Sub-Processor checklists + assessment service

2026-05-12 23:14:54 +02:00

migration_runner.py

fix: migration runner strips BEGIN/COMMIT and guards missing tables

2026-03-14 21:59:10 +01:00

mypy.ini

chore: mypy cleanup — comprehensive disable headers for agent-created services

2026-04-10 11:23:43 +02:00

PHASE1_RUNBOOK.md

refactor: phase 0 guardrails + phase 1 step 2 (models.py split)

2026-04-07 13:18:29 +02:00

README.md

docs: update service READMEs for refactor progress and stale phase references

2026-04-19 16:07:23 +02:00

requirements-reranker.txt

fix(docker): make torch/sentence-transformers optional to unblock builds

2026-03-21 15:06:51 +01:00

requirements.txt

fix(docker): make torch/sentence-transformers optional to unblock builds

2026-03-21 15:06:51 +01:00

README.md

backend-compliance

Python/FastAPI service implementing the DSGVO compliance API: DSR, DSFA, consent, controls, risks, evidence, audit, vendor management, ISMS, change requests, document generation.

Port: 8002 (container: bp-compliance-backend)

Stack: Python 3.12, FastAPI, SQLAlchemy 2.x, Alembic, Keycloak auth.

Architecture

compliance/
├── api/            # Routers (thin, ≤30 LOC per handler)
├── services/       # Business logic
├── repositories/   # DB access
├── domain/         # Value objects, domain errors
├── schemas/        # Pydantic models, split per domain
└── db/models/      # SQLAlchemy ORM, one module per aggregate

The service follows this layered target structure but not all files are fully refactored yet. Phase 1 backlog is tracked in .claude/rules/loc-exceptions.txt (27 backend-compliance files currently excepted).

See ../AGENTS.python.md for the full convention and ../.claude/rules/architecture.md for the non-negotiable rules.

Run locally

cd backend-compliance
pip install -r requirements.txt
export COMPLIANCE_DATABASE_URL=...  # Postgres (Hetzner or local)
uvicorn main:app --reload --port 8002

Tests

pytest compliance/tests/ -v
pytest --cov=compliance --cov-report=term-missing

Layout: tests/unit/, tests/integration/, tests/contracts/. Contract tests diff /openapi.json against tests/contracts/openapi.baseline.json.
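
As a rough idea of what such a contract check involves, the sketch below compares only the set of path/method pairs between two OpenAPI documents. The function name removed_operations is hypothetical, and the actual contract tests may also compare schemas, parameters, and response codes.

```python
from typing import List, Set, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def _operations(spec: dict) -> Set[Tuple[str, str]]:
    """Collect (path, method) pairs from an OpenAPI document."""
    return {
        (path, method.lower())
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method.lower() in HTTP_METHODS  # skip keys like "parameters"
    }


def removed_operations(baseline: dict, current: dict) -> List[Tuple[str, str]]:
    """Operations present in the baseline but missing from the current spec."""
    return sorted(_operations(baseline) - _operations(current))
```

In a test, baseline would be loaded from tests/contracts/openapi.baseline.json and current from the running app's /openapi.json; a non-empty result means a published endpoint disappeared.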

Public API surface

404+ endpoints across /api/v1/*. Grouped by domain: ai, audit, consent, dsfa, dsr, gdpr, vendor, evidence, change-requests, generation, projects, company-profile, isms. Every path is a contract — see the "Public endpoints" rule in the root CLAUDE.md.

Environment

Var	Purpose
`COMPLIANCE_DATABASE_URL`	Postgres DSN, `sslmode=require`
`KEYCLOAK_*`	Auth verification
`QDRANT_URL`, `QDRANT_API_KEY`	Vector search
`CORE_VALKEY_URL`	Session cache

Don't touch

Database schema, __tablename__, column names, existing migrations under migrations/. See root CLAUDE.md rule 3.