Files
breakpilot-compliance/backend-compliance/compliance/api/vvt_routes.py
Sharang Parnerkar 3320ef94fc refactor: phase 0 guardrails + phase 1 step 2 (models.py split)
Squash of branch refactor/phase0-guardrails-and-models-split — 4 commits,
81 files, 173/173 pytest green, OpenAPI contract preserved (360 paths /
484 operations).

## Phase 0 — Architecture guardrails

Three defense-in-depth layers to keep the architecture rules enforced
regardless of who opens Claude Code in this repo:

  1. .claude/settings.json PreToolUse hook on Write/Edit blocks any file
     that would exceed the 500-line hard cap. Auto-loads in every Claude
     session in this repo.
  2. scripts/githooks/pre-commit (install via scripts/install-hooks.sh)
     enforces the LOC cap locally, freezes migrations/ without
     [migration-approved], and protects guardrail files without
     [guardrail-change].
  3. .gitea/workflows/ci.yaml gains loc-budget + guardrail-integrity +
     sbom-scan (syft+grype) jobs, adds mypy --strict for the new Python
     packages (compliance/{services,repositories,domain,schemas}), and
     tsc --noEmit for admin-compliance + developer-portal.

Per-language conventions documented in AGENTS.python.md, AGENTS.go.md,
AGENTS.typescript.md at the repo root — layering, tooling, and explicit
"what you may NOT do" lists. Root CLAUDE.md is prepended with the six
non-negotiable rules. Each of the 10 services gets a README.md.

scripts/check-loc.sh enforces soft 300 / hard 500 and surfaces the
current baseline of 205 hard + 161 soft violations so Phases 1-4 can
drain it incrementally. CI gates only CHANGED files in PRs so the
legacy baseline does not block unrelated work.

## Deprecation sweep

47 files. Pydantic V1 regex= -> pattern= (2 sites), class Config ->
ConfigDict in source_policy_router.py (schemas.py intentionally skipped;
it is the Phase 1 Step 3 split target). datetime.utcnow() ->
datetime.now(timezone.utc) everywhere including SQLAlchemy default=
callables. All DB columns already declare timezone=True, so this is a
latent-bug fix at the Python side, not a schema change.

DeprecationWarning count dropped from 158 to 35.

## Phase 1 Step 1 — Contract test harness

tests/contracts/test_openapi_baseline.py diffs the live FastAPI /openapi.json
against tests/contracts/openapi.baseline.json on every test run. Fails on
removed paths, removed status codes, or new required request body fields.
Regenerate only via tests/contracts/regenerate_baseline.py after a
consumer-updated contract change. This is the safety harness for all
subsequent refactor commits.

## Phase 1 Step 2 — models.py split (1466 -> 85 LOC shim)

compliance/db/models.py is decomposed into seven sibling aggregate modules
following the existing repo pattern (dsr_models.py, vvt_models.py, ...):

  regulation_models.py       (134) — Regulation, Requirement
  control_models.py          (279) — Control, Mapping, Evidence, Risk
  ai_system_models.py        (141) — AISystem, AuditExport
  service_module_models.py   (176) — ServiceModule, ModuleRegulation, ModuleRisk
  audit_session_models.py    (177) — AuditSession, AuditSignOff
  isms_governance_models.py  (323) — ISMSScope, Context, Policy, Objective, SoA
  isms_audit_models.py       (468) — Finding, CAPA, MgmtReview, InternalAudit,
                                     AuditTrail, Readiness

models.py becomes an 85-line re-export shim in dependency order so
existing imports continue to work unchanged. Schema is byte-identical:
__tablename__, column definitions, relationship strings, back_populates,
cascade directives all preserved.

All new sibling files are under the 500-line hard cap; largest is
isms_audit_models.py at 468. No file in compliance/db/ now exceeds
the hard cap.

## Phase 1 Step 3 — infrastructure only

backend-compliance/compliance/{schemas,domain,repositories}/ packages
are created as landing zones with docstrings. compliance/domain/
exports DomainError / NotFoundError / ConflictError / ValidationError /
PermissionError — the base classes services will use to raise
domain-level errors instead of HTTPException.

PHASE1_RUNBOOK.md at backend-compliance/PHASE1_RUNBOOK.md documents
the nine-step execution plan for Phase 1: snapshot baseline,
characterization tests, split models.py (this commit), split schemas.py
(next), extract services, extract repositories, mypy --strict, coverage.

## Verification

  backend-compliance/.venv-phase1: uv python install 3.12 + pip -r requirements.txt
  PYTHONPATH=. pytest compliance/tests/ tests/contracts/
  -> 173 passed, 0 failed, 35 warnings, OpenAPI 360/484 unchanged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:18:29 +02:00

551 lines
18 KiB
Python

"""
FastAPI routes for VVT — Verzeichnis von Verarbeitungstaetigkeiten (Art. 30 DSGVO).
Endpoints:
GET /vvt/organization — Load organization header
PUT /vvt/organization — Save organization header
GET /vvt/activities — List activities (filter: status, business_function)
POST /vvt/activities — Create new activity
GET /vvt/activities/{id} — Get single activity
PUT /vvt/activities/{id} — Update activity
DELETE /vvt/activities/{id} — Delete activity
GET /vvt/audit-log — Audit trail (limit, offset)
GET /vvt/export — JSON export of all activities
GET /vvt/stats — Statistics
"""
import csv
import io
import logging
from datetime import datetime, timezone
from typing import Optional, List
from fastapi import APIRouter, Depends, HTTPException, Query, Request
from fastapi.responses import StreamingResponse
from sqlalchemy.orm import Session
from classroom_engine.database import get_db
from ..db.vvt_models import VVTOrganizationDB, VVTActivityDB, VVTAuditLogDB
from .schemas import (
VVTOrganizationUpdate, VVTOrganizationResponse,
VVTActivityCreate, VVTActivityUpdate, VVTActivityResponse,
VVTStatsResponse, VVTAuditLogEntry,
)
from .tenant_utils import get_tenant_id
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/vvt", tags=["compliance-vvt"])
def _log_audit(
db: Session,
tenant_id: str,
action: str,
entity_type: str,
entity_id=None,
changed_by: str = "system",
old_values=None,
new_values=None,
):
entry = VVTAuditLogDB(
tenant_id=tenant_id,
action=action,
entity_type=entity_type,
entity_id=entity_id,
changed_by=changed_by,
old_values=old_values,
new_values=new_values,
)
db.add(entry)
# ============================================================================
# Organization Header
# ============================================================================
@router.get("/organization", response_model=Optional[VVTOrganizationResponse])
async def get_organization(
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Load the VVT organization header for the given tenant."""
org = (
db.query(VVTOrganizationDB)
.filter(VVTOrganizationDB.tenant_id == tid)
.order_by(VVTOrganizationDB.created_at)
.first()
)
if not org:
return None
return VVTOrganizationResponse(
id=str(org.id),
organization_name=org.organization_name,
industry=org.industry,
locations=org.locations or [],
employee_count=org.employee_count,
dpo_name=org.dpo_name,
dpo_contact=org.dpo_contact,
vvt_version=org.vvt_version or '1.0',
last_review_date=org.last_review_date,
next_review_date=org.next_review_date,
review_interval=org.review_interval or 'annual',
created_at=org.created_at,
updated_at=org.updated_at,
)
@router.put("/organization", response_model=VVTOrganizationResponse)
async def upsert_organization(
request: VVTOrganizationUpdate,
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Create or update the VVT organization header."""
org = (
db.query(VVTOrganizationDB)
.filter(VVTOrganizationDB.tenant_id == tid)
.order_by(VVTOrganizationDB.created_at)
.first()
)
if not org:
data = request.dict(exclude_none=True)
if 'organization_name' not in data:
data['organization_name'] = 'Meine Organisation'
data['tenant_id'] = tid
org = VVTOrganizationDB(**data)
db.add(org)
else:
for field, value in request.dict(exclude_none=True).items():
setattr(org, field, value)
org.updated_at = datetime.now(timezone.utc)
db.commit()
db.refresh(org)
return VVTOrganizationResponse(
id=str(org.id),
organization_name=org.organization_name,
industry=org.industry,
locations=org.locations or [],
employee_count=org.employee_count,
dpo_name=org.dpo_name,
dpo_contact=org.dpo_contact,
vvt_version=org.vvt_version or '1.0',
last_review_date=org.last_review_date,
next_review_date=org.next_review_date,
review_interval=org.review_interval or 'annual',
created_at=org.created_at,
updated_at=org.updated_at,
)
# ============================================================================
# Activities
# ============================================================================
def _activity_to_response(act: VVTActivityDB) -> VVTActivityResponse:
return VVTActivityResponse(
id=str(act.id),
vvt_id=act.vvt_id,
name=act.name,
description=act.description,
purposes=act.purposes or [],
legal_bases=act.legal_bases or [],
data_subject_categories=act.data_subject_categories or [],
personal_data_categories=act.personal_data_categories or [],
recipient_categories=act.recipient_categories or [],
third_country_transfers=act.third_country_transfers or [],
retention_period=act.retention_period or {},
tom_description=act.tom_description,
business_function=act.business_function,
systems=act.systems or [],
deployment_model=act.deployment_model,
data_sources=act.data_sources or [],
data_flows=act.data_flows or [],
protection_level=act.protection_level or 'MEDIUM',
dpia_required=act.dpia_required or False,
structured_toms=act.structured_toms or {},
status=act.status or 'DRAFT',
responsible=act.responsible,
owner=act.owner,
last_reviewed_at=act.last_reviewed_at,
next_review_at=act.next_review_at,
created_by=act.created_by,
dsfa_id=str(act.dsfa_id) if act.dsfa_id else None,
created_at=act.created_at,
updated_at=act.updated_at,
)
@router.get("/activities", response_model=List[VVTActivityResponse])
async def list_activities(
status: Optional[str] = Query(None),
business_function: Optional[str] = Query(None),
search: Optional[str] = Query(None),
review_overdue: Optional[bool] = Query(None),
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""List all processing activities with optional filters."""
query = db.query(VVTActivityDB).filter(VVTActivityDB.tenant_id == tid)
if status:
query = query.filter(VVTActivityDB.status == status)
if business_function:
query = query.filter(VVTActivityDB.business_function == business_function)
if review_overdue:
now = datetime.now(timezone.utc)
query = query.filter(
VVTActivityDB.next_review_at.isnot(None),
VVTActivityDB.next_review_at < now,
)
if search:
term = f"%{search}%"
query = query.filter(
(VVTActivityDB.name.ilike(term)) |
(VVTActivityDB.description.ilike(term)) |
(VVTActivityDB.vvt_id.ilike(term))
)
activities = query.order_by(VVTActivityDB.created_at.desc()).all()
return [_activity_to_response(a) for a in activities]
@router.post("/activities", response_model=VVTActivityResponse, status_code=201)
async def create_activity(
request: VVTActivityCreate,
http_request: Request,
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Create a new processing activity."""
# Check for duplicate vvt_id within tenant
existing = db.query(VVTActivityDB).filter(
VVTActivityDB.tenant_id == tid,
VVTActivityDB.vvt_id == request.vvt_id,
).first()
if existing:
raise HTTPException(
status_code=409,
detail=f"Activity with VVT-ID '{request.vvt_id}' already exists"
)
data = request.dict()
data['tenant_id'] = tid
# Set created_by from X-User-ID header if not provided in body
if not data.get('created_by'):
data['created_by'] = http_request.headers.get('X-User-ID', 'system')
act = VVTActivityDB(**data)
db.add(act)
db.flush() # get ID before audit log
_log_audit(
db,
tenant_id=tid,
action="CREATE",
entity_type="activity",
entity_id=act.id,
new_values={"vvt_id": act.vvt_id, "name": act.name, "status": act.status},
)
db.commit()
db.refresh(act)
return _activity_to_response(act)
@router.get("/activities/{activity_id}", response_model=VVTActivityResponse)
async def get_activity(
activity_id: str,
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Get a single processing activity by ID."""
act = db.query(VVTActivityDB).filter(
VVTActivityDB.id == activity_id,
VVTActivityDB.tenant_id == tid,
).first()
if not act:
raise HTTPException(status_code=404, detail=f"Activity {activity_id} not found")
return _activity_to_response(act)
@router.put("/activities/{activity_id}", response_model=VVTActivityResponse)
async def update_activity(
activity_id: str,
request: VVTActivityUpdate,
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Update a processing activity."""
act = db.query(VVTActivityDB).filter(
VVTActivityDB.id == activity_id,
VVTActivityDB.tenant_id == tid,
).first()
if not act:
raise HTTPException(status_code=404, detail=f"Activity {activity_id} not found")
old_values = {"name": act.name, "status": act.status}
updates = request.dict(exclude_none=True)
for field, value in updates.items():
setattr(act, field, value)
act.updated_at = datetime.now(timezone.utc)
_log_audit(
db,
tenant_id=tid,
action="UPDATE",
entity_type="activity",
entity_id=act.id,
old_values=old_values,
new_values=updates,
)
db.commit()
db.refresh(act)
return _activity_to_response(act)
@router.delete("/activities/{activity_id}")
async def delete_activity(
activity_id: str,
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Delete a processing activity."""
act = db.query(VVTActivityDB).filter(
VVTActivityDB.id == activity_id,
VVTActivityDB.tenant_id == tid,
).first()
if not act:
raise HTTPException(status_code=404, detail=f"Activity {activity_id} not found")
_log_audit(
db,
tenant_id=tid,
action="DELETE",
entity_type="activity",
entity_id=act.id,
old_values={"vvt_id": act.vvt_id, "name": act.name},
)
db.delete(act)
db.commit()
return {"success": True, "message": f"Activity {activity_id} deleted"}
# ============================================================================
# Audit Log
# ============================================================================
@router.get("/audit-log", response_model=List[VVTAuditLogEntry])
async def get_audit_log(
limit: int = Query(50, ge=1, le=500),
offset: int = Query(0, ge=0),
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Get the VVT audit trail."""
entries = (
db.query(VVTAuditLogDB)
.filter(VVTAuditLogDB.tenant_id == tid)
.order_by(VVTAuditLogDB.created_at.desc())
.offset(offset)
.limit(limit)
.all()
)
return [
VVTAuditLogEntry(
id=str(e.id),
action=e.action,
entity_type=e.entity_type,
entity_id=str(e.entity_id) if e.entity_id else None,
changed_by=e.changed_by,
old_values=e.old_values,
new_values=e.new_values,
created_at=e.created_at,
)
for e in entries
]
# ============================================================================
# Export & Stats
# ============================================================================
@router.get("/export")
async def export_activities(
format: str = Query("json", pattern="^(json|csv)$"),
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Export all activities as JSON or CSV (semicolon-separated, DE locale)."""
org = (
db.query(VVTOrganizationDB)
.filter(VVTOrganizationDB.tenant_id == tid)
.order_by(VVTOrganizationDB.created_at)
.first()
)
activities = (
db.query(VVTActivityDB)
.filter(VVTActivityDB.tenant_id == tid)
.order_by(VVTActivityDB.created_at)
.all()
)
_log_audit(
db,
tenant_id=tid,
action="EXPORT",
entity_type="all_activities",
new_values={"count": len(activities), "format": format},
)
db.commit()
if format == "csv":
return _export_csv(activities)
return {
"exported_at": datetime.now(timezone.utc).isoformat(),
"organization": {
"name": org.organization_name if org else "",
"dpo_name": org.dpo_name if org else "",
"dpo_contact": org.dpo_contact if org else "",
"vvt_version": org.vvt_version if org else "1.0",
} if org else None,
"activities": [
{
"id": str(a.id),
"vvt_id": a.vvt_id,
"name": a.name,
"description": a.description,
"status": a.status,
"purposes": a.purposes,
"legal_bases": a.legal_bases,
"data_subject_categories": a.data_subject_categories,
"personal_data_categories": a.personal_data_categories,
"recipient_categories": a.recipient_categories,
"third_country_transfers": a.third_country_transfers,
"retention_period": a.retention_period,
"dpia_required": a.dpia_required,
"protection_level": a.protection_level,
"business_function": a.business_function,
"responsible": a.responsible,
"created_by": a.created_by,
"dsfa_id": str(a.dsfa_id) if a.dsfa_id else None,
"last_reviewed_at": a.last_reviewed_at.isoformat() if a.last_reviewed_at else None,
"next_review_at": a.next_review_at.isoformat() if a.next_review_at else None,
"created_at": a.created_at.isoformat(),
"updated_at": a.updated_at.isoformat() if a.updated_at else None,
}
for a in activities
],
}
def _export_csv(activities: list) -> StreamingResponse:
"""Generate semicolon-separated CSV with UTF-8 BOM for German Excel compatibility."""
output = io.StringIO()
# UTF-8 BOM for Excel
output.write('\ufeff')
writer = csv.writer(output, delimiter=';', quoting=csv.QUOTE_MINIMAL)
writer.writerow([
'ID', 'VVT-ID', 'Name', 'Zweck', 'Rechtsgrundlage',
'Datenkategorien', 'Betroffene', 'Empfaenger', 'Drittland',
'Aufbewahrung', 'Status', 'Verantwortlich', 'Erstellt von',
'Erstellt am',
])
for a in activities:
writer.writerow([
str(a.id),
a.vvt_id,
a.name,
'; '.join(a.purposes or []),
'; '.join(a.legal_bases or []),
'; '.join(a.personal_data_categories or []),
'; '.join(a.data_subject_categories or []),
'; '.join(a.recipient_categories or []),
'Ja' if a.third_country_transfers else 'Nein',
str(a.retention_period) if a.retention_period else '',
a.status or 'DRAFT',
a.responsible or '',
a.created_by or 'system',
a.created_at.strftime('%d.%m.%Y %H:%M') if a.created_at else '',
])
output.seek(0)
return StreamingResponse(
iter([output.getvalue()]),
media_type='text/csv; charset=utf-8',
headers={
'Content-Disposition': f'attachment; filename="vvt_export_{datetime.now(timezone.utc).strftime("%Y%m%d")}.csv"'
},
)
@router.get("/stats", response_model=VVTStatsResponse)
async def get_stats(
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Get VVT statistics summary."""
activities = db.query(VVTActivityDB).filter(VVTActivityDB.tenant_id == tid).all()
by_status: dict = {}
by_bf: dict = {}
now = datetime.now(timezone.utc)
overdue_count = 0
for a in activities:
status = a.status or 'DRAFT'
bf = a.business_function or 'unknown'
by_status[status] = by_status.get(status, 0) + 1
by_bf[bf] = by_bf.get(bf, 0) + 1
if a.next_review_at and a.next_review_at < now:
overdue_count += 1
return VVTStatsResponse(
total=len(activities),
by_status=by_status,
by_business_function=by_bf,
dpia_required_count=sum(1 for a in activities if a.dpia_required),
third_country_count=sum(1 for a in activities if a.third_country_transfers),
draft_count=by_status.get('DRAFT', 0),
approved_count=by_status.get('APPROVED', 0),
overdue_review_count=overdue_count,
)
# ============================================================================
# Versioning
# ============================================================================
@router.get("/activities/{activity_id}/versions")
async def list_activity_versions(
activity_id: str,
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""List all versions for a VVT activity."""
from .versioning_utils import list_versions
return list_versions(db, "vvt_activity", activity_id, tid)
@router.get("/activities/{activity_id}/versions/{version_number}")
async def get_activity_version(
activity_id: str,
version_number: int,
tid: str = Depends(get_tenant_id),
db: Session = Depends(get_db),
):
"""Get a specific VVT activity version with full snapshot."""
from .versioning_utils import get_version
v = get_version(db, "vvt_activity", activity_id, version_number, tid)
if not v:
raise HTTPException(status_code=404, detail=f"Version {version_number} not found")
return v