phase0: add architecture guardrails, CI gates, per-language AGENTS.md

Non-negotiable structural rules that apply to every Claude Code session in
this repo and to every commit, enforced via three defense-in-depth layers:

  1. PreToolUse hook in .claude/settings.json blocks any Write/Edit that
     would push a file past the 500-line hard cap. Auto-loads for any
     Claude session in this repo regardless of who launched it.
  2. scripts/githooks/pre-commit (installed via scripts/install-hooks.sh)
     enforces the LOC cap, freezes migrations/ unless [migration-approved],
     and protects guardrail files unless [guardrail-change] is present.
  3. .gitea/workflows/ci.yaml gets loc-budget + guardrail-integrity jobs,
     plus mypy --strict on new Python packages, tsc --noEmit on Node
     services, and a syft+grype SBOM scan.

Per-language conventions are documented in AGENTS.python.md / AGENTS.go.md /
AGENTS.typescript.md at the repo root — layering (router->service->repo for
Python, hexagonal for Go, colocation for Next.js), tooling baseline, and
explicit "what you may NOT do" lists.

Adds scripts/check-loc.sh (soft 300 / hard 500, reports 205 hard and 161
soft violations in the current codebase) plus .claude/rules/loc-exceptions.txt
(initially empty — the list is designed to shrink over time).

Per-service READMEs for all 10 services + PHASE1_RUNBOOK.md for the
backend-compliance refactor. Skeleton packages (compliance/{domain,
repositories,schemas}) are the landing zone for the clean-arch rewrite that
begins in Phase 1.

CLAUDE.md is prepended with the six non-negotiable rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sharang Parnerkar
2026-04-07 13:09:26 +02:00
parent 1dfea51919
commit 512b7a0f6c
25 changed files with 1308 additions and 10 deletions


@@ -0,0 +1,181 @@
# Phase 1 Runbook — backend-compliance refactor
This document is the step-by-step execution guide for Phase 1 of the repo refactor plan at `~/.claude/plans/vectorized-purring-barto.md`. It exists because the refactor must be driven from a session that can actually run `pytest` against the service, and every step must be verified green before moving to the next.
## Prerequisites
- Python 3.12 venv with `backend-compliance/requirements.txt` installed.
- Local Postgres reachable via `COMPLIANCE_DATABASE_URL` (use the compose db).
- Existing 48 pytest test files pass from a clean checkout: `pytest compliance/tests/ -v` → all green. **Do not proceed until this is true.**
## Step 0 — Record the baseline
```bash
cd backend-compliance
pytest compliance/tests/ -v --tb=short | tee /tmp/baseline.txt
pytest --cov=compliance --cov-report=term | tee /tmp/baseline-coverage.txt
python tests/contracts/regenerate_baseline.py # creates openapi.baseline.json
git add tests/contracts/openapi.baseline.json
git commit -m "phase1: pin OpenAPI baseline before refactor"
```
The baseline file is the contract. From this point forward, `pytest tests/contracts/` MUST stay green.
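As a rough illustration of what "the baseline is the contract" can mean in practice, the sketch below compares two OpenAPI documents and reports removed paths, removed operations, and newly required JSON request-body fields. It is not the repo's actual contract test: the names `breaking_changes` and `_required_body_fields` are hypothetical, and it assumes `application/json` bodies and only local `#/components/schemas/...` references. A renamed path shows up as a removal.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def _required_body_fields(spec: dict[str, Any], operation: dict[str, Any]) -> set[str]:
    """Return the required top-level fields of an operation's JSON request body."""
    schema = (
        operation.get("requestBody", {})
        .get("content", {})
        .get("application/json", {})
        .get("schema", {})
    )
    ref = schema.get("$ref")
    if ref:
        # Assumption: only local refs of the form "#/components/schemas/Name".
        name = ref.rsplit("/", 1)[-1]
        schema = spec.get("components", {}).get("schemas", {}).get(name, {})
    return set(schema.get("required", []))


def breaking_changes(baseline: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """List contract breaks in `current` relative to `baseline`."""
    problems: list[str] = []
    current_paths = current.get("paths", {})
    for path, base_item in baseline.get("paths", {}).items():
        cur_item = current_paths.get(path)
        if cur_item is None:
            problems.append(f"removed path: {path}")
            continue
        for method, base_op in base_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            cur_op = cur_item.get(method)
            if cur_op is None:
                problems.append(f"removed operation: {method.upper()} {path}")
                continue
            added = _required_body_fields(current, cur_op) - _required_body_fields(
                baseline, base_op
            )
            for field in sorted(added):
                problems.append(
                    f"new required request field: {method.upper()} {path} -> {field}"
                )
    return problems
```

An empty list means the current spec is backward compatible under these three checks; added paths and added optional fields are deliberately ignored.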
## Step 1 — Characterization tests (before any code move)
For each oversized route file slated for refactoring, add one happy-path test and one error-path (4xx) test per route **before** touching the source. These characterization tests freeze the current observable behavior so the refactor cannot change it silently.
Oversized files to cover (route files first, ordered by size):
| File | LOC | Endpoints to cover |
|---|---:|---|
| `compliance/api/isms_routes.py` | 1676 | one happy + one 4xx per route |
| `compliance/api/dsr_routes.py` | 1176 | same |
| `compliance/api/vvt_routes.py` | *N* | same |
| `compliance/api/dsfa_routes.py` | *N* | same |
| `compliance/api/tom_routes.py` | *N* | same |
| `compliance/api/schemas.py` | 1899 | N/A (covered transitively) |
| `compliance/db/models.py` | 1466 | N/A (covered by existing + route tests) |
| `compliance/db/repository.py` | 1547 | add unit tests per repo class as they are extracted |
Use `httpx.AsyncClient` + factory fixtures; see `AGENTS.python.md`. Place under `tests/integration/test_<domain>_contract.py`.
Commit: `phase1: characterization tests for <domain> routes`.
## Step 2 — Split `compliance/db/models.py` (1466 → <500 per file)
⚠️ **Atomic step.** A `compliance/db/models/` package CANNOT coexist with the existing `compliance/db/models.py` module — Python's import system shadows the module with the package, breaking every `from compliance.db.models import X` call. The directory skeleton was intentionally NOT pre-created for this reason. Do the following in **one commit**:
1. Create `compliance/db/models/` directory with `__init__.py` (re-export shim — see template below).
2. Move aggregate model classes into `compliance/db/models/<aggregate>.py` modules.
3. Delete the old `compliance/db/models.py` file in the same commit.
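The shadowing described above can be reproduced in isolation. The following sketch uses a throwaway name, `shadow_demo` (an assumption for illustration, unrelated to the real package): it creates a module and a same-named package side by side in a temporary directory and reports which one the import system picks.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def which_wins() -> str:
    """Create shadow_demo.py and shadow_demo/__init__.py, then import 'shadow_demo'."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # A plain module ...
        (root / "shadow_demo.py").write_text("ORIGIN = 'module'\n")
        # ... and a regular package with the same name, in the same directory.
        package_dir = root / "shadow_demo"
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("ORIGIN = 'package'\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("shadow_demo")
            return module.ORIGIN
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("shadow_demo", None)
```

`which_wins()` returns `"package"`: a directory containing `__init__.py` is found before a same-named `.py` file, which is why the old `models.py` has to be deleted in the same commit that introduces `models/`.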
Strategy uses a **re-export shim** so no import sites change:
1. For each aggregate, create `compliance/db/models/<aggregate>.py` containing the model classes. Copy verbatim; do not rename `__tablename__`, columns, or relationship strings.
2. Aggregate suggestions (verify by reading `models.py`):
   - `dsr.py` (DSR requests, exports)
   - `dsfa.py`
   - `vvt.py`
   - `tom.py`
   - `ai.py` (AI systems, compliance checks)
   - `consent.py`
   - `evidence.py`
   - `vendor.py`
   - `audit.py`
   - `policy.py`
   - `project.py`
3. After every aggregate is moved, make `compliance/db/models/__init__.py` the re-export shim (the old `compliance/db/models.py` is deleted in the same commit, as described above):
   ```python
   """Re-export shim: keeps ``from compliance.db.models import X`` working."""
   from compliance.db.models.dsr import *  # noqa: F401,F403
   from compliance.db.models.dsfa import *  # noqa: F401,F403
   # ... one per module
   ```
   This keeps `from compliance.db.models import XYZ` working everywhere it's used today.
4. Run `pytest` after every move. Green → commit. Red → revert that move and investigate.
5. Existing aggregate-level files (`compliance/db/dsr_models.py`, `vvt_models.py`, `tom_models.py`, etc.) should be folded into the new `compliance/db/models/` package in the same pass — do not leave two parallel naming conventions.
**Do not** add `__init__.py` star-imports that change `Base.metadata` discovery order. Alembic's autogenerate depends on it. Verify via: `alembic check` if the env is set up.
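One way to gain confidence that the split dropped nothing is to compare public top-level names before and after. The sketch below is a hypothetical helper, not part of the repo: it parses source text with `ast` and reports names defined in the old module that no new module defines. It assumes the new modules do not declare `__all__`, so a star-import re-exports every name that does not start with an underscore, and it ignores names the old module merely imported.

```python
import ast


def public_names(source: str) -> set[str]:
    """Top-level class, function, and simple assignment names without a leading '_'."""
    names: set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return {name for name in names if not name.startswith("_")}


def missing_after_split(old_source: str, new_sources: list[str]) -> set[str]:
    """Names defined in the old module that none of the new modules define."""
    moved: set[str] = set()
    for source in new_sources:
        moved |= public_names(source)
    return public_names(old_source) - moved
```

In use, `old_source` would be the text of `models.py` from the commit before the split (for example via `git show`), and `new_sources` the contents of the new per-aggregate files; an empty result means every class is still reachable through the shim.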
## Step 3 — Split `compliance/api/schemas.py` (1899 → per domain)
Mirror the models split:
1. For each domain, create `compliance/schemas/<domain>.py` with the Pydantic models.
2. Replace `compliance/api/schemas.py` with a re-export shim.
3. Keep `Create`/`Update`/`Read` variants separated; do not merge them into unions.
4. Run `pytest` + contract test after each domain. Green → commit.
## Step 4 — Extract services (router → service delegation)
For each route file > 500 LOC, pull handler bodies into a service class under `compliance/services/<domain>_service.py` (new-style domain services, not the utility `compliance/services/` modules that already exist — consider renaming those to `compliance/services/_legacy/` if collisions arise).
Router handlers become:
```python
@router.post("/dsr/requests", response_model=DSRRequestRead, status_code=201)
async def create_dsr_request(
    payload: DSRRequestCreate,
    service: DSRService = Depends(get_dsr_service),
    tenant_id: UUID = Depends(get_tenant_id),
) -> DSRRequestRead:
    try:
        return await service.create(tenant_id, payload)
    except ConflictError as exc:
        raise HTTPException(409, str(exc)) from exc
    except NotFoundError as exc:
        raise HTTPException(404, str(exc)) from exc
```
Rules:
- Handler body ≤ 30 LOC.
- Service raises domain errors (`compliance.domain`), never `HTTPException`.
- Inject service via `Depends` on a factory that wires the repository.
Run tests after each router is thinned. Contract test must stay green.
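To make the service side of these rules concrete, here is a minimal sketch with no FastAPI or SQLAlchemy involved. Everything in it is an assumption for illustration: the error classes are local stand-ins for the ones in `compliance.domain`, `InMemoryDSRRepository` is a fake, and `status_for` is a hypothetical helper showing how the HTTP layer could translate domain errors (the 500 fallback is this sketch's choice, not a documented rule).

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


# Local stand-ins for compliance.domain, redefined so the sketch runs on its own.
class DomainError(Exception):
    pass


class NotFoundError(DomainError):
    pass


class ConflictError(DomainError):
    pass


@dataclass(frozen=True)
class DSRRequest:  # hypothetical value object
    tenant_id: str
    subject_email: str


class InMemoryDSRRepository:
    """Fake repository; a real one would wrap a database session."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], DSRRequest] = {}

    async def get(self, tenant_id: str, subject_email: str) -> Optional[DSRRequest]:
        return self._rows.get((tenant_id, subject_email))

    async def add(self, request: DSRRequest) -> None:
        self._rows[(request.tenant_id, request.subject_email)] = request


class DSRService:
    """Business logic only: raises domain errors, knows nothing about HTTP."""

    def __init__(self, repo: InMemoryDSRRepository) -> None:
        self._repo = repo

    async def create(self, tenant_id: str, subject_email: str) -> DSRRequest:
        if await self._repo.get(tenant_id, subject_email) is not None:
            raise ConflictError(f"request for {subject_email} already exists")
        request = DSRRequest(tenant_id, subject_email)
        await self._repo.add(request)
        return request


STATUS_BY_ERROR = {NotFoundError: 404, ConflictError: 409}


def status_for(exc: DomainError) -> int:
    """Map a domain error to an HTTP status, honoring subclasses via the MRO."""
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    return 500  # assumption: unmapped domain errors surface as server errors


async def demo() -> int:
    service = DSRService(InMemoryDSRRepository())
    await service.create("tenant-1", "a@example.com")
    try:
        await service.create("tenant-1", "a@example.com")  # duplicate
    except DomainError as exc:
        return status_for(exc)
    return 201
```

`asyncio.run(demo())` returns `409`. Because the service never imports an HTTP type, the same class can be unit-tested with a fake or mocked repository and reused outside the router.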
## Step 5 — Extract repositories
`compliance/db/repository.py` (1547) and `compliance/db/isms_repository.py` (838) split into:
```
compliance/repositories/
├── dsr_repository.py
├── dsfa_repository.py
├── vvt_repository.py
├── isms_repository.py # <500 LOC, split if needed
└── ...
```
Each repository class:
- Takes `AsyncSession` (or equivalent) in constructor.
- Exposes intent-named methods (`get_pending_for_tenant`, not `select_where`).
- Returns ORM instances or domain VOs. No `Row`.
- No business logic.
Unit-test every repo class against the compose Postgres with a transactional fixture (begin → rollback).
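The begin → rollback pattern can be sketched with the standard library alone. The example below swaps in `sqlite3` for the compose Postgres and a plain connection for `AsyncSession`, purely so it runs anywhere; the table, the `DSRRequest` value object, and the repository shown are hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class DSRRequest:  # hypothetical domain value object returned instead of raw rows
    id: int
    tenant_id: str
    status: str


class DSRRepository:
    """Intent-named methods, no business logic."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add(self, tenant_id: str, status: str) -> None:
        self._conn.execute(
            "INSERT INTO dsr_requests (tenant_id, status) VALUES (?, ?)",
            (tenant_id, status),
        )

    def get_pending_for_tenant(self, tenant_id: str) -> list[DSRRequest]:
        rows = self._conn.execute(
            "SELECT id, tenant_id, status FROM dsr_requests "
            "WHERE tenant_id = ? AND status = 'pending' ORDER BY id",
            (tenant_id,),
        ).fetchall()
        return [DSRRequest(*row) for row in rows]


@contextmanager
def transactional(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Open a transaction for the test body and always roll it back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def demo() -> tuple[int, int]:
    # isolation_level=None puts sqlite3 in autocommit mode so BEGIN/ROLLBACK are ours.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE dsr_requests (id INTEGER PRIMARY KEY, tenant_id TEXT, status TEXT)"
    )
    with transactional(conn) as tx:
        repo = DSRRepository(tx)
        repo.add("tenant-1", "pending")
        repo.add("tenant-1", "closed")
        inside = len(repo.get_pending_for_tenant("tenant-1"))
    after = len(DSRRepository(conn).get_pending_for_tenant("tenant-1"))
    conn.close()
    return inside, after
```

`demo()` returns `(1, 0)`: the test body sees its own writes, and nothing survives the rollback, so tests stay independent of each other and of execution order.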
## Step 6 — mypy --strict on new packages
CI already runs `mypy --strict` against `compliance/{services,repositories,domain,schemas}/`. After every extraction, verify locally:
```bash
mypy --strict --ignore-missing-imports compliance/schemas compliance/repositories compliance/domain compliance/services
```
If you have type errors, fix them in the extracted module. **Do not** add `# type: ignore` blanket waivers. If a third-party lib is poorly typed, add a per-module override (a `[[tool.mypy.overrides]]` table in `pyproject.toml`, or a `[mypy-<package>.*]` section in `mypy.ini`) with a one-line rationale.
## Step 7 — Expand test coverage
- Unit tests per service (mocked repo).
- Integration tests per repository (real db, transactional).
- Contract test stays green.
- Target: 80% coverage on new code. Never decrease the service baseline.
## Step 8 — Guardrail enforcement
After Phase 1 completes, `compliance/db/models.py`, `compliance/db/repository.py`, and `compliance/api/schemas.py` are either re-export shims (≤50 LOC each) or deleted. No file in `backend-compliance/compliance/` exceeds 500 LOC. Run:
```bash
scripts/check-loc.sh backend-compliance/  # run from the repo root
```
Any remaining hard violations → document in `.claude/rules/loc-exceptions.txt` with rationale, or keep splitting.
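For orientation, the soft/hard classification can be expressed in a few lines. This is a sketch in Python, not a port of `scripts/check-loc.sh`: it assumes every physical line counts (blank lines and comments included), that only `*.py` files are scanned, and that exceptions are given as repo-relative POSIX paths. The real script may differ on all three points.

```python
from pathlib import Path

SOFT_CAP = 300  # above this: warning
HARD_CAP = 500  # above this: violation


def classify(line_count: int) -> str:
    """Return 'hard', 'soft', or 'ok' for a file with the given number of lines."""
    if line_count > HARD_CAP:
        return "hard"
    if line_count > SOFT_CAP:
        return "soft"
    return "ok"


def scan(root: Path, exceptions: frozenset[str] = frozenset()) -> dict[str, list[str]]:
    """Collect soft and hard violations under `root`, skipping allowlisted paths."""
    report: dict[str, list[str]] = {"soft": [], "hard": []}
    for path in sorted(root.rglob("*.py")):
        relative = path.relative_to(root).as_posix()
        if relative in exceptions:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        level = classify(line_count)
        if level != "ok":
            report[level].append(relative)
    return report
```

A file of exactly 500 lines is still only a soft violation under this reading, which matches the wording "exceeds 500 LOC".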
## Done when
- `pytest compliance/tests/ tests/ -v` all green.
- `pytest tests/contracts/` green — OpenAPI has no removals, no renames, no new required request fields.
- Coverage ≥ baseline.
- `mypy --strict` clean on new packages.
- `scripts/check-loc.sh backend-compliance/` reports 0 hard violations in new/touched files (legacy allowlisted in `loc-exceptions.txt` only with rationale).
- CI all green on PR.
## Pitfalls
- **Do not change `__tablename__` or column names.** Existing migrations are frozen, so any rename breaks the DB contract.
- **Do not change relationship back_populates / backref strings.** SQLAlchemy resolves these by name at mapper configuration.
- **Do not change route paths or pydantic field names.** Contract test will catch most — but JSON field aliasing (`Field(alias=...)`) is easy to break accidentally.
- **Do not eagerly reformat unrelated code.** Keep the diff reviewable. One PR per major step.
- **Do not bypass the pre-commit hook.** If a file legitimately must be >500 LOC during an intermediate step, squash commits at the end so the final state is clean.


@@ -0,0 +1,55 @@
# backend-compliance
Python/FastAPI service implementing the DSGVO (GDPR) compliance API: DSR, DSFA, consent, controls, risks, evidence, audit, vendor management, ISMS, change requests, document generation.
**Port:** `8002` (container: `bp-compliance-backend`)
**Stack:** Python 3.12, FastAPI, SQLAlchemy 2.x, Alembic, Keycloak auth.
## Architecture (target — Phase 1)
```
compliance/
├── api/ # Routers (thin, ≤30 LOC per handler)
├── services/ # Business logic
├── repositories/ # DB access
├── domain/ # Value objects, domain errors
├── schemas/ # Pydantic models, split per domain
└── db/models/ # SQLAlchemy ORM, one module per aggregate
```
See `../AGENTS.python.md` for the full convention and `../.claude/rules/architecture.md` for the non-negotiable rules.
## Run locally
```bash
cd backend-compliance
pip install -r requirements.txt
export COMPLIANCE_DATABASE_URL=... # Postgres (Hetzner or local)
uvicorn main:app --reload --port 8002
```
## Tests
```bash
pytest compliance/tests/ -v
pytest --cov=compliance --cov-report=term-missing
```
Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`. Contract tests diff `/openapi.json` against `tests/contracts/openapi.baseline.json`.
## Public API surface
404+ endpoints across `/api/v1/*`. Grouped by domain: `ai`, `audit`, `consent`, `dsfa`, `dsr`, `gdpr`, `vendor`, `evidence`, `change-requests`, `generation`, `projects`, `company-profile`, `isms`. Every path is a contract — see the "Public endpoints" rule in the root `CLAUDE.md`.
## Environment
| Var | Purpose |
|-----|---------|
| `COMPLIANCE_DATABASE_URL` | Postgres DSN, `sslmode=require` |
| `KEYCLOAK_*` | Auth verification |
| `QDRANT_URL`, `QDRANT_API_KEY` | Vector search |
| `CORE_VALKEY_URL` | Session cache |
## Don't touch
Database schema, `__tablename__`, column names, existing migrations under `migrations/`. See root `CLAUDE.md` rule 3.


@@ -0,0 +1,30 @@
"""Domain layer: value objects, enums, and domain exceptions.
Pure Python — no FastAPI, no SQLAlchemy, no HTTP concerns. Upper layers depend on
this package; it depends on nothing except the standard library and small libraries
like ``pydantic`` or ``attrs``.
"""
class DomainError(Exception):
    """Base class for all domain-level errors.

    Services raise subclasses of this; the HTTP layer is responsible for mapping
    them to status codes. Never raise ``HTTPException`` from a service.
    """


class NotFoundError(DomainError):
    """Requested entity does not exist."""


class ConflictError(DomainError):
    """Operation conflicts with the current state (e.g. duplicate, stale version)."""


class ValidationError(DomainError):
    """Input failed domain-level validation (beyond what Pydantic catches)."""


# Note: this name shadows the builtin ``PermissionError`` in any module that
# imports it unqualified; prefer ``from compliance import domain`` at call sites.
class PermissionError(DomainError):
    """Caller lacks permission for the operation."""


@@ -0,0 +1,10 @@
"""Repository layer: database access.
Each aggregate gets its own module (e.g. ``dsr_repository.py``) exposing a single
class with intent-named methods. Repositories own SQLAlchemy session usage; they
do not run business logic, and they do not import anything from
``compliance.api`` or ``compliance.services``.
Phase 1 refactor target: ``compliance.db.repository`` (1547 lines) is being
decomposed into per-aggregate modules under this package.
"""


@@ -0,0 +1,11 @@
"""Pydantic schemas, split per domain.
Phase 1 refactor target: the monolithic ``compliance.api.schemas`` module (1899 lines)
is being decomposed into one module per domain under this package. Until every domain
has been migrated, ``compliance.api.schemas`` re-exports from here so existing imports
continue to work unchanged.
New code MUST import from the specific domain module (e.g.
``from compliance.schemas.dsr import DSRRequestCreate``) rather than from
``compliance.api.schemas``.
"""