phase0: add architecture guardrails, CI gates, per-language AGENTS.md
Non-negotiable structural rules that apply to every Claude Code session in
this repo and to every commit, enforced via three defense-in-depth layers:
1. PreToolUse hook in .claude/settings.json blocks any Write/Edit that
would push a file past the 500-line hard cap. Auto-loads for any
Claude session in this repo regardless of who launched it.
2. scripts/githooks/pre-commit (installed via scripts/install-hooks.sh)
enforces the LOC cap, freezes migrations/ unless [migration-approved],
and protects guardrail files unless [guardrail-change] is present.
3. .gitea/workflows/ci.yaml gets loc-budget + guardrail-integrity jobs,
plus mypy --strict on new Python packages, tsc --noEmit on Node
services, and a syft+grype SBOM scan.
Per-language conventions are documented in AGENTS.python.md / AGENTS.go.md /
AGENTS.typescript.md at the repo root — layering (router->service->repo for
Python, hexagonal for Go, colocation for Next.js), tooling baseline, and
explicit "what you may NOT do" lists.
Adds scripts/check-loc.sh (soft 300 / hard 500, reports 205 hard and 161
soft violations in the current codebase) plus .claude/rules/loc-exceptions.txt
(initially empty — the list is designed to shrink over time).
Per-service READMEs for all 10 services + PHASE1_RUNBOOK.md for the
backend-compliance refactor. Skeleton packages (compliance/{domain,
repositories,schemas}) are the landing zone for the clean-arch rewrite that
begins in Phase 1.
CLAUDE.md is prepended with the six non-negotiable rules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
181
backend-compliance/PHASE1_RUNBOOK.md
Normal file
181
backend-compliance/PHASE1_RUNBOOK.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# Phase 1 Runbook — backend-compliance refactor
|
||||
|
||||
This document is the step-by-step execution guide for Phase 1 of the repo refactor plan at `~/.claude/plans/vectorized-purring-barto.md`. It exists because the refactor must be driven from a session that can actually run `pytest` against the service, and every step must be verified green before moving to the next.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.12 venv with `backend-compliance/requirements.txt` installed.
|
||||
- Local Postgres reachable via `COMPLIANCE_DATABASE_URL` (use the compose db).
|
||||
- Existing 48 pytest test files pass from a clean checkout: `pytest compliance/tests/ -v` → all green. **Do not proceed until this is true.**
|
||||
|
||||
## Step 0 — Record the baseline
|
||||
|
||||
```bash
|
||||
cd backend-compliance
|
||||
pytest compliance/tests/ -v --tb=short | tee /tmp/baseline.txt
|
||||
pytest --cov=compliance --cov-report=term | tee /tmp/baseline-coverage.txt
|
||||
python tests/contracts/regenerate_baseline.py # creates openapi.baseline.json
|
||||
git add tests/contracts/openapi.baseline.json
|
||||
git commit -m "phase1: pin OpenAPI baseline before refactor"
|
||||
```
|
||||
|
||||
The baseline file is the contract. From this point forward, `pytest tests/contracts/` MUST stay green.
|
||||
|
||||
## Step 1 — Characterization tests (before any code move)
|
||||
|
||||
For each oversized route file we will refactor, add a happy-path + 1-error-path test **before** touching the source. These are called "characterization tests" and their purpose is to freeze current observable behavior so the refactor cannot change it silently.
|
||||
|
||||
Oversized route files to cover (ordered by size):
|
||||
|
||||
| File | LOC | Endpoints to cover |
|
||||
|---|---:|---|
|
||||
| `compliance/api/isms_routes.py` | 1676 | one happy + one 4xx per route |
|
||||
| `compliance/api/dsr_routes.py` | 1176 | same |
|
||||
| `compliance/api/vvt_routes.py` | *N* | same |
|
||||
| `compliance/api/dsfa_routes.py` | *N* | same |
|
||||
| `compliance/api/tom_routes.py` | *N* | same |
|
||||
| `compliance/api/schemas.py` | 1899 | N/A (covered transitively) |
|
||||
| `compliance/db/models.py` | 1466 | N/A (covered by existing + route tests) |
|
||||
| `compliance/db/repository.py` | 1547 | add unit tests per repo class as they are extracted |
|
||||
|
||||
Use `httpx.AsyncClient` + factory fixtures; see `AGENTS.python.md`. Place under `tests/integration/test_<domain>_contract.py`.
|
||||
|
||||
Commit: `phase1: characterization tests for <domain> routes`.
|
||||
|
||||
## Step 2 — Split `compliance/db/models.py` (1466 → <500 per file)
|
||||
|
||||
⚠️ **Atomic step.** A `compliance/db/models/` package CANNOT coexist with the existing `compliance/db/models.py` module — Python's import system shadows the module with the package, breaking every `from compliance.db.models import X` call. The directory skeleton was intentionally NOT pre-created for this reason. Do the following in **one commit**:
|
||||
|
||||
1. Create `compliance/db/models/` directory with `__init__.py` (re-export shim — see template below).
|
||||
2. Move aggregate model classes into `compliance/db/models/<aggregate>.py` modules.
|
||||
3. Delete the old `compliance/db/models.py` file in the same commit.
|
||||
|
||||
Strategy uses a **re-export shim** so no import sites change:
|
||||
|
||||
1. For each aggregate, create `compliance/db/models/<aggregate>.py` containing the model classes. Copy verbatim; do not rename `__tablename__`, columns, or relationship strings.
|
||||
2. Aggregate suggestions (verify by reading `models.py`):
|
||||
- `dsr.py` (DSR requests, exports)
|
||||
- `dsfa.py`
|
||||
- `vvt.py`
|
||||
- `tom.py`
|
||||
- `ai.py` (AI systems, compliance checks)
|
||||
- `consent.py`
|
||||
- `evidence.py`
|
||||
- `vendor.py`
|
||||
- `audit.py`
|
||||
- `policy.py`
|
||||
- `project.py`
|
||||
3. After every aggregate is moved, replace `compliance/db/models.py` with:
|
||||
```python
|
||||
"""Re-export shim — see compliance.db.models package."""
|
||||
from compliance.db.models.dsr import * # noqa: F401,F403
|
||||
from compliance.db.models.dsfa import * # noqa: F401,F403
|
||||
# ... one per module
|
||||
```
|
||||
This keeps `from compliance.db.models import XYZ` working everywhere it's used today.
|
||||
4. Run `pytest` after every move. Green → commit. Red → revert that move and investigate.
|
||||
5. Existing aggregate-level files (`compliance/db/dsr_models.py`, `vvt_models.py`, `tom_models.py`, etc.) should be folded into the new `compliance/db/models/` package in the same pass — do not leave two parallel naming conventions.
|
||||
|
||||
**Do not** add `__init__.py` star-imports that change `Base.metadata` discovery order. Alembic's autogenerate depends on it. Verify via: `alembic check` if the env is set up.
|
||||
|
||||
## Step 3 — Split `compliance/api/schemas.py` (1899 → per domain)
|
||||
|
||||
Mirror the models split:
|
||||
|
||||
1. For each domain, create `compliance/schemas/<domain>.py` with the Pydantic models.
|
||||
2. Replace `compliance/api/schemas.py` with a re-export shim.
|
||||
3. Keep `Create`/`Update`/`Read` variants separated; do not merge them into unions.
|
||||
4. Run `pytest` + contract test after each domain. Green → commit.
|
||||
|
||||
## Step 4 — Extract services (router → service delegation)
|
||||
|
||||
For each route file > 500 LOC, pull handler bodies into a service class under `compliance/services/<domain>_service.py` (new-style domain services, not the utility `compliance/services/` modules that already exist — consider renaming those to `compliance/services/_legacy/` if collisions arise).
|
||||
|
||||
Router handlers become:
|
||||
|
||||
```python
|
||||
@router.post("/dsr/requests", response_model=DSRRequestRead, status_code=201)
|
||||
async def create_dsr_request(
|
||||
payload: DSRRequestCreate,
|
||||
service: DSRService = Depends(get_dsr_service),
|
||||
tenant_id: UUID = Depends(get_tenant_id),
|
||||
) -> DSRRequestRead:
|
||||
try:
|
||||
return await service.create(tenant_id, payload)
|
||||
except ConflictError as exc:
|
||||
raise HTTPException(409, str(exc)) from exc
|
||||
except NotFoundError as exc:
|
||||
raise HTTPException(404, str(exc)) from exc
|
||||
```
|
||||
|
||||
Rules:
|
||||
- Handler body ≤ 30 LOC.
|
||||
- Service raises domain errors (`compliance.domain`), never `HTTPException`.
|
||||
- Inject service via `Depends` on a factory that wires the repository.
|
||||
|
||||
Run tests after each router is thinned. Contract test must stay green.
|
||||
|
||||
## Step 5 — Extract repositories
|
||||
|
||||
`compliance/db/repository.py` (1547) and `compliance/db/isms_repository.py` (838) split into:
|
||||
|
||||
```
|
||||
compliance/repositories/
|
||||
├── dsr_repository.py
|
||||
├── dsfa_repository.py
|
||||
├── vvt_repository.py
|
||||
├── isms_repository.py # <500 LOC, split if needed
|
||||
└── ...
|
||||
```
|
||||
|
||||
Each repository class:
|
||||
- Takes `AsyncSession` (or equivalent) in constructor.
|
||||
- Exposes intent-named methods (`get_pending_for_tenant`, not `select_where`).
|
||||
- Returns ORM instances or domain VOs. No `Row`.
|
||||
- No business logic.
|
||||
|
||||
Unit-test every repo class against the compose Postgres with a transactional fixture (begin → rollback).
|
||||
|
||||
## Step 6 — mypy --strict on new packages
|
||||
|
||||
CI already runs `mypy --strict` against `compliance/{services,repositories,domain,schemas}/`. After every extraction, verify locally:
|
||||
|
||||
```bash
|
||||
mypy --strict --ignore-missing-imports compliance/schemas compliance/repositories compliance/domain compliance/services
|
||||
```
|
||||
|
||||
If you have type errors, fix them in the extracted module. **Do not** add `# type: ignore` blanket waivers. If a third-party lib is poorly typed, add it to `[mypy.overrides]` in `pyproject.toml`/`mypy.ini` with a one-line rationale.
|
||||
|
||||
## Step 7 — Expand test coverage
|
||||
|
||||
- Unit tests per service (mocked repo).
|
||||
- Integration tests per repository (real db, transactional).
|
||||
- Contract test stays green.
|
||||
- Target: 80% coverage on new code. Never decrease the service baseline.
|
||||
|
||||
## Step 8 — Guardrail enforcement
|
||||
|
||||
After Phase 1 completes, `compliance/db/models.py`, `compliance/db/repository.py`, and `compliance/api/schemas.py` are either re-export shims (≤50 LOC each) or deleted. No file in `backend-compliance/compliance/` exceeds 500 LOC. Run:
|
||||
|
||||
```bash
|
||||
../scripts/check-loc.sh backend-compliance/
|
||||
```
|
||||
|
||||
Any remaining hard violations → document in `.claude/rules/loc-exceptions.txt` with rationale, or keep splitting.
|
||||
|
||||
## Done when
|
||||
|
||||
- `pytest compliance/tests/ tests/ -v` all green.
|
||||
- `pytest tests/contracts/` green — OpenAPI has no removals, no renames, no new required request fields.
|
||||
- Coverage ≥ baseline.
|
||||
- `mypy --strict` clean on new packages.
|
||||
- `scripts/check-loc.sh backend-compliance/` reports 0 hard violations in new/touched files (legacy allowlisted in `loc-exceptions.txt` only with rationale).
|
||||
- CI all green on PR.
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- **Do not change `__tablename__` or column names.** Even a rename breaks the DB contract.
|
||||
- **Do not change relationship back_populates / backref strings.** SQLAlchemy resolves these by name at mapper configuration.
|
||||
- **Do not change route paths or pydantic field names.** Contract test will catch most — but JSON field aliasing (`Field(alias=...)`) is easy to break accidentally.
|
||||
- **Do not eagerly reformat unrelated code.** Keep the diff reviewable. One PR per major step.
|
||||
- **Do not bypass the pre-commit hook.** If a file legitimately must be >500 LOC during an intermediate step, squash commits at the end so the final state is clean.
|
||||
55
backend-compliance/README.md
Normal file
55
backend-compliance/README.md
Normal file
@@ -0,0 +1,55 @@
|
||||
# backend-compliance
|
||||
|
||||
Python/FastAPI service implementing the DSGVO compliance API: DSR, DSFA, consent, controls, risks, evidence, audit, vendor management, ISMS, change requests, document generation.
|
||||
|
||||
**Port:** `8002` (container: `bp-compliance-backend`)
|
||||
**Stack:** Python 3.12, FastAPI, SQLAlchemy 2.x, Alembic, Keycloak auth.
|
||||
|
||||
## Architecture (target — Phase 1)
|
||||
|
||||
```
|
||||
compliance/
|
||||
├── api/ # Routers (thin, ≤30 LOC per handler)
|
||||
├── services/ # Business logic
|
||||
├── repositories/ # DB access
|
||||
├── domain/ # Value objects, domain errors
|
||||
├── schemas/ # Pydantic models, split per domain
|
||||
└── db/models/ # SQLAlchemy ORM, one module per aggregate
|
||||
```
|
||||
|
||||
See `../AGENTS.python.md` for the full convention and `../.claude/rules/architecture.md` for the non-negotiable rules.
|
||||
|
||||
## Run locally
|
||||
|
||||
```bash
|
||||
cd backend-compliance
|
||||
pip install -r requirements.txt
|
||||
export COMPLIANCE_DATABASE_URL=... # Postgres (Hetzner or local)
|
||||
uvicorn main:app --reload --port 8002
|
||||
```
|
||||
|
||||
## Tests
|
||||
|
||||
```bash
|
||||
pytest compliance/tests/ -v
|
||||
pytest --cov=compliance --cov-report=term-missing
|
||||
```
|
||||
|
||||
Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`. Contract tests diff `/openapi.json` against `tests/contracts/openapi.baseline.json`.
|
||||
|
||||
## Public API surface
|
||||
|
||||
404+ endpoints across `/api/v1/*`. Grouped by domain: `ai`, `audit`, `consent`, `dsfa`, `dsr`, `gdpr`, `vendor`, `evidence`, `change-requests`, `generation`, `projects`, `company-profile`, `isms`. Every path is a contract — see the "Public endpoints" rule in the root `CLAUDE.md`.
|
||||
|
||||
## Environment
|
||||
|
||||
| Var | Purpose |
|
||||
|-----|---------|
|
||||
| `COMPLIANCE_DATABASE_URL` | Postgres DSN, `sslmode=require` |
|
||||
| `KEYCLOAK_*` | Auth verification |
|
||||
| `QDRANT_URL`, `QDRANT_API_KEY` | Vector search |
|
||||
| `CORE_VALKEY_URL` | Session cache |
|
||||
|
||||
## Don't touch
|
||||
|
||||
Database schema, `__tablename__`, column names, existing migrations under `migrations/`. See root `CLAUDE.md` rule 3.
|
||||
30
backend-compliance/compliance/domain/__init__.py
Normal file
30
backend-compliance/compliance/domain/__init__.py
Normal file
@@ -0,0 +1,30 @@
|
||||
"""Domain layer: value objects, enums, and domain exceptions.
|
||||
|
||||
Pure Python — no FastAPI, no SQLAlchemy, no HTTP concerns. Upper layers depend on
|
||||
this package; it depends on nothing except the standard library and small libraries
|
||||
like ``pydantic`` or ``attrs``.
|
||||
"""
|
||||
|
||||
|
||||
class DomainError(Exception):
|
||||
"""Base class for all domain-level errors.
|
||||
|
||||
Services raise subclasses of this; the HTTP layer is responsible for mapping
|
||||
them to status codes. Never raise ``HTTPException`` from a service.
|
||||
"""
|
||||
|
||||
|
||||
class NotFoundError(DomainError):
|
||||
"""Requested entity does not exist."""
|
||||
|
||||
|
||||
class ConflictError(DomainError):
|
||||
"""Operation conflicts with the current state (e.g. duplicate, stale version)."""
|
||||
|
||||
|
||||
class ValidationError(DomainError):
|
||||
"""Input failed domain-level validation (beyond what Pydantic catches)."""
|
||||
|
||||
|
||||
class PermissionError(DomainError):
|
||||
"""Caller lacks permission for the operation."""
|
||||
10
backend-compliance/compliance/repositories/__init__.py
Normal file
10
backend-compliance/compliance/repositories/__init__.py
Normal file
@@ -0,0 +1,10 @@
|
||||
"""Repository layer: database access.
|
||||
|
||||
Each aggregate gets its own module (e.g. ``dsr_repository.py``) exposing a single
|
||||
class with intent-named methods. Repositories own SQLAlchemy session usage; they
|
||||
do not run business logic, and they do not import anything from
|
||||
``compliance.api`` or ``compliance.services``.
|
||||
|
||||
Phase 1 refactor target: ``compliance.db.repository`` (1547 lines) is being
|
||||
decomposed into per-aggregate modules under this package.
|
||||
"""
|
||||
11
backend-compliance/compliance/schemas/__init__.py
Normal file
11
backend-compliance/compliance/schemas/__init__.py
Normal file
@@ -0,0 +1,11 @@
|
||||
"""Pydantic schemas, split per domain.
|
||||
|
||||
Phase 1 refactor target: the monolithic ``compliance.api.schemas`` module (1899 lines)
|
||||
is being decomposed into one module per domain under this package. Until every domain
|
||||
has been migrated, ``compliance.api.schemas`` re-exports from here so existing imports
|
||||
continue to work unchanged.
|
||||
|
||||
New code MUST import from the specific domain module (e.g.
|
||||
``from compliance.schemas.dsr import DSRRequestCreate``) rather than from
|
||||
``compliance.api.schemas``.
|
||||
"""
|
||||
Reference in New Issue
Block a user