Squash of branch refactor/phase0-guardrails-and-models-split — 4 commits,
81 files, 173/173 pytest green, OpenAPI contract preserved (360 paths /
484 operations).
## Phase 0 — Architecture guardrails
Three defense-in-depth layers to keep the architecture rules enforced
regardless of who opens Claude Code in this repo:
1. .claude/settings.json PreToolUse hook on Write/Edit blocks any file
that would exceed the 500-line hard cap. Auto-loads in every Claude
session in this repo.
2. scripts/githooks/pre-commit (install via scripts/install-hooks.sh)
enforces the LOC cap locally, freezes migrations/ without
[migration-approved], and protects guardrail files without
[guardrail-change].
3. .gitea/workflows/ci.yaml gains loc-budget + guardrail-integrity +
sbom-scan (syft+grype) jobs, adds mypy --strict for the new Python
packages (compliance/{services,repositories,domain,schemas}), and
tsc --noEmit for admin-compliance + developer-portal.
Per-language conventions documented in AGENTS.python.md, AGENTS.go.md,
AGENTS.typescript.md at the repo root — layering, tooling, and explicit
"what you may NOT do" lists. Root CLAUDE.md is prepended with the six
non-negotiable rules. Each of the 10 services gets a README.md.
scripts/check-loc.sh enforces soft 300 / hard 500 and surfaces the
current baseline of 205 hard + 161 soft violations so Phases 1-4 can
drain it incrementally. CI gates only CHANGED files in PRs so the
legacy baseline does not block unrelated work.
## Deprecation sweep
47 files. Pydantic V1 regex= -> pattern= (2 sites), class Config ->
ConfigDict in source_policy_router.py (schemas.py intentionally skipped;
it is the Phase 1 Step 3 split target). datetime.utcnow() ->
datetime.now(timezone.utc) everywhere including SQLAlchemy default=
callables. All DB columns already declare timezone=True, so this is a
latent-bug fix at the Python side, not a schema change.
DeprecationWarning count dropped from 158 to 35.
## Phase 1 Step 1 — Contract test harness
tests/contracts/test_openapi_baseline.py diffs the live FastAPI /openapi.json
against tests/contracts/openapi.baseline.json on every test run. Fails on
removed paths, removed status codes, or new required request body fields.
Regenerate only via tests/contracts/regenerate_baseline.py after a
consumer-updated contract change. This is the safety harness for all
subsequent refactor commits.
## Phase 1 Step 2 — models.py split (1466 -> 85 LOC shim)
compliance/db/models.py is decomposed into seven sibling aggregate modules
following the existing repo pattern (dsr_models.py, vvt_models.py, ...):
regulation_models.py (134) — Regulation, Requirement
control_models.py (279) — Control, Mapping, Evidence, Risk
ai_system_models.py (141) — AISystem, AuditExport
service_module_models.py (176) — ServiceModule, ModuleRegulation, ModuleRisk
audit_session_models.py (177) — AuditSession, AuditSignOff
isms_governance_models.py (323) — ISMSScope, Context, Policy, Objective, SoA
isms_audit_models.py (468) — Finding, CAPA, MgmtReview, InternalAudit,
AuditTrail, Readiness
models.py becomes an 85-line re-export shim in dependency order so
existing imports continue to work unchanged. Schema is byte-identical:
__tablename__, column definitions, relationship strings, back_populates,
cascade directives all preserved.
All new sibling files are under the 500-line hard cap; largest is
isms_audit_models.py at 468. No file in compliance/db/ now exceeds
the hard cap.
## Phase 1 Step 3 — infrastructure only
backend-compliance/compliance/{schemas,domain,repositories}/ packages
are created as landing zones with docstrings. compliance/domain/
exports DomainError / NotFoundError / ConflictError / ValidationError /
PermissionError — the base classes services will use to raise
domain-level errors instead of HTTPException.
PHASE1_RUNBOOK.md at backend-compliance/PHASE1_RUNBOOK.md documents
the nine-step execution plan for Phase 1: snapshot baseline,
characterization tests, split models.py (this commit), split schemas.py
(next), extract services, extract repositories, mypy --strict, coverage.
## Verification
backend-compliance/.venv-phase1: uv python install 3.12 + pip -r requirements.txt
PYTHONPATH=. pytest compliance/tests/ tests/contracts/
-> 173 passed, 0 failed, 35 warnings, OpenAPI 360/484 unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
182 lines
8.7 KiB
Markdown
182 lines
8.7 KiB
Markdown
# Phase 1 Runbook — backend-compliance refactor
|
|
|
|
This document is the step-by-step execution guide for Phase 1 of the repo refactor plan at `~/.claude/plans/vectorized-purring-barto.md`. It exists because the refactor must be driven from a session that can actually run `pytest` against the service, and every step must be verified green before moving to the next.
|
|
|
|
## Prerequisites
|
|
|
|
- Python 3.12 venv with `backend-compliance/requirements.txt` installed.
|
|
- Local Postgres reachable via `COMPLIANCE_DATABASE_URL` (use the compose db).
|
|
- Existing 48 pytest test files pass from a clean checkout: `pytest compliance/tests/ -v` → all green. **Do not proceed until this is true.**
|
|
|
|
## Step 0 — Record the baseline
|
|
|
|
```bash
|
|
cd backend-compliance
|
|
pytest compliance/tests/ -v --tb=short | tee /tmp/baseline.txt
|
|
pytest --cov=compliance --cov-report=term | tee /tmp/baseline-coverage.txt
|
|
python tests/contracts/regenerate_baseline.py # creates openapi.baseline.json
|
|
git add tests/contracts/openapi.baseline.json
|
|
git commit -m "phase1: pin OpenAPI baseline before refactor"
|
|
```
|
|
|
|
The baseline file is the contract. From this point forward, `pytest tests/contracts/` MUST stay green.
|
|
|
|
## Step 1 — Characterization tests (before any code move)
|
|
|
|
For each oversized route file we will refactor, add a happy-path + 1-error-path test **before** touching the source. These are called "characterization tests" and their purpose is to freeze current observable behavior so the refactor cannot change it silently.
|
|
|
|
Oversized route files to cover (ordered by size):
|
|
|
|
| File | LOC | Endpoints to cover |
|
|
|---|---:|---|
|
|
| `compliance/api/isms_routes.py` | 1676 | one happy + one 4xx per route |
|
|
| `compliance/api/dsr_routes.py` | 1176 | same |
|
|
| `compliance/api/vvt_routes.py` | *N* | same |
|
|
| `compliance/api/dsfa_routes.py` | *N* | same |
|
|
| `compliance/api/tom_routes.py` | *N* | same |
|
|
| `compliance/api/schemas.py` | 1899 | N/A (covered transitively) |
|
|
| `compliance/db/models.py` | 1466 | N/A (covered by existing + route tests) |
|
|
| `compliance/db/repository.py` | 1547 | add unit tests per repo class as they are extracted |
|
|
|
|
Use `httpx.AsyncClient` + factory fixtures; see `AGENTS.python.md`. Place under `tests/integration/test_<domain>_contract.py`.
|
|
|
|
Commit: `phase1: characterization tests for <domain> routes`.
|
|
|
|
## Step 2 — Split `compliance/db/models.py` (1466 → <500 per file)
|
|
|
|
⚠️ **Atomic step.** A `compliance/db/models/` package CANNOT coexist with the existing `compliance/db/models.py` module — Python's import system shadows the module with the package, breaking every `from compliance.db.models import X` call. The directory skeleton was intentionally NOT pre-created for this reason. Do the following in **one commit**:
|
|
|
|
1. Create `compliance/db/models/` directory with `__init__.py` (re-export shim — see template below).
|
|
2. Move aggregate model classes into `compliance/db/models/<aggregate>.py` modules.
|
|
3. Delete the old `compliance/db/models.py` file in the same commit.
|
|
|
|
Strategy uses a **re-export shim** so no import sites change:
|
|
|
|
1. For each aggregate, create `compliance/db/models/<aggregate>.py` containing the model classes. Copy verbatim; do not rename `__tablename__`, columns, or relationship strings.
|
|
2. Aggregate suggestions (verify by reading `models.py`):
|
|
- `dsr.py` (DSR requests, exports)
|
|
- `dsfa.py`
|
|
- `vvt.py`
|
|
- `tom.py`
|
|
- `ai.py` (AI systems, compliance checks)
|
|
- `consent.py`
|
|
- `evidence.py`
|
|
- `vendor.py`
|
|
- `audit.py`
|
|
- `policy.py`
|
|
- `project.py`
|
|
3. After every aggregate is moved, replace `compliance/db/models.py` with:
|
|
```python
|
|
"""Re-export shim — see compliance.db.models package."""
|
|
from compliance.db.models.dsr import * # noqa: F401,F403
|
|
from compliance.db.models.dsfa import * # noqa: F401,F403
|
|
# ... one per module
|
|
```
|
|
This keeps `from compliance.db.models import XYZ` working everywhere it's used today.
|
|
4. Run `pytest` after every move. Green → commit. Red → revert that move and investigate.
|
|
5. Existing aggregate-level files (`compliance/db/dsr_models.py`, `vvt_models.py`, `tom_models.py`, etc.) should be folded into the new `compliance/db/models/` package in the same pass — do not leave two parallel naming conventions.
|
|
|
|
**Do not** add `__init__.py` star-imports that change `Base.metadata` discovery order. Alembic's autogenerate depends on it. Verify via: `alembic check` if the env is set up.
|
|
|
|
## Step 3 — Split `compliance/api/schemas.py` (1899 → per domain)
|
|
|
|
Mirror the models split:
|
|
|
|
1. For each domain, create `compliance/schemas/<domain>.py` with the Pydantic models.
|
|
2. Replace `compliance/api/schemas.py` with a re-export shim.
|
|
3. Keep `Create`/`Update`/`Read` variants separated; do not merge them into unions.
|
|
4. Run `pytest` + contract test after each domain. Green → commit.
|
|
|
|
## Step 4 — Extract services (router → service delegation)
|
|
|
|
For each route file > 500 LOC, pull handler bodies into a service class under `compliance/services/<domain>_service.py` (new-style domain services, not the utility `compliance/services/` modules that already exist — consider renaming those to `compliance/services/_legacy/` if collisions arise).
|
|
|
|
Router handlers become:
|
|
|
|
```python
|
|
@router.post("/dsr/requests", response_model=DSRRequestRead, status_code=201)
|
|
async def create_dsr_request(
|
|
payload: DSRRequestCreate,
|
|
service: DSRService = Depends(get_dsr_service),
|
|
tenant_id: UUID = Depends(get_tenant_id),
|
|
) -> DSRRequestRead:
|
|
try:
|
|
return await service.create(tenant_id, payload)
|
|
except ConflictError as exc:
|
|
raise HTTPException(409, str(exc)) from exc
|
|
except NotFoundError as exc:
|
|
raise HTTPException(404, str(exc)) from exc
|
|
```
|
|
|
|
Rules:
|
|
- Handler body ≤ 30 LOC.
|
|
- Service raises domain errors (`compliance.domain`), never `HTTPException`.
|
|
- Inject service via `Depends` on a factory that wires the repository.
|
|
|
|
Run tests after each router is thinned. Contract test must stay green.
|
|
|
|
## Step 5 — Extract repositories
|
|
|
|
`compliance/db/repository.py` (1547) and `compliance/db/isms_repository.py` (838) split into:
|
|
|
|
```
|
|
compliance/repositories/
|
|
├── dsr_repository.py
|
|
├── dsfa_repository.py
|
|
├── vvt_repository.py
|
|
├── isms_repository.py # <500 LOC, split if needed
|
|
└── ...
|
|
```
|
|
|
|
Each repository class:
|
|
- Takes `AsyncSession` (or equivalent) in constructor.
|
|
- Exposes intent-named methods (`get_pending_for_tenant`, not `select_where`).
|
|
- Returns ORM instances or domain VOs. No `Row`.
|
|
- No business logic.
|
|
|
|
Unit-test every repo class against the compose Postgres with a transactional fixture (begin → rollback).
|
|
|
|
## Step 6 — mypy --strict on new packages
|
|
|
|
CI already runs `mypy --strict` against `compliance/{services,repositories,domain,schemas}/`. After every extraction, verify locally:
|
|
|
|
```bash
|
|
mypy --strict --ignore-missing-imports compliance/schemas compliance/repositories compliance/domain compliance/services
|
|
```
|
|
|
|
If you have type errors, fix them in the extracted module. **Do not** add `# type: ignore` blanket waivers. If a third-party lib is poorly typed, add it to `[mypy.overrides]` in `pyproject.toml`/`mypy.ini` with a one-line rationale.
|
|
|
|
## Step 7 — Expand test coverage
|
|
|
|
- Unit tests per service (mocked repo).
|
|
- Integration tests per repository (real db, transactional).
|
|
- Contract test stays green.
|
|
- Target: 80% coverage on new code. Never decrease the service baseline.
|
|
|
|
## Step 8 — Guardrail enforcement
|
|
|
|
After Phase 1 completes, `compliance/db/models.py`, `compliance/db/repository.py`, and `compliance/api/schemas.py` are either re-export shims (≤50 LOC each) or deleted. No file in `backend-compliance/compliance/` exceeds 500 LOC. Run:
|
|
|
|
```bash
|
|
../scripts/check-loc.sh backend-compliance/
|
|
```
|
|
|
|
Any remaining hard violations → document in `.claude/rules/loc-exceptions.txt` with rationale, or keep splitting.
|
|
|
|
## Done when
|
|
|
|
- `pytest compliance/tests/ tests/ -v` all green.
|
|
- `pytest tests/contracts/` green — OpenAPI has no removals, no renames, no new required request fields.
|
|
- Coverage ≥ baseline.
|
|
- `mypy --strict` clean on new packages.
|
|
- `scripts/check-loc.sh backend-compliance/` reports 0 hard violations in new/touched files (legacy allowlisted in `loc-exceptions.txt` only with rationale).
|
|
- CI all green on PR.
|
|
|
|
## Pitfalls
|
|
|
|
- **Do not change `__tablename__` or column names.** Even a rename breaks the DB contract.
|
|
- **Do not change relationship back_populates / backref strings.** SQLAlchemy resolves these by name at mapper configuration.
|
|
- **Do not change route paths or pydantic field names.** Contract test will catch most — but JSON field aliasing (`Field(alias=...)`) is easy to break accidentally.
|
|
- **Do not eagerly reformat unrelated code.** Keep the diff reviewable. One PR per major step.
|
|
- **Do not bypass the pre-commit hook.** If a file legitimately must be >500 LOC during an intermediate step, squash commits at the end so the final state is clean.
|