diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 724410e..89d4317 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -12,6 +12,33 @@ +## First-Time Setup & Claude Code Onboarding + +**For humans:** Read this CLAUDE.md top to bottom before your first commit. Then read `AGENTS..md` for the service you are working on (`AGENTS.python.md`, `AGENTS.go.md`, or `AGENTS.typescript.md`). + +**For Claude Code sessions — things that cause first-commit failures:** + +1. **Wrong branch.** Run `git branch --show-current` before touching any file. The answer must be `coolify`. If it is `main`, run `git checkout coolify` before proceeding. + +2. **PreToolUse hook blocks your write.** The `PreToolUse` hooks in `.claude/settings.json` will reject Write/Edit operations on any file that would push its line count past 500. This is intentional — split the file into smaller modules instead of trying to bypass the hook. + +3. **Missing `[guardrail-change]` marker.** The `guardrail-integrity` CI job fails if you modify a guardrail file without the marker in the commit message body. See the table below. + +4. **Never `git add -A` or `git add .`.** Stage files individually by path. `git add -A` risks committing `.env`, `node_modules/`, `.next/`, compiled binaries, and other artifacts that must never enter the repo. + +5. **LOC check before push.** After any session, run `bash scripts/check-loc.sh`. It must exit 0 before you push. The git pre-commit hook runs this automatically, but run it manually first to catch issues early. 
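The line-count projection behind item 2 can be sketched in Python. This is only an illustration of the arithmetic, assuming the hook compares "current lines minus replaced lines plus inserted lines" against the 500-line cap; the function names `count_lines`, `projected_line_count`, and `would_breach_cap` are hypothetical and do not exist in the repo, where the real check is a shell command in `.claude/settings.json`.

```python
HARD_CAP = 500  # hard cap stated in this document


def count_lines(text: str) -> int:
    """Count lines so that a final line without a trailing newline still counts."""
    if not text:
        return 0
    return text.count("\n") + (0 if text.endswith("\n") else 1)


def projected_line_count(current_lines: int, old_string: str, new_string: str) -> int:
    """Estimate the file length after an edit swaps old_string for new_string."""
    return current_lines - count_lines(old_string) + count_lines(new_string)


def would_breach_cap(current_lines: int, old_string: str, new_string: str) -> bool:
    """True when the projected length exceeds the hard cap (hypothetical helper)."""
    return projected_line_count(current_lines, old_string, new_string) > HARD_CAP
```

Under these assumptions, replacing one line with four lines in a 498-line file projects to 501 lines and would be rejected, while replacing it with three lines lands exactly on 500 and passes.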
+ +### Commit message quick reference + +| Marker | Required when touching | +|--------|----------------------| +| `[guardrail-change]` | `.claude/settings.json`, `scripts/check-loc.sh`, `scripts/githooks/pre-commit`, `.claude/rules/loc-exceptions.txt`, any `AGENTS.*.md` | +| `[migration-approved]` | Anything under `migrations/` or `alembic/versions/` | + +Add the marker anywhere in the commit message body or footer — the CI job does a plain-text grep for it. + +--- + ## Entwicklungsumgebung (WICHTIG - IMMER ZUERST LESEN) ### Zwei-Rechner-Setup + Coolify diff --git a/.gitignore b/.gitignore index a8dffe4..4f44fe2 100644 --- a/.gitignore +++ b/.gitignore @@ -11,6 +11,10 @@ secrets/ # Node node_modules/ .next/ +dist/ +.turbo/ +pnpm-lock.yaml +.pnpm-store/ # Python __pycache__/ diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..92edbfd --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,203 @@ +# Contributing to breakpilot-compliance + +--- + +## 1. Getting Started + +```bash +git clone https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance.git +cd breakpilot-compliance +git checkout coolify # always base work off coolify, NOT main +``` + +**Branch conventions** (branch from `coolify`): + +| Prefix | Use for | +|--------|---------| +| `feature/` | New functionality | +| `fix/` | Bug fixes | +| `chore/` | Tooling, deps, CI, docs | + +Example: `git checkout -b feature/ai-sdk-risk-scoring` + +--- + +## 2. Dev Environment + +Each service runs independently. Start only what you need. 
+ +**Go — ai-compliance-sdk** +```bash +cd ai-compliance-sdk +go run ./cmd/server +``` + +**Python — backend-compliance** +```bash +cd backend-compliance +pip install -r requirements.txt +uvicorn main:app --reload +``` + +**Python — dsms-gateway / document-crawler / compliance-tts-service** +```bash +cd +pip install -r requirements.txt +uvicorn main:app --reload --port +``` + +**Node.js — admin-compliance** +```bash +cd admin-compliance +npm install +npm run dev # http://localhost:3007 +``` + +**Node.js — developer-portal** +```bash +cd developer-portal +npm install +npm run dev # http://localhost:3006 +``` + +**All services together (local Docker)** +```bash +docker compose up -d +``` + +Config lives in `.env` (not committed). Copy `.env.example` and fill in `COMPLIANCE_DATABASE_URL`, `QDRANT_URL`, `QDRANT_API_KEY`, and Vault tokens. + +--- + +## 3. Before Your First Commit + +Run all of these locally. CI will run the same checks and fail if they don't pass. + +**LOC budget (mandatory)** +```bash +bash scripts/check-loc.sh # must exit 0 +``` + +**Go lint** +```bash +cd ai-compliance-sdk +golangci-lint run --timeout 5m ./... +``` + +**Python lint** +```bash +cd backend-compliance +ruff check . +mypy compliance/ # only if mypy.ini exists +``` + +**TypeScript type-check** +```bash +cd admin-compliance +npx tsc --noEmit +``` + +**Tests** +```bash +# Go +cd ai-compliance-sdk && go test ./... + +# Python backend +cd backend-compliance && pytest + +# DSMS gateway +cd dsms-gateway && pytest test_main.py +``` + +If any step fails, fix it before committing. The git pre-commit hook re-runs `check-loc.sh` automatically. + +--- + +## 4. Commit Message Rules + +Use [Conventional Commits](https://www.conventionalcommits.org/) style: + +``` +(): + +[optional body] +[optional footer] +``` + +Types: `feat`, `fix`, `chore`, `refactor`, `test`, `docs`, `ci`. 
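As a rough sketch of the header rule above, the following Python check validates the first line of a commit message against the type list. It assumes the public Conventional Commits shape `type(scope): subject` with an optional scope; the regex, the `ALLOWED_TYPES` tuple, and the `header_problem` helper are illustrative and not part of this repository's tooling.

```python
import re
from typing import Optional

# Types listed in this section of CONTRIBUTING.md.
ALLOWED_TYPES = ("feat", "fix", "chore", "refactor", "test", "docs", "ci")

# Assumed header shape: type, optional (scope), colon, space, non-empty subject.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?: (?P<subject>\S.*)$"
)


def header_problem(message: str) -> Optional[str]:
    """Return a description of what is wrong with the header, or None if it is fine."""
    header = message.splitlines()[0] if message else ""
    match = HEADER_RE.match(header)
    if match is None:
        return "header does not match 'type(scope): subject'"
    if match.group("type") not in ALLOWED_TYPES:
        return f"unknown type '{match.group('type')}'"
    return None
```

A check like this only inspects the header line; markers such as `[guardrail-change]` live in the body or footer and need a separate check.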
+ +### `[guardrail-change]` marker — REQUIRED + +Add `[guardrail-change]` anywhere in the commit message body (or footer) when your changeset touches **any** of these files: + +| File / path | Reason protected | +|-------------|-----------------| +| `.claude/settings.json` | PreToolUse/PostToolUse hooks | +| `scripts/check-loc.sh` | LOC enforcement script | +| `scripts/githooks/pre-commit` | Git hook | +| `.claude/rules/loc-exceptions.txt` | Exception registry | +| `AGENTS.*.md` (any) | Per-language architecture rules | + +The `guardrail-integrity` CI job checks for this marker and **fails the build** if it is missing. + +**Valid guardrail commit example:** +``` +chore(guardrail): add exception for generated protobuf file + +proto/generated/compliance.pb.go exceeds 500 LOC because it is +machine-generated and cannot be split. Added to loc-exceptions.txt +with rationale. + +[guardrail-change] +``` + +--- + +## 5. Architecture Rules (Non-Negotiable) + +### File budget +- **500 LOC hard cap** on every non-test, non-generated source file. +- The `PreToolUse` hook in `.claude/settings.json` blocks Claude Code from creating or editing files that would breach this limit. +- Exceptions require a written rationale in `.claude/rules/loc-exceptions.txt` plus `[guardrail-change]` in the commit. + +### Clean architecture per service +- Python (FastAPI): `api → services → repositories → db.models`. Handlers ≤ 30 LOC. See `AGENTS.python.md`. +- Go (Gin): Standard Go Project Layout + hexagonal. `cmd/` is thin wiring. See `AGENTS.go.md`. +- TypeScript (Next.js 15): server-first, push client boundary deep, colocate `_components/` + `_hooks/` per route. See `AGENTS.typescript.md`. + +### Database is frozen +- No new Alembic migrations, no `ALTER TABLE`, no `__tablename__` or column renames. +- The pre-commit hook blocks any change under `migrations/` or `alembic/versions/` unless the commit message contains `[migration-approved]`. 
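The two marker rules described so far can be combined into one check. The Python sketch below is a hedged approximation, not the CI job or the git hook: `missing_markers` is a hypothetical helper, and matching `AGENTS.*.md` at any directory depth is an assumption drawn from the "(any)" wording in the table.

```python
import re

# Guardrail files from the table in this section. Matching AGENTS.*.md in
# any directory is an assumption, not confirmed repo behavior.
GUARDRAIL_RE = re.compile(
    r"^(\.claude/settings\.json|\.claude/rules/loc-exceptions\.txt|"
    r"scripts/check-loc\.sh|scripts/githooks/pre-commit|"
    r"(.*/)?AGENTS\.[^/]+\.md)$"
)
# Any path with a migrations/ or alembic/versions/ directory component.
MIGRATION_RE = re.compile(r"(^|/)(migrations|alembic/versions)/")


def missing_markers(changed_paths, commit_message):
    """Return the markers the commit message still needs for these paths."""
    missing = []
    touches_guardrail = any(GUARDRAIL_RE.match(p) for p in changed_paths)
    touches_migration = any(MIGRATION_RE.search(p) for p in changed_paths)
    if touches_guardrail and "[guardrail-change]" not in commit_message:
        missing.append("[guardrail-change]")
    if touches_migration and "[migration-approved]" not in commit_message:
        missing.append("[migration-approved]")
    return missing
```

Because the check is a plain substring search, the marker can sit anywhere in the message, which matches the "body or footer" wording above.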
+ +### Public endpoints are a contract +- Any change to a route path, HTTP method, status code, request schema, or response schema in `backend-compliance/`, `ai-compliance-sdk/`, `dsms-gateway/`, `document-crawler/`, or `compliance-tts-service/` **must** be accompanied by a matching update in every consumer (`admin-compliance/`, `developer-portal/`, `breakpilot-compliance-sdk/`, `consent-sdk/`) in the **same changeset**. +- OpenAPI baseline snapshots live in `tests/contracts/`. Contract tests fail on any drift. + +--- + +## 6. Pull Requests + +- **Target branch: `coolify`** — never open a PR directly against `main`. +- Keep PRs focused; one logical change per PR. + +**PR checklist before requesting review:** + +- [ ] `bash scripts/check-loc.sh` exits 0 +- [ ] All lint checks pass (go, python, tsc) +- [ ] All tests pass locally +- [ ] No endpoint drift without consumer updates in the same PR +- [ ] `[guardrail-change]` present in commit message if guardrail files were touched +- [ ] Docs updated if new endpoints, config vars, or architecture changed + +--- + +## 7. Claude Code Users + +This section is for AI-assisted development sessions using Claude Code. + +- **Always verify your branch first:** `git branch --show-current` must return `coolify`. If it returns `main`, switch before doing anything. +- The `.claude/settings.json` `PreToolUse` hooks will automatically block Write/Edit operations on files that would exceed 500 lines. This is intentional — split the file instead. +- If the `guardrail-integrity` CI job fails, check that your commit message body includes `[guardrail-change]`. Add it and amend or create a fixup commit. +- **Never use `git add -A` or `git add .`** — always stage specific files by path to avoid accidentally committing `.env`, `node_modules/`, `.next/`, or compiled binaries. +- After every session: `bash scripts/check-loc.sh` must exit 0 before pushing. +- Read `CLAUDE.md` and the relevant `AGENTS..md` before starting work on a service. 
diff --git a/README.md b/README.md new file mode 100644 index 0000000..9b58e25 --- /dev/null +++ b/README.md @@ -0,0 +1,132 @@ +# breakpilot-compliance + +**DSGVO/AI-Act compliance platform — 10 services, Go · Python · TypeScript** + +[![CI](https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions/workflows/ci.yaml/badge.svg)](https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions) +![Go](https://img.shields.io/badge/Go-1.24-00ADD8?logo=go&logoColor=white) +![Python](https://img.shields.io/badge/Python-3.12-3776AB?logo=python&logoColor=white) +![Node.js](https://img.shields.io/badge/Node.js-20-339933?logo=node.js&logoColor=white) +![TypeScript](https://img.shields.io/badge/TypeScript-strict-3178C6?logo=typescript&logoColor=white) +![FastAPI](https://img.shields.io/badge/FastAPI-0.123-009688?logo=fastapi&logoColor=white) +![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg) +![DSGVO](https://img.shields.io/badge/DSGVO-compliant-green) +![AI Act](https://img.shields.io/badge/EU%20AI%20Act-compliant-green) +![LOC guard](https://img.shields.io/badge/LOC%20guard-500%20hard%20cap-orange) +![Services](https://img.shields.io/badge/services-10-blueviolet) + +--- + +## Overview + +breakpilot-compliance is a multi-tenant DSGVO/EU AI Act compliance platform that provides an SDK for consent management, data subject requests (DSR), audit logging, iACE impact assessments, and document archival. It ships as 10 containerised services covering an admin dashboard, a developer portal, a Python/FastAPI backend, a Go AI compliance engine, TTS, and a decentralised document store on IPFS. Every service is deployed automatically via Gitea Actions → Coolify on the `coolify` branch. 
+ +--- + +## Architecture + +| Service | Tech | Port | Container | +|---------|------|------|-----------| +| admin-compliance | Next.js 15 | 3007 | bp-compliance-admin | +| backend-compliance | Python / FastAPI 0.123 | 8002 | bp-compliance-backend | +| ai-compliance-sdk | Go 1.24 / Gin | 8093 | bp-compliance-ai-sdk | +| developer-portal | Next.js 15 | 3006 | bp-compliance-developer-portal | +| breakpilot-compliance-sdk | TypeScript SDK (React/Vue/Angular/vanilla) | — | — | +| consent-sdk | JS/TS Consent SDK | — | — | +| compliance-tts-service | Python / Piper TTS | 8095 | bp-compliance-tts | +| document-crawler | Python / FastAPI | 8098 | bp-compliance-document-crawler | +| dsms-gateway | Python / FastAPI / IPFS | 8082 | bp-compliance-dsms-gateway | +| dsms-node | IPFS Kubo v0.24.0 | — | bp-compliance-dsms-node | + +All containers share the external `breakpilot-network` Docker network and depend on `breakpilot-core` (Valkey, Vault, RAG service, Nginx reverse proxy). + +--- + +## Quick Start + +**Prerequisites:** Docker, Go 1.24+, Python 3.12+, Node.js 20+ + +```bash +git clone https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance.git +cd breakpilot-compliance + +# Copy and populate secrets (never commit .env) +cp .env.example .env + +# Start all services +docker compose up -d +``` + +For the Coolify/Hetzner production target (x86_64), use the override: + +```bash +docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d +``` + +--- + +## Development Workflow + +Work on the `coolify` branch. Push to **both** remotes to trigger CI and deploy: + +```bash +git checkout coolify +# ... make changes ... +git push origin coolify && git push gitea coolify +``` + +Push to `gitea` triggers: +1. **Gitea Actions** — lint → test → validate (see CI Pipeline below) +2. **Coolify** — automatic build + deploy (~3 min total) + +Monitor status: + +--- + +## CI Pipeline + +Defined in `.gitea/workflows/ci.yaml`. 
+ +| Job | What it checks | +|-----|----------------| +| `loc-budget` | All source files ≤ 500 LOC; soft target 300 | +| `guardrail-integrity` | Commits touching guardrail files carry `[guardrail-change]` | +| `go-lint` | `golangci-lint` on `ai-compliance-sdk/` | +| `python-lint` | `ruff` + `mypy` on Python services | +| `nodejs-lint` | `tsc --noEmit` + ESLint on Next.js services | +| `test-go-ai-compliance` | `go test ./...` in `ai-compliance-sdk/` | +| `test-python-backend-compliance` | `pytest` in `backend-compliance/` | +| `test-python-document-crawler` | `pytest` in `document-crawler/` | +| `test-python-dsms-gateway` | `pytest test_main.py` in `dsms-gateway/` | +| `sbom-scan` | License + vulnerability scan via `syft` + `grype` | +| `validate-canonical-controls` | OpenAPI contract baseline diff | + +--- + +## File Budget + +| Limit | Value | How to check | +|-------|-------|--------------| +| Soft target | 300 LOC | `bash scripts/check-loc.sh` | +| Hard cap | 500 LOC | Same; also enforced by `PreToolUse` hook + git pre-commit + CI | +| Exceptions | `.claude/rules/loc-exceptions.txt` | Require written rationale + `[guardrail-change]` commit marker | + +The `.claude/settings.json` `PreToolUse` hook blocks Claude Code from writing or editing files that would exceed the hard cap. The git pre-commit hook re-checks. CI is the final gate. + +--- + +## Links + +| | URL | +|-|-----| +| Admin dashboard | | +| Developer portal | | +| Backend API | | +| AI SDK API | | +| Gitea repo | | +| Gitea Actions | | + +--- + +## License + +Apache-2.0. See [LICENSE](LICENSE). diff --git a/REFACTOR_PLAYBOOK.md b/REFACTOR_PLAYBOOK.md new file mode 100644 index 0000000..a4d4a79 --- /dev/null +++ b/REFACTOR_PLAYBOOK.md @@ -0,0 +1,915 @@ + +--- + +## 1.
Guardrail files (drop these in first) + +These artifacts enforce the rules without you or Claude having to remember them. Install them as **Phase 0**, before touching any real code. + +### 1.1 `.claude/CLAUDE.md` — loaded into every Claude session + +```markdown +# + +> **NON-NEGOTIABLE STRUCTURE RULES** (enforced by `.claude/settings.json` hook, git pre-commit, and CI): +> 1. **File-size budget:** soft target **300** lines, **hard cap 500** lines for any non-test, non-generated source file. Anything larger → split it. Exceptions are listed in `.claude/rules/loc-exceptions.txt` and require a written rationale. +> 2. **Clean architecture per service.** Routers/handlers stay thin (≤30 lines per handler) and delegate to services; services use repositories; repositories own DB I/O. See `AGENTS.python.md` / `AGENTS.go.md` / `AGENTS.typescript.md`. +> 3. **Do not touch the database schema.** No new migrations, no `ALTER TABLE`, no model field renames without an explicit migration plan reviewed by the DB owner. +> 4. **Public endpoints are a contract.** Any change to a path/method/status/schema in a backend must be accompanied by a matching update in **every** consumer. OpenAPI snapshot tests in `tests/contracts/` are the gate. +> 5. **Tests are not optional.** New code without tests fails CI. Refactors must preserve coverage and add a characterization test before splitting an oversized file. +> 6. **Do not bypass the guardrails.** Do not edit `.claude/settings.json`, `scripts/check-loc.sh`, or the loc-exceptions list to silence violations. If a rule is wrong, raise it in a PR description. +> +> These rules apply to every Claude Code session opened inside this repository, regardless of who launched it. They are loaded automatically via this CLAUDE.md. +``` + +Keep project-specific notes (dev environment, URLs, tech stack) under this header. + +### 1.2 `.claude/settings.json` — PreToolUse LOC hook + +First line of defense. 
Blocks Write/Edit operations that would create a file longer than 500 lines or push an existing one past that limit. This keeps oversized source files from being written in the first place. Each command reads the hook payload from stdin once into `input` and feeds it to every `jq` call, because a second bare `jq` would find stdin already drained and see no content. + +```json +{ + "hooks": { + "PreToolUse": [ + { + "matcher": "Write", + "hooks": [ + { + "type": "command", + "command": "input=$(cat); f=$(printf '%s' \"$input\" | jq -r '.tool_input.file_path // empty'); [ -z \"$f\" ] && exit 0; lines=$(printf '%s' \"$(printf '%s' \"$input\" | jq -r '.tool_input.content // empty')\" | awk 'END{print NR}'); if [ \"${lines:-0}\" -gt 500 ]; then echo '{\"decision\":\"block\",\"reason\":\"guardrail: file exceeds the 500-line hard cap. Split it into smaller modules per the layering rules in AGENTS..md.\"}'; exit 0; fi", + "shell": "bash", + "timeout": 5 + } + ] + }, + { + "matcher": "Edit", + "hooks": [ + { + "type": "command", + "command": "input=$(cat); f=$(printf '%s' \"$input\" | jq -r '.tool_input.file_path // empty'); [ -z \"$f\" ] || [ ! -f \"$f\" ] && exit 0; case \"$f\" in *.md|*.json|*.yaml|*.yml|*test*|*tests/*|*node_modules/*|*.next/*|*migrations/*) exit 0 ;; esac; new_str=$(printf '%s' \"$input\" | jq -r '.tool_input.new_string // empty'); old_str=$(printf '%s' \"$input\" | jq -r '.tool_input.old_string // empty'); old_lines=$(printf '%s' \"$old_str\" | awk 'END{print NR}'); new_lines=$(printf '%s' \"$new_str\" | awk 'END{print NR}'); cur=$(wc -l < \"$f\" | tr -d ' '); proj=$((cur - old_lines + new_lines)); if [ \"$proj\" -gt 500 ]; then echo \"{\\\"decision\\\":\\\"block\\\",\\\"reason\\\":\\\"guardrail: this edit would push $f to ~$proj lines (hard cap is 500). Split the file before continuing.\\\"}\"; fi; exit 0", + "shell": "bash", + "timeout": 5 + } + ] + } + ] + } +} +``` + +### 1.3 `.claude/rules/architecture.md` — auto-loaded architecture rule + +```markdown +# Architecture Rules (auto-loaded) + +Non-negotiable. Applied to every Claude Code session in this repo. + +## File-size budget +- **Soft target:** 300 lines. **Hard cap:** 500 lines. +- Enforced by PreToolUse hook, pre-commit hook, and CI. +- Exceptions live in `.claude/rules/loc-exceptions.txt` and require `[guardrail-change]` in the commit message.
This list should SHRINK over time. + +## Clean architecture +- Python: see `AGENTS.python.md`. Layering: api → services → repositories → db.models. +- Go: see `AGENTS.go.md`. Standard Go Project Layout + hexagonal. +- TypeScript: see `AGENTS.typescript.md`. Server-by-default, push client boundary deep, colocate `_components/` and `_hooks/` per route. + +## Database is frozen +- No new migrations. No `ALTER TABLE`. No column renames. +- Pre-commit hook blocks any change under `migrations/` unless commit message contains `[migration-approved]`. + +## Public endpoints are a contract +- Any change to path/method/status/schema must update every consumer in the same change set. +- OpenAPI baseline at `tests/contracts/openapi.baseline.json`. Contract tests fail on drift. + +## Tests +- New code without tests fails CI. +- Refactors preserve coverage. Before splitting an oversized file, add a characterization test pinning current behavior. +- Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`, `tests/e2e/`. + +## Guardrails are protected +- Edits to `.claude/settings.json`, `scripts/check-loc.sh`, `scripts/githooks/pre-commit`, `.claude/rules/loc-exceptions.txt`, or any `AGENTS.*.md` require `[guardrail-change]` in the commit message. +- If Claude thinks a rule is wrong, surface it to the user. Do not silently weaken. + +## Tooling baseline +- Python: `ruff`, `mypy --strict` on new modules, `pytest --cov`. +- Go: `golangci-lint` strict, `go vet`, table-driven tests. +- TS: `tsc --noEmit` strict, ESLint type-aware, Vitest, Playwright. +- All: dependency caching in CI, license/SBOM scan via `syft`+`grype`. +``` + +### 1.4 `.claude/rules/loc-exceptions.txt` + +``` +# loc-exceptions.txt — files allowed to exceed the 500-line hard cap. +# +# Format: one repo-relative path per line. Comments start with '#'. +# Each exception MUST be preceded by a comment explaining why splitting is not viable. +# Goal: this list SHRINKS over time. 
+ +# --- Example entries --- +# Static data catalogs — splitting fragments lookup tables without improving readability. +# src/catalogs/country-data.ts +# src/catalogs/industry-taxonomy.ts + +# Generated files — regenerated from schemas. +# api/generated/types.ts +``` + + +### 1.5 `scripts/check-loc.sh` + +```bash +#!/usr/bin/env bash +# check-loc.sh — File-size budget enforcer. Soft: 300. Hard: 500. +# +# Usage: +# scripts/check-loc.sh # scan whole repo +# scripts/check-loc.sh --changed # only files changed vs origin/main +# scripts/check-loc.sh path/to/file.py # check specific files +# scripts/check-loc.sh --json # machine-readable output +# Exit codes: 0 clean, 1 hard violation, 2 bad invocation. + +set -euo pipefail +SOFT=300 +HARD=500 +REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +EXCEPTIONS_FILE="$REPO_ROOT/.claude/rules/loc-exceptions.txt" + +CHANGED_ONLY=0; JSON=0; TARGETS=() +for arg in "$@"; do + case "$arg" in + --changed) CHANGED_ONLY=1 ;; + --json) JSON=1 ;; + -h|--help) sed -n '2,10p' "$0"; exit 0 ;; + -*) echo "unknown flag: $arg" >&2; exit 2 ;; + *) TARGETS+=("$arg") ;; + esac +done + +is_excluded() { + local f="$1" + case "$f" in + */node_modules/*|*/.next/*|*/.git/*|*/dist/*|*/build/*|*/__pycache__/*|*/vendor/*) return 0 ;; + */migrations/*|*/alembic/versions/*) return 0 ;; + *_test.go|*.test.ts|*.test.tsx|*.spec.ts|*.spec.tsx) return 0 ;; + */tests/*|*/test/*) return 0 ;; + *.md|*.json|*.yaml|*.yml|*.lock|*.sum|*.mod|*.toml|*.cfg|*.ini) return 0 ;; + *.svg|*.png|*.jpg|*.jpeg|*.gif|*.ico|*.pdf|*.woff|*.woff2|*.ttf) return 0 ;; + *.generated.*|*.gen.*|*_pb.go|*_pb2.py|*.pb.go) return 0 ;; + esac + return 1 +} +is_in_exceptions() { + [[ -f "$EXCEPTIONS_FILE" ]] || return 1 + local rel="${1#$REPO_ROOT/}" + grep -Fxq "$rel" "$EXCEPTIONS_FILE" +} +collect_targets() { + if (( ${#TARGETS[@]} > 0 )); then printf '%s\n' "${TARGETS[@]}" + elif (( CHANGED_ONLY )); then + git -C "$REPO_ROOT" diff --name-only --diff-filter=AM origin/main...HEAD 2>/dev/null \ 
+ || git -C "$REPO_ROOT" diff --name-only --diff-filter=AM HEAD + else git -C "$REPO_ROOT" ls-files; fi +} + +violations_hard=(); violations_soft=() +while IFS= read -r f; do + [[ -z "$f" ]] && continue + abs="$f"; [[ "$abs" != /* ]] && abs="$REPO_ROOT/$f" + [[ -f "$abs" ]] || continue + is_excluded "$abs" && continue + is_in_exceptions "$abs" && continue + loc=$(wc -l < "$abs" | tr -d ' ') + if (( loc > HARD )); then violations_hard+=("$loc $f") + elif (( loc > SOFT )); then violations_soft+=("$loc $f"); fi +done < <(collect_targets) + +if (( JSON )); then + printf '{"hard":[' + first=1; for v in "${violations_hard[@]}"; do + loc="${v%% *}"; path="${v#* }" + (( first )) || printf ','; first=0 + printf '{"loc":%s,"path":"%s"}' "$loc" "$path" + done + printf '],"soft":[' + first=1; for v in "${violations_soft[@]}"; do + loc="${v%% *}"; path="${v#* }" + (( first )) || printf ','; first=0 + printf '{"loc":%s,"path":"%s"}' "$loc" "$path" + done + printf ']}\n' +else + if (( ${#violations_soft[@]} > 0 )); then + echo "::warning:: $((${#violations_soft[@]})) file(s) exceed soft target ($SOFT lines):" + printf ' %s\n' "${violations_soft[@]}" | sort -rn + fi + if (( ${#violations_hard[@]} > 0 )); then + echo "::error:: $((${#violations_hard[@]})) file(s) exceed HARD CAP ($HARD lines) — split required:" + printf ' %s\n' "${violations_hard[@]}" | sort -rn + fi +fi +(( ${#violations_hard[@]} == 0 )) +``` + +Make executable: `chmod +x scripts/check-loc.sh`. + +### 1.6 `scripts/githooks/pre-commit` + +```bash +#!/usr/bin/env bash +# pre-commit — enforces structural guardrails. +# +# 1. Blocks commits that introduce a non-test, non-generated source file > 500 LOC. +# 2. Blocks commits touching migrations/ unless commit message contains [migration-approved]. +# 3. Blocks edits to guardrail files unless [guardrail-change] is in the commit message. 
+ +set -euo pipefail +REPO_ROOT="$(git rev-parse --show-toplevel)" + +mapfile -t staged < <(git diff --cached --name-only --diff-filter=ACM) +[[ ${#staged[@]} -eq 0 ]] && exit 0 + +# 1. LOC budget on staged files. +loc_targets=() +for f in "${staged[@]}"; do + [[ -f "$REPO_ROOT/$f" ]] && loc_targets+=("$REPO_ROOT/$f") +done +if [[ ${#loc_targets[@]} -gt 0 ]]; then + if ! "$REPO_ROOT/scripts/check-loc.sh" "${loc_targets[@]}"; then + echo; echo "Commit blocked: file-size budget violated." + echo "Split the file (preferred) or add to .claude/rules/loc-exceptions.txt." + exit 1 + fi +fi + +# 2. Migrations frozen unless approved. +if printf '%s\n' "${staged[@]}" | grep -qE '(^|/)(migrations|alembic/versions)/'; then + if ! grep -q '\[migration-approved\]' "$(git rev-parse --git-dir)/COMMIT_EDITMSG" 2>/dev/null; then + echo "Commit blocked: this change touches a migrations directory." + echo "Add '[migration-approved]' to your commit message if approved." + exit 1 + fi +fi + +# 3. Guardrail files protected. +guarded='^(\.claude/settings\.json|\.claude/rules/loc-exceptions\.txt|scripts/check-loc\.sh|scripts/githooks/pre-commit|AGENTS\.(python|go|typescript)\.md)$' +if printf '%s\n' "${staged[@]}" | grep -qE "$guarded"; then + if ! grep -q '\[guardrail-change\]' "$(git rev-parse --git-dir)/COMMIT_EDITMSG" 2>/dev/null; then + echo "Commit blocked: this change modifies guardrail files." + echo "Add '[guardrail-change]' to your commit message and explain why in the body." + exit 1 + fi +fi +exit 0 +``` + +### 1.7 `scripts/install-hooks.sh` + +```bash +#!/usr/bin/env bash +# install-hooks.sh — installs git hooks that enforce repo guardrails locally. +# Idempotent. Safe to re-run. Run once per clone: bash scripts/install-hooks.sh +set -euo pipefail +REPO_ROOT="$(cd "$(dirname "$0")/.." 
&& pwd)" +HOOKS_DIR="$REPO_ROOT/.git/hooks" +SRC_DIR="$REPO_ROOT/scripts/githooks" + +[[ -d "$REPO_ROOT/.git" ]] || { echo "Not a git repository: $REPO_ROOT" >&2; exit 1; } +mkdir -p "$HOOKS_DIR" +for hook in pre-commit; do + src="$SRC_DIR/$hook"; dst="$HOOKS_DIR/$hook" + if [[ -f "$src" ]]; then cp "$src" "$dst"; chmod +x "$dst"; echo "installed: $dst"; fi +done +echo "Done. Hooks active for this clone." +``` + +### 1.8 CI additions (`.github/workflows/ci.yaml` or `.gitea/workflows/ci.yaml`) + +Add a `loc-budget` job that fails on hard violations: + +```yaml +jobs: + loc-budget: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Check file-size budget + run: bash scripts/check-loc.sh --changed + + python-lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: ruff + run: pip install ruff && ruff check . + - name: mypy on new modules + run: pip install mypy && mypy --strict services/ repositories/ domain/ + + go-lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: golangci-lint + uses: golangci/golangci-lint-action@v4 + with: { version: latest } + + ts-lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - run: npm ci && npx tsc --noEmit && npx next build + + contract-tests: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - run: pytest tests/contracts/ -v + + license-sbom-scan: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: anchore/sbom-action@v0 + - uses: anchore/scan-action@v3 +``` + +--- + + +### 1.9 `AGENTS.python.md` (Python / FastAPI) + +````markdown +# AGENTS.python.md — Python Service Conventions + +## Layered architecture + +``` +/ +├── api/ # HTTP layer — routers only. Thin (≤30 LOC per handler). +│ └── _routes.py +├── services/ # Business logic. Pure-ish; no FastAPI imports. +├── repositories/ # DB access. Owns SQLAlchemy session usage. +├── domain/ # Value objects, enums, domain exceptions. 
+├── schemas/ # Pydantic models, split per domain. Never one giant schemas.py. +└── db/models/ # SQLAlchemy ORM, one module per aggregate. __tablename__ frozen. +``` + +Dependency direction: `api → services → repositories → db.models`. Lower layers must not import upper. + +## Routers +- One `APIRouter` per domain file. Handlers ≤30 LOC. +- Parse request → call service → map domain errors → return response model. +- Inject services via `Depends`. No globals. + +```python +@router.post("/items", response_model=ItemRead, status_code=201) +async def create_item( + payload: ItemCreate, + service: ItemService = Depends(get_item_service), + tenant_id: UUID = Depends(get_tenant_id), +) -> ItemRead: + with translate_domain_errors(): + return await service.create(tenant_id, payload) +``` + +## Domain errors + translator + +```python +# domain/errors.py +class DomainError(Exception): ... +class NotFoundError(DomainError): ... +class ConflictError(DomainError): ... +class ValidationError(DomainError): ... +class PermissionError(DomainError): ... + +# api/_http_errors.py +from contextlib import contextmanager +from fastapi import HTTPException + +@contextmanager +def translate_domain_errors(): + try: yield + except NotFoundError as e: raise HTTPException(404, str(e)) from e + except ConflictError as e: raise HTTPException(409, str(e)) from e + except ValidationError as e: raise HTTPException(400, str(e)) from e + except PermissionError as e: raise HTTPException(403, str(e)) from e +``` + +## Services +- Constructor takes repository interface, not concrete. +- No FastAPI / HTTP knowledge. +- Raise domain exceptions, never HTTPException. + +## Repositories +- Intent-named methods (`get_pending_for_tenant`), not CRUD-named (`select_where`). +- Session injected. No business logic. +- Return ORM models or domain VOs; never `Row`. + +## Schemas (Pydantic v2) +- One module per domain. ≤300 lines. +- `model_config = ConfigDict(from_attributes=True, frozen=True)` for reads. 
+- Separate `*Create`, `*Update`, `*Read`. + +## Tests +- `tests/unit/`, `tests/integration/`, `tests/contracts/`. +- Unit tests mock repository via `AsyncMock`. +- Integration tests use real Postgres from compose via transactional fixture (rollback per test). +- Contract tests diff `/openapi.json` against `tests/contracts/openapi.baseline.json`. +- Naming: `test___.py::TestX::test_method`. +- `pytest-asyncio` mode = `auto`. Coverage target: 80% new code. + +## Tooling +- `ruff check` + `ruff format` (line length 100). +- `mypy --strict` on `services/`, `repositories/`, `domain/` first. Expand outward via per-module overrides in mypy.ini: + +```ini +[mypy] +strict = True + +[mypy-.services.*] +strict = True + +[mypy-.legacy.*] +# Legacy modules not yet refactored — expand strictness over time. +ignore_errors = True +``` + +## What you may NOT do +- Add a new migration. +- Rename `__tablename__`, column, or enum value. +- Change route contract without simultaneous consumer update. +- Catch `Exception` broadly. +- Put business logic in a router or a Pydantic validator. +- Create a file > 500 lines. +```` + +### 1.10 `AGENTS.go.md` (Go / Gin or chi) + +````markdown +# AGENTS.go.md — Go Service Conventions + +## Layered architecture (Standard Go Project Layout + hexagonal) + +``` +/ +├── cmd/server/main.go # Thin: parse flags → app.New → app.Run. < 50 LOC. +├── internal/ +│ ├── app/ # Wiring: config + DI + lifecycle. +│ ├── domain// # Pure types, interfaces, errors. No I/O. +│ ├── service// # Business logic. Depends on domain interfaces. +│ ├── repository/postgres// # Concrete repos. +│ ├── transport/http/ +│ │ ├── handler// +│ │ ├── middleware/ +│ │ └── router.go +│ └── platform/ # DB pool, logger, config, tracing. +└── pkg/ # Importable by other repos. Empty unless needed. +``` + +Direction: `transport → service → domain ← repository`. `domain` imports no siblings. + +## Handlers +- ≤40 LOC. Bind → call service → map error via `httperr.Write(c, err)` → respond. 
+ +```go +func (h *ItemHandler) Create(c *gin.Context) { + var req CreateItemRequest + if err := c.ShouldBindJSON(&req); err != nil { + httperr.Write(c, httperr.BadRequest(err)); return + } + out, err := h.svc.Create(c.Request.Context(), req.ToInput()) + if err != nil { httperr.Write(c, err); return } + c.JSON(http.StatusCreated, out) +} +``` + +## Errors — single `httperr` package +```go +switch { +case errors.Is(err, domain.ErrNotFound): return 404 +case errors.Is(err, domain.ErrConflict): return 409 +case errors.As(err, &validationErr): return 422 +default: return 500 +} +``` +Never `panic` in request handling. Recovery middleware logs and returns 500. + +## Services +- Struct + constructor + interface methods. No package-level state. +- `context.Context` first arg always. +- Return `(value, error)`. Wrap with `fmt.Errorf("create item: %w", err)`. +- Domain errors as sentinel vars or typed; match with `errors.Is` / `errors.As`. + +## Repositories +- Interface in `domain//repository.go`. Impl in `repository/postgres//`. +- One file per query group; no file > 500 LOC. +- `pgx`/`sqlc` over hand-rolled SQL. No ORM globals. Everything takes `ctx`. + +## Tests +- Co-located `*_test.go`. Table-driven for service logic. +- Handlers via `httptest.NewRecorder`. +- Repos via `testcontainers-go` (or the compose Postgres). Never mocks at SQL boundary. +- Coverage target: 80% on `service/`. + +## Tooling (`golangci-lint` strict config) +- Linters: `errcheck, govet, staticcheck, revive, gosec, gocyclo(max 15), gocognit(max 20), unused, ineffassign, errorlint, nilerr, nolintlint, contextcheck`. +- `gofumpt` formatting. `go vet ./...` clean. `go mod tidy` clean. + +## What you may NOT do +- Touch DB schema/migrations. +- Add a new top-level package under `internal/` without review. +- `import "C"`, unsafe, reflection-heavy code. +- Non-trivial setup in `init()`. Wire in `internal/app`. +- File > 500 lines. +- Change route contract without updating consumers. 
+```` + +### 1.11 `AGENTS.typescript.md` (TypeScript / Next.js) + +````markdown +# AGENTS.typescript.md — TypeScript / Next.js Conventions + +## Layered architecture (Next.js 15 App Router) + +``` +app/ +├── / +│ ├── page.tsx # Server Component by default. ≤200 LOC. +│ ├── layout.tsx +│ ├── _components/ # Private folder; colocated UI. Each file ≤300 LOC. +│ ├── _hooks/ # Client hooks for this route. +│ ├── _server/ # Server actions, data loaders for this route. +│ └── loading.tsx / error.tsx +├── api//route.ts # Thin handler. Delegates to lib/server//. +lib/ +├── / # Pure helpers, types, zod schemas. Reusable. +└── server// # Server-only logic; uses "server-only". +components/ # Truly shared, app-wide components. +``` + +Server vs Client: default is Server Component. Add `"use client"` only when state/effects/browser APIs needed. Push client boundary as deep as possible. + +## API routes (route.ts) +- One handler per HTTP method, ≤40 LOC. +- Validate with `zod`. Reject invalid → 400. +- Delegate to `lib/server//`. + +```ts +export async function POST(req: Request) { + const parsed = CreateItemSchema.safeParse(await req.json()); + if (!parsed.success) + return NextResponse.json({ error: parsed.error.flatten() }, { status: 400 }); + const result = await itemService.create(parsed.data); + return NextResponse.json(result, { status: 201 }); +} +``` + +## Page components +- Pages > 300 lines → split into colocated `_components/`. +- Server Components fetch data; pass plain objects to Client Components. +- No data fetching in `useEffect` for server-renderable data. +- State: prefer URL state (`searchParams`) + Server Components over global stores. + +## Types — barrel re-export pattern for splitting monolithic type files + +```ts +// lib/sdk/types/index.ts +export * from './enums' +export * from './vendor' +export * from './dsfa' +// consumers still `import { Foo } from '@/lib/sdk/types'` +``` + +Rules: no `any`. No `as unknown as`. 
All DTOs are zod schemas; infer via `z.infer`. + +## Tests +- Unit: **Vitest** (`*.test.ts`/`*.test.tsx`), colocated. +- Hooks: `@testing-library/react` `renderHook`. +- E2E: **Playwright** (`tests/e2e/`), one spec per top-level page minimum. +- Coverage: 70% on `lib/`, smoke on `app/`. + +## Tooling +- `tsc --noEmit` clean (strict, `noUncheckedIndexedAccess: true`). +- ESLint with `@typescript-eslint`, type-aware rules on. +- `next build` clean. No `@ts-ignore`. `@ts-expect-error` only with a reason comment. + +## What you may NOT do +- Business logic in `page.tsx` or `route.ts`. +- Cross-app module imports. +- `dangerouslySetInnerHTML` without explicit sanitization. +- Backend API calls from Client Components when a Server Component/Action would do. +- Change route contract without updating consumers in the same change. +- File > 500 lines. +- Globally disable lint/type rules — fix the root cause. +```` + +--- + + +## 2. Phase plan — behavior-preserving refactor + +Work in phases. Each phase ends green (tests pass, build clean, contract baseline unchanged). Do **not** skip ahead. + +### Phase 0 — Foundation (single PR, low risk) + +**Goal:** Set up rails. No code refactors yet. + +1. Drop in all files from Section 1. Install hooks: `bash scripts/install-hooks.sh`. +2. Populate `.claude/rules/loc-exceptions.txt` with grandfathered entries (one line each, with a comment rationale) so CI doesn't fail day 1. +3. Append the non-negotiable rules block to root `CLAUDE.md`. +4. Add per-language `AGENTS.*.md` at repo root. +5. Add the CI jobs from §1.8. +6. Per-service `README.md` + `CLAUDE.md` stubs: what it does, run/test commands, layered architecture diagram, env vars, API surface link. + +**Verification:** CI green; loc-budget job passes with allowlist; next Claude session loads the rules automatically. + +### Phase 1 — Backend service (Python/FastAPI) + +**Critical targets:** any `routes.py` / `schemas.py` / `repository.py` / `models.py` over 500 LOC. 
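Because every phase has to end with the contract baseline unchanged, it may help to see what a drift check could look like before starting the steps. The following is a minimal sketch, not the project's actual contract test: `contract_signature` and `contract_drift` are hypothetical helper names, and the comparison is assumed to cover only paths, HTTP methods, and inline parameters (request bodies, response schemas, and `$ref` parameters are ignored).

```python
from typing import Any

# HTTP verbs that can appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def contract_signature(spec: dict[str, Any]) -> set[tuple[str, str, str, str]]:
    """Flatten an OpenAPI document into (path, method, param_in, param_name) tuples."""
    signature: set[tuple[str, str, str, str]] = set()
    for path, item in spec.get("paths", {}).items():
        shared = item.get("parameters", [])  # parameters declared at path level
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue
            # Empty param fields mark the existence of the operation itself.
            signature.add((path, method, "", ""))
            for param in shared + operation.get("parameters", []):
                signature.add((path, method, param.get("in", ""), param.get("name", "")))
    return signature


def contract_drift(baseline: dict[str, Any], current: dict[str, Any]) -> dict[str, list]:
    """Report operations/parameters that disappeared from or were added to the contract."""
    old, new = contract_signature(baseline), contract_signature(current)
    return {"removed": sorted(old - new), "added": sorted(new - old)}
```

In a pytest contract test, `baseline` would presumably be loaded from `tests/contracts/openapi.baseline.json` and `current` taken from the running app (for FastAPI, `app.openapi()`); the test then asserts that both lists are empty.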
+ +**Steps:** + +1. **Snapshot the API contract:** `curl /openapi.json > tests/contracts/openapi.baseline.json`. Add a contract test that diffs current vs baseline and fails on any path/method/param drift. +2. **Characterization tests first.** For each oversized route file, add `TestClient` tests exercising every endpoint (happy path + one error path). Use `httpx.AsyncClient` + factory fixtures. +3. **Split models.py per aggregate.** Keep a shim: `from .db.models import *` re-exports so existing imports keep working. One module per aggregate; `__tablename__` unchanged (no migration). +4. **Split schemas.py** similarly with a re-export shim. +5. **Extract service layer.** Each route handler delegates to a `*Service` class injected via `Depends`. Handlers shrink to ≤30 LOC. +6. **Repository extraction** from the giant repository file; one class per aggregate. +7. **`mypy --strict` scoped to new packages first.** Expand outward via `mypy.ini` per-module overrides. +8. **Tests:** unit tests per service (mocked repo), repo tests against a transactional fixture (real Postgres), integration tests at API layer. + +**Gotchas we hit:** +- Tests that patch module-level symbols (e.g. `SessionLocal`, `scan_X`) break when you move logic behind `Depends`. Fix: re-export the symbol from the route module, or have the service lookup use the module-level symbol directly so the patch still takes effect. +- `from __future__ import annotations` can break Pydantic TypeAdapter forward refs. Remove it where it conflicts. +- Sibling test file status codes drift when you introduce the domain-error translator (e.g. 422 → 400). Update assertions in the same commit. + +**Verification:** all pytest files green. Characterization tests green. Contract test green (no drift). `mypy` clean on new packages. Coverage ≥ baseline + 10%. + +### Phase 2 — Go backend + +**Critical targets:** any handler / store / rules file over 500 LOC. + +**Steps:** +1. 
OpenAPI/Swagger snapshot (or generate via `swag`) → contract tests. +2. Generate handler-level tests with `httptest` for every endpoint pre-refactor. +3. Define hexagonal layout (see AGENTS.go.md). Move incrementally with type aliases for back-compat where needed. +4. Replace ad-hoc error handling with `errors.Is/As` + a single `httperr` package. +5. Add `golangci-lint` strict config; fix new findings only (don't chase legacy lint). +6. Table-driven service tests. `testcontainers-go` for repo layer. + +**Verification:** `go test ./...` passes; `golangci-lint run` clean; contract tests green; no DB schema diff. + +### Phase 3 — Frontend (Next.js) + +**Biggest beast — expect this to dominate.** Critical targets: `page.tsx` / monolithic types / API routes over 500 LOC. + +**Per oversized page:** +1. Extract presentational components into `app//_components/` (private folder, Next.js convention). +2. Move data fetching into Server Components / Server Actions; Client Components become small. +3. Hooks → `app//_hooks/`. +4. Pure helpers → `lib//`. +5. Add Vitest unit tests for hooks and pure helpers; Playwright smoke tests for each top-level page. + +**Monolithic types file:** use barrel re-export pattern. +- Create `types/` directory with domain files. +- Create `types/index.ts` with `export * from './'` lines. +- **Critical:** TypeScript won't allow both `types.ts` AND `types/index.ts` — delete the file, atomic swap to directory. + +**API routes (`route.ts`):** same router→service split as backend. Each `route.ts` becomes a thin handler delegating to `lib/server//`. + +**Endpoint preservation:** if any internal route URL changes, grep every consumer (SDK packages, developer portal, sibling apps) and update in the same change. + +**Gotchas:** +- Pre-existing type bugs often surface when you try to build. Fix them as drive-by if they block your refactor; otherwise document in a separate follow-up. 
+- `useClient` component imports from `'../provider'` that rely on re-exports: preserve the re-export or update importers in the same commit. +- Next.js build can fail at page-manifest stage with unrelated prerender errors. Run `next build` fresh (not from cache) to see real status. + +**Verification:** `next build` clean; `tsc --noEmit` clean; Playwright smoke tests pass; visual diff check on key pages (manual + screenshots in PR). + +### Phase 4 — SDKs & smaller services + +Apply the same patterns at smaller scale: +- **SDK packages (0 tests):** add Vitest unit tests for public surface before/while splitting. +- **Manager/Client classes:** extract config defaults, side-effect helpers (e.g. Google Consent Mode wiring), framework adapters into sibling files. Keep the main class as orchestration. +- **Framework adapters (React/Vue/Angular):** each component/composable/service/module goes in its own sibling file; the entry `index.ts` is a thin barrel of re-exports. +- **Doc monoliths (`index.md` thousands of lines):** split per topic with mkdocs nav. + +### Phase 5 — CI hardening & governance + +1. Promote `loc-budget` from warning → blocking once the allowlist has drained to legitimate exceptions only. +2. Add mutation testing in nightly (`mutmut` for Python, `go-mutesting` for Go). +3. Add `dependabot`/`renovate` for npm + pip + go mod. +4. Add release tagging workflow. +5. Write ADRs (`docs/adr/`) capturing the architecture decisions from phases 1–3. +6. Distill recurring patterns into `.claude/rules/` updates. + +--- + +## 3. Agent prompt templates + +When the work volume is big, parallelize with subagents. These prompts were battle-tested in practice. + +### 3.1 Backend route file split (Python) + +> You are working in `` on branch ``. Every source file must be under 500 LOC (hard cap enforced by a PreToolUse hook); soft target 300. +> +> **Task:** split `_routes.py` (NNN LOC) following the router → service → repository layering described in `AGENTS.python.md`. 
+> +> **Steps:** +> 1. Snapshot the relevant slice of `/openapi.json` and add a contract test that pins current behavior. +> 2. Add characterization tests for every endpoint in this file (happy path + one error path) using `httpx.AsyncClient`. +> 3. Extract each route handler's business logic into a `Service` class in `/services/_service.py`. Inject via `Depends(get__service)`. +> 4. Raise domain errors (`NotFoundError`, `ConflictError`, `ValidationError`), never `HTTPException`. Use the `translate_domain_errors()` context manager in handlers. +> 5. Move DB access to `/repositories/_repository.py`. Session injected. +> 6. Split Pydantic schemas from the giant `schemas.py` into `/schemas/.py` if >300 lines. +> +> **Constraints:** +> - Behavior preservation. No route rename/method/status/schema changes. +> - Tests that patch module-level symbols must keep working — re-export the symbol or refactor the lookup so the patch still takes effect. +> - Run `pytest` after each step. Commit each file as its own commit. +> - Push at end: `git push origin `. +> +> When done, report: (a) new LOC counts, (b) test results, (c) mypy status, (d) commit SHAs. Under 300 words. + +### 3.2 Go handler file split + +> You are working in `` on branch ``. Hard cap 500 LOC. +> +> **Task:** split `/handlers/_handler.go` (NNN LOC) into a hexagonal layout per `AGENTS.go.md`. +> +> **Steps:** +> 1. Add `httptest` tests for every endpoint pre-refactor. +> 2. Define `internal/domain//` with types + interfaces + sentinel errors. +> 3. Create `internal/service//` with business logic implementing domain interfaces. +> 4. Create `internal/repository/postgres//` splitting queries by group. +> 5. Thin handlers under `internal/transport/http/handler//`. Each handler ≤40 LOC. Error mapping via `internal/platform/httperr`. +> 6. Use `errors.Is` / `errors.As` for domain error matching. +> +> **Constraints:** +> - No DB schema change. +> - Table-driven service tests. 
`testcontainers-go` (or compose Postgres) for repo tests. +> - `golangci-lint run` clean. +> +> Report new LOC, test status, lint status, commit SHAs. Under 300 words. + +### 3.3 Next.js page split (the one we parallelized heavily) + +> You are working in `` on branch ``. Every source file must be under 500 LOC (hard cap enforced by a PreToolUse hook); soft target 300. Other agents are working on OTHER pages in parallel — stay in your lane. +> +> **Task:** split the following Next.js 15 App Router client pages into colocated components so each `page.tsx` drops below 500 LOC. +> +> 1. `admin-compliance/app/sdk//page.tsx` (NNNN LOC) +> 2. `admin-compliance/app/sdk//page.tsx` (NNNN LOC) +> +> **Pattern** (reference `admin-compliance/app/sdk//` for "done"): +> - Create `_components/` subdirectory (Next.js private folder, won't create routes). +> - Extract each logically-grouped section (forms, tables, modals, tabs, headers, cards) into its own component file. Name files after the component. +> - Create `_hooks/` for custom hooks that were inline. +> - Create `_types.ts` or `_data.ts` for hoisted types or data arrays. +> - Remaining `page.tsx` wires extracted pieces — aim for under 300 LOC, hard cap 500. +> - Preserve `'use client'` when present on original. +> - DO NOT rename any exports that other files import. Grep first before moving. +> +> **Constraints:** +> - Behavior preservation. No logic changes, no improvements. +> - Imports must resolve (relative `./_components/Foo`). +> - Run `cd admin-compliance && npx next build` after each file is done. Don't commit broken builds. +> - DO NOT edit `.claude/settings.json`, `scripts/check-loc.sh`, `loc-exceptions.txt`, or any `AGENTS.*.md`. +> - Commit each page as its own commit: `refactor(admin): split page.tsx into colocated components`. HEREDOC body, include `Co-Authored-By:` trailer. +> - Pull before push: `git pull --rebase origin `, then `git push origin `. +> +> **Coordination:** DO NOT touch ``. You own only ``. 
+> +> When done, report: (a) each file's new LOC count, (b) how many `_components` were created, (c) whether `next build` is clean, (d) commit SHAs. Under 300 words. +> +> If the LOC hook blocks a Write, split further. If you hit rate limits partway, commit what's done and report progress honestly. + +### 3.4 Monolithic types file split (TypeScript) + +> ``, branch ``. Hard cap 500 LOC. +> +> **Task:** split `/types.ts` (NNNN LOC) into per-domain modules under `/types/`. +> +> **Steps:** +> 1. Identify domain groupings (enums, API DTOs, one group per business aggregate). +> 2. Create `/types/` directory with `.ts` files. +> 3. Create `/types/index.ts` barrel: `export * from './'` per file. +> 4. **Atomic swap:** delete the old `types.ts` in the same commit as the new `types/` directory. TypeScript won't resolve both a file and a directory with the same stem. +> 5. Grep every consumer — imports from `'/types'` should still work via the barrel. No consumer file changes needed unless there's a name collision. +> 6. Resolve collisions by renaming the less-canonical export (e.g. if two modules both export `LegalDocument`, rename the RAG one to `RagLegalDocument`). +> +> **Verification:** `tsc --noEmit` clean, `next build` clean. +> +> Report new LOC per file, collisions resolved, consumer updates, commit SHAs. + +### 3.5 Agent orchestration rules (from hard-won experience) + +When you spawn multiple agents in parallel: + +1. **Own disjoint paths.** Give each agent a bounded list of files under specific directories. Spell out the "do NOT touch" list explicitly. +2. **Always instruct `git pull --rebase origin ` before push.** Agents running in parallel will push and cause non-fast-forward rejects without this. +3. **Instruct `commit each file as its own commit`** — not a single mega-commit. Makes revert surgical. +4. **Ask for concise reports (≤300 words):** new LOC counts, component counts, build status, commit SHAs. +5. 
**Tell them to commit partial progress on rate-limit.** If they don't, their partial work lives in the working tree and you have to chase it with `git status` after. (We hit this — 4 agents silently left uncommitted work.) +6. **Don't give an agent more than 2 big files at once.** Each page-split in practice took ~10–20 minutes + ~150k tokens. Two is a comfortable batch. +7. **Reference a prior "done" example.** Commit SHAs are gold — the agent can inspect exactly the style you want. +8. **Run one final `next build` / `pytest` / `go test` yourself after all agents finish.** Agent reports of "build clean" can be scoped (e.g. only their files); you want the whole-repo gate. + +--- + +## 4. Workflow loop (per file) + +``` +1. Read the oversized file end to end. Identify 3–6 extraction sections. +2. Write characterization test (if backend) — pin behavior. +3. Create the sibling files one at a time. + - If the PreToolUse hook blocks (file still > 500), split further. +4. Edit the root file: replace extracted bodies with imports + delegations. +5. Run the full verification: pytest / next build / go test. +6. Run LOC check: scripts/check-loc.sh +7. Commit with a scoped message and a 1–2 line body explaining why. +8. Push. +``` + +## 5. Commit message conventions + +``` +refactor(): + + + + + +Co-Authored-By: Claude Opus 4.6 (1M context) +``` + +Markers that unlock pre-commit guards: +- `[migration-approved]` — allows changes under `migrations/` / `alembic/versions/`. +- `[guardrail-change]` — allows changes to `.claude/settings.json`, `.claude/rules/loc-exceptions.txt`, `scripts/check-loc.sh`, `scripts/githooks/pre-commit`, or any `AGENTS.*.md`. 
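The marker rules above could be checked with something like the following sketch. It is not the repository's actual hook (that lives in `scripts/githooks/pre-commit`, whose contents are not shown here); `required_markers` and `missing_markers` are hypothetical names, and matching the marker as a plain substring of the commit message is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Guardrail files listed in the marker rules above.
GUARDRAIL_FILES = {
    ".claude/settings.json",
    ".claude/rules/loc-exceptions.txt",
    "scripts/check-loc.sh",
    "scripts/githooks/pre-commit",
}


def required_markers(paths: list[str]) -> set[str]:
    """Return the commit markers a set of staged paths would require."""
    markers: set[str] = set()
    for raw in paths:
        path = PurePosixPath(raw)
        # Guardrail files, plus any AGENTS.<lang>.md regardless of directory.
        if raw in GUARDRAIL_FILES or fnmatchcase(path.name, "AGENTS.*.md"):
            markers.add("[guardrail-change]")
        # Anything under a migrations/ or alembic/versions/ directory.
        dirs = path.parts[:-1]
        under_alembic = any(
            dirs[i:i + 2] == ("alembic", "versions") for i in range(len(dirs) - 1)
        )
        if "migrations" in dirs or under_alembic:
            markers.add("[migration-approved]")
    return markers


def missing_markers(message: str, paths: list[str]) -> list[str]:
    """List required markers that do not appear anywhere in the commit message."""
    return sorted(m for m in required_markers(paths) if m not in message)
```

For example, `missing_markers("chore: tweak budget", ["scripts/check-loc.sh"])` would report `[guardrail-change]` as missing, while the same call with the marker anywhere in the message body returns an empty list.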
+ +Good examples from our session: +- `refactor(consent-sdk): split ConsentManager + framework adapters under 500 LOC` +- `refactor(compliance-sdk): split client/provider/embed/state under 500 LOC` +- `refactor(admin): split whistleblower page.tsx + restore scope helpers` +- `chore: document data-catalog + legacy-service LOC exceptions` (with `[guardrail-change]` body) + +## 6. Verification commands cheatsheet + +```bash +# LOC budget +scripts/check-loc.sh --changed # only changed files +scripts/check-loc.sh # whole repo +scripts/check-loc.sh --json # for CI parsing + +# Python +pytest --cov= --cov-report=term-missing +ruff check . +mypy --strict /services /repositories + +# Go +go test ./... -cover +golangci-lint run +go vet ./... + +# TypeScript +npx tsc --noEmit +npx next build # from the Next.js app dir +npm test -- --run # vitest one-shot +npx playwright test tests/e2e # e2e smoke + +# Contracts +pytest tests/contracts/ # OpenAPI snapshot diff +``` + +## 7. Out of scope (don't drift) + +- DB schema / migrations — unless separate green-lit plan. +- New features. This is a refactor. +- Public endpoint renames without simultaneous consumer fix-up (exception: intra-monorepo URLs when you do the grep sweep). +- Unrelated dead code cleanup — do it in a separate PR. +- Bundling refactors across services in one commit — one service = one commit. + +## 8. Memory / session handoff + +If using Claude Code with persistent memory, save a `project_refactor_status.md` in your memory store after each phase: +- What's done (files split, LOC before → after). +- What's in progress (current file, blocker if any). +- What's deferred (pre-existing bugs surfaced but left for follow-up). +- Key patterns established (so next session doesn't rediscover them). + +This lets you resume after context compacts or after rate-limit windows without losing the thread. + +--- + +That's the whole methodology. 
Install Section 1, follow Section 2 phase-by-phase, use Section 3 to parallelize the grind. The guardrails do the policing so you don't have to remember anything. +