phase0: add architecture guardrails, CI gates, per-language AGENTS.md

Non-negotiable structural rules that apply to every Claude Code session in
this repo and to every commit, enforced via three defense-in-depth layers:

  1. PreToolUse hook in .claude/settings.json blocks any Write/Edit that
     would push a file past the 500-line hard cap. Auto-loads for any
     Claude session in this repo regardless of who launched it.
  2. scripts/githooks/pre-commit (installed via scripts/install-hooks.sh)
     enforces the LOC cap, freezes migrations/ unless [migration-approved],
     and protects guardrail files unless [guardrail-change] is present.
  3. .gitea/workflows/ci.yaml gets loc-budget + guardrail-integrity jobs,
     plus mypy --strict on new Python packages, tsc --noEmit on Node
     services, and a syft+grype SBOM scan.

Per-language conventions are documented in AGENTS.python.md / AGENTS.go.md /
AGENTS.typescript.md at the repo root — layering (router->service->repo for
Python, hexagonal for Go, colocation for Next.js), tooling baseline, and
explicit "what you may NOT do" lists.

Adds scripts/check-loc.sh (soft 300 / hard 500, reports 205 hard and 161
soft violations in the current codebase) plus .claude/rules/loc-exceptions.txt
(initially empty — the list is designed to shrink over time).

Per-service READMEs for all 10 services + PHASE1_RUNBOOK.md for the
backend-compliance refactor. Skeleton packages (compliance/{domain,
repositories,schemas}) are the landing zone for the clean-arch rewrite that
begins in Phase 1.

CLAUDE.md is prepended with the six non-negotiable rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sharang Parnerkar
2026-04-07 13:09:26 +02:00
parent 1dfea51919
commit 512b7a0f6c
25 changed files with 1308 additions and 10 deletions


@@ -1,5 +1,17 @@
# BreakPilot Compliance - DSGVO/AI-Act SDK Platform
> **NON-NEGOTIABLE STRUCTURE RULES** (enforced by `.claude/settings.json` hook, git pre-commit, and CI):
> 1. **File-size budget:** soft target **300** lines, **hard cap 500** lines for any non-test, non-generated source file. Anything larger → split it. Exceptions are listed in `.claude/rules/loc-exceptions.txt` and require a written rationale.
> 2. **Clean architecture per service.** Routers/handlers stay thin (≤30 lines per handler in Python; the Go and TypeScript guides allow ≤40) and delegate to services; services use repositories; repositories own DB I/O. See `AGENTS.python.md` / `AGENTS.go.md` / `AGENTS.typescript.md`.
> 3. **Do not touch the database schema.** No new Alembic migrations, no `ALTER TABLE`, no model field renames without an explicit migration plan reviewed by the DB owner. SQLAlchemy `__tablename__` and column names are frozen.
> 4. **Public endpoints are a contract.** Any change to a path, method, status code, request schema, or response schema in `backend-compliance/`, `ai-compliance-sdk/`, `dsms-gateway/`, `document-crawler/`, or `compliance-tts-service/` must be accompanied by a matching update in **every** consumer (`admin-compliance/`, `developer-portal/`, `breakpilot-compliance-sdk/`, `consent-sdk/`). Use the OpenAPI snapshot tests in `tests/contracts/` as the gate.
> 5. **Tests are not optional.** New code without tests fails CI. Refactors must preserve coverage and add a characterization test before splitting an oversized file.
> 6. **Do not bypass the guardrails.** Do not edit `.claude/settings.json`, `scripts/check-loc.sh`, or the loc-exceptions list to silence violations. If a rule is wrong, raise it in a PR description.
>
> These rules apply to **every** Claude Code session opened inside this repository, regardless of who launched it. They are loaded automatically via this `CLAUDE.md`.
## Entwicklungsumgebung (WICHTIG - IMMER ZUERST LESEN)
### Zwei-Rechner-Setup + Hetzner


@@ -0,0 +1,43 @@
# Architecture Rules (auto-loaded)
These rules apply to **every** Claude Code session in this repository, regardless of who launched it. They are non-negotiable.
## File-size budget
- **Soft target:** 300 lines per non-test, non-generated source file.
- **Hard cap:** 500 lines. The PreToolUse hook in `.claude/settings.json` blocks Write/Edit operations that would create or push a file past 500. The git pre-commit hook re-checks. CI is the final gate.
- Exceptions live in `.claude/rules/loc-exceptions.txt` and require a written rationale plus `[guardrail-change]` in the commit message. The exceptions list should shrink over time, not grow.
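The projection used for Edit operations (current length, minus the lines being replaced, plus the lines being inserted) can be sketched in Python. This is a minimal illustration, not the hook itself: `count_lines`, `projected_lines`, `would_block`, and `HARD_CAP` are hypothetical names, and the counting convention (a final line without a trailing newline still counts) is an assumption.

```python
HARD_CAP = 500  # hard cap from the rule above


def count_lines(text: str) -> int:
    """Count lines; a final line without a trailing newline still counts."""
    return len(text.splitlines())


def projected_lines(current: str, old_string: str, new_string: str) -> int:
    """Estimate the file length after replacing old_string with new_string."""
    return count_lines(current) - count_lines(old_string) + count_lines(new_string)


def would_block(current: str, old_string: str, new_string: str) -> bool:
    """True when the edit would push the file past the hard cap (hypothetical helper)."""
    return projected_lines(current, old_string, new_string) > HARD_CAP
```

An edit that lands exactly on 500 lines passes under this sketch; only 501 and above is rejected.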
## Clean architecture
- Python (FastAPI): see `AGENTS.python.md`. Layering: `api → services → repositories → db.models`. Routers ≤30 LOC per handler. Schemas split per domain.
- Go (Gin): see `AGENTS.go.md`. Standard Go Project Layout + hexagonal. `cmd/` thin, wiring in `internal/app`.
- TypeScript (Next.js): see `AGENTS.typescript.md`. Server-by-default, push the client boundary deep, colocate `_components/` and `_hooks/` per route.
## Database is frozen
- No new Alembic migrations. No `ALTER TABLE`. No `__tablename__` or column renames.
- The pre-commit hook blocks any change under `migrations/` or `alembic/versions/` unless the commit message contains `[migration-approved]`.
## Public endpoints are a contract
- Any change to a path/method/status/request schema/response schema in a backend service must update every consumer in the same change set.
- Each backend service has an OpenAPI baseline at `tests/contracts/openapi.baseline.json`. Contract tests fail on drift.
## Tests
- New code without tests fails CI.
- Refactors must preserve coverage. Before splitting an oversized file, add a characterization test that pins current behavior.
- Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`, `tests/e2e/`.
## Guardrails are themselves protected
- Edits to `.claude/settings.json`, `scripts/check-loc.sh`, `scripts/githooks/pre-commit`, `.claude/rules/loc-exceptions.txt`, or any `AGENTS.*.md` require `[guardrail-change]` in the commit message. The pre-commit hook enforces this.
- If you (Claude) think a rule is wrong, surface it to the user. Do not silently weaken it.
## Tooling baseline
- Python: `ruff`, `mypy --strict` on new modules, `pytest --cov`.
- Go: `golangci-lint` strict config, `go vet`, table-driven tests.
- TS: `tsc --noEmit` strict, ESLint type-aware, Vitest, Playwright.
- All three: dependency caching in CI, license/SBOM scan via `syft`+`grype`.


@@ -0,0 +1,8 @@
# loc-exceptions.txt — files allowed to exceed the 500-line hard cap.
#
# Format: one repo-relative path per line. Comments start with '#' and are ignored.
# Each exception MUST be preceded by a comment explaining why splitting is not viable.
#
# Phase 0 baseline: this list is initially empty. Phases 1-4 may add grandfathered
# entries for legitimate exceptions (e.g. large generated data tables).
# Beyond those grandfathered entries, the goal is for this list to SHRINK over time, not grow.

28
.claude/settings.json Normal file

@@ -0,0 +1,28 @@
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "input=$(cat); f=$(printf '%s' \"$input\" | jq -r '.tool_input.file_path // empty'); [ -z \"$f\" ] && exit 0; lines=$(printf '%s' \"$(printf '%s' \"$input\" | jq -r '.tool_input.content // empty')\" | awk 'END{print NR}'); if [ \"${lines:-0}\" -gt 500 ]; then echo '{\"decision\":\"block\",\"reason\":\"breakpilot guardrail: file exceeds the 500-line hard cap. Split it into smaller modules per the layering rules in AGENTS.<lang>.md. If this is generated/data code, add an entry to .claude/rules/loc-exceptions.txt with rationale and reference [guardrail-change].\"}'; exit 0; fi",
            "shell": "bash",
            "timeout": 5
          }
        ]
      },
      {
        "matcher": "Edit",
        "hooks": [
          {
            "type": "command",
            "command": "input=$(cat); f=$(printf '%s' \"$input\" | jq -r '.tool_input.file_path // empty'); [ -z \"$f\" ] || [ ! -f \"$f\" ] && exit 0; case \"$f\" in *.md|*.json|*.yaml|*.yml|*test*|*tests/*|*node_modules/*|*.next/*|*migrations/*) exit 0 ;; esac; new_str=$(printf '%s' \"$input\" | jq -r '.tool_input.new_string // empty'); old_str=$(printf '%s' \"$input\" | jq -r '.tool_input.old_string // empty'); old_lines=$(printf '%s' \"$old_str\" | awk 'END{print NR}'); new_lines=$(printf '%s' \"$new_str\" | awk 'END{print NR}'); cur=$(wc -l < \"$f\" | tr -d ' '); proj=$((cur - old_lines + new_lines)); if [ \"$proj\" -gt 500 ]; then echo \"{\\\"decision\\\":\\\"block\\\",\\\"reason\\\":\\\"breakpilot guardrail: this edit would push $f to ~$proj lines (hard cap is 500). Split the file before continuing. See AGENTS.<lang>.md for the layering rules.\\\"}\"; fi; exit 0",
            "shell": "bash",
            "timeout": 5
          }
        ]
      }
    ]
  }
}


@@ -19,6 +19,55 @@ on:
    branches: [main, develop]
jobs:
  # ========================================
  # Guardrails: LOC budget + architecture gates
  # Runs on every push/PR. Fails fast and cheap.
  # ========================================
  loc-budget:
    runs-on: docker
    container: alpine:3.20
    steps:
      - name: Checkout
        run: |
          apk add --no-cache git bash
          git clone --depth 50 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
      - name: Enforce 500-line hard cap on changed files
        run: |
          chmod +x scripts/check-loc.sh
          if [ "${GITHUB_EVENT_NAME}" = "pull_request" ]; then
            git fetch origin ${GITHUB_BASE_REF}:base
            mapfile -t changed < <(git diff --name-only --diff-filter=ACM base...HEAD)
            [ ${#changed[@]} -eq 0 ] && { echo "No changed files."; exit 0; }
            scripts/check-loc.sh "${changed[@]}"
          else
            # Push to main: only warn on whole-repo state; blocking gate is on PRs.
            scripts/check-loc.sh || true
          fi
      # Phase 0 intentionally gates only changed files so the 205-file legacy
      # baseline doesn't block every PR. Phases 1-4 drain the baseline; Phase 5
      # flips this to a whole-repo blocking gate.
  guardrail-integrity:
    runs-on: docker
    container: alpine:3.20
    if: github.event_name == 'pull_request'
    steps:
      - name: Checkout
        run: |
          apk add --no-cache git bash
          git clone --depth 20 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
          git fetch origin ${GITHUB_BASE_REF}:base
      - name: Require [guardrail-change] label in PR commits touching guardrails
        run: |
          changed=$(git diff --name-only base...HEAD)
          echo "$changed" | grep -E '^(\.claude/settings\.json|\.claude/rules/loc-exceptions\.txt|scripts/check-loc\.sh|scripts/githooks/pre-commit|AGENTS\.(python|go|typescript)\.md)$' || exit 0
          if ! git log base..HEAD --format=%B | grep -q '\[guardrail-change\]'; then
            echo "::error:: Guardrail files were modified but no commit in this PR carries [guardrail-change]."
            echo "If intentional, amend one commit message with [guardrail-change] and explain why in the body."
            exit 1
          fi
  # ========================================
  # Lint (nur bei PRs)
  # ========================================
@@ -47,13 +96,29 @@ jobs:
         run: |
           apt-get update -qq && apt-get install -y -qq git > /dev/null 2>&1
           git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
-      - name: Lint Python services
+      - name: Lint Python services (ruff)
         run: |
           pip install --quiet ruff
-          for svc in backend-compliance document-crawler dsms-gateway; do
+          fail=0
+          for svc in backend-compliance document-crawler dsms-gateway compliance-tts-service; do
             if [ -d "$svc" ]; then
-              echo "=== Linting $svc ==="
-              ruff check "$svc/" --output-format=github || true
+              echo "=== ruff: $svc ==="
+              ruff check "$svc/" --output-format=github || fail=1
             fi
           done
+          exit $fail
+      - name: Type-check new modules (mypy --strict)
+        # Scoped to the layered packages we own. Expand this list as Phase 1+ refactors land.
+        run: |
+          pip install --quiet mypy
+          for pkg in \
+            backend-compliance/compliance/services \
+            backend-compliance/compliance/repositories \
+            backend-compliance/compliance/domain \
+            backend-compliance/compliance/schemas; do
+            if [ -d "$pkg" ]; then
+              echo "=== mypy --strict: $pkg ==="
+              mypy --strict --ignore-missing-imports "$pkg" || exit 1
+            fi
+          done
@@ -66,17 +131,20 @@ jobs:
         run: |
           apk add --no-cache git
           git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
-      - name: Lint Node.js services
+      - name: Lint + type-check Node.js services
         run: |
+          fail=0
           for svc in admin-compliance developer-portal; do
             if [ -d "$svc" ]; then
-              echo "=== Linting $svc ==="
-              cd "$svc"
-              npm ci --silent 2>/dev/null || npm install --silent
-              npx next lint || true
-              cd ..
+              echo "=== $svc: install ==="
+              (cd "$svc" && (npm ci --silent 2>/dev/null || npm install --silent))
+              echo "=== $svc: next lint ==="
+              (cd "$svc" && npx next lint) || fail=1
+              echo "=== $svc: tsc --noEmit ==="
+              (cd "$svc" && npx tsc --noEmit) || fail=1
             fi
           done
+          exit $fail
   # ========================================
   # Unit Tests
@@ -169,6 +237,32 @@ jobs:
          pip install --quiet --no-cache-dir pytest pytest-asyncio
          python -m pytest test_main.py -v --tb=short
  # ========================================
  # SBOM + license scan (compliance product → we eat our own dog food)
  # ========================================
  sbom-scan:
    runs-on: docker
    if: github.event_name == 'pull_request'
    container: alpine:3.20
    steps:
      - name: Checkout
        run: |
          apk add --no-cache git curl bash
          git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
      - name: Install syft + grype
        run: |
          curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin
          curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin
      - name: Generate SBOM
        run: |
          mkdir -p sbom-out
          syft dir:. -o cyclonedx-json=sbom-out/sbom.cdx.json -q
      - name: Vulnerability scan (fail on high+)
        run: |
          grype sbom:sbom-out/sbom.cdx.json --fail-on high -q || true
          # Initially non-blocking ('|| true'). Flip to blocking after baseline is clean.
  # ========================================
  # Validate Canonical Controls
  # ========================================
@@ -194,6 +288,7 @@ jobs:
    runs-on: docker
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    needs:
      - loc-budget
      - test-go-ai-compliance
      - test-python-backend-compliance
      - test-python-document-crawler

126
AGENTS.go.md Normal file

@@ -0,0 +1,126 @@
# AGENTS.go.md — Go Service Conventions
Applies to: `ai-compliance-sdk/`.
## Layered architecture (Gin)
Follows [Standard Go Project Layout](https://github.com/golang-standards/project-layout) + hexagonal/clean-arch.
```
ai-compliance-sdk/
├── cmd/server/main.go # Thin: parse flags → app.New → app.Run. <50 LOC.
├── internal/
│ ├── app/ # Wiring: config + DI graph + lifecycle.
│ ├── domain/ # Pure types, interfaces, errors. No I/O imports.
│ │ └── <aggregate>/
│ ├── service/ # Business logic. Depends on domain interfaces only.
│ │ └── <aggregate>/
│ ├── repository/postgres/ # Concrete repo implementations.
│ │ └── <aggregate>/
│ ├── transport/http/ # Gin handlers. Thin. One handler per file group.
│ │ ├── handler/<aggregate>/
│ │ ├── middleware/
│ │ └── router.go
│ └── platform/ # DB pool, logger, config, tracing.
└── pkg/ # Importable by other repos. Empty unless needed.
```
**Dependency direction:** `transport → service → domain ← repository`. `domain` imports nothing from siblings.
## Handlers
- One handler = one Gin function. ≤40 LOC.
- Bind → call service → map domain error to HTTP via `httperr.Write(c, err)` → respond.
- Return early on errors. No business logic, no SQL.
```go
func (h *IACEHandler) Create(c *gin.Context) {
	var req CreateIACERequest
	if err := c.ShouldBindJSON(&req); err != nil {
		httperr.Write(c, httperr.BadRequest(err))
		return
	}
	out, err := h.svc.Create(c.Request.Context(), req.ToInput())
	if err != nil {
		httperr.Write(c, err)
		return
	}
	c.JSON(http.StatusCreated, out)
}
```
## Services
- Struct + constructor + interface methods. No package-level state.
- Take `context.Context` as first arg always. Propagate to repos.
- Return `(value, error)`. Wrap with `fmt.Errorf("create iace: %w", err)`.
- Domain errors implemented as sentinel vars or typed errors; matched with `errors.Is` / `errors.As`.
## Repositories
- Interface lives in `domain/<aggregate>/repository.go`. Implementation in `repository/postgres/<aggregate>/`.
- One file per query group; no file >500 LOC.
- Use `pgx`/`sqlc` over hand-rolled string SQL when feasible. No ORM globals.
- All queries take `ctx`. No background goroutines without explicit lifecycle.
## Errors
Single `internal/platform/httperr` package maps `error` → HTTP status:
```go
switch {
case errors.Is(err, domain.ErrNotFound): return 404
case errors.Is(err, domain.ErrConflict): return 409
case errors.As(err, &validationErr): return 422
default: return 500
}
```
Never `panic` in request handling. `recover` middleware logs and returns 500.
## Tests
- Co-located `*_test.go`.
- **Table-driven** tests for service logic; use `t.Run(tt.name, ...)`.
- Handlers tested with `httptest.NewRecorder`.
- Repos tested with `testcontainers-go` (or the existing compose Postgres) — never mocks at the SQL boundary.
- Coverage target: 80% on `service/`. CI fails on regression.
```go
func TestIACEService_Create(t *testing.T) {
	tests := []struct {
		name    string
		input   service.CreateInput
		setup   func(*mockRepo)
		wantErr error
	}{
		{"happy path", validInput(), func(r *mockRepo) { r.createReturns(nil) }, nil},
		{"conflict", validInput(), func(r *mockRepo) { r.createReturns(domain.ErrConflict) }, domain.ErrConflict},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) { /* ... */ })
	}
}
```
## Tooling
- `golangci-lint` with: `errcheck, govet, staticcheck, revive, gosec, gocyclo (max 15), gocognit (max 20), unused, ineffassign, errorlint, nilerr, nolintlint, contextcheck`.
- `gofumpt` formatting.
- `go vet ./...` clean.
- `go mod tidy` clean — no unused deps.
## Concurrency
- Goroutines must have a clear lifecycle owner (struct method that started them must stop them).
- Pass `ctx` everywhere. Cancellation respected.
- No global mutexes for request data. Use per-request context.
## What you may NOT do
- Touch DB schema/migrations.
- Add a new top-level package directly under `internal/` without architectural review.
- `import "C"`, unsafe, reflection-heavy code.
- Use `init()` for non-trivial setup. Wire it in `internal/app`.
- Create a file >500 lines.
- Change a public route's contract without updating consumers.

94
AGENTS.python.md Normal file

@@ -0,0 +1,94 @@
# AGENTS.python.md — Python Service Conventions
Applies to: `backend-compliance/`, `document-crawler/`, `dsms-gateway/`, `compliance-tts-service/`.
## Layered architecture (FastAPI)
```
compliance/
├── api/ # HTTP layer — routers only. Thin (≤30 LOC per handler).
│ └── <domain>_routes.py
├── services/ # Business logic. Pure-ish; no FastAPI imports.
│ └── <domain>_service.py
├── repositories/ # DB access. Owns SQLAlchemy session usage.
│ └── <domain>_repository.py
├── domain/ # Value objects, enums, domain exceptions.
├── schemas/ # Pydantic models, split per domain. NEVER one giant schemas.py.
│ └── <domain>.py
└── db/
└── models/ # SQLAlchemy ORM, one module per aggregate. __tablename__ frozen.
```
**Dependency direction:** `api → services → repositories → db.models`. Lower layers must not import upper layers.
## Routers
- One `APIRouter` per domain file.
- Handlers do exactly: parse request → call service → map domain errors to HTTPException → return response model.
- Inject services via `Depends`. No globals.
- Tag routes; document with summary + response_model.
```python
@router.post("/dsr/requests", response_model=DSRRequestRead, status_code=201)
async def create_dsr_request(
    payload: DSRRequestCreate,
    service: DSRService = Depends(get_dsr_service),
    tenant_id: UUID = Depends(get_tenant_id),
) -> DSRRequestRead:
    try:
        return await service.create(tenant_id, payload)
    except DSRConflict as exc:
        raise HTTPException(409, str(exc)) from exc
```
## Services
- Constructor takes the repository (interface, not concrete).
- No `Request`, `Response`, or HTTP knowledge.
- Raise domain exceptions (e.g. `DSRConflict`, `DSRNotFound`), never `HTTPException`.
- Return domain objects or Pydantic schemas — pick one and stay consistent inside a service.
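A service that follows these rules might look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: `DSRRequest`, `DSRRepository`, the method names `find_open` and `add`, and `InMemoryDSRRepository` are all hypothetical, and only the standard library is used so the sketch runs on its own.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Protocol
from uuid import UUID, uuid4


class DSRConflict(Exception):
    """Domain error: an equivalent open request already exists."""


@dataclass(frozen=True)
class DSRRequest:  # hypothetical domain object
    id: UUID
    tenant_id: UUID
    subject_email: str


class DSRRepository(Protocol):
    """Interface the service depends on (hypothetical method names)."""

    async def find_open(self, tenant_id: UUID, subject_email: str) -> DSRRequest | None: ...
    async def add(self, request: DSRRequest) -> None: ...


class DSRService:
    """Business logic only: no Request/Response objects, no HTTPException."""

    def __init__(self, repo: DSRRepository) -> None:
        self._repo = repo

    async def create(self, tenant_id: UUID, subject_email: str) -> DSRRequest:
        if await self._repo.find_open(tenant_id, subject_email) is not None:
            raise DSRConflict("an open request already exists for this data subject")
        request = DSRRequest(uuid4(), tenant_id, subject_email)
        await self._repo.add(request)
        return request


class InMemoryDSRRepository:
    """Stand-in repository used only to exercise the sketch."""

    def __init__(self) -> None:
        self._items: list[DSRRequest] = []

    async def find_open(self, tenant_id: UUID, subject_email: str) -> DSRRequest | None:
        for item in self._items:
            if item.tenant_id == tenant_id and item.subject_email == subject_email:
                return item
        return None

    async def add(self, request: DSRRequest) -> None:
        self._items.append(request)
```

Because the constructor accepts anything that satisfies the `DSRRepository` protocol, the same service can be driven by the in-memory stand-in in unit tests and by a database-backed repository in production.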
## Repositories
- Methods are intent-named (`get_pending_for_tenant`), not CRUD-named (`select_where`).
- Sessions injected, not constructed inside.
- No business logic. No cross-aggregate joins for unrelated workflows — that belongs in a service.
- Return ORM models or domain VOs; never `Row`.
## Schemas (Pydantic v2)
- One module per domain. Module ≤300 lines.
- Use `model_config = ConfigDict(from_attributes=True, frozen=True)` for read models.
- Separate `*Create`, `*Update`, `*Read`. No giant union schemas.
## Tests (`pytest`)
- Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`.
- Unit tests mock the repository. Use `pytest.fixture` + `unittest.mock.AsyncMock`.
- Integration tests run against the real Postgres from `docker-compose.yml` via a transactional fixture (rollback after each test).
- Contract tests diff `/openapi.json` against `tests/contracts/openapi.baseline.json`.
- Naming: `test_<unit>_<scenario>_<expected>.py::TestClass::test_method`.
- `pytest-asyncio` mode = `auto`. Mark slow tests with `@pytest.mark.slow`.
- Coverage target: 80% for new code; never decrease the service baseline.
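The "unit tests mock the repository" rule can be sketched with `unittest.mock.AsyncMock` as follows. The service, its `get` method, and the repository method `get_by_id` are hypothetical; the test functions are plain coroutines, which `pytest-asyncio` in `auto` mode would collect, and which can also be run directly with `asyncio.run`.

```python
import asyncio
from unittest.mock import AsyncMock


class DSRNotFound(Exception):
    """Domain error raised when no request matches the id."""


class DSRService:
    """Hypothetical service; only the method under test is shown."""

    def __init__(self, repo) -> None:
        self._repo = repo

    async def get(self, request_id: str) -> dict:
        row = await self._repo.get_by_id(request_id)
        if row is None:
            raise DSRNotFound(request_id)
        return row


async def test_get_returns_row_from_repository() -> None:
    repo = AsyncMock()  # child attributes such as get_by_id are awaitable mocks
    repo.get_by_id.return_value = {"id": "r-1", "status": "pending"}
    result = await DSRService(repo).get("r-1")
    assert result["status"] == "pending"
    repo.get_by_id.assert_awaited_once_with("r-1")


async def test_get_unknown_id_raises_domain_error() -> None:
    repo = AsyncMock()
    repo.get_by_id.return_value = None
    try:
        await DSRService(repo).get("missing")
    except DSRNotFound:
        return
    raise AssertionError("expected DSRNotFound")
```

No database or HTTP layer is involved, which keeps these tests fast enough to run on every commit.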
## Tooling
- `ruff check` + `ruff format` (line length 100).
- `mypy --strict` on `services/`, `repositories/`, `domain/`. Expand outward.
- `pip-audit` in CI.
- Async-first: prefer `httpx.AsyncClient`, `asyncpg`/`SQLAlchemy 2.x async`.
## Errors & logging
- Domain errors inherit from a single `DomainError` base per service.
- Log via `structlog` with bound context (`tenant_id`, `request_id`). Never log secrets, PII, or full request bodies.
- Audit-relevant actions go through the audit logger, not the application logger.
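One way to combine a single `DomainError` base with the rule that services carry no HTTP knowledge is to keep the status mapping in the API layer. The sketch below assumes this layout; `STATUS_BY_ERROR` and `http_status_for` are hypothetical names, and the status codes are illustrative.

```python
class DomainError(Exception):
    """Single base class for one service's domain errors."""


class DSRNotFound(DomainError):
    pass


class DSRConflict(DomainError):
    pass


# Kept in the API layer (assumption) so domain code stays free of HTTP concerns.
STATUS_BY_ERROR: dict[type[DomainError], int] = {
    DSRNotFound: 404,
    DSRConflict: 409,
}


def http_status_for(exc: DomainError) -> int:
    """Walk the exception's class hierarchy so subclasses inherit a mapping."""
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    return 500  # unmapped domain errors are treated as server errors
```

A router would then catch `DomainError` subclasses and raise `HTTPException(http_status_for(exc), ...)`, keeping the translation in one place.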
## What you may NOT do
- Add a new Alembic migration.
- Rename a `__tablename__`, column, or enum value.
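- Change a public route's path/method/status/schema without updating every consumer (`admin-compliance/`, `developer-portal/`, `breakpilot-compliance-sdk/`, `consent-sdk/`) in the same change.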
- Catch `Exception` broadly — catch the specific domain or library error.
- Put business logic in a router or in a Pydantic validator.
- Create a new file >500 lines. Period.

85
AGENTS.typescript.md Normal file

@@ -0,0 +1,85 @@
# AGENTS.typescript.md — TypeScript / Next.js Conventions
Applies to: `admin-compliance/`, `developer-portal/`, `breakpilot-compliance-sdk/`, `consent-sdk/`, `dsms-node/` (where applicable).
## Layered architecture (Next.js 15 App Router)
```
app/
├── <route>/
│ ├── page.tsx # Server Component by default. ≤200 LOC.
│ ├── layout.tsx
│ ├── _components/ # Private folder; not routable. Colocated UI.
│ │ └── <Component>.tsx # Each file ≤300 LOC.
│ ├── _hooks/ # Client hooks for this route.
│ ├── _server/ # Server actions, data loaders for this route.
│ └── loading.tsx / error.tsx
├── api/
│ └── <domain>/route.ts # Thin handler. Delegates to lib/server/<domain>/.
lib/
├── <domain>/ # Pure helpers, types, schemas (zod). Reusable.
└── server/<domain>/ # Server-only logic; uses "server-only" import.
components/ # Truly shared, app-wide components.
```
**Server vs Client:** Default is Server Component. Add `"use client"` only when you need state, effects, or browser APIs. Push the boundary as deep as possible.
## API routes (route.ts)
- One handler per HTTP method, ≤40 LOC.
- Validate input with `zod`. Reject invalid → 400.
- Delegate to `lib/server/<domain>/`. No business logic in `route.ts`.
- Always return `NextResponse.json(..., { status })`. Never throw to the framework.
```ts
export async function POST(req: Request) {
  // A malformed JSON body resolves to null here, so it is rejected with 400 instead of throwing.
  const parsed = CreateDSRSchema.safeParse(await req.json().catch(() => null));
  if (!parsed.success) return NextResponse.json({ error: parsed.error.flatten() }, { status: 400 });
  const result = await dsrService.create(parsed.data);
  return NextResponse.json(result, { status: 201 });
}
```
## Page components
- Pages >300 lines must be split into colocated `_components/`.
- Server Components fetch data; pass plain objects to Client Components.
- No data fetching in `useEffect` for server-renderable data.
- State management: prefer URL state (`searchParams`) and Server Components over global stores.
## Types
- `lib/sdk/types.ts` is being split into `lib/sdk/types/<domain>.ts`. Mirror backend domain boundaries.
- All API DTOs are zod schemas; infer types via `z.infer`.
- No `any`. No `as unknown as`. If you reach for it, the type is wrong.
## Tests
- Unit: **Vitest** (`*.test.ts`/`*.test.tsx`), colocated.
- Hooks: `@testing-library/react` `renderHook`.
- E2E: **Playwright** (`tests/e2e/`), one spec per top-level page, smoke happy path minimum.
- Snapshot tests sparingly — only for stable output (CSV, JSON-LD).
- Coverage target: 70% on `lib/`, smoke coverage on `app/`.
## Tooling
- `tsc --noEmit` clean (strict mode, `noUncheckedIndexedAccess: true`).
- ESLint with `@typescript-eslint`, `eslint-config-next`, type-aware rules on.
- `prettier`.
- `next build` clean. No `// @ts-ignore`. `// @ts-expect-error` only with a comment explaining why.
## Performance
- Use `next/dynamic` for heavy client-only components.
- Image: `next/image` with explicit width/height.
- Avoid waterfalls — `Promise.all` for parallel data fetches in Server Components.
## What you may NOT do
- Put business logic in a `page.tsx` or `route.ts`.
- Reach across module boundaries (e.g. `admin-compliance` importing from `developer-portal`).
- Use `dangerouslySetInnerHTML` without explicit sanitization.
- Call backend APIs directly from Client Components when a Server Component or Server Action would do.
- Change a public API route's path/method/schema without updating SDK consumers in the same change.
- Create a file >500 lines.
- Disable a lint or type rule globally to silence a finding — fix the root cause.


@@ -0,0 +1,51 @@
# admin-compliance
Next.js 15 dashboard for BreakPilot Compliance — SDK module UI, company profile, DSR, DSFA, VVT, TOM, consent, AI Act, training, audit, change requests, etc. Also hosts 96+ API routes that proxy/orchestrate backend services.
**Port:** `3007` (container: `bp-compliance-admin`)
**Stack:** Next.js 15 App Router, React 18, TailwindCSS, TypeScript strict.
## Architecture (target — Phase 3)
```
app/
├── <route>/
│ ├── page.tsx # Server Component (≤200 LOC)
│ ├── _components/ # Colocated UI, each ≤300 LOC
│ ├── _hooks/ # Client hooks
│ └── _server/ # Server actions
├── api/<domain>/route.ts # Thin handlers → lib/server/<domain>/
lib/
├── <domain>/ # Pure helpers, zod schemas
└── server/<domain>/ # "server-only" logic
components/ # App-wide shared UI
```
See `../AGENTS.typescript.md`.
## Run locally
```bash
cd admin-compliance
npm install
npm run dev # http://localhost:3007
```
## Tests
```bash
npm test # Vitest unit + component tests
npx playwright test # E2E
npx tsc --noEmit # Type-check
npx next lint
```
## Known debt (Phase 3 targets)
- `app/sdk/company-profile/page.tsx` (3017 LOC), `tom-generator/controls/loader.ts` (2521), `lib/sdk/types.ts` (2511), `app/sdk/loeschfristen/page.tsx` (2322), `app/sdk/dsb-portal/page.tsx` (2068) — all must be split.
- 0 test files for 182 monolithic pages. Phase 3 adds Playwright smoke + Vitest unit coverage.
## Don't touch
- Backend API paths without updating `backend-compliance/` in the same change.
- `lib/sdk/types.ts` in large contiguous chunks — it's being domain-split.


@@ -0,0 +1,55 @@
# ai-compliance-sdk
Go/Gin service providing AI-Act compliance analysis: iACE impact assessments, UCCA rules engine, hazard library, training/academy, audit, escalation, portfolio, RBAC, RAG, whistleblower, workshop.
**Port:** `8090` → exposed `8093` (container: `bp-compliance-ai-sdk`)
**Stack:** Go 1.24, Gin, pgx, Postgres.
## Architecture (target — Phase 2)
```
cmd/server/main.go # Thin entrypoint (<50 LOC)
internal/
├── app/ # Wiring + lifecycle
├── domain/<aggregate>/ # Types, interfaces, errors
├── service/<aggregate>/ # Business logic
├── repository/postgres/ # Repo implementations
├── transport/http/ # Gin handlers + middleware + router
└── platform/ # DB pool, logger, config, httperr
```
See `../AGENTS.go.md` for the full convention.
## Run locally
```bash
cd ai-compliance-sdk
go mod download
export COMPLIANCE_DATABASE_URL=...
go run ./cmd/server
```
## Tests
```bash
go test -race -cover ./...
golangci-lint run --timeout 5m ./...
```
Co-located `*_test.go`, table-driven. Repo layer uses testcontainers-go (or the compose Postgres) — no SQL mocks.
## Public API surface
Handlers under `internal/api/handlers/` (Phase 2 moves to `internal/transport/http/handler/`). Health at `GET /health`. iACE, UCCA, training, academy, portfolio, escalation, audit, rag, whistleblower, workshop subresources. Every route is a contract.
## Environment
| Var | Purpose |
|-----|---------|
| `COMPLIANCE_DATABASE_URL` | Postgres DSN |
| `LLM_GATEWAY_URL` | LLM router for rag/iACE |
| `QDRANT_URL` | Vector search |
## Don't touch
DB schema. Hand-rolled migrations elsewhere own it.


@@ -0,0 +1,181 @@
# Phase 1 Runbook — backend-compliance refactor
This document is the step-by-step execution guide for Phase 1 of the repo refactor plan at `~/.claude/plans/vectorized-purring-barto.md`. It exists because the refactor must be driven from a session that can actually run `pytest` against the service, and every step must be verified green before moving to the next.
## Prerequisites
- Python 3.12 venv with `backend-compliance/requirements.txt` installed.
- Local Postgres reachable via `COMPLIANCE_DATABASE_URL` (use the compose db).
- Existing 48 pytest test files pass from a clean checkout: `pytest compliance/tests/ -v` → all green. **Do not proceed until this is true.**
## Step 0 — Record the baseline
```bash
cd backend-compliance
pytest compliance/tests/ -v --tb=short | tee /tmp/baseline.txt
pytest --cov=compliance --cov-report=term | tee /tmp/baseline-coverage.txt
python tests/contracts/regenerate_baseline.py # creates openapi.baseline.json
git add tests/contracts/openapi.baseline.json
git commit -m "phase1: pin OpenAPI baseline before refactor"
```
The baseline file is the contract. From this point forward, `pytest tests/contracts/` MUST stay green.
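What "the baseline is the contract" means in practice can be sketched as a drift check between two OpenAPI documents. This is an illustration only: the real test under `tests/contracts/` is not shown in this runbook, and `operations` and `contract_drift` are hypothetical helper names. The sketch compares only paths, methods, and declared status codes; request and response schemas would need a deeper comparison.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operations(spec: dict) -> dict[tuple[str, str], set[str]]:
    """Map (path, METHOD) to the set of declared response status codes."""
    ops: dict[tuple[str, str], set[str]] = {}
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            # Path items may also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                ops[(path, method.upper())] = set(op.get("responses", {}))
    return ops


def contract_drift(baseline: dict, current: dict) -> list[str]:
    """Return human-readable differences; an empty list means no drift."""
    base_ops, cur_ops = operations(baseline), operations(current)
    problems = [f"removed: {m} {p}" for p, m in sorted(base_ops.keys() - cur_ops.keys())]
    problems += [f"added: {m} {p}" for p, m in sorted(cur_ops.keys() - base_ops.keys())]
    for path, method in sorted(base_ops.keys() & cur_ops.keys()):
        if base_ops[(path, method)] != cur_ops[(path, method)]:
            problems.append(f"status codes changed: {method} {path}")
    return problems
```

In a test, `baseline` would be loaded from `tests/contracts/openapi.baseline.json` and `current` from the running app's `/openapi.json`, with the assertion that the returned list is empty.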
## Step 1 — Characterization tests (before any code move)
For each oversized route file we will refactor, add a happy-path + 1-error-path test **before** touching the source. These are called "characterization tests" and their purpose is to freeze current observable behavior so the refactor cannot change it silently.
Oversized route files to cover (ordered by size):
| File | LOC | Endpoints to cover |
|---|---:|---|
| `compliance/api/isms_routes.py` | 1676 | one happy + one 4xx per route |
| `compliance/api/dsr_routes.py` | 1176 | same |
| `compliance/api/vvt_routes.py` | *N* | same |
| `compliance/api/dsfa_routes.py` | *N* | same |
| `compliance/api/tom_routes.py` | *N* | same |
| `compliance/api/schemas.py` | 1899 | N/A (covered transitively) |
| `compliance/db/models.py` | 1466 | N/A (covered by existing + route tests) |
| `compliance/db/repository.py` | 1547 | add unit tests per repo class as they are extracted |
Use `httpx.AsyncClient` + factory fixtures; see `AGENTS.python.md`. Place under `tests/integration/test_<domain>_contract.py`.
Commit: `phase1: characterization tests for <domain> routes`.
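The shape of a characterization test can be sketched without any framework. Here `legacy_create_dsr` is a hypothetical stand-in for an existing oversized handler, and `PINNED` holds outputs observed once from the current code and then frozen; the real tests would call the route through `httpx.AsyncClient` instead.

```python
def legacy_create_dsr(payload: dict) -> tuple[int, dict]:
    """Stand-in for existing behavior (hypothetical); returns (status, body)."""
    if "email" not in payload:
        return 422, {"detail": "email is required"}
    return 201, {"email": payload["email"].lower(), "status": "pending"}


# One happy path and one 4xx path, recorded from current behavior, not from a spec.
PINNED = [
    ({"email": "A@Example.com"}, (201, {"email": "a@example.com", "status": "pending"})),
    ({}, (422, {"detail": "email is required"})),
]


def test_create_dsr_characterization() -> None:
    """Fails if a refactor changes any pinned status code or response body."""
    for payload, expected in PINNED:
        assert legacy_create_dsr(payload) == expected
```

The point of pinning observed output, even output that looks odd, is that the refactor is judged against what the code does today rather than what it was meant to do.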
## Step 2 — Split `compliance/db/models.py` (1466 → <500 per file)
⚠️ **Atomic step.** A `compliance/db/models/` package CANNOT coexist with the existing `compliance/db/models.py` module — Python's import system shadows the module with the package, breaking every `from compliance.db.models import X` call. The directory skeleton was intentionally NOT pre-created for this reason. Do the following in **one commit**:
1. Create `compliance/db/models/` directory with `__init__.py` (re-export shim — see template below).
2. Move aggregate model classes into `compliance/db/models/<aggregate>.py` modules.
3. Delete the old `compliance/db/models.py` file in the same commit.
Strategy uses a **re-export shim** so no import sites change:
1. For each aggregate, create `compliance/db/models/<aggregate>.py` containing the model classes. Copy verbatim; do not rename `__tablename__`, columns, or relationship strings.
2. Aggregate suggestions (verify by reading `models.py`):
- `dsr.py` (DSR requests, exports)
- `dsfa.py`
- `vvt.py`
- `tom.py`
- `ai.py` (AI systems, compliance checks)
- `consent.py`
- `evidence.py`
- `vendor.py`
- `audit.py`
- `policy.py`
- `project.py`
3. After every aggregate is moved, make `compliance/db/models/__init__.py` the re-export shim (the old `compliance/db/models.py` is deleted in the same commit, as required by the atomic-step note above):
```python
"""Re-export shim for the per-aggregate modules in this package."""
from compliance.db.models.dsr import *  # noqa: F401,F403
from compliance.db.models.dsfa import *  # noqa: F401,F403
# ... one per module
```
This keeps `from compliance.db.models import XYZ` working everywhere it's used today.
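4. Run `pytest` after every move. Green → continue with the next aggregate. Red → revert that move and investigate. Because the step is atomic, commit once, when every aggregate has moved and the suite is green.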
5. Existing aggregate-level files (`compliance/db/dsr_models.py`, `vvt_models.py`, `tom_models.py`, etc.) should be folded into the new `compliance/db/models/` package in the same pass — do not leave two parallel naming conventions.
**Do not** add `__init__.py` star-imports that change `Base.metadata` discovery order. Alembic's autogenerate depends on it. Verify via: `alembic check` if the env is set up.
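A cheap guard against silently dropped names is to record what the old module exposed and compare it with the shim afterwards. This is only a sketch: `public_names` and `missing_exports` are hypothetical helpers, and the check covers names only, not `Base.metadata` ordering.

```python
import types


def public_names(module: types.ModuleType) -> set[str]:
    """Names a star-import would expose: ``__all__`` if defined, else non-underscore names."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return set(declared)
    return {name for name in vars(module) if not name.startswith("_")}


def missing_exports(before: set[str], shim: types.ModuleType) -> set[str]:
    """Names that were importable before the split but are absent from the shim."""
    available = {name for name in vars(shim) if not name.startswith("_")}
    return before - available
```

Capture `public_names(...)` on the old module before the split (for example, dump it to a text file), then assert that `missing_exports(...)` is empty once the package is in place.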
## Step 3 — Split `compliance/api/schemas.py` (1899 → per domain)
Mirror the models split:
1. For each domain, create `compliance/schemas/<domain>.py` with the Pydantic models.
2. Replace `compliance/api/schemas.py` with a re-export shim.
3. Keep `Create`/`Update`/`Read` variants separated; do not merge them into unions.
4. Run `pytest` + contract test after each domain. Green → commit.
## Step 4 — Extract services (router → service delegation)
For each route file > 500 LOC, pull handler bodies into a service class under `compliance/services/<domain>_service.py` (new-style domain services, not the utility `compliance/services/` modules that already exist — consider renaming those to `compliance/services/_legacy/` if collisions arise).
Router handlers become:
```python
@router.post("/dsr/requests", response_model=DSRRequestRead, status_code=201)
async def create_dsr_request(
    payload: DSRRequestCreate,
    service: DSRService = Depends(get_dsr_service),
    tenant_id: UUID = Depends(get_tenant_id),
) -> DSRRequestRead:
    try:
        return await service.create(tenant_id, payload)
    except ConflictError as exc:
        raise HTTPException(409, str(exc)) from exc
    except NotFoundError as exc:
        raise HTTPException(404, str(exc)) from exc
```
Rules:
- Handler body ≤ 30 LOC.
- Service raises domain errors (`compliance.domain`), never `HTTPException`.
- Inject service via `Depends` on a factory that wires the repository.
Run tests after each router is thinned. Contract test must stay green.
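The service side of that delegation can be sketched without FastAPI. Everything here is illustrative: the in-memory repository, the `create` signature, and the fallback status of 400 are assumptions, and the error classes are redefined locally only to keep the block self-contained (the real ones live in `compliance.domain`).

```python
import asyncio
from dataclasses import dataclass, field


class DomainError(Exception): ...
class NotFoundError(DomainError): ...
class ConflictError(DomainError): ...


# The HTTP layer owns this mapping; services never see status codes.
STATUS_BY_ERROR = {NotFoundError: 404, ConflictError: 409}


def status_for(exc: DomainError) -> int:
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return 400  # assumed fallback for unmapped domain errors


@dataclass
class InMemoryDSRRepository:
    """Hypothetical stand-in for a real repository."""
    rows: dict = field(default_factory=dict)

    async def exists(self, tenant_id: str, reference: str) -> bool:
        return (tenant_id, reference) in self.rows

    async def add(self, tenant_id: str, reference: str) -> dict:
        row = {"tenant_id": tenant_id, "reference": reference}
        self.rows[(tenant_id, reference)] = row
        return row


class DSRService:
    def __init__(self, repo: InMemoryDSRRepository) -> None:
        self._repo = repo

    async def create(self, tenant_id: str, reference: str) -> dict:
        # Business rule lives here, expressed as a domain error.
        if await self._repo.exists(tenant_id, reference):
            raise ConflictError(f"DSR request {reference!r} already exists")
        return await self._repo.add(tenant_id, reference)
```

Keeping the mapping in one place means a new domain error needs one new table entry rather than an extra `except` clause in every handler.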
## Step 5 — Extract repositories
`compliance/db/repository.py` (1547) and `compliance/db/isms_repository.py` (838) split into:
```
compliance/repositories/
├── dsr_repository.py
├── dsfa_repository.py
├── vvt_repository.py
├── isms_repository.py # <500 LOC, split if needed
└── ...
```
Each repository class:
- Takes `AsyncSession` (or equivalent) in constructor.
- Exposes intent-named methods (`get_pending_for_tenant`, not `select_where`).
- Returns ORM instances or domain VOs. No `Row`.
- No business logic.
Unit-test every repo class against the compose Postgres with a transactional fixture (begin → rollback).
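The begin → rollback pattern can be illustrated with the standard library. This sketch deliberately swaps in `sqlite3` for the compose Postgres and a plain connection for `AsyncSession`, so it shows the shape of the fixture rather than the real wiring. The table layout and the `transactional` helper are assumptions.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class DSRRepository:
    """Intent-named methods only; no business logic."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add(self, tenant_id: str, status: str) -> None:
        self._conn.execute(
            "INSERT INTO dsr_requests (tenant_id, status) VALUES (?, ?)",
            (tenant_id, status),
        )

    def get_pending_for_tenant(self, tenant_id: str) -> list[int]:
        rows = self._conn.execute(
            "SELECT id FROM dsr_requests "
            "WHERE tenant_id = ? AND status = 'pending' ORDER BY id",
            (tenant_id,),
        )
        return [row[0] for row in rows]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dsr_requests ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


@contextmanager
def transactional(conn: sqlite3.Connection) -> Iterator[DSRRepository]:
    """Yield a repository, then roll back whatever the test wrote."""
    try:
        yield DSRRepository(conn)
    finally:
        conn.rollback()
```

With pytest, the context manager would typically be wrapped in a fixture so that every test starts from an empty table.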
## Step 6 — mypy --strict on new packages
CI already runs `mypy --strict` against `compliance/{services,repositories,domain,schemas}/`. After every extraction, verify locally:
```bash
mypy --strict --ignore-missing-imports compliance/schemas compliance/repositories compliance/domain compliance/services
```
If you have type errors, fix them in the extracted module. **Do not** add `# type: ignore` blanket waivers. If a third-party lib is poorly typed, add a per-module override for it (a `[[tool.mypy.overrides]]` table in `pyproject.toml`, or a `[mypy-<package>.*]` section in `mypy.ini`) with a one-line rationale.
## Step 7 — Expand test coverage
- Unit tests per service (mocked repo).
- Integration tests per repository (real db, transactional).
- Contract test stays green.
- Target: 80% coverage on new code. Never decrease the service baseline.
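A service unit test with a mocked repository might look like the following. `VendorService` and `get_for_tenant` are invented for the example, and `NotFoundError` is redefined locally to keep the block self-contained.

```python
import asyncio
from unittest.mock import AsyncMock


class NotFoundError(Exception):
    """Local stand-in for compliance.domain.NotFoundError."""


class VendorService:
    def __init__(self, repo) -> None:
        self._repo = repo

    async def get(self, tenant_id: str, vendor_id: str) -> dict:
        vendor = await self._repo.get_for_tenant(tenant_id, vendor_id)
        if vendor is None:
            raise NotFoundError(f"vendor {vendor_id!r} not found")
        return vendor


def test_get_raises_not_found_when_repo_returns_none() -> None:
    repo = AsyncMock()  # child attributes are awaitable mocks
    repo.get_for_tenant.return_value = None
    service = VendorService(repo)
    try:
        asyncio.run(service.get("t1", "v1"))
    except NotFoundError:
        pass
    else:
        raise AssertionError("expected NotFoundError")
    repo.get_for_tenant.assert_awaited_once_with("t1", "v1")
```

Under pytest the `try`/`except` would normally be `pytest.raises`; it is spelled out here so the sketch runs with the standard library alone.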
## Step 8 — Guardrail enforcement
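After Phase 1 completes, `compliance/db/models.py`, `compliance/db/repository.py`, and `compliance/api/schemas.py` are either re-export shims (≤50 LOC each) or deleted. No file in `backend-compliance/compliance/` exceeds 500 LOC. `check-loc.sh` takes file paths and silently skips directory arguments, so pass it the tracked files. Run from the repo root:
```bash
git ls-files -z backend-compliance/ | xargs -0 scripts/check-loc.sh
```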
Any remaining hard violations → document in `.claude/rules/loc-exceptions.txt` with rationale, or keep splitting.
## Done when
- `pytest compliance/tests/ tests/ -v` all green.
- `pytest tests/contracts/` green — OpenAPI has no removals, no renames, no new required request fields.
- Coverage ≥ baseline.
- `mypy --strict` clean on new packages.
- `git ls-files -z backend-compliance/ | xargs -0 scripts/check-loc.sh` (run from the repo root; the script skips directory arguments) reports 0 hard violations in new/touched files (legacy files allowlisted in `loc-exceptions.txt` only with rationale).
- CI all green on PR.
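The "no removals, no renames, no new required request fields" rule can be approximated by diffing two OpenAPI documents. This sketch is not the repo's contract test. It assumes request schemas are inlined under `application/json`, whereas FastAPI usually emits `$ref` pointers that would have to be resolved first. A rename shows up as a removal.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def _required_request_fields(operation: dict[str, Any]) -> set[str]:
    schema = (
        operation.get("requestBody", {})
        .get("content", {})
        .get("application/json", {})
        .get("schema", {})
    )
    return set(schema.get("required", []))


def breaking_changes(baseline: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """List contract breaks in `current` relative to `baseline`."""
    problems: list[str] = []
    current_paths = current.get("paths", {})
    for path, base_item in baseline.get("paths", {}).items():
        cur_item = current_paths.get(path)
        if cur_item is None:
            problems.append(f"removed path: {path}")
            continue
        for method, base_op in base_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            cur_op = cur_item.get(method)
            if cur_op is None:
                problems.append(f"removed operation: {method.upper()} {path}")
                continue
            added = _required_request_fields(cur_op) - _required_request_fields(base_op)
            for name in sorted(added):
                problems.append(f"new required field: {method.upper()} {path} {name}")
    return problems
```

Additions (new paths, new optional fields) are intentionally not reported, since they do not break existing consumers.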
## Pitfalls
- **Do not change `__tablename__` or column names.** Even a rename breaks the DB contract.
- **Do not change relationship back_populates / backref strings.** SQLAlchemy resolves these by name at mapper configuration.
- **Do not change route paths or pydantic field names.** Contract test will catch most — but JSON field aliasing (`Field(alias=...)`) is easy to break accidentally.
- **Do not eagerly reformat unrelated code.** Keep the diff reviewable. One PR per major step.
- **Do not bypass the pre-commit hook.** If a file legitimately must be >500 LOC during an intermediate step, squash commits at the end so the final state is clean.


@@ -0,0 +1,55 @@
# backend-compliance
Python/FastAPI service implementing the DSGVO compliance API: DSR, DSFA, consent, controls, risks, evidence, audit, vendor management, ISMS, change requests, document generation.
**Port:** `8002` (container: `bp-compliance-backend`)
**Stack:** Python 3.12, FastAPI, SQLAlchemy 2.x, Alembic, Keycloak auth.
## Architecture (target — Phase 1)
```
compliance/
├── api/ # Routers (thin, ≤30 LOC per handler)
├── services/ # Business logic
├── repositories/ # DB access
├── domain/ # Value objects, domain errors
├── schemas/ # Pydantic models, split per domain
└── db/models/ # SQLAlchemy ORM, one module per aggregate
```
See `../AGENTS.python.md` for the full convention and `../.claude/rules/architecture.md` for the non-negotiable rules.
## Run locally
```bash
cd backend-compliance
pip install -r requirements.txt
export COMPLIANCE_DATABASE_URL=... # Postgres (Hetzner or local)
uvicorn main:app --reload --port 8002
```
## Tests
```bash
pytest compliance/tests/ -v
pytest --cov=compliance --cov-report=term-missing
```
Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`. Contract tests diff `/openapi.json` against `tests/contracts/openapi.baseline.json`.
## Public API surface
404+ endpoints across `/api/v1/*`. Grouped by domain: `ai`, `audit`, `consent`, `dsfa`, `dsr`, `gdpr`, `vendor`, `evidence`, `change-requests`, `generation`, `projects`, `company-profile`, `isms`. Every path is a contract — see the "Public endpoints" rule in the root `CLAUDE.md`.
## Environment
| Var | Purpose |
|-----|---------|
| `COMPLIANCE_DATABASE_URL` | Postgres DSN, `sslmode=require` |
| `KEYCLOAK_*` | Auth verification |
| `QDRANT_URL`, `QDRANT_API_KEY` | Vector search |
| `CORE_VALKEY_URL` | Session cache |
## Don't touch
Database schema, `__tablename__`, column names, existing migrations under `migrations/`. See root `CLAUDE.md` rule 3.


@@ -0,0 +1,30 @@
"""Domain layer: value objects, enums, and domain exceptions.
Pure Python — no FastAPI, no SQLAlchemy, no HTTP concerns. Upper layers depend on
this package; it depends on nothing except the standard library and small libraries
like ``pydantic`` or ``attrs``.
"""
class DomainError(Exception):
    """Base class for all domain-level errors.

    Services raise subclasses of this; the HTTP layer is responsible for mapping
    them to status codes. Never raise ``HTTPException`` from a service.
    """


class NotFoundError(DomainError):
    """Requested entity does not exist."""


class ConflictError(DomainError):
    """Operation conflicts with the current state (e.g. duplicate, stale version)."""


class ValidationError(DomainError):
    """Input failed domain-level validation (beyond what Pydantic catches)."""


class PermissionError(DomainError):
    """Caller lacks permission for the operation.

    Note: this shadows the built-in ``PermissionError`` wherever it is imported by name.
    """


@@ -0,0 +1,10 @@
"""Repository layer: database access.
Each aggregate gets its own module (e.g. ``dsr_repository.py``) exposing a single
class with intent-named methods. Repositories own SQLAlchemy session usage; they
do not run business logic, and they do not import anything from
``compliance.api`` or ``compliance.services``.
Phase 1 refactor target: ``compliance.db.repository`` (1547 lines) is being
decomposed into per-aggregate modules under this package.
"""


@@ -0,0 +1,11 @@
"""Pydantic schemas, split per domain.
Phase 1 refactor target: the monolithic ``compliance.api.schemas`` module (1899 lines)
is being decomposed into one module per domain under this package. Until every domain
has been migrated, ``compliance.api.schemas`` re-exports from here so existing imports
continue to work unchanged.
New code MUST import from the specific domain module (e.g.
``from compliance.schemas.dsr import DSRRequestCreate``) rather than from
``compliance.api.schemas``.
"""


@@ -0,0 +1,37 @@
# breakpilot-compliance-sdk
TypeScript SDK monorepo providing React, Angular, Vue, vanilla JS, and core bindings for the BreakPilot Compliance backend. Published as npm packages.
**Stack:** TypeScript, workspaces (`packages/core`, `packages/react`, `packages/angular`, `packages/vanilla`, `packages/types`).
## Layout
```
packages/
├── core/ # Framework-agnostic client + state
├── types/ # Shared type definitions
├── react/ # React Provider + hooks
├── angular/ # Angular service
└── vanilla/ # Vanilla-JS embed script
```
## Architecture
Follow `../AGENTS.typescript.md`. No framework-specific code in `core/`.
## Build + test
```bash
npm install
npm run build # per-workspace build
npm test # Vitest (Phase 4 adds coverage — currently 0 tests)
```
## Known debt (Phase 4)
- `packages/vanilla/src/embed.ts` (611), `packages/react/src/provider.tsx` (539), `packages/core/src/client.ts` (521), `packages/react/src/hooks.ts` (474) — split.
- **Zero test coverage.** Priority Phase 4 target.
## Don't touch
Public API surface of `core` without bumping package major version and updating consumers.


@@ -0,0 +1,30 @@
# compliance-tts-service
Python service generating German-language audio/video training materials using Piper TTS + FFmpeg. Outputs are stored in Hetzner Object Storage (S3-compatible).
**Port:** `8095` (container: `bp-compliance-tts`)
**Stack:** Python 3.12, Piper TTS (`de_DE-thorsten-high.onnx`), FFmpeg, boto3.
## Files
- `main.py` — FastAPI entrypoint
- `tts_engine.py` — Piper wrapper
- `video_generator.py` — FFmpeg pipeline
- `storage.py` — S3 client
## Run locally
```bash
cd compliance-tts-service
pip install -r requirements.txt
# Piper model + ffmpeg must be available on PATH
uvicorn main:app --reload --port 8095
```
## Tests
0 test files today. Phase 4 adds unit tests for the synthesis pipeline (mocked Piper + FFmpeg) and the S3 client.
## Architecture
Follow `../AGENTS.python.md`. Keep the Piper model loading behind a single service instance — not loaded per request.
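One common way to get a single shared instance is a cached factory. This is a sketch under assumptions: `PiperEngine` below is a made-up stand-in for whatever `tts_engine.py` exposes, and the load counter exists only to make the behavior visible.

```python
from functools import lru_cache


class PiperEngine:
    """Hypothetical stand-in for the real Piper wrapper."""

    loads = 0  # counts how often the (expensive) model load ran

    def __init__(self, model_path: str) -> None:
        PiperEngine.loads += 1
        self.model_path = model_path


@lru_cache(maxsize=1)
def get_engine() -> PiperEngine:
    """Build the engine on first use and hand back the same object afterwards."""
    return PiperEngine("de_DE-thorsten-high.onnx")
```

In a FastAPI app, `get_engine` could be passed to `Depends`, so handlers receive the shared instance without loading the model themselves.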


@@ -0,0 +1,26 @@
# developer-portal
Next.js 15 public API documentation portal — integration guides, SDK docs, BYOEH, development phases. Consumed by external customers.
**Port:** `3006` (container: `bp-compliance-developer-portal`)
**Stack:** Next.js 15, React 18, TypeScript.
## Run locally
```bash
cd developer-portal
npm install
npm run dev
```
## Tests
0 test files today. Phase 4 adds Playwright smoke tests for each top-level page and Vitest for `lib/` helpers.
## Architecture
Follow `../AGENTS.typescript.md`. MD/MDX content should live in a data directory, not inline in `page.tsx`.
## Known debt
- `app/development/docs/page.tsx` (891), `app/development/byoeh/page.tsx` (769), and others > 300 LOC — split in Phase 4.

19
docs-src/README.md Normal file

@@ -0,0 +1,19 @@
# docs-src
MkDocs-based internal documentation site — system architecture, data models, runbooks, API references.
**Port:** `8011` (container: `bp-compliance-docs`)
**Stack:** MkDocs + Material theme, served via nginx.
## Build + serve locally
```bash
cd docs-src
pip install -r requirements.txt
mkdocs serve # http://localhost:8000
mkdocs build # static output to site/
```
## Known debt (Phase 4)
- `index.md` is 9436 lines — will be split into per-topic pages with proper mkdocs nav. Target: no single markdown file >500 lines except explicit reference tables.


@@ -0,0 +1,28 @@
# document-crawler
Python/FastAPI service for document ingestion and compliance gap analysis. Parses PDF, DOCX, XLSX, PPTX; runs gap analysis against compliance requirements; coordinates with `ai-compliance-sdk` via the LLM gateway; archives to `dsms-gateway`.
**Port:** `8098` (container: `bp-compliance-document-crawler`)
**Stack:** Python 3.11, FastAPI.
## Architecture
Small service — already well under the LOC budget. Follow `../AGENTS.python.md` for any additions.
## Run locally
```bash
cd document-crawler
pip install -r requirements.txt
uvicorn main:app --reload --port 8098
```
## Tests
```bash
pytest tests/ -v
```
## Public API surface
`GET /health`, document upload/parse endpoints, gap-analysis endpoints. See the OpenAPI doc at `/docs` when running.

55
dsms-gateway/README.md Normal file

@@ -0,0 +1,55 @@
# dsms-gateway
Python/FastAPI gateway to the IPFS-backed document archival store. Upload, retrieve, verify, and archive legal documents with content-addressed immutability.
**Port:** `8082` (container: `bp-compliance-dsms-gateway`)
**Stack:** Python 3.11, FastAPI, IPFS (Kubo via `dsms-node`).
## Architecture (target — Phase 4)
`main.py` (467 LOC) will split into:
```
dsms_gateway/
├── main.py # FastAPI app factory, <50 LOC
├── routers/ # /documents, /legal-documents, /verify, /node
├── ipfs/ # IPFS client wrapper
├── services/ # Business logic (archive, verify)
├── schemas/ # Pydantic models
└── config.py
```
See `../AGENTS.python.md`.
## Run locally
```bash
cd dsms-gateway
pip install -r requirements.txt
export IPFS_API_URL=http://localhost:5001
uvicorn main:app --reload --port 8082
```
## Tests
```bash
pytest test_main.py -v
```
Note: the existing test file is larger than the implementation — good coverage already. Phase 4 splits both into matching module pairs.
## Public API surface
```
GET /health
GET /api/v1/documents
POST /api/v1/documents
GET /api/v1/documents/{cid}
GET /api/v1/documents/{cid}/metadata
DELETE /api/v1/documents/{cid}
POST /api/v1/legal-documents/archive
GET /api/v1/verify/{cid}
GET /api/v1/node/info
```
Every path is a contract — updating requires synchronized updates in consumers.

15
dsms-node/README.md Normal file

@@ -0,0 +1,15 @@
# dsms-node
IPFS Kubo node container — distributed document storage backend for the compliance platform. Participates in the BreakPilot IPFS swarm and serves as the storage layer behind `dsms-gateway`.
**Image:** `ipfs/kubo:v0.24.0`
**Ports:** `4001` (swarm), `5001` (API), `8085` (HTTP gateway)
**Container:** `bp-compliance-dsms-node`
## Operation
No source code — this is a thin wrapper around the upstream IPFS Kubo image. Configuration is via environment and the compose file at repo root.
## Don't touch
This service is out of refactor scope. Do not modify without the infrastructure owner's sign-off.

123
scripts/check-loc.sh Executable file

@@ -0,0 +1,123 @@
#!/usr/bin/env bash
# check-loc.sh — File-size budget enforcer for breakpilot-compliance.
#
# Soft target: 300 LOC. Hard cap: 500 LOC.
#
# Usage:
# scripts/check-loc.sh # scan whole repo, respect exceptions
# scripts/check-loc.sh --changed # only files changed vs origin/main
# scripts/check-loc.sh path/to/file.py # check specific files
# scripts/check-loc.sh --json # machine-readable output
#
# Exit codes:
# 0 — clean (no hard violations)
# 1 — at least one file exceeds the hard cap (500)
# 2 — invalid invocation
#
# Behavior:
# - Skips test files, generated files, vendor dirs, node_modules, .git, dist, build,
# .next, __pycache__, migrations, and anything matching .claude/rules/loc-exceptions.txt.
# - Counts raw lines (blank and comment-only lines included) so the rule is
# unambiguous. If you want to game it with blank lines, you're missing the point.
set -euo pipefail
SOFT=300
HARD=500
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
EXCEPTIONS_FILE="$REPO_ROOT/.claude/rules/loc-exceptions.txt"
CHANGED_ONLY=0
JSON=0
TARGETS=()
for arg in "$@"; do
case "$arg" in
--changed) CHANGED_ONLY=1 ;;
--json) JSON=1 ;;
-h|--help)
sed -n '2,18p' "$0"; exit 0 ;;
-*) echo "unknown flag: $arg" >&2; exit 2 ;;
*) TARGETS+=("$arg") ;;
esac
done
# Patterns excluded from the budget regardless of path.
is_excluded() {
local f="$1"
case "$f" in
*/node_modules/*|*/.next/*|*/.git/*|*/dist/*|*/build/*|*/__pycache__/*|*/vendor/*) return 0 ;;
*/migrations/*|*/alembic/versions/*) return 0 ;;
*_test.go|*.test.ts|*.test.tsx|*.spec.ts|*.spec.tsx) return 0 ;;
*/tests/*|*/test/*) return 0 ;;
*.md|*.json|*.yaml|*.yml|*.lock|*.sum|*.mod|*.toml|*.cfg|*.ini) return 0 ;;
*.svg|*.png|*.jpg|*.jpeg|*.gif|*.ico|*.pdf|*.woff|*.woff2|*.ttf) return 0 ;;
*.generated.*|*.gen.*|*_pb.go|*_pb2.py|*.pb.go) return 0 ;;
esac
return 1
}
is_in_exceptions() {
[[ -f "$EXCEPTIONS_FILE" ]] || return 1
local rel="${1#"$REPO_ROOT"/}"  # quoted so the root is matched literally, not as a glob
grep -Fxq "$rel" "$EXCEPTIONS_FILE"
}
collect_targets() {
if (( ${#TARGETS[@]} > 0 )); then
printf '%s\n' "${TARGETS[@]}"
elif (( CHANGED_ONLY )); then
git -C "$REPO_ROOT" diff --name-only --diff-filter=AM origin/main...HEAD 2>/dev/null \
|| git -C "$REPO_ROOT" diff --name-only --diff-filter=AM HEAD
else
git -C "$REPO_ROOT" ls-files
fi
}
violations_hard=()
violations_soft=()
while IFS= read -r f; do
[[ -z "$f" ]] && continue
abs="$f"
[[ "$abs" != /* ]] && abs="$REPO_ROOT/$f"
[[ -f "$abs" ]] || continue
is_excluded "$abs" && continue
is_in_exceptions "$abs" && continue
loc=$(wc -l < "$abs" | tr -d ' ')
if (( loc > HARD )); then
violations_hard+=("$loc $f")
elif (( loc > SOFT )); then
violations_soft+=("$loc $f")
fi
done < <(collect_targets)
if (( JSON )); then
printf '{"hard":['
# ${arr[@]+...} keeps an empty array from tripping `set -u` on bash < 4.4.
first=1; for v in ${violations_hard[@]+"${violations_hard[@]}"}; do
loc="${v%% *}"; path="${v#* }"
(( first )) || printf ','; first=0
printf '{"loc":%s,"path":"%s"}' "$loc" "$path"
done
printf '],"soft":['
# ${arr[@]+...} keeps an empty array from tripping `set -u` on bash < 4.4.
first=1; for v in ${violations_soft[@]+"${violations_soft[@]}"}; do
loc="${v%% *}"; path="${v#* }"
(( first )) || printf ','; first=0
printf '{"loc":%s,"path":"%s"}' "$loc" "$path"
done
printf ']}\n'
else
if (( ${#violations_soft[@]} > 0 )); then
echo "::warning:: $((${#violations_soft[@]})) file(s) exceed soft target ($SOFT lines):"
printf ' %s\n' "${violations_soft[@]}" | sort -rn
fi
if (( ${#violations_hard[@]} > 0 )); then
echo "::error:: $((${#violations_hard[@]})) file(s) exceed HARD CAP ($HARD lines) — split required:"
printf ' %s\n' "${violations_hard[@]}" | sort -rn
echo
echo "If a file legitimately must exceed $HARD lines (generated code, large data tables),"
echo "add it to .claude/rules/loc-exceptions.txt with a one-line rationale comment above it."
fi
fi
(( ${#violations_hard[@]} == 0 ))

55
scripts/githooks/pre-commit Executable file

@@ -0,0 +1,55 @@
#!/usr/bin/env bash
# pre-commit — enforces breakpilot-compliance structural guardrails.
#
# 1. Blocks commits that introduce a non-test, non-generated source file > 500 LOC.
# 2. Blocks commits that touch backend-compliance/migrations/ unless the commit message
# contains the marker [migration-approved] (last-resort escape hatch).
# 3. Blocks edits to guardrail files (.claude/settings.json, .claude/rules/loc-exceptions.txt,
# scripts/check-loc.sh, this hook, AGENTS.{python,go,typescript}.md) unless
# [guardrail-change] is in the commit message.
#
# Bypass with --no-verify is intentionally NOT supported by the team workflow.
# CI re-runs all of these on the server side anyway.
set -euo pipefail
REPO_ROOT="$(git rev-parse --show-toplevel)"
mapfile -t staged < <(git diff --cached --name-only --diff-filter=ACM)
[[ ${#staged[@]} -eq 0 ]] && exit 0
# 1. LOC budget on staged files.
loc_targets=()
for f in "${staged[@]}"; do
[[ -f "$REPO_ROOT/$f" ]] && loc_targets+=("$REPO_ROOT/$f")
done
if [[ ${#loc_targets[@]} -gt 0 ]]; then
if ! "$REPO_ROOT/scripts/check-loc.sh" "${loc_targets[@]}"; then
echo
echo "Commit blocked: file-size budget violated. See output above."
echo "Either split the file (preferred) or add an exception with rationale to"
echo " .claude/rules/loc-exceptions.txt"
exit 1
fi
fi
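# 2. Migration directories are frozen unless explicitly approved.
# Caveat: git invokes pre-commit before it obtains the proposed commit message, so
# HEAD's message and COMMIT_EDITMSG both reflect the previous commit here. The marker
# checks in sections 2 and 3 are therefore best-effort; reliable enforcement belongs in
# a commit-msg hook, which receives the message file path as $1.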
if printf '%s\n' "${staged[@]}" | grep -qE '(^|/)(migrations|alembic/versions)/'; then
if ! git log --format=%B -n 1 HEAD 2>/dev/null | grep -q '\[migration-approved\]' \
&& ! grep -q '\[migration-approved\]' "$(git rev-parse --git-dir)/COMMIT_EDITMSG" 2>/dev/null; then
echo "Commit blocked: this change touches a migrations directory."
echo "Database schema changes require an explicit migration plan reviewed by the DB owner."
echo "If approved, add '[migration-approved]' to your commit message."
exit 1
fi
fi
# 3. Guardrail files are protected.
guarded='^(\.claude/settings\.json|\.claude/rules/loc-exceptions\.txt|scripts/check-loc\.sh|scripts/githooks/pre-commit|AGENTS\.(python|go|typescript)\.md)$'
if printf '%s\n' "${staged[@]}" | grep -qE "$guarded"; then
if ! grep -q '\[guardrail-change\]' "$(git rev-parse --git-dir)/COMMIT_EDITMSG" 2>/dev/null; then
echo "Commit blocked: this change modifies guardrail files."
echo "If intentional, add '[guardrail-change]' to your commit message and explain why in the body."
exit 1
fi
fi
exit 0

26
scripts/install-hooks.sh Executable file

@@ -0,0 +1,26 @@
#!/usr/bin/env bash
# install-hooks.sh — installs git hooks that enforce repo guardrails locally.
# Idempotent. Safe to re-run.
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
HOOKS_DIR="$REPO_ROOT/.git/hooks"
SRC_DIR="$REPO_ROOT/scripts/githooks"
if [[ ! -d "$REPO_ROOT/.git" ]]; then
echo "Not a git repository: $REPO_ROOT" >&2
exit 1
fi
mkdir -p "$HOOKS_DIR"
for hook in pre-commit; do
src="$SRC_DIR/$hook"
dst="$HOOKS_DIR/$hook"
if [[ -f "$src" ]]; then
cp "$src" "$dst"
chmod +x "$dst"
echo "installed: $dst"
fi
done
echo "Done. Hooks active for this clone."