diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 228f6b8..664642d 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -1,5 +1,17 @@ # BreakPilot Compliance - DSGVO/AI-Act SDK Platform +> **NON-NEGOTIABLE STRUCTURE RULES** (enforced by `.claude/settings.json` hook, git pre-commit, and CI): +> 1. **File-size budget:** soft target **300** lines, **hard cap 500** lines for any non-test, non-generated source file. Anything larger → split it. Exceptions are listed in `.claude/rules/loc-exceptions.txt` and require a written rationale. +> 2. **Clean architecture per service.** Routers/handlers stay thin (≤30 lines per handler) and delegate to services; services use repositories; repositories own DB I/O. See `AGENTS.python.md` / `AGENTS.go.md` / `AGENTS.typescript.md`. +> 3. **Do not touch the database schema.** No new Alembic migrations, no `ALTER TABLE`, no model field renames without an explicit migration plan reviewed by the DB owner. SQLAlchemy `__tablename__` and column names are frozen. +> 4. **Public endpoints are a contract.** Any change to a path, method, status code, request schema, or response schema in `backend-compliance/`, `ai-compliance-sdk/`, `dsms-gateway/`, `document-crawler/`, or `compliance-tts-service/` must be accompanied by a matching update in **every** consumer (`admin-compliance/`, `developer-portal/`, `breakpilot-compliance-sdk/`, `consent-sdk/`). Use the OpenAPI snapshot tests in `tests/contracts/` as the gate. +> 5. **Tests are not optional.** New code without tests fails CI. Refactors must preserve coverage and add a characterization test before splitting an oversized file. +> 6. **Do not bypass the guardrails.** Do not edit `.claude/settings.json`, `scripts/check-loc.sh`, or the loc-exceptions list to silence violations. If a rule is wrong, raise it in a PR description. +> +> These rules apply to **every** Claude Code session opened inside this repository, regardless of who launched it. 
They are loaded automatically via this `CLAUDE.md`. + + + ## Entwicklungsumgebung (WICHTIG - IMMER ZUERST LESEN) ### Zwei-Rechner-Setup + Hetzner diff --git a/.claude/rules/architecture.md b/.claude/rules/architecture.md new file mode 100644 index 0000000..26c6c36 --- /dev/null +++ b/.claude/rules/architecture.md @@ -0,0 +1,43 @@ +# Architecture Rules (auto-loaded) + +These rules apply to **every** Claude Code session in this repository, regardless of who launched it. They are non-negotiable. + +## File-size budget + +- **Soft target:** 300 lines per non-test, non-generated source file. +- **Hard cap:** 500 lines. The PreToolUse hook in `.claude/settings.json` blocks Write/Edit operations that would create or push a file past 500. The git pre-commit hook re-checks. CI is the final gate. +- Exceptions live in `.claude/rules/loc-exceptions.txt` and require a written rationale plus `[guardrail-change]` in the commit message. The exceptions list should shrink over time, not grow. + +## Clean architecture + +- Python (FastAPI): see `AGENTS.python.md`. Layering: `api → services → repositories → db.models`. Routers ≤30 LOC per handler. Schemas split per domain. +- Go (Gin): see `AGENTS.go.md`. Standard Go Project Layout + hexagonal. `cmd/` thin, wiring in `internal/app`. +- TypeScript (Next.js): see `AGENTS.typescript.md`. Server-by-default, push the client boundary deep, colocate `_components/` and `_hooks/` per route. + +## Database is frozen + +- No new Alembic migrations. No `ALTER TABLE`. No `__tablename__` or column renames. +- The pre-commit hook blocks any change under `migrations/` or `alembic/versions/` unless the commit message contains `[migration-approved]`. + +## Public endpoints are a contract + +- Any change to a path/method/status/request schema/response schema in a backend service must update every consumer in the same change set. +- Each backend service has an OpenAPI baseline at `tests/contracts/openapi.baseline.json`. Contract tests fail on drift. 
+ +## Tests + +- New code without tests fails CI. +- Refactors must preserve coverage. Before splitting an oversized file, add a characterization test that pins current behavior. +- Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`, `tests/e2e/`. + +## Guardrails are themselves protected + +- Edits to `.claude/settings.json`, `scripts/check-loc.sh`, `scripts/githooks/pre-commit`, `.claude/rules/loc-exceptions.txt`, or any `AGENTS.*.md` require `[guardrail-change]` in the commit message. The pre-commit hook enforces this. +- If you (Claude) think a rule is wrong, surface it to the user. Do not silently weaken it. + +## Tooling baseline + +- Python: `ruff`, `mypy --strict` on new modules, `pytest --cov`. +- Go: `golangci-lint` strict config, `go vet`, table-driven tests. +- TS: `tsc --noEmit` strict, ESLint type-aware, Vitest, Playwright. +- All three: dependency caching in CI, license/SBOM scan via `syft`+`grype`. diff --git a/.claude/rules/loc-exceptions.txt b/.claude/rules/loc-exceptions.txt new file mode 100644 index 0000000..b2ccc89 --- /dev/null +++ b/.claude/rules/loc-exceptions.txt @@ -0,0 +1,8 @@ +# loc-exceptions.txt — files allowed to exceed the 500-line hard cap. +# +# Format: one repo-relative path per line. Comments start with '#' and are ignored. +# Each exception MUST be preceded by a comment explaining why splitting is not viable. +# +# Phase 0 baseline: this list is initially empty. Phases 1-4 will add grandfathered +# entries as we encounter legitimate exceptions (e.g. large generated data tables). +# The goal is for this list to SHRINK over time, never grow. 
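The allowlist semantics above can be modeled in a few lines of Python. This is a sketch of the rule for reference — the real gate is `scripts/check-loc.sh` — assuming only the exceptions-file format documented in the header (one repo-relative path per line, `#` comments) and the 500-line hard cap.

```python
from pathlib import Path

HARD_CAP = 500  # lines; matches the hard cap stated in .claude/rules/architecture.md


def load_exceptions(path: Path) -> set[str]:
    """Parse loc-exceptions.txt: one repo-relative path per line, '#' lines ignored."""
    if not path.exists():
        return set()
    entries = set()
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.add(line)
    return entries


def loc_violations(files: dict[str, int], exceptions: set[str]) -> list[str]:
    """Files whose line count exceeds the hard cap and are not allowlisted."""
    return [f for f, count in files.items() if count > HARD_CAP and f not in exceptions]
```

A file at exactly 500 lines passes; 501 fails unless it is listed (with rationale) in the exceptions file.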
diff --git a/.claude/settings.json b/.claude/settings.json new file mode 100644 index 0000000..0d899bc --- /dev/null +++ b/.claude/settings.json @@ -0,0 +1,28 @@ +{ + "hooks": { + "PreToolUse": [ + { + "matcher": "Write", + "hooks": [ + { + "type": "command", + "command": "payload=$(cat); f=$(printf '%s' \"$payload\" | jq -r '.tool_input.file_path // empty'); [ -z \"$f\" ] && exit 0; lines=$(printf '%s' \"$payload\" | jq -r '.tool_input.content // empty' | awk 'END{print NR}'); if [ \"${lines:-0}\" -gt 500 ]; then echo '{\"decision\":\"block\",\"reason\":\"breakpilot guardrail: file exceeds the 500-line hard cap. Split it into smaller modules per the layering rules in AGENTS.*.md. If this is generated/data code, add an entry to .claude/rules/loc-exceptions.txt with rationale and reference [guardrail-change].\"}'; exit 0; fi", + "shell": "bash", + "timeout": 5 + } + ] + }, + { + "matcher": "Edit", + "hooks": [ + { + "type": "command", + "command": "payload=$(cat); f=$(printf '%s' \"$payload\" | jq -r '.tool_input.file_path // empty'); [ -z \"$f\" ] || [ ! -f \"$f\" ] && exit 0; case \"$f\" in *.md|*.json|*.yaml|*.yml|*test*|*tests/*|*node_modules/*|*.next/*|*migrations/*) exit 0 ;; esac; new_str=$(printf '%s' \"$payload\" | jq -r '.tool_input.new_string // empty'); old_str=$(printf '%s' \"$payload\" | jq -r '.tool_input.old_string // empty'); old_lines=$(printf '%s' \"$old_str\" | awk 'END{print NR}'); new_lines=$(printf '%s' \"$new_str\" | awk 'END{print NR}'); cur=$(wc -l < \"$f\" | tr -d ' '); proj=$((cur - old_lines + new_lines)); if [ \"$proj\" -gt 500 ]; then echo \"{\\\"decision\\\":\\\"block\\\",\\\"reason\\\":\\\"breakpilot guardrail: this edit would push $f to ~$proj lines (hard cap is 500). Split the file before continuing. 
See AGENTS.*.md for the layering rules.\\\"}\"; fi; exit 0", + "shell": "bash", + "timeout": 5 + } + ] + } + ] + } +} diff --git a/.gitea/workflows/ci.yaml b/.gitea/workflows/ci.yaml index d706806..fd10d5d 100644 --- a/.gitea/workflows/ci.yaml +++ b/.gitea/workflows/ci.yaml @@ -19,6 +19,55 @@ on: branches: [main, develop] jobs: + # ======================================== + # Guardrails — LOC budget + architecture gates + # Runs on every push/PR. Fails fast and cheap. + # ======================================== + + loc-budget: + runs-on: docker + container: alpine:3.20 + steps: + - name: Checkout + run: | + apk add --no-cache git bash + git clone --depth 50 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git . + - name: Enforce 500-line hard cap on changed files + run: | + chmod +x scripts/check-loc.sh + if [ "${GITHUB_EVENT_NAME}" = "pull_request" ]; then + git fetch origin ${GITHUB_BASE_REF}:base + mapfile -t changed < <(git diff --name-only --diff-filter=ACM base...HEAD) + [ ${#changed[@]} -eq 0 ] && { echo "No changed files."; exit 0; } + scripts/check-loc.sh "${changed[@]}" + else + # Push to main: only warn on whole-repo state; blocking gate is on PRs. + scripts/check-loc.sh || true + fi + # Phase 0 intentionally gates only changed files so the 205-file legacy + # baseline doesn't block every PR. Phases 1-4 drain the baseline; Phase 5 + # flips this to a whole-repo blocking gate. + + guardrail-integrity: + runs-on: docker + container: alpine:3.20 + if: github.event_name == 'pull_request' + steps: + - name: Checkout + run: | + apk add --no-cache git bash + git clone --depth 20 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git . 
+ git fetch origin ${GITHUB_BASE_REF}:base + - name: Require [guardrail-change] label in PR commits touching guardrails + run: | + changed=$(git diff --name-only base...HEAD) + echo "$changed" | grep -E '^(\.claude/settings\.json|\.claude/rules/loc-exceptions\.txt|scripts/check-loc\.sh|scripts/githooks/pre-commit|AGENTS\.(python|go|typescript)\.md)$' || exit 0 + if ! git log base..HEAD --format=%B | grep -q '\[guardrail-change\]'; then + echo "::error:: Guardrail files were modified but no commit in this PR carries [guardrail-change]." + echo "If intentional, amend one commit message with [guardrail-change] and explain why in the body." + exit 1 + fi + # ======================================== # Lint (nur bei PRs) # ======================================== @@ -47,13 +96,29 @@ jobs: run: | apt-get update -qq && apt-get install -y -qq git > /dev/null 2>&1 git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git . - - name: Lint Python services + - name: Lint Python services (ruff) run: | pip install --quiet ruff - for svc in backend-compliance document-crawler dsms-gateway; do + fail=0 + for svc in backend-compliance document-crawler dsms-gateway compliance-tts-service; do if [ -d "$svc" ]; then - echo "=== Linting $svc ===" - ruff check "$svc/" --output-format=github || true + echo "=== ruff: $svc ===" + ruff check "$svc/" --output-format=github || fail=1 + fi + done + exit $fail + - name: Type-check new modules (mypy --strict) + # Scoped to the layered packages we own. Expand this list as Phase 1+ refactors land. 
+ run: | + pip install --quiet mypy + for pkg in \ + backend-compliance/compliance/services \ + backend-compliance/compliance/repositories \ + backend-compliance/compliance/domain \ + backend-compliance/compliance/schemas; do + if [ -d "$pkg" ]; then + echo "=== mypy --strict: $pkg ===" + mypy --strict --ignore-missing-imports "$pkg" || exit 1 fi done @@ -66,17 +131,20 @@ jobs: run: | apk add --no-cache git git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git . - - name: Lint Node.js services + - name: Lint + type-check Node.js services run: | + fail=0 for svc in admin-compliance developer-portal; do if [ -d "$svc" ]; then - echo "=== Linting $svc ===" - cd "$svc" - npm ci --silent 2>/dev/null || npm install --silent - npx next lint || true - cd .. + echo "=== $svc: install ===" + (cd "$svc" && (npm ci --silent 2>/dev/null || npm install --silent)) + echo "=== $svc: next lint ===" + (cd "$svc" && npx next lint) || fail=1 + echo "=== $svc: tsc --noEmit ===" + (cd "$svc" && npx tsc --noEmit) || fail=1 fi done + exit $fail # ======================================== # Unit Tests @@ -169,6 +237,32 @@ jobs: pip install --quiet --no-cache-dir pytest pytest-asyncio python -m pytest test_main.py -v --tb=short + # ======================================== + # SBOM + license scan (compliance product → we eat our own dog food) + # ======================================== + + sbom-scan: + runs-on: docker + if: github.event_name == 'pull_request' + container: alpine:3.20 + steps: + - name: Checkout + run: | + apk add --no-cache git curl bash + git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git . 
+ - name: Install syft + grype + run: | + curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin + curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin + - name: Generate SBOM + run: | + mkdir -p sbom-out + syft dir:. -o cyclonedx-json=sbom-out/sbom.cdx.json -q + - name: Vulnerability scan (fail on high+) + run: | + grype sbom:sbom-out/sbom.cdx.json --fail-on high -q || true + # Initially non-blocking ('|| true'). Flip to blocking after baseline is clean. + # ======================================== # Validate Canonical Controls # ======================================== @@ -194,6 +288,7 @@ jobs: runs-on: docker if: github.event_name == 'push' && github.ref == 'refs/heads/main' needs: + - loc-budget - test-go-ai-compliance - test-python-backend-compliance - test-python-document-crawler diff --git a/AGENTS.go.md b/AGENTS.go.md new file mode 100644 index 0000000..3c234b9 --- /dev/null +++ b/AGENTS.go.md @@ -0,0 +1,126 @@ +# AGENTS.go.md — Go Service Conventions + +Applies to: `ai-compliance-sdk/`. + +## Layered architecture (Gin) + +Follows [Standard Go Project Layout](https://github.com/golang-standards/project-layout) + hexagonal/clean-arch. + +``` +ai-compliance-sdk/ +├── cmd/server/main.go # Thin: parse flags → app.New → app.Run. <50 LOC. +├── internal/ +│ ├── app/ # Wiring: config + DI graph + lifecycle. +│ ├── domain/ # Pure types, interfaces, errors. No I/O imports. +│ │ └── / +│ ├── service/ # Business logic. Depends on domain interfaces only. +│ │ └── / +│ ├── repository/postgres/ # Concrete repo implementations. +│ │ └── / +│ ├── transport/http/ # Gin handlers. Thin. One handler per file group. +│ │ ├── handler// +│ │ ├── middleware/ +│ │ └── router.go +│ └── platform/ # DB pool, logger, config, tracing. +└── pkg/ # Importable by other repos. Empty unless needed. +``` + +**Dependency direction:** `transport → service → domain ← repository`. 
`domain` imports nothing from siblings. + +## Handlers + +- One handler = one Gin function. ≤40 LOC. +- Bind → call service → map domain error to HTTP via `httperr.Write(c, err)` → respond. +- Return early on errors. No business logic, no SQL. + +```go +func (h *IACEHandler) Create(c *gin.Context) { + var req CreateIACERequest + if err := c.ShouldBindJSON(&req); err != nil { + httperr.Write(c, httperr.BadRequest(err)) + return + } + out, err := h.svc.Create(c.Request.Context(), req.ToInput()) + if err != nil { + httperr.Write(c, err) + return + } + c.JSON(http.StatusCreated, out) +} +``` + +## Services + +- Struct + constructor + interface methods. No package-level state. +- Take `context.Context` as first arg always. Propagate to repos. +- Return `(value, error)`. Wrap with `fmt.Errorf("create iace: %w", err)`. +- Domain errors implemented as sentinel vars or typed errors; matched with `errors.Is` / `errors.As`. + +## Repositories + +- Interface lives in `domain//repository.go`. Implementation in `repository/postgres//`. +- One file per query group; no file >500 LOC. +- Use `pgx`/`sqlc` over hand-rolled string SQL when feasible. No ORM globals. +- All queries take `ctx`. No background goroutines without explicit lifecycle. + +## Errors + +Single `internal/platform/httperr` package maps `error` → HTTP status: + +```go +switch { +case errors.Is(err, domain.ErrNotFound): return 404 +case errors.Is(err, domain.ErrConflict): return 409 +case errors.As(err, &validationErr): return 422 +default: return 500 +} +``` + +Never `panic` in request handling. `recover` middleware logs and returns 500. + +## Tests + +- Co-located `*_test.go`. +- **Table-driven** tests for service logic; use `t.Run(tt.name, ...)`. +- Handlers tested with `httptest.NewRecorder`. +- Repos tested with `testcontainers-go` (or the existing compose Postgres) — never mocks at the SQL boundary. +- Coverage target: 80% on `service/`. CI fails on regression. 
+ +```go +func TestIACEService_Create(t *testing.T) { + tests := []struct { + name string + input service.CreateInput + setup func(*mockRepo) + wantErr error + }{ + {"happy path", validInput(), func(r *mockRepo) { r.createReturns(nil) }, nil}, + {"conflict", validInput(), func(r *mockRepo) { r.createReturns(domain.ErrConflict) }, domain.ErrConflict}, + } + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { /* ... */ }) + } +} +``` + +## Tooling + +- `golangci-lint` with: `errcheck, govet, staticcheck, revive, gosec, gocyclo (max 15), gocognit (max 20), unused, ineffassign, errorlint, nilerr, nolintlint, contextcheck`. +- `gofumpt` formatting. +- `go vet ./...` clean. +- `go mod tidy` clean — no unused deps. + +## Concurrency + +- Goroutines must have a clear lifecycle owner (struct method that started them must stop them). +- Pass `ctx` everywhere. Cancellation respected. +- No global mutexes for request data. Use per-request context. + +## What you may NOT do + +- Touch DB schema/migrations. +- Add a new top-level package directly under `internal/` without architectural review. +- `import "C"`, unsafe, reflection-heavy code. +- Use `init()` for non-trivial setup. Wire it in `internal/app`. +- Create a file >500 lines. +- Change a public route's contract without updating consumers. diff --git a/AGENTS.python.md b/AGENTS.python.md new file mode 100644 index 0000000..bc24bab --- /dev/null +++ b/AGENTS.python.md @@ -0,0 +1,94 @@ +# AGENTS.python.md — Python Service Conventions + +Applies to: `backend-compliance/`, `document-crawler/`, `dsms-gateway/`, `compliance-tts-service/`. + +## Layered architecture (FastAPI) + +``` +compliance/ +├── api/ # HTTP layer — routers only. Thin (≤30 LOC per handler). +│ └── _routes.py +├── services/ # Business logic. Pure-ish; no FastAPI imports. +│ └── _service.py +├── repositories/ # DB access. Owns SQLAlchemy session usage. +│ └── _repository.py +├── domain/ # Value objects, enums, domain exceptions. 
+├── schemas/ # Pydantic models, split per domain. NEVER one giant schemas.py. +│ └── .py +└── db/ + └── models/ # SQLAlchemy ORM, one module per aggregate. __tablename__ frozen. +``` + +**Dependency direction:** `api → services → repositories → db.models`. Lower layers must not import upper layers. + +## Routers + +- One `APIRouter` per domain file. +- Handlers do exactly: parse request → call service → map domain errors to HTTPException → return response model. +- Inject services via `Depends`. No globals. +- Tag routes; document with summary + response_model. + +```python +@router.post("/dsr/requests", response_model=DSRRequestRead, status_code=201) +async def create_dsr_request( + payload: DSRRequestCreate, + service: DSRService = Depends(get_dsr_service), + tenant_id: UUID = Depends(get_tenant_id), +) -> DSRRequestRead: + try: + return await service.create(tenant_id, payload) + except DSRConflict as exc: + raise HTTPException(409, str(exc)) from exc +``` + +## Services + +- Constructor takes the repository (interface, not concrete). +- No `Request`, `Response`, or HTTP knowledge. +- Raise domain exceptions (e.g. `DSRConflict`, `DSRNotFound`), never `HTTPException`. +- Return domain objects or Pydantic schemas — pick one and stay consistent inside a service. + +## Repositories + +- Methods are intent-named (`get_pending_for_tenant`), not CRUD-named (`select_where`). +- Sessions injected, not constructed inside. +- No business logic. No cross-aggregate joins for unrelated workflows — that belongs in a service. +- Return ORM models or domain VOs; never `Row`. + +## Schemas (Pydantic v2) + +- One module per domain. Module ≤300 lines. +- Use `model_config = ConfigDict(from_attributes=True, frozen=True)` for read models. +- Separate `*Create`, `*Update`, `*Read`. No giant union schemas. + +## Tests (`pytest`) + +- Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`. +- Unit tests mock the repository. Use `pytest.fixture` + `unittest.mock.AsyncMock`. 
+- Integration tests run against the real Postgres from `docker-compose.yml` via a transactional fixture (rollback after each test). +- Contract tests diff `/openapi.json` against `tests/contracts/openapi.baseline.json`. +- Naming: `test___.py::TestClass::test_method`. +- `pytest-asyncio` mode = `auto`. Mark slow tests with `@pytest.mark.slow`. +- Coverage target: 80% for new code; never decrease the service baseline. + +## Tooling + +- `ruff check` + `ruff format` (line length 100). +- `mypy --strict` on `services/`, `repositories/`, `domain/`. Expand outward. +- `pip-audit` in CI. +- Async-first: prefer `httpx.AsyncClient`, `asyncpg`/`SQLAlchemy 2.x async`. + +## Errors & logging + +- Domain errors inherit from a single `DomainError` base per service. +- Log via `structlog` with bound context (`tenant_id`, `request_id`). Never log secrets, PII, or full request bodies. +- Audit-relevant actions go through the audit logger, not the application logger. + +## What you may NOT do + +- Add a new Alembic migration. +- Rename a `__tablename__`, column, or enum value. +- Change a public route's path/method/status/schema without simultaneous dashboard fix. +- Catch `Exception` broadly — catch the specific domain or library error. +- Put business logic in a router or in a Pydantic validator. +- Create a new file >500 lines. Period. diff --git a/AGENTS.typescript.md b/AGENTS.typescript.md new file mode 100644 index 0000000..6359199 --- /dev/null +++ b/AGENTS.typescript.md @@ -0,0 +1,85 @@ +# AGENTS.typescript.md — TypeScript / Next.js Conventions + +Applies to: `admin-compliance/`, `developer-portal/`, `breakpilot-compliance-sdk/`, `consent-sdk/`, `dsms-node/` (where applicable). + +## Layered architecture (Next.js 15 App Router) + +``` +app/ +├── / +│ ├── page.tsx # Server Component by default. ≤200 LOC. +│ ├── layout.tsx +│ ├── _components/ # Private folder; not routable. Colocated UI. +│ │ └── .tsx # Each file ≤300 LOC. +│ ├── _hooks/ # Client hooks for this route. 
+│ ├── _server/ # Server actions, data loaders for this route. +│ └── loading.tsx / error.tsx +├── api/ +│ └── /route.ts # Thin handler. Delegates to lib/server//. +lib/ +├── / # Pure helpers, types, schemas (zod). Reusable. +└── server// # Server-only logic; uses "server-only" import. +components/ # Truly shared, app-wide components. +``` + +**Server vs Client:** Default is Server Component. Add `"use client"` only when you need state, effects, or browser APIs. Push the boundary as deep as possible. + +## API routes (route.ts) + +- One handler per HTTP method, ≤40 LOC. +- Validate input with `zod`. Reject invalid → 400. +- Delegate to `lib/server//`. No business logic in `route.ts`. +- Always return `NextResponse.json(..., { status })`. Never throw to the framework. + +```ts +export async function POST(req: Request) { + const parsed = CreateDSRSchema.safeParse(await req.json()); + if (!parsed.success) return NextResponse.json({ error: parsed.error.flatten() }, { status: 400 }); + const result = await dsrService.create(parsed.data); + return NextResponse.json(result, { status: 201 }); +} +``` + +## Page components + +- Pages >300 lines must be split into colocated `_components/`. +- Server Components fetch data; pass plain objects to Client Components. +- No data fetching in `useEffect` for server-renderable data. +- State management: prefer URL state (`searchParams`) and Server Components over global stores. + +## Types + +- `lib/sdk/types.ts` is being split into `lib/sdk/types/.ts`. Mirror backend domain boundaries. +- All API DTOs are zod schemas; infer types via `z.infer`. +- No `any`. No `as unknown as`. If you reach for it, the type is wrong. + +## Tests + +- Unit: **Vitest** (`*.test.ts`/`*.test.tsx`), colocated. +- Hooks: `@testing-library/react` `renderHook`. +- E2E: **Playwright** (`tests/e2e/`), one spec per top-level page, smoke happy path minimum. +- Snapshot tests sparingly — only for stable output (CSV, JSON-LD). 
+- Coverage target: 70% on `lib/`, smoke coverage on `app/`. + +## Tooling + +- `tsc --noEmit` clean (strict mode, `noUncheckedIndexedAccess: true`). +- ESLint with `@typescript-eslint`, `eslint-config-next`, type-aware rules on. +- `prettier`. +- `next build` clean. No `// @ts-ignore`. `// @ts-expect-error` only with a comment explaining why. + +## Performance + +- Use `next/dynamic` for heavy client-only components. +- Image: `next/image` with explicit width/height. +- Avoid waterfalls — `Promise.all` for parallel data fetches in Server Components. + +## What you may NOT do + +- Put business logic in a `page.tsx` or `route.ts`. +- Reach across module boundaries (e.g. `admin-compliance` importing from `developer-portal`). +- Use `dangerouslySetInnerHTML` without explicit sanitization. +- Call backend APIs directly from Client Components when a Server Component or Server Action would do. +- Change a public API route's path/method/schema without updating SDK consumers in the same change. +- Create a file >500 lines. +- Disable a lint or type rule globally to silence a finding — fix the root cause. diff --git a/admin-compliance/README.md b/admin-compliance/README.md new file mode 100644 index 0000000..57f43f6 --- /dev/null +++ b/admin-compliance/README.md @@ -0,0 +1,51 @@ +# admin-compliance + +Next.js 15 dashboard for BreakPilot Compliance — SDK module UI, company profile, DSR, DSFA, VVT, TOM, consent, AI Act, training, audit, change requests, etc. Also hosts 96+ API routes that proxy/orchestrate backend services. + +**Port:** `3007` (container: `bp-compliance-admin`) +**Stack:** Next.js 15 App Router, React 18, TailwindCSS, TypeScript strict. 
+ +## Architecture (target — Phase 3) + +``` +app/ +├── / +│ ├── page.tsx # Server Component (≤200 LOC) +│ ├── _components/ # Colocated UI, each ≤300 LOC +│ ├── _hooks/ # Client hooks +│ └── _server/ # Server actions +├── api//route.ts # Thin handlers → lib/server// +lib/ +├── / # Pure helpers, zod schemas +└── server// # "server-only" logic +components/ # App-wide shared UI +``` + +See `../AGENTS.typescript.md`. + +## Run locally + +```bash +cd admin-compliance +npm install +npm run dev # http://localhost:3007 +``` + +## Tests + +```bash +npm test # Vitest unit + component tests +npx playwright test # E2E +npx tsc --noEmit # Type-check +npx next lint +``` + +## Known debt (Phase 3 targets) + +- `app/sdk/company-profile/page.tsx` (3017 LOC), `tom-generator/controls/loader.ts` (2521), `lib/sdk/types.ts` (2511), `app/sdk/loeschfristen/page.tsx` (2322), `app/sdk/dsb-portal/page.tsx` (2068) — all must be split. +- 0 test files for 182 monolithic pages. Phase 3 adds Playwright smoke + Vitest unit coverage. + +## Don't touch + +- Backend API paths without updating `backend-compliance/` in the same change. +- `lib/sdk/types.ts` in large contiguous chunks — it's being domain-split. diff --git a/ai-compliance-sdk/README.md b/ai-compliance-sdk/README.md new file mode 100644 index 0000000..57be8ed --- /dev/null +++ b/ai-compliance-sdk/README.md @@ -0,0 +1,55 @@ +# ai-compliance-sdk + +Go/Gin service providing AI-Act compliance analysis: iACE impact assessments, UCCA rules engine, hazard library, training/academy, audit, escalation, portfolio, RBAC, RAG, whistleblower, workshop. + +**Port:** `8090` → exposed `8093` (container: `bp-compliance-ai-sdk`) +**Stack:** Go 1.24, Gin, pgx, Postgres. 
+ +## Architecture (target — Phase 2) + +``` +cmd/server/main.go # Thin entrypoint (<50 LOC) +internal/ +├── app/ # Wiring + lifecycle +├── domain// # Types, interfaces, errors +├── service// # Business logic +├── repository/postgres/ # Repo implementations +├── transport/http/ # Gin handlers + middleware + router +└── platform/ # DB pool, logger, config, httperr +``` + +See `../AGENTS.go.md` for the full convention. + +## Run locally + +```bash +cd ai-compliance-sdk +go mod download +export COMPLIANCE_DATABASE_URL=... +go run ./cmd/server +``` + +## Tests + +```bash +go test -race -cover ./... +golangci-lint run --timeout 5m ./... +``` + +Co-located `*_test.go`, table-driven. Repo layer uses testcontainers-go (or the compose Postgres) — no SQL mocks. + +## Public API surface + +Handlers under `internal/api/handlers/` (Phase 2 moves to `internal/transport/http/handler/`). Health at `GET /health`. iACE, UCCA, training, academy, portfolio, escalation, audit, rag, whistleblower, workshop subresources. Every route is a contract. + +## Environment + +| Var | Purpose | +|-----|---------| +| `COMPLIANCE_DATABASE_URL` | Postgres DSN | +| `LLM_GATEWAY_URL` | LLM router for rag/iACE | +| `QDRANT_URL` | Vector search | + +## Don't touch + +DB schema. Hand-rolled migrations elsewhere own it. diff --git a/backend-compliance/PHASE1_RUNBOOK.md b/backend-compliance/PHASE1_RUNBOOK.md new file mode 100644 index 0000000..77b20e8 --- /dev/null +++ b/backend-compliance/PHASE1_RUNBOOK.md @@ -0,0 +1,181 @@ +# Phase 1 Runbook — backend-compliance refactor + +This document is the step-by-step execution guide for Phase 1 of the repo refactor plan at `~/.claude/plans/vectorized-purring-barto.md`. It exists because the refactor must be driven from a session that can actually run `pytest` against the service, and every step must be verified green before moving to the next. + +## Prerequisites + +- Python 3.12 venv with `backend-compliance/requirements.txt` installed. 
+- Local Postgres reachable via `COMPLIANCE_DATABASE_URL` (use the compose db). +- Existing 48 pytest test files pass from a clean checkout: `pytest compliance/tests/ -v` → all green. **Do not proceed until this is true.** + +## Step 0 — Record the baseline + +```bash +cd backend-compliance +pytest compliance/tests/ -v --tb=short | tee /tmp/baseline.txt +pytest --cov=compliance --cov-report=term | tee /tmp/baseline-coverage.txt +python tests/contracts/regenerate_baseline.py # creates openapi.baseline.json +git add tests/contracts/openapi.baseline.json +git commit -m "phase1: pin OpenAPI baseline before refactor" +``` + +The baseline file is the contract. From this point forward, `pytest tests/contracts/` MUST stay green. + +## Step 1 — Characterization tests (before any code move) + +For each oversized route file we will refactor, add a happy-path + 1-error-path test **before** touching the source. These are called "characterization tests" and their purpose is to freeze current observable behavior so the refactor cannot change it silently. + +Oversized route files to cover (ordered by size): + +| File | LOC | Endpoints to cover | +|---|---:|---| +| `compliance/api/isms_routes.py` | 1676 | one happy + one 4xx per route | +| `compliance/api/dsr_routes.py` | 1176 | same | +| `compliance/api/vvt_routes.py` | *N* | same | +| `compliance/api/dsfa_routes.py` | *N* | same | +| `compliance/api/tom_routes.py` | *N* | same | +| `compliance/api/schemas.py` | 1899 | N/A (covered transitively) | +| `compliance/db/models.py` | 1466 | N/A (covered by existing + route tests) | +| `compliance/db/repository.py` | 1547 | add unit tests per repo class as they are extracted | + +Use `httpx.AsyncClient` + factory fixtures; see `AGENTS.python.md`. Place under `tests/integration/test__contract.py`. + +Commit: `phase1: characterization tests for routes`. 
+ +## Step 2 — Split `compliance/db/models.py` (1466 → <500 per file) + +⚠️ **Atomic step.** A `compliance/db/models/` package CANNOT coexist with the existing `compliance/db/models.py` module — Python's import system shadows the module with the package, breaking every `from compliance.db.models import X` call. The directory skeleton was intentionally NOT pre-created for this reason. Do the following in **one commit**: + +1. Create `compliance/db/models/` directory with `__init__.py` (re-export shim — see template below). +2. Move aggregate model classes into `compliance/db/models/.py` modules. +3. Delete the old `compliance/db/models.py` file in the same commit. + +Strategy uses a **re-export shim** so no import sites change: + +1. For each aggregate, create `compliance/db/models/.py` containing the model classes. Copy verbatim; do not rename `__tablename__`, columns, or relationship strings. +2. Aggregate suggestions (verify by reading `models.py`): + - `dsr.py` (DSR requests, exports) + - `dsfa.py` + - `vvt.py` + - `tom.py` + - `ai.py` (AI systems, compliance checks) + - `consent.py` + - `evidence.py` + - `vendor.py` + - `audit.py` + - `policy.py` + - `project.py` +3. After every aggregate is moved, fill `compliance/db/models/__init__.py` (the old `compliance/db/models.py` has already been deleted — see the atomic-step list above) with: + ```python + """Re-export shim — preserves the public surface of the former models.py module.""" + from compliance.db.models.dsr import * # noqa: F401,F403 + from compliance.db.models.dsfa import * # noqa: F401,F403 + # ... one per module + ``` + This keeps `from compliance.db.models import XYZ` working everywhere it's used today. +4. Run `pytest` after every move. Green → commit. Red → revert that move and investigate. +5. Existing aggregate-level files (`compliance/db/dsr_models.py`, `vvt_models.py`, `tom_models.py`, etc.) should be folded into the new `compliance/db/models/` package in the same pass — do not leave two parallel naming conventions. 
Alembic's autogenerate depends on it. Verify with `alembic check` if the env is set up.
+
+## Step 3 — Split `compliance/api/schemas.py` (1899 → per domain)
+
+Mirror the models split:
+
+1. For each domain, create `compliance/schemas/<domain>.py` with the Pydantic models.
+2. Replace `compliance/api/schemas.py` with a re-export shim.
+3. Keep `Create`/`Update`/`Read` variants separated; do not merge them into unions.
+4. Run `pytest` + contract test after each domain. Green → commit.
+
+## Step 4 — Extract services (router → service delegation)
+
+For each route file > 500 LOC, pull handler bodies into a service class under `compliance/services/<domain>_service.py` (new-style domain services, not the utility `compliance/services/` modules that already exist — consider moving those to `compliance/services/_legacy/` if collisions arise).
+
+Router handlers become:
+
+```python
+@router.post("/dsr/requests", response_model=DSRRequestRead, status_code=201)
+async def create_dsr_request(
+    payload: DSRRequestCreate,
+    service: DSRService = Depends(get_dsr_service),
+    tenant_id: UUID = Depends(get_tenant_id),
+) -> DSRRequestRead:
+    try:
+        return await service.create(tenant_id, payload)
+    except ConflictError as exc:
+        raise HTTPException(409, str(exc)) from exc
+    except NotFoundError as exc:
+        raise HTTPException(404, str(exc)) from exc
+```
+
+Rules:
+- Handler body ≤ 30 LOC.
+- Service raises domain errors (`compliance.domain`), never `HTTPException`.
+- Inject the service via `Depends` on a factory that wires the repository.
+
+Run tests after each router is thinned. The contract test must stay green.
+
+## Step 5 — Extract repositories
+
+`compliance/db/repository.py` (1547 LOC) and `compliance/db/isms_repository.py` (838 LOC) split into:
+
+```
+compliance/repositories/
+├── dsr_repository.py
+├── dsfa_repository.py
+├── vvt_repository.py
+├── isms_repository.py   # <500 LOC, split if needed
+└── ...
+```
+
+Each repository class:
+- Takes `AsyncSession` (or equivalent) in its constructor.
+- Exposes intent-named methods (`get_pending_for_tenant`, not `select_where`). +- Returns ORM instances or domain VOs. No `Row`. +- No business logic. + +Unit-test every repo class against the compose Postgres with a transactional fixture (begin → rollback). + +## Step 6 — mypy --strict on new packages + +CI already runs `mypy --strict` against `compliance/{services,repositories,domain,schemas}/`. After every extraction, verify locally: + +```bash +mypy --strict --ignore-missing-imports compliance/schemas compliance/repositories compliance/domain compliance/services +``` + +If you have type errors, fix them in the extracted module. **Do not** add `# type: ignore` blanket waivers. If a third-party lib is poorly typed, add it to `[mypy.overrides]` in `pyproject.toml`/`mypy.ini` with a one-line rationale. + +## Step 7 — Expand test coverage + +- Unit tests per service (mocked repo). +- Integration tests per repository (real db, transactional). +- Contract test stays green. +- Target: 80% coverage on new code. Never decrease the service baseline. + +## Step 8 — Guardrail enforcement + +After Phase 1 completes, `compliance/db/models.py`, `compliance/db/repository.py`, and `compliance/api/schemas.py` are either re-export shims (≤50 LOC each) or deleted. No file in `backend-compliance/compliance/` exceeds 500 LOC. Run: + +```bash +../scripts/check-loc.sh backend-compliance/ +``` + +Any remaining hard violations → document in `.claude/rules/loc-exceptions.txt` with rationale, or keep splitting. + +## Done when + +- `pytest compliance/tests/ tests/ -v` all green. +- `pytest tests/contracts/` green — OpenAPI has no removals, no renames, no new required request fields. +- Coverage ≥ baseline. +- `mypy --strict` clean on new packages. +- `scripts/check-loc.sh backend-compliance/` reports 0 hard violations in new/touched files (legacy allowlisted in `loc-exceptions.txt` only with rationale). +- CI all green on PR. 
+ +## Pitfalls + +- **Do not change `__tablename__` or column names.** Even a rename breaks the DB contract. +- **Do not change relationship back_populates / backref strings.** SQLAlchemy resolves these by name at mapper configuration. +- **Do not change route paths or pydantic field names.** Contract test will catch most — but JSON field aliasing (`Field(alias=...)`) is easy to break accidentally. +- **Do not eagerly reformat unrelated code.** Keep the diff reviewable. One PR per major step. +- **Do not bypass the pre-commit hook.** If a file legitimately must be >500 LOC during an intermediate step, squash commits at the end so the final state is clean. diff --git a/backend-compliance/README.md b/backend-compliance/README.md new file mode 100644 index 0000000..ea5e9f0 --- /dev/null +++ b/backend-compliance/README.md @@ -0,0 +1,55 @@ +# backend-compliance + +Python/FastAPI service implementing the DSGVO compliance API: DSR, DSFA, consent, controls, risks, evidence, audit, vendor management, ISMS, change requests, document generation. + +**Port:** `8002` (container: `bp-compliance-backend`) +**Stack:** Python 3.12, FastAPI, SQLAlchemy 2.x, Alembic, Keycloak auth. + +## Architecture (target — Phase 1) + +``` +compliance/ +├── api/ # Routers (thin, ≤30 LOC per handler) +├── services/ # Business logic +├── repositories/ # DB access +├── domain/ # Value objects, domain errors +├── schemas/ # Pydantic models, split per domain +└── db/models/ # SQLAlchemy ORM, one module per aggregate +``` + +See `../AGENTS.python.md` for the full convention and `../.claude/rules/architecture.md` for the non-negotiable rules. + +## Run locally + +```bash +cd backend-compliance +pip install -r requirements.txt +export COMPLIANCE_DATABASE_URL=... 
# Postgres (Hetzner or local) +uvicorn main:app --reload --port 8002 +``` + +## Tests + +```bash +pytest compliance/tests/ -v +pytest --cov=compliance --cov-report=term-missing +``` + +Layout: `tests/unit/`, `tests/integration/`, `tests/contracts/`. Contract tests diff `/openapi.json` against `tests/contracts/openapi.baseline.json`. + +## Public API surface + +404+ endpoints across `/api/v1/*`. Grouped by domain: `ai`, `audit`, `consent`, `dsfa`, `dsr`, `gdpr`, `vendor`, `evidence`, `change-requests`, `generation`, `projects`, `company-profile`, `isms`. Every path is a contract — see the "Public endpoints" rule in the root `CLAUDE.md`. + +## Environment + +| Var | Purpose | +|-----|---------| +| `COMPLIANCE_DATABASE_URL` | Postgres DSN, `sslmode=require` | +| `KEYCLOAK_*` | Auth verification | +| `QDRANT_URL`, `QDRANT_API_KEY` | Vector search | +| `CORE_VALKEY_URL` | Session cache | + +## Don't touch + +Database schema, `__tablename__`, column names, existing migrations under `migrations/`. See root `CLAUDE.md` rule 3. diff --git a/backend-compliance/compliance/domain/__init__.py b/backend-compliance/compliance/domain/__init__.py new file mode 100644 index 0000000..f836525 --- /dev/null +++ b/backend-compliance/compliance/domain/__init__.py @@ -0,0 +1,30 @@ +"""Domain layer: value objects, enums, and domain exceptions. + +Pure Python — no FastAPI, no SQLAlchemy, no HTTP concerns. Upper layers depend on +this package; it depends on nothing except the standard library and small libraries +like ``pydantic`` or ``attrs``. +""" + + +class DomainError(Exception): + """Base class for all domain-level errors. + + Services raise subclasses of this; the HTTP layer is responsible for mapping + them to status codes. Never raise ``HTTPException`` from a service. + """ + + +class NotFoundError(DomainError): + """Requested entity does not exist.""" + + +class ConflictError(DomainError): + """Operation conflicts with the current state (e.g. 
duplicate, stale version).""" + + +class ValidationError(DomainError): + """Input failed domain-level validation (beyond what Pydantic catches).""" + + +class PermissionError(DomainError): + """Caller lacks permission for the operation.""" diff --git a/backend-compliance/compliance/repositories/__init__.py b/backend-compliance/compliance/repositories/__init__.py new file mode 100644 index 0000000..6921516 --- /dev/null +++ b/backend-compliance/compliance/repositories/__init__.py @@ -0,0 +1,10 @@ +"""Repository layer: database access. + +Each aggregate gets its own module (e.g. ``dsr_repository.py``) exposing a single +class with intent-named methods. Repositories own SQLAlchemy session usage; they +do not run business logic, and they do not import anything from +``compliance.api`` or ``compliance.services``. + +Phase 1 refactor target: ``compliance.db.repository`` (1547 lines) is being +decomposed into per-aggregate modules under this package. +""" diff --git a/backend-compliance/compliance/schemas/__init__.py b/backend-compliance/compliance/schemas/__init__.py new file mode 100644 index 0000000..8e95f64 --- /dev/null +++ b/backend-compliance/compliance/schemas/__init__.py @@ -0,0 +1,11 @@ +"""Pydantic schemas, split per domain. + +Phase 1 refactor target: the monolithic ``compliance.api.schemas`` module (1899 lines) +is being decomposed into one module per domain under this package. Until every domain +has been migrated, ``compliance.api.schemas`` re-exports from here so existing imports +continue to work unchanged. + +New code MUST import from the specific domain module (e.g. +``from compliance.schemas.dsr import DSRRequestCreate``) rather than from +``compliance.api.schemas``. 
+""" diff --git a/breakpilot-compliance-sdk/README.md b/breakpilot-compliance-sdk/README.md new file mode 100644 index 0000000..8a653e6 --- /dev/null +++ b/breakpilot-compliance-sdk/README.md @@ -0,0 +1,37 @@ +# breakpilot-compliance-sdk + +TypeScript SDK monorepo providing React, Angular, Vue, vanilla JS, and core bindings for the BreakPilot Compliance backend. Published as npm packages. + +**Stack:** TypeScript, workspaces (`packages/core`, `packages/react`, `packages/angular`, `packages/vanilla`, `packages/types`). + +## Layout + +``` +packages/ +├── core/ # Framework-agnostic client + state +├── types/ # Shared type definitions +├── react/ # React Provider + hooks +├── angular/ # Angular service +└── vanilla/ # Vanilla-JS embed script +``` + +## Architecture + +Follow `../AGENTS.typescript.md`. No framework-specific code in `core/`. + +## Build + test + +```bash +npm install +npm run build # per-workspace build +npm test # Vitest (Phase 4 adds coverage — currently 0 tests) +``` + +## Known debt (Phase 4) + +- `packages/vanilla/src/embed.ts` (611), `packages/react/src/provider.tsx` (539), `packages/core/src/client.ts` (521), `packages/react/src/hooks.ts` (474) — split. +- **Zero test coverage.** Priority Phase 4 target. + +## Don't touch + +Public API surface of `core` without bumping package major version and updating consumers. diff --git a/compliance-tts-service/README.md b/compliance-tts-service/README.md new file mode 100644 index 0000000..856c5ec --- /dev/null +++ b/compliance-tts-service/README.md @@ -0,0 +1,30 @@ +# compliance-tts-service + +Python service generating German-language audio/video training materials using Piper TTS + FFmpeg. Outputs are stored in Hetzner Object Storage (S3-compatible). + +**Port:** `8095` (container: `bp-compliance-tts`) +**Stack:** Python 3.12, Piper TTS (`de_DE-thorsten-high.onnx`), FFmpeg, boto3. 
+ +## Files + +- `main.py` — FastAPI entrypoint +- `tts_engine.py` — Piper wrapper +- `video_generator.py` — FFmpeg pipeline +- `storage.py` — S3 client + +## Run locally + +```bash +cd compliance-tts-service +pip install -r requirements.txt +# Piper model + ffmpeg must be available on PATH +uvicorn main:app --reload --port 8095 +``` + +## Tests + +0 test files today. Phase 4 adds unit tests for the synthesis pipeline (mocked Piper + FFmpeg) and the S3 client. + +## Architecture + +Follow `../AGENTS.python.md`. Keep the Piper model loading behind a single service instance — not loaded per request. diff --git a/developer-portal/README.md b/developer-portal/README.md new file mode 100644 index 0000000..31b789f --- /dev/null +++ b/developer-portal/README.md @@ -0,0 +1,26 @@ +# developer-portal + +Next.js 15 public API documentation portal — integration guides, SDK docs, BYOEH, development phases. Consumed by external customers. + +**Port:** `3006` (container: `bp-compliance-developer-portal`) +**Stack:** Next.js 15, React 18, TypeScript. + +## Run locally + +```bash +cd developer-portal +npm install +npm run dev +``` + +## Tests + +0 test files today. Phase 4 adds Playwright smoke tests for each top-level page and Vitest for `lib/` helpers. + +## Architecture + +Follow `../AGENTS.typescript.md`. MD/MDX content should live in a data directory, not inline in `page.tsx`. + +## Known debt + +- `app/development/docs/page.tsx` (891), `app/development/byoeh/page.tsx` (769), and others > 300 LOC — split in Phase 4. diff --git a/docs-src/README.md b/docs-src/README.md new file mode 100644 index 0000000..8081848 --- /dev/null +++ b/docs-src/README.md @@ -0,0 +1,19 @@ +# docs-src + +MkDocs-based internal documentation site — system architecture, data models, runbooks, API references. + +**Port:** `8011` (container: `bp-compliance-docs`) +**Stack:** MkDocs + Material theme, served via nginx. 
+ +## Build + serve locally + +```bash +cd docs-src +pip install -r requirements.txt +mkdocs serve # http://localhost:8000 +mkdocs build # static output to site/ +``` + +## Known debt (Phase 4) + +- `index.md` is 9436 lines — will be split into per-topic pages with proper mkdocs nav. Target: no single markdown file >500 lines except explicit reference tables. diff --git a/document-crawler/README.md b/document-crawler/README.md new file mode 100644 index 0000000..185a796 --- /dev/null +++ b/document-crawler/README.md @@ -0,0 +1,28 @@ +# document-crawler + +Python/FastAPI service for document ingestion and compliance gap analysis. Parses PDF, DOCX, XLSX, PPTX; runs gap analysis against compliance requirements; coordinates with `ai-compliance-sdk` via the LLM gateway; archives to `dsms-gateway`. + +**Port:** `8098` (container: `bp-compliance-document-crawler`) +**Stack:** Python 3.11, FastAPI. + +## Architecture + +Small service — already well under the LOC budget. Follow `../AGENTS.python.md` for any additions. + +## Run locally + +```bash +cd document-crawler +pip install -r requirements.txt +uvicorn main:app --reload --port 8098 +``` + +## Tests + +```bash +pytest tests/ -v +``` + +## Public API surface + +`GET /health`, document upload/parse endpoints, gap-analysis endpoints. See the OpenAPI doc at `/docs` when running. diff --git a/dsms-gateway/README.md b/dsms-gateway/README.md new file mode 100644 index 0000000..62aa490 --- /dev/null +++ b/dsms-gateway/README.md @@ -0,0 +1,55 @@ +# dsms-gateway + +Python/FastAPI gateway to the IPFS-backed document archival store. Upload, retrieve, verify, and archive legal documents with content-addressed immutability. + +**Port:** `8082` (container: `bp-compliance-dsms-gateway`) +**Stack:** Python 3.11, FastAPI, IPFS (Kubo via `dsms-node`). 
+ +## Architecture (target — Phase 4) + +`main.py` (467 LOC) will split into: + +``` +dsms_gateway/ +├── main.py # FastAPI app factory, <50 LOC +├── routers/ # /documents, /legal-documents, /verify, /node +├── ipfs/ # IPFS client wrapper +├── services/ # Business logic (archive, verify) +├── schemas/ # Pydantic models +└── config.py +``` + +See `../AGENTS.python.md`. + +## Run locally + +```bash +cd dsms-gateway +pip install -r requirements.txt +export IPFS_API_URL=http://localhost:5001 +uvicorn main:app --reload --port 8082 +``` + +## Tests + +```bash +pytest test_main.py -v +``` + +Note: the existing test file is larger than the implementation — good coverage already. Phase 4 splits both into matching module pairs. + +## Public API surface + +``` +GET /health +GET /api/v1/documents +POST /api/v1/documents +GET /api/v1/documents/{cid} +GET /api/v1/documents/{cid}/metadata +DELETE /api/v1/documents/{cid} +POST /api/v1/legal-documents/archive +GET /api/v1/verify/{cid} +GET /api/v1/node/info +``` + +Every path is a contract — updating requires synchronized updates in consumers. diff --git a/dsms-node/README.md b/dsms-node/README.md new file mode 100644 index 0000000..d8335d7 --- /dev/null +++ b/dsms-node/README.md @@ -0,0 +1,15 @@ +# dsms-node + +IPFS Kubo node container — distributed document storage backend for the compliance platform. Participates in the BreakPilot IPFS swarm and serves as the storage layer behind `dsms-gateway`. + +**Image:** `ipfs/kubo:v0.24.0` +**Ports:** `4001` (swarm), `5001` (API), `8085` (HTTP gateway) +**Container:** `bp-compliance-dsms-node` + +## Operation + +No source code — this is a thin wrapper around the upstream IPFS Kubo image. Configuration is via environment and the compose file at repo root. + +## Don't touch + +This service is out of refactor scope. Do not modify without the infrastructure owner's sign-off. 
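The immutability guarantee that `dsms-gateway`'s `GET /api/v1/verify/{cid}` endpoint builds on is plain content addressing: the address is a function of the bytes. A stdlib-only sketch of that invariant (note: real IPFS CIDs are multibase-encoded multihashes computed by Kubo, not the bare sha256 hex digest used here for illustration):

```python
# Content addressing in one invariant: same bytes, same address; any mutation,
# different address. (Simplification: real CIDs come from Kubo, not hashlib.)
import hashlib


def address(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


original = b"Verarbeitungsverzeichnis, Art. 30 DSGVO, v1"
cid = address(original)

assert address(original) == cid          # retrieval is verifiable against the CID
assert address(original + b" ") != cid   # a one-byte change breaks verification
```

This is why the store needs no mutable metadata to prove integrity: re-hashing the retrieved bytes and comparing against the CID is the whole verification.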
diff --git a/scripts/check-loc.sh b/scripts/check-loc.sh
new file mode 100755
index 0000000..f6e4ce0
--- /dev/null
+++ b/scripts/check-loc.sh
@@ -0,0 +1,123 @@
+#!/usr/bin/env bash
+# check-loc.sh — File-size budget enforcer for breakpilot-compliance.
+#
+# Soft target: 300 LOC. Hard cap: 500 LOC.
+#
+# Usage:
+#   scripts/check-loc.sh                  # scan whole repo, respect exceptions
+#   scripts/check-loc.sh --changed        # only files changed vs origin/main
+#   scripts/check-loc.sh path/to/file.py  # check specific files
+#   scripts/check-loc.sh --json           # machine-readable output
+#
+# Exit codes:
+#   0 — clean (no hard violations)
+#   1 — at least one file exceeds the hard cap (500)
+#   2 — invalid invocation
+#
+# Behavior:
+# - Skips test files, generated files, vendor dirs, node_modules, .git, dist, build,
+#   .next, __pycache__, migrations, and anything matching .claude/rules/loc-exceptions.txt.
+# - Raw line count is used deliberately — we do NOT subtract blank or comment-only
+#   lines, so the rule stays unambiguous. If you want to game it with blank lines,
+#   you're missing the point.

+set -euo pipefail
+
+SOFT=300
+HARD=500
+REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+EXCEPTIONS_FILE="$REPO_ROOT/.claude/rules/loc-exceptions.txt"
+
+CHANGED_ONLY=0
+JSON=0
+TARGETS=()
+
+for arg in "$@"; do
+  case "$arg" in
+    --changed) CHANGED_ONLY=1 ;;
+    --json) JSON=1 ;;
+    -h|--help)
+      sed -n '2,18p' "$0"; exit 0 ;;
+    -*) echo "unknown flag: $arg" >&2; exit 2 ;;
+    *) TARGETS+=("$arg") ;;
+  esac
+done
+
+# Patterns excluded from the budget regardless of path.
+is_excluded() { + local f="$1" + case "$f" in + */node_modules/*|*/.next/*|*/.git/*|*/dist/*|*/build/*|*/__pycache__/*|*/vendor/*) return 0 ;; + */migrations/*|*/alembic/versions/*) return 0 ;; + *_test.go|*.test.ts|*.test.tsx|*.spec.ts|*.spec.tsx) return 0 ;; + */tests/*|*/test/*) return 0 ;; + *.md|*.json|*.yaml|*.yml|*.lock|*.sum|*.mod|*.toml|*.cfg|*.ini) return 0 ;; + *.svg|*.png|*.jpg|*.jpeg|*.gif|*.ico|*.pdf|*.woff|*.woff2|*.ttf) return 0 ;; + *.generated.*|*.gen.*|*_pb.go|*_pb2.py|*.pb.go) return 0 ;; + esac + return 1 +} + +is_in_exceptions() { + [[ -f "$EXCEPTIONS_FILE" ]] || return 1 + local rel="${1#$REPO_ROOT/}" + grep -Fxq "$rel" "$EXCEPTIONS_FILE" +} + +collect_targets() { + if (( ${#TARGETS[@]} > 0 )); then + printf '%s\n' "${TARGETS[@]}" + elif (( CHANGED_ONLY )); then + git -C "$REPO_ROOT" diff --name-only --diff-filter=AM origin/main...HEAD 2>/dev/null \ + || git -C "$REPO_ROOT" diff --name-only --diff-filter=AM HEAD + else + git -C "$REPO_ROOT" ls-files + fi +} + +violations_hard=() +violations_soft=() + +while IFS= read -r f; do + [[ -z "$f" ]] && continue + abs="$f" + [[ "$abs" != /* ]] && abs="$REPO_ROOT/$f" + [[ -f "$abs" ]] || continue + is_excluded "$abs" && continue + is_in_exceptions "$abs" && continue + loc=$(wc -l < "$abs" | tr -d ' ') + if (( loc > HARD )); then + violations_hard+=("$loc $f") + elif (( loc > SOFT )); then + violations_soft+=("$loc $f") + fi +done < <(collect_targets) + +if (( JSON )); then + printf '{"hard":[' + first=1; for v in "${violations_hard[@]}"; do + loc="${v%% *}"; path="${v#* }" + (( first )) || printf ','; first=0 + printf '{"loc":%s,"path":"%s"}' "$loc" "$path" + done + printf '],"soft":[' + first=1; for v in "${violations_soft[@]}"; do + loc="${v%% *}"; path="${v#* }" + (( first )) || printf ','; first=0 + printf '{"loc":%s,"path":"%s"}' "$loc" "$path" + done + printf ']}\n' +else + if (( ${#violations_soft[@]} > 0 )); then + echo "::warning:: $((${#violations_soft[@]})) file(s) exceed soft target ($SOFT 
lines):" + printf ' %s\n' "${violations_soft[@]}" | sort -rn + fi + if (( ${#violations_hard[@]} > 0 )); then + echo "::error:: $((${#violations_hard[@]})) file(s) exceed HARD CAP ($HARD lines) — split required:" + printf ' %s\n' "${violations_hard[@]}" | sort -rn + echo + echo "If a file legitimately must exceed $HARD lines (generated code, large data tables)," + echo "add it to .claude/rules/loc-exceptions.txt with a one-line rationale comment above it." + fi +fi + +(( ${#violations_hard[@]} == 0 )) diff --git a/scripts/githooks/pre-commit b/scripts/githooks/pre-commit new file mode 100755 index 0000000..44d0314 --- /dev/null +++ b/scripts/githooks/pre-commit @@ -0,0 +1,55 @@ +#!/usr/bin/env bash +# pre-commit — enforces breakpilot-compliance structural guardrails. +# +# 1. Blocks commits that introduce a non-test, non-generated source file > 500 LOC. +# 2. Blocks commits that touch backend-compliance/migrations/ unless the commit message +# contains the marker [migration-approved] (last-resort escape hatch). +# 3. Blocks edits to .claude/settings.json, scripts/check-loc.sh, or +# .claude/rules/loc-exceptions.txt unless [guardrail-change] is in the commit message. +# +# Bypass with --no-verify is intentionally NOT supported by the team workflow. +# CI re-runs all of these on the server side anyway. + +set -euo pipefail +REPO_ROOT="$(git rev-parse --show-toplevel)" + +mapfile -t staged < <(git diff --cached --name-only --diff-filter=ACM) +[[ ${#staged[@]} -eq 0 ]] && exit 0 + +# 1. LOC budget on staged files. +loc_targets=() +for f in "${staged[@]}"; do + [[ -f "$REPO_ROOT/$f" ]] && loc_targets+=("$REPO_ROOT/$f") +done +if [[ ${#loc_targets[@]} -gt 0 ]]; then + if ! "$REPO_ROOT/scripts/check-loc.sh" "${loc_targets[@]}"; then + echo + echo "Commit blocked: file-size budget violated. See output above." + echo "Either split the file (preferred) or add an exception with rationale to" + echo " .claude/rules/loc-exceptions.txt" + exit 1 + fi +fi + +# 2. 
Migration directories are frozen unless explicitly approved. +if printf '%s\n' "${staged[@]}" | grep -qE '(^|/)(migrations|alembic/versions)/'; then + if ! git log --format=%B -n 1 HEAD 2>/dev/null | grep -q '\[migration-approved\]' \ + && ! grep -q '\[migration-approved\]' "$(git rev-parse --git-dir)/COMMIT_EDITMSG" 2>/dev/null; then + echo "Commit blocked: this change touches a migrations directory." + echo "Database schema changes require an explicit migration plan reviewed by the DB owner." + echo "If approved, add '[migration-approved]' to your commit message." + exit 1 + fi +fi + +# 3. Guardrail files are protected. +guarded='^(\.claude/settings\.json|\.claude/rules/loc-exceptions\.txt|scripts/check-loc\.sh|scripts/githooks/pre-commit|AGENTS\.(python|go|typescript)\.md)$' +if printf '%s\n' "${staged[@]}" | grep -qE "$guarded"; then + if ! grep -q '\[guardrail-change\]' "$(git rev-parse --git-dir)/COMMIT_EDITMSG" 2>/dev/null; then + echo "Commit blocked: this change modifies guardrail files." + echo "If intentional, add '[guardrail-change]' to your commit message and explain why in the body." + exit 1 + fi +fi + +exit 0 diff --git a/scripts/install-hooks.sh b/scripts/install-hooks.sh new file mode 100755 index 0000000..51cb144 --- /dev/null +++ b/scripts/install-hooks.sh @@ -0,0 +1,26 @@ +#!/usr/bin/env bash +# install-hooks.sh — installs git hooks that enforce repo guardrails locally. +# Idempotent. Safe to re-run. + +set -euo pipefail +REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +HOOKS_DIR="$REPO_ROOT/.git/hooks" +SRC_DIR="$REPO_ROOT/scripts/githooks" + +if [[ ! -d "$REPO_ROOT/.git" ]]; then + echo "Not a git repository: $REPO_ROOT" >&2 + exit 1 +fi + +mkdir -p "$HOOKS_DIR" +for hook in pre-commit; do + src="$SRC_DIR/$hook" + dst="$HOOKS_DIR/$hook" + if [[ -f "$src" ]]; then + cp "$src" "$dst" + chmod +x "$dst" + echo "installed: $dst" + fi +done + +echo "Done. Hooks active for this clone."
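One sharp edge worth knowing about the hook chain above: `check-loc.sh` matches the exceptions list with `grep -Fxq`, that is, literal whole-line matching, so glob patterns in `loc-exceptions.txt` do nothing. A quick shell demonstration (the paths are examples):

```shell
# grep -F = fixed strings, -x = whole line, -q = quiet (exit status only).
# An exception entry must therefore be an exact repo-relative path.
tmp=$(mktemp)
printf '%s\n' 'backend-compliance/compliance/db/models.py' > "$tmp"

grep -Fxq 'backend-compliance/compliance/db/models.py' "$tmp" \
  && echo "exact path: excluded"
grep -Fxq 'backend-compliance/compliance/db/*.py' "$tmp" \
  || echo "glob pattern: NOT excluded"

rm -f "$tmp"
```

So when adding an exception, list each oversized file individually, with its rationale comment on the line above.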