Commit Graph

278 Commits

Author SHA1 Message Date
Sharang Parnerkar
a83f4b4178 refactor(backend/db): split models.py into per-aggregate modules (1466 -> 85 LOC shim)
The monolithic compliance/db/models.py is decomposed into seven sibling
aggregate modules following the existing repo pattern (dsr_models.py,
vvt_models.py, tom_models.py, etc.):

  regulation_models.py       (134 LOC) — RegulationDB, RequirementDB
  control_models.py          (279 LOC) — ControlDB, ControlMappingDB, EvidenceDB, RiskDB
  ai_system_models.py        (141 LOC) — AISystemDB, AuditExportDB
  service_module_models.py   (176 LOC) — ServiceModuleDB, ModuleRegulationMappingDB, ModuleRiskDB
  audit_session_models.py    (177 LOC) — AuditSessionDB, AuditSignOffDB
  isms_governance_models.py  (323 LOC) — ISMSScope, Context, Policy, Objective, SoA
  isms_audit_models.py       (468 LOC) — AuditFinding, CAPA, ManagementReview, InternalAudit,
                                         AuditTrail, ReadinessCheck

models.py becomes an 85-line re-export shim — every public symbol is
re-exported in dependency order so existing imports work unchanged:

  from compliance.db.models import RegulationDB, ControlDB, AuditFindingDB  # still works

New code SHOULD import from the aggregate module directly; the shim is
for backwards compatibility during the migration.

Schema freeze preserved:
  - __tablename__ byte-identical
  - Column names, types, indexes, constraints byte-identical
  - relationship() string references and back_populates unchanged
  - cascade directives unchanged

Verified:
  - 173/173 pytest compliance/tests/ pass
  - tests/contracts/test_openapi_baseline.py passes (360 paths,
    484 operations — identical to baseline)
  - All new sibling files under the 500-line hard cap
    (largest: isms_audit_models.py at 468 LOC)
  - No file in compliance/db/ now exceeds the hard cap

This is Phase 1 Step 2 from PHASE1_RUNBOOK.md. Phase 1 Step 3 (split
compliance/api/schemas.py, 1899 LOC) is the next target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:10:31 +02:00
Sharang Parnerkar
7806425ba6 test(backend): pin OpenAPI contract baseline (360 paths, 484 operations)
Adds tests/contracts/test_openapi_baseline.py which loads the live
FastAPI app and diffs its OpenAPI schema against a checked-in baseline.
Fails on:
  - Any removed path or operation
  - Any removed response status code on an existing operation
  - Any new required request body field (would break existing clients)

Passes silently on additive changes. The baseline is regenerated by
running tests/contracts/regenerate_baseline.py — only when a contract
change has been reviewed and every consumer (admin-compliance,
developer-portal, SDKs) has been updated in the same change set.

This is the safety harness for the Phase 1 backend-compliance refactor:
every subsequent refactor commit must keep this test green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:10:11 +02:00
Sharang Parnerkar
cb90d0db0c chore(backend): deprecation sweep — Pydantic V1 -> V2, utcnow -> tz-aware
Two low-risk Pydantic V1 idioms that will be hard errors in V3:
  - Query(regex=...) -> Query(pattern=...) (audit_routes, control_generator_routes)
  - class Config: from_attributes=True -> model_config = ConfigDict(...)
    in source_policy_router.py (schemas.py is intentionally skipped — it is
    the Phase 1 schema-split target and the ConfigDict conversion is most
    efficient to do during that split).

Naive -> aware datetime sweep across 47 files:
  - datetime.utcnow() -> datetime.now(timezone.utc)
  - default=datetime.utcnow -> default=lambda: datetime.now(timezone.utc)
  - onupdate=datetime.utcnow -> onupdate=lambda: datetime.now(timezone.utc)

All SQLAlchemy DateTime columns in the project already declare
timezone=True, so the DB schema expects aware datetimes. Before this
commit, the in-Python side was generating naive values and the driver
was silently coercing them. This is a latent-bug fix, not a behavior
change at the DB boundary.

Verified:
  - 173/173 pytest compliance/tests/ pass (same as baseline)
  - tests/contracts/test_openapi_baseline.py passes (360 paths,
    484 operations unchanged)
  - DeprecationWarning count dropped from 158 -> 35

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:09:59 +02:00
Sharang Parnerkar
512b7a0f6c phase0: add architecture guardrails, CI gates, per-language AGENTS.md
Non-negotiable structural rules that apply to every Claude Code session in
this repo and to every commit, enforced via three defense-in-depth layers:

  1. PreToolUse hook in .claude/settings.json blocks any Write/Edit that
     would push a file past the 500-line hard cap. Auto-loads for any
     Claude session in this repo regardless of who launched it.
  2. scripts/githooks/pre-commit (installed via scripts/install-hooks.sh)
     enforces the LOC cap, freezes migrations/ unless [migration-approved],
     and protects guardrail files unless [guardrail-change] is present.
  3. .gitea/workflows/ci.yaml gets loc-budget + guardrail-integrity jobs,
     plus mypy --strict on new Python packages, tsc --noEmit on Node
     services, and a syft+grype SBOM scan.

Per-language conventions are documented in AGENTS.python.md / AGENTS.go.md /
AGENTS.typescript.md at the repo root — layering (router->service->repo for
Python, hexagonal for Go, colocation for Next.js), tooling baseline, and
explicit "what you may NOT do" lists.

Adds scripts/check-loc.sh (soft 300 / hard 500, reports 205 hard and 161
soft violations in the current codebase) plus .claude/rules/loc-exceptions.txt
(initially empty — the list is designed to shrink over time).

Per-service READMEs for all 10 services + PHASE1_RUNBOOK.md for the
backend-compliance refactor. Skeleton packages (compliance/{domain,
repositories,schemas}) are the landing zone for the clean-arch rewrite that
begins in Phase 1.

CLAUDE.md is prepended with the six non-negotiable rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:09:26 +02:00
Sharang Parnerkar
1dfea51919 Remove standalone deploy-coolify.yml — deploy is handled in ci.yaml
Some checks failed
CI/CD / go-lint (pull_request) Failing after 2s
CI/CD / python-lint (pull_request) Failing after 10s
CI/CD / nodejs-lint (pull_request) Failing after 2s
CI/CD / test-go-ai-compliance (pull_request) Failing after 2s
CI/CD / test-python-backend-compliance (pull_request) Failing after 10s
CI/CD / test-python-document-crawler (pull_request) Failing after 12s
CI/CD / test-python-dsms-gateway (pull_request) Failing after 10s
CI/CD / validate-canonical-controls (pull_request) Failing after 10s
CI/CD / Deploy (pull_request) Has been skipped
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 11:26:31 +01:00
Sharang Parnerkar
559d7960a2 Replace deploy-hetzner with Coolify webhook deploy
Some checks failed
CI/CD / go-lint (pull_request) Failing after 15s
CI/CD / python-lint (pull_request) Failing after 12s
CI/CD / nodejs-lint (pull_request) Failing after 2s
CI/CD / test-go-ai-compliance (pull_request) Failing after 2s
CI/CD / test-python-backend-compliance (pull_request) Failing after 11s
CI/CD / test-python-document-crawler (pull_request) Failing after 11s
CI/CD / test-python-dsms-gateway (pull_request) Failing after 10s
CI/CD / validate-canonical-controls (pull_request) Failing after 9s
CI/CD / Deploy (pull_request) Has been skipped
Deploy to Coolify / deploy (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:39:12 +01:00
Sharang Parnerkar
a101426dba Add traefik.docker.network label to fix routing
Containers are on multiple networks (breakpilot-network, coolify,
gokocgws...). Without traefik.docker.network, Traefik randomly picks
a network and may choose breakpilot-network where it has no access.
This label forces Traefik to always use the coolify network.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:52 +01:00
Sharang Parnerkar
f6b22820ce Add coolify network to externally-routed services
Traefik routes traffic via the 'coolify' bridge network, so services
that need public domain access must be on both breakpilot-network
(for inter-service communication) and coolify (for Traefik routing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:52 +01:00
Sharang Parnerkar
86588aff09 Fix SQLAlchemy 2.x compatibility: wrap raw SQL in text()
SQLAlchemy 2.x requires raw SQL strings to be explicitly wrapped
in text(). Fixed 16 instances across 5 route files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:52 +01:00
Sharang Parnerkar
033fa52e5b Add healthcheck to dsms-gateway
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:00 +01:00
Sharang Parnerkar
005fb9d219 Add healthchecks to admin-compliance, developer-portal, backend-compliance
Traefik may require healthchecks to route traffic to containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:00 +01:00
Sharang Parnerkar
0c01f1c96c Remove Traefik labels from coolify compose — Coolify handles routing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:00 +01:00
Sharang Parnerkar
ffd256d420 Sync coolify compose with main: use COMPLIANCE_DATABASE_URL, QDRANT_URL
- Switch to ${COMPLIANCE_DATABASE_URL} for admin-compliance, backend, SDK, crawler
- Add DATABASE_URL to admin-compliance environment
- Switch ai-compliance-sdk from QDRANT_HOST/PORT to QDRANT_URL + QDRANT_API_KEY
- Add MINIO_SECURE to compliance-tts-service
- Update .env.coolify.example with new variable patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:00 +01:00
Sharang Parnerkar
d542dbbacd fix: ensure public dir exists in developer-portal build
Next.js standalone COPY fails when no public directory exists in source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:00 +01:00
Sharang Parnerkar
a3d0024d39 fix: use Alpine-compatible addgroup/adduser flags in Dockerfiles
Replace --system/--gid/--uid (Debian syntax) with -S/-g/-u (BusyBox/Alpine).
Coolify ARG injection causes exit code 255 with Debian-style flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:13:57 +01:00
Sharang Parnerkar
998d427c3c fix: update alpine base to 3.21 for ai-compliance-sdk
Alpine 3.19 apk mirrors failing during Coolify build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:13:57 +01:00
Sharang Parnerkar
99f3180ffc refactor(coolify): externalize postgres, qdrant, S3
- Replace bp-core-postgres with POSTGRES_HOST env var
- Replace bp-core-qdrant with QDRANT_HOST env var
- Replace bp-core-minio with S3_ENDPOINT/S3_ACCESS_KEY/S3_SECRET_KEY

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:13:57 +01:00
Sharang Parnerkar
2ec340c64b feat: add Coolify deployment configuration
Add docker-compose.coolify.yml (8 services), .env.coolify.example,
and Gitea Action workflow for Coolify API deployment. Removes
core-health-check and docs. Adds Traefik labels for
*.breakpilot.ai domain routing with Let's Encrypt SSL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:13:57 +01:00
Benjamin Admin
499ddc04d5 chore: trigger redeploy via Gitea Actions CI/CD
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 37s
CI/CD / test-python-backend-compliance (push) Successful in 34s
CI/CD / test-python-document-crawler (push) Successful in 25s
CI/CD / test-python-dsms-gateway (push) Successful in 22s
CI/CD / validate-canonical-controls (push) Successful in 13s
CI/CD / deploy-hetzner (push) Successful in 15s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:54:23 +01:00
Benjamin Admin
f738ca8c52 fix: make compliance router imports resilient to individual module failures
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 33s
CI/CD / test-python-backend-compliance (push) Successful in 33s
CI/CD / test-python-document-crawler (push) Successful in 23s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 13s
CI/CD / deploy-hetzner (push) Successful in 17s
Replaced bare imports with safe_import_router pattern — if one sub-router
fails to import (e.g. missing dependency), other routers still load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:46:52 +01:00
Benjamin Admin
c530898963 fix: replace Python 3.10+ union type syntax with typing.Optional for Pydantic v2 compat
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 37s
CI/CD / test-python-backend-compliance (push) Successful in 35s
CI/CD / test-python-document-crawler (push) Successful in 24s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 12s
CI/CD / deploy-hetzner (push) Has been cancelled
from __future__ import annotations breaks Pydantic BaseModel runtime type
evaluation. Replaced str | None → Optional[str], list[str] → List[str] etc.
in control_generator.py, anchor_finder.py, control_generator_routes.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:36:14 +01:00
Benjamin Admin
cdafc4d9f4 feat: auto-run SQL migrations on backend startup
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 35s
CI/CD / test-python-backend-compliance (push) Successful in 33s
CI/CD / test-python-document-crawler (push) Successful in 26s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / deploy-hetzner (push) Successful in 2m35s
Adds migration_runner.py that executes pending migrations from
migrations/ directory when backend-compliance starts. Tracks applied
migrations in _migration_history table.

Handles existing databases: detects if tables from migrations 001-045
already exist and seeds the history table accordingly, so only new
migrations (046+) are applied.

Skippable via SKIP_MIGRATIONS=true env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:14:18 +01:00
Benjamin Admin
de19ef0684 feat(control-generator): 7-stage pipeline for RAG→LLM→Controls generation
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 45s
CI/CD / test-python-document-crawler (push) Has been cancelled
CI/CD / test-python-dsms-gateway (push) Has been cancelled
CI/CD / validate-canonical-controls (push) Has been cancelled
CI/CD / deploy-hetzner (push) Has been cancelled
CI/CD / test-python-backend-compliance (push) Has been cancelled
Implements the Control Generator Pipeline that systematically generates
canonical security controls from 150k+ RAG chunks across all compliance
collections (BSI, NIST, OWASP, ENISA, EU laws, German laws).

Three license rules enforced throughout:
- Rule 1 (free_use): Laws/Public Domain — original text preserved
- Rule 2 (citation_required): CC-BY/CC-BY-SA — text with citation
- Rule 3 (restricted): BSI/ISO — full reformulation, no source traces

New files:
- Migration 046: job tracking, chunk tracking, blocked sources tables
- control_generator.py: 7-stage pipeline (scan→classify→structure/reform→harmonize→anchor→store→mark)
- anchor_finder.py: RAG + DuckDuckGo open-source reference search
- control_generator_routes.py: REST API (generate, review, stats, blocked-sources)
- test_control_generator.py: license mapping, rule enforcement, anchor filtering tests

Modified:
- __init__.py: register control_generator_router
- route.ts: proxy generator/review/stats endpoints
- page.tsx: Generator modal, stats panel, state filter, review queue, license badges

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:03:37 +01:00
Benjamin Admin
c87f07c99a feat: seed 10 canonical controls + CRUD endpoints + frontend editor
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 39s
CI/CD / test-python-backend-compliance (push) Successful in 39s
CI/CD / test-python-document-crawler (push) Successful in 30s
CI/CD / test-python-dsms-gateway (push) Successful in 20s
CI/CD / validate-canonical-controls (push) Successful in 12s
CI/CD / deploy-hetzner (push) Successful in 1m37s
- Migration 045: Seed 10 controls (AUTH, NET, SUP, LOG, WEB, DATA, CRYP, REL)
  with 39 open-source anchors into the database
- Backend: POST/PUT/DELETE endpoints for canonical controls CRUD
- Frontend proxy: PUT and DELETE methods added to canonical route
- Frontend: Control Library with create/edit/delete UI, full form with
  open anchor management, scope, requirements, evidence, test procedures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:28:21 +01:00
Benjamin Admin
453eec9ed8 fix: correct canonical control proxy paths to include /compliance prefix
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 44s
CI/CD / test-python-backend-compliance (push) Successful in 1m4s
CI/CD / test-python-document-crawler (push) Successful in 27s
CI/CD / test-python-dsms-gateway (push) Successful in 24s
CI/CD / validate-canonical-controls (push) Successful in 14s
CI/CD / deploy-hetzner (push) Successful in 1m49s
The backend mounts the compliance router at /api/compliance, so canonical
control endpoints are at /api/compliance/v1/canonical/*, not /api/v1/canonical/*.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:49:06 +01:00
Benjamin Admin
050f353192 feat(canonical-controls): Canonical Control Library — rechtssichere Security Controls
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 40s
CI/CD / test-python-backend-compliance (push) Successful in 41s
CI/CD / test-python-document-crawler (push) Successful in 26s
CI/CD / test-python-dsms-gateway (push) Successful in 23s
CI/CD / validate-canonical-controls (push) Successful in 18s
CI/CD / deploy-hetzner (push) Successful in 2m26s
Eigenstaendig formulierte Security Controls mit unabhaengiger Taxonomie
und Open-Source-Verankerung (OWASP, NIST, ENISA). Keine BSI-Nomenklatur.

- Migration 044: 5 DB-Tabellen (frameworks, controls, sources, licenses, mappings)
- 10 Seed Controls mit 39 Open-Source-Referenzen
- License Gate: Quellen-Berechtigungspruefung (analysis/excerpt/embeddings/product)
- Too-Close-Detektor: 5 Metriken (exact-phrase, token-overlap, ngram, embedding, LCS)
- REST API: 8 Endpoints unter /v1/canonical/
- Go Loader mit Multi-Index (ID, domain, severity, framework)
- Frontend: Control Library Browser + Provenance Wiki
- CI/CD: validate-controls.py Job (schema, no-leak, open-anchors)
- 67 Tests (8 Go + 59 Python), alle PASS
- MkDocs Dokumentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 19:55:06 +01:00
Benjamin Admin
8442115e7c fix(rag): Fix bash compatibility + missing mkdir in phase functions
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 41s
CI/CD / test-python-backend-compliance (push) Successful in 42s
CI/CD / test-python-document-crawler (push) Successful in 29s
CI/CD / test-python-dsms-gateway (push) Successful in 24s
CI/CD / deploy-hetzner (push) Successful in 17s
- Replace ${var,,} (bash 4+) with $(echo | tr) for macOS bash 3.2 compat
- Add mkdir -p to phase_gesetze, phase_eu, phase_templates, phase_datenschutz,
  phase_dach — prevents download failures when running phases individually

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:44:15 +01:00
Benjamin Admin
999cc81c78 feat(rag): Phase J — Security Guidelines & Standards (NIST, OWASP, ENISA)
Some checks failed
CI/CD / go-lint (push) Has been cancelled
CI/CD / python-lint (push) Has been cancelled
CI/CD / nodejs-lint (push) Has been cancelled
CI/CD / test-go-ai-compliance (push) Has been cancelled
CI/CD / test-python-backend-compliance (push) Has been cancelled
CI/CD / test-python-document-crawler (push) Has been cancelled
CI/CD / test-python-dsms-gateway (push) Has been cancelled
CI/CD / deploy-hetzner (push) Has been cancelled
Add phase_security() with 15 documents across 3 sub-phases:
- J1: 7 NIST standards (SP 800-53, 800-218, 800-63, 800-207, 8259A/B, AI RMF)
- J2: 6 OWASP projects (Top 10, API Security, ASVS, MASVS, SAMM, Mobile Top 10)
- J3: 2 ENISA guides (Procurement Hospitals, Cloud Security SMEs)

All documents are commercially licensed (Public Domain / CC BY / CC BY-SA).
Wire up 'security' phase in dispatcher and workflow yaml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:14:44 +01:00
Benjamin Admin
ff66612beb fix(rag): Make download failures non-fatal — prevent set -e from aborting entire ingestion
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 43s
CI/CD / test-python-backend-compliance (push) Successful in 38s
CI/CD / test-python-document-crawler (push) Successful in 30s
CI/CD / test-python-dsms-gateway (push) Successful in 23s
CI/CD / deploy-hetzner (push) Successful in 17s
download_pdf() and extract_gesetz_html() now return 0 on failure and clean up
partial files. This prevents set -euo pipefail from aborting the entire script
when a single download fails (e.g. EUR-Lex timeout, BSI redirect).

Root cause of H2 EU loop only processing 1 document in Run #724: first failed
download_pdf returned 1, triggering set -e script abort.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:56:23 +01:00
Benjamin Admin
42ec3cad6d feat(rag): Phase I DACH-Erweiterung — Gesetze, Templates, Urteile
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 56s
CI/CD / test-python-backend-compliance (push) Successful in 49s
CI/CD / test-python-document-crawler (push) Successful in 32s
CI/CD / test-python-dsms-gateway (push) Successful in 25s
CI/CD / deploy-hetzner (push) Successful in 17s
New ingestion phase 'dach' adds missing documents from DACH catalog:

I1: UStG (Retention), MStV (Impressum)
I2: DSK Muster-VVT, DSK KP5 DSFA, BfDI Beispiel-VVT (DL-DE/BY-2.0)
I3: BSI IT-Grundschutz Kompendium 2024 (CC BY-SA 4.0)
I4: 7 Gerichtsentscheidungen as Praxisanker:
  - DE: LG Bonn 1&1, BGH Planet49, BGH Art.82 (2x)
  - AT: OGH Schutzzweck, OGH Art.15+82 EuGH-Vorlage
  - CH: BVGer DSG-Auskunft, BGer Datensperre

Trigger: workflow_dispatch phase=dach

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:36:59 +01:00
Benjamin Admin
9945a62a50 fix(rag): docker cp into /workspace_scripts, then copy at runtime
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 41s
CI/CD / test-python-backend-compliance (push) Successful in 39s
CI/CD / test-python-document-crawler (push) Successful in 28s
CI/CD / test-python-dsms-gateway (push) Successful in 24s
CI/CD / deploy-hetzner (push) Successful in 18s
docker cp fails when target dir doesn't exist in a created container.
Copy scripts to /workspace_scripts, then cp them at container start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:24:36 +01:00
Benjamin Admin
eef1c2e7d3 fix(rag): Use docker cp to inject checked-out scripts
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 40s
CI/CD / test-python-backend-compliance (push) Successful in 40s
CI/CD / test-python-document-crawler (push) Successful in 29s
CI/CD / test-python-dsms-gateway (push) Successful in 24s
CI/CD / deploy-hetzner (push) Successful in 17s
The runner container can't access host paths directly, so the
deploy dir scripts were always stale. Now uses docker create +
docker cp + docker start to copy the freshly checked-out scripts
into the ingestion container before starting it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 13:37:57 +01:00
Benjamin Admin
a0e2a35e66 fix(rag): Git pull deploy dir before ingestion
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 40s
CI/CD / test-python-backend-compliance (push) Successful in 44s
CI/CD / test-python-document-crawler (push) Successful in 29s
CI/CD / test-python-dsms-gateway (push) Successful in 23s
CI/CD / deploy-hetzner (push) Successful in 18s
The RAG workflow mounts scripts from /opt/breakpilot-compliance/scripts
(deploy dir) but this may not have the latest fixes if CI hasn't
deployed yet. Add explicit git pull before running ingestion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 13:13:33 +01:00
Benjamin Admin
57f390190d fix(rag): Arithmetic error, dedup auth, EGBGB timeout
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 41s
CI/CD / test-python-backend-compliance (push) Successful in 41s
CI/CD / test-python-document-crawler (push) Successful in 27s
CI/CD / test-python-dsms-gateway (push) Successful in 21s
CI/CD / deploy-hetzner (push) Successful in 19s
- collection_count() returns 0 (not ?) on failure — fixes arithmetic error
- Pass QDRANT_API_KEY to ingestion container for dedup checks
- Include api-key header in collection_count() and dedup scroll queries
- Lower large-file threshold to 256KB (EGBGB 310KB was timing out)
- More targeted EGBGB XML extraction (Art. 246a + Anlage only)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:05:07 +01:00
Benjamin Admin
cf60c39658 fix(scope-engine): Normalize UPPERCASE trigger docs to lowercase ScopeDocumentType
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 56s
CI/CD / test-python-backend-compliance (push) Successful in 42s
CI/CD / test-python-document-crawler (push) Successful in 24s
CI/CD / test-python-dsms-gateway (push) Successful in 26s
CI/CD / deploy-hetzner (push) Successful in 2m57s
Critical bug fix: mandatoryDocuments in Hard-Trigger-Rules used UPPERCASE
names (VVT, TOM, DSE) that never matched lowercase ScopeDocumentType keys
(vvt, tom, dsi). This meant no trigger documents were ever recognized as
mandatory in buildDocumentScope().

- Add normalizeDocType() mapping function with alias support
  (DSE→dsi, LOESCHKONZEPT→lf, DSR_PROZESS→betroffenenrechte, etc.)
- Fix buildDocumentScope() to use normalized doc types
- Fix estimateEffort() to use lowercase keys matching ScopeDocumentType
- Add 2 tests for UPPERCASE normalization and alias resolution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:39:31 +01:00
Benjamin Admin
c88653b221 fix(rag): Dedup check, BGB split, GewO timeout, arithmetic fix
- Add Qdrant dedup check in upload_file() — skip if regulation_id already exists
- Split BGB (2.7MB) into 5 targeted parts via XML extraction:
  AGB §§305-310, Fernabsatz §§312-312k, Kaufrecht §§433-480,
  Widerruf §§355-361, Digitale Produkte §§327-327u
- Lower large-file threshold 512KB→384KB (fixes GewO 432KB timeout)
- Fix arithmetic syntax error when collection_count returns "?"
- Replace EGBGB PDF (was empty) with XML extraction
- Add unzip to Alpine container for XML archives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:39:09 +01:00
Benjamin Admin
87d06c8b20 fix(rag): Handle large file uploads + don't abort on individual failures
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 1m5s
CI/CD / test-python-backend-compliance (push) Successful in 43s
CI/CD / test-python-document-crawler (push) Successful in 33s
CI/CD / test-python-dsms-gateway (push) Successful in 27s
CI/CD / deploy-hetzner (push) Successful in 17s
- Extended timeout (15 min) for files > 500KB (BGB is 1.5MB)
- upload_file returns 0 even on failure so set -e doesn't kill script
- Failed uploads are still counted and reported in summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 23:33:28 +01:00
Benjamin Admin
0b47612272 fix(rag): Always run download phase before ingestion phases
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 40s
CI/CD / test-python-backend-compliance (push) Successful in 37s
CI/CD / test-python-document-crawler (push) Successful in 26s
CI/CD / test-python-dsms-gateway (push) Successful in 23s
CI/CD / deploy-hetzner (push) Successful in 20s
The gesetze phase failed because it expects text files created by the
download phase. Now the workflow automatically runs download first for
any phase that depends on it. Also adds git and python3 to the alpine
container for repo cloning and text extraction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 23:13:33 +01:00
Benjamin Admin
c14b31b3bc fix(docker): Ensure public dir exists in Next.js builds + Hetzner compose fixes
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 38s
CI/CD / test-python-backend-compliance (push) Successful in 38s
CI/CD / test-python-document-crawler (push) Successful in 29s
CI/CD / test-python-dsms-gateway (push) Successful in 20s
CI/CD / deploy-hetzner (push) Successful in 1m43s
- admin-compliance/Dockerfile: mkdir -p public before build
- developer-portal/Dockerfile: mkdir -p public before build
  (fixes "failed to calculate checksum /app/public: not found")
- docker-compose.hetzner.yml: Override core-health-check to exit
  immediately (Core doesn't run on Hetzner)
- Network override: external:false (auto-create breakpilot-network)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 19:11:26 +01:00
Benjamin Admin
0b836f7e2d fix(ci): Run docker compose from helper container with deploy dir mounted
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 38s
CI/CD / test-python-backend-compliance (push) Successful in 41s
CI/CD / test-python-document-crawler (push) Successful in 25s
CI/CD / test-python-dsms-gateway (push) Successful in 21s
CI/CD / deploy-hetzner (push) Successful in 1m27s
The runner container has Docker socket but no host filesystem access.
docker compose needs to read YAML files, so run build+deploy inside
a helper container that has both Docker socket and the deploy dir mounted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:31:19 +01:00
Benjamin Admin
18d9eec654 fix(ci): Use --entrypoint sh for alpine/git (default entrypoint is git)
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 35s
CI/CD / test-python-backend-compliance (push) Successful in 38s
CI/CD / test-python-document-crawler (push) Successful in 27s
CI/CD / test-python-dsms-gateway (push) Successful in 27s
CI/CD / deploy-hetzner (push) Failing after 6s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:14:58 +01:00
Benjamin Admin
339505feed fix(ci): Fix Hetzner deploy — host filesystem access + network + dependencies
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 37s
CI/CD / test-python-backend-compliance (push) Successful in 36s
CI/CD / test-python-document-crawler (push) Successful in 23s
CI/CD / test-python-dsms-gateway (push) Successful in 21s
CI/CD / deploy-hetzner (push) Failing after 7s
Problems fixed:
1. Deploy step couldn't access /opt/breakpilot-compliance (host path not
   mounted in runner container). Now uses alpine/git helper container with
   host bind-mount for git ops, then docker compose with host paths.
2. breakpilot-network was external:true but Core doesn't run on Hetzner.
   Override in hetzner.yml creates the network automatically.
3. core-health-check blocks startup waiting for Core. Override in
   hetzner.yml makes it exit immediately.
4. RAG ingestion script now respects RAG_URL/QDRANT_URL env vars.
5. RAG workflow discovers network dynamically from running containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:11:05 +01:00
Benjamin Admin
23b9808bf3 debug(ci): Discovery step to find RAG service on Hetzner
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 36s
CI/CD / test-python-backend-compliance (push) Successful in 40s
CI/CD / test-python-document-crawler (push) Successful in 27s
CI/CD / test-python-dsms-gateway (push) Successful in 26s
CI/CD / deploy-hetzner (push) Failing after 1s
Temporary commit to discover Docker container names and networks
on Hetzner, since breakpilot-network doesn't exist there.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:58:46 +01:00
Benjamin Admin
c3654bc9ea fix(ci): Spawn ingestion container on breakpilot-network
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 36s
CI/CD / test-python-backend-compliance (push) Successful in 36s
CI/CD / test-python-document-crawler (push) Successful in 49s
CI/CD / test-python-dsms-gateway (push) Successful in 23s
CI/CD / deploy-hetzner (push) Failing after 1s
Instead of trying to connect the runner to breakpilot-network,
spawn a new alpine container directly on it via docker run.
Added debug output for network/container visibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:53:06 +01:00
Benjamin Admin
363bf9606a fix(ci): Connect runner to breakpilot-network for RAG ingestion
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 38s
CI/CD / test-python-backend-compliance (push) Successful in 36s
CI/CD / test-python-document-crawler (push) Successful in 28s
CI/CD / test-python-dsms-gateway (push) Successful in 22s
CI/CD / deploy-hetzner (push) Failing after 1s
- Join breakpilot-network so bp-core-rag-service is reachable
- Make RAG_URL/QDRANT_URL in script respect env vars (${VAR:-default})
- Remove complex fallback logic — fail fast if network not available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:48:13 +01:00
Benjamin Admin
e88c0aeeb3 fix(ci): RAG ingestion uses git-cloned workspace instead of deploy dir
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 39s
CI/CD / test-python-backend-compliance (push) Successful in 44s
CI/CD / test-python-document-crawler (push) Successful in 31s
CI/CD / test-python-dsms-gateway (push) Successful in 26s
CI/CD / deploy-hetzner (push) Failing after 2s
The runner container doesn't always have /opt/breakpilot-compliance mounted.
Use the git-cloned workspace (current dir) and add multi-fallback for RAG API
URL (container network → localhost → host.docker.internal).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:43:37 +01:00
Benjamin Admin
ebe7e90bd8 feat(rag): Expand Phase H to Layer 1 Safe Core (~60 documents)
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 40s
CI/CD / test-python-backend-compliance (push) Successful in 39s
CI/CD / test-python-document-crawler (push) Successful in 29s
CI/CD / test-python-dsms-gateway (push) Successful in 25s
CI/CD / deploy-hetzner (push) Failing after 1s
Phase H now includes:
- 16 German laws (PAngV, VSBG, ProdHaftG, BDSG, HGB, AO, DDG, TKG, etc.)
- 15 EUR-Lex EU laws (DSGVO, Consumer Rights Dir, Sale of Goods Dir,
  E-Commerce Dir, Unfair Terms Dir, DMA, NIS2, Product Liability Dir, etc.)
- 2 NIST frameworks (CSF 2.0, Privacy Framework 1.0)
- 1 HLEG Ethics Guidelines

Updated rag-sources.md with complete inventory of already-ingested vs
new documents, plus Layer 2-5 TODO roadmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:54:07 +01:00
Benjamin Admin
995de9e0f4 fix(ci): RAG ingestion uses docker:27-cli with host network access
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 47s
CI/CD / test-python-backend-compliance (push) Successful in 47s
CI/CD / test-python-document-crawler (push) Successful in 30s
CI/CD / test-python-dsms-gateway (push) Successful in 25s
CI/CD / deploy-hetzner (push) Failing after 2s
Runner needs access to /opt/breakpilot-compliance and Docker network
for RAG service (bp-core-rag-service:8097). Falls back to
host.docker.internal if container network unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:17:16 +01:00
Benjamin Admin
4e08364bc6 feat(ci): Add manual RAG ingestion workflow for Gitea Actions
Some checks failed
CI/CD / go-lint (push) Has been cancelled
CI/CD / python-lint (push) Has been cancelled
CI/CD / nodejs-lint (push) Has been cancelled
CI/CD / test-go-ai-compliance (push) Has been cancelled
CI/CD / test-python-backend-compliance (push) Has been cancelled
CI/CD / test-python-document-crawler (push) Has been cancelled
CI/CD / test-python-dsms-gateway (push) Has been cancelled
CI/CD / deploy-hetzner (push) Has been cancelled
Adds workflow_dispatch-triggered job to run ingest-legal-corpus.sh
on Hetzner. Supports phase selection (verbraucherschutz, gesetze, eu, etc.).

Usage: Gitea UI → Actions → "RAG Ingestion" → Run (select phase)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:14:44 +01:00
Benjamin Admin
7f38df9d9c feat(scope): Split HT-H01 B2B/B2C + register Verbraucherschutz document types + RAG ingestion
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 38s
CI/CD / test-python-backend-compliance (push) Successful in 39s
CI/CD / test-python-document-crawler (push) Successful in 27s
CI/CD / test-python-dsms-gateway (push) Successful in 24s
CI/CD / deploy-hetzner (push) Has been cancelled
- Split HT-H01 into HT-H01a (B2C/Hybrid mit Verbraucherschutzpflichten) und
  HT-H01b (reiner B2B mit Basis-Pflichten). B2B-Webshops bekommen keine
  Widerrufsbelehrung/Preisangaben/Fernabsatz mehr.
- Add excludeWhen/requireWhen to HardTriggerRule for conditional trigger logic
- Register 6 neue ScopeDocumentType: widerrufsbelehrung, preisangaben,
  fernabsatz_info, streitbeilegung, produktsicherheit, ai_act_doku
- Full DOCUMENT_SCOPE_MATRIX L1-L4 for all new types
- Align HardTriggerRule interface with actual engine field names
- Add Phase H (Verbraucherschutz) to RAG ingestion script:
  10 deutsche Gesetze + 4 EU-Verordnungen + HLEG Ethics Guidelines
- Add scripts/rag-sources.md with license documentation
- 9 new tests for B2B/B2C trigger split, all 326 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:03:49 +01:00