- loc-budget CI job: remove if/else PR-only guard; now runs scripts/check-loc.sh
(no || true) on every push and PR, scanning the full repo
- sbom-scan: remove || true from grype command — high+ CVEs now block PRs
- scripts/check-loc.sh: add test_*.py / */test_*.py and *.html exclusions so
Python test files and Jinja/HTML templates are not counted against the budget
- .claude/rules/loc-exceptions.txt: grandfather 40 remaining oversized files
into the exceptions list (one-off scripts, docs copies, platform SDKs,
and Phase 1 backend-compliance refactor backlog)
- ai-compliance-sdk/.golangci.yml: add strict golangci-lint config (errcheck,
govet, staticcheck, gosec, gocyclo, gocritic, revive, goimports)
- delete stray routes.py.backup (2512 LOC)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Squash of branch refactor/phase0-guardrails-and-models-split — 4 commits,
81 files, 173/173 pytest green, OpenAPI contract preserved (360 paths /
484 operations).
## Phase 0 — Architecture guardrails
Three defense-in-depth layers to keep the architecture rules enforced
regardless of who opens Claude Code in this repo:
1. .claude/settings.json PreToolUse hook on Write/Edit blocks any file
that would exceed the 500-line hard cap. Auto-loads in every Claude
session in this repo.
2. scripts/githooks/pre-commit (install via scripts/install-hooks.sh)
enforces the LOC cap locally, freezes migrations/ without
[migration-approved], and protects guardrail files without
[guardrail-change].
3. .gitea/workflows/ci.yaml gains loc-budget + guardrail-integrity +
sbom-scan (syft+grype) jobs, adds mypy --strict for the new Python
packages (compliance/{services,repositories,domain,schemas}), and
tsc --noEmit for admin-compliance + developer-portal.
Per-language conventions documented in AGENTS.python.md, AGENTS.go.md,
AGENTS.typescript.md at the repo root — layering, tooling, and explicit
"what you may NOT do" lists. Root CLAUDE.md is prepended with the six
non-negotiable rules. Each of the 10 services gets a README.md.
scripts/check-loc.sh enforces soft 300 / hard 500 and surfaces the
current baseline of 205 hard + 161 soft violations so Phases 1-4 can
drain it incrementally. CI gates only CHANGED files in PRs so the
legacy baseline does not block unrelated work.
## Deprecation sweep
47 files. Pydantic V1 regex= -> pattern= (2 sites), class Config ->
ConfigDict in source_policy_router.py (schemas.py intentionally skipped;
it is the Phase 1 Step 3 split target). datetime.utcnow() ->
datetime.now(timezone.utc) everywhere including SQLAlchemy default=
callables. All DB columns already declare timezone=True, so this is a
latent-bug fix at the Python side, not a schema change.
DeprecationWarning count dropped from 158 to 35.
## Phase 1 Step 1 — Contract test harness
tests/contracts/test_openapi_baseline.py diffs the live FastAPI /openapi.json
against tests/contracts/openapi.baseline.json on every test run. Fails on
removed paths, removed status codes, or new required request body fields.
Regenerate only via tests/contracts/regenerate_baseline.py after a
consumer-updated contract change. This is the safety harness for all
subsequent refactor commits.
## Phase 1 Step 2 — models.py split (1466 -> 85 LOC shim)
compliance/db/models.py is decomposed into seven sibling aggregate modules
following the existing repo pattern (dsr_models.py, vvt_models.py, ...):
regulation_models.py (134) — Regulation, Requirement
control_models.py (279) — Control, Mapping, Evidence, Risk
ai_system_models.py (141) — AISystem, AuditExport
service_module_models.py (176) — ServiceModule, ModuleRegulation, ModuleRisk
audit_session_models.py (177) — AuditSession, AuditSignOff
isms_governance_models.py (323) — ISMSScope, Context, Policy, Objective, SoA
isms_audit_models.py (468) — Finding, CAPA, MgmtReview, InternalAudit,
AuditTrail, Readiness
models.py becomes an 85-line re-export shim in dependency order so
existing imports continue to work unchanged. Schema is byte-identical:
__tablename__, column definitions, relationship strings, back_populates,
cascade directives all preserved.
All new sibling files are under the 500-line hard cap; largest is
isms_audit_models.py at 468. No file in compliance/db/ now exceeds
the hard cap.
## Phase 1 Step 3 — infrastructure only
backend-compliance/compliance/{schemas,domain,repositories}/ packages
are created as landing zones with docstrings. compliance/domain/
exports DomainError / NotFoundError / ConflictError / ValidationError /
PermissionError — the base classes services will use to raise
domain-level errors instead of HTTPException.
PHASE1_RUNBOOK.md at backend-compliance/PHASE1_RUNBOOK.md documents
the nine-step execution plan for Phase 1: snapshot baseline,
characterization tests, split models.py (this commit), split schemas.py
(next), extract services, extract repositories, mypy --strict, coverage.
## Verification
backend-compliance/.venv-phase1: uv python install 3.12 + pip -r requirements.txt
PYTHONPATH=. pytest compliance/tests/ tests/contracts/
-> 173 passed, 0 failed, 35 warnings, OpenAPI 360/484 unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
macOS ships mit bash 3, declare -A wird nicht unterstuetzt.
Ersetzt durch case-Funktion dir_to_service().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers NIST SP 800-160/30/82, SPDX 3.0, CVSS v4.0, SLSA v1.0,
CycloneDX 1.6, OpenTelemetry, EU Machinery Guide 2006/42/EC,
FDA Human Factors, and 5 GPAI documents (Scope Guidelines,
Communication, CoP Safety/Transparency/Copyright).
All documents include license metadata in regulation payloads.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts
Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go)
- Fallback to dense-only search if Query API unavailable
- Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default)
- CPU-only PyTorch dependency to keep Docker image small
Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Preamble controls that duplicate article controls (same regulation,
Jaccard title similarity >= 0.40) are marked as duplicate.
Article controls always take priority.
Result: 6,183 active controls (was 6,373), 648 unique preamble controls remain.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
QA pipeline that matches control source_original_text directly against
original PDF documents to verify article/paragraph assignments. Covers
backfill, dedup, source normalization, Qdrant cleanup, and prod sync.
Key results (2026-03-20):
- 4,110/7,943 controls matched to PDF (100% for major EU regs)
- 3,366 article corrections, 705 new assignments
- 1,290 controls from Erwägungsgründe (preamble) identified
- 779 controls from Anhänge (annexes) identified
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _detect_recital() to QA pipeline — flags controls where
source_original_text contains Erwägungsgrund markers instead of
article text (28% of controls with source text affected).
- Recital detection via regex + phrase matching in QA validation
- 10 new tests (TestRecitalDetection), 81 total
- ReviewCompare component for side-by-side duplicate comparison
- Review mode split: Duplikat-Verdacht vs Rule-3-ohne-Anchor tabs
- MkDocs: recital detection documentation
- Detection script for bulk analysis (scripts/find_recital_controls.py)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 5 — Frontend Integration:
- components/page.tsx: ComponentLibraryModal with 120 components + 20 energy sources
- hazards/page.tsx: AutoSuggestPanel with 3-column pattern matching review
- mitigations/page.tsx: SuggestMeasuresModal per hazard with 3-level grouping
- verification/page.tsx: SuggestEvidenceModal per mitigation with evidence types
Phase 6 — RAG Library Search:
- Added bp_iace_libraries to AllowedCollections whitelist in rag_handlers.go
- SearchLibrary endpoint: POST /iace/library-search (semantic search across libraries)
- EnrichTechFileSection endpoint: POST /projects/:id/tech-file/:section/enrich
- Created ingest-iace-libraries.sh ingestion script for Qdrant collection
Tests (123 passing):
- tag_taxonomy_test.go: 8 tests for taxonomy entries, domains, essential tags
- controls_library_test.go: 7 tests for measures, reduction types, subtypes
- integration_test.go: 7 integration tests for full match flow and library consistency
- Extended tag_resolver_test.go: 9 new tests for FindByTags and cross-category resolution
Documentation:
- Updated iace.md with Hazard-Matching-Engine, RAG enrichment, and new DB tables
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add phase_security() with 15 documents across 3 sub-phases:
- J1: 7 NIST standards (SP 800-53, 800-218, 800-63, 800-207, 8259A/B, AI RMF)
- J2: 6 OWASP projects (Top 10, API Security, ASVS, MASVS, SAMM, Mobile Top 10)
- J3: 2 ENISA guides (Procurement Hospitals, Cloud Security SMEs)
All documents are commercially licensed (Public Domain / CC BY / CC BY-SA).
Wire up 'security' phase in dispatcher and workflow yaml.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
download_pdf() and extract_gesetz_html() now return 0 on failure and clean up
partial files. This prevents set -euo pipefail from aborting the entire script
when a single download fails (e.g. EUR-Lex timeout, BSI redirect).
Root cause of H2 EU loop only processing 1 document in Run #724: first failed
download_pdf returned 1, triggering set -e script abort.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- collection_count() returns 0 (not ?) on failure — fixes arithmetic error
- Pass QDRANT_API_KEY to ingestion container for dedup checks
- Include api-key header in collection_count() and dedup scroll queries
- Lower large-file threshold to 256KB (EGBGB 310KB was timing out)
- More targeted EGBGB XML extraction (Art. 246a + Anlage only)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extended timeout (15 min) for files > 500KB (BGB is 1.5MB)
- upload_file returns 0 even on failure so set -e doesn't kill script
- Failed uploads are still counted and reported in summary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Join breakpilot-network so bp-core-rag-service is reachable
- Make RAG_URL/QDRANT_URL in script respect env vars (${VAR:-default})
- Remove complex fallback logic — fail fast if network not available
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase H now includes:
- 16 German laws (PAngV, VSBG, ProdHaftG, BDSG, HGB, AO, DDG, TKG, etc.)
- 15 EUR-Lex EU laws (DSGVO, Consumer Rights Dir, Sale of Goods Dir,
E-Commerce Dir, Unfair Terms Dir, DMA, NIS2, Product Liability Dir, etc.)
- 2 NIST frameworks (CSF 2.0, Privacy Framework 1.0)
- 1 HLEG Ethics Guidelines
Updated rag-sources.md with complete inventory of already-ingested vs
new documents, plus Layer 2-5 TODO roadmap.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Split HT-H01 into HT-H01a (B2C/Hybrid mit Verbraucherschutzpflichten) und
HT-H01b (reiner B2B mit Basis-Pflichten). B2B-Webshops bekommen keine
Widerrufsbelehrung/Preisangaben/Fernabsatz mehr.
- Add excludeWhen/requireWhen to HardTriggerRule for conditional trigger logic
- Register 6 neue ScopeDocumentType: widerrufsbelehrung, preisangaben,
fernabsatz_info, streitbeilegung, produktsicherheit, ai_act_doku
- Full DOCUMENT_SCOPE_MATRIX L1-L4 for all new types
- Align HardTriggerRule interface with actual engine field names
- Add Phase H (Verbraucherschutz) to RAG ingestion script:
10 deutsche Gesetze + 4 EU-Verordnungen + HLEG Ethics Guidelines
- Add scripts/rag-sources.md with license documentation
- 9 new tests for B2B/B2C trigger split, all 326 tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- LegalRAGClient: QDRANT_HOST+PORT → QDRANT_URL + QDRANT_API_KEY
- docker-compose: env vars updated for hosted Qdrant
- AllowedCollections: added bp_compliance_gdpr, bp_dsfa_templates, bp_dsfa_risks
- Migration scripts (bash + python) for data transfer
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Part 1 — RAG Corpus Versioning:
- New DB table compliance_corpus_versions (migration 017)
- Go CorpusVersionStore with CRUD operations
- Assessment struct extended with corpus_version_id
- API endpoints: GET /rag/corpus-status, /rag/corpus-versions/:collection
- RAG routes (search, regulations) now registered in main.go
- Ingestion script registers corpus versions after each run
- Frontend staleness badge in SDK sidebar
Part 3 — Source Policy Backend:
- New FastAPI router with CRUD for allowed sources, PII rules,
operations matrix, audit trail, stats, and compliance report
- SQLAlchemy models for all source policy tables (migration 001)
- Frontend API base corrected from edu-search:8088/8089 to
backend-compliance:8002/api
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EDPB migrated from /sites/default/files/ to /system/files/YYYY-MM/.
Updated all URLs to current working paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds download URLs for 11 EDPB guidelines, EDPS DPIA list,
3 ENISA reports, and TMG/UrhG to the missing files check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Script downloads original regulation PDFs from EUR-Lex into
~/rag-originals/ for use with the RAG QA Split-View Chunk-Browser.
Lists missing national law PDFs that require manual download.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>