breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	c52dbdb8f1	feat(rag): optimize RAG pipeline: JSON-Mode, CoT, Hybrid Search, Re-Ranking, Cross-Reg Dedup, chunk 1024 Phase 1 (LLM Quality): - Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill) - Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts Phase 2 (Retrieval Quality): - Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go) - Fallback to dense-only search if Query API unavailable - Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default) - CPU-only PyTorch dependency to keep Docker image small Phase 3 (Data Layer): - Cross-regulation dedup pass (threshold 0.95) links controls across regulations - DedupResult.link_type field distinguishes dedup_merge vs cross_regulation - Chunk size defaults updated 512/50 → 1024/128 for new ingestions only - Existing collections and controls are NOT affected Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-21 11:49:43 +01:00
Benjamin Admin	5dd7a27336	fix(pipeline): add missing regulation codes to LICENSE_MAP eu_2023_1542 (Batterieverordnung, the EU Battery Regulation), eu_2023_988 (GPSR), nist_sp800_218, nist_privacy_1_0, owasp_mobile_top10 were defaulting to Rule 3 (restricted) instead of their correct rules. This caused 68/71 controls to be flagged as too_close in the last pipeline run. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 22:14:10 +01:00
Benjamin Admin	f2819b99af	feat(pipeline): v3, scoped control applicability + source_type classification Phase 4: source_type (law/guideline/standard/restricted) on source_citation - NIST/OWASP/ENISA correctly shown as "Standard" instead of "Gesetzliche Grundlage" (legal basis) - Dynamic frontend labels based on source_type - Backfill endpoint POST /v1/canonical/generate/backfill-source-type Phase v3: Scoped Control Applicability - 3 new fields: applicable_industries, applicable_company_size, scope_conditions - LLM prompt extended with 39 industries, 5 company sizes, 10 scope signals - All 5 generation paths (Rule 1/2/3, batch structure, batch reform) updated - _build_control_from_json: parsing + validation (string→list, size validation) - _store_control: writes 3 new JSONB columns - API: response models, create/update requests, SELECT queries extended - Migration 063: 3 new JSONB columns with GIN indexes - 110 generator tests + 28 route tests = 138 total, all passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-18 16:28:05 +01:00
Benjamin Admin	148c7ba3af	feat(qa): recital detection, review split, duplicate comparison Add _detect_recital() to QA pipeline: flags controls where source_original_text contains Erwägungsgrund (recital) markers instead of article text (28% of controls with source text affected). - Recital detection via regex + phrase matching in QA validation - 10 new tests (TestRecitalDetection), 81 total - ReviewCompare component for side-by-side duplicate comparison - Review mode split: Duplikat-Verdacht (suspected duplicate) vs Rule-3-ohne-Anchor (Rule 3 without anchor) tabs - MkDocs: recital detection documentation - Detection script for bulk analysis (scripts/find_recital_controls.py) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-18 08:20:02 +01:00
Benjamin Admin	a9e0869205	feat(pipeline): pipeline_version v2, migration 062, docs + 71 tests - Add PIPELINE_VERSION=2 constant and pipeline_version column to canonical_controls and canonical_processed_chunks (migration 062) - Anthropic API decides chunk relevance via null-returns (skip_prefilter) - Annex/appendix chunks explicitly protected in prompts - Fix 6 failing tests (CRYP domain, _process_batch tuple return) - Add TestPipelineVersion + TestRegulationFilter test classes (10 new tests) - Add MkDocs page: control-generator-pipeline.md (541 lines) - Update canonical-control-library.md with v2 pipeline diagram - Update testing.md with 71-test breakdown table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 17:31:11 +01:00
Benjamin Admin	653aad57e3	Let Anthropic API decide chunk relevance instead of local prefilter Updated both structure_batch and reformulate_batch prompts to return null for chunks without actionable requirements (definitions, TOCs, scope-only). Explicit instruction to always process annexes/appendices as they often contain concrete technical requirements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 16:44:01 +01:00
Benjamin Admin	a7f7e57dd7	Add skip_prefilter option to control generator Local LLM prefilter (llama3.2 3B) was incorrectly skipping annex chunks that contain concrete requirements. Added skip_prefilter flag to bypass the local pre-filter and send all chunks directly to Anthropic API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 16:30:57 +01:00
Benjamin Admin	567e82ddf5	Fix stale DB session after long embedding pre-load The embedding pre-load for 4998 existing controls takes ~16 minutes, causing the SQLAlchemy session to become invalid. Added rollback after pre-load completes to reset the session before subsequent DB operations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 14:34:44 +01:00
Benjamin Admin	36ef34169a	Fix regulation_filter bypass for chunks without regulation_code Chunks without a regulation_code were silently passing through the filter in _scan_rag(), causing unrelated documents (e.g. Data Act, legal templates) to be included in filtered generation jobs. Now chunks without reg_code are skipped when regulation_filter is active. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 13:38:25 +01:00
Benjamin Admin	d22c47c9eb	feat(pipeline): Anthropic Batch API, source/regulation filter, cost optimization - Add Anthropic API support to decomposition Pass 0a/0b (prompt caching, content batching) - Add Anthropic Batch API (50% cost reduction, async 24h processing) - Add source_filter (ILIKE on source_citation) for regulation-based filtering - Add category_filter to Pass 0a for selective decomposition - Add regulation_filter to control_generator for RAG scan phase filtering (prefix match on regulation_code; enables CE + Code Review focus) - New API endpoints: batch-submit-0a, batch-submit-0b, batch-status, batch-process - 83 new tests (all passing) Cost reduction: $2,525 → ~$600-700 with all optimizations combined. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 13:22:01 +01:00
Benjamin Admin	4f6bc8f6f6	feat(training+controls): interactive video pipeline, training blocks, control generator, CE libraries Interactive Training Videos (CP-TRAIN): - DB migration 022: training_checkpoints + checkpoint_progress tables - NarratorScript generation via Anthropic (AI Teacher persona, German) - TTS batch synthesis + interactive video pipeline (slides + checkpoint slides + FFmpeg) - 4 new API endpoints: generate-interactive, interactive-manifest, checkpoint submit, checkpoint progress - InteractiveVideoPlayer component (HTML5 Video, quiz overlay, seek protection, progress tracking) - Learner portal integration with automatic completion on all checkpoints passed - 30 new tests (handler validation + grading logic + manifest/progress + seek protection) Training Blocks: - Block generator, block store, block config CRUD + preview/generate endpoints - Migration 021: training_blocks schema Control Generator + Canonical Library: - Control generator routes + service enhancements - Canonical control library helpers, sidebar entry - Citation backfill service + tests - CE libraries data (hazard, protection, evidence, lifecycle, components) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 21:41:48 +01:00
Benjamin Admin	3b2006ebce	feat(iace): add hazard-matching-engine with component library, tag system, and pattern engine Implements Phases 1-4 of the IACE Hazard-Matching-Engine: - 120 machine components (C001-C120) in 11 categories - 20 energy sources (EN01-EN20) - ~85 tag taxonomy across 5 domains - 44 hazard patterns with AND/NOT matching logic - Pattern engine with tag resolution and confidence scoring - 8 new API endpoints (component-library, energy-sources, tags, patterns, match/apply) - Completeness gate G09 for pattern matching - 320 tests passing (36 new) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 08:50:11 +01:00
Benjamin Admin	c8fd9cc780	feat(control-library): document-grouped batching, generation strategy tracking, sort by source - Group chunks by regulation_code before batching for better LLM context - Add generation_strategy column (ungrouped=v1, document_grouped=v2) - Add v1/v2 badge to control cards in frontend - Add sort-by-source option with visual group headers - Add frontend page tests (18 tests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 15:10:52 +01:00
Benjamin Admin	0171d611f6	feat: add policy library with 29 German policy templates Add 29 new document types (IT security, data, personnel, vendor, BCM policies) to VALID_DOCUMENT_TYPES and 5 category pills to the document generator UI. Include seed script for production DB population. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 22:37:33 +01:00
Benjamin Admin	49ce417428	feat: add compliance modules 2-5 (dashboard, security templates, process manager, evidence collector) Module 2: Extended Compliance Dashboard with roadmap, module-status, next-actions, snapshots, score-history Module 3: 7 German security document templates (IT-Sicherheitskonzept, Datenschutz, Backup, Logging, Incident-Response, Zugriff, Risikomanagement) Module 4: Compliance Process Manager with CRUD, complete/skip/seed, ~50 seed tasks, 3-tab UI Module 5: Evidence Collector Extended with automated checks, control-mapping, coverage report, 4-tab UI Also includes: canonical control library enhancements (verification method, categories, dedup), control generator improvements, RAG client extensions 52 tests pass, frontend builds clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 21:03:04 +01:00
Benjamin Admin	13d13c8226	fix: add all RAG regulation codes to license mapping Many regulation codes (nist_sp800_53r5, eucsa, owasp_top10_2021, EDPB guidelines, EU laws, AT/FR/ES/NL/IT/HU laws) were defaulting to Rule 3 (restricted) because they weren't in REGULATION_LICENSE_MAP. Now all ~100 regulation codes from RAG are properly mapped to Rule 1 or 2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 08:38:31 +01:00
Benjamin Admin	b6e6ffaaee	feat: add verification method, categories, and dedup UI to control library - Migration 047: verification_method + category columns, 17 category lookup table - Backend: new filters, GET /categories, GET /controls/{id}/similar (embedding-based) - Frontend: filter dropdowns, badges, dedup UI in ControlDetail with merge workflow - ControlForm: verification method + category selects - Provenance: verification methods, categories, master library strategy sections - Fix UUID cast syntax in generator routes (::uuid -> CAST) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 07:55:22 +01:00
Benjamin Admin	9812ff46f3	feat: add 7-stage control generator pipeline with 3 license rules - control_generator.py: RAG→License→Structure/Reform→Harmonize→Anchor→Store→Mark pipeline with Anthropic Claude API (primary) + Ollama fallback for LLM reformulation - anchor_finder.py: RAG-based + DuckDuckGo anchor search for open references - control_generator_routes.py: REST API for generate, job status, review queue, processed stats - 046_control_generator.sql: job tracking, chunk tracking, blocked sources tables; extends canonical_controls with license_rule, source_original_text, source_citation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 18:42:40 +01:00
Benjamin Admin	c530898963	fix: replace Python 3.10+ union type syntax with typing.Optional for Pydantic v2 compat from __future__ import annotations breaks Pydantic BaseModel runtime type evaluation. Replaced str | None → Optional[str], list[str] → List[str] etc. in control_generator.py, anchor_finder.py, control_generator_routes.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 09:36:14 +01:00
Benjamin Admin	de19ef0684	feat(control-generator): 7-stage pipeline for RAG→LLM→Controls generation Implements the Control Generator Pipeline that systematically generates canonical security controls from 150k+ RAG chunks across all compliance collections (BSI, NIST, OWASP, ENISA, EU laws, German laws). Three license rules enforced throughout: - Rule 1 (free_use): Laws/Public Domain, original text preserved - Rule 2 (citation_required): CC-BY/CC-BY-SA, text with citation - Rule 3 (restricted): BSI/ISO, full reformulation, no source traces New files: - Migration 046: job tracking, chunk tracking, blocked sources tables - control_generator.py: 7-stage pipeline (scan→classify→structure/reform→harmonize→anchor→store→mark) - anchor_finder.py: RAG + DuckDuckGo open-source reference search - control_generator_routes.py: REST API (generate, review, stats, blocked-sources) - test_control_generator.py: license mapping, rule enforcement, anchor filtering tests Modified: - __init__.py: register control_generator_router - route.ts: proxy generator/review/stats endpoints - page.tsx: Generator modal, stats panel, state filter, review queue, license badges Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 09:03:37 +01:00
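
Commit c52dbdb8f1 mentions hybrid search with RRF fusion. The log does not show how the fusion is computed (it is delegated to the Qdrant Query API), so the following is only a minimal plain-Python sketch of reciprocal rank fusion itself. The function name `rrf_fuse` and the constant `k=60` are assumptions for illustration, not taken from the repository.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    rankings: list of lists, each ordered best-first (e.g. dense and text hits).
    k: smoothing constant; 60 is a commonly used value and an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the order stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # hypothetical dense-vector ranking
text_hits = ["b", "c", "d"]    # hypothetical text-index ranking
fused = rrf_fuse([dense_hits, text_hits])
```

Documents that appear in both lists accumulate two contributions, which is why they tend to outrank documents found by only one retriever.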

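Commits d22c47c9eb and 36ef34169a describe regulation_filter as a prefix match on regulation_code, with chunks lacking a regulation code skipped whenever the filter is active. The behaviour can be sketched roughly as below. The dict-shaped chunk, the list-of-prefixes filter, and the helper name `passes_regulation_filter` are assumptions; the actual `_scan_rag()` code is not shown in the log.

```python
from typing import Optional, Sequence


def passes_regulation_filter(chunk: dict,
                             regulation_filter: Optional[Sequence[str]]) -> bool:
    """Return True if the chunk should be kept for a generation job."""
    if not regulation_filter:
        # No filter configured: every chunk is eligible.
        return True
    reg_code = chunk.get("regulation_code")
    if not reg_code:
        # Filter is active but the chunk has no code: skip it instead of
        # letting it pass through silently (the bug described in 36ef34169a).
        return False
    # Prefix match, so "eu_2023" would cover "eu_2023_1542" and "eu_2023_988".
    return any(reg_code.startswith(prefix) for prefix in regulation_filter)


chunks = [
    {"id": 1, "regulation_code": "eu_2023_1542"},
    {"id": 2, "regulation_code": "nist_sp800_218"},
    {"id": 3},  # hypothetical chunk without a regulation_code
]
kept = [c["id"] for c in chunks if passes_regulation_filter(c, ["eu_2023"])]
```
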
20 Commits
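
Two fixes in this log (5dd7a27336 and 13d13c8226) trace back to the same behaviour: a regulation code missing from the license map falls back to Rule 3 (restricted). A minimal sketch of that lookup follows. The two map entries and their rule numbers are illustrative assumptions only; the real contents of REGULATION_LICENSE_MAP are not shown in the log, and `license_rule` and `unmapped_codes` are hypothetical helper names.

```python
# Rule numbers as described in commit de19ef0684.
FREE_USE = 1           # laws / public domain
CITATION_REQUIRED = 2  # CC-BY / CC-BY-SA
RESTRICTED = 3         # full reformulation required

# Hypothetical excerpt; the entries and their rules are assumptions.
REGULATION_LICENSE_MAP = {
    "eu_2023_1542": FREE_USE,
    "owasp_mobile_top10": CITATION_REQUIRED,
}


def license_rule(regulation_code: str) -> int:
    """Look up the license rule, falling back to the most restrictive one."""
    return REGULATION_LICENSE_MAP.get(regulation_code, RESTRICTED)


def unmapped_codes(codes):
    """List codes that would silently fall back to Rule 3."""
    return sorted({c for c in codes if c not in REGULATION_LICENSE_MAP})


seen_in_rag = ["eu_2023_1542", "eu_2023_988", "owasp_mobile_top10"]
missing = unmapped_codes(seen_in_rag)
```

Defaulting to the most restrictive rule is the safe choice for licensing, but it makes gaps in the map easy to miss; a check like `unmapped_codes` run before a pipeline job is one way to surface them early.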