Commit Graph

11 Commits

Author SHA1 Message Date
Benjamin Admin
c52dbdb8f1 feat(rag): optimize RAG pipeline — JSON-Mode, CoT, Hybrid Search, Re-Ranking, Cross-Reg Dedup, chunk 1024
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 42s
CI/CD / test-python-backend-compliance (push) Successful in 1m38s
CI/CD / test-python-document-crawler (push) Successful in 20s
CI/CD / test-python-dsms-gateway (push) Successful in 17s
CI/CD / validate-canonical-controls (push) Successful in 10s
CI/CD / Deploy (push) Has been skipped
Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts

Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go)
- Fallback to dense-only search if Query API unavailable
- Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default)
- CPU-only PyTorch dependency to keep Docker image small

Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 11:49:43 +01:00
Benjamin Admin
825e070ed9 feat(multi-layer): complete Multi-Layer Control Architecture (Phases 1-8 + Pass 0)
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 47s
CI/CD / test-python-backend-compliance (push) Successful in 33s
CI/CD / test-python-document-crawler (push) Successful in 24s
CI/CD / test-python-dsms-gateway (push) Successful in 18s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Has been skipped
Implements the full Multi-Layer Control Architecture for migrating ~25,000
Rich Controls into atomic, deduplicated Master Controls with full traceability.

Architecture: Legal Source → Obligation → Control Pattern → Master Control → Customer Instance

New services:
- ObligationExtractor: 3-tier extraction (exact → embedding → LLM)
- PatternMatcher: 2-tier matching (keyword + embedding + domain-bonus)
- ControlComposer: Pattern + Obligation → Master Control
- PipelineAdapter: Pipeline integration + Migration Passes 1-5
- DecompositionPass: Pass 0a/0b — Rich Control → atomic Controls
- CrosswalkRoutes: 15 API endpoints under /v1/canonical/

New DB schema:
- Migration 060: obligation_extractions, control_patterns, crosswalk_matrix
- Migration 061: obligation_candidates, parent_control_uuid tracking

Pattern Library: 50 YAML patterns (30 core + 20 IT-security)
Go SDK: Pattern loader with YAML validation and indexing
Documentation: MkDocs updated with full architecture overview

500 Python tests passing across all components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 09:00:37 +01:00
Benjamin Admin
49ce417428 feat: add compliance modules 2-5 (dashboard, security templates, process manager, evidence collector)
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 32s
CI/CD / test-python-backend-compliance (push) Successful in 34s
CI/CD / test-python-document-crawler (push) Successful in 23s
CI/CD / test-python-dsms-gateway (push) Successful in 21s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Successful in 2s
Module 2: Extended Compliance Dashboard with roadmap, module-status, next-actions, snapshots, score-history
Module 3: 7 German security document templates (IT-Sicherheitskonzept, Datenschutz, Backup, Logging, Incident-Response, Zugriff, Risikomanagement)
Module 4: Compliance Process Manager with CRUD, complete/skip/seed, ~50 seed tasks, 3-tab UI
Module 5: Evidence Collector Extended with automated checks, control-mapping, coverage report, 4-tab UI

Also includes: canonical control library enhancements (verification method, categories, dedup), control generator improvements, RAG client extensions

52 tests pass, frontend builds clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 21:03:04 +01:00
Benjamin Admin
050f353192 feat(canonical-controls): Canonical Control Library — rechtssichere Security Controls
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 40s
CI/CD / test-python-backend-compliance (push) Successful in 41s
CI/CD / test-python-document-crawler (push) Successful in 26s
CI/CD / test-python-dsms-gateway (push) Successful in 23s
CI/CD / validate-canonical-controls (push) Successful in 18s
CI/CD / deploy-hetzner (push) Successful in 2m26s
Eigenstaendig formulierte Security Controls mit unabhaengiger Taxonomie
und Open-Source-Verankerung (OWASP, NIST, ENISA). Keine BSI-Nomenklatur.

- Migration 044: 5 DB-Tabellen (frameworks, controls, sources, licenses, mappings)
- 10 Seed Controls mit 39 Open-Source-Referenzen
- License Gate: Quellen-Berechtigungspruefung (analysis/excerpt/embeddings/product)
- Too-Close-Detektor: 5 Metriken (exact-phrase, token-overlap, ngram, embedding, LCS)
- REST API: 8 Endpoints unter /v1/canonical/
- Go Loader mit Multi-Index (ID, domain, severity, framework)
- Frontend: Control Library Browser + Provenance Wiki
- CI/CD: validate-controls.py Job (schema, no-leak, open-anchors)
- 67 Tests (8 Go + 59 Python), alle PASS
- MkDocs Dokumentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 19:55:06 +01:00
Benjamin Admin
4d2f4f2d24 feat(qdrant): Migrate to hosted qdrant-dev.breakpilot.ai with API-Key auth
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-ai-compliance (push) Successful in 37s
CI / test-python-backend-compliance (push) Successful in 32s
CI / test-python-document-crawler (push) Successful in 22s
CI / test-python-dsms-gateway (push) Successful in 18s
- LegalRAGClient: QDRANT_HOST+PORT → QDRANT_URL + QDRANT_API_KEY
- docker-compose: env vars updated for hosted Qdrant
- AllowedCollections: added bp_compliance_gdpr, bp_dsfa_templates, bp_dsfa_risks
- Migration scripts (bash + python) for data transfer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 13:55:21 +01:00
Benjamin Admin
38e278ee3c feat(ucca): Pflichtendatenbank v2 (325 Obligations), Trigger-Engine, TOM-Control-Mapping
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-ai-compliance (push) Successful in 32s
CI / test-python-backend-compliance (push) Successful in 29s
CI / test-python-document-crawler (push) Successful in 20s
CI / test-python-dsms-gateway (push) Successful in 18s
- 9 Regulation-JSON-Dateien (DSGVO 80, AI Act 60, NIS2 40, BDSG 30, TTDSG 20, DSA 35, Data Act 25, EU-Maschinen 15, DORA 20)
- Condition-Tree-Engine fuer automatische Pflichtenselektion (all_of/any_of, 80+ Field-Paths)
- Generischer JSONRegulationModule-Loader mit YAML-Fallback
- Bidirektionales TOM-Control-Mapping (291 Obligation→Control, 92 Control→Obligation)
- Gap-Analyse-Engine (Compliance-%, Priority Actions, Domain Breakdown)
- ScopeDecision→UnifiedFacts Bridge fuer Auto-Profiling
- 4 neue API-Endpoints (assess-from-scope, tom-controls, gap-analysis, reverse-lookup)
- Frontend: Auto-Profiling Button, Regulation-Filter Chips, TOM-Panel, Gap-Analyse-View
- 18 Unit Tests (Condition Engine, v2 Loader, TOM Mapper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 14:51:44 +01:00
Benjamin Admin
312c2c9b60 feat: Use-Cases/UCCA Module auf 100% — Interface Fix, Search/Offset/Total, Explain/Export, Edit-Mode
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-ai-compliance (push) Successful in 35s
CI / test-python-backend-compliance (push) Successful in 33s
CI / test-python-document-crawler (push) Successful in 23s
CI / test-python-dsms-gateway (push) Successful in 18s
Kritische Bug Fixes:
- [id]/page.tsx: FullAssessment Interface repariert (nested result → flat fields)
- resultForCard baut explizit aus flachen Assessment-Feldern (feasibility, risk_score etc.)
- Use-Case-Text-Pfad: assessment.intake?.use_case_text statt assessment.use_case_text
- rule_code/code Mapping beim Übergeben an AssessmentResultCard

Backend (A2+A3):
- store.go: AssessmentFilters um Search + Offset erweitert
- ListAssessments: COUNT-Query (total), ILIKE-Search auf title, OFFSET-Pagination
- ListAssessments Signatur: ([]Assessment, int, error)
- Handler: search/offset aus Query-Params, total in Response
- import "strconv" hinzugefügt

Neue Features:
- KI-Erklärung Button (POST /explain) mit lila Erklärungsbox
- Export-Buttons Markdown + JSON (Download-Links)
- Edit-Mode in new/page.tsx: useSearchParams(?edit=id), Form vorausfüllen
- Bedingte PUT/POST Logik; nach Edit → Detail-Seite Redirect
- Suspense-Wrapper für useSearchParams (Next.js 15 Requirement)

Backend Edit:
- store.go: UpdateAssessment() Methode (UPDATE-Query)
- ucca_handlers.go: UpdateAssessment Handler (re-evaluiert Intake)
- main.go: PUT /ucca/assessments/:id Route registriert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 15:13:42 +01:00
Benjamin Admin
14a99322eb feat: Phase 2 — RAG integration in Requirements + DSFA Draft
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-ai-compliance (push) Successful in 35s
CI / test-python-backend-compliance (push) Successful in 26s
CI / test-python-document-crawler (push) Successful in 22s
CI / test-python-dsms-gateway (push) Successful in 19s
Add legal context enrichment from Qdrant vector corpus to the two
highest-priority modules (Requirements AI assistant and DSFA drafting
engine).

Go SDK:
- Add SearchCollection() with collection override + whitelist validation
- Refactor Search() to delegate to shared searchInternal()

Python backend:
- New ComplianceRAGClient proxying POST /sdk/v1/rag/search (error-tolerant)
- AI assistant: enrich interpret_requirement() and suggest_controls() with RAG
- Requirements API: add ?include_legal_context=true query parameter

Admin (Next.js):
- Extract shared queryRAG() utility from chat route
- Inject RAG legal context into v1 and v2 draft pipelines

Tests for all three layers (Go, Python, TypeScript shared utility).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 08:57:39 +01:00
Benjamin Admin
a228b3b528 feat: add RAG corpus versioning and source policy backend
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-ai-compliance (push) Successful in 34s
CI / test-python-backend-compliance (push) Successful in 32s
CI / test-python-document-crawler (push) Successful in 23s
CI / test-python-dsms-gateway (push) Successful in 18s
Part 1 — RAG Corpus Versioning:
- New DB table compliance_corpus_versions (migration 017)
- Go CorpusVersionStore with CRUD operations
- Assessment struct extended with corpus_version_id
- API endpoints: GET /rag/corpus-status, /rag/corpus-versions/:collection
- RAG routes (search, regulations) now registered in main.go
- Ingestion script registers corpus versions after each run
- Frontend staleness badge in SDK sidebar

Part 3 — Source Policy Backend:
- New FastAPI router with CRUD for allowed sources, PII rules,
  operations matrix, audit trail, stats, and compliance report
- SQLAlchemy models for all source policy tables (migration 001)
- Frontend API base corrected from edu-search:8088/8089 to
  backend-compliance:8002/api

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:58:08 +01:00
Benjamin Boenisch
899e22a31b feat(rag): connect bp_compliance_ce vector corpus to SDK
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-ai-compliance (push) Successful in 46s
CI / test-python-backend-compliance (push) Successful in 30s
CI / test-python-dsms-gateway (push) Has been cancelled
CI / test-python-document-crawler (push) Has been cancelled
- Switch LegalRAGClient from empty bp_legal_corpus to bp_compliance_ce
  collection (3,734 chunks across 14 regulations)
- Replace embedding-service (384-dim MiniLM) with Ollama bge-m3 (1024-dim)
- Add standalone RAG search endpoint: POST /sdk/v1/rag/search
- Add regulations list endpoint: GET /sdk/v1/rag/regulations
- Add QDRANT_HOST/PORT env vars to docker-compose.yml
- Update regulation ID mapping to match actual Qdrant payload schema
- Update determineRelevantRegulations for CE corpus regulation IDs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 23:44:47 +01:00
Benjamin Boenisch
4435e7ea0a Initial commit: breakpilot-compliance - Compliance SDK Platform
Services: Admin-Compliance, Backend-Compliance,
AI-Compliance-SDK, Consent-SDK, Developer-Portal,
PCA-Platform, DSMS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:47:28 +01:00