Commit Graph

13 Commits

Author SHA1 Message Date
Benjamin Admin
8cb1dc1108 Fix pass0b queries to skip deprecated/duplicate controls
The NOT EXISTS check and Duplicate Guard now exclude deprecated and
duplicate controls, enabling clean re-runs after invalidation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 09:09:16 +01:00
Benjamin Admin
f8d9919b97 Improve object normalization: shorter keys, synonym expansion, qualifier stripping
- Truncate object keys to 40 chars (was 80) at underscore boundary
- Strip German qualifying prepositional phrases (bei/für/gemäß/von/zur/...)
- Add 65 new synonym mappings for near-duplicate patterns found in analysis
- Strip trailing noise tokens (articles/prepositions)
- Add _truncate_at_boundary() helper and _QUALIFYING_PHRASE_RE regex
- 11 new tests for normalization improvements (227 total pass)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 08:55:48 +01:00
Benjamin Admin
fb2cf29b34 fix: Pass 0b — Duplicate Guard, Severity-Kalibrierung, Title-Truncation
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 55s
CI/CD / test-python-backend-compliance (push) Successful in 36s
CI/CD / test-python-document-crawler (push) Successful in 23s
CI/CD / test-python-dsms-gateway (push) Successful in 20s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Successful in 4s
1. Duplicate Guard: merge_hint-Lookup vor INSERT in _write_atomic_control()
   verhindert semantisch identische Controls unter demselben Parent.
2. Severity-Kalibrierung: action_type-basiert statt blind vom Parent.
   define/review/test → max medium, implement/monitor → max high.
3. Title-Truncation: Schnitt am Wortende statt mitten im Wort.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 08:38:33 +01:00
Benjamin Admin
e6201d5239 feat: Anti-Fake-Evidence System (Phase 1-4b)
Implement full evidence integrity pipeline to prevent compliance theater:
- Confidence levels (E0-E4), truth status tracking, assertion engine
- Four-Eyes approval workflow, audit trail, reject endpoint
- Evidence distribution dashboard, LLM audit routes
- Traceability matrix (backend endpoint + Compliance Hub UI tab)
- Anti-fake badges, control status machine, normative patterns
- 2 migrations, 4 test suites, MkDocs documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 17:15:45 +01:00
Benjamin Admin
48ca0a6bef feat: Framework Decomposition Engine + Composite Detection for Pass 0b
Adds a routing layer between Pass 0a and Pass 0b that classifies obligations
into atomic/compound/framework_container. Framework-container obligations
(e.g. "CCM-Praktiken fuer AIS") are decomposed into concrete sub-obligations
via an internal framework registry before Pass 0b composition.

- New: framework_decomposition.py with routing, matching, decomposition
- New: Framework registry (NIST SP 800-53, OWASP ASVS, CSA CCM) as JSON
- New: Composite detection flags on atomic controls (is_composite, atomicity)
- New: gen_meta fields: framework_ref, framework_domain, decomposition_source
- Integration: _route_and_compose() in run_pass0b() deterministic path
- 248 tests (198 decomposition + 50 framework), all passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 12:11:55 +01:00
Benjamin Admin
1a63f5857b feat: Deterministic Control Composition Engine v2 for Pass 0b
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 38s
CI/CD / test-python-backend-compliance (push) Successful in 39s
CI/CD / test-python-document-crawler (push) Successful in 26s
CI/CD / test-python-dsms-gateway (push) Successful in 21s
CI/CD / validate-canonical-controls (push) Successful in 13s
CI/CD / Deploy (push) Successful in 3s
Replace LLM-based Pass 0b with rule-based deterministic engine that
composes atomic controls from obligation data without any LLM call.

Engine features:
- 24 action types (implement, configure, define, document, maintain,
  review, monitor, assess, audit, test, verify, validate, report,
  notify, train, restrict_access, encrypt, delete, retain, ensure,
  approve, remediate, perform, obtain)
- 19 object classes (policy, procedure, register, record, report,
  technical_control, access_control, cryptographic_control,
  configuration, account, system, data, interface, role, training,
  incident, risk_artifact, process, consent)
- Compound action splitting with no-split phrases
- Title pattern: "{Object} {state_suffix}" (24 state suffixes)
- Statement field: "{condition} {object} ist {trigger} {action}"
- Pattern candidates for downstream categorization (26 specific
  combos + 24 action fallbacks)
- Structured timing: deadline_hours + frequency extraction
- Confidence scoring (0.3 base + action/object/trigger/template)
- Merge group hints for dedup: "{action}:{norm_obj}:{trigger}"
- Synonym-based object normalization (50+ German synonyms)
- 16 specific (action_type, object_class) template overrides
- Output validator with Pflichtfelder + Negativregeln + Warnregeln
- All new fields serialized into gen_meta JSONB (no migration needed)

Tests: 185 passed (33 new tests covering all engine components)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 11:05:48 +01:00
Benjamin Admin
649a3c5e4e perf: switch Pass 0b default model to Haiku 4.5
Benchmark shows Haiku is 2.5x faster than Sonnet at 5x lower cost
for this JSON structuring task. Quality is equivalent.
$142 vs $705 for 75K obligations, ~2.8 days vs ~7 days.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 09:12:01 +01:00
Benjamin Admin
bdd2f6fa0f fix: cap Anthropic max_tokens to 16384 for Pass 0b batches
Previous formula (batch_size * 1500) exceeded Claude's 16K output limit
for batch_size > 10, causing API failures and Ollama fallback.
New formula: min(16384, max(4096, batch_size * 500))

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 08:50:45 +01:00
Benjamin Admin
ac6134ce6d feat: control_parent_links population + traceability API + frontend
- _write_atomic_control() now uses RETURNING id and inserts into
  control_parent_links (M:N) with source_regulation, source_article,
  and obligation_candidate_id parsed from parent's source_citation
- New _parse_citation() helper for JSONB source_citation extraction
- New GET /controls/{id}/traceability endpoint returning full chain:
  parent links with obligations, child controls, source_count
- Backend: control_type filter (atomic/rich) for controls + count
- Frontend: Rechtsgrundlagen section in ControlDetail showing all
  parent links per source regulation with obligation text + strength
- Frontend: Atomic/Rich filter dropdown in Control Library list
- Frontend: GenerationStrategyBadge recognizes 'pass0b' strategy
- Tests: 3 new tests for parent_link creation + citation parsing,
  existing batch test mock updated for RETURNING clause

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 08:14:29 +01:00
Benjamin Admin
a14e2f3a00 feat(decomposition): add merge pass, enrichment, and Pass 0b refinements
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 51s
CI/CD / test-python-backend-compliance (push) Successful in 34s
CI/CD / test-python-document-crawler (push) Successful in 23s
CI/CD / test-python-dsms-gateway (push) Successful in 20s
CI/CD / validate-canonical-controls (push) Successful in 12s
CI/CD / Deploy (push) Has been skipped
Add obligation refinement pipeline between Pass 0a and 0b:
- Merge pass: rule-based dedup of implementation-level duplicate obligations
  within the same parent control (Jaccard similarity on action+object)
- Enrich pass: classify trigger_type (event/periodic/continuous) and detect
  is_implementation_specific from obligation text (regex-based, no LLM)
- Pass 0b: skip merged obligations, cap severity for impl-specific, override
  category to 'testing' for test obligations
- Migration 075: merged_into_id, trigger_type, is_implementation_specific
- Two new API endpoints: merge-obligations, enrich-obligations
- 30+ new tests (122 total, all passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 22:27:09 +01:00
Benjamin Admin
c52dbdb8f1 feat(rag): optimize RAG pipeline — JSON-Mode, CoT, Hybrid Search, Re-Ranking, Cross-Reg Dedup, chunk 1024
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 42s
CI/CD / test-python-backend-compliance (push) Successful in 1m38s
CI/CD / test-python-document-crawler (push) Successful in 20s
CI/CD / test-python-dsms-gateway (push) Successful in 17s
CI/CD / validate-canonical-controls (push) Successful in 10s
CI/CD / Deploy (push) Has been skipped
Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts

Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go)
- Fallback to dense-only search if Query API unavailable
- Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default)
- CPU-only PyTorch dependency to keep Docker image small

Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 11:49:43 +01:00
Benjamin Admin
d22c47c9eb feat(pipeline): Anthropic Batch API, source/regulation filter, cost optimization
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 35s
CI/CD / test-python-backend-compliance (push) Successful in 34s
CI/CD / test-python-document-crawler (push) Successful in 22s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Has been skipped
- Add Anthropic API support to decomposition Pass 0a/0b (prompt caching, content batching)
- Add Anthropic Batch API (50% cost reduction, async 24h processing)
- Add source_filter (ILIKE on source_citation) for regulation-based filtering
- Add category_filter to Pass 0a for selective decomposition
- Add regulation_filter to control_generator for RAG scan phase filtering
  (prefix match on regulation_code — enables CE + Code Review focus)
- New API endpoints: batch-submit-0a, batch-submit-0b, batch-status, batch-process
- 83 new tests (all passing)

Cost reduction: $2,525 → ~$600-700 with all optimizations combined.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 13:22:01 +01:00
Benjamin Admin
825e070ed9 feat(multi-layer): complete Multi-Layer Control Architecture (Phases 1-8 + Pass 0)
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 47s
CI/CD / test-python-backend-compliance (push) Successful in 33s
CI/CD / test-python-document-crawler (push) Successful in 24s
CI/CD / test-python-dsms-gateway (push) Successful in 18s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Has been skipped
Implements the full Multi-Layer Control Architecture for migrating ~25,000
Rich Controls into atomic, deduplicated Master Controls with full traceability.

Architecture: Legal Source → Obligation → Control Pattern → Master Control → Customer Instance

New services:
- ObligationExtractor: 3-tier extraction (exact → embedding → LLM)
- PatternMatcher: 2-tier matching (keyword + embedding + domain-bonus)
- ControlComposer: Pattern + Obligation → Master Control
- PipelineAdapter: Pipeline integration + Migration Passes 1-5
- DecompositionPass: Pass 0a/0b — Rich Control → atomic Controls
- CrosswalkRoutes: 15 API endpoints under /v1/canonical/

New DB schema:
- Migration 060: obligation_extractions, control_patterns, crosswalk_matrix
- Migration 061: obligation_candidates, parent_control_uuid tracking

Pattern Library: 50 YAML patterns (30 core + 20 IT-security)
Go SDK: Pattern loader with YAML validation and indexing
Documentation: MkDocs updated with full architecture overview

500 Python tests passing across all components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 09:00:37 +01:00