The atomic_controls_dedup collection (51k points) contains only atomic
controls without source_citation. The parent control, which carries the
legal basis, is now resolved. Deduplicating by parent UUID prevents
multiple entries for the same regulation.
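
A minimal sketch of deduplication by parent UUID, assuming each search hit is a dict and that the parent's UUID is stored under a hypothetical `parent_control_uuid` key (the real payload field is not named here):

```python
from typing import Iterable


def dedupe_by_parent(hits: Iterable[dict]) -> list[dict]:
    """Keep the first hit per parent UUID, preserving the incoming order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for hit in hits:
        # "parent_control_uuid" is an assumed key name, used for illustration only.
        parent_id = hit.get("parent_control_uuid")
        if parent_id is None or parent_id in seen:
            continue
        seen.add(parent_id)
        unique.append(hit)
    return unique
```

Because hits usually arrive sorted by score, keeping the first occurrence keeps the best-scoring atomic control for each parent.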
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents 'invalid transaction' errors when an LLM call fails
and blocks subsequent DB operations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qwen3.5 returns answers in the 'thinking' field instead of 'response'.
Setting think:false disables thinking mode, so the answer is
delivered correctly in the response field.
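
A minimal sketch of what such a request payload and the response handling might look like. The function names are hypothetical and the HTTP call itself is left out; only the `think`, `response`, and `thinking` fields come from the description above.

```python
def build_generate_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming generate payload with thinking mode disabled."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "think": False,  # ask the model to answer in "response", not "thinking"
    }


def extract_answer(body: dict) -> str:
    """Return the answer text from a decoded response body, or '' if absent."""
    return (body.get("response") or "").strip()
```

With thinking enabled, `extract_answer` would return an empty string for a body that only fills `thinking`, which is the symptom this change removes.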
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ollama gets its own enum value alongside self_hosted so that the
docker-compose configuration (ollama) is resolved correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
POST /controls/backfill-rationale: replaces the placeholder "Aus Obligation
abgeleitet." with LLM-generated rationales (Ollama/qwen3.5).
Optimization: groups ~86k controls under ~7k parents, one LLM call per parent.
Pagination via batch_size/offset for controlled execution.
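
The grouping and pagination could be sketched roughly as follows. The control dicts, the `parent_id` and `rationale` keys, and the `generate` callable are assumptions for illustration; the real endpoint works against the database.

```python
from collections import defaultdict
from typing import Callable

PLACEHOLDER = "Aus Obligation abgeleitet."


def backfill_rationale(
    controls: list[dict],
    generate: Callable[[str], str],
    batch_size: int = 50,
    offset: int = 0,
) -> int:
    """Replace placeholder rationales, calling `generate` once per parent."""
    by_parent: dict[str, list[dict]] = defaultdict(list)
    for control in controls:
        if control["rationale"] == PLACEHOLDER:
            by_parent[control["parent_id"]].append(control)

    # Paginate over parents (not controls) so each page costs batch_size LLM calls.
    page = sorted(by_parent)[offset : offset + batch_size]
    updated = 0
    for parent_id in page:
        rationale = generate(parent_id)
        for control in by_parent[parent_id]:
            control["rationale"] = rationale
            updated += 1
    return updated
```

Paging over parents rather than controls keeps the number of LLM calls per request bounded by `batch_size`, regardless of how many children each parent has.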
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The DB constraint allows only must/should/may; 'can' does not exist.
All references to 'can' replaced with 'may'.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The obligation knows its parent rich control directly. That control's
source_citation->>'source' yields the source regulation more reliably
than the detour via control_parent_links (M:N inflation).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
macOS ships with bash 3, which does not support declare -A.
Replaced with a case-based function dir_to_service().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: provenance endpoint (obligations, doc refs, merged duplicates,
regulations summary) + atomic-stats aggregation endpoint.
Frontend: ControlDetail with provenance sections, clickable navigation,
new /sdk/atomic-controls page with stats bar and filtered list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SQLAlchemy's text() parser doesn't properly handle :param::type
syntax — it fails to recognize :dd as a bind parameter when followed
by ::jsonb. Using CAST(:dd AS jsonb) instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SQLAlchemy sessions enter a failed state after SQL errors.
Without rollback(), all subsequent queries on the same session
fail with InFailedSqlTransaction. Added try/except with rollback
in _mark_duplicate, _mark_duplicate_to, _write_review, cross-group
pass, and the main phase1 loop.
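
The pattern can be sketched with a stand-in session. `FakeSession` below is purely illustrative: it only mimics the "everything fails until rollback()" behaviour and is not SQLAlchemy's API beyond the `execute`/`rollback` method names.

```python
class FakeSession:
    """Stand-in session that rejects all work after a failed statement."""

    def __init__(self) -> None:
        self.failed = False
        self.rows: list[object] = []

    def execute(self, value: object) -> None:
        if self.failed:
            raise RuntimeError("transaction is in a failed state")
        if value is None:  # simulate a statement that errors
            self.failed = True
            raise ValueError("bad statement")
        self.rows.append(value)

    def rollback(self) -> None:
        self.failed = False


def write_all(session: FakeSession, values: list[object]) -> tuple[int, int]:
    """Write each value; roll back on error so later writes still succeed."""
    written = errors = 0
    for value in values:
        try:
            session.execute(value)
            written += 1
        except Exception:
            session.rollback()
            errors += 1
    return written, errors
```

Without the `rollback()` call in the `except` branch, every value after the first failure would also be counted as an error.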
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All Pass 0b controls have pattern_id=NULL. Rewritten to:
- Phase 1: Group by merge_group_hint (action:object:trigger), 52k groups
- Phase 2: Cross-group embedding search for semantically similar masters
- Qdrant search uses unfiltered cross-regulation endpoint
- API param changed: pattern_id → hint_filter
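
Phase 1 grouping might look roughly like this. It is a sketch only: the dict shape is assumed, and treating hint_filter as a prefix match on the hint is an assumption, since the filter semantics are not spelled out here.

```python
from collections import defaultdict
from typing import Optional


def group_by_hint(
    controls: list[dict], hint_filter: Optional[str] = None
) -> dict[str, list[str]]:
    """Group control ids by merge_group_hint ("action:object:trigger")."""
    groups: dict[str, list[str]] = defaultdict(list)
    for control in controls:
        hint = control.get("merge_group_hint")
        if not hint:
            continue  # controls without a hint cannot be grouped in this phase
        # Assumption: hint_filter restricts the run to hints with this prefix.
        if hint_filter and not hint.startswith(hint_filter):
            continue
        groups[hint].append(control["id"])
    return dict(groups)
```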
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests were failing due to stale mock objects after schema extensions:
- DSFA: add _mapping property to _DictRow, use proper mock instead of MagicMock
- Company Profile: add 6 missing fields (project_id, offering_urls, etc.)
- Legal Templates/Policy: update document type count 52→58
- VVT: add 13 missing attributes to activity mock
- Legal Documents: align consent test assertions with production behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows switching between Haiku 4.5 and Sonnet 4.6 for Pass 0b
without rebuilding the backend container.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmark shows Haiku is 2.5x faster than Sonnet at 5x lower cost
for this JSON structuring task. Quality is equivalent.
$142 vs $705 for 75K obligations, ~2.8 days vs ~7 days.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previous formula (batch_size * 1500) exceeded Claude's 16K output limit
for batch_size > 10, causing API failures and Ollama fallback.
New formula: min(16384, max(4096, batch_size * 500))
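
Written out as a function (the name is illustrative), the new formula and the failure mode of the old one look like this:

```python
def max_output_tokens(batch_size: int) -> int:
    """New budget: 500 tokens per item, floored at 4096 and capped at 16384."""
    return min(16384, max(4096, batch_size * 500))


def old_max_output_tokens(batch_size: int) -> int:
    """Previous budget, shown only to illustrate where it crossed the limit."""
    return batch_size * 1500
```

At batch_size 11 the old formula requests 16,500 tokens, just over the 16,384 cap, while the new one requests 5,500.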
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _write_atomic_control() now uses RETURNING id and inserts into
  control_parent_links (M:N) with source_regulation, source_article,
  and obligation_candidate_id parsed from parent's source_citation
- New _parse_citation() helper for JSONB source_citation extraction
- New GET /controls/{id}/traceability endpoint returning full chain:
  parent links with obligations, child controls, source_count
- Backend: control_type filter (atomic/rich) for controls + count
- Frontend: Rechtsgrundlagen section in ControlDetail showing all
  parent links per source regulation with obligation text + strength
- Frontend: Atomic/Rich filter dropdown in Control Library list
- Frontend: GenerationStrategyBadge recognizes 'pass0b' strategy
- Tests: 3 new tests for parent_link creation + citation parsing,
  existing batch test mock updated for RETURNING clause
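
A citation helper along these lines could be sketched as below. This is not the actual _parse_citation(); the function name and the `article` key are assumptions, and the `source` key is assumed to match the JSONB field of the same name.

```python
import json
from typing import Any, Optional


def parse_citation_sketch(raw: Any) -> tuple[Optional[str], Optional[str]]:
    """Return (source_regulation, source_article) from a JSONB-like value.

    Accepts a dict (already decoded by the driver) or a JSON string.
    """
    if raw is None:
        return None, None
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            return None, None
    if not isinstance(raw, dict):
        return None, None
    # "article" is an assumed key name; "source" is assumed to be the regulation.
    return raw.get("source"), raw.get("article")
```

Handling both dicts and strings matters because JSONB columns may arrive decoded or as raw text depending on the driver and query.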
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TestAllowedCollections was asserting bp_compliance_recht which was
removed from the handler whitelist. Updated test to match the actual
AllowedCollections map (added bp_compliance_gdpr, bp_dsfa_templates,
bp_dsfa_risks, bp_iace_libraries; removed bp_compliance_recht).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers NIST SP 800-160/30/82, SPDX 3.0, CVSS v4.0, SLSA v1.0,
CycloneDX 1.6, OpenTelemetry, EU Machinery Guide 2006/42/EC,
FDA Human Factors, and 5 GPAI documents (Scope Guidelines,
Communication, CoP Safety/Transparency/Copyright).
All documents include license metadata in regulation payloads.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add obligation refinement pipeline between Pass 0a and 0b:
- Merge pass: rule-based dedup of implementation-level duplicate obligations
  within the same parent control (Jaccard similarity on action+object)
- Enrich pass: classify trigger_type (event/periodic/continuous) and detect
  is_implementation_specific from obligation text (regex-based, no LLM)
- Pass 0b: skip merged obligations, cap severity for impl-specific, override
  category to 'testing' for test obligations
- Migration 075: merged_into_id, trigger_type, is_implementation_specific
- Two new API endpoints: merge-obligations, enrich-obligations
- 30+ new tests (122 total, all passing)
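
The regex-based trigger_type classification in the enrich pass might be sketched like this. The keyword lists are hypothetical English examples, not the pipeline's actual patterns, and treating "continuous" as the fallback is an assumption.

```python
import re

# Hypothetical keyword patterns, for illustration only.
_PERIODIC = re.compile(
    r"\b(annually|monthly|quarterly|regularly|every \d+ (days|months|years))\b",
    re.IGNORECASE,
)
_EVENT = re.compile(
    r"\b(upon|when|in the event of|after|before)\b",
    re.IGNORECASE,
)


def classify_trigger_type(text: str) -> str:
    """Classify obligation text as 'periodic', 'event', or 'continuous'."""
    if _PERIODIC.search(text):
        return "periodic"
    if _EVENT.search(text):
        return "event"
    return "continuous"  # assumed default when no trigger keyword is found
```

Checking periodic patterns first means a sentence such as "review annually after onboarding" is treated as periodic; whether that priority matches the real pass is an open assumption.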
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move torch and sentence-transformers to requirements-reranker.txt so
the main Docker build succeeds even if these large packages fail to
install. The reranker code already handles missing imports gracefully
when RERANK_ENABLED=false (the default).
This fixes the production deployment — builds were failing because of
the ~800MB torch dependency, preventing ALL new code from deploying.
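
The graceful handling of the missing packages can be sketched as follows. The function name and parameters are illustrative, not the reranker module's actual code.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_reranker_backend(
    enabled: bool, module_name: str = "sentence_transformers"
) -> Optional[ModuleType]:
    """Import the reranker dependency lazily; return None if disabled or missing."""
    if not enabled:
        return None  # never touch the heavy import when reranking is off
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Package not installed (e.g. requirements-reranker.txt was skipped).
        return None
```

Callers would then skip re-ranking whenever the function returns None, so the main service starts with or without the optional packages.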
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Crosswalk routes returning 404 on production. This adds a diagnostic
endpoint that reports which sub-routers failed to load and why.
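
A diagnostic of this kind could be built on a loader like the one below. It is a sketch: the function name and the shape of the report are assumptions, and the web framework wiring is omitted.

```python
import importlib


def load_sub_routers(module_names: list[str]) -> dict:
    """Try to import each sub-router module and record why any of them failed."""
    loaded: list[str] = []
    failed: dict[str, str] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            loaded.append(name)
        except Exception as exc:  # report any import-time error, not only ImportError
            failed[name] = f"{type(exc).__name__}: {exc}"
    return {"loaded": loaded, "failed": failed}
```

Returning this dict from a diagnostic endpoint turns a silent 404 into a visible import error message.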
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts
Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go)
- Fallback to dense-only search if Query API unavailable
- Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default)
- CPU-only PyTorch dependency to keep Docker image small
Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected
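
For reference, reciprocal rank fusion (RRF) combines several ranked lists by summing 1 / (k + rank) per document. The sketch below is a standalone illustration of the technique, not the code in legal_rag.go; k = 60 is a commonly used constant and is an assumption here.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists with reciprocal rank fusion, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A document that appears in both the dense and the text ranking outscores one that tops only a single list, which is the effect hybrid search relies on.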
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v6.1.12 had an expired TrustedClientToken causing 403 on all Edge TTS
requests. v7.2.7 uses a valid token and the same Communicate API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Edge TTS provides near-human quality voices (de-DE-ConradNeural, en-US-GuyNeural).
Falls back to Piper TTS when Edge TTS is unavailable (e.g. no internet).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Download en_US-lessac-high Piper model in Dockerfile
- Select TTS engine based on request language (de/en)
- Include language in cache key to avoid collisions
- List both voices in /voices endpoint
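
A language-aware cache key could be derived as sketched below. The function name, the hash choice, and the `.mp3` suffix are assumptions for illustration.

```python
import hashlib


def tts_cache_key(text: str, language: str) -> str:
    """Derive a cache file name from language and text."""
    # A NUL separator keeps ("de", "x") and ("d", "ex") from colliding.
    material = f"{language}\x00{text}".encode("utf-8")
    return hashlib.sha256(material).hexdigest() + ".mp3"
```

Identical text requested in German and in English then maps to two different cache entries instead of returning audio in the wrong voice.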
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Returns MP3 audio directly in response body (no MinIO upload)
- Disk cache (/tmp/tts-cache) avoids re-synthesis of identical text
- Used by pitch-deck presenter for real-time TTS playback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- flow-data.ts: vendor-compliance moved from betrieb/seq:4200 to
  dokumentation/seq:2500, prerequisite changed to vvt, added 5 DB tables
- architecture-data.ts: added vendor tables and API endpoints to
  backend-compliance service definition
- StepHeader.tsx: added vendor-compliance explanation with 4 tips
  (Art. 28, cross-module integration, third-country transfers, controls
  library). Updated obligations (12 checks, vendor-link, document),
  loeschfristen (vendor picker), tom (vendor-controls cross-ref),
  vvt (processor tab from vendor API)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The HTML document builder was missing linked_vendor_ids in the detailed
obligation cards. Art. 28 obligations with linked vendors now display
them in the audit-ready PDF/HTML output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
QA process, article types, match rates, preamble dedup rules,
and next steps documented in MkDocs under Entwicklung.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Preamble controls that duplicate article controls (same regulation,
Jaccard title similarity >= 0.40) are marked as duplicate.
Article controls always take priority.
Result: 6,183 active controls (was 6,373), 648 unique preamble controls remain.
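
The duplicate check can be sketched as Jaccard similarity over title tokens. Whitespace tokenization and lowercasing are assumptions; only the 0.40 threshold comes from the rule above.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase whitespace-token sets of two titles."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_preamble_duplicate(
    preamble_title: str, article_titles: list[str], threshold: float = 0.40
) -> bool:
    """True if any article title of the same regulation reaches the threshold."""
    return any(jaccard(preamble_title, title) >= threshold for title in article_titles)
```

Only the preamble control is ever marked, so an article control is never lost to a preamble match.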
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
QA pipeline that matches control source_original_text directly against
original PDF documents to verify article/paragraph assignments. Covers
backfill, dedup, source normalization, Qdrant cleanup, and prod sync.
Key results (2026-03-20):
- 4,110/7,943 controls matched to PDF (100% for major EU regs)
- 3,366 article corrections, 705 new assignments
- 1,290 controls from Erwägungsgründe (preamble) identified
- 779 controls from Anhänge (annexes) identified
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eu_2023_1542 (Batterieverordnung), eu_2023_988 (GPSR), nist_sp800_218,
nist_privacy_1_0, owasp_mobile_top10 were defaulting to Rule 3 (restricted)
instead of their correct rules. This caused 68/71 controls to be flagged
as too_close in the last pipeline run.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>