The NOT EXISTS check and Duplicate Guard now exclude deprecated and
duplicate controls, enabling clean re-runs after invalidation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Truncate object keys to 40 chars (was 80) at underscore boundary
- Strip German qualifying prepositional phrases (bei/für/gemäß/von/zur/...)
- Add 65 new synonym mappings for near-duplicate patterns found in analysis
- Strip trailing noise tokens (articles/prepositions)
- Add _truncate_at_boundary() helper and _QUALIFYING_PHRASE_RE regex
- 11 new tests for normalization improvements (227 total pass)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Duplicate Guard: merge_hint-Lookup vor INSERT in _write_atomic_control()
verhindert semantisch identische Controls unter demselben Parent.
2. Severity-Kalibrierung: action_type-basiert statt blind vom Parent.
define/review/test → max medium, implement/monitor → max high.
3. Title-Truncation: Schnitt am Wortende statt mitten im Wort.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmark shows Haiku is 2.5x faster than Sonnet at 5x lower cost
for this JSON structuring task. Quality is equivalent.
$142 vs $705 for 75K obligations, ~2.8 days vs ~7 days.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previous formula (batch_size * 1500) exceeded Claude's 16K output limit
for batch_size > 10, causing API failures and Ollama fallback.
New formula: min(16384, max(4096, batch_size * 500))
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _write_atomic_control() now uses RETURNING id and inserts into
control_parent_links (M:N) with source_regulation, source_article,
and obligation_candidate_id parsed from parent's source_citation
- New _parse_citation() helper for JSONB source_citation extraction
- New GET /controls/{id}/traceability endpoint returning full chain:
parent links with obligations, child controls, source_count
- Backend: control_type filter (atomic/rich) for controls + count
- Frontend: Rechtsgrundlagen section in ControlDetail showing all
parent links per source regulation with obligation text + strength
- Frontend: Atomic/Rich filter dropdown in Control Library list
- Frontend: GenerationStrategyBadge recognizes 'pass0b' strategy
- Tests: 3 new tests for parent_link creation + citation parsing,
existing batch test mock updated for RETURNING clause
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add obligation refinement pipeline between Pass 0a and 0b:
- Merge pass: rule-based dedup of implementation-level duplicate obligations
within the same parent control (Jaccard similarity on action+object)
- Enrich pass: classify trigger_type (event/periodic/continuous) and detect
is_implementation_specific from obligation text (regex-based, no LLM)
- Pass 0b: skip merged obligations, cap severity for impl-specific, override
category to 'testing' for test obligations
- Migration 075: merged_into_id, trigger_type, is_implementation_specific
- Two new API endpoints: merge-obligations, enrich-obligations
- 30+ new tests (122 total, all passing)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts
Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go)
- Fallback to dense-only search if Query API unavailable
- Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default)
- CPU-only PyTorch dependency to keep Docker image small
Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Anthropic API support to decomposition Pass 0a/0b (prompt caching, content batching)
- Add Anthropic Batch API (50% cost reduction, async 24h processing)
- Add source_filter (ILIKE on source_citation) for regulation-based filtering
- Add category_filter to Pass 0a for selective decomposition
- Add regulation_filter to control_generator for RAG scan phase filtering
(prefix match on regulation_code — enables CE + Code Review focus)
- New API endpoints: batch-submit-0a, batch-submit-0b, batch-status, batch-process
- 83 new tests (all passing)
Cost reduction: $2,525 → ~$600-700 with all optimizations combined.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>