Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts
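For illustration, a minimal sketch of what a JSON-constrained Ollama request body could look like. The helper names, model name, and prompts are hypothetical and not taken from obligation_extractor, control_generator, or citation_backfill; only the `format`, `system`, `prompt`, `stream`, and `response` fields come from Ollama's /api/generate API.

```python
import json


def build_ollama_payload(model: str, system_prompt: str, user_prompt: str) -> dict:
    """Build a non-streaming /api/generate body that requests JSON output.

    Hypothetical helper: the real extractors may assemble their payloads differently.
    """
    return {
        "model": model,
        "system": system_prompt,
        "prompt": user_prompt,
        "format": "json",  # ask Ollama to constrain the output to valid JSON
        "stream": False,   # return one complete response object
    }


def parse_ollama_response(body: dict) -> dict:
    """Decode the generated text, which /api/generate returns under "response"."""
    return json.loads(body["response"])


if __name__ == "__main__":
    payload = build_ollama_payload(
        "example-model",  # placeholder model name
        "Analyse the text step by step, then answer as JSON.",
        "Extract the obligations from the following article: ...",
    )
    print(json.dumps(payload, indent=2))
```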
Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with Reciprocal Rank Fusion (RRF) + automatic text index (legal_rag.go)
- Fall back to dense-only search if the Query API is unavailable
- Cross-encoder re-ranking with BGE Reranker v2, disabled by default (RERANK_ENABLED=false)
- Use CPU-only PyTorch dependency to keep the Docker image small
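For readers unfamiliar with RRF, a small client-side sketch of the scoring idea follows. In this change the fusion is performed by Qdrant itself; the function name and the constant k=60 below are assumptions for illustration and may differ from what Qdrant uses internally.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k=60 is a commonly used value, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]   # hypothetical dense-vector ranking
    text_hits = ["b", "d", "a"]    # hypothetical text-index ranking
    print(rrf_fuse([dense_hits, text_hits]))
```

Documents that appear in both lists ("a" and "b") rise above those found by only one retriever, which is the effect hybrid search relies on.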
Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size/overlap defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected
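A rough sketch of sliding-window chunking with the new defaults, assuming the two numbers are chunk size and overlap and that both are counted in characters (the actual ingestion code may count tokens instead). The function name is hypothetical.

```python
def chunk_text(text, size=1024, overlap=128):
    """Split text into windows of `size` characters sharing `overlap` characters.

    Assumption: units are characters; the real pipeline may use tokens.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(2000))
    for i, chunk in enumerate(chunk_text(sample)):
        print(i, len(chunk))
```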
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update ENISA download URLs to new site structure (publications → sites/default/files)
- Increase curl max-time from 300s to 600s for IFRS PDFs (7.5-8.2 MB)
- Update ENISA Secure by Design metadata (title changed)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Ingestion script: Add 3 new PDFs (IFRS DE/EN, EFRAG Endorsement Status)
  to ingest-industry-compliance.sh (7 → 10 documents total)
- System prompt: Add EU-IFRS and EFRAG to competence area, add mandatory
  IFRS endorsement warning section for all IFRS/IAS queries
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>