2 Commits

Author SHA1 Message Date
Benjamin Admin
c52dbdb8f1 feat(rag): optimize RAG pipeline — JSON-Mode, CoT, Hybrid Search, Re-Ranking, Cross-Reg Dedup, chunk 1024
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 42s
CI/CD / test-python-backend-compliance (push) Successful in 1m38s
CI/CD / test-python-document-crawler (push) Successful in 20s
CI/CD / test-python-dsms-gateway (push) Successful in 17s
CI/CD / validate-canonical-controls (push) Successful in 10s
CI/CD / Deploy (push) Has been skipped
Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts

Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go)
- Fallback to dense-only search if Query API unavailable
- Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default)
- CPU-only PyTorch dependency to keep Docker image small

Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 11:49:43 +01:00
Benjamin Admin
9c1355c05f feat(iace): Phase 5+6 — frontend integration, RAG library search, comprehensive tests
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 34s
CI/CD / test-python-backend-compliance (push) Successful in 33s
CI/CD / test-python-document-crawler (push) Successful in 23s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 13s
CI/CD / Deploy (push) Successful in 2s
Phase 5 — Frontend Integration:
- components/page.tsx: ComponentLibraryModal with 120 components + 20 energy sources
- hazards/page.tsx: AutoSuggestPanel with 3-column pattern matching review
- mitigations/page.tsx: SuggestMeasuresModal per hazard with 3-level grouping
- verification/page.tsx: SuggestEvidenceModal per mitigation with evidence types

Phase 6 — RAG Library Search:
- Added bp_iace_libraries to AllowedCollections whitelist in rag_handlers.go
- SearchLibrary endpoint: POST /iace/library-search (semantic search across libraries)
- EnrichTechFileSection endpoint: POST /projects/:id/tech-file/:section/enrich
- Created ingest-iace-libraries.sh ingestion script for Qdrant collection

Tests (123 passing):
- tag_taxonomy_test.go: 8 tests for taxonomy entries, domains, essential tags
- controls_library_test.go: 7 tests for measures, reduction types, subtypes
- integration_test.go: 7 integration tests for full match flow and library consistency
- Extended tag_resolver_test.go: 9 new tests for FindByTags and cross-category resolution

Documentation:
- Updated iace.md with Hazard-Matching-Engine, RAG enrichment, and new DB tables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 10:22:49 +01:00