breakpilot-compliance

Benjamin_Boenisch/breakpilot-compliance

Commit Graph

Author

SHA1

Message

Date

Benjamin Admin

c52dbdb8f1

feat(rag): optimize RAG pipeline — JSON-Mode, CoT, Hybrid Search, Re-Ranking, Cross-Reg Dedup, chunk 1024

Some checks failed:
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 42s
CI/CD / test-python-backend-compliance (push) Successful in 1m38s
CI/CD / test-python-document-crawler (push) Successful in 20s
CI/CD / test-python-dsms-gateway (push) Successful in 17s
CI/CD / validate-canonical-controls (push) Successful in 10s
CI/CD / Deploy (push) Has been skipped

Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts
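
The Phase 1 changes can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the model name, and the wording of the system prompt are hypothetical. The only parts taken from Ollama itself are the `format`, `system`, `prompt`, and `stream` request fields and the `response` field of a non-streaming reply.

```python
import json

# Hypothetical system prompt: numbered analysis steps stand in for the
# Chain-of-Thought instructions, and the answer must be a JSON object.
SYSTEM_PROMPT = (
    "You extract legal obligations from regulation text.\n"
    "Work through these steps before answering:\n"
    "1. Identify who is obligated.\n"
    "2. Identify the required action.\n"
    "3. Identify conditions and deadlines.\n"
    'Answer with a JSON object: {"analysis": "...", "obligations": [...]}'
)


def build_ollama_payload(model: str, user_prompt: str) -> dict:
    """Build a request body for Ollama's generate endpoint with JSON mode on."""
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": user_prompt,
        "format": "json",   # constrains the model output to valid JSON
        "stream": False,    # one reply object instead of a token stream
    }


def parse_ollama_reply(body: str) -> dict:
    """Decode a non-streaming reply; the model output is a JSON string
    nested inside the 'response' field of the outer JSON envelope."""
    envelope = json.loads(body)
    return json.loads(envelope["response"])


if __name__ == "__main__":
    payload = build_ollama_payload("example-model", "Art. 13: Manufacturers shall ...")
    print(json.dumps(payload, indent=2))
```

With JSON mode enabled, the caller can parse the reply directly instead of stripping prose or code fences around the JSON first.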

Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go)
- Fallback to dense-only search if Query API unavailable
- Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default)
- CPU-only PyTorch dependency to keep Docker image small
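
For readers unfamiliar with RRF, the fusion step can be sketched in a few lines. In the commit, Qdrant performs the fusion server-side; the function below only illustrates what reciprocal rank fusion computes. The function name and the sample IDs are hypothetical, and the constant `k=60` is the value commonly used for RRF, an assumption here rather than something the commit states.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by ID to keep output stable.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]   # hypothetical dense (vector) ranking
    text_hits = ["b", "d", "a"]    # hypothetical full-text ranking
    print(rrf_fuse([dense_hits, text_hits]))  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, the dense similarity scores and the text-match scores never need to be put on a common scale, which is why it suits hybrid search.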

Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected
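
The cross-regulation dedup pass might look roughly like the sketch below. Only the 0.95 threshold, the `DedupResult.link_type` field, and its two values come from the commit message. The other fields, the dictionary layout of a control, the use of cosine similarity, and the assumption that same-regulation merges use the same threshold are all illustrative guesses.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class DedupResult:
    matched_id: Optional[str]   # hypothetical field: ID of the best match
    similarity: float           # hypothetical field: best similarity found
    link_type: Optional[str]    # "dedup_merge", "cross_regulation", or None


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(candidate, existing, threshold=0.95):
    """Find the most similar existing control and decide how to link it."""
    best, best_sim = None, 0.0
    for control in existing:
        sim = cosine(candidate["embedding"], control["embedding"])
        if sim > best_sim:
            best, best_sim = control, sim
    if best is None or best_sim < threshold:
        return DedupResult(None, best_sim, None)  # no link: treat as new
    same_regulation = best["regulation"] == candidate["regulation"]
    link_type = "dedup_merge" if same_regulation else "cross_regulation"
    return DedupResult(best["id"], best_sim, link_type)


if __name__ == "__main__":
    existing = [{"id": "C1", "regulation": "NIS2", "embedding": [1.0, 0.0]}]
    candidate = {"id": "C2", "regulation": "CRA", "embedding": [1.0, 0.01]}
    print(classify(candidate, existing).link_type)  # cross_regulation
```

Keeping the two outcomes in one `link_type` field lets downstream code link near-identical controls from different regulations without merging them into one record.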

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-21 11:49:43 +01:00

Benjamin Admin

dd09fa7a46

feat: CRA wiki, cybersecurity policy template, Phase H RAG ingestion

All checks were successful:
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 35s
CI/CD / test-python-backend-compliance (push) Successful in 33s
CI/CD / test-python-document-crawler (push) Successful in 22s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 12s
CI/CD / Deploy (push) Successful in 2s

- Wiki: add CRA category with 3 articles (Grundlagen [fundamentals], 35 Security Controls,
  CRA+NIS2+AI Act Framework)
- Document Generator: add CRA-compliant Cybersecurity Policy template with
  21 sections covering governance, SSDLC, vulnerability management,
  incident response (24h/72h), SBOM, patch management
- RAG: ingest Phase H; 17 EU regulations + 2 NIST frameworks are now in Qdrant
  (CRA, AI Act, NIS2, GDPR (DSGVO), DMA, GPSR, Battery Regulation, etc.)
- Phase H script: add scripts/ingest-phase-h.sh for reproducible ingestion
- rag-sources.md: update status to "ingestiert" (ingested), add CRA entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-15 00:43:46 +01:00

2 Commits