Phase 1 (LLM Quality):
- Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill)
- Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts

Phase 2 (Retrieval Quality):
- Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go)
- Fallback to dense-only search if Query API unavailable
- Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default)
- CPU-only PyTorch dependency to keep Docker image small

Phase 3 (Data Layer):
- Cross-regulation dedup pass (threshold 0.95) links controls across regulations
- DedupResult.link_type field distinguishes dedup_merge vs cross_regulation
- Chunk size defaults updated 512/50 → 1024/128 for new ingestions only
- Existing collections and controls are NOT affected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
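For readers unfamiliar with the RRF fusion mentioned under Phase 2, the following is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists. It is illustrative only: in this commit the fusion is performed by Qdrant's Query API, not by application code, and the function name `rrf_fuse`, the sample document IDs, and the constant `k=60` (the value commonly used in the RRF literature) are assumptions, not taken from the repository.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a dense (vector) and a sparse (text) search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that appear in both lists (`doc_a`, `doc_b`) rise above those found by only one retriever, which is the behavior hybrid search relies on.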
38 lines
940 B
Python
"""
Configuration for RAG Service
"""

from pydantic_settings import BaseSettings
from typing import Optional


class Settings(BaseSettings):
    """Application settings loaded from environment variables."""

    # Service
    environment: str = "development"
    port: int = 8082

    # Qdrant
    qdrant_url: str = "http://localhost:6333"
    qdrant_collection: str = "legal_documents"

    # Ollama
    ollama_url: str = "http://localhost:11434"
    embedding_model: str = "bge-m3"
    llm_model: str = "qwen2.5:32b"

    # Document Processing
    # NOTE: Changed from 512/50 to 1024/128 for improved retrieval quality.
    # Existing collections (ingested with 512/50) are NOT affected;
    # new settings apply only to new ingestions.
    chunk_size: int = 1024
    chunk_overlap: int = 128

    # Legal Corpus
    corpus_path: str = "./legal-corpus"

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"
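To make the effect of the `chunk_size` / `chunk_overlap` defaults concrete, here is a minimal sketch of a sliding-window chunker. It assumes the sizes are measured in characters and that chunks are taken at fixed offsets; the service's actual splitter is not shown in this file and may count tokens or respect sentence boundaries instead. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1024, chunk_overlap=128):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters. Character-based
    splitting is an assumption made for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # 896 with the defaults above
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# A 2500-character sample with a repeating alphabet so overlaps are checkable.
sample = "".join(chr(97 + i % 26) for i in range(2500))
chunks = chunk_text(sample)
chunk_lengths = [len(c) for c in chunks]
```

With the new defaults each window advances by 896 characters, so the 2500-character sample yields three chunks, and the last 128 characters of one chunk reappear at the start of the next.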