feat: RAG-based document verification against 144K Control Library

New module: rag_document_checker.py
- Searches RAG (Qdrant) for controls relevant to document type
- Filters by regulation (DSGVO Art.13, TDDDG §25, BGB §355 etc.)
- LLM (Qwen 3.5:35b) verifies each control against document text
- Returns fulfilled/missing with evidence text + severity
- Supports: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept

Integration in doc-check endpoint:
- Regex checklist runs first (fast, deterministic)
- RAG checks run after (semantic, catches what regex misses)
- Both results combined in single response

LLM prompt returns JSON: {fulfilled, evidence, issue, severity}
Think-tags stripped, JSON extracted from response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
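
The response handling described in the message (strip think-tags, then pull the JSON object out of the reply) might look roughly like the sketch below. It is not the code from rag_document_checker.py: the function name `parse_llm_verdict` is hypothetical, and the `<think>...</think>` tag format is an assumption about how the model marks its reasoning.

```python
import json
import re
from typing import Optional

# Assumption: the model wraps its reasoning in <think>...</think> before the answer.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def parse_llm_verdict(raw: str) -> Optional[dict]:
    """Strip think-tags and return the first JSON object in the reply, or None.

    Hypothetical helper; the real module may parse the reply differently.
    """
    cleaned = _THINK_RE.sub("", raw)
    decoder = json.JSONDecoder()
    # Try every '{' as a candidate start. raw_decode parses one JSON value
    # beginning at the given index and ignores any text after it.
    for match in re.finditer(r"\{", cleaned):
        try:
            obj, _end = decoder.raw_decode(cleaned, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None
```

Returning `None` instead of raising lets the caller decide whether an unparseable reply counts as a failed control or is simply skipped.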
@@ -198,6 +198,24 @@ async def _check_single_document(entry: DocCheckEntry) -> list[DocCheckResult]:

     # Main document check (full text against primary type)
     main_result = _run_checklist(doc_text, entry.doc_type, entry.label, entry.url, word_count)
+
+    # RAG-based deep check (semantic verification against Control Library)
+    try:
+        from compliance.services.rag_document_checker import check_document_with_rag
+        rag_checks = await check_document_with_rag(
+            doc_text, entry.doc_type, entry.label, entry.url,
+        )
+        if rag_checks:
+            for rc in rag_checks:
+                main_result.checks.append(CheckItem(
+                    id=rc["id"], label=rc["label"], passed=rc["passed"],
+                    severity=rc["severity"], matched_text=rc.get("matched_text", ""),
+                ))
+                if not rc["passed"]:
+                    main_result.findings_count += 1
+    except Exception as e:
+        logger.warning("RAG check failed for %s: %s", entry.label, e)
+
     all_results.append(main_result)

     # Sub-section checks (auto-detected from headings)
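
The merge step in this hunk can be tried in isolation with a sketch like the one below. The `CheckItem` and `DocCheckResult` classes are hypothetical stand-ins holding only the fields the hunk touches, and `merge_rag_checks` is an illustrative name, not a function in the repository.

```python
from dataclasses import dataclass, field


@dataclass
class CheckItem:  # hypothetical stand-in for the project's CheckItem
    id: str
    label: str
    passed: bool
    severity: str
    matched_text: str = ""


@dataclass
class DocCheckResult:  # hypothetical stand-in, only the fields used here
    checks: list = field(default_factory=list)
    findings_count: int = 0


def merge_rag_checks(result, rag_checks):
    """Append RAG checks to a checklist result; each failed check adds one finding."""
    for rc in rag_checks or []:  # tolerate None or an empty list
        result.checks.append(CheckItem(
            id=rc["id"], label=rc["label"], passed=rc["passed"],
            severity=rc["severity"], matched_text=rc.get("matched_text", ""),
        ))
        if not rc["passed"]:
            result.findings_count += 1
    return result


# Usage: one regex check already present, two RAG checks merged in.
result = DocCheckResult(checks=[CheckItem("regex-1", "Controller named", True, "high")])
merge_rag_checks(result, [
    {"id": "rag-1", "label": "Legal basis", "passed": False, "severity": "high"},
    {"id": "rag-2", "label": "Retention period", "passed": True,
     "severity": "medium", "matched_text": "stored for 6 months"},
])
```

Because the regex results are already in `result.checks` when the RAG checks are appended, both sets end up in the same response, which matches the "both results combined" note in the commit message.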