feat: Document-centric scan results + DSI deduplication
DSI Dedup (consent-tester):
- Only H1/H2 headings count as documents (not H3/H4 sub-sections)
- Sub-sections (Cookies, Betroffenenrechte (data subject rights), Social Media) are part of the parent document's full text, not separate documents
- Reduces the IHK result from 30 to ~11 real documents

Backend (agent_scan_routes):
- ScanFinding gets a doc_title field linking each finding to its document
- doc_title is set when creating DSI findings, for document attribution

Frontend (ScanResult.tsx):
- 3 sections: Services table, Document cards, General findings
- Documents: expandable cards with a completeness bar (green/yellow/red)
- Findings grouped under their parent document
- Each card shows: title, word count, findings count, % completeness
- Findings without doc_title go to the "Allgemeine Findings" (general findings) section

Email Summary (agent_scan_helpers):
- Findings listed under their parent document
- General findings in a separate section
- No more flat mixed list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
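The heading rule in the DSI Dedup notes can be sketched as a single pass over the page's headings. This is a minimal illustration, not the consent-tester code: the `Section`/`Document` types, the `merge_subsections` name and the `max_doc_level` parameter are hypothetical, and the sketch assumes the scraper already yields headings in page order with their level and body text.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One heading-delimited chunk of a scraped page (hypothetical shape)."""
    level: int   # 1 for H1, 2 for H2, 3 for H3, ...
    title: str
    text: str


@dataclass
class Document:
    title: str
    parts: list[str] = field(default_factory=list)

    @property
    def full_text(self) -> str:
        return "\n\n".join(self.parts)

    @property
    def word_count(self) -> int:
        return len(self.full_text.split())


def merge_subsections(sections: list[Section], max_doc_level: int = 2) -> list[Document]:
    """Treat only H1/H2 headings as document boundaries.

    Deeper sections (e.g. "Cookies", "Betroffenenrechte") are appended to the
    full text of the most recent H1/H2 document instead of becoming documents.
    """
    docs: list[Document] = []
    for sec in sections:
        if sec.level <= max_doc_level or not docs:
            # New document boundary; a leading sub-section with no parent
            # also starts a document so its text is not dropped.
            docs.append(Document(title=sec.title, parts=[sec.text]))
        else:
            docs[-1].parts.append(f"{sec.title}\n{sec.text}")
    return docs
```

With sections H1 "Datenschutzerklärung", H3 "Cookies", H3 "Betroffenenrechte" and H2 "Impressum", this yields two documents, and both H3 bodies end up in the first document's `full_text` (and therefore in its word count). A leading sub-section with no H1/H2 above it is kept as its own document here; the commit does not say how the real implementation handles that case.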
@@ -79,6 +79,7 @@ class ScanFinding(BaseModel):
 severity: str
 text: str
 correction: str = ""
+doc_title: str = ""
 text_reference: TextReferenceModel | None = None

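Because `doc_title` defaults to `""`, every existing `ScanFinding(...)` call stays valid, and "no document" is simply a falsy title. The grouping described for the frontend document cards and the "Allgemeine Findings" section can be sketched in Python as below; a stdlib `dataclass` stands in for the Pydantic model, and `group_by_document` is a hypothetical helper, not part of the diff.

```python
from dataclasses import dataclass


@dataclass
class ScanFinding:
    # Stand-in for the Pydantic model in the hunk above (only the fields used here).
    code: str
    severity: str
    text: str
    correction: str = ""
    doc_title: str = ""  # "" means: not attributed to any document


def group_by_document(
    findings: list[ScanFinding],
) -> tuple[dict[str, list[ScanFinding]], list[ScanFinding]]:
    """Split findings into per-document buckets and a general bucket.

    Documents keep first-seen order (dicts preserve insertion order),
    so cards can render in the order the scan produced them.
    """
    by_doc: dict[str, list[ScanFinding]] = {}
    general: list[ScanFinding] = []
    for f in findings:
        if f.doc_title:
            by_doc.setdefault(f.doc_title, []).append(f)
        else:
            general.append(f)
    return by_doc, general
```

Each key of `by_doc` corresponds to one document card (its list length is the card's findings count), and `general` feeds the separate general-findings section.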
@@ -264,6 +265,7 @@ async def _execute_scan(req: ScanRequest, scan_id: str = "") -> ScanResponse:
 if "SCORE" not in df.get("code", ""):
 dsi_findings.append(ScanFinding(
 code=df["code"], severity=df["severity"], text=df["text"],
+doc_title=doc["title"],
 ))
 except Exception as e:
 logger.warning("DSI discovery failed: %s %s", type(e).__name__, e)

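End to end, the hunk's filter (skip codes containing `SCORE`, tag the rest with `doc["title"]`) feeds the per-document email summary. The sketch below is an illustration under assumptions: it treats each discovered document as a dict whose findings sit under a `"findings"` key, and that key, the function names and the plain-text layout are hypothetical. The `title`, `code`, `severity` and `text` keys are the ones the hunk reads.

```python
def collect_dsi_findings(docs: list[dict]) -> list[dict]:
    """Tag each DSI finding with its parent document, skipping score entries.

    Assumes a hypothetical "findings" list on each doc dict.
    """
    findings: list[dict] = []
    for doc in docs:
        for df in doc.get("findings", []):
            if "SCORE" in df.get("code", ""):
                continue  # score entries are not reported as findings
            findings.append({
                "code": df["code"],
                "severity": df["severity"],
                "text": df["text"],
                "doc_title": doc["title"],
            })
    return findings


def render_summary(findings: list[dict]) -> str:
    """Plain-text email body: findings under their document, then general ones."""
    by_doc: dict[str, list[dict]] = {}
    general: list[dict] = []
    for f in findings:
        if f.get("doc_title"):
            by_doc.setdefault(f["doc_title"], []).append(f)
        else:
            general.append(f)

    lines: list[str] = []
    for title, items in by_doc.items():
        lines.append(f"{title} ({len(items)})")
        lines.extend(f"  - [{f['severity']}] {f['text']}" for f in items)
        lines.append("")  # blank line between document blocks
    if general:
        lines.append("Allgemeine Findings")
        lines.extend(f"  - [{f['severity']}] {f['text']}" for f in general)
    return "\n".join(lines).rstrip()
```

Grouping with `dict.setdefault` keeps documents in the order the scan found them, which replaces the previous flat mixed list with one block per document followed by the general section.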