Benjamin Admin
f960bd052a
fix: Add missing 'import re' to agent_scan_routes.py
NameError: name 're' is not defined at line 146 — the import was
accidentally removed when extracting helper functions to agent_scan_helpers.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:59:53 +02:00
Benjamin Admin
48146cddaf
feat: DSI document discovery + completeness check in agent scan workflow
Agent scan now automatically:
1. Discovers all legal documents via consent-tester /dsi-discovery endpoint
2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum
3. Checks completeness against type-specific checklists:
   - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes,
     legal basis, recipients, third-country, retention, rights, complaint)
   - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction)
   - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences)
4. Adds findings per document to scan results
5. Shows discovered documents with completeness % in email summary
6. Returns discovered_documents list in API response
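The completeness check in step 3 can be sketched as a keyword checklist. This is only an illustration: the field keys, the German keyword patterns, and the function name `check_completeness` are assumptions made for this example and are not taken from dsi_document_checker.py.

```python
from __future__ import annotations

import re

# Hypothetical checklist for a DSE (privacy policy): one regex per mandatory
# field named in the commit message. The patterns are illustrative guesses.
DSE_CHECKLIST = {
    "controller": r"verantwortlich",
    "dpo": r"datenschutzbeauftragt",
    "purposes": r"zweck",
    "legal_basis": r"rechtsgrundlage",
    "recipients": r"empfänger",
    "third_country": r"drittland|drittstaat",
    "retention": r"speicherdauer|aufbewahr",
    "rights": r"auskunft|betroffenenrecht",
    "complaint": r"beschwerde|aufsichtsbehörde",
}


def check_completeness(text: str, checklist: dict[str, str]) -> tuple[int, list[str]]:
    """Return (completeness in percent, list of missing field keys)."""
    missing = [
        field
        for field, pattern in checklist.items()
        if not re.search(pattern, text, flags=re.IGNORECASE)
    ]
    found = len(checklist) - len(missing)
    percent = round(100 * found / len(checklist))
    return percent, missing


sample = (
    "Verantwortlicher ist die Example GmbH. "
    "Zweck der Verarbeitung ist die Vertragsabwicklung. "
    "Rechtsgrundlage ist Art. 6 DSGVO."
)
percent, missing = check_completeness(sample, DSE_CHECKLIST)
print(percent, missing)
```

Each missing field could then become one finding for the document, and the percentage is the kind of value an email summary could show per document.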
New files:
- dsi_document_checker.py (229 LOC) — checklists + classifier
- agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections
Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:10:13 +02:00
Benjamin Admin
b06a33a5fe
fix: syntax error — missing closing paren in scan summary builder
2026-04-28 17:41:11 +02:00
Benjamin Admin
6c0e76f96d
feat: show scanned pages in email summary + frontend (expandable list)
Email now lists all scanned URLs with checkmark/cross status.
Frontend shows collapsible "X Seiten gescannt — Details anzeigen".
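A minimal sketch of how the scanned-page list for the email could be rendered. The function name `format_scanned_pages`, the `(url, ok)` tuple shape, and the exact check and cross characters are assumptions for illustration, not the project's actual summary builder.

```python
from __future__ import annotations


def format_scanned_pages(pages: list[tuple[str, bool]]) -> str:
    """Render one line per scanned URL, prefixed with a check or cross mark."""
    header = f"{len(pages)} Seiten gescannt"
    lines = [f"{'✓' if ok else '✗'} {url}" for url, ok in pages]
    return "\n".join([header, *lines])


summary = format_scanned_pages(
    [
        ("https://example.com/", True),
        ("https://example.com/impressum", False),
    ]
)
print(summary)
```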
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 17:26:03 +02:00
Benjamin Admin
0106f3b5b6
fix: use Ollama directly for correction generation (bypass SDK think-mode)
SDK LLM chat returns empty content due to Qwen think-mode. Direct Ollama
/api/generate call with stream:false gets the full response including
think tags which we strip.
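The approach described above can be sketched as follows. The helper names, the `<think>...</think>` tag format, and the model name are assumptions for this example. The sketch only builds the request body and cleans a reply; it makes no HTTP call.

```python
import re

# Assumption: the model wraps its reasoning in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate with streaming disabled."""
    return {"model": model, "prompt": prompt, "stream": False}


def strip_think(text: str) -> str:
    """Remove think blocks and surrounding whitespace from a model reply."""
    return THINK_BLOCK.sub("", text).strip()


payload = build_generate_payload("example-model", "Korrigiere den Text.")
raw_reply = "<think>\nThe clause lacks a legal basis.\n</think>\n\nFixed text."
clean = strip_think(raw_reply)
print(payload["stream"], clean)
```

With `stream` set to false, Ollama returns a single JSON object whose `response` field holds the full generated text, which is what a helper like `strip_think` would be applied to.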
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 16:30:51 +02:00
Benjamin Admin
b175ad2594
fix: increase LLM timeouts for scan corrections (90s) and DSE extraction (120s)
Qwen 3.5:35b needs ~30-60s per call. Multi-call scan was timing out.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 16:05:35 +02:00
Benjamin Admin
711b9b3146
feat: website scanner with SOLL/IST service comparison + corrections
- website_scanner.py: multi-page crawl, 20+ service patterns (tracking,
  CDN, chatbots, payment, fonts, captcha, video), AI text detection
- dse_service_extractor.py: LLM extracts services from privacy policy text
- agent_scan_routes.py: POST /agent/scan — combines scan + DSE comparison,
  generates findings (undocumented, outdated, third-country transfer),
  auto-corrections via Qwen in pre-launch mode
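The comparison between detected and documented services can be sketched with plain set operations. This assumes the crawl yields a set of detected service names and the DSE extraction yields a set of documented ones. The function name `compare_services`, the finding type strings, and the third-country set are hypothetical.

```python
from __future__ import annotations


def compare_services(
    detected: set[str],
    documented: set[str],
    third_country: set[str],
) -> list[dict[str, str]]:
    """Derive findings from services found on the site vs. named in the DSE."""
    findings: list[dict[str, str]] = []
    # Found on the website but not mentioned in the privacy policy.
    for name in sorted(detected - documented):
        findings.append({"type": "undocumented", "service": name})
    # Mentioned in the privacy policy but no longer found on the website.
    for name in sorted(documented - detected):
        findings.append({"type": "outdated", "service": name})
    # Found on the website and operated from a third country.
    for name in sorted(detected & third_country):
        findings.append({"type": "third_country_transfer", "service": name})
    return findings


findings = compare_services(
    detected={"google_analytics", "stripe"},
    documented={"stripe", "matomo"},
    third_country={"google_analytics"},
)
print(findings)
```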
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 15:35:31 +02:00