breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	f960bd052a	fix: Add missing 'import re' to agent_scan_routes.py. Fixes NameError: name 're' is not defined at line 146; the import was accidentally removed when extracting helper functions to agent_scan_helpers.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 22:59:53 +02:00
Benjamin Admin	48146cddaf	feat: DSI document discovery + completeness check in agent scan workflow. Agent scan now automatically: (1) discovers all legal documents via the consent-tester /dsi-discovery endpoint; (2) classifies each as DSE (privacy policy), AGB (terms and conditions), Widerruf (withdrawal notice), Cookie, or Impressum (legal notice); (3) checks completeness against type-specific checklists: DSE: 9 mandatory fields under Art. 13 DSGVO (GDPR) (controller, DPO, purposes, legal basis, recipients, third-country, retention, rights, complaint); AGB: §305ff BGB (scope, contract formation, liability, jurisdiction); Widerruf: §355 BGB (right info, 14-day deadline, form, consequences); (4) adds findings per document to scan results; (5) shows discovered documents with completeness % in email summary; (6) returns discovered_documents list in API response. New files: dsi_document_checker.py (229 LOC), checklists + classifier; agent_scan_helpers.py (109 LOC), extracted summary builder + corrections. Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 22:10:13 +02:00
Benjamin Admin	b06a33a5fe	fix: syntax error — missing closing paren in scan summary builder	2026-04-28 17:41:11 +02:00
Benjamin Admin	6c0e76f96d	feat: show scanned pages in email summary + frontend (expandable list). Email now lists all scanned URLs with checkmark/cross status. Frontend shows a collapsible "X Seiten gescannt — Details anzeigen" ("X pages scanned, show details") section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 17:26:03 +02:00
Benjamin Admin	0106f3b5b6	fix: use Ollama directly for correction generation (bypass SDK think-mode). SDK LLM chat returns empty content due to Qwen think-mode. A direct Ollama /api/generate call with stream:false returns the full response, including think tags, which we strip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 16:30:51 +02:00
Benjamin Admin	b175ad2594	fix: increase LLM timeouts for scan corrections (90s) and DSE extraction (120s). Qwen 3.5:35b needs ~30-60s per call, so the multi-call scan was timing out. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 16:05:35 +02:00
Benjamin Admin	711b9b3146	feat: website scanner with SOLL/IST (target/actual) service comparison + corrections. website_scanner.py: multi-page crawl, 20+ service patterns (tracking, CDN, chatbots, payment, fonts, captcha, video), AI text detection; dse_service_extractor.py: LLM extracts services from privacy policy text; agent_scan_routes.py: POST /agent/scan combines scan + DSE comparison, generates findings (undocumented, outdated, third-country transfer), and produces auto-corrections via Qwen in pre-launch mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:35:31 +02:00
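
The SOLL/IST comparison described in commit 711b9b3146 can be pictured roughly as follows. This is only a minimal sketch, not the code in website_scanner.py: the three patterns, the function names `detect_services` and `compare_services`, and the reading of "outdated" as "declared in the privacy policy but not found on the site" are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration; the commit mentions 20+ real ones.
SERVICE_PATTERNS = {
    "Google Analytics": re.compile(r"googletagmanager\.com|google-analytics\.com", re.I),
    "Google Fonts": re.compile(r"fonts\.googleapis\.com", re.I),
    "YouTube": re.compile(r"youtube(-nocookie)?\.com/embed", re.I),
}


def detect_services(html: str) -> set[str]:
    """Return the names of all services whose pattern occurs in the page HTML (IST)."""
    return {name for name, pattern in SERVICE_PATTERNS.items() if pattern.search(html)}


def compare_services(declared: set[str], detected: set[str]) -> dict[str, list[str]]:
    """Compare declared services (SOLL) with detected services (IST).

    Assumption: "undocumented" means detected but not declared, and
    "outdated" means declared but no longer detected.
    """
    return {
        "undocumented": sorted(detected - declared),
        "outdated": sorted(declared - detected),
    }
```

In this sketch a page that loads Google Tag Manager while the privacy policy only names Google Fonts and YouTube would yield one undocumented service and, if no YouTube embed is present, one outdated entry.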

7 Commits
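
The workaround from commit 0106f3b5b6 (calling Ollama's /api/generate directly with stream:false and stripping the think tags) might look something like the sketch below. It is not the repository's code: the helper names `strip_think_tags` and `generate`, the `<think>...</think>` tag format, and the default base URL are assumptions; the 90-second default mirrors the scan-correction timeout named in commit b175ad2594.

```python
import json
import re
import urllib.request

# Assumption: the model wraps its reasoning in <think>...</think> blocks.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove think blocks and surrounding whitespace from a model response."""
    return THINK_BLOCK.sub("", text).strip()


def generate(prompt: str, model: str,
             base_url: str = "http://localhost:11434",  # assumed local Ollama address
             timeout: float = 90.0) -> str:
    """POST to Ollama's /api/generate with streaming disabled and return cleaned text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With stream:false the full completion arrives in the "response" field.
    return strip_think_tags(body.get("response", ""))
```

Keeping the stripping step in its own function makes it testable without a running Ollama instance.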