1. Dockerfile: install Playwright as appuser (not root) so the Chromium
binary is accessible at runtime. The root install was causing a 500 error.
2. DSE service matching: text-search fallback when LLM extraction fails.
If "etracker" appears in the DSE (privacy policy) text, mark it as
documented even if the LLM did not parse the service list.
3. CMP skip: consent managers in category "cmp" are now skipped (not just
those in category "other" with id "cmp").
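Items 2 and 3 could look roughly like the sketch below. It is an illustration only: SERVICE_ALIASES, is_documented and should_skip are hypothetical names, and running the text search whenever the LLM list has no hit (not only when it is empty) is an assumption.

```python
import re

# Hypothetical alias table: service id -> names to search for in the DSE text.
SERVICE_ALIASES = {
    "etracker": ["etracker"],
    "google_analytics": ["Google Analytics"],
}


def is_documented(service_id, dse_text, llm_services=None):
    """Return True if the service appears to be covered by the DSE.

    llm_services is the list of service names the LLM extracted, or
    None / empty when extraction failed.
    """
    aliases = SERVICE_ALIASES.get(service_id, [service_id])

    # First choice: the service list parsed by the LLM.
    if llm_services:
        extracted = {name.casefold() for name in llm_services}
        if any(alias.casefold() in extracted for alias in aliases):
            return True

    # Fallback: case-insensitive whole-word search in the raw DSE text.
    for alias in aliases:
        pattern = r"\b" + re.escape(alias) + r"\b"
        if re.search(pattern, dse_text, re.IGNORECASE):
            return True
    return False


def should_skip(service):
    """Skip consent managers: category "cmp", or the older "other" + id "cmp"."""
    category = service.get("category")
    if category == "cmp":
        return True
    return category == "other" and service.get("id") == "cmp"
```

With this shape, a failed extraction (llm_services=None) still marks etracker as documented as long as the name occurs somewhere in the policy text.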
NOT DEPLOYED: the RAG pipeline is running.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /website-scan endpoint in consent-tester service:
- Real browser renders JavaScript (finds dynamic content)
- Clicks navigation menus (discovers hidden sub-pages like IHK DSB page)
- Follows links within DSE to find regional privacy policies
- Collects rendered HTML for each page (after JS execution)
Backend integration:
- agent_scan_routes tries Playwright first, falls back to httpx
- DSE text and HTML extracted from Playwright-rendered pages
- Service detection runs on rendered HTML (catches JS-loaded scripts)
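A minimal sketch of the "Playwright first, httpx fallback" order, assuming the two fetchers are passed in as callables. The names fetch_html, render and fetch_static are hypothetical and not taken from agent_scan_routes.

```python
from typing import Callable, Tuple


def fetch_html(
    url: str,
    render: Callable[[str], str],
    fetch_static: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (html, source), preferring the browser-rendered page.

    render stands in for the Playwright-backed /website-scan call,
    fetch_static for a plain httpx GET. Both are injected so the
    fallback logic can be exercised without a browser.
    """
    try:
        html = render(url)
        if html and html.strip():
            return html, "playwright"
    except Exception:
        # Assumption: any rendering failure (timeout, service down, crash)
        # should lead to the static fallback instead of failing the scan.
        pass
    return fetch_static(url), "httpx"
```

Returning the source alongside the HTML lets the caller record whether service detection ran on rendered or static markup.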
Also fixes:
- GA regex: G-[A-Z0-9]{8,12} prevents CSS class false positives
- etracker added to service registry
- External page scanning blocked (same-domain only)
- CSS/JS/image files excluded from page list
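The GA pattern and the page-list filters can be illustrated as follows. This is a sketch under assumptions: the word boundaries around the pattern, the exact-hostname comparison, and the extension list are not confirmed by the commit, and is_scannable is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Pattern from the commit, wrapped in word boundaries (an assumption) so that
# longer tokens are not matched in part. Uppercase-only, so lowercase CSS
# class names such as "G-header" do not match.
GA4_ID = re.compile(r"\bG-[A-Z0-9]{8,12}\b")

# Hypothetical list of static asset extensions to exclude from the page list.
ASSET_EXTENSIONS = (
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico",
)


def is_scannable(url, base_url):
    """Keep only same-domain URLs that do not point at static assets."""
    target = urlparse(url)
    base = urlparse(base_url)
    # Assumption: "same domain" means an exact hostname match.
    if target.hostname != base.hostname:
        return False
    # urlparse keeps the query string out of .path, so "app.js?v=3" is caught.
    return not target.path.lower().endswith(ASSET_EXTENSIONS)
```

Comparing hostnames exactly treats www.example.com and example.com as different sites; a real implementation may want to normalize that.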
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- dse_parser.py: HTML → structured sections (heading, number, content, parent).
  Uses heading hierarchy (h1-h4) with regex fallback.
- dse_matcher.py: matches detected services against DSE sections.
  Exact name → provider → category matching with insertion point suggestion.
- agent_scan_routes: TextReference model in findings (original text,
  section, paragraph, correction type, insert_after).
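The heading-hierarchy part of the parser might be sketched like this with the standard library only. It is not the code in dse_parser.py: the class name, the dict layout and the number regex are assumptions, the regex fallback for pages without headings is left out, and script/style text is not filtered.

```python
import re
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4")


class SectionParser(HTMLParser):
    """Split HTML into sections keyed by h1-h4 headings."""

    def __init__(self):
        super().__init__()
        self.sections = []       # dicts: heading, number, content, parent
        self._stack = []         # (level, heading) of currently open ancestors
        self._heading_level = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_level = int(tag[1])
            self._buf = []

    def handle_endtag(self, tag):
        if self._heading_level is None or tag != f"h{self._heading_level}":
            return
        level = self._heading_level
        title = " ".join("".join(self._buf).split())
        # Split a leading section number such as "2.4" or "3." from the title.
        m = re.match(r"(\d+(?:\.\d+)*)\.?\s+(.*)", title)
        number, heading = (m.group(1), m.group(2)) if m else ("", title)
        # Drop headings at the same or a deeper level; what is left is the parent.
        while self._stack and self._stack[-1][0] >= level:
            self._stack.pop()
        parent = self._stack[-1][1] if self._stack else None
        self.sections.append(
            {"heading": heading, "number": number, "content": "", "parent": parent}
        )
        self._stack.append((level, heading))
        self._heading_level = None

    def handle_data(self, data):
        if self._heading_level is not None:
            self._buf.append(data)
        elif self.sections:
            self.sections[-1]["content"] += data


def parse_sections(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    for section in parser.sections:
        section["content"] = " ".join(section["content"].split())
    return parser.sections
```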
Enables showing: "Google Analytics not found in DSE, insert after
Section 2.4 Cookies und Tracking"
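One way to read the name → provider → category cascade is sketched below. All class names, the CATEGORY_KEYWORDS table and the substring matching are hypothetical. The sketch also assumes that the category step is only used to pick an insertion point, not to count the service as documented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    number: str
    heading: str
    content: str


@dataclass
class Service:
    name: str
    provider: str
    category: str


@dataclass
class MatchResult:
    matched_by: Optional[str]         # "name", "provider", or None if not found
    section: Optional[Section]        # section that mentions the service
    insert_after: Optional[Section]   # suggested insertion point when not found


# Hypothetical mapping from service category to heading keywords.
CATEGORY_KEYWORDS = {"analytics": ["tracking", "analyse", "cookies"]}


def match_service(service: Service, sections: List[Section]) -> MatchResult:
    # Steps 1 and 2: look for the service name, then the provider name.
    for level, needle in (("name", service.name), ("provider", service.provider)):
        if not needle:
            continue
        for section in sections:
            haystack = (section.heading + " " + section.content).casefold()
            if needle.casefold() in haystack:
                return MatchResult(level, section, None)

    # Step 3: not documented; suggest the thematically closest section.
    keywords = CATEGORY_KEYWORDS.get(service.category, [])
    for section in sections:
        heading = section.heading.casefold()
        if any(keyword in heading for keyword in keywords):
            return MatchResult(None, None, section)
    return MatchResult(None, None, None)
```

For a Google Analytics service and a policy whose only related heading is "2.4 Cookies und Tracking", this returns no match and insert_after pointing at section 2.4, which is the information the finding message needs.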
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Email now lists all scanned URLs with checkmark/cross status.
Frontend shows collapsible "X Seiten gescannt — Details anzeigen".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SDK LLM chat returns empty content because of Qwen think-mode. A direct
Ollama /api/generate call with stream:false returns the full response,
including think tags, which we then strip.
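A minimal sketch of the direct call, using only the standard library. The function names, the default base URL (Ollama's usual local port) and the timeout are assumptions, and the actual service may use a different HTTP client.

```python
import json
import re
import urllib.request

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text):
    """Remove think-mode blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


def generate(prompt, model, base_url="http://localhost:11434"):
    """Call Ollama's /api/generate without streaming and return the clean answer."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With stream set to false, the whole completion arrives in "response".
    return strip_think(body.get("response", ""))
```

The pattern only removes closed blocks; an unterminated think tag (for example from a truncated response) is left in place and would need separate handling.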
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>