Two fixes:
1. consent-tester: full_text truncation raised from 10,000 to 50,000 chars
(the IHK Internetangebot page has ~50K chars, and the Beschwerderecht section
came after the 10K cutoff)
2. Backend: dse_text now combines the Playwright HTML with ALL DSI discovery
texts for mandatory content checking. Previously it used only the first 8K
chars from a single source, so Verantwortlicher/DSB details that appeared
only in DSI documents were missed.
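A minimal sketch of the combined input for the mandatory content check follows. It assumes the Playwright HTML has already been reduced to plain text. The names `build_dse_text`, `truncate_full_text` and `MAX_FULL_TEXT_CHARS` are hypothetical, and skipping identical texts is an assumption, not something stated above.

```python
# Hypothetical names; only the 50,000-char limit comes from the commit message.
MAX_FULL_TEXT_CHARS = 50_000  # consent-tester full_text limit (was 10,000)


def truncate_full_text(text: str, limit: int = MAX_FULL_TEXT_CHARS) -> str:
    """Cut a discovered document's text down to the configured limit."""
    return text[:limit]


def build_dse_text(playwright_text: str, dsi_texts: list[str]) -> str:
    """Combine the Playwright DSE text with ALL DSI discovery texts.

    Empty sources are skipped, and identical texts are kept only once
    (assumption). This way the checker sees every available source rather
    than the first 8K chars of a single one.
    """
    parts: list[str] = []
    seen: set[str] = set()
    for text in [playwright_text, *dsi_texts]:
        cleaned = text.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            parts.append(cleaned)
    return "\n\n".join(parts)
```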
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: When all 9 Art. 13 checks passed (100%), no SCORE finding
was created (it was guarded by 'if pct < 100'). The backend then defaulted
to completeness=0 because it reads the % from the SCORE finding.
Fix: Always generate SCORE finding, even at 100%. Added 'OK' severity
for fully compliant documents.
This caused 8 documents to show '0% MANGELHAFT' even though they
contained all required information.
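The fix can be sketched as follows. This is a minimal sketch that assumes the SCORE finding carries the percentage in its message text. The `Finding` fields, the message format and the non-OK severity are hypothetical. It shows why a missing SCORE finding used to turn into 0%.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    code: str
    severity: str
    message: str  # hypothetical format: "Art. 13 completeness: <pct>%"


def score_finding(passed: int, total: int = 9) -> Finding:
    """Always emit a SCORE finding, including at 100% (no 'if pct < 100' guard)."""
    pct = round(100 * passed / total) if total else 0
    # 'OK' marks a fully compliant document; 'HIGH' for the rest is an assumption.
    severity = "OK" if pct == 100 else "HIGH"
    return Finding(code="SCORE", severity=severity, message=f"Art. 13 completeness: {pct}%")


def completeness_from_findings(findings: list[Finding]) -> int:
    """Read the percentage back from the SCORE finding (hypothetical parsing)."""
    for f in findings:
        if f.code == "SCORE":
            return int(f.message.rsplit(" ", 1)[-1].rstrip("%"))
    return 0  # the old default that produced '0% MANGELHAFT'
```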
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1 fix: When merging documents with identical word_count, prefer
titles starting with 'Datenschutzinformation' over generic section
headings like 'Zweck und Rechtsgrundlage'. This restores the main
'Datenschutzinformationen zum Internetangebot' document.
Bug 2 fix: After navigating to a document page, wait 3s (was 2s) for
JS content loading, then try 10+ content selectors before falling back
to body text (with nav/header/footer removed). Handles IHK-style JS
navigation where content loads after page.goto() completes.
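The tie-break in Bug 1 can be sketched as below. This assumes that documents with the same word_count are treated as duplicates of one another. The `DsiDocument` type and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DsiDocument:
    title: str
    word_count: int
    url: str


PREFERRED_PREFIX = "datenschutzinformation"


def _title_rank(doc: DsiDocument) -> int:
    # 0 sorts first: e.g. 'Datenschutzinformationen zum Internetangebot'
    return 0 if doc.title.strip().lower().startswith(PREFERRED_PREFIX) else 1


def merge_duplicates(docs: list[DsiDocument]) -> list[DsiDocument]:
    """Keep one document per word_count, preferring 'Datenschutzinformation...' titles."""
    by_count: dict[int, DsiDocument] = {}
    for doc in docs:
        current = by_count.get(doc.word_count)
        if current is None or _title_rank(doc) < _title_rank(current):
            by_count[doc.word_count] = doc
    return list(by_count.values())
```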
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- localStorage-based scan history (persists across sessions)
- Each completed scan adds entry: URL, timestamp, findings count, docs count
- 'Letzte Scans' section below results shows clickable history entries
- Clicking an entry loads its URL into the form (and shows the cached result for the same URL)
- Max 20 entries, deduplicates by URL (latest scan wins)
- History visible in 'Website-Scan' tab
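The real history logic is frontend code working against localStorage. The sketch below models only the list update, in Python for consistency with the other examples, and uses a JSON string to stand in for the stored value. The entry field names and `add_to_history` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

MAX_ENTRIES = 20


@dataclass
class ScanHistoryEntry:
    url: str
    timestamp: str
    findings_count: int
    docs_count: int


def add_to_history(raw: Optional[str], entry: ScanHistoryEntry) -> str:
    """Return the new serialized history with `entry` first.

    Older entries for the same URL are dropped (latest scan wins), and the
    list is capped at MAX_ENTRIES.
    """
    history = json.loads(raw) if raw else []
    history = [e for e in history if e["url"] != entry.url]
    history.insert(0, asdict(entry))
    return json.dumps(history[:MAX_ENTRIES])
```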
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- URL, mode, tab, scan result persisted in localStorage
- Active scan_id stored — polling resumes when returning to page
- Scan results survive navigation to other SDK modules
- 'Scan laeuft noch...' shown when returning to in-progress scan
- Cleans up localStorage when scan completes or fails
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DSI Dedup (consent-tester):
- Only H1/H2 headings count as documents (not H3/H4 sub-sections)
- Sub-sections (Cookies, Betroffenenrechte, Social Media) are part of
parent document's full text, not separate documents
- Reduces IHK result from 30 to ~11 real documents
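The heading rule can be sketched like this. It assumes the page has already been split into (heading level, title, text) sections. The types and `group_into_documents` are hypothetical. Treating a leading H3/H4 section as its own document when no H1/H2 precedes it is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    level: int  # 1..6 for h1..h6
    title: str
    text: str


@dataclass
class Document:
    title: str
    parts: list[str] = field(default_factory=list)

    @property
    def full_text(self) -> str:
        return "\n".join(self.parts)


def group_into_documents(sections: list[Section]) -> list[Document]:
    """Only H1/H2 start a new document; H3/H4 sub-sections join the parent's text."""
    docs: list[Document] = []
    for s in sections:
        if s.level <= 2 or not docs:
            docs.append(Document(title=s.title, parts=[s.text]))
        else:
            docs[-1].parts.append(f"{s.title}\n{s.text}")
    return docs
```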
Backend (agent_scan_routes):
- ScanFinding gets doc_title field linking each finding to its document
- doc_title set when creating DSI findings for document attribution
Frontend (ScanResult.tsx):
- 3 sections: Services table, Document cards, General findings
- Documents: expandable cards with completeness bar (green/yellow/red)
- Findings grouped under their parent document
- Each card shows: title, word count, findings count, % completeness
- Findings without doc_title go to "Allgemeine Findings" section
Email Summary (agent_scan_helpers):
- Findings listed under their parent document
- General findings in separate section
- No more flat mixed list
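The doc_title-based grouping shared by the cards and the email summary might look like the following. This is a sketch only: the `ScanFinding` fields other than doc_title and the exact summary layout are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanFinding:
    code: str
    message: str
    doc_title: Optional[str] = None  # links the finding to its document


def group_findings(findings: list[ScanFinding]):
    """Split findings into per-document buckets and general findings."""
    by_doc: dict[str, list[ScanFinding]] = defaultdict(list)
    general: list[ScanFinding] = []
    for f in findings:
        if f.doc_title:
            by_doc[f.doc_title].append(f)
        else:
            general.append(f)
    return dict(by_doc), general


def render_email_summary(findings: list[ScanFinding]) -> str:
    """List findings under their parent document, then the general ones."""
    by_doc, general = group_findings(findings)
    lines: list[str] = []
    for title, items in by_doc.items():
        lines.append(f"{title} ({len(items)} findings)")
        lines.extend(f"  - {f.message}" for f in items)
    if general:
        lines.append("Allgemeine Findings")
        lines.extend(f"  - {f.message}" for f in general)
    return "\n".join(lines)
```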
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- scanProgress state tracks live progress (not mixed into scanData)
- ScanResult only renders when scanData.services exists (prevents crash)
- Purple progress bar with spinner shows current step during scan
- Fixes: TypeError on 's.services.filter' when progress data was set as scanData
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CE risk assessment data capture with 3 selectable input modes:
1. Interview mode (chat-style): questions are asked one at a time,
as in a customer conversation. The answer history is visible.
2. Wizard mode: step by step through 8 sections.
3. Form mode: all sections as an accordion on a single page.
20 structured questions in 8 sections:
- Machine description (name, type, assemblies)
- Life phases (operation, setup, maintenance)
- Intended use
- Foreseeable misuse
- User qualification
- Spatial/temporal limits
- Technical data (forces, voltages, temperatures, rotational speeds)
- Environmental conditions
answersToNarrativeText() converts all answers into the free text
that is sent to POST /parse-narrative.
Results panel shows: components, hazards, patterns, energy sources.
URL: /sdk/iace/[projectId]/interview
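answersToNarrativeText() itself is frontend code, and its output format is not described here. The Python sketch below only illustrates the idea: it walks the 8 sections in order and emits the non-empty answers as labeled free text. The section keys, headings and the "question: answer" layout are all hypothetical.

```python
# Hypothetical section keys; the headings follow the 8 sections listed above.
SECTION_ORDER = [
    ("machine", "Machine description"),
    ("life_phases", "Life phases"),
    ("intended_use", "Intended use"),
    ("misuse", "Foreseeable misuse"),
    ("qualification", "User qualification"),
    ("limits", "Spatial/temporal limits"),
    ("technical_data", "Technical data"),
    ("environment", "Environmental conditions"),
]


def answers_to_narrative_text(answers: dict[str, dict[str, str]]) -> str:
    """Turn {section_key: {question: answer}} into one free-text narrative."""
    paragraphs: list[str] = []
    for key, heading in SECTION_ORDER:
        section = answers.get(key, {})
        filled = [f"{q}: {a.strip()}" for q, a in section.items() if a and a.strip()]
        if filled:  # skip sections without any answered question
            paragraphs.append(heading + "\n" + "\n".join(filled))
    return "\n\n".join(paragraphs)
```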
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fundamental fix: scans now run asynchronously with progress polling.
Backend:
- POST /scan starts background task, returns scan_id immediately
- GET /scan/{scan_id} returns status + progress + result when done
- 7 progress steps shown: Website scan, DSI discovery, DSE analysis,
SOLL/IST comparison, corrections, report, email
- In-memory job store (dict with scan_id → status/result)
- No server-side timeout on scan duration
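The backend side of this flow could look roughly like the sketch below. Plain asyncio stands in for whatever background-task mechanism the backend actually uses. `JOBS`, `start_scan`, `get_scan` and the status field names are hypothetical. `start_scan` must be called from a running event loop, for example inside an async route handler.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable

# In-memory job store: scan_id -> status/progress/result (lost on restart).
JOBS: dict[str, dict[str, Any]] = {}

ProgressFn = Callable[[str], None]


def start_scan(run: Callable[[ProgressFn], Awaitable[Any]]) -> str:
    """Register a job, schedule `run` in the background, return scan_id at once."""
    scan_id = uuid.uuid4().hex
    JOBS[scan_id] = {"status": "running", "progress": "queued", "result": None, "error": None}

    def report(msg: str) -> None:
        JOBS[scan_id]["progress"] = msg  # e.g. "Website scan", "DSI discovery"

    async def runner() -> None:
        try:
            JOBS[scan_id]["result"] = await run(report)
            JOBS[scan_id]["status"] = "completed"
        except Exception as exc:
            JOBS[scan_id]["status"] = "failed"
            JOBS[scan_id]["error"] = str(exc)

    # Keep a reference so the task is not garbage-collected mid-run.
    JOBS[scan_id]["_task"] = asyncio.get_running_loop().create_task(runner())
    return scan_id


def get_scan(scan_id: str) -> dict[str, Any]:
    """What GET /scan/{scan_id} would return (internal keys stripped)."""
    job = JOBS.get(scan_id)
    if job is None:
        return {"status": "not_found"}
    return {k: v for k, v in job.items() if not k.startswith("_")}
```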
Frontend:
- POST starts scan, receives scan_id
- Polls GET every 5 seconds (max 120 attempts = 10 min)
- Shows live progress message during scan
- Displays result when completed, error when failed
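The polling budget works out to 120 attempts × 5 s = 600 s (10 min). The frontend does this in the browser. The loop below is a Python sketch of the same logic with the HTTP call injected as `fetch_status`. It reuses the status names from a hypothetical job store, and `poll_scan` and its parameters are hypothetical as well.

```python
import time
from typing import Any, Callable


def poll_scan(fetch_status: Callable[[], dict[str, Any]],
              interval_s: float = 5.0,
              max_attempts: int = 120,
              sleep: Callable[[float], None] = time.sleep,
              on_progress: Callable[[str], None] = lambda msg: None) -> Any:
    """Poll until the scan completes or fails, or until the budget runs out."""
    for _ in range(max_attempts):
        status = fetch_status()  # stands in for GET /scan/{scan_id}
        if status.get("status") == "completed":
            return status["result"]
        if status.get("status") == "failed":
            raise RuntimeError(status.get("error") or "scan failed")
        on_progress(status.get("progress", ""))  # live progress message
        sleep(interval_s)
    raise TimeoutError(f"scan not finished after {max_attempts * interval_s:.0f}s")
```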
Proxy:
- POST timeout reduced to 30s (just starts the job)
- GET timeout 10s (just status check)
- No more 504/connection-dropped errors
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
banner_detector.py, script_analyzer.py, category_tester.py, authenticated_scanner.py
were only on the feature branch — needed for consent-tester to start.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1: max_pages was hardcoded to 15 in backend call — raised to 50
Bug 2: DSI documents checked against text_preview (500 chars) — now uses
full_text (10,000 chars) for Art. 13 mandatory field checks
Bug 3: DSE text not found when Playwright misses DSE page — now falls
back to DSI Discovery full_text as second source
Bug 4: Backend timeout 120s too short for 50 pages — raised to 300s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These files existed on the feature branch but were never cherry-picked
to main, causing ModuleNotFoundError on import:
- dse_parser.py — parses DSE HTML into structured sections
- dse_matcher.py — matches detected services against DSE sections
- mandatory_content_checker.py — checks Art. 13 DSGVO mandatory fields
- legal_basis_validator.py — validates legal basis (lit. a-f)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NameError: name 're' is not defined at line 146 — the import was
accidentally removed when extracting helper functions to agent_scan_helpers.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both scanners now search until done, not until a counter runs out:
playwright_scanner.py:
- Default max_pages raised from 15 to 50
- Added 3-minute timeout as safety net
- Recursive link discovery on EVERY visited page (not just DSE pages)
- Stops when: all links visited OR max_pages OR timeout
dsi_discovery.py:
- Default max_documents raised from 30 to 100
- Added 5-minute timeout as safety net
- Recursive: on each visited page, searches for MORE DSI links
- Processes ALL discovered links exhaustively
- Stops when: no more pending links OR max_documents OR timeout
The scanners now behave like a real user: they follow every relevant
link they find, and on each new page they look for more links.
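Both scanners follow the same loop: a pending queue fed by link discovery on every visited page. The loop stops when nothing is pending, the page cap is reached, or the deadline passes. Below is a minimal breadth-first sketch of that loop. `get_links` and `is_relevant` are hypothetical hooks standing in for Playwright navigation and the DSE/DSI link filters, and the BFS order is an assumption. The defaults match playwright_scanner.py (50 pages, 3 minutes); dsi_discovery.py would use max_pages=100 and timeout_s=300.

```python
import time
from collections import deque
from typing import Callable, Iterable


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          is_relevant: Callable[[str], bool],
          max_pages: int = 50,
          timeout_s: float = 180.0,
          clock: Callable[[], float] = time.monotonic) -> list[str]:
    """Visit pages until no links are pending, max_pages is hit, or time runs out."""
    deadline = clock() + timeout_s
    pending = deque([start_url])
    queued = {start_url}  # every URL ever enqueued, to avoid revisits
    visited: list[str] = []
    while pending and len(visited) < max_pages and clock() < deadline:
        url = pending.popleft()
        visited.append(url)
        # Recursive discovery: look for more links on EVERY visited page.
        for link in get_links(url):
            if link not in queued and is_relevant(link):
                queued.add(link)
                pending.append(link)
    return visited
```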
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New checks (from EUIPO reference case):
- Check 9: Third-party DSE link — detects when consent dialog links to
external domain's privacy policy instead of own DSE (Art. 13 DSGVO)
- Check 10: Dark-pattern language — detects "muessen/erforderlich" for
non-essential cookies suggesting false technical necessity (EDPB Rn. 70)
- Check 11: Non-modal dismiss = consent — detects when clicking outside
the dialog closes it (which may be treated as consent, a Planet49 violation)
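Check 10 can be approximated with a sentence-level heuristic. A sentence is flagged when it contains a necessity word and also names a non-essential cookie category. This is a sketch only: the category keyword list and the sentence split are assumptions, not the rules in banner_text_checker.py.

```python
import re

# "muessen" is the transliterated form of "müssen"; both spellings match.
NECESSITY_WORDS = re.compile(r"\b(m(?:ü|ue)ssen|erforderlich)\b", re.IGNORECASE)
# Illustrative non-essential categories (assumption).
NON_ESSENTIAL_CATEGORIES = ("statistik", "marketing", "analyse", "werbung")


def find_false_necessity(banner_text: str) -> list[str]:
    """Return sentences that present non-essential cookies as necessary."""
    hits: list[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", banner_text):
        lowered = sentence.lower()
        if NECESSITY_WORDS.search(sentence) and any(c in lowered for c in NON_ESSENTIAL_CATEGORIES):
            hits.append(sentence.strip())
    return hits
```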
Refactor: extracted _check_banner_text (375 LOC) from consent_scanner.py
into services/banner_text_checker.py to keep both files under 500 LOC.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- X button to close banner (SDK admin context only)
- Overlay leaves sidebar area accessible (ml-16/ml-64)
- Click overlay backdrop to dismiss
- Preview page: close banner on API error (don't trap user)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The browser blocks direct calls to backend-compliance:8093 because of its
self-signed SSL certificate. All banner API calls now go through the
Next.js API proxy at /api/sdk/v1/banner/*, which runs server-side.
- New catch-all proxy: /api/sdk/v1/banner/[[...path]]/route.ts
Maps to backend-compliance:8002/api/compliance/banner/*
- Preview page: uses /api/sdk/v1/banner/ instead of https://macmini:8093
- CMP Dashboard: uses proxy for banner stats + compliance proxy for DSR/einwilligungen
- Fixes: banner not closeable due to API errors, consent not saving
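The proxy itself is a Next.js route.ts. The Python sketch below models only its path mapping: everything after /api/sdk/v1/banner is appended to the upstream base, and the query string is preserved. The plain-HTTP scheme for the internal port 8002 and the function name `to_upstream` are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

PROXY_PREFIX = "/api/sdk/v1/banner"
# Internal target; plain HTTP on port 8002 is an assumption.
UPSTREAM_BASE = "http://backend-compliance:8002/api/compliance/banner"


def to_upstream(request_url: str) -> str:
    """Map /api/sdk/v1/banner/<rest>?q to <UPSTREAM_BASE>/<rest>?q."""
    parts = urlsplit(request_url)
    if not (parts.path == PROXY_PREFIX or parts.path.startswith(PROXY_PREFIX + "/")):
        raise ValueError(f"not a banner proxy path: {parts.path}")
    rest = parts.path[len(PROXY_PREFIX):]  # "" or "/admin/stats", etc.
    upstream = urlsplit(UPSTREAM_BASE)
    return urlunsplit((upstream.scheme, upstream.netloc, upstream.path + rest, parts.query, ""))
```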
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New route /sdk/cmp with full CMP dashboard
- 4 KPI cards: total consents, active consents, open DSR requests, configured sites
- Cookie category acceptance bars (necessary/statistics/marketing/functional)
- DSR breakdown: by status, by type (Art. 15-21), avg processing time, overdue count
- 9-point compliance checklist (banner, DSE, impressum, Art. 7 proof, DSR, loeschfristen,
vendor AVV, email templates, EWR-only mode) — each links to relevant module
- 8 module cards with icons linking to all CMP sub-modules
- Real API integration: /banner/admin/stats, /einwilligungen/consents/stats, /dsr/stats
- Dashboard link added as first entry in CMP sidebar section
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>