Go SDK points to external Qdrant (qdrant-dev.breakpilot.ai) with expired API key.
Fallback: search directly in local Qdrant (bp-core-qdrant:6333) which has
all collections: bp_compliance_datenschutz, bp_compliance_gesetze, atomic_controls_dedup.
Search strategy:
1. Try Go SDK RAG endpoint (preferred, has embedding-based search)
2. Fallback: Qdrant scroll with text-based regulation filter
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New module: rag_document_checker.py
- Searches RAG (Qdrant) for controls relevant to document type
- Filters by regulation (DSGVO Art.13, TDDDG §25, BGB §355 etc.)
- LLM (Qwen 3.5:35b) verifies each control against document text
- Returns fulfilled/missing with evidence text + severity
- Supports: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept
Integration in doc-check endpoint:
- Regex checklist runs first (fast, deterministic)
- RAG checks run after (semantic, catches what regex misses)
- Both results combined in single response
LLM prompt returns JSON: {fulfilled, evidence, issue, severity}
Think-tags stripped, JSON extracted from response.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Only Cookie and Widerruf sections are checked as separate documents.
Social Media, DSFA, Betroffenenrechte, Dienste von Drittanbietern are
part of the parent DSI and no longer generate false findings.
Added PLAN-rag-document-check.md for Phase 2:
- RAG-based checks with document-type-specific Controls
- DSFA checklist (Art. 35 + Landes-Listen)
- AVV checklist (Art. 28)
- Reference detection (sub-doc → parent doc)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a single URL contains multiple document sections (e.g. IHK DSI page
with Cookies, Social Media, Dienste von Drittanbietern), the system now:
1. Extracts full page text (main document check as before)
2. Splits text at heading boundaries (short uppercase lines)
3. Classifies each section: Cookie→cookie checklist, Social Media→DSI etc.
4. Runs type-specific checklist per section
5. Returns all results: main doc + sub-sections
Section type detection via SECTION_TYPE_MAP patterns:
- 'Cookie*' → §25 TDDDG checklist
- 'Dienste von Drittanbietern' → DSI checklist
- 'Social Media' → DSI checklist (Art. 26 joint controllership)
- 'Widerrufsrecht' → §355 BGB checklist
- 'Impressum' → §5 TMG checklist
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New "Dokumenten-Pruefung" tab in Compliance Agent:
- User adds multiple URLs with document type (DSI, AGB, Impressum, Cookie, Widerruf)
- Each document loaded via Playwright, accordions expanded, text extracted
- Checked against type-specific legal checklist
- Optional: Cookie banner check via checkbox
Checklisten-UX (solves "100% looks like nothing was checked"):
- All checks shown per document: green checkmark + matched text excerpt
- Red X for missing fields with legal reference
- Builds user trust: "9 Punkte geprueft, alle bestanden"
- Expandable per document with completeness bar
New checklists:
- Impressum: §5 TMG (6 fields: name, address, contact, register, VAT, representative)
- Cookie-Richtlinie: §25 TDDDG (5 fields: types, purposes, retention, third-party, opt-out)
Backend:
- POST /agent/doc-check — async with polling (same pattern as /scan)
- DocCheckResult includes checks[] with passed/failed + matched_text
- dsi_document_checker returns all_checks in SCORE finding
- Email report shows per-document checklist
Files: agent_doc_check_routes.py (280 LOC), DocCheckTab.tsx (248 LOC),
ChecklistView.tsx (130 LOC), dsi_document_checker.py (+70 LOC)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each scan is a separate entry so users can track changes over time.
Increased max entries from 20 to 50.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Service detection:
- Only search script tags + src/href attributes for service patterns
- Prevents false positives from DSE text mentioning services
(e.g. IHK DSE describes etracker, 'google analytics' in text)
- Technical patterns (with regex chars) still checked in full HTML
Short documents:
- Documents with < 200 words flagged as 'Kurzhinweis' instead of
'MANGELHAFT' — too short for Art. 13 completeness check
- Prevents 96-word navigation pages from showing 8 missing fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
1. consent-tester: full_text truncation raised from 10,000 to 50,000 chars
(IHK Internetangebot has ~50K chars, Beschwerderecht was after 10K cutoff)
2. Backend: dse_text now combines Playwright HTML + ALL DSI discovery texts
for mandatory content checking. Previously only used first 8K chars from
one source, missing Verantwortlicher/DSB that were in DSI documents.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: When all 9 Art. 13 checks passed (100%), no SCORE finding
was created (line: 'if pct < 100'). The backend then defaulted to
completeness=0 because it looked for the SCORE finding to extract the %.
Fix: Always generate SCORE finding, even at 100%. Added 'OK' severity
for fully compliant documents.
This was the cause of 8 documents showing '0% MANGELHAFT' despite
containing all required information.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1 fix: When merging documents with identical word_count, prefer
titles starting with 'Datenschutzinformation' over generic section
headings like 'Zweck und Rechtsgrundlage'. This restores the main
'Datenschutzinformationen zum Internetangebot' document.
Bug 2 fix: After navigating to a document page, wait 3s (was 2s) for
JS content loading, then try 10+ content selectors before falling back
to body text (with nav/header/footer removed). Handles IHK-style JS
navigation where content loads after page.goto() completes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- localStorage-based scan history (persists across sessions)
- Each completed scan adds entry: URL, timestamp, findings count, docs count
- 'Letzte Scans' section below results shows clickable history entries
- Click loads URL into form (and shows cached result if same URL)
- Max 20 entries, deduplicates by URL (latest scan wins)
- History visible in 'Website-Scan' tab
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- URL, mode, tab, scan result persisted in localStorage
- Active scan_id stored — polling resumes when returning to page
- Scan results survive navigation to other SDK modules
- 'Scan laeuft noch...' shown when returning to in-progress scan
- Cleans up localStorage when scan completes or fails
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DSI Dedup (consent-tester):
- Only H1/H2 headings count as documents (not H3/H4 sub-sections)
- Sub-sections (Cookies, Betroffenenrechte, Social Media) are part of
parent document's full text, not separate documents
- Reduces IHK result from 30 to ~11 real documents
Backend (agent_scan_routes):
- ScanFinding gets doc_title field linking each finding to its document
- doc_title set when creating DSI findings for document attribution
Frontend (ScanResult.tsx):
- 3 sections: Services table, Document cards, General findings
- Documents: expandable cards with completeness bar (green/yellow/red)
- Findings grouped under their parent document
- Each card shows: title, word count, findings count, % completeness
- Findings without doc_title go to "Allgemeine Findings" section
Email Summary (agent_scan_helpers):
- Findings listed under their parent document
- General findings in separate section
- No more flat mixed list
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- scanProgress state tracks live progress (not mixed into scanData)
- ScanResult only renders when scanData.services exists (prevents crash)
- Purple progress bar with spinner shows current step during scan
- Fixes: TypeError 's.services.filter' when progress data set as scanData
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CE-Risikobeurteilung Datenerfassung mit 3 wählbaren Eingabe-Modi:
1. Interview-Modus (Chat-artig): Fragen werden nacheinander gestellt
wie im Kundengespräch. Antwort-Historie sichtbar.
2. Wizard-Modus: Schritt-für-Schritt durch 8 Sektionen.
3. Formular-Modus: Alle Sektionen als Accordion auf einer Seite.
20 strukturierte Fragen in 8 Abschnitten:
- Maschinenbeschreibung (Name, Typ, Baugruppen)
- Lebensphasen (Betrieb, Einrichten, Wartung)
- Bestimmungsgemäße Verwendung
- Vorhersehbare Fehlanwendung
- Qualifikation der Benutzer
- Räumliche/Zeitliche Grenzen
- Technische Daten (Kräfte, Spannungen, Temperaturen, Drehzahlen)
- Umgebungsbedingungen
answersToNarrativeText() konvertiert alle Antworten in den Freitext
der an POST /parse-narrative gesendet wird.
Ergebnis-Panel zeigt: Komponenten, Gefahren, Patterns, Energiequellen.
URL: /sdk/iace/[projectId]/interview
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fundamental fix: scans now run asynchronously with progress polling.
Backend:
- POST /scan starts background task, returns scan_id immediately
- GET /scan/{scan_id} returns status + progress + result when done
- 7 progress steps shown: Website scan, DSI discovery, DSE analysis,
SOLL/IST comparison, corrections, report, email
- In-memory job store (dict with scan_id → status/result)
- No timeout limits on scan duration
Frontend:
- POST starts scan, receives scan_id
- Polls GET every 5 seconds (max 120 attempts = 10 min)
- Shows live progress message during scan
- Displays result when completed, error when failed
Proxy:
- POST timeout reduced to 30s (just starts the job)
- GET timeout 10s (just status check)
- No more 504/connection-dropped errors
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
banner_detector.py, script_analyzer.py, category_tester.py, authenticated_scanner.py
were only on the feature branch — needed for consent-tester to start.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1: max_pages was hardcoded to 15 in backend call — raised to 50
Bug 2: DSI documents checked against text_preview (500 chars) — now uses
full_text (10,000 chars) for Art. 13 mandatory field checks
Bug 3: DSE text not found when Playwright misses DSE page — now falls
back to DSI Discovery full_text as second source
Bug 4: Backend timeout 120s too short for 50 pages — raised to 300s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These files existed on the feature branch but were never cherry-picked
to main, causing ModuleNotFoundError on import:
- dse_parser.py — parses DSE HTML into structured sections
- dse_matcher.py — matches detected services against DSE sections
- mandatory_content_checker.py — checks Art. 13 DSGVO mandatory fields
- legal_basis_validator.py — validates legal basis (lit. a-f)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NameError: name 're' is not defined at line 146 — the import was
accidentally removed when extracting helper functions to agent_scan_helpers.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both scanners now search until done, not until a counter runs out:
playwright_scanner.py:
- Default max_pages raised from 15 to 50
- Added 3-minute timeout as safety net
- Recursive link discovery on EVERY visited page (not just DSE pages)
- Stops when: all links visited OR max_pages OR timeout
dsi_discovery.py:
- Default max_documents raised from 30 to 100
- Added 5-minute timeout as safety net
- Recursive: on each visited page, searches for MORE DSI links
- Processes ALL discovered links exhaustively
- Stops when: no more pending links OR max_documents OR timeout
The scanners now behave like a real user: they follow every relevant
link they find, and on each new page they look for more links.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New checks (from EUIPO reference case):
- Check 9: Third-party DSE link — detects when consent dialog links to
external domain's privacy policy instead of own DSE (Art. 13 DSGVO)
- Check 10: Dark-pattern language — detects "muessen/erforderlich" for
non-essential cookies suggesting false technical necessity (EDPB Rn. 70)
- Check 11: Non-modal dismiss = consent — detects when clicking outside
dialog closes it (possibly treating as consent, Planet49 violation)
Refactor: extracted _check_banner_text (375 LOC) from consent_scanner.py
into services/banner_text_checker.py to keep both files under 500 LOC.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- X button to close banner (SDK admin context only)
- Overlay leaves sidebar area accessible (ml-16/ml-64)
- Click overlay backdrop to dismiss
- Preview page: close banner on API error (don't trap user)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Browser blocks direct calls to backend-compliance:8093 due to
self-signed SSL certificate. All banner API calls now go through
Next.js API proxy at /api/sdk/v1/banner/* which runs server-side.
- New catch-all proxy: /api/sdk/v1/banner/[[...path]]/route.ts
Maps to backend-compliance:8002/api/compliance/banner/*
- Preview page: uses /api/sdk/v1/banner/ instead of https://macmini:8093
- CMP Dashboard: uses proxy for banner stats + compliance proxy for DSR/einwilligungen
- Fixes: banner not closeable due to API errors, consent not saving
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New route /sdk/cmp with full CMP dashboard
- 4 KPI cards: total consents, active consents, open DSR requests, configured sites
- Cookie category acceptance bars (necessary/statistics/marketing/functional)
- DSR breakdown: by status, by type (Art. 15-21), avg processing time, overdue count
- 9-point compliance checklist (banner, DSE, impressum, Art.7 proof, DSR, loeschfristen,
vendor AVV, email templates, EWR-only mode) — each links to relevant module
- 8 module cards with icons linking to all CMP sub-modules
- Real API integration: /banner/admin/stats, /einwilligungen/consents/stats, /dsr/stats
- Dashboard link added as first entry in CMP sidebar section
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CMP Section in Sidebar:
- New "CMP" group with purple accent, above other module sections
- Links: Cookie-Banner, Live-Vorschau, Consent-Records, Consent-Verwaltung,
Vendor-Compliance, DSR Portal, Loeschfristen, E-Mail-Templates
Live Preview (/sdk/cookie-banner/preview):
- Simulated "MusterShop GmbH" website with full cookie banner
- Real API calls to POST /banner/consent (saves to DB)
- EWR-Only toggle functional in preview
- API Debug panel shows fingerprint, consent status, blocked vendors
- Response JSON viewer for API debugging
- Links to verify in Consent-Verwaltung, Consent-Records, DSR Portal
- "Consent zuruecksetzen" button to re-test
- Footer "Cookie-Einstellungen" link to reopen banner
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deleted 6 unused components from /sdk/einwilligungen/cookie-banner/_components/
- Replaced page.tsx with Next.js redirect() to /sdk/cookie-banner
- Updated EinwilligungenNavTabs link to /sdk/cookie-banner
- Updated catalog page link to /sdk/cookie-banner
- Single source of truth: /sdk/cookie-banner (Step in "Rechtliche Texte")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>