Phase 3 of the Document Templates Masterplan:
- 103: 4 new security policies (information_security_policy, password_policy,
encryption_policy, access_control_policy) + updates for CRA (056) and
all 15 HR/Vendor/BCM policies (072)
New templates:
- Information Security Policy: ISMS-Leitlinie (ISO 27001, BSI, NIS2)
- Password Policy: BSI/NIST compliant (12+ chars, MFA, no forced rotation)
- Encryption Policy: BSI TR-02102, algorithms, key management, TLS config
- Access Control Policy: RBAC, Least Privilege, Zero Trust, rezertification
Updates: AI Act + NIS2UmsuCG references for CRA and all 15 HR/Vendor/BCM
Generator: 6 new categories (security, HR, data, vendor, BCM policies)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Dockerfile: install Playwright AS appuser (not root) so chromium
binary is accessible at runtime. Was causing 500 error.
2. DSE service matching: text-search fallback when LLM extraction fails.
If "etracker" appears in DSE text, mark as documented even without
LLM parsing the service list.
3. CMP skip: consent managers in category "cmp" skipped (not just "other"
with id "cmp").
NOT DEPLOYED — RAG pipeline is running.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /website-scan endpoint in consent-tester service:
- Real browser renders JavaScript (finds dynamic content)
- Clicks navigation menus (discovers hidden sub-pages like IHK DSB page)
- Follows links within DSE to find regional privacy policies
- Collects rendered HTML for each page (after JS execution)
Backend integration:
- agent_scan_routes tries Playwright first, falls back to httpx
- DSE text and HTML extracted from Playwright-rendered pages
- Service detection runs on rendered HTML (catches JS-loaded scripts)
Also fixes:
- GA regex: G-[A-Z0-9]{8,12} prevents CSS class false positives
- etracker added to service registry
- External page scanning blocked (same-domain only)
- CSS/JS/image files excluded from page list
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. GA regex: G-\w{5,} matched CSS classes (g-7031048). Now requires
G-[A-Z0-9]{8,12} (uppercase after G-, 8-12 chars = real GA4 ID)
2. External page scanning: DSE-internal links now SAME DOMAIN only.
Previously followed links to etracker.com, google.de/policies etc.
and detected services on THOSE sites as IHK services.
3. Added etracker to service registry (DE, ePrivacy-certified)
4. CSS/JS/image files excluded from page scanning
5. Navigation-pattern links for deeper DSE sub-pages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mandatory_content_checker.py keywords break with alternative formulations.
Solution: LLM-based check per mandatory field (9 calls, parallelizable).
For other session to implement alongside Dict→Control migration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. DSE-Matcher: Google/YouTube false match — now requires 2+ word match
for provider-name fallback, not just "Google" matching YouTube section
2. AGB/Widerrufsbelehrung: only_ecommerce flag — skips for non-shop
websites (detected via payment providers, cart keywords)
3. DSE-internal link following — scanner now discovers links WITHIN the
privacy policy and scans those too (finds regional DSE sub-pages)
4. Expanded keyword synonyms for DSE mandatory checks:
- "Zweck und Rechtsgrundlage" now matches "zwecke"
- "behoerdlichen datenschutzbeauftragt" matches DSB
- "aufsichtsbehörde" with umlaut matches
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 test cases with deliberately wrong legal basis assignments:
- Cookie tracking on lit. f (should be lit. a)
- Analytics on lit. b (should be lit. a)
- Newsletter on lit. f (should be lit. a)
- Klarna without Art. 22
- Session recording on lit. f
- 2 correct cases (should NOT trigger findings)
Runs both hardcoded dict AND Control Library query, compares results.
If Control Library passes all → dict can be removed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6 files with hardcoded legal knowledge identified. Review deadline 2026-07-01.
legal_basis_validator.py marked with warning log on every use.
Instruction file for other session to execute migration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 6: PDF export via WeasyPrint — POST /agent/scans/pdf generates
printable compliance report with findings table, service comparison,
risk badge, and legal disclaimer.
Phase 7: Recurring scans — POST /agent/monitored-urls to add URLs,
POST /agent/run-scheduled triggers all enabled scans (cron/ZeroClaw).
In-memory storage with DB upgrade path.
Phase 8: Multi-website compare — POST /agent/compare with 2-5 URLs,
parallel scanning, comparison table (risk, findings, services, compliance
features per site).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Migration 086: compliance_agent_scans table (findings, services, corrections)
- agent_history_routes.py: POST /scans (save), GET /scans (list), GET /scans/{id}
- Scan results survive page reloads and can be reviewed later
- Phase 10 (Playwright website scanner) added to product roadmap
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scan pages in parallel instead of sequential. Reduces scan time
from ~10s (5 pages × 2s) to ~3s (all pages at once).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Third tab "Cookie-Test" in Compliance Agent:
- Phase A: Before consent (tracking without permission)
- Phase B: After rejection (CRITICAL if tracking persists)
- Phase C: After acceptance (undocumented services)
- CMP badge (Didomi, OneTrust, etc.)
- Violation cards with severity badges and legal references
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New independent service (port 8094) with headless Chromium:
- Phase A: What loads BEFORE any consent interaction
- Phase B: What loads AFTER rejecting consent (CRITICAL if tracking persists)
- Phase C: What loads AFTER accepting (check against cookie policy)
- 10 CMP-specific selectors (Didomi, OneTrust, Cookiebot, Usercentrics, etc.)
- Generic fallback via button text matching
- 18 tracking service patterns for script classification
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shows for each finding:
- Original text block from DSE (or "missing" indicator)
- Position: section heading, number, parent section, paragraph index
- Correction: insert/append/replace with copy button
Falls back to plain correction view if no text reference available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- dse_parser.py: HTML → structured sections (heading, number, content, parent)
Uses heading hierarchy (h1-h4) with regex fallback
- dse_matcher.py: matches detected services against DSE sections
Exact name → provider → category matching with insertion point suggestion
- agent_scan_routes: TextReference model in findings (original text,
section, paragraph, correction type, insert_after)
Enables showing: "Google Analytics not found in DSE, insert after
Section 2.4 Cookies und Tracking"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 0: Qwen extracts 14 structured intake flags (personal_data,
marketing, profiling, ai_usage, etc.) instead of keyword matching.
Fallback to keywords if LLM unavailable. Flags feed into UCCA for
accurate scoring.
Phase 1: Control relevance filter removes false positives.
C_TRANSPARENCY only recommended if AI/ML keywords found in text.
7 control rules with keyword lists + intake flag fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Summary now renders as styled HTML (table layout, colored risk badge,
warning banners) instead of plaintext in <div>
- Tab info text explains scope: "Analysiert nur die eingegebene URL" vs
"Scannt automatisch 5-10 Unterseiten"
- Scan history with findings count badge and page count
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Advisor now knows about: project setup (3 steps), all SDK modules
(DSGVO, AI Act, CE, independent modules), recommended workflow order,
navigation (sidebar, CommandBar, SDK-Flow). No business secrets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same pattern as the email templates variables fix. Backend may return
placeholders as object instead of array.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Chat-Verlauf wird als strukturiertes Beratungsprotokoll per Email
an den DSB gesendet. Button erscheint im Header sobald Nachrichten
vorhanden sind. Zeigt Checkmark nach erfolgreichem Versand.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Widgets were hidden behind projectId guard. Removed condition so new
users can ask questions (e.g. "Wie lege ich ein Projekt an?") before
creating a project.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Email now lists all scanned URLs with checkmark/cross status.
Frontend shows collapsible "X Seiten gescannt — Details anzeigen".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SDK LLM chat returns empty content due to Qwen think-mode. Direct Ollama
/api/generate call with stream:false gets the full response including
think tags which we strip.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend gibt variables manchmal als {} (Objekt) statt [] (Array)
zurueck. (template.variables || []).map() greift nicht weil {}
truthy ist. Fix: Array.isArray() Check in TemplateCard + EditorTab.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automated comparison: services mentioned in privacy policy vs. actually
embedded on website. Three categories: undocumented (Art. 13 violation),
outdated (cleanup), correctly documented (check third country only).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multi-page crawl: scan 5-10 strategic pages (start, footer links) for
chatbot widgets, AI text mentions, and tracking services. Feed results
into relevance filter to reduce false positives.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses false-positive controls like C_TRANSPARENCY being recommended
when no AI usage is evident. Plan for separate implementation session.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Backend: mode field in request, adapts summary tone and email subject
- Pre-launch: "Implementieren Sie X vor Veroeffentlichung"
- Post-launch: "ACHTUNG: Maengel sind oeffentlich sichtbar, sofortige Nachbesserung"
- Frontend: Mode toggle (internes Dokument vs. Live-Website)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>