NameError: name 're' is not defined at line 146 — the import was
accidentally removed when extracting helper functions to agent_scan_helpers.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EmailDeliveryService: load template → find published version →
render {{variables}} → send via SMTP → audit log. Fallback to
inline HTML when no published template exists.
- Migration 117: Professional HTML/text content for all 5 DSR
templates (receipt, completion, rejection, identity, extension)
with branded styling and proper Art. references
- DSRArt11Service now uses EmailDeliveryService with dsr_rejection
template instead of hardcoded HTML
[migration-approved]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New DSRArt11Service: handles rejection with proper legal basis,
automated email notification to requester explaining Art. 11
- POST /dsr/{id}/reject-art11 endpoint
- ActionButtons.tsx: "Nicht identifizierbar (Art. 11)" button
shown when identity is not yet verified
- Also fixes: DSR export type-cast rollback handling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tenant_id kept as string (PostgreSQL handles UUID cast)
- Einwilligungen query uses CAST(:tid AS VARCHAR) for compatibility
- Each data source query wrapped with rollback on failure to prevent
cascading "transaction aborted" errors
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DSRExportService: aggregates all CMP data about a user from
Banner Consents, Einwilligungen, Audit Trail, DSR History
- GET /dsr/{id}/export-user-data?format=json|csv|pdf endpoint
- PDF: A4 reportlab with 4 sections (Consents, Einwilligungen,
Audit-Trail, DSR-Anfragen) + cover page
- CSV: BOM-encoded for Excel with flattened data rows
- JSON: structured export with all data categories
- ActionButtons.tsx: PDF/JSON/CSV export buttons now functional
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Migration 115: compliance_role_training_mapping table (org roles → training codes)
- TrainingLinkService: queries training_modules/matrix/assignments to find gaps
per person and role. Gracefully degrades when Go training tables don't exist yet.
- document_review_routes: 2 new endpoints (training-requirements, training-gaps)
- _notify_approval() now checks training gaps and sends emails to persons
with outstanding modules, linking to /sdk/training/learner
[migration-approved]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Migration 111: 3 new tables (org_roles, document_reviews, document_role_mapping)
with seed data mapping all 71 doc types to 7 compliance roles
- org_role_routes.py: CRUD for roles, seed defaults, test email, mapping API
- document_review_routes.py: Review lifecycle (create→send→approve/reject)
with approval notification to all affected roles
- Migration 112: SOP template (ISO 9001 structure, 21 placeholders)
- Added standard_operating_procedure to TemplateType, doc-labels, presets
[migration-approved]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend: _ensure_list() converts null/string/malformed JSONB to []
for requirements, test_procedure, evidence, open_anchors, tags.
Frontend: defensive Array.isArray() check on ControlDetail.tsx.
Fixes: TypeError: A.requirements.map is not a function
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CookieBannerOverlay: shows vendors per category with expandable tables
(Verarbeiter, Cookies, Dauer, Land) for full transparency
- Demo vendors: 4 necessary, 3 statistics, 3 marketing, 3 functional
- cookie_table_generator.py: renders {{COOKIE_TABLE}} Markdown tables
from vendor configs (DB) or service registry (fallback)
- SERVICE_COOKIES: 16 known vendor-to-cookie mappings with provider + country
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fundamental architecture fix: data processing happens through APIs/scripts/
cookies — NOT through visible page text. A news site about healthcare does
NOT process health data.
Before: Qwen reads website text → guesses "health_data: true" (WRONG)
After: Google Analytics detected → tracking: true (CORRECT, deterministic)
New flow: detect services from HTML → map service categories to flags →
feed flags into UCCA assessment. No LLM needed for flag extraction.
SERVICE_TO_FLAGS maps categories: tracking→tracking, marketing→marketing+
third_party_sharing, payment→payment_data, heatmap→profiling, etc.
SPECIFIC_SERVICE_FLAGS for Klarna (Art.22), Stripe (US transfer), etc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Dockerfile: install Playwright AS appuser (not root) so chromium
binary is accessible at runtime. Was causing 500 error.
2. DSE service matching: text-search fallback when LLM extraction fails.
If "etracker" appears in DSE text, mark as documented even without
LLM parsing the service list.
3. CMP skip: consent managers in category "cmp" skipped (not just "other"
with id "cmp").
NOT DEPLOYED — RAG pipeline is running.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /website-scan endpoint in consent-tester service:
- Real browser renders JavaScript (finds dynamic content)
- Clicks navigation menus (discovers hidden sub-pages like IHK DSB page)
- Follows links within DSE to find regional privacy policies
- Collects rendered HTML for each page (after JS execution)
Backend integration:
- agent_scan_routes tries Playwright first, falls back to httpx
- DSE text and HTML extracted from Playwright-rendered pages
- Service detection runs on rendered HTML (catches JS-loaded scripts)
Also fixes:
- GA regex: G-[A-Z0-9]{8,12} prevents CSS class false positives
- etracker added to service registry
- External page scanning blocked (same-domain only)
- CSS/JS/image files excluded from page list
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. GA regex: G-\w{5,} matched CSS classes (g-7031048). Now requires
G-[A-Z0-9]{8,12} (uppercase after G-, 8-12 chars = real GA4 ID)
2. External page scanning: DSE-internal links now SAME DOMAIN only.
Previously followed links to etracker.com, google.de/policies etc.
and detected services on THOSE sites as IHK services.
3. Added etracker to service registry (DE, ePrivacy-certified)
4. CSS/JS/image files excluded from page scanning
5. Navigation-pattern links for deeper DSE sub-pages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. DSE-Matcher: Google/YouTube false match — now requires 2+ word match
for provider-name fallback, not just "Google" matching YouTube section
2. AGB/Widerrufsbelehrung: only_ecommerce flag — skips for non-shop
websites (detected via payment providers, cart keywords)
3. DSE-internal link following — scanner now discovers links WITHIN the
privacy policy and scans those too (finds regional DSE sub-pages)
4. Expanded keyword synonyms for DSE mandatory checks:
- "Zweck und Rechtsgrundlage" now matches "zwecke"
- "behoerdlichen datenschutzbeauftragt" matches DSB
- "aufsichtsbehörde" with umlaut matches
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6 files with hardcoded legal knowledge identified. Review deadline 2026-07-01.
legal_basis_validator.py marked with warning log on every use.
Instruction file for other session to execute migration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 6: PDF export via WeasyPrint — POST /agent/scans/pdf generates
printable compliance report with findings table, service comparison,
risk badge, and legal disclaimer.
Phase 7: Recurring scans — POST /agent/monitored-urls to add URLs,
POST /agent/run-scheduled triggers all enabled scans (cron/ZeroClaw).
In-memory storage with DB upgrade path.
Phase 8: Multi-website compare — POST /agent/compare with 2-5 URLs,
parallel scanning, comparison table (risk, findings, services, compliance
features per site).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Migration 086: compliance_agent_scans table (findings, services, corrections)
- agent_history_routes.py: POST /scans (save), GET /scans (list), GET /scans/{id}
- Scan results survive page reloads and can be reviewed later
- Phase 10 (Playwright website scanner) added to product roadmap
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scan pages in parallel instead of sequential. Reduces scan time
from ~10s (5 pages × 2s) to ~3s (all pages at once).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- dse_parser.py: HTML → structured sections (heading, number, content, parent)
Uses heading hierarchy (h1-h4) with regex fallback
- dse_matcher.py: matches detected services against DSE sections
Exact name → provider → category matching with insertion point suggestion
- agent_scan_routes: TextReference model in findings (original text,
section, paragraph, correction type, insert_after)
Enables showing: "Google Analytics not found in DSE, insert after
Section 2.4 Cookies und Tracking"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 0: Qwen extracts 14 structured intake flags (personal_data,
marketing, profiling, ai_usage, etc.) instead of keyword matching.
Fallback to keywords if LLM unavailable. Flags feed into UCCA for
accurate scoring.
Phase 1: Control relevance filter removes false positives.
C_TRANSPARENCY only recommended if AI/ML keywords found in text.
7 control rules with keyword lists + intake flag fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Summary now renders as styled HTML (table layout, colored risk badge,
warning banners) instead of plaintext in <div>
- Tab info text explains scope: "Analysiert nur die eingegebene URL" vs
"Scannt automatisch 5-10 Unterseiten"
- Scan history with findings count badge and page count
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Email now lists all scanned URLs with checkmark/cross status.
Frontend shows collapsible "X Seiten gescannt — Details anzeigen".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SDK LLM chat returns empty content due to Qwen think-mode. Direct Ollama
/api/generate call with stream:false gets the full response including
think tags which we strip.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Backend: mode field in request, adapts summary tone and email subject
- Pre-launch: "Implementieren Sie X vor Veroeffentlichung"
- Post-launch: "ACHTUNG: Maengel sind oeffentlich sichtbar, sofortige Nachbesserung"
- Frontend: Mode toggle (internes Dokument vs. Live-Website)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Scan public website for cancellation button, imprint, privacy link, cookie consent
- Generate follow-up questions when checks can't be verified without login
- User answers "no" → finding with legal basis is added to results
- Frontend: FollowUpQuestions component with Ja/Nein buttons
- Sidebar: "Compliance Agent" entry added under KI-Compliance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SessionLocal: 5x verwendet fuer DB-Sessions ausserhalb Depends()
HTTPException: verwendet in Framework-Validation
text: 55x verwendet fuer raw SQL queries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>