Commit Graph

239 Commits

Author SHA1 Message Date
Benjamin Admin 2f0f76e365 fix: Add missing 'import re' to agent_scan_routes.py
NameError: name 're' is not defined at line 146 — the import was
accidentally removed when extracting helper functions to agent_scan_helpers.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:59:53 +02:00
Benjamin Admin 53f6f30cf0 feat: DSI document discovery + completeness check in agent scan workflow
Agent scan now automatically:
1. Discovers all legal documents via consent-tester /dsi-discovery endpoint
2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum
3. Checks completeness against type-specific checklists:
   - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes,
     legal basis, recipients, third-country, retention, rights, complaint)
   - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction)
   - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences)
4. Adds findings per document to scan results
5. Shows discovered documents with completeness % in email summary
6. Returns discovered_documents list in API response

New files:
- dsi_document_checker.py (229 LOC) — checklists + classifier
- agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections

Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:09:45 +02:00
Benjamin Admin d3c8811fdb feat: IAB TCF 2.2 — TC String encoder + purpose mapping + UI
- TCFEncoderService: generates base64url-encoded TC Strings per IAB spec
  with 12 purposes, vendor consent bitfield, CMP metadata
- Category-to-purpose mapping (necessary→none, statistics→1,7,8,9,10,
  marketing→1,2,3,4,5,6,7,12, functional→1,11)
- tcf_routes: 5 endpoints (purposes, features, mapping, encode, encode-categories)
- banner_consent_service: auto-generates TC String when tcf_enabled=true
- TCFSettings.tsx: enable/disable toggle, purpose grid with category mapping,
  TC String test generator, CMP registration info
- New "TCF/IAB" tab in cookie-banner page (7 tabs total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 07:01:37 +02:00
Benjamin Admin c89a68e59e feat: Whistleblower backend + Scanner banner-check (last 2 gaps)
Whistleblower (HinSchG):
- Migration 118: 3 tables (reports, messages, measures) with
  HinSchG deadlines (7d acknowledgment, 3mo feedback)
- whistleblower_routes.py: 14 endpoints (CRUD, acknowledge, close,
  messages, measures, public submit, anonymous status check)
- Frontend api-operations.ts rewired from Go SDK to compliance proxy
- Access key format XXXX-XXXX-XXXX for anonymous reporters

Scanner banner-check (TTDSG § 25):
- CMP Dashboard: green "Kein Cookie-Banner erforderlich" when no
  trackers detected + no banner configured
- Red warning "Cookie-Banner fehlt!" when trackers found but no banner
- Mandatory note: Impressum (DDG § 5) + DSE (DSGVO Art. 13) still required

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 00:22:18 +02:00
Benjamin Admin eb4ea8bc42 feat: EmailDeliveryService + professional DSR email templates
- EmailDeliveryService: load template → find published version →
  render {{variables}} → send via SMTP → audit log. Fallback to
  inline HTML when no published template exists.
- Migration 117: Professional HTML/text content for all 5 DSR
  templates (receipt, completion, rejection, identity, extension)
  with branded styling and proper Art. references
- DSRArt11Service now uses EmailDeliveryService with dsr_rejection
  template instead of hardcoded HTML

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 23:38:32 +02:00
Benjamin Admin 060f351da7 feat: Art. 11 DSGVO — reject DSR when data subject not identifiable
- New DSRArt11Service: handles rejection with proper legal basis,
  automated email notification to requester explaining Art. 11
- POST /dsr/{id}/reject-art11 endpoint
- ActionButtons.tsx: "Nicht identifizierbar (Art. 11)" button
  shown when identity is not yet verified
- Also fixes: DSR export type-cast rollback handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 23:30:18 +02:00
Benjamin Admin c55d0ab12a fix: DSR export type-cast bug + session rollback on partial failures
- tenant_id kept as string (PostgreSQL handles UUID cast)
- Einwilligungen query uses CAST(:tid AS VARCHAR) for compatibility
- Each data source query wrapped with rollback on failure to prevent
  cascading "transaction aborted" errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 23:15:25 +02:00
Benjamin Admin 02468c94c0 feat: DSR User Data Export — Art. 15 PDF + Art. 20 JSON/CSV
- DSRExportService: aggregates all CMP data about a user from
  Banner Consents, Einwilligungen, Audit Trail, DSR History
- GET /dsr/{id}/export-user-data?format=json|csv|pdf endpoint
- PDF: A4 reportlab with 4 sections (Consents, Einwilligungen,
  Audit-Trail, DSR-Anfragen) + cover page
- CSV: BOM-encoded for Excel with flattened data rows
- JSON: structured export with all data categories
- ActionButtons.tsx: PDF/JSON/CSV export buttons now functional

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 22:42:03 +02:00
Benjamin Admin 630fffc0cc feat: Academy integration — training gap detection after document approval (F7)
- Migration 115: compliance_role_training_mapping table (org roles → training codes)
- TrainingLinkService: queries training_modules/matrix/assignments to find gaps
  per person and role. Gracefully degrades when Go training tables don't exist yet.
- document_review_routes: 2 new endpoints (training-requirements, training-gaps)
- _notify_approval() now checks training gaps and sends emails to persons
  with outstanding modules, linking to /sdk/training/learner

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 22:03:25 +02:00
Benjamin Admin 965af3a34c feat: A/B Testing + Compliance Report PDF (F5 + F8)
F5: A/B Testing for Consent Rate
- Migration 116: banner_variants table + variant tracking in audit log
- BannerABService: deterministic sticky bucketing via device hash,
  chi-squared significance testing, variant CRUD
- banner_ab_routes: 6 endpoints (CRUD + stats + assign)
- ABTestPanel.tsx: variant creation, traffic sliders, opt-in comparison
  chart with winner/significance badges
- New "A/B-Test" tab in cookie-banner page

F8: Compliance Report PDF
- CompliancePDFGenerator: reportlab-based A4 PDF covering all modules
  (Company Profile, TOM, VVT, DSFA, Risks, Vendors, Incidents,
  Reviews, Consents, Roles)
- compliance_report_routes: GET /compliance/report/pdf
- "Compliance-Report herunterladen" button on SDK dashboard

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 21:42:50 +02:00
Benjamin Admin c3fcfe88ee feat: Vendor-level consent + Consent analytics (F4 + F6)
F4: Granular Vendor-Level Consent
- Migration 113: vendor_consents JSONB on banner_consents + audit_log
- ConsentCreate schema + BannerConsentDB model extended
- banner_consent_service stores vendor_consents alongside categories
- Audit trail includes vendor-level decisions + user_agent

F6: Consent Rate Analytics
- Migration 114: user_agent on audit_log + time-series index
- BannerAnalyticsService: time series, category breakdown, device stats
- banner_analytics_routes: 4 endpoints (overview, time-series, categories, devices)
- AnalyticsDashboard.tsx: KPIs, bar chart, category bars, device breakdown
- New "Analytik" tab in cookie-banner page

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 20:58:06 +02:00
Benjamin Admin 9b4be663f7 feat: Rollenkonzept backend + SOP template (Phase 1-3)
- Migration 111: 3 new tables (org_roles, document_reviews, document_role_mapping)
  with seed data mapping all 71 doc types to 7 compliance roles
- org_role_routes.py: CRUD for roles, seed defaults, test email, mapping API
- document_review_routes.py: Review lifecycle (create→send→approve/reject)
  with approval notification to all affected roles
- Migration 112: SOP template (ISO 9001 structure, 21 placeholders)
- Added standard_operating_procedure to TemplateType, doc-labels, presets

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 13:03:38 +02:00
Benjamin Admin 64700b355e feat: Review all 12 remaining policy templates + categorize
Migration 110: Updated descriptions and version for 12 previously
unreviewed templates (asset_management, backup, change_management,
cloud_security, devsecops, incident_response, logging, patch_management,
secrets_management, vulnerability_management, informationspflichten,
verpflichtungserklaerung).

All templates assessed as "Very Good" quality — only incremental
updates needed (AI Act, CRA, NIS2UmsuCG references in descriptions).

informationspflichten: Kept as separate compact checklist (distinct
from the full privacy_policy DSI template).
verpflichtungserklaerung: Kept as standalone HR document (employee
signs at onboarding). Added to HR & Mitarbeiter category.

Result: 88 templates, 44 at v1.1+, 0 unreviewed remaining.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 07:19:41 +02:00
Benjamin Admin 4b9cf34243 feat: Full template cleanup + categories by use case
Cleanup (109):
- Removed DPA duplicates (v1 DE + v1 EN, kept v2 DE)
- Removed cookie_banner duplicate (kept larger with IF-blocks)
- Removed impressum duplicate (kept larger with IF-blocks)
- Removed TOM duplicate (kept newest)
- Removed DSFA v1 (kept v2)
- Kept all 8 VVT templates (1 main + 7 industry templates)
- DB: 98 → 88 templates, 0 duplicates remaining

Categories restructured by use case:
- Website/App: DSI, Impressum, Cookie, Social Media
- Online-Shop: AGB, Widerruf, DSI, Cookie
- SaaS/Cloud: AGB, AVV, SLA, Cloud Agreement
- App/Plattform: Nutzungsbedingungen, Community Guidelines, AUP
- Vertraege (B2B): AVV, NDA, SLA, Cloud
- DSGVO-Pflichten: TOM, VVT, Loeschkonzept, DSFA
- Sicherheitskonzepte + Richtlinien (separate categories)
- HR & Mitarbeiter, Daten-Governance, Vendor, BCM

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 07:09:16 +02:00
Benjamin Admin 5298467275 feat: Privacy notice cleanup + English v2
- 108: Remove DSI duplicate (023 + 093 both wrote privacy_policy DE),
  remove outdated EN v1, create English Privacy Notice v2 with all
  modular sections (data categories table, retention periods, processor
  vs. controller guidance, Art. 21 right to object highlighted)

DB now has exactly 2 privacy_policy templates: DE + EN, both v2.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 07:03:06 +02:00
Benjamin Admin 91b4034fee feat: AGB cleanup + English Terms v2
- 106: Remove AGB duplicates and obsolete templates (terms_of_service
  DE/EN v1.0, liability clause) — replaced by agb v2.0
- 107: English Terms and Conditions v2 (EU-compliant, same structure
  as DE version with all IF-blocks)

DB now has exactly 2 AGB templates: DE + EN, both v2.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 06:59:28 +02:00
Benjamin Admin fe6764df9a fix: ensure JSONB array fields are always arrays in control API
Backend: _ensure_list() converts null/string/malformed JSONB to []
for requirements, test_procedure, evidence, open_anchors, tags.

Frontend: defensive Array.isArray() check on ControlDetail.tsx.

Fixes: TypeError: A.requirements.map is not a function

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 21:18:10 +02:00
Benjamin Admin db697924ed feat: Cookie banner vendors per category + {{COOKIE_TABLE}} generator
- CookieBannerOverlay: shows vendors per category with expandable tables
  (Verarbeiter, Cookies, Dauer, Land) for full transparency
- Demo vendors: 4 necessary, 3 statistics, 3 marketing, 3 functional
- cookie_table_generator.py: renders {{COOKIE_TABLE}} Markdown tables
  from vendor configs (DB) or service registry (fallback)
- SERVICE_COOKIES: 16 known vendor-to-cookie mappings with provider + country

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 20:06:44 +02:00
Benjamin Admin 17c67b4f25 feat: Cookie-Banner ↔ Backend Integration (DSR, Retention, Consent Proof)
Phase 1: Vendor sync from service registry (82+ services → banner vendors)
Phase 2: Category-based retention (marketing=90d, statistics=790d, not hardcoded 365d)
Phase 3: DSR ↔ Banner email linking (link-email, by-email, Art.17 erasure, Art.15/20 export)
Phase 4: Consent sync (Banner → Einwilligungen bridge)
Phase 6: Consent proof (SHA256 config hash + config_version in audit log, Art. 7(1) DSGVO)

New files:
- banner_dsr_service.py — email linking + DSR integration
- vendor_banner_sync.py — service registry → vendor configs
- migration 106 — linked_email, banner_config_hash, consent_version columns

Tests: 20+ new backend tests + 2 Playwright E2E test suites (API + UI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 19:41:22 +02:00
Benjamin Admin c5b22e0c99 fix: derive intake flags from DETECTED SERVICES, not from text content
Fundamental architecture fix: data processing happens through APIs/scripts/
cookies — NOT through visible page text. A news site about healthcare does
NOT process health data.

Before: Qwen reads website text → guesses "health_data: true" (WRONG)
After: Google Analytics detected → tracking: true (CORRECT, deterministic)

New flow: detect services from HTML → map service categories to flags →
feed flags into UCCA assessment. No LLM needed for flag extraction.

SERVICE_TO_FLAGS maps categories: tracking→tracking, marketing→marketing+
third_party_sharing, payment→payment_data, heatmap→profiling, etc.
SPECIFIC_SERVICE_FLAGS for Klarna (Art.22), Stripe (US transfer), etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 08:37:51 +02:00
Benjamin Admin 0f3ec9061e fix: false positive findings + restore docs-src + §312k ecommerce filter
1. Intake prompt: "BETREIBER verarbeitet" statt "Text erwaehnt".
   IHK berichtet ueber Gesundheitsdaten → false. Vorher: true.
2. §312k Check: nur bei E-Commerce/Abo-Websites (Warenkorb, Shop, PayPal etc.)
   IHK hat keine Vertraege → kein Kuendigungsbutton noetig.
3. docs-src/ restored from commit 9824304

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 08:26:59 +02:00
Benjamin Admin e318215cc5 refactor: split agent_analyze_routes (420→309 LOC) + agent docs + migration
- Extracted website compliance checks + helpers to website_compliance_checks.py
- Created agent documentation (zeroclaw/docs/compliance-agent.md)
- DB migration 086 executed (compliance_agent_scans table)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 08:22:52 +02:00
Benjamin Admin d942b21354 feat: SCC + TIA templates for third-country transfers
New templates for the Vendor Compliance module:
- 105: Transfer Impact Assessment (TIA) — Schrems II risk assessment
  with country evaluation, government access assessment, supplementary
  measures, risk matrix, and go/conditional/deny decision
- 105: SCC Companion Document — annexes to EU Decision 2021/914
  (module selection C2C/C2P/P2P/P2C, party details, data description,
  TOMs, sub-processor list)

Template recommendations: SCC+TIA triggered by tech_third_country answer
Generator: New "Drittlandtransfer" category

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 10:19:56 +02:00
Benjamin Admin 3984f39329 feat: Phase 5 — Special templates (AI policy, BYOD, ISMS, consent, video DSI)
Phase 5 of the Document Templates Masterplan:

- 104: 5 new special templates:
  - ai_usage_policy: AI usage policy (AI Act Art. 4 training obligation,
    forbidden inputs, quality check, labeling, TDM opt-out)
  - byod_policy: Bring Your Own Device (container solution, remote wipe,
    DSFA, cost sharing options)
  - consent_texts: Double-Opt-In texts, newsletter, marketing, tracking,
    profiling consent, unsubscribe confirmation
  - video_conference_dsi: Video conference privacy notice (Zoom/Teams/Meet,
    recording consent, third-country transfer)
  - isms_manual: ISMS handbook (ISO 27001, document structure map to all
    other templates, PDCA cycle, management review)

Generator: 6 new categories (AI governance, ISMS, consent, special DSI,
internal policies)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 09:25:32 +02:00
Benjamin Admin 4417938558 feat: Phase 3 — Security + HR/Vendor/BCM policies
Phase 3 of the Document Templates Masterplan:

- 103: 4 new security policies (information_security_policy, password_policy,
  encryption_policy, access_control_policy) + updates for CRA (056) and
  all 15 HR/Vendor/BCM policies (072)

New templates:
- Information Security Policy: ISMS-Leitlinie (ISO 27001, BSI, NIS2)
- Password Policy: BSI/NIST compliant (12+ chars, MFA, no forced rotation)
- Encryption Policy: BSI TR-02102, algorithms, key management, TLS config
- Access Control Policy: RBAC, Least Privilege, Zero Trust, rezertification

Updates: AI Act + NIS2UmsuCG references for CRA and all 15 HR/Vendor/BCM
Generator: 6 new categories (security, HR, data, vendor, BCM policies)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 09:05:03 +02:00
Benjamin Admin 90c7f02b40 feat: Phase 2 — Security Concepts + DSFA + DSR updates
Phase 2 of the Document Templates Masterplan:

- 101: Security Concepts v2 (7 templates) — NIS2UmsuCG references,
  BSI Grundschutz++ modernization, AI Act cross-references,
  Zero Trust principle, ransomware-protected backups, NIS2 logging
- 102: DSFA + Pflichtenregister + DSR v2 — AI Act Art. 9 for DSFA,
  NIS2UmsuCG for Pflichtenregister, tenant_id fix for DSR processes

All 16 templates reviewed — already at good product level, only
incremental updates needed (standards references, cross-doc links).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 08:45:04 +02:00
Benjamin Admin f591871277 feat: Phase 1 — Whistleblower + Cookie/Impressum + HR-DSI templates
Phase 1 of the Document Templates Masterplan:

- 098: Whistleblower-Richtlinie (HinSchG) — 10 sections, anonymous
  reporting, 7-day confirmation, 3-month feedback, reprisal protection
- 099: Cookie-Banner + Impressum updates — OS-Plattform discontinued
  note (July 2025), description updates
- 100: Applicant DSI + Employee DSI — two new HR privacy notices with
  § 26 BDSG, 6-month retention (applicants), modular blocks for video
  interviews, talent pool, IT monitoring, company vehicles, works council

Generator: 25 new fields (whistleblower, applicant, employee categories)
Categories: whistleblower, hr_dsi added to document generator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 08:29:52 +02:00
Benjamin Admin bae59e2ce0 feat: Document Templates v2 — 11 migrations + scope-based generator
Complete overhaul of document generator templates based on paragraph-by-paragraph
legal review of attorney-drafted templates (TOM, AVV, AGB, DSI, Community
Guidelines, Nutzungsbedingungen, Widerrufsbelehrung, Cookie-Richtlinie).

Templates (11 migrations 087-097):
- 087: TOM-Dokumentation v2 (11 categories incl. Trennungskontrolle)
- 088: AVV Art. 28 DSGVO (complete, §§ 1-11, 3 annexes)
- 089: Cross-document updates (Löschkonzept DIN 66399, VVT recipients)
- 090: AGB SaaS/Shop v2 (18 §§, B2B/B2C, IoT, physical goods, IP protection)
- 091: Community Guidelines v2 (3 tones, 11 modular categories, DSA-compliant)
- 092: Media & Content modules (MStV, AI Act Art. 50, UWG, Pressekodex)
- 093: DSI/Privacy Policy v2 (Art. 13 complete, shop+corporate modules)
- 094: Nutzungsbedingungen (Terms of Use, UGC, tipping, wallet, CC licenses)
- 095: Widerrufsbelehrung (SaaS + physical + IoT bundle + combo)
- 096: Social Media DSI (Facebook, YouTube, LinkedIn, TikTok, Meta Pixel)
- 097: Cookie-Richtlinie v2 (TDDDG § 25, consent banner, browser links)

Frontend (generator):
- scopeDefaults.ts: L1-L4 scope-based defaults from Compliance Scope Engine
- contextBridge.ts: TOMCtx + DPACtx interfaces (70+ new fields)
- contextBridge-helpers.ts: 35+ placeholder mappings for TOM/DPA/AGB
- _constants.ts: 120+ new generator fields (TOM, DPA, AGB, community,
  media, social, nutzungsbedingungen, widerruf, cookie, shop, IoT)
- page.tsx: Auto-prefill TOM/DPA from scope engine decision

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 01:18:33 +02:00
Benjamin Admin 58957a4aaa fix: Playwright user permission + etracker DSE matching + CMP skip
1. Dockerfile: install Playwright AS appuser (not root) so chromium
   binary is accessible at runtime. Was causing 500 error.
2. DSE service matching: text-search fallback when LLM extraction fails.
   If "etracker" appears in DSE text, mark as documented even without
   LLM parsing the service list.
3. CMP skip: consent managers in category "cmp" skipped (not just "other"
   with id "cmp").

NOT DEPLOYED — RAG pipeline is running.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 19:36:46 +02:00
Benjamin Admin cedc5de15d feat: Phase 10 — Playwright website scanner replaces httpx
New /website-scan endpoint in consent-tester service:
- Real browser renders JavaScript (finds dynamic content)
- Clicks navigation menus (discovers hidden sub-pages like IHK DSB page)
- Follows links within DSE to find regional privacy policies
- Collects rendered HTML for each page (after JS execution)

Backend integration:
- agent_scan_routes tries Playwright first, falls back to httpx
- DSE text and HTML extracted from Playwright-rendered pages
- Service detection runs on rendered HTML (catches JS-loaded scripts)

Also fixes:
- GA regex: G-[A-Z0-9]{8,12} prevents CSS class false positives
- etracker added to service registry
- External page scanning blocked (same-domain only)
- CSS/JS/image files excluded from page list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 19:16:50 +02:00
Benjamin Admin 5eeef3a9c3 fix: 4 bugs from IHK scan — false positives + missing etracker
1. GA regex: G-\w{5,} matched CSS classes (g-7031048). Now requires
   G-[A-Z0-9]{8,12} (uppercase after G-, 8-12 chars = real GA4 ID)
2. External page scanning: DSE-internal links now SAME DOMAIN only.
   Previously followed links to etracker.com, google.de/policies etc.
   and detected services on THOSE sites as IHK services.
3. Added etracker to service registry (DE, ePrivacy-certified)
4. CSS/JS/image files excluded from page scanning
5. Navigation-pattern links for deeper DSE sub-pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 19:08:07 +02:00
Benjamin Admin fff47cc52e fix: 4 bugs from IHK Konstanz scan validation
1. DSE-Matcher: Google/YouTube false match — now requires 2+ word match
   for provider-name fallback, not just "Google" matching YouTube section
2. AGB/Widerrufsbelehrung: only_ecommerce flag — skips for non-shop
   websites (detected via payment providers, cart keywords)
3. DSE-internal link following — scanner now discovers links WITHIN the
   privacy policy and scans those too (finds regional DSE sub-pages)
4. Expanded keyword synonyms for DSE mandatory checks:
   - "Zweck und Rechtsgrundlage" now matches "zwecke"
   - "behoerdlichen datenschutzbeauftragt" matches DSB
   - "aufsichtsbehörde" with umlaut matches

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 17:57:19 +02:00
Benjamin Admin 0f3ba9c207 test: Lit-Mapping validation — Dict vs Control Library comparison
8 test cases with deliberately wrong legal basis assignments:
- Cookie tracking on lit. f (should be lit. a)
- Analytics on lit. b (should be lit. a)
- Newsletter on lit. f (should be lit. a)
- Klarna without Art. 22
- Session recording on lit. f
- 2 correct cases (should NOT trigger findings)

Runs both hardcoded dict AND Control Library query, compares results.
If Control Library passes all → dict can be removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 16:56:38 +02:00
Benjamin Admin 2c9cea74e3 docs: instruction for hardcoded knowledge → Control Library migration
6 files with hardcoded legal knowledge identified. Review deadline 2026-07-01.
legal_basis_validator.py marked with warning log on every use.
Instruction file for other session to execute migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 16:33:48 +02:00
Benjamin Admin 4bf92f42b8 feat: Phase 9 — Authenticated Testing + Legal Basis Validator (lit. mapping)
Phase 9: Playwright login + 5 post-login checks:
- §312k BGB: Kündigungsbutton (2 Klicks)
- Art. 17 DSGVO: Konto löschen
- Art. 20 DSGVO: Daten exportieren
- Art. 7(3): Einwilligungen widerrufen
- Art. 15: Profildaten einsehen
Auto-detects login form selectors. Credentials destroyed after test.

Legal Basis Validator: Checks 7 common lit-mapping mistakes:
- Cookie tracking on lit. f instead of lit. a (Planet49)
- Analytics on lit. b (contract overextension)
- Klarna without Art. 22 reference
- Session recording without consent
Integrated into website scan pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 16:08:41 +02:00
Benjamin Admin 8336c01c5c feat: Phase 6-8 — PDF export, recurring scans, multi-website compare
Phase 6: PDF export via WeasyPrint — POST /agent/scans/pdf generates
printable compliance report with findings table, service comparison,
risk badge, and legal disclaimer.

Phase 7: Recurring scans — POST /agent/monitored-urls to add URLs,
POST /agent/run-scheduled triggers all enabled scans (cron/ZeroClaw).
In-memory storage with DB upgrade path.

Phase 8: Multi-website compare — POST /agent/compare with 2-5 URLs,
parallel scanning, comparison table (risk, findings, services, compliance
features per site).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 15:27:51 +02:00
Benjamin Admin e35db90232 feat: Phase 5 — DB persistence for scan results + Phase 10 in plan
- Migration 086: compliance_agent_scans table (findings, services, corrections)
- agent_history_routes.py: POST /scans (save), GET /scans (list), GET /scans/{id}
- Scan results survive page reloads and can be reviewed later
- Phase 10 (Playwright website scanner) added to product roadmap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 15:17:51 +02:00
Benjamin Admin 53774886e7 perf: Phase 4 — parallel page fetching (asyncio.gather)
Scan pages in parallel instead of sequential. Reduces scan time
from ~10s (5 pages × 2s) to ~3s (all pages at once).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 15:09:03 +02:00
Benjamin Admin 5c5054f740 feat: Phase 3 — registry 82 services, mandatory checker, SDK flow step
- website_scanner.py: imports from master service_registry.py (82 services)
- agent_scan_routes.py: mandatory content checks (documents + DSE sections)
- steps-betrieb.ts: Compliance Agent step added to SDK Flow (seq 5000)
- PLAN: Phase 9 (Authenticated Testing) added to product roadmap

Mandatory checks know what MUST be there:
- Documents: Impressum, DSE, AGB, Widerrufsbelehrung
- DSE content: 9 Art. 13 DSGVO fields (DSB, Speicherdauer, etc.)
- Impressum content: 5 §5 TMG fields (GF, HRB, USt-ID, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 15:04:44 +02:00
Benjamin Admin 642382cbe8 feat: Mandatory Content Checker — knows what MUST be there
Three check levels:
1. Documents: Impressum, DSE, AGB, Widerrufsbelehrung must exist as pages
2. DSE content: 9 Art. 13 DSGVO mandatory sections (Verantwortlicher,
   DSB-Kontakt, Zwecke, Rechtsgrundlagen, Speicherdauer, Betroffenenrechte,
   Beschwerderecht, Drittlandtransfer, Profiling)
3. Impressum content: 5 §5 TMG mandatory fields (GF, Handelsregister,
   USt-ID, Anschrift, Kontakt)

Detects both missing documents AND missing content within documents.
Also catches HTTP errors (page exists but returns 404/500).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 14:23:22 +02:00
Benjamin Admin f219b9c244 feat: Master Service Registry — 82 third-party services across 15 categories
Tracking (12), Marketing/Ads (9), Newsletter (8), CDN/Fonts (7),
Chatbots/Support (7), Payment (5), Heatmaps (4), A/B Testing (3),
Tag Managers (3), Push (3), Video (4), Social (3), Error Tracking (4),
CRM (3), Maps (3), Captcha (3), Accessibility (2), CMP (1).

Each entry: regex, provider, country, EU adequacy, consent requirement,
legal reference. Pure data, no logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 14:21:32 +02:00
Benjamin Admin 0ba76d041a feat: DSE parser + matcher — textblock references in scan findings
- dse_parser.py: HTML → structured sections (heading, number, content, parent)
  Uses heading hierarchy (h1-h4) with regex fallback
- dse_matcher.py: matches detected services against DSE sections
  Exact name → provider → category matching with insertion point suggestion
- agent_scan_routes: TextReference model in findings (original text,
  section, paragraph, correction type, insert_after)

Enables showing: "Google Analytics not found in DSE, insert after
Section 2.4 Cookies und Tracking"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 11:55:26 +02:00
Benjamin Admin 4298ae17ab feat: Phase 0+1 — LLM intake extraction + control relevance filter
Phase 0: Qwen extracts 14 structured intake flags (personal_data,
marketing, profiling, ai_usage, etc.) instead of keyword matching.
Fallback to keywords if LLM unavailable. Flags feed into UCCA for
accurate scoring.

Phase 1: Control relevance filter removes false positives.
C_TRANSPARENCY only recommended if AI/ML keywords found in text.
7 control rules with keyword lists + intake flag fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 11:36:24 +02:00
Benjamin Admin 6a77cf6a89 feat: HTML email format, tab info hints, scan history
- Summary now renders as styled HTML (table layout, colored risk badge,
  warning banners) instead of plaintext in <div>
- Tab info text explains scope: "Analysiert nur die eingegebene URL" vs
  "Scannt automatisch 5-10 Unterseiten"
- Scan history with findings count badge and page count

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 11:04:29 +02:00
Benjamin Admin b39c1d5dce feat: DSR Prozessbeschreibungen Art. 15-21 mit Swim-Lane-Diagrammen
Build + Deploy / build-admin-compliance (push) Successful in 1m56s
Build + Deploy / build-backend-compliance (push) Successful in 3m5s
Build + Deploy / build-ai-sdk (push) Successful in 47s
Build + Deploy / build-developer-portal (push) Successful in 1m5s
Build + Deploy / build-tts (push) Successful in 1m23s
Build + Deploy / build-document-crawler (push) Successful in 33s
Build + Deploy / build-dsms-gateway (push) Successful in 23s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m40s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 42s
CI / test-python-backend (push) Successful in 47s
CI / test-python-document-crawler (push) Successful in 33s
CI / test-python-dsms-gateway (push) Successful in 22s
CI / validate-canonical-controls (push) Successful in 18s
Build + Deploy / trigger-orca (push) Successful in 2m53s
7 vollstaendige Prozessbeschreibungen fuer den Document Generator:
- Art. 15: Auskunftsrecht (30 Tage, 6 Schritte, Informationskatalog)
- Art. 16: Berichtigungsrecht (14 Tage, inkl. Art. 19 Mitteilung)
- Art. 17: Loeschungsrecht (14 Tage, Art. 17(3) Ausnahmen-Checkliste)
- Art. 18: Einschraenkungsrecht (14 Tage, erlaubte Verarbeitung)
- Art. 19: Mitteilungspflicht (automatisch bei Art. 16/17/18)
- Art. 20: Datenuebertragbarkeit (30 Tage, JSON/CSV/XML Export)
- Art. 21: Widerspruchsrecht (30 Tage, Sonderfall Direktwerbung)

Jede Beschreibung enthaelt:
- Mermaid Swim-Lane-Diagramm (Betroffener/Sachbearbeitung/Fachabteilung/DSB)
- Detaillierte Schritt-Tabelle mit Verantwortlichkeiten und Fristen
- Rechtsgrundlagen-Verweise
- Firmen-Platzhalter (FIRMENNAME, VERSION, DATUM, DSB_NAME)

Integration:
- 7 neue Typen in VALID_DOCUMENT_TYPES (legal_template_routes.py)
- Neue Kategorie "DSR-Prozesse" im Document Generator Frontend
- DSR types-core.ts: templateType Feld verknuepft DSR → Document Generator
- Migration 085 seeded die Templates in die legal_templates Tabelle

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 17:53:44 +02:00
Benjamin Admin b06a33a5fe fix: syntax error — missing closing paren in scan summary builder 2026-04-28 17:41:11 +02:00
Benjamin Admin 6c0e76f96d feat: show scanned pages in email summary + frontend (expandable list)
Email now lists all scanned URLs with checkmark/cross status.
Frontend shows collapsible "X Seiten gescannt — Details anzeigen".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 17:26:03 +02:00
Benjamin Admin 0106f3b5b6 fix: use Ollama directly for correction generation (bypass SDK think-mode)
SDK LLM chat returns empty content due to Qwen think-mode. Direct Ollama
/api/generate call with stream:false gets the full response including
think tags which we strip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 16:30:51 +02:00
Benjamin Admin b175ad2594 fix: increase LLM timeouts for scan corrections (90s) and DSE extraction (120s)
Qwen 3.5:35b needs ~30-60s per call. Multi-call scan was timing out.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 16:05:35 +02:00
Benjamin Admin 711b9b3146 feat: website scanner with SOLL/IST service comparison + corrections
- website_scanner.py: multi-page crawl, 20+ service patterns (tracking,
  CDN, chatbots, payment, fonts, captcha, video), AI text detection
- dse_service_extractor.py: LLM extracts services from privacy policy text
- agent_scan_routes.py: POST /agent/scan — combines scan + DSE comparison,
  generates findings (undocumented, outdated, third-country transfer),
  auto-corrections via Qwen in pre-launch mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 15:35:31 +02:00