Commit Graph

226 Commits

Author SHA1 Message Date
Benjamin Admin 293c58d0dd feat: Add actionable hints to all 138 compliance checks
Build + Deploy / build-admin-compliance (push) Successful in 1m40s
Build + Deploy / build-backend-compliance (push) Successful in 7s
Build + Deploy / build-ai-sdk (push) Successful in 35s
Build + Deploy / build-developer-portal (push) Successful in 8s
Build + Deploy / build-tts (push) Successful in 7s
Build + Deploy / build-document-crawler (push) Successful in 8s
Build + Deploy / build-dsms-gateway (push) Successful in 7s
Build + Deploy / build-dsms-node (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m50s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 40s
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Successful in 25s
CI / test-python-dsms-gateway (push) Successful in 23s
CI / validate-canonical-controls (push) Successful in 15s
Build + Deploy / trigger-orca (push) Successful in 2m28s
Each check now has a "hint" field explaining what is missing and
what the customer should do to fix it. Hints are shown in the
frontend below failed checks in red text.

Examples:
- "Bei Verarbeitung auf Basis von Art. 6(1)(f) muss dokumentiert
  werden, warum Ihr berechtigtes Interesse die Rechte der
  Betroffenen ueberwiegt."
- "Die ladungsfaehige Anschrift fehlt. Erforderlich: Strasse,
  Hausnummer, PLZ und Ort."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 14:05:01 +02:00
Benjamin Admin 870953f579 fix: PLZ regex matches lowercase text and D-78467 format
Patterns ran on text.lower() but searched [A-Z] — changed to [a-z].
Also accept D-12345 prefix (common German format).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 13:28:00 +02:00
Benjamin Admin b363c28539 feat: Add 76 Level-2 regex checks for document correctness verification
Split dsi_document_checker.py (466 LOC) into doc_checks/ package (9 files).
Two-pass L1→L2 logic: L1 checks "Is it mentioned?", L2 checks "Is it correct?"
(e.g. controller has full address, specific Art. 6 lit., concrete time periods).

138 total checks (62 L1 + 76 L2) across 7 doc types:
- DSE Art. 13: 31, Impressum §5 TMG: 16, Cookie §25 TDDDG: 15
- Widerruf §355: 15, AGB §305ff: 21, Social Media Art. 26: 20, DSFA Art. 35: 18

Frontend: hierarchical L1→L2 display with dual progress bars
(green=completeness, blue=correctness).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 12:37:03 +02:00
Benjamin Admin 3c12e06faf feat: Fix DSFA dedup + expand all checklists to 56 total checks
Fixes:
- 'Risikoabwaegung' is sub-section of DSFA → added to SKIP_HEADINGS
- 'Social Media' standalone heading → recognized as social_media DSE
- Removed 'risikobew' from DSFA pattern (was too broad)

Expanded checklists:
- Widerruf: 4→7 checks (+Empfaenger, kein Grund, §312k Button)
- AGB: 4→9 checks (+Zahlung, Lieferung, Gewaehrleistung, Kuendigung, Datenschutz)
- Social Media: +1 (Social Bookmarks)
- DSFA: +1 (LFDI Richtlinie)

Total: 47→56 Regex-Checks across 7 document types:
DSI=9, Cookie=5, Social Media=10, DSFA=8, Impressum=6, Widerruf=7, AGB=9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 11:55:29 +02:00
Benjamin Admin 4642abba23 feat: Expand Social Media (10 checks) + DSFA (8 checks) checklists
Art. 26 Joint Controller (10 checks, was 7):
+ Auflistung der genutzten Plattformen
+ Rechtsgrundlage (Art. 6)
+ Social Bookmarks vs. Plugins Hinweis
Improved: broader patterns for joint parties, contact point, data types

DSFA Art. 35 (8 checks, was 5):
+ Schwellwertanalyse / Auslösepruefung
+ Beruecksichtigung Landesbehörden-Richtlinie (LFDI)
+ Dokumentation der Ergebnisse
Improved: IHK-specific patterns (Kanäle, systematische Beobachtung,
geringer Umfang, sensitive Daten)

Total: 40 → 47 Regex-Checks across all document types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 11:17:16 +02:00
Benjamin Admin 3853a0838a feat: Art. 26 Joint Controller + DSFA checklists for Social Media sections
New checklists:
- JOINT_CONTROLLER_CHECKLIST (Art. 26 DSGVO, 7 checks):
  Joint parties, arrangement, contact point, processing split,
  data categories, third-country transfer (USA), rights
- DSFA_CHECKLIST (Art. 35 DSGVO, 5 checks):
  Description, necessity, risk assessment, measures, DSB involvement

Section detection: 'Datenschutzerklaerung fuer Social Media' → social_media,
'Datenschutzfolgeabschaetzung/Risikoanalyse' → dsfa

classify_document_type: DSFA and social_media detected before generic DSE

Frontend: DOC_TYPES dropdown + ChecklistView labels updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 10:49:32 +02:00
Benjamin Admin 45446aef16 fix: 8 quality + UX improvements
1. Cookie 'Zwecke' false positive: added 'um...zu', 'dienen', 'helfen',
   'ermöglichen' patterns — catches purpose descriptions without 'Zweck'
2. Kurzhinweis: added empty all_checks for short documents (<200 words)
3. Bezeichnungsfeld: placeholder shows 'Version / Stand' for typed docs,
   'Dokumentname' for 'Sonstiges'
4. DocCheckTab state persistence: entries + results survive navigation
5. DocCheck history: saves each check with date, doc count, findings
6. History display: 'Letzte Pruefungen' section at bottom of tab
7. ChecklistView: shows 'X von Y Pruefpunkten bestanden' per document
8. Results persist in localStorage across page navigation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 09:37:47 +02:00
Benjamin Admin a680276c86 fix: Filter controls by test_procedure content — eliminates governance false positives
Only use controls whose test_procedure mentions document-type-specific terms:
- DSI: test_procedure must contain 'datenschutzerkl' or 'art. 13/14'
- Cookie: must contain 'cookie', 'einwilligung', 'consent'
- Impressum: must contain 'impressum'

This filters out internal governance controls (Datenmodelle, Infrastruktur)
that are irrelevant for public document checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:42:35 +02:00
Benjamin Admin fa45b5793c feat: Control Library check via SQL (canonical_controls) instead of Qdrant
Complete rewrite of rag_document_checker.py:
- Queries canonical_controls table (294K controls, 10K data_protection)
- Filters by category + title keywords per document type
- Uses test_procedure field as actual check instructions
- Regex pre-check extracts key terms from procedure → fast match
- LLM fallback only for regex misses (saves tokens)
- /no_think prefix for direct JSON output

SQL approach advantages:
- Structured data with test_procedure, pass_criteria, fail_criteria
- Category filtering (data_protection, compliance, governance)
- No Qdrant API key issues
- Controls are actual check criteria, not general legal texts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:26:56 +02:00
Benjamin Admin 6da36d87c2 fix: Robust JSON parsing for LLM responses — handles unquoted keys, fallback extraction
LLM returns {fulfilled: true} instead of {"fulfilled": true}.
Now fixes unquoted keys, True→true, and falls back to text-based
boolean extraction when JSON parsing fails entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:18:52 +02:00
Benjamin Admin e50c4d659e fix: Disable Qwen thinking mode for RAG checks (/no_think prefix)
Qwen 3.5 uses all tokens for thinking, leaving response empty.
Using /no_think prefix to get direct JSON output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:12:51 +02:00
Benjamin Admin 9f16e6d535 fix: Read Qwen response from 'thinking' field when 'response' is empty
Qwen 3.5 with latest Ollama returns structured thinking in separate
'thinking' field, leaving 'response' empty. Now checks both fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:07:09 +02:00
Benjamin Admin f4374cfe8d feat: Semantic Qdrant search — embed query via bge-m3, vector search in local Qdrant
Replaces scroll+filter approach with proper semantic search:
1. Embed query via bp-core-embedding-service (bge-m3, 1024 dim)
2. Vector search in Qdrant (bp_compliance_datenschutz + bp_compliance_gesetze)
3. Sort by cosine similarity score
4. No API key needed — local Qdrant on Mac Mini

Falls back gracefully: SDK first, then semantic Qdrant, then empty.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:46:06 +02:00
Benjamin Admin 7b8440191e fix: Better error logging + increase LLM timeout to 120s for RAG check
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:33:58 +02:00
Benjamin Admin 510f513811 fix: Qdrant search uses chunk_text + section/category filter
Payload structure: chunk_text (not text), section (Article 13),
category, regulation_id. Scrolls 100 points per collection,
filters client-side against regulation keywords.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:28:32 +02:00
Benjamin Admin b50c4ec940 fix: RAG checker falls back to local Qdrant when Go SDK returns 401
Go SDK points to external Qdrant (qdrant-dev.breakpilot.ai) with expired API key.
Fallback: search directly in local Qdrant (bp-core-qdrant:6333) which has
all collections: bp_compliance_datenschutz, bp_compliance_gesetze, atomic_controls_dedup.

Search strategy:
1. Try Go SDK RAG endpoint (preferred, has embedding-based search)
2. Fallback: Qdrant scroll with text-based regulation filter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:23:52 +02:00
Benjamin Admin 090da0f71b feat: RAG-based document verification against 144K Control Library
New module: rag_document_checker.py
- Searches RAG (Qdrant) for controls relevant to document type
- Filters by regulation (DSGVO Art.13, TDDDG §25, BGB §355 etc.)
- LLM (Qwen 3.5:35b) verifies each control against document text
- Returns fulfilled/missing with evidence text + severity
- Supports: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept

Integration in doc-check endpoint:
- Regex checklist runs first (fast, deterministic)
- RAG checks run after (semantic, catches what regex misses)
- Both results combined in single response

LLM prompt returns JSON: {fulfilled, evidence, issue, severity}
Think-tags stripped, JSON extracted from response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 13:19:15 +02:00
Benjamin Admin 4c68caac4e feat: Multi-URL Document Check with full checklist visibility
New "Dokumenten-Pruefung" tab in Compliance Agent:
- User adds multiple URLs with document type (DSI, AGB, Impressum, Cookie, Widerruf)
- Each document loaded via Playwright, accordions expanded, text extracted
- Checked against type-specific legal checklist
- Optional: Cookie banner check via checkbox

Checklisten-UX (solves "100% looks like nothing was checked"):
- All checks shown per document: green checkmark + matched text excerpt
- Red X for missing fields with legal reference
- Builds user trust: "9 Punkte geprueft, alle bestanden"
- Expandable per document with completeness bar

New checklists:
- Impressum: §5 TMG (6 fields: name, address, contact, register, VAT, representative)
- Cookie-Richtlinie: §25 TDDDG (5 fields: types, purposes, retention, third-party, opt-out)

Backend:
- POST /agent/doc-check — async with polling (same pattern as /scan)
- DocCheckResult includes checks[] with passed/failed + matched_text
- dsi_document_checker returns all_checks in SCORE finding
- Email report shows per-document checklist

Files: agent_doc_check_routes.py (280 LOC), DocCheckTab.tsx (248 LOC),
ChecklistView.tsx (130 LOC), dsi_document_checker.py (+70 LOC)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 10:08:40 +02:00
Benjamin Admin 8fb2061e9b fix: Eliminate GA false positive + handle short DSI documents
Service detection:
- Only search script tags + src/href attributes for service patterns
- Prevents false positives from DSE text mentioning services
  (e.g. IHK DSE describes etracker, 'google analytics' in text)
- Technical patterns (with regex chars) still checked in full HTML

Short documents:
- Documents with < 200 words flagged as 'Kurzhinweis' instead of
  'MANGELHAFT' — too short for Art. 13 completeness check
- Prevents 96-word navigation pages from showing 8 missing fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 18:21:37 +02:00
Benjamin Admin 8d6959e8b2 fix: Expand Art. 13 patterns for generic matching across all websites
Complaint (Art. 13(2)(d)):
+ 'recht auf beschwerde', 'art. 77', 'beschwerde...wenden/einlegen',
  'zuständige behörde' — IHK uses 'Recht auf Beschwerde gem. Art. 77'

Legal basis (Art. 13(1)(c)):
+ 'gemäß Art.', '§ X IHKG/BDSG/LDSG/BBiG/TDDDG', 'einwilligung gem',
  'verarbeitung auf grundlage' — catches statutory references

Third country (Art. 13(1)(f)):
+ 'Übermittlung ausserhalb', 'EWR/EEA', 'Data Privacy Framework'

Retention (Art. 13(2)(a)):
+ 'Dauer der Speicherung', 'Aufbewahrungsdauer/-pflicht/-zeit',
  'gesetzliche Aufbewahrung' — common German DSE headings

All patterns are generic, not IHK-specific.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 17:45:02 +02:00
Benjamin Admin e3ae35891f fix: 0% completeness bug — SCORE finding was not generated at 100%
Root cause: When all 9 Art. 13 checks passed (100%), no SCORE finding
was created (line: 'if pct < 100'). The backend then defaulted to
completeness=0 because it looked for the SCORE finding to extract the %.

Fix: Always generate SCORE finding, even at 100%. Added 'OK' severity
for fully compliant documents.

This was the cause of 8 documents showing '0% MANGELHAFT' despite
containing all required information.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 15:34:04 +02:00
Benjamin Admin 6c5e086356 fix: DSI dedup — skip anchor links, filter noise, merge duplicates + fix false positives
Dedup fixes:
- Anchor links (#cookies, #betroffenenrechte) on same page are skipped entirely
- Noise titles filtered: 'drucken', 'nach oben', 'Datenschutz' (too generic)
- Documents with < 50 words filtered (navigation snippets)
- Documents with identical word_count merged (same page, different title)
- URL-only titles filtered

False positive fixes (dsi_document_checker.py):
- 'Kontaktdaten des Verantwortlichen' pattern for controller check
- 'Zweck und Rechtsgrundlage' combined heading pattern
- 'Welche Daten werden verarbeitet' question-style headings
- 'Betroffenenrechte' as standalone heading
- 'Welche Rechte hat der Betroffene' question pattern
- 'Daten werden geloescht' retention pattern
- 'Auftragsverarbeiter' as recipient indicator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 11:41:07 +02:00
Benjamin Admin f967480cd9 fix: Add missing service_registry.py to main
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 23:34:00 +02:00
Benjamin Admin a18ef16378 fix: Add missing service modules required by agent_scan_routes
These files existed on the feature branch but were never cherry-picked
to main, causing ModuleNotFoundError on import:
- dse_parser.py — parses DSE HTML into structured sections
- dse_matcher.py — matches detected services against DSE sections
- mandatory_content_checker.py — checks Art. 13 DSGVO mandatory fields
- legal_basis_validator.py — validates legal basis (lit. a-f)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 23:22:30 +02:00
Benjamin Admin 48146cddaf feat: DSI document discovery + completeness check in agent scan workflow
Agent scan now automatically:
1. Discovers all legal documents via consent-tester /dsi-discovery endpoint
2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum
3. Checks completeness against type-specific checklists:
   - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes,
     legal basis, recipients, third-country, retention, rights, complaint)
   - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction)
   - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences)
4. Adds findings per document to scan results
5. Shows discovered documents with completeness % in email summary
6. Returns discovered_documents list in API response

New files:
- dsi_document_checker.py (229 LOC) — checklists + classifier
- agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections

Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:10:13 +02:00
Benjamin Admin 53f6f30cf0 feat: DSI document discovery + completeness check in agent scan workflow
Agent scan now automatically:
1. Discovers all legal documents via consent-tester /dsi-discovery endpoint
2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum
3. Checks completeness against type-specific checklists:
   - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes,
     legal basis, recipients, third-country, retention, rights, complaint)
   - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction)
   - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences)
4. Adds findings per document to scan results
5. Shows discovered documents with completeness % in email summary
6. Returns discovered_documents list in API response

New files:
- dsi_document_checker.py (229 LOC) — checklists + classifier
- agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections

Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:09:45 +02:00
Benjamin Admin d3c8811fdb feat: IAB TCF 2.2 — TC String encoder + purpose mapping + UI
- TCFEncoderService: generates base64url-encoded TC Strings per IAB spec
  with 12 purposes, vendor consent bitfield, CMP metadata
- Category-to-purpose mapping (necessary→none, statistics→1,7,8,9,10,
  marketing→1,2,3,4,5,6,7,12, functional→1,11)
- tcf_routes: 5 endpoints (purposes, features, mapping, encode, encode-categories)
- banner_consent_service: auto-generates TC String when tcf_enabled=true
- TCFSettings.tsx: enable/disable toggle, purpose grid with category mapping,
  TC String test generator, CMP registration info
- New "TCF/IAB" tab in cookie-banner page (7 tabs total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 07:01:37 +02:00
Benjamin Admin eb4ea8bc42 feat: EmailDeliveryService + professional DSR email templates
- EmailDeliveryService: load template → find published version →
  render {{variables}} → send via SMTP → audit log. Fallback to
  inline HTML when no published template exists.
- Migration 117: Professional HTML/text content for all 5 DSR
  templates (receipt, completion, rejection, identity, extension)
  with branded styling and proper Art. references
- DSRArt11Service now uses EmailDeliveryService with dsr_rejection
  template instead of hardcoded HTML

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 23:38:32 +02:00
Benjamin Admin 060f351da7 feat: Art. 11 DSGVO — reject DSR when data subject not identifiable
- New DSRArt11Service: handles rejection with proper legal basis,
  automated email notification to requester explaining Art. 11
- POST /dsr/{id}/reject-art11 endpoint
- ActionButtons.tsx: "Nicht identifizierbar (Art. 11)" button
  shown when identity is not yet verified
- Also fixes: DSR export type-cast rollback handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 23:30:18 +02:00
Benjamin Admin c55d0ab12a fix: DSR export type-cast bug + session rollback on partial failures
- tenant_id kept as string (PostgreSQL handles UUID cast)
- Einwilligungen query uses CAST(:tid AS VARCHAR) for compatibility
- Each data source query wrapped with rollback on failure to prevent
  cascading "transaction aborted" errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 23:15:25 +02:00
Benjamin Admin 02468c94c0 feat: DSR User Data Export — Art. 15 PDF + Art. 20 JSON/CSV
- DSRExportService: aggregates all CMP data about a user from
  Banner Consents, Einwilligungen, Audit Trail, DSR History
- GET /dsr/{id}/export-user-data?format=json|csv|pdf endpoint
- PDF: A4 reportlab with 4 sections (Consents, Einwilligungen,
  Audit-Trail, DSR-Anfragen) + cover page
- CSV: BOM-encoded for Excel with flattened data rows
- JSON: structured export with all data categories
- ActionButtons.tsx: PDF/JSON/CSV export buttons now functional

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 22:42:03 +02:00
Benjamin Admin 630fffc0cc feat: Academy integration — training gap detection after document approval (F7)
- Migration 115: compliance_role_training_mapping table (org roles → training codes)
- TrainingLinkService: queries training_modules/matrix/assignments to find gaps
  per person and role. Gracefully degrades when Go training tables don't exist yet.
- document_review_routes: 2 new endpoints (training-requirements, training-gaps)
- _notify_approval() now checks training gaps and sends emails to persons
  with outstanding modules, linking to /sdk/training/learner

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 22:03:25 +02:00
Benjamin Admin 965af3a34c feat: A/B Testing + Compliance Report PDF (F5 + F8)
F5: A/B Testing for Consent Rate
- Migration 116: banner_variants table + variant tracking in audit log
- BannerABService: deterministic sticky bucketing via device hash,
  chi-squared significance testing, variant CRUD
- banner_ab_routes: 6 endpoints (CRUD + stats + assign)
- ABTestPanel.tsx: variant creation, traffic sliders, opt-in comparison
  chart with winner/significance badges
- New "A/B-Test" tab in cookie-banner page

F8: Compliance Report PDF
- CompliancePDFGenerator: reportlab-based A4 PDF covering all modules
  (Company Profile, TOM, VVT, DSFA, Risks, Vendors, Incidents,
  Reviews, Consents, Roles)
- compliance_report_routes: GET /compliance/report/pdf
- "Compliance-Report herunterladen" button on SDK dashboard

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 21:42:50 +02:00
Benjamin Admin c3fcfe88ee feat: Vendor-level consent + Consent analytics (F4 + F6)
F4: Granular Vendor-Level Consent
- Migration 113: vendor_consents JSONB on banner_consents + audit_log
- ConsentCreate schema + BannerConsentDB model extended
- banner_consent_service stores vendor_consents alongside categories
- Audit trail includes vendor-level decisions + user_agent

F6: Consent Rate Analytics
- Migration 114: user_agent on audit_log + time-series index
- BannerAnalyticsService: time series, category breakdown, device stats
- banner_analytics_routes: 4 endpoints (overview, time-series, categories, devices)
- AnalyticsDashboard.tsx: KPIs, bar chart, category bars, device breakdown
- New "Analytik" tab in cookie-banner page

[migration-approved]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 20:58:06 +02:00
Benjamin Admin fe6764df9a fix: ensure JSONB array fields are always arrays in control API
Backend: _ensure_list() converts null/string/malformed JSONB to []
for requirements, test_procedure, evidence, open_anchors, tags.

Frontend: defensive Array.isArray() check on ControlDetail.tsx.

Fixes: TypeError: A.requirements.map is not a function

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 21:18:10 +02:00
Benjamin Admin 29f9a8fea3 feat: Cookie banner vendors per category + {{COOKIE_TABLE}} generator
- CookieBannerOverlay: shows vendors per category with expandable tables
  (Verarbeiter, Cookies, Dauer, Land) for full transparency
- Demo vendors: 4 necessary, 3 statistics, 3 marketing, 3 functional
- cookie_table_generator.py: renders {{COOKIE_TABLE}} Markdown tables
  from vendor configs (DB) or service registry (fallback)
- SERVICE_COOKIES: 16 known vendor-to-cookie mappings with provider + country

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 20:06:57 +02:00
Benjamin Admin db697924ed feat: Cookie banner vendors per category + {{COOKIE_TABLE}} generator
- CookieBannerOverlay: shows vendors per category with expandable tables
  (Verarbeiter, Cookies, Dauer, Land) for full transparency
- Demo vendors: 4 necessary, 3 statistics, 3 marketing, 3 functional
- cookie_table_generator.py: renders {{COOKIE_TABLE}} Markdown tables
  from vendor configs (DB) or service registry (fallback)
- SERVICE_COOKIES: 16 known vendor-to-cookie mappings with provider + country

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 20:06:44 +02:00
Benjamin Admin a1f5d883cc feat: Cookie-Banner ↔ Backend Integration (DSR, Retention, Consent Proof)
Phase 1: Vendor sync from service registry (82+ services → banner vendors)
Phase 2: Category-based retention (marketing=90d, statistics=790d, not hardcoded 365d)
Phase 3: DSR ↔ Banner email linking (link-email, by-email, Art.17 erasure, Art.15/20 export)
Phase 4: Consent sync (Banner → Einwilligungen bridge)
Phase 6: Consent proof (SHA256 config hash + config_version in audit log, Art. 7(1) DSGVO)

New files:
- banner_dsr_service.py — email linking + DSR integration
- vendor_banner_sync.py — service registry → vendor configs
- migration 106 — linked_email, banner_config_hash, consent_version columns

Tests: 20+ new backend tests + 2 Playwright E2E test suites (API + UI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 19:52:04 +02:00
Benjamin Admin 17c67b4f25 feat: Cookie-Banner ↔ Backend Integration (DSR, Retention, Consent Proof)
Phase 1: Vendor sync from service registry (82+ services → banner vendors)
Phase 2: Category-based retention (marketing=90d, statistics=790d, not hardcoded 365d)
Phase 3: DSR ↔ Banner email linking (link-email, by-email, Art.17 erasure, Art.15/20 export)
Phase 4: Consent sync (Banner → Einwilligungen bridge)
Phase 6: Consent proof (SHA256 config hash + config_version in audit log, Art. 7(1) DSGVO)

New files:
- banner_dsr_service.py — email linking + DSR integration
- vendor_banner_sync.py — service registry → vendor configs
- migration 106 — linked_email, banner_config_hash, consent_version columns

Tests: 20+ new backend tests + 2 Playwright E2E test suites (API + UI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 19:41:22 +02:00
Benjamin Admin c5b22e0c99 fix: derive intake flags from DETECTED SERVICES, not from text content
Fundamental architecture fix: data processing happens through APIs/scripts/
cookies — NOT through visible page text. A news site about healthcare does
NOT process health data.

Before: Qwen reads website text → guesses "health_data: true" (WRONG)
After: Google Analytics detected → tracking: true (CORRECT, deterministic)

New flow: detect services from HTML → map service categories to flags →
feed flags into UCCA assessment. No LLM needed for flag extraction.

SERVICE_TO_FLAGS maps categories: tracking→tracking, marketing→marketing+
third_party_sharing, payment→payment_data, heatmap→profiling, etc.
SPECIFIC_SERVICE_FLAGS for Klarna (Art.22), Stripe (US transfer), etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 08:37:51 +02:00
Benjamin Admin 0f3ec9061e fix: false positive findings + restore docs-src + §312k ecommerce filter
1. Intake prompt: "BETREIBER verarbeitet" statt "Text erwaehnt".
   IHK berichtet ueber Gesundheitsdaten → false. Vorher: true.
2. §312k Check: nur bei E-Commerce/Abo-Websites (Warenkorb, Shop, PayPal etc.)
   IHK hat keine Vertraege → kein Kuendigungsbutton noetig.
3. docs-src/ restored from commit 9824304

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 08:26:59 +02:00
Benjamin Admin e318215cc5 refactor: split agent_analyze_routes (420→309 LOC) + agent docs + migration
- Extracted website compliance checks + helpers to website_compliance_checks.py
- Created agent documentation (zeroclaw/docs/compliance-agent.md)
- DB migration 086 executed (compliance_agent_scans table)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 08:22:52 +02:00
Benjamin Admin 58957a4aaa fix: Playwright user permission + etracker DSE matching + CMP skip
1. Dockerfile: install Playwright AS appuser (not root) so chromium
   binary is accessible at runtime. Was causing 500 error.
2. DSE service matching: text-search fallback when LLM extraction fails.
   If "etracker" appears in DSE text, mark as documented even without
   LLM parsing the service list.
3. CMP skip: consent managers in category "cmp" skipped (not just "other"
   with id "cmp").

NOT DEPLOYED — RAG pipeline is running.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 19:36:46 +02:00
Benjamin Admin 5eeef3a9c3 fix: 4 bugs from IHK scan — false positives + missing etracker
1. GA regex: G-\w{5,} matched CSS classes (g-7031048). Now requires
   G-[A-Z0-9]{8,12} (uppercase after G-, 8-12 chars = real GA4 ID)
2. External page scanning: DSE-internal links now SAME DOMAIN only.
   Previously followed links to etracker.com, google.de/policies etc.
   and detected services on THOSE sites as IHK services.
3. Added etracker to service registry (DE, ePrivacy-certified)
4. CSS/JS/image files excluded from page scanning
5. Navigation-pattern links for deeper DSE sub-pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 19:08:07 +02:00
Benjamin Admin fff47cc52e fix: 4 bugs from IHK Konstanz scan validation
1. DSE-Matcher: Google/YouTube false match — now requires 2+ word match
   for provider-name fallback, not just "Google" matching YouTube section
2. AGB/Widerrufsbelehrung: only_ecommerce flag — skips for non-shop
   websites (detected via payment providers, cart keywords)
3. DSE-internal link following — scanner now discovers links WITHIN the
   privacy policy and scans those too (finds regional DSE sub-pages)
4. Expanded keyword synonyms for DSE mandatory checks:
   - "Zweck und Rechtsgrundlage" now matches "zwecke"
   - "behoerdlichen datenschutzbeauftragt" matches DSB
   - "aufsichtsbehörde" with umlaut matches

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 17:57:19 +02:00
Benjamin Admin 2c9cea74e3 docs: instruction for hardcoded knowledge → Control Library migration
6 files with hardcoded legal knowledge identified. Review deadline 2026-07-01.
legal_basis_validator.py marked with warning log on every use.
Instruction file for other session to execute migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 16:33:48 +02:00
Benjamin Admin 4bf92f42b8 feat: Phase 9 — Authenticated Testing + Legal Basis Validator (lit. mapping)
Phase 9: Playwright login + 5 post-login checks:
- §312k BGB: Kündigungsbutton (2 Klicks)
- Art. 17 DSGVO: Konto löschen
- Art. 20 DSGVO: Daten exportieren
- Art. 7(3): Einwilligungen widerrufen
- Art. 15: Profildaten einsehen
Auto-detects login form selectors. Credentials destroyed after test.

Legal Basis Validator: Checks 7 common lit-mapping mistakes:
- Cookie tracking on lit. f instead of lit. a (Planet49)
- Analytics on lit. b (contract overextension)
- Klarna without Art. 22 reference
- Session recording without consent
Integrated into website scan pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 16:08:41 +02:00
Benjamin Admin 8336c01c5c feat: Phase 6-8 — PDF export, recurring scans, multi-website compare
Phase 6: PDF export via WeasyPrint — POST /agent/scans/pdf generates
printable compliance report with findings table, service comparison,
risk badge, and legal disclaimer.

Phase 7: Recurring scans — POST /agent/monitored-urls to add URLs,
POST /agent/run-scheduled triggers all enabled scans (cron/ZeroClaw).
In-memory storage with DB upgrade path.

Phase 8: Multi-website compare — POST /agent/compare with 2-5 URLs,
parallel scanning, comparison table (risk, findings, services, compliance
features per site).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 15:27:51 +02:00
Benjamin Admin 53774886e7 perf: Phase 4 — parallel page fetching (asyncio.gather)
Scan pages in parallel instead of sequential. Reduces scan time
from ~10s (5 pages × 2s) to ~3s (all pages at once).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 15:09:03 +02:00
Benjamin Admin 5c5054f740 feat: Phase 3 — registry 82 services, mandatory checker, SDK flow step
- website_scanner.py: imports from master service_registry.py (82 services)
- agent_scan_routes.py: mandatory content checks (documents + DSE sections)
- steps-betrieb.ts: Compliance Agent step added to SDK Flow (seq 5000)
- PLAN: Phase 9 (Authenticated Testing) added to product roadmap

Mandatory checks know what MUST be there:
- Documents: Impressum, DSE, AGB, Widerrufsbelehrung
- DSE content: 9 Art. 13 DSGVO fields (DSB, Speicherdauer, etc.)
- Impressum content: 5 §5 TMG fields (GF, HRB, USt-ID, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 15:04:44 +02:00