breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	a3287cd5e6	feat: HTML email report with hints + fix duplicate Social Media sections 1. Email report now renders as styled HTML (matching frontend design): - Progress bars (green=completeness, blue=correctness) - Hierarchical L1→L2 check display - Red hint boxes under failed checks explaining what to fix - Matched text evidence for passed checks 2. Section splitter deduplicates: two "Social Media" headings on the same page are merged into one section instead of creating duplicates. 3. Extracted report builder to agent_doc_check_report.py (175 LOC) to keep routes file under 500 LOC (386 LOC). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 15:13:00 +02:00
Benjamin Admin	fa4fd87102	fix: 7 regex bugs from IHK Konstanz ground truth analysis Fixes based on manual verification of all 30 failed checks: 1. Cookie table: recognize "folgende cookies" + column headers as text 2. Cookie names: add JSESSIONID, cookieinfo, et_id, BT_* patterns 3. Essential justified: match "sitzung zuordnen", "betrieb der website" 4. Social bookmarks: recognize as 2-click alternative 5. DSFA plural: "kanaelen" now matches alongside "kanal" 6. Section splitter: skip-headings no longer lose subsequent text (Risikoabwaegung section was cut from DSFA, losing risk scores) 7. Cookie legal basis: accept Art. 6(1)(f) in cookie context Reduces false positives from 7 to ~1-2 for IHK Konstanz test case. Ground truth table: zeroclaw/docs/ground-truth-ihk-konstanz.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 14:51:09 +02:00
Benjamin Admin	293c58d0dd	feat: Add actionable hints to all 138 compliance checks Each check now has a "hint" field explaining what is missing and what the customer should do to fix it. Hints are shown in the frontend below failed checks in red text. Examples: - "Bei Verarbeitung auf Basis von Art. 6(1)(f) muss dokumentiert werden, warum Ihr berechtigtes Interesse die Rechte der Betroffenen ueberwiegt." - "Die ladungsfaehige Anschrift fehlt. Erforderlich: Strasse, Hausnummer, PLZ und Ort." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 14:05:01 +02:00
Benjamin Admin	b363c28539	feat: Add 76 Level-2 regex checks for document correctness verification Split dsi_document_checker.py (466 LOC) into doc_checks/ package (9 files). Two-pass L1→L2 logic: L1 checks "Is it mentioned?", L2 checks "Is it correct?" (e.g. controller has full address, specific Art. 6 lit., concrete time periods). 138 total checks (62 L1 + 76 L2) across 7 doc types: - DSE Art. 13: 31, Impressum §5 TMG: 16, Cookie §25 TDDDG: 15 - Widerruf §355: 15, AGB §305ff: 21, Social Media Art. 26: 20, DSFA Art. 35: 18 Frontend: hierarchical L1→L2 display with dual progress bars (green=completeness, blue=correctness). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 12:37:03 +02:00
Benjamin Admin	3c12e06faf	feat: Fix DSFA dedup + expand all checklists to 56 total checks Fixes: - 'Risikoabwaegung' is sub-section of DSFA → added to SKIP_HEADINGS - 'Social Media' standalone heading → recognized as social_media DSE - Removed 'risikobew' from DSFA pattern (was too broad) Expanded checklists: - Widerruf: 4→7 checks (+Empfaenger, kein Grund, §312k Button) - AGB: 4→9 checks (+Zahlung, Lieferung, Gewaehrleistung, Kuendigung, Datenschutz) - Social Media: +1 (Social Bookmarks) - DSFA: +1 (LFDI Richtlinie) Total: 47→56 Regex-Checks across 7 document types: DSI=9, Cookie=5, Social Media=10, DSFA=8, Impressum=6, Widerruf=7, AGB=9 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 11:55:29 +02:00
Benjamin Admin	58234ac18b	fix: DSFA must be matched before social_media in SECTION_TYPE_MAP 'Datenschutzfolgeabschätzung...Social Media' was matching as social_media (Art. 26) instead of dsfa (Art. 35) because the social_media pattern 'datenschutz.*social media' matched first. Fixed: DSFA patterns checked before social_media patterns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 11:35:10 +02:00
Benjamin Admin	3853a0838a	feat: Art. 26 Joint Controller + DSFA checklists for Social Media sections New checklists: - JOINT_CONTROLLER_CHECKLIST (Art. 26 DSGVO, 7 checks): Joint parties, arrangement, contact point, processing split, data categories, third-country transfer (USA), rights - DSFA_CHECKLIST (Art. 35 DSGVO, 5 checks): Description, necessity, risk assessment, measures, DSB involvement Section detection: 'Datenschutzerklaerung fuer Social Media' → social_media, 'Datenschutzfolgeabschaetzung/Risikoanalyse' → dsfa classify_document_type: DSFA and social_media detected before generic DSE Frontend: DOC_TYPES dropdown + ChecklistView labels updated Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 10:49:32 +02:00
Benjamin Admin	5188411828	disable: Control Library checks until doc-check Master Controls are ready 8 false positives from generic canonical_controls. Regex checks (9+5) are accurate. Re-enable when ~80 specific doc-check controls exist. See INSTRUCTION-master-controls-for-doc-check.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 10:28:25 +02:00
Benjamin Admin	fa45b5793c	feat: Control Library check via SQL (canonical_controls) instead of Qdrant Complete rewrite of rag_document_checker.py: - Queries canonical_controls table (294K controls, 10K data_protection) - Filters by category + title keywords per document type - Uses test_procedure field as actual check instructions - Regex pre-check extracts key terms from procedure → fast match - LLM fallback only for regex misses (saves tokens) - /no_think prefix for direct JSON output SQL approach advantages: - Structured data with test_procedure, pass_criteria, fail_criteria - Category filtering (data_protection, compliance, governance) - No Qdrant API key issues - Controls are actual check criteria, not general legal texts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 20:26:56 +02:00
Benjamin Admin	7e7f31c344	disable: RAG checks until Master Controls (G1 Decision Trace) are ready Current 144K controls are general legal texts, not specific check criteria. RAG integration code stays (rag_document_checker.py), just disabled in the doc-check endpoint. Re-enable when G1-G4 block is complete and 25K Master Controls with Decision Trace are available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 17:11:58 +02:00
Benjamin Admin	1ff34227bf	debug: Add logging to RAG check integration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 14:57:30 +02:00
Benjamin Admin	090da0f71b	feat: RAG-based document verification against 144K Control Library New module: rag_document_checker.py - Searches RAG (Qdrant) for controls relevant to document type - Filters by regulation (DSGVO Art.13, TDDDG §25, BGB §355 etc.) - LLM (Qwen 3.5:35b) verifies each control against document text - Returns fulfilled/missing with evidence text + severity - Supports: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept Integration in doc-check endpoint: - Regex checklist runs first (fast, deterministic) - RAG checks run after (semantic, catches what regex misses) - Both results combined in single response LLM prompt returns JSON: {fulfilled, evidence, issue, severity} Think-tags stripped, JSON extracted from response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 13:19:15 +02:00
Benjamin Admin	13c5880f51	fix: Restrict sub-section detection to genuinely separate document types Only Cookie and Widerruf sections are checked as separate documents. Social Media, DSFA, Betroffenenrechte, Dienste von Drittanbietern are part of the parent DSI and no longer generate false findings. Added PLAN-rag-document-check.md for Phase 2: - RAG-based checks with document-type-specific Controls - DSFA checklist (Art. 35 + Landes-Listen) - AVV checklist (Art. 28) - Reference detection (sub-doc → parent doc) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 11:02:36 +02:00
Benjamin Admin	539bc824fd	feat: Auto-detect sub-sections within a page and check each separately When a single URL contains multiple document sections (e.g. IHK DSI page with Cookies, Social Media, Dienste von Drittanbietern), the system now: 1. Extracts full page text (main document check as before) 2. Splits text at heading boundaries (short uppercase lines) 3. Classifies each section: Cookie→cookie checklist, Social Media→DSI etc. 4. Runs type-specific checklist per section 5. Returns all results: main doc + sub-sections Section type detection via SECTION_TYPE_MAP patterns: - 'Cookie*' → §25 TDDDG checklist - 'Dienste von Drittanbietern' → DSI checklist - 'Social Media' → DSI checklist (Art. 26 joint controllership) - 'Widerrufsrecht' → §355 BGB checklist - 'Impressum' → §5 TMG checklist Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 10:44:42 +02:00
Benjamin Admin	4c68caac4e	feat: Multi-URL Document Check with full checklist visibility New "Dokumenten-Pruefung" tab in Compliance Agent: - User adds multiple URLs with document type (DSI, AGB, Impressum, Cookie, Widerruf) - Each document loaded via Playwright, accordions expanded, text extracted - Checked against type-specific legal checklist - Optional: Cookie banner check via checkbox Checklisten-UX (solves "100% looks like nothing was checked"): - All checks shown per document: green checkmark + matched text excerpt - Red X for missing fields with legal reference - Builds user trust: "9 Punkte geprueft, alle bestanden" - Expandable per document with completeness bar New checklists: - Impressum: §5 TMG (6 fields: name, address, contact, register, VAT, representative) - Cookie-Richtlinie: §25 TDDDG (5 fields: types, purposes, retention, third-party, opt-out) Backend: - POST /agent/doc-check — async with polling (same pattern as /scan) - DocCheckResult includes checks[] with passed/failed + matched_text - dsi_document_checker returns all_checks in SCORE finding - Email report shows per-document checklist Files: agent_doc_check_routes.py (280 LOC), DocCheckTab.tsx (248 LOC), ChecklistView.tsx (130 LOC), dsi_document_checker.py (+70 LOC) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 10:08:40 +02:00
Benjamin Admin	a349111a01	fix: Raise full_text limit 10K→50K + combine all DSI texts for checks Two fixes: 1. consent-tester: full_text truncation raised from 10,000 to 50,000 chars (IHK Internetangebot has ~50K chars, Beschwerderecht was after 10K cutoff) 2. Backend: dse_text now combines Playwright HTML + ALL DSI discovery texts for mandatory content checking. Previously only used first 8K chars from one source, missing Verantwortlicher/DSB that were in DSI documents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 16:03:56 +02:00
Benjamin Admin	72761d6066	debug: Log DSI text lengths to diagnose 0% completeness bug Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 14:08:04 +02:00
Benjamin Admin	7c7513525e	feat: Document-centric scan results + DSI deduplication DSI Dedup (consent-tester): - Only H1/H2 headings count as documents (not H3/H4 sub-sections) - Sub-sections (Cookies, Betroffenenrechte, Social Media) are part of parent document's full text, not separate documents - Reduces IHK result from 30 to ~11 real documents Backend (agent_scan_routes): - ScanFinding gets doc_title field linking each finding to its document - doc_title set when creating DSI findings for document attribution Frontend (ScanResult.tsx): - 3 sections: Services table, Document cards, General findings - Documents: expandable cards with completeness bar (green/yellow/red) - Findings grouped under their parent document - Each card shows: title, word count, findings count, % completeness - Findings without doc_title go to "Allgemeine Findings" section Email Summary (agent_scan_helpers): - Findings listed under their parent document - General findings in separate section - No more flat mixed list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 09:56:29 +02:00
Benjamin Admin	cb607bf228	feat: Async scan with polling — no more timeout issues Fundamental fix: scans now run asynchronously with progress polling. Backend: - POST /scan starts background task, returns scan_id immediately - GET /scan/{scan_id} returns status + progress + result when done - 7 progress steps shown: Website scan, DSI discovery, DSE analysis, SOLL/IST comparison, corrections, report, email - In-memory job store (dict with scan_id → status/result) - No timeout limits on scan duration Frontend: - POST starts scan, receives scan_id - Polls GET every 5 seconds (max 120 attempts = 10 min) - Shows live progress message during scan - Displays result when completed, error when failed Proxy: - POST timeout reduced to 30s (just starts the job) - GET timeout 10s (just status check) - No more 504/connection-dropped errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 07:30:09 +02:00
Benjamin Admin	a3f7fb93f4	fix: Scan quality — raise page limit, use full DSI text for checks Bug 1: max_pages was hardcoded to 15 in backend call — raised to 50 Bug 2: DSI documents checked against text_preview (500 chars) — now uses full_text (10,000 chars) for Art. 13 mandatory field checks Bug 3: DSE text not found when Playwright misses DSE page — now falls back to DSI Discovery full_text as second source Bug 4: Backend timeout 120s too short for 50 pages — raised to 300s Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 23:51:03 +02:00
Benjamin Admin	2f0f76e365	fix: Add missing 'import re' to agent_scan_routes.py NameError: name 're' is not defined at line 146 — the import was accidentally removed when extracting helper functions to agent_scan_helpers.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 22:59:53 +02:00
Benjamin Admin	f960bd052a	fix: Add missing 'import re' to agent_scan_routes.py NameError: name 're' is not defined at line 146 — the import was accidentally removed when extracting helper functions to agent_scan_helpers.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 22:59:53 +02:00
Benjamin Admin	48146cddaf	feat: DSI document discovery + completeness check in agent scan workflow Agent scan now automatically: 1. Discovers all legal documents via consent-tester /dsi-discovery endpoint 2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum 3. Checks completeness against type-specific checklists: - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes, legal basis, recipients, third-country, retention, rights, complaint) - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction) - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences) 4. Adds findings per document to scan results 5. Shows discovered documents with completeness % in email summary 6. Returns discovered_documents list in API response New files: - dsi_document_checker.py (229 LOC) — checklists + classifier - agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 22:10:13 +02:00
Benjamin Admin	53f6f30cf0	feat: DSI document discovery + completeness check in agent scan workflow Agent scan now automatically: 1. Discovers all legal documents via consent-tester /dsi-discovery endpoint 2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum 3. Checks completeness against type-specific checklists: - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes, legal basis, recipients, third-country, retention, rights, complaint) - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction) - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences) 4. Adds findings per document to scan results 5. Shows discovered documents with completeness % in email summary 6. Returns discovered_documents list in API response New files: - dsi_document_checker.py (229 LOC) — checklists + classifier - agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 22:09:45 +02:00
Benjamin Admin	d3c8811fdb	feat: IAB TCF 2.2 — TC String encoder + purpose mapping + UI - TCFEncoderService: generates base64url-encoded TC Strings per IAB spec with 12 purposes, vendor consent bitfield, CMP metadata - Category-to-purpose mapping (necessary→none, statistics→1,7,8,9,10, marketing→1,2,3,4,5,6,7,12, functional→1,11) - tcf_routes: 5 endpoints (purposes, features, mapping, encode, encode-categories) - banner_consent_service: auto-generates TC String when tcf_enabled=true - TCFSettings.tsx: enable/disable toggle, purpose grid with category mapping, TC String test generator, CMP registration info - New "TCF/IAB" tab in cookie-banner page (7 tabs total) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 07:01:37 +02:00
Benjamin Admin	c89a68e59e	feat: Whistleblower backend + Scanner banner-check (last 2 gaps) Whistleblower (HinSchG): - Migration 118: 3 tables (reports, messages, measures) with HinSchG deadlines (7d acknowledgment, 3mo feedback) - whistleblower_routes.py: 14 endpoints (CRUD, acknowledge, close, messages, measures, public submit, anonymous status check) - Frontend api-operations.ts rewired from Go SDK to compliance proxy - Access key format XXXX-XXXX-XXXX for anonymous reporters Scanner banner-check (TTDSG § 25): - CMP Dashboard: green "Kein Cookie-Banner erforderlich" when no trackers detected + no banner configured - Red warning "Cookie-Banner fehlt!" when trackers found but no banner - Mandatory note: Impressum (DDG § 5) + DSE (DSGVO Art. 13) still required [migration-approved] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 00:22:18 +02:00
Benjamin Admin	060f351da7	feat: Art. 11 DSGVO — reject DSR when data subject not identifiable - New DSRArt11Service: handles rejection with proper legal basis, automated email notification to requester explaining Art. 11 - POST /dsr/{id}/reject-art11 endpoint - ActionButtons.tsx: "Nicht identifizierbar (Art. 11)" button shown when identity is not yet verified - Also fixes: DSR export type-cast rollback handling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 23:30:18 +02:00
Benjamin Admin	02468c94c0	feat: DSR User Data Export — Art. 15 PDF + Art. 20 JSON/CSV - DSRExportService: aggregates all CMP data about a user from Banner Consents, Einwilligungen, Audit Trail, DSR History - GET /dsr/{id}/export-user-data?format=json|csv|pdf endpoint - PDF: A4 reportlab with 4 sections (Consents, Einwilligungen, Audit-Trail, DSR-Anfragen) + cover page - CSV: BOM-encoded for Excel with flattened data rows - JSON: structured export with all data categories - ActionButtons.tsx: PDF/JSON/CSV export buttons now functional Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 22:42:03 +02:00
Benjamin Admin	630fffc0cc	feat: Academy integration — training gap detection after document approval (F7) - Migration 115: compliance_role_training_mapping table (org roles → training codes) - TrainingLinkService: queries training_modules/matrix/assignments to find gaps per person and role. Gracefully degrades when Go training tables don't exist yet. - document_review_routes: 2 new endpoints (training-requirements, training-gaps) - _notify_approval() now checks training gaps and sends emails to persons with outstanding modules, linking to /sdk/training/learner [migration-approved] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 22:03:25 +02:00
Benjamin Admin	965af3a34c	feat: A/B Testing + Compliance Report PDF (F5 + F8) F5: A/B Testing for Consent Rate - Migration 116: banner_variants table + variant tracking in audit log - BannerABService: deterministic sticky bucketing via device hash, chi-squared significance testing, variant CRUD - banner_ab_routes: 6 endpoints (CRUD + stats + assign) - ABTestPanel.tsx: variant creation, traffic sliders, opt-in comparison chart with winner/significance badges - New "A/B-Test" tab in cookie-banner page F8: Compliance Report PDF - CompliancePDFGenerator: reportlab-based A4 PDF covering all modules (Company Profile, TOM, VVT, DSFA, Risks, Vendors, Incidents, Reviews, Consents, Roles) - compliance_report_routes: GET /compliance/report/pdf - "Compliance-Report herunterladen" button on SDK dashboard [migration-approved] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 21:42:50 +02:00
Benjamin Admin	c3fcfe88ee	feat: Vendor-level consent + Consent analytics (F4 + F6) F4: Granular Vendor-Level Consent - Migration 113: vendor_consents JSONB on banner_consents + audit_log - ConsentCreate schema + BannerConsentDB model extended - banner_consent_service stores vendor_consents alongside categories - Audit trail includes vendor-level decisions + user_agent F6: Consent Rate Analytics - Migration 114: user_agent on audit_log + time-series index - BannerAnalyticsService: time series, category breakdown, device stats - banner_analytics_routes: 4 endpoints (overview, time-series, categories, devices) - AnalyticsDashboard.tsx: KPIs, bar chart, category bars, device breakdown - New "Analytik" tab in cookie-banner page [migration-approved] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 20:58:06 +02:00
Benjamin Admin	9b4be663f7	feat: Rollenkonzept backend + SOP template (Phase 1-3) - Migration 111: 3 new tables (org_roles, document_reviews, document_role_mapping) with seed data mapping all 71 doc types to 7 compliance roles - org_role_routes.py: CRUD for roles, seed defaults, test email, mapping API - document_review_routes.py: Review lifecycle (create→send→approve/reject) with approval notification to all affected roles - Migration 112: SOP template (ISO 9001 structure, 21 placeholders) - Added standard_operating_procedure to TemplateType, doc-labels, presets [migration-approved] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 13:03:38 +02:00
Benjamin Admin	a1f5d883cc	feat: Cookie-Banner ↔ Backend Integration (DSR, Retention, Consent Proof) Phase 1: Vendor sync from service registry (82+ services → banner vendors) Phase 2: Category-based retention (marketing=90d, statistics=790d, not hardcoded 365d) Phase 3: DSR ↔ Banner email linking (link-email, by-email, Art.17 erasure, Art.15/20 export) Phase 4: Consent sync (Banner → Einwilligungen bridge) Phase 6: Consent proof (SHA256 config hash + config_version in audit log, Art. 7(1) DSGVO) New files: - banner_dsr_service.py — email linking + DSR integration - vendor_banner_sync.py — service registry → vendor configs - migration 106 — linked_email, banner_config_hash, consent_version columns Tests: 20+ new backend tests + 2 Playwright E2E test suites (API + UI) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 19:52:04 +02:00
Benjamin Admin	17c67b4f25	feat: Cookie-Banner ↔ Backend Integration (DSR, Retention, Consent Proof) Phase 1: Vendor sync from service registry (82+ services → banner vendors) Phase 2: Category-based retention (marketing=90d, statistics=790d, not hardcoded 365d) Phase 3: DSR ↔ Banner email linking (link-email, by-email, Art.17 erasure, Art.15/20 export) Phase 4: Consent sync (Banner → Einwilligungen bridge) Phase 6: Consent proof (SHA256 config hash + config_version in audit log, Art. 7(1) DSGVO) New files: - banner_dsr_service.py — email linking + DSR integration - vendor_banner_sync.py — service registry → vendor configs - migration 106 — linked_email, banner_config_hash, consent_version columns Tests: 20+ new backend tests + 2 Playwright E2E test suites (API + UI) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 19:41:22 +02:00
Benjamin Admin	c5b22e0c99	fix: derive intake flags from DETECTED SERVICES, not from text content Fundamental architecture fix: data processing happens through APIs/scripts/ cookies — NOT through visible page text. A news site about healthcare does NOT process health data. Before: Qwen reads website text → guesses "health_data: true" (WRONG) After: Google Analytics detected → tracking: true (CORRECT, deterministic) New flow: detect services from HTML → map service categories to flags → feed flags into UCCA assessment. No LLM needed for flag extraction. SERVICE_TO_FLAGS maps categories: tracking→tracking, marketing→marketing+ third_party_sharing, payment→payment_data, heatmap→profiling, etc. SPECIFIC_SERVICE_FLAGS for Klarna (Art.22), Stripe (US transfer), etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 08:37:51 +02:00
Benjamin Admin	e318215cc5	refactor: split agent_analyze_routes (420→309 LOC) + agent docs + migration - Extracted website compliance checks + helpers to website_compliance_checks.py - Created agent documentation (zeroclaw/docs/compliance-agent.md) - DB migration 086 executed (compliance_agent_scans table) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 08:22:52 +02:00
Benjamin Admin	58957a4aaa	fix: Playwright user permission + etracker DSE matching + CMP skip 1. Dockerfile: install Playwright AS appuser (not root) so chromium binary is accessible at runtime. Was causing 500 error. 2. DSE service matching: text-search fallback when LLM extraction fails. If "etracker" appears in DSE text, mark as documented even without LLM parsing the service list. 3. CMP skip: consent managers in category "cmp" skipped (not just "other" with id "cmp"). NOT DEPLOYED — RAG pipeline is running. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 19:36:46 +02:00
Benjamin Admin	cedc5de15d	feat: Phase 10 — Playwright website scanner replaces httpx New /website-scan endpoint in consent-tester service: - Real browser renders JavaScript (finds dynamic content) - Clicks navigation menus (discovers hidden sub-pages like IHK DSB page) - Follows links within DSE to find regional privacy policies - Collects rendered HTML for each page (after JS execution) Backend integration: - agent_scan_routes tries Playwright first, falls back to httpx - DSE text and HTML extracted from Playwright-rendered pages - Service detection runs on rendered HTML (catches JS-loaded scripts) Also fixes: - GA regex: G-[A-Z0-9]{8,12} prevents CSS class false positives - etracker added to service registry - External page scanning blocked (same-domain only) - CSS/JS/image files excluded from page list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 19:16:50 +02:00
Benjamin Admin	4bf92f42b8	feat: Phase 9 — Authenticated Testing + Legal Basis Validator (lit. mapping) Phase 9: Playwright login + 5 post-login checks: - §312k BGB: Kündigungsbutton (2 Klicks) - Art. 17 DSGVO: Konto löschen - Art. 20 DSGVO: Daten exportieren - Art. 7(3): Einwilligungen widerrufen - Art. 15: Profildaten einsehen Auto-detects login form selectors. Credentials destroyed after test. Legal Basis Validator: Checks 7 common lit-mapping mistakes: - Cookie tracking on lit. f instead of lit. a (Planet49) - Analytics on lit. b (contract overextension) - Klarna without Art. 22 reference - Session recording without consent Integrated into website scan pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 16:08:41 +02:00
Benjamin Admin	8336c01c5c	feat: Phase 6-8 — PDF export, recurring scans, multi-website compare Phase 6: PDF export via WeasyPrint — POST /agent/scans/pdf generates printable compliance report with findings table, service comparison, risk badge, and legal disclaimer. Phase 7: Recurring scans — POST /agent/monitored-urls to add URLs, POST /agent/run-scheduled triggers all enabled scans (cron/ZeroClaw). In-memory storage with DB upgrade path. Phase 8: Multi-website compare — POST /agent/compare with 2-5 URLs, parallel scanning, comparison table (risk, findings, services, compliance features per site). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 15:27:51 +02:00
Benjamin Admin	e35db90232	feat: Phase 5 — DB persistence for scan results + Phase 10 in plan - Migration 086: compliance_agent_scans table (findings, services, corrections) - agent_history_routes.py: POST /scans (save), GET /scans (list), GET /scans/{id} - Scan results survive page reloads and can be reviewed later - Phase 10 (Playwright website scanner) added to product roadmap Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 15:17:51 +02:00
Benjamin Admin	5c5054f740	feat: Phase 3 — registry 82 services, mandatory checker, SDK flow step - website_scanner.py: imports from master service_registry.py (82 services) - agent_scan_routes.py: mandatory content checks (documents + DSE sections) - steps-betrieb.ts: Compliance Agent step added to SDK Flow (seq 5000) - PLAN: Phase 9 (Authenticated Testing) added to product roadmap Mandatory checks know what MUST be there: - Documents: Impressum, DSE, AGB, Widerrufsbelehrung - DSE content: 9 Art. 13 GDPR fields (data protection officer, retention period, etc.) - Impressum content: 5 §5 TMG fields (managing director, commercial register number, VAT ID, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 15:04:44 +02:00
Benjamin Admin	0ba76d041a	feat: DSE parser + matcher — textblock references in scan findings - dse_parser.py: HTML → structured sections (heading, number, content, parent) Uses heading hierarchy (h1-h4) with regex fallback - dse_matcher.py: matches detected services against DSE sections Exact name → provider → category matching with insertion point suggestion - agent_scan_routes: TextReference model in findings (original text, section, paragraph, correction type, insert_after) Enables showing: "Google Analytics not found in DSE, insert after Section 2.4 Cookies und Tracking" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 11:55:26 +02:00
Benjamin Admin	4298ae17ab	feat: Phase 0+1 — LLM intake extraction + control relevance filter Phase 0: Qwen extracts 14 structured intake flags (personal_data, marketing, profiling, ai_usage, etc.) instead of keyword matching. Fallback to keywords if LLM unavailable. Flags feed into UCCA for accurate scoring. Phase 1: Control relevance filter removes false positives. C_TRANSPARENCY only recommended if AI/ML keywords found in text. 7 control rules with keyword lists + intake flag fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 11:36:24 +02:00
Benjamin Admin	6a77cf6a89	feat: HTML email format, tab info hints, scan history - Summary now renders as styled HTML (table layout, colored risk badge, warning banners) instead of plaintext in <div> - Tab info text explains scope: "Analysiert nur die eingegebene URL" vs "Scannt automatisch 5-10 Unterseiten" - Scan history with findings count badge and page count Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 11:04:29 +02:00
Benjamin Admin	b2a28eb4cd	feat: DSR process descriptions Art. 15-21 with swim-lane diagrams 7 complete process descriptions for the Document Generator: - Art. 15: Right of access (30 days, 6 steps, information catalog) - Art. 16: Right to rectification (14 days, incl. Art. 19 notification) - Art. 17: Right to erasure (14 days, Art. 17(3) exceptions checklist) - Art. 18: Right to restriction of processing (14 days, permitted processing) - Art. 19: Notification obligation (automatic for Art. 16/17/18) - Art. 20: Data portability (30 days, JSON/CSV/XML export) - Art. 21: Right to object (30 days, special case direct marketing) Each description contains: - Mermaid swim-lane diagram (data subject/case handling/specialist department/data protection officer) - Detailed step table with responsibilities and deadlines - Legal basis references - Company placeholders (FIRMENNAME, VERSION, DATUM, DSB_NAME) Integration: - 7 new types in VALID_DOCUMENT_TYPES (legal_template_routes.py) - New category "DSR-Prozesse" in the Document Generator frontend - DSR types-core.ts: templateType field links DSR → Document Generator - Migration 085 seeds the templates into the legal_templates table [migration-approved] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 19:25:38 +02:00
Benjamin Admin	b39c1d5dce	feat: DSR process descriptions Art. 15-21 with swim-lane diagrams 7 complete process descriptions for the Document Generator: - Art. 15: Right of access (30 days, 6 steps, information catalog) - Art. 16: Right to rectification (14 days, incl. Art. 19 notification) - Art. 17: Right to erasure (14 days, Art. 17(3) exceptions checklist) - Art. 18: Right to restriction of processing (14 days, permitted processing) - Art. 19: Notification obligation (automatic for Art. 16/17/18) - Art. 20: Data portability (30 days, JSON/CSV/XML export) - Art. 21: Right to object (30 days, special case direct marketing) Each description contains: - Mermaid swim-lane diagram (data subject/case handling/specialist department/data protection officer) - Detailed step table with responsibilities and deadlines - Legal basis references - Company placeholders (FIRMENNAME, VERSION, DATUM, DSB_NAME) Integration: - 7 new types in VALID_DOCUMENT_TYPES (legal_template_routes.py) - New category "DSR-Prozesse" in the Document Generator frontend - DSR types-core.ts: templateType field links DSR → Document Generator - Migration 085 seeds the templates into the legal_templates table [migration-approved] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 17:53:44 +02:00
Benjamin Admin	b06a33a5fe	fix: syntax error — missing closing paren in scan summary builder	2026-04-28 17:41:11 +02:00
Benjamin Admin	6c0e76f96d	feat: show scanned pages in email summary + frontend (expandable list) Email now lists all scanned URLs with checkmark/cross status. Frontend shows collapsible "X Seiten gescannt — Details anzeigen". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 17:26:03 +02:00
Benjamin Admin	0106f3b5b6	fix: use Ollama directly for correction generation (bypass SDK think-mode) SDK LLM chat returns empty content due to Qwen think-mode. Direct Ollama /api/generate call with stream:false gets the full response including think tags which we strip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 16:30:51 +02:00
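
The approach described in commit 0106f3b5b6 (a direct Ollama `/api/generate` call with `stream: false`, followed by stripping think tags) could look roughly like the sketch below. It is not the repository's code: the URL, the model name `qwen3`, the function names, and the assumption that the reasoning is wrapped in `<think>...</think>` are all illustrative.

```python
import json
import re
import urllib.request

# Assumption: Ollama is reachable at its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Assumption: the model wraps its reasoning in <think>...</think> tags.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return _THINK_RE.sub("", text).strip()


def generate(prompt: str, model: str = "qwen3", url: str = OLLAMA_URL,
             timeout: float = 120.0) -> str:
    """POST a non-streaming generate request and return the cleaned answer.

    The model name is a hypothetical placeholder.
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With stream=False, Ollama returns one JSON object whose "response"
    # field holds the full generated text.
    return strip_think_tags(body.get("response", ""))
```

Keeping the tag stripping in its own function means it can be tested without a running Ollama instance.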


230 Commits
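
Commit cedc5de15d mentions tightening the GA regex to `G-[A-Z0-9]{8,12}` to avoid CSS class false positives. A minimal sketch of how such a pattern behaves is shown here; the word boundaries and the helper name `find_ga4_ids` are additions for illustration and are not taken from the repository.

```python
import re

# Pattern from the commit message, wrapped in word boundaries (the
# boundaries are an assumption added here, not part of the commit).
GA4_ID_RE = re.compile(r"\bG-[A-Z0-9]{8,12}\b")


def find_ga4_ids(html: str) -> list[str]:
    """Return all GA4-style measurement IDs found in the given HTML."""
    return GA4_ID_RE.findall(html)


if __name__ == "__main__":
    print(find_ga4_ids('gtag("config", "G-AB12CD34EF")'))
    print(find_ga4_ids('<div class="G-row G-col">'))
```

Requiring 8 to 12 uppercase letters or digits after `G-` means short or lowercase CSS class names such as `G-row` are not reported as measurement IDs.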