breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	6c223c7c9b	feat(compliance-check): exec-summary + voll-audit + TDM-respect + cookie-KB-extended + saving-scan-funnel CI / detect-changes (push) Successful in 10s Details CI / branch-name (push) Has been skipped Details CI / guardrail-integrity (push) Has been skipped Details CI / secret-scan (push) Has been skipped Details CI / dep-audit (push) Has been skipped Details CI / sbom-scan (push) Has been skipped Details CI / validate-canonical-controls (push) Successful in 14s Details CI / loc-budget (push) Failing after 15s Details CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / nodejs-build (push) Successful in 2m43s Details CI / test-go (push) Has been skipped Details CI / iace-gt-coverage (push) Has been skipped Details CI / test-python-backend (push) Successful in 37s Details CI / test-python-document-crawler (push) Has been skipped Details CI / test-python-dsms-gateway (push) Has been skipped Details P1 — Exec-Summary oben im Email-Report (4 KPIs + 2 CTAs, dunkler Gradient) P3 — no_direct_sales-Flag fuer OEM-Konfigurator-Sites; AGB/Widerruf/AGB als "NICHT ANWENDBAR" (grau) statt "NICHT GEFUNDEN" (rot) P5 — Voll-Audit Unification: alle Findings (MC + Pflichtangaben + Vendor + Redundanz) in /data/compliance_audits.db.unified_findings; neuer /api/compliance/agent/findings/<id> Endpoint + FindingsTab im Audit-UI mit Filter + CSV-Export P7 — Crawl-Hardening: TDM-Reservation-Check (robots.txt / ai.txt / Header / Meta) vor jedem Run mit 24h-Cache; HeadlessChrome-UA (Firma noch nicht gegruendet — Switch via BREAKPILOT_BRANDED_UA env); per-Domain Rate-Limit 1 req/s + max 2 concurrent P2 — Cookie-Knowledge-DB additiv erweitert (35 -> 74 Cookies): Adobe, Meta, Microsoft, LinkedIn, TikTok, HubSpot, Marketo, Salesforce, Hotjar, FullStory, Mouseflow, Intercom, Drift, Zendesk, Cloudflare, Stripe, OneTrust/Cookiebot/Usercentrics, Matomo, Pinterest, Snapchat, X/Twitter, YouTube, Vimeo, Klaviyo, Mailchimp, Mixpanel, Segment, Amplitude, Optimizely, Datadog; Wire-in in cookie_function_classifier liefert compliance_risk-Label (kritisch/hoch/mittel/gering) pro Vendor A — k-Anonymitaets-Helper (benchmark_k_anonymity) fuer P6-Vorbereitung B — Cross-Tenant-Domain-Assertion im /findings-Endpoint (expected_domain Query-Param -> 403 bei Mismatch) C — Saving-Scan-Funnel: /api/compliance/agent/saving-scan/start mit Validierung + 24h-Rate-Limit pro Domain + Lead-Persistenz in saving_scan_leads + Auto-Discovery via _run_compliance_check; 6 Tests D — Risk-Badge im Email-Vendor-Row Rechtliche Leitplanken (Memory feedback_oem_data_legal.md): nur eigene Knapp-Bewertungen + Source-Pointer, keine 1:1-Kopien fremder CMP-Texte. TDM-Opt-Out-Respect nach § 44b UrhG. KEINE Schema-Aenderungen — alles in Sidecar-SQLite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 23:48:34 +02:00
Benjamin Admin	df7d83134b	feat(agent): migrate compliance-check results to banner + documents (M1-M5) After a compliance-check run finishes, the user can now apply the extracted vendor inventory directly to their own: - CookieBanner config (admin /sdk/einwilligungen) - Cookie-Policy / VVT-Register / Privacy-Policy templates (admin /sdk/document-generator) Backend: - migration_to_banner.py: vendor list -> CookieBannerConfig with ESSENTIAL/PERFORMANCE/PERSONALIZATION/EXTERNAL_MEDIA buckets + review flags (broken opt-out URLs, missing expiry, no cookies listed) - migration_to_document.py: vendor list -> pre-fills for 3 doc templates, recipient-type aware (INTERNAL/GROUP/PROCESSOR/CONTROLLER) - agent_migration_routes.py: GET /banner-preview, /document-preview, /summary keyed on check_id - compliance_audit_log: new check_payloads table persists cmp_vendors + extracted_profile so the preview survives an app restart - tests: 9 mapper units + 4 endpoint integration tests Frontend: - MigrationPanel.tsx: modal showing banner-config diff + document pre-fills, plus links into the existing editors - ComplianceCheckTab.tsx: replaces standalone audit link with the panel; net -3 lines, stays at the 500-cap Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:06:28 +02:00
Benjamin Admin	6ed30dae5b	feat(agent): MC scorecard + audit drill-down + tenant trend (A1-A6) Now that all 1874 MCs run per check (Task #30 cap removal), the report was about to drown in noise. This commit adds the full aggregation / persistence / drill-down stack so each MC is actionable, not just counted. A1 mc_scorecard.py (new): build_scorecard(checks) -> per-regulation PASS/FAIL/SKIP + severity top_fails(checks, n) -> N most severe failed MCs full_audit_records(...) -> flat rows ready for sidecar SQLite A2 Email rendering: agent_doc_check_scorecard.py (new) builds an HTML scorecard table (regulation × passed/failed/HIGH/MEDIUM/score) shown at the top of the email. agent_doc_check_report._render_document now collapses the 500-MC L2 forest into 'X/Y bestanden (Z Fail)' summary plus a top-10 fails block per doc — old verbose render is gone. A3 compliance_audit_log.py (new) — sidecar SQLite at /data/compliance_audits.db (separate from compliance Postgres schema to comply with the no-new-migrations rule in CLAUDE.md): check_runs(check_id, ts, tenant_id, site_name, base_domain, doc_count, scorecard json, vvt_summary json) mc_results(check_id, doc_type, mc_id, label, passed, skipped, severity, regulation, matched_text, hint) Route persists every run after the email is sent. docker-compose.yml adds compliance-audit volume + env. A4 backfill_mc_regulation_llm.py (new) — Qwen-tagged backfill for the 1636 MCs the regex pass couldn't classify. Batches of 25, format=json, output constrained to the canonical regulation list. Run manually: docker exec bp-compliance-backend python3 \ /app/scripts/backfill_mc_regulation_llm.py [--dry-run] A5 Admin audit tab — GET /api/compliance/agent/audit/<check_id> proxied via /api/sdk/v1/agent/audit/<id>. New page /sdk/agent/audit/[checkId] renders scorecard + filterable MC table (status / doc_type / regulation, expandable rows with matched_text + hint). ComplianceCheckTab now shows 'Voll-Audit oeffnen' link. A6 Trend per tenant — GET /api/compliance/agent/audit/tenant/<id> returns recent runs. Email scorecard shows per-regulation delta badges ('(+12%)', '(-3%)') compared with the previous run for the same tenant + base_domain. Lookup is one SQLite query. Plumbing: rag_document_checker.py — SELECT now includes 'article'; MC results carry 'regulation' + 'article' through to CheckItem. agent_doc_check_routes.CheckItem schema gains regulation + article fields (defaults '') so old clients still parse. agent_compliance_check_routes — response gains 'check_id' so the frontend can build the audit link.	2026-05-17 13:45:58 +02:00
Benjamin Admin	0d0e705117	feat: Unified Compliance-Check — 8 document types in one form New 3-tab structure: Website-Scan, Compliance-Check, Banner-Check. Compliance-Check Tab (replaces Dokumenten-Pruefung + Impressum-Check): - 8 document rows: DSI, Impressum, Social Media, Cookie, AGB, Nutzungsbedingungen, Widerruf, DSB-Kontakt - Each row: URL input + "Text laden" + file upload + manual text - "Text laden" extracts via consent-tester, shows in editable textarea - User verifies/corrects text before checking - Empty fields = "not present" → own finding Business Profiler (business_profiler.py): - Detects B2B/B2C/B2G from all documents together - Recognizes regulated professions, online shops, editorial content - Context-aware: INFO checks become PASS/FAIL based on profile Backend: /compliance-check + /extract-text endpoints Frontend: ComplianceCheckTab.tsx + DocumentRow.tsx API proxies: compliance-check/route.ts + extract-text/route.ts Also: Impressum regex fixes (Telefon, AG, Geschaeftsfuehrung) and INFO severity for context-dependent checks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 20:56:10 +02:00
Benjamin Admin	02ff96f74e	fix: resolve all merge conflict markers from feat/zeroclaw-compliance-agent Build + Deploy / build-admin-compliance (push) Successful in 2m7s Details Build + Deploy / build-backend-compliance (push) Failing after 5m21s Details Build + Deploy / build-ai-sdk (push) Successful in 53s Details Build + Deploy / build-developer-portal (push) Successful in 1m18s Details Build + Deploy / build-tts (push) Successful in 1m42s Details Build + Deploy / build-document-crawler (push) Successful in 45s Details Build + Deploy / build-dsms-gateway (push) Successful in 27s Details Build + Deploy / build-dsms-node (push) Successful in 19s Details CI / branch-name (push) Has been skipped Details Build + Deploy / trigger-orca (push) Has been skipped Details CI / guardrail-integrity (push) Has been skipped Details CI / loc-budget (push) Failing after 19s Details CI / secret-scan (push) Has been skipped Details CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / nodejs-build (push) Successful in 3m6s Details CI / dep-audit (push) Has been skipped Details CI / sbom-scan (push) Has been skipped Details CI / test-go (push) Successful in 55s Details CI / test-python-backend (push) Successful in 44s Details CI / test-python-document-crawler (push) Successful in 30s Details CI / test-python-dsms-gateway (push) Successful in 26s Details CI / validate-canonical-controls (push) Successful in 18s Details 9 files had conflict markers from the branch merge. All resolved keeping the feature branch version. Also split agent_scan_routes.py (534→367 LOC) by extracting Pydantic models to agent_scan_models.py. [guardrail-change] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 12:15:07 +02:00
Benjamin Admin	36c6101b91	Merge feat/zeroclaw-compliance-agent into main Brings all compliance doc-check features: - 162 regex checks + 1874 Master Controls - LLM-agnostic agent with tool calling - Banner check (46 checks, 30 CMPs, stealth, Shadow DOM) - Impressum check (24 checks) - Deep consent verification (DataLayer, GCM, TCF) - CMP E2E tests (39 tests) - HTML email reports, FAQ, persistent history Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 11:44:20 +02:00
Benjamin Admin	4bfb438c92	feat: 4 banner check upgrades — 30 CMPs, stealth, Shadow DOM, categories Build + Deploy / build-admin-compliance (push) Successful in 2m17s Details Build + Deploy / build-backend-compliance (push) Successful in 3m17s Details Build + Deploy / build-ai-sdk (push) Successful in 56s Details Build + Deploy / build-developer-portal (push) Successful in 1m37s Details Build + Deploy / build-tts (push) Successful in 1m33s Details Build + Deploy / build-document-crawler (push) Successful in 42s Details Build + Deploy / build-dsms-gateway (push) Successful in 33s Details Build + Deploy / build-dsms-node (push) Successful in 16s Details CI / branch-name (push) Has been skipped Details CI / guardrail-integrity (push) Has been skipped Details CI / loc-budget (push) Failing after 25s Details CI / secret-scan (push) Has been skipped Details CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / nodejs-build (push) Successful in 3m33s Details CI / dep-audit (push) Has been skipped Details CI / sbom-scan (push) Has been skipped Details CI / test-go (push) Failing after 1m18s Details CI / test-python-backend (push) Successful in 53s Details CI / test-python-document-crawler (push) Successful in 36s Details CI / test-python-dsms-gateway (push) Successful in 33s Details CI / validate-canonical-controls (push) Successful in 24s Details Build + Deploy / trigger-orca (push) Successful in 3m19s Details 1. 30 CMP selectors (was 10): Added Sourcepoint, Iubenda, Complianz, CookieFirst, HubSpot, Osano, Piwik PRO, Cookie Consent (Insites), Axeptio, Termly, CookieScript, Civic UK, GDPR Cookie Compliance, CookieHub, Ketch, Admiral, Sibbo, Evidon, LiveRamp, Adsimple. Plus improved generic fallback: role=dialog, aria-label, data-* attrs. 2. Playwright stealth mode: playwright-stealth against bot detection. Removes WebDriver flag, simulates plugins, realistic viewport/locale. Launch args: --disable-blink-features=AutomationControlled. 3. Shadow DOM: Recursive JS-based search through shadowRoot elements for consent banners. Fallback click via page.evaluate() when normal Playwright selectors can't penetrate Shadow DOM. 4. Category selection UI: User can choose which cookie categories to test (Notwendig, Statistik, Marketing, Funktional, Praeferenzen). Pill-style checkboxes in BannerCheckTab, forwarded through API chain. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-09 08:42:30 +02:00
Benjamin Admin	7c17321089	feat: Cookie Banner Check as standalone tab in Compliance Agent Build + Deploy / build-admin-compliance (push) Successful in 2m7s Details Build + Deploy / build-backend-compliance (push) Successful in 10s Details Build + Deploy / build-ai-sdk (push) Successful in 8s Details Build + Deploy / build-developer-portal (push) Successful in 7s Details Build + Deploy / build-tts (push) Successful in 7s Details Build + Deploy / build-document-crawler (push) Successful in 9s Details Build + Deploy / build-dsms-gateway (push) Successful in 8s Details Build + Deploy / build-dsms-node (push) Successful in 8s Details CI / branch-name (push) Has been skipped Details CI / guardrail-integrity (push) Has been skipped Details CI / loc-budget (push) Failing after 17s Details CI / secret-scan (push) Has been skipped Details CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / nodejs-build (push) Successful in 3m21s Details CI / dep-audit (push) Has been skipped Details CI / sbom-scan (push) Has been skipped Details CI / test-go (push) Failing after 47s Details CI / test-python-backend (push) Successful in 47s Details CI / test-python-document-crawler (push) Successful in 31s Details CI / test-python-dsms-gateway (push) Successful in 26s Details CI / validate-canonical-controls (push) Successful in 16s Details Build + Deploy / trigger-orca (push) Successful in 2m23s Details New "Banner-Check" tab with: - URL input → Playwright 3-phase test (before/reject/accept) - Shield icon + provider detection - Progress bar with pass/fail percentage - 3-phase summary (cookies + scripts per phase) - Violations (red) and passes (green) in structured list Backend: new POST /api/compliance/agent/banner-check endpoint that proxies to consent-tester:8094/scan. Next step: Upgrade banner checks to L1/L2 format with expert hints (same quality as document checks). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 17:39:44 +02:00
Benjamin Admin	4c68caac4e	feat: Multi-URL Document Check with full checklist visibility New "Dokumenten-Pruefung" tab in Compliance Agent: - User adds multiple URLs with document type (DSI, AGB, Impressum, Cookie, Widerruf) - Each document loaded via Playwright, accordions expanded, text extracted - Checked against type-specific legal checklist - Optional: Cookie banner check via checkbox Checklisten-UX (solves "100% looks like nothing was checked"): - All checks shown per document: green checkmark + matched text excerpt - Red X for missing fields with legal reference - Builds user trust: "9 Punkte geprueft, alle bestanden" - Expandable per document with completeness bar New checklists: - Impressum: §5 TMG (6 fields: name, address, contact, register, VAT, representative) - Cookie-Richtlinie: §25 TDDDG (5 fields: types, purposes, retention, third-party, opt-out) Backend: - POST /agent/doc-check — async with polling (same pattern as /scan) - DocCheckResult includes checks[] with passed/failed + matched_text - dsi_document_checker returns all_checks in SCORE finding - Email report shows per-document checklist Files: agent_doc_check_routes.py (280 LOC), DocCheckTab.tsx (248 LOC), ChecklistView.tsx (130 LOC), dsi_document_checker.py (+70 LOC) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 10:08:40 +02:00
Benjamin Admin	cb607bf228	feat: Async scan with polling — no more timeout issues Fundamental fix: scans now run asynchronously with progress polling. Backend: - POST /scan starts background task, returns scan_id immediately - GET /scan/{scan_id} returns status + progress + result when done - 7 progress steps shown: Website scan, DSI discovery, DSE analysis, SOLL/IST comparison, corrections, report, email - In-memory job store (dict with scan_id → status/result) - No timeout limits on scan duration Frontend: - POST starts scan, receives scan_id - Polls GET every 5 seconds (max 120 attempts = 10 min) - Shows live progress message during scan - Displays result when completed, error when failed Proxy: - POST timeout reduced to 30s (just starts the job) - GET timeout 10s (just status check) - No more 504/connection-dropped errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 07:30:09 +02:00
Benjamin Admin	47ec792acf	fix: raise scan proxy timeout from 3 to 10 min (50 pages + 20 DSI docs + LLM) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 00:25:33 +02:00
Benjamin Admin	b53b36fdc5	feat: 5-tab agent UI — PDF export, compare, auth test, all proxies - 5 tabs: Schnellanalyse, Website-Scan, Cookie-Test, Vergleich, Login-Test - PDF download button in ScanResult - CompareResult: side-by-side compliance comparison table - AuthTestResult: 5 post-login checks with legal refs - API proxies: /scans/pdf, /compare, /authenticated-scan - Compare: textarea for 2-5 URLs, parallel scanning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 16:43:08 +02:00
Benjamin Admin	85c4cbbf37	fix: increase scan proxy timeout from 3 to 5 minutes	2026-04-29 16:24:22 +02:00
Benjamin Admin	16c40ddae4	feat: consent-test email with phase-structured findings Email shows 3 phases (Before/After Reject/After Accept) with: - Violation cards per phase (CRITICAL/HIGH badges) - Undocumented services in Phase C - Summary table (critical/high/undocumented counts) - Dark Pattern warning if tracking persists after rejection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 13:14:01 +02:00
Benjamin Admin	b7f9099ad9	feat: Cookie-Test tab — 3-phase consent test UI + API proxy Third tab "Cookie-Test" in Compliance Agent: - Phase A: Before consent (tracking without permission) - Phase B: After rejection (CRITICAL if tracking persists) - Phase C: After acceptance (undocumented services) - CMP badge (Didomi, OneTrust, etc.) - Violation cards with severity badges and legal references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 12:38:15 +02:00
Benjamin Admin	ac8eb1bf99	feat: "Als Email senden" Button im Compliance Advisor Chat-Verlauf wird als strukturiertes Beratungsprotokoll per Email an den DSB gesendet. Button erscheint im Header sobald Nachrichten vorhanden sind. Zeigt Checkmark nach erfolgreichem Versand. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 23:17:13 +02:00
Benjamin Admin	0f1fae61a6	feat: Website-Scan tab in agent UI — service table, SOLL/IST, corrections - Tab system: Schnellanalyse (single page) + Website-Scan (multi-page) - ScanResult component: service comparison table, severity-colored findings - Expandable correction suggestions with copy button (pre-launch mode) - API proxy route for /agent/scan endpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:52:40 +02:00
Benjamin Admin	0c0dd4e3a6	feat: ZeroClaw compliance agent — document analysis + role assignment + email Add autonomous compliance agent that fetches web documents (cookie banners, privacy policies), classifies them via Qwen/Ollama, assesses DSGVO compliance, assigns to the responsible role, and sends notification emails. Components: - ZeroClaw SOP (6-step workflow: fetch, classify, assess, summarize, assign, notify) - Backend: /api/compliance/agent/analyze (combined endpoint) - Backend: /api/compliance/agent/notify (standalone email) - Frontend: /sdk/agent page (Manager UI with URL input + results) - Helper scripts + E2E test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 23:28:21 +02:00

18 Commits