breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	36c6101b91	Merge feat/zeroclaw-compliance-agent into main Brings all compliance doc-check features: - 162 regex checks + 1874 Master Controls - LLM-agnostic agent with tool calling - Banner check (46 checks, 30 CMPs, stealth, Shadow DOM) - Impressum check (24 checks) - Deep consent verification (DataLayer, GCM, TCF) - CMP E2E tests (39 tests) - HTML email reports, FAQ, persistent history Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 11:44:20 +02:00
Benjamin Admin	445a2f7c7c	docs: Instruction for RAG pipeline: document upload backend. Complete specification: - DB schema (iace_uploaded_documents) - 3 Go endpoints (POST/GET/DELETE) - Async PDF → Text → Chunks → Embed → Qdrant pipeline - Tenant-isolated collections (bp_norms_tenant_{id}) - Multi-collection RAG search - Frontend API contract - Security (tenant isolation, file validation) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-09 08:09:40 +02:00
Benjamin Admin	55e44df256	docs: Instruction for RAG pipeline: TRBS + TRGS + ASR ingest. ~120 public-domain Technical Rules (official notices under §5 UrhG) from baua.de for the RAG pipeline. Crawling + embedding + Qdrant import. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-09 07:57:40 +02:00
Benjamin Admin	1b5c6bd340	docs: Batch test results for 9 websites + EUIPO analysis Tested BMW, Stadt Koeln, BfDI, Sparkasse, Caritas, TUEV Sued, Spiegel, ETO Gruppe, EUIPO. Key findings: - Stadt Koeln + ETO Gruppe best (95% correctness) - BMW, Sparkasse, Spiegel genuinely deficient (verified) - EUIPO uses EU Regulation 2018/1725, not GDPR; needs separate checklist - ~0-2 false positives per website after LLM verification 7 regex fixes emerged from batch testing (soft hyphens, word insertions, numbered headings, German section names, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-08 00:41:28 +02:00
Benjamin Admin	313ee5073b	plan: Banner-Check upgrade to L1/L2 with expert hints Detailed plan for upgrading the 22 existing Playwright-based banner checks to the same quality level as the document checks: - 6 L1 + 30 L2 hierarchical checks - Expert hints with EuGH/CNIL/DSK/EDPB references - 3-phase evidence (before consent, after reject, after accept) - Dark pattern detection (button size, color, click asymmetry) - Estimated 3-4h implementation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 17:48:11 +02:00
Benjamin Admin	fa4fd87102	fix: 7 regex bugs from IHK Konstanz ground truth analysis Fixes based on manual verification of all 30 failed checks: 1. Cookie table: recognize "folgende cookies" + column headers as text 2. Cookie names: add JSESSIONID, cookieinfo, et_id, BT_* patterns 3. Essential justified: match "sitzung zuordnen", "betrieb der website" 4. Social bookmarks: recognize as 2-click alternative 5. DSFA plural: "kanaelen" now matches alongside "kanal" 6. Section splitter: skip-headings no longer lose subsequent text (Risikoabwaegung section was cut from DSFA, losing risk scores) 7. Cookie legal basis: accept Art. 6(1)(f) in cookie context Reduces false positives from 7 to ~1-2 for IHK Konstanz test case. Ground truth table: zeroclaw/docs/ground-truth-ihk-konstanz.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 14:51:09 +02:00
Benjamin Admin	e19d9ca532	docs: Master Controls spec for document checker — 80-100 specific check criteria Detailed requirements for the pipeline session: - Binary yes/no check_question per control - Concrete pass_criteria + fail_criteria (not 'check completeness') - correction_template from our Template Generator - 8 document types: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept - ~80-100 total controls (not 25K generic ones) - Examples for DSI, Cookie, Impressum with exact field expectations Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 07:53:36 +02:00
Benjamin Admin	13c5880f51	fix: Restrict sub-section detection to genuinely separate document types Only Cookie and Widerruf sections are checked as separate documents. Social Media, DSFA, Betroffenenrechte, Dienste von Drittanbietern are part of the parent DSI and no longer generate false findings. Added PLAN-rag-document-check.md for Phase 2: - RAG-based checks with document-type-specific Controls - DSFA checklist (Art. 35 + Landes-Listen) - AVV checklist (Art. 28) - Reference detection (sub-doc → parent doc) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 11:02:36 +02:00
Benjamin Admin	4f92e5056c	docs: Complete agent architecture reference for reuse in other agents Full documentation of the ZeroClaw compliance agent architecture: - System overview diagram (Frontend → Backend → LLM → Playwright) - Detailed request flow for Website-Scan mode (7 steps) - All 5 components: Frontend, Backend, Consent-Tester, Ollama, Soul Files - 20 banner checks across 3 files - LLM call patterns (/api/generate + /api/chat + think-mode stripping) - Blueprint for creating new agents (5 steps: Soul, Route, Page, Proxy, Docker) - Timeouts, environment variables, file reference with LOC counts Designed as reusable blueprint for Sales, HR, Finance, or other agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 22:26:56 +02:00
Benjamin Admin	0837680e03	docs: Add EUIPO Unblu Chat findings (3 new, total 10 findings) Finding 8: Unblu chat consent links to third-party DSE (unblu.com) instead of EUIPO's own privacy policy (Art. 13 DSGVO) Finding 9: Cookie consent delegated to third-party terms without own legal basis (§25 TDDDG) Finding 10: Click-outside-dialog = accept — accidental click counts as consent (Planet49, Art. 7(1) DSGVO) New planned agent checks: - Drittanbieter-DSE-Check: detect consent linking to external DSE - Modal-Dismiss-Check: Playwright test if backdrop click = consent - Dark-Pattern-Sprache: detect "muessen/erforderlich" for non-essential cookies Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 07:48:35 +02:00
Benjamin Admin	7ebd25c59c	docs: Add EUIPO registration as compliance agent reference test case Real-world case from EU authority (EUIPO) with 7 findings: - Grammatically broken consent text (bad DE translation) - Coupling prohibition violation (login = consent, Art. 7(4) DSGVO) - No reject button, no granularity, no active opt-in - Broken link layout (DSE/ToS links appear after submit button) - Includes correction suggestion and planned agent check implementations - Pattern: WSO2 Identity Server default templates (systemic issue) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 07:28:32 +02:00
Benjamin Admin	e318215cc5	refactor: split agent_analyze_routes (420→309 LOC) + agent docs + migration - Extracted website compliance checks + helpers to website_compliance_checks.py - Created agent documentation (zeroclaw/docs/compliance-agent.md) - DB migration 086 executed (compliance_agent_scans table) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 08:22:52 +02:00
Benjamin Admin	891fc5bea0	docs: add keyword-based checker problem to migration instruction mandatory_content_checker.py keywords break with alternative formulations. Solution: LLM-based check per mandatory field (9 calls, parallelizable). For other session to implement alongside Dict→Control migration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 18:18:45 +02:00
Benjamin Admin	2c9cea74e3	docs: instruction for hardcoded knowledge → Control Library migration 6 files with hardcoded legal knowledge identified. Review deadline 2026-07-01. legal_basis_validator.py marked with warning log on every use. Instruction file for other session to execute migration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 16:33:48 +02:00
Benjamin Admin	e35db90232	feat: Phase 5 — DB persistence for scan results + Phase 10 in plan - Migration 086: compliance_agent_scans table (findings, services, corrections) - agent_history_routes.py: POST /scans (save), GET /scans (list), GET /scans/{id} - Scan results survive page reloads and can be reviewed later - Phase 10 (Playwright website scanner) added to product roadmap Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 15:17:51 +02:00
Benjamin Admin	5c5054f740	feat: Phase 3 — registry 82 services, mandatory checker, SDK flow step - website_scanner.py: imports from master service_registry.py (82 services) - agent_scan_routes.py: mandatory content checks (documents + DSE sections) - steps-betrieb.ts: Compliance Agent step added to SDK Flow (seq 5000) - PLAN: Phase 9 (Authenticated Testing) added to product roadmap Mandatory checks know what MUST be there: - Documents: Impressum, DSE, AGB, Widerrufsbelehrung - DSE content: 9 Art. 13 DSGVO fields (DSB, Speicherdauer, etc.) - Impressum content: 5 §5 TMG fields (GF, HRB, USt-ID, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 15:04:44 +02:00
Benjamin Admin	0266dfd011	docs: Compliance Agent product roadmap — 8 phases, PoC to production P0: UCCA score calibration + control relevance filter P1: Headless browser consent test (before/after cookie banner) + 80+ services P2: Scan acceleration, DB persistence, PDF export P3: Recurring scans, multi-website comparison Investor demo scenario included. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 11:32:27 +02:00
Benjamin Admin	d0dc284cd5	docs: add Phase 5 (Payment/Marketing checks) + Phase 6 (auto-corrections) - Payment: Stripe, PayPal, Klarna (Art. 22 Bonitaetspruefung!), Adyen, Mollie - Marketing: GA, Meta Pixel, TikTok, Hotjar, Clarity, Newsletter-Anbieter - Each service: DSE mention check, consent check, third-country check - Pre-launch mode: agent generates ready-to-insert DSE text blocks via Qwen - Correction types: missing service, wrong legal basis, outdated entry Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:26:29 +02:00
Benjamin Admin	24fb1e14e0	docs: add Phase 4b: SOLL/IST (target vs. actual) service provider comparison (DSE vs. Website) Automated comparison: services mentioned in privacy policy vs. actually embedded on website. Three categories: undocumented (Art. 13 violation), outdated (cleanup), correctly documented (check third country only). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:20:12 +02:00
Benjamin Admin	6aa753146f	docs: extend plan with third-party service detection + Drittland registry 80+ services: CDN (Cloudflare, Akamai), Fonts (Google Fonts LG München), Tracking (GA, Meta Pixel, Matomo), Captcha, Maps, Video, Payment. Static registry with country, EU adequacy, consent requirement, legal ref. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:18:43 +02:00
Benjamin Admin	acd2d5f944	docs: add Phase 4 (Website-Scan) to Control Relevance Filter plan Multi-page crawl: scan 5-10 strategic pages (start, footer links) for chatbot widgets, AI text mentions, and tracking services. Feed results into relevance filter to reduce false positives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:11:19 +02:00
Benjamin Admin	2a6f526c88	docs: plan for Control Relevance Filter (3-stage: rules, LLM, follow-up) Addresses false-positive controls like C_TRANSPARENCY being recommended when no AI usage is evident. Plan for separate implementation session. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:32:25 +02:00
Benjamin Admin	0c0dd4e3a6	feat: ZeroClaw compliance agent — document analysis + role assignment + email Add autonomous compliance agent that fetches web documents (cookie banners, privacy policies), classifies them via Qwen/Ollama, assesses DSGVO compliance, assigns to the responsible role, and sends notification emails. Components: - ZeroClaw SOP (6-step workflow: fetch, classify, assess, summarize, assign, notify) - Backend: /api/compliance/agent/analyze (combined endpoint) - Backend: /api/compliance/agent/notify (standalone email) - Frontend: /sdk/agent page (Manager UI with URL input + results) - Helper scripts + E2E test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 23:28:21 +02:00

23 Commits
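
The three-category comparison described in commit 24fb1e14e0 (services named in the privacy policy vs. services actually embedded on the website) can be read as plain set arithmetic. The sketch below is illustrative only and is not taken from the repository: the names `ServiceComparison` and `compare_services`, and the normalization by trimming and lowercasing, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class ServiceComparison:
    """Hypothetical result container for the three categories in the commit message."""
    undocumented: Set[str]  # embedded on the website, missing from the privacy policy
    outdated: Set[str]      # named in the privacy policy, no longer embedded
    documented: Set[str]    # present in both; only the third-country check remains


def _normalize(names: Iterable[str]) -> Set[str]:
    # Assumption: service names match after trimming whitespace and lowercasing.
    return {name.strip().lower() for name in names if name.strip()}


def compare_services(declared: Iterable[str], detected: Iterable[str]) -> ServiceComparison:
    """Compare services declared in the policy with services detected on the site."""
    declared_set = _normalize(declared)
    detected_set = _normalize(detected)
    return ServiceComparison(
        undocumented=detected_set - declared_set,
        outdated=declared_set - detected_set,
        documented=declared_set & detected_set,
    )


if __name__ == "__main__":
    result = compare_services(
        declared=["Google Analytics", "Matomo", "Stripe"],
        detected=["google analytics", "Stripe ", "Hotjar"],
    )
    print(sorted(result.undocumented))
    print(sorted(result.outdated))
    print(sorted(result.documented))
```

A real comparison would likely need fuzzier matching than exact normalized names (aliases, vendor vs. product names), which this sketch deliberately leaves out.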