Commit Graph

11 Commits

Author SHA1 Message Date
Benjamin Admin 1b5c6bd340 docs: Batch test results for 9 websites + EUIPO analysis
Tested BMW, Stadt Koeln, BfDI, Sparkasse, Caritas, TUEV Sued,
Spiegel, ETO Gruppe, EUIPO. Key findings:

- Stadt Koeln + ETO Gruppe best (95% correctness)
- BMW, Sparkasse, Spiegel genuinely deficient (verified)
- EUIPO uses EU Regulation 2018/1725, not GDPR — needs separate checklist
- ~0-2 false positives per website after LLM verification

7 regex fixes emerged from batch testing (soft hyphens, word
insertions, numbered headings, German section names, etc.)
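
Two of the fix categories above (soft hyphens, numbered headings) can be
illustrated with a small normalization step that runs before any heading
regex. This is a minimal sketch, not the code from this commit: the names
normalize_heading and is_cookie_heading and the exact patterns are
assumptions made for illustration.

```python
import re

SOFT_HYPHEN = "\u00ad"
# Leading section numbers such as "3.", "3.1" or "3.1.2)" in front of a heading.
NUMBER_PREFIX = re.compile(r"^\s*\d+(?:\.\d+)*[.)]?\s+")


def normalize_heading(line: str) -> str:
    """Drop soft hyphens and a leading section number, then lowercase."""
    cleaned = line.replace(SOFT_HYPHEN, "")
    cleaned = NUMBER_PREFIX.sub("", cleaned)
    return cleaned.strip().lower()


def is_cookie_heading(line: str) -> bool:
    """Hypothetical check: does the normalized heading mention cookies?"""
    return re.search(r"\bcookies?\b", normalize_heading(line)) is not None
```

Normalizing first keeps the individual check patterns short, because each
one no longer has to tolerate invisible characters or numbering itself.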

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-08 00:41:28 +02:00
Benjamin Admin 313ee5073b plan: Banner-Check upgrade to L1/L2 with expert hints
Detailed plan for upgrading the 22 existing Playwright-based banner
checks to the same quality level as the document checks:
- 6 L1 + 30 L2 hierarchical checks
- Expert hints with EuGH/CNIL/DSK/EDPB references
- 3-phase evidence (before consent, after reject, after accept)
- Dark pattern detection (button size, color, click asymmetry)
- Estimated 3-4h implementation
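
The button size and click asymmetry checks in the plan could look roughly
like the sketch below. Everything here is an assumption for illustration:
the ButtonInfo type, the asymmetry_findings function and the 1.5 area
ratio threshold are not taken from the plan, and the measurements are
expected to come from the browser automation layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ButtonInfo:
    width: float
    height: float
    clicks_needed: int  # clicks from the open banner until this choice is applied

    @property
    def area(self) -> float:
        return self.width * self.height


def asymmetry_findings(
    accept: ButtonInfo, reject: ButtonInfo, max_area_ratio: float = 1.5
) -> list[str]:
    """Return which asymmetries exist between the accept and reject paths."""
    findings = []
    # A missing or zero-sized reject button counts as a size asymmetry.
    if reject.area == 0 or accept.area / reject.area > max_area_ratio:
        findings.append("size")
    if reject.clicks_needed > accept.clicks_needed:
        findings.append("clicks")
    return findings
```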

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 17:48:11 +02:00
Benjamin Admin fa4fd87102 fix: 7 regex bugs from IHK Konstanz ground truth analysis
Fixes based on manual verification of all 30 failed checks:
1. Cookie table: recognize "folgende cookies" + column headers as text
2. Cookie names: add JSESSIONID, cookieinfo, et_id, BT_* patterns
3. Essential justified: match "sitzung zuordnen", "betrieb der website"
4. Social bookmarks: recognize as 2-click alternative
5. DSFA plural: "kanaelen" now matches alongside "kanal"
6. Section splitter: skip-headings no longer lose subsequent text
   (Risikoabwaegung section was cut from DSFA, losing risk scores)
7. Cookie legal basis: accept Art. 6(1)(f) in cookie context
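
Fixes 2 and 5 can be sketched as two patterns. This is an illustrative
approximation, not the patterns from this commit: COOKIE_NAME, CHANNEL and
the helper functions are hypothetical names, and only the cookie names
listed in fix 2 are covered.

```python
import re

# Cookie names from fix 2; BT_* is modelled as "BT_" plus word characters.
COOKIE_NAME = re.compile(r"\b(?:JSESSIONID|cookieinfo|et_id|BT_\w+)\b")

# "kanal" and its plural forms in ASCII ("kanaelen") or umlaut spelling ("kanälen").
CHANNEL = re.compile(r"\bkan(?:al|aele|äle)[a-zäöü]*", re.IGNORECASE)


def find_cookie_names(text: str) -> list[str]:
    """Return the known cookie names in order of appearance."""
    return COOKIE_NAME.findall(text)


def mentions_channel(text: str) -> bool:
    """True if the text uses singular or plural of 'Kanal'."""
    return CHANNEL.search(text) is not None
```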

Reduces false positives from 7 to ~1-2 for IHK Konstanz test case.
Ground truth table: zeroclaw/docs/ground-truth-ihk-konstanz.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 14:51:09 +02:00
Benjamin Admin e19d9ca532 docs: Master Controls spec for document checker — 80-100 specific check criteria
Detailed requirements for the pipeline session:
- Binary yes/no check_question per control
- Concrete pass_criteria + fail_criteria (not 'check completeness')
- correction_template from our Template Generator
- 8 document types: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept
- ~80-100 total controls (not 25K generic ones)
- Examples for DSI, Cookie, Impressum with exact field expectations
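
One possible shape for a single control, as a sketch only. The field names
check_question, pass_criteria, fail_criteria and correction_template and
the eight document types come from the list above; the Control class, the
control_id field and the validate helper are assumptions added for
illustration.

```python
from dataclasses import dataclass

DOCUMENT_TYPES = frozenset(
    {"DSI", "Cookie", "Impressum", "Widerruf", "AGB", "DSFA", "AVV", "Loeschkonzept"}
)


@dataclass(frozen=True)
class Control:
    control_id: str           # hypothetical identifier
    document_type: str        # one of DOCUMENT_TYPES
    check_question: str       # binary yes/no question
    pass_criteria: str
    fail_criteria: str
    correction_template: str


def validate(control: Control) -> list[str]:
    """Return a list of problems; an empty list means the control is well-formed."""
    problems = []
    if control.document_type not in DOCUMENT_TYPES:
        problems.append(f"unknown document type: {control.document_type}")
    if not control.check_question.rstrip().endswith("?"):
        problems.append("check_question should be a yes/no question")
    for field_name in ("pass_criteria", "fail_criteria", "correction_template"):
        if not getattr(control, field_name).strip():
            problems.append(f"{field_name} is empty")
    return problems
```

A validator like this would reject generic entries such as "check
completeness" only indirectly; it guards the structure, not the quality
of the criteria.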

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 07:53:36 +02:00
Benjamin Admin 13c5880f51 fix: Restrict sub-section detection to genuinely separate document types
Only Cookie and Widerruf sections are checked as separate documents.
Social Media, DSFA, Betroffenenrechte, Dienste von Drittanbietern are
part of the parent DSI and no longer generate false findings.

Added PLAN-rag-document-check.md for Phase 2:
- RAG-based checks with document-type-specific Controls
- DSFA checklist (Art. 35 + Landes-Listen)
- AVV checklist (Art. 28)
- Reference detection (sub-doc → parent doc)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 11:02:36 +02:00
Benjamin Admin d0dc284cd5 docs: add Phase 5 (Payment/Marketing checks) + Phase 6 (auto-corrections)
- Payment: Stripe, PayPal, Klarna (Art. 22 Bonitaetspruefung!), Adyen, Mollie
- Marketing: GA, Meta Pixel, TikTok, Hotjar, Clarity, Newsletter-Anbieter
- Each service: DSE mention check, consent check, third-country check
- Pre-launch mode: agent generates ready-to-insert DSE text blocks via Qwen
- Correction types: missing service, wrong legal basis, outdated entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 15:26:29 +02:00
Benjamin Admin 24fb1e14e0 docs: add Phase 4b — SOLL/IST Dienstleister-Abgleich (DSE vs. Website)
Automated comparison: services mentioned in privacy policy vs. actually
embedded on website. Three categories: undocumented (Art. 13 violation),
outdated (cleanup), correctly documented (check third country only).
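
The three categories map directly onto set operations. A minimal sketch,
assuming both inputs have already been reduced to comparable service
names; the function name compare_services and the dictionary keys are
hypothetical.

```python
def compare_services(documented: set[str], detected: set[str]) -> dict[str, set[str]]:
    """Compare services named in the privacy policy with those found on the website."""
    return {
        # Embedded on the website but missing from the privacy policy.
        "undocumented": detected - documented,
        # Named in the privacy policy but no longer embedded.
        "outdated": documented - detected,
        # Present in both; only the third-country check remains.
        "documented": documented & detected,
    }
```

For example, compare_services({"Matomo", "Stripe"}, {"Stripe", "Hotjar"})
reports Hotjar as undocumented, Matomo as outdated and Stripe as
documented.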

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 15:20:12 +02:00
Benjamin Admin 6aa753146f docs: extend plan with third-party service detection + Drittland registry
80+ services: CDN (Cloudflare, Akamai), Fonts (Google Fonts, cf. the LG München ruling),
Tracking (GA, Meta Pixel, Matomo), Captcha, Maps, Video, Payment.
Static registry with country, EU adequacy, consent requirement, legal ref.
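
A static registry with these four attributes could be sketched as below.
The ServiceEntry class, the review_flags helper and all three entries are
placeholders invented for illustration; they do not describe real
services or the actual registry contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    category: str
    country: str            # country code of the provider
    eu_adequacy: bool       # True if no third-country transfer issue is expected
    consent_required: bool
    legal_ref: str


# Placeholder entries only; real values belong in the project's registry.
REGISTRY = {
    entry.name: entry
    for entry in (
        ServiceEntry("example-cdn", "CDN", "XX", False, False, "placeholder"),
        ServiceEntry("example-analytics", "Tracking", "XX", False, True, "placeholder"),
        ServiceEntry("example-maps", "Maps", "DE", True, True, "placeholder"),
    )
}


def review_flags(service_names: list[str]) -> dict[str, list[str]]:
    """Map each detected service to the issues a reviewer should look at."""
    flags = {}
    for name in service_names:
        entry = REGISTRY.get(name)
        if entry is None:
            flags[name] = ["unknown service"]
            continue
        issues = []
        if not entry.eu_adequacy:
            issues.append("third-country transfer")
        if entry.consent_required:
            issues.append("consent required")
        flags[name] = issues
    return flags
```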

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 15:18:43 +02:00
Benjamin Admin acd2d5f944 docs: add Phase 4 (Website-Scan) to Control Relevance Filter plan
Multi-page crawl: scan 5-10 strategic pages (start, footer links) for
chatbot widgets, AI text mentions, and tracking services. Feed results
into relevance filter to reduce false positives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 15:11:19 +02:00
Benjamin Admin 2a6f526c88 docs: plan for Control Relevance Filter (3-stage: rules, LLM, follow-up)
Addresses false-positive controls like C_TRANSPARENCY being recommended
when no AI usage is evident. Plan for separate implementation session.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:32:25 +02:00
Benjamin Admin 0c0dd4e3a6 feat: ZeroClaw compliance agent — document analysis + role assignment + email
Add autonomous compliance agent that fetches web documents (cookie banners,
privacy policies), classifies them via Qwen/Ollama, assesses DSGVO compliance,
assigns to the responsible role, and sends notification emails.

Components:
- ZeroClaw SOP (6-step workflow: fetch, classify, assess, summarize, assign, notify)
- Backend: /api/compliance/agent/analyze (combined endpoint)
- Backend: /api/compliance/agent/notify (standalone email)
- Frontend: /sdk/agent page (Manager UI with URL input + results)
- Helper scripts + E2E test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 23:28:21 +02:00