c702260ec1592705c2b5921467d688b0f4150814
7 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c702260ec1 |
fix: 5 regex bugs + text extraction scroll + GT update
Build + Deploy / build-backend-compliance (push) Successful in 23s
Build + Deploy / build-ai-sdk (push) Successful in 13s
Build + Deploy / build-admin-compliance (push) Successful in 13s
Build + Deploy / build-developer-portal (push) Successful in 14s
Build + Deploy / build-tts (push) Successful in 15s
Build + Deploy / build-document-crawler (push) Successful in 13s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Successful in 39s
Build + Deploy / build-dsms-gateway (push) Successful in 15s
Build + Deploy / build-dsms-node (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 15s
CI / nodejs-build (push) Successful in 2m26s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-python-backend (push) Successful in 39s
CI / test-python-document-crawler (push) Successful in 25s
CI / test-python-dsms-gateway (push) Successful in 22s
CI / validate-canonical-controls (push) Successful in 15s
Build + Deploy / trigger-orca (push) Successful in 2m28s
Root cause: Spiegel DSI text was truncated (lazy-loading) — the rights/DSB/complaints sections at the bottom were never extracted. Fixes: 1. Text extraction: scroll to bottom before innerText (dsi_discovery.py) 2. V.i.S.d.P.: add "verantwortlicher i.s.v." + "§18 Abs. N MStV" pattern 3. USt-IdNr: add "umsatzsteuer-id" + "DE 212 442 423" (with spaces) 4. Profiler: remove generic "anwalt"/"praxis" (false positive on Spiegel "Redaktionsanwalt"), keep only "rechtsanwalt", "kanzlei" etc. 5. Section splitter: auto_fill_from_dsi() fills empty Cookie/Social-Media rows from sections found in the DSI text Ground Truth 06-spiegel.md fully rewritten with verified data from live website — 3 L1 False Negatives identified (DSB, Beschwerderecht, Betroffenenrechte all present on website but not in extracted text). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
407a9503e4 |
fix(profiler): fix B2G false positive + add consulting/manufacturing
Build + Deploy / build-admin-compliance (push) Successful in 2m27s
Build + Deploy / build-backend-compliance (push) Successful in 3m40s
Build + Deploy / build-ai-sdk (push) Successful in 1m0s
Build + Deploy / build-developer-portal (push) Successful in 1m16s
Build + Deploy / build-tts (push) Successful in 1m54s
Build + Deploy / build-document-crawler (push) Successful in 1m2s
Build + Deploy / build-dsms-gateway (push) Successful in 31s
Build + Deploy / build-dsms-node (push) Successful in 20s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m44s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 49s
CI / test-python-backend (push) Successful in 36s
CI / test-python-document-crawler (push) Successful in 25s
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 14s
Build + Deploy / trigger-orca (push) Successful in 3m23s
- Remove generic B2G keywords (behörde, amt, öffentlich) that match in every DSI due to "Aufsichtsbehörde", "Amtsgericht", "veröffentlichen" - Remove "server" from it_services (too generic, appears in every DSI) - Add consulting, manufacturing, media industries - Add B2B fallback for GmbH/AG without B2C signals - Add 10 ground truth files for unified compliance check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
36c6101b91 |
Merge feat/zeroclaw-compliance-agent into main
Brings all compliance doc-check features: - 162 regex checks + 1874 Master Controls - LLM-agnostic agent with tool calling - Banner check (46 checks, 30 CMPs, stealth, Shadow DOM) - Impressum check (24 checks) - Deep consent verification (DataLayer, GCM, TCF) - CMP E2E tests (39 tests) - HTML email reports, FAQ, persistent history Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
1b5c6bd340 |
docs: Batch test results for 9 websites + EUIPO analysis
Build + Deploy / build-admin-compliance (push) Successful in 1m51s
Build + Deploy / build-backend-compliance (push) Successful in 8s
Build + Deploy / build-ai-sdk (push) Failing after 33s
Build + Deploy / build-developer-portal (push) Successful in 7s
Build + Deploy / build-tts (push) Successful in 7s
Build + Deploy / build-document-crawler (push) Successful in 7s
Build + Deploy / build-dsms-gateway (push) Successful in 8s
Build + Deploy / build-dsms-node (push) Successful in 8s
CI / branch-name (push) Has been skipped
Build + Deploy / trigger-orca (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 18s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m8s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 46s
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Successful in 32s
CI / test-python-dsms-gateway (push) Successful in 24s
CI / validate-canonical-controls (push) Successful in 19s
Tested BMW, Stadt Koeln, BfDI, Sparkasse, Caritas, TUEV Sued, Spiegel, ETO Gruppe, EUIPO. Key findings: - Stadt Koeln + ETO Gruppe best (95% correctness) - BMW, Sparkasse, Spiegel genuinely deficient (verified) - EUIPO uses EU Regulation 2018/1725, not GDPR — needs separate checklist - ~0-2 false positives per website after LLM verification 7 regex fixes emerged from batch testing (soft hyphens, word insertions, numbered headings, German section names, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
fa4fd87102 |
fix: 7 regex bugs from IHK Konstanz ground truth analysis
Build + Deploy / build-admin-compliance (push) Successful in 9s
Build + Deploy / build-backend-compliance (push) Successful in 8s
Build + Deploy / build-ai-sdk (push) Successful in 42s
Build + Deploy / build-developer-portal (push) Successful in 8s
Build + Deploy / build-tts (push) Successful in 7s
Build + Deploy / build-document-crawler (push) Successful in 7s
Build + Deploy / build-dsms-gateway (push) Successful in 8s
Build + Deploy / build-dsms-node (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 18s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m57s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 49s
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Successful in 28s
CI / test-python-dsms-gateway (push) Successful in 23s
CI / validate-canonical-controls (push) Successful in 15s
Build + Deploy / trigger-orca (push) Successful in 2m24s
Fixes based on manual verification of all 30 failed checks: 1. Cookie table: recognize "folgende cookies" + column headers as text 2. Cookie names: add JSESSIONID, cookieinfo, et_id, BT_* patterns 3. Essential justified: match "sitzung zuordnen", "betrieb der website" 4. Social bookmarks: recognize as 2-click alternative 5. DSFA plural: "kanaelen" now matches alongside "kanal" 6. Section splitter: skip-headings no longer lose subsequent text (Risikoabwaegung section was cut from DSFA, losing risk scores) 7. Cookie legal basis: accept Art. 6(1)(f) in cookie context Reduces false positives from 7 to ~1-2 for IHK Konstanz test case. Ground truth table: zeroclaw/docs/ground-truth-ihk-konstanz.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
4f92e5056c |
docs: Complete agent architecture reference for reuse in other agents
Full documentation of the ZeroClaw compliance agent architecture: - System overview diagram (Frontend → Backend → LLM → Playwright) - Detailed request flow for Website-Scan mode (7 steps) - All 5 components: Frontend, Backend, Consent-Tester, Ollama, Soul Files - 20 banner checks across 3 files - LLM call patterns (/api/generate + /api/chat + think-mode stripping) - Blueprint for creating new agents (5 steps: Soul, Route, Page, Proxy, Docker) - Timeouts, environment variables, file reference with LOC counts Designed as reusable blueprint for Sales, HR, Finance, or other agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
e318215cc5 |
refactor: split agent_analyze_routes (420→309 LOC) + agent docs + migration
- Extracted website compliance checks + helpers to website_compliance_checks.py - Created agent documentation (zeroclaw/docs/compliance-agent.md) - DB migration 086 executed (compliance_agent_scans table) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |