051890c370584e44a37235120f7be73e4507c5aa
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4bfb438c92 |
feat: 4 banner check upgrades — 30 CMPs, stealth, Shadow DOM, categories
Build + Deploy / build-admin-compliance (push) Successful in 2m17s
Build + Deploy / build-backend-compliance (push) Successful in 3m17s
Build + Deploy / build-ai-sdk (push) Successful in 56s
Build + Deploy / build-developer-portal (push) Successful in 1m37s
Build + Deploy / build-tts (push) Successful in 1m33s
Build + Deploy / build-document-crawler (push) Successful in 42s
Build + Deploy / build-dsms-gateway (push) Successful in 33s
Build + Deploy / build-dsms-node (push) Successful in 16s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 25s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m33s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 1m18s
CI / test-python-backend (push) Successful in 53s
CI / test-python-document-crawler (push) Successful in 36s
CI / test-python-dsms-gateway (push) Successful in 33s
CI / validate-canonical-controls (push) Successful in 24s
Build + Deploy / trigger-orca (push) Successful in 3m19s
1. 30 CMP selectors (was 10): Added Sourcepoint, Iubenda, Complianz, CookieFirst, HubSpot, Osano, Piwik PRO, Cookie Consent (Insites), Axeptio, Termly, CookieScript, Civic UK, GDPR Cookie Compliance, CookieHub, Ketch, Admiral, Sibbo, Evidon, LiveRamp, Adsimple. Plus improved generic fallback: role=dialog, aria-label, data-* attrs. 2. Playwright stealth mode: playwright-stealth against bot detection. Removes WebDriver flag, simulates plugins, realistic viewport/locale. Launch args: --disable-blink-features=AutomationControlled. 3. Shadow DOM: Recursive JS-based search through shadowRoot elements for consent banners. Fallback click via page.evaluate() when normal Playwright selectors can't penetrate Shadow DOM. 4. Category selection UI: User can choose which cookie categories to test (Notwendig, Statistik, Marketing, Funktional, Praeferenzen). Pill-style checkboxes in BannerCheckTab, forwarded through API chain. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
686834cea0 |
feat: 4 remaining tasks — EU institutions, banner integration, JS-sites, Caritas fixes
Build + Deploy / build-admin-compliance (push) Successful in 8s
Build + Deploy / build-backend-compliance (push) Successful in 8s
Build + Deploy / build-ai-sdk (push) Failing after 36s
Build + Deploy / build-developer-portal (push) Successful in 8s
Build + Deploy / build-tts (push) Successful in 7s
Build + Deploy / build-document-crawler (push) Successful in 7s
Build + Deploy / build-dsms-gateway (push) Successful in 8s
Build + Deploy / build-dsms-node (push) Successful in 8s
CI / branch-name (push) Has been skipped
Build + Deploy / trigger-orca (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m14s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 46s
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Successful in 29s
CI / test-python-dsms-gateway (push) Successful in 30s
CI / validate-canonical-controls (push) Successful in 16s
1. EU Institution Checks (Verordnung 2018/1725): - New doc_type "eu_institution" with 9 L1 + 15 L2 checks - Both German + English patterns (EU institutions are multilingual) - Auto-detection via "2018/1725", "EDSB", "EDPS" keywords - Correct article references (Art. 15 instead of 13, Art. 5 instead of 6) 2. Banner Check Integration: - banner_runner.py maps scan results to 36 L1/L2 structured checks - BannerCheckTab shows hierarchical ChecklistView with hints - 3-phase summary (cookies/scripts before/after consent) - /scan endpoint now includes structured_checks in response 3. JS-heavy Website Fixes (dm, Zalando, HWK): - dsi_helpers.py: goto_resilient (networkidle→domcontentloaded fallback) - try_dismiss_consent_banner before text extraction - PDF redirect detection (dm.de redirects to GCS PDF) 4. Caritas False Positive Fixes: - Phone regex allows parentheses: +49 (0)761 → now matches - "Recht auf Widerspruch" (3 words) + §23 KDG → matches Art. 21 - Church authorities: "Katholisches Datenschutzzentrum" recognized Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a349111a01 |
fix: Raise full_text limit 10K→50K + combine all DSI texts for checks
Two fixes: 1. consent-tester: full_text truncation raised from 10,000 to 50,000 chars (IHK Internetangebot has ~50K chars, Beschwerderecht was after 10K cutoff) 2. Backend: dse_text now combines Playwright HTML + ALL DSI discovery texts for mandatory content checking. Previously only used first 8K chars from one source, missing Verantwortlicher/DSB that were in DSI documents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a3f7fb93f4 |
fix: Scan quality — raise page limit, use full DSI text for checks
Bug 1: max_pages was hardcoded to 15 in backend call — raised to 50
Bug 2: DSI documents checked against text_preview (500 chars) — now uses
full_text (10,000 chars) for Art. 13 mandatory field checks
Bug 3: DSE text not found when Playwright misses DSE page — now falls
back to DSI Discovery full_text as second source
Bug 4: Backend timeout 120s too short for 50 pages — raised to 300s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
4e63a6050d |
feat: Generic legal document discovery (DSI, AGB, Widerruf, Cookie-Richtlinie)
New service: dsi_discovery.py — finds ALL legal documents on any website:
- Technology-agnostic: HTML, SPA, WordPress, Typo3, custom CMS
- Structure-agnostic: accordions, sidebars, footers, inline links, tabs
- Format-agnostic: HTML pages, anchor sections, PDFs, cross-domain links
- Language-agnostic: 26 EU/EEA languages with document-type keywords
Document types discovered:
- Datenschutzinformationen / Privacy Policies (Art. 13/14 DSGVO)
- AGB / Terms of Service / Nutzungsbedingungen
- Widerrufsbelehrung / Right of Withdrawal (§355 BGB)
- Cookie-Richtlinie / Cookie Policy
- All cross-domain variants (e.g. help.instagram.com from instagram.com)
API: POST /dsi-discovery { url, max_documents }
Returns: list of documents with title, url, language, type, word_count, text_preview
Features:
- Expands all accordions, details, tabs, dropdowns before scanning
- Follows cross-domain links (same registrable domain)
- Re-expands after navigation back to source page
- Handles anchor links (#sections) separately from full pages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|