298c95731af0f58e35c2ae9da434e205b4452671
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
298c95731a |
feat: Generic legal document discovery (DSI, AGB, Widerruf, Cookie-Richtlinie)
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 22s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m35s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 52s
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Successful in 29s
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 14s
New service: dsi_discovery.py — finds ALL legal documents on any website:
- Technology-agnostic: HTML, SPA, WordPress, Typo3, custom CMS
- Structure-agnostic: accordions, sidebars, footers, inline links, tabs
- Format-agnostic: HTML pages, anchor sections, PDFs, cross-domain links
- Language-agnostic: 26 EU/EEA languages with document-type keywords
Document types discovered:
- Datenschutzinformationen / Privacy Policies (Art. 13/14 DSGVO)
- AGB / Terms of Service / Nutzungsbedingungen
- Widerrufsbelehrung / Right of Withdrawal (§355 BGB)
- Cookie-Richtlinie / Cookie Policy
- All cross-domain variants (e.g. help.instagram.com from instagram.com)
API: POST /dsi-discovery { url, max_documents }
Returns: list of documents with title, url, language, type, word_count, text_preview
Features:
- Expands all accordions, details, tabs, dropdowns before scanning
- Follows cross-domain links (same registrable domain)
- Re-expands after navigation back to source page
- Handles anchor links (#sections) separately from full pages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
129849aa21 |
feat: 9 new banner checks (12-20), total 20 compliance checks
CI / branch-name (push) Has been skipped
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 13s
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 15s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m38s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 45s
CI / test-python-backend (push) Successful in 52s
CI / test-python-document-crawler (push) Successful in 30s
Check 12: Click count — reject requires more clicks than accept (CNIL 150M EUR) Check 13: Color contrast — reject button invisible (same bg as banner) Check 14: Google Consent Mode — analytics_storage 'granted' as default Check 15: Pre-consent cookies — tracking cookies set before any interaction Check 16: Registration coupling — login button = consent (Art. 7(4) DSGVO) Check 17: Language mismatch — banner vs page language (all 26 EU languages) Check 18: Consent cookie expiry — >13 months violates CNIL guidelines Check 19: Nudging — reject button below fold / requires scrolling Check 20: Emotional language (Stirring) — "volle Funktionalitaet" etc. Language detection covers: BG, CS, DA, DE, EL, EN, ES, ET, FI, FR, GA, HR, HU, IS, IT, LT, LV, MT, NL, NO, PL, PT, RO, SK, SL, SV New file: banner_advanced_checks.py (396 LOC) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
7fc43a3f1f |
feat: 3 new banner legal checks (11 total) + extract banner_text_checker
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 46s
CI / validate-canonical-controls (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 18s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m32s
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Successful in 25s
CI / test-python-dsms-gateway (push) Successful in 21s
New checks (from EUIPO reference case): - Check 9: Third-party DSE link — detects when consent dialog links to external domain's privacy policy instead of own DSE (Art. 13 DSGVO) - Check 10: Dark-pattern language — detects "muessen/erforderlich" for non-essential cookies suggesting false technical necessity (EDPB Rn. 70) - Check 11: Non-modal dismiss = consent — detects when clicking outside dialog closes it (possibly treating as consent, Planet49 violation) Refactor: extracted _check_banner_text (375 LOC) from consent_scanner.py into services/banner_text_checker.py to keep both files under 500 LOC. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |