breakpilot-compliance/backend-compliance/tests/test_cross_domain_doc_check.py
Benjamin Admin d208a2bde2
feat: mail restructuring + B22 Cross-Domain-Doc-Detector
User feedback on BMW v5: "740 cookies shrank to 31, overview
lost". Three adjustments:

Mail restructuring (_executive_summary.py + _compose.py):
  - render_executive_summary(): top-of-mail TL;DR with the
    compliance score (large + colored), top 3 findings by
    severity, cookie statistics (declared/browser/third country),
    severity distribution chips.
  - collapsible(): wraps each block in <details>/<summary>.
    Mailpit and all modern mail clients render this natively.
  - _compose.py: all 18+ B blocks + per_doc + per_theme +
    legacy_html go into accordions. ONLY critical findings and
    immediate actions are always open; the reviewer sees ~15
    lines of overview and expands selectively.
  - The cookie inventory (742) now has its own section at the very
    top (accordion "🍪 Cookie-Inventar"), vendor cards alongside.
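
A minimal sketch of what a collapsible() helper along these lines could look like. The signature, the `open_by_default` parameter, and the escaping of the title are assumptions for illustration, not the code in _compose.py. Rendering of <details> is worth checking in each target mail client.

```python
from html import escape


def collapsible(title: str, body_html: str, open_by_default: bool = False) -> str:
    """Wrap an HTML fragment in <details>/<summary>.

    Hypothetical signature: `title` is plain text and gets escaped,
    `body_html` is assumed to be trusted, already-rendered HTML.
    """
    # Blocks that must stay visible (e.g. critical findings) get the
    # `open` attribute, so they are expanded without a click.
    open_attr = " open" if open_by_default else ""
    return (
        f"<details{open_attr}>"
        f"<summary>{escape(title)}</summary>"
        f"{body_html}"
        "</details>"
    )
```

Under these assumptions, a critical block would be rendered with `collapsible("Critical findings", html, open_by_default=True)`, while all other blocks use the default and stay closed.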

B22 Cross-Domain-Legal-Doc-Detector (cross_domain_doc_check.py):
  Real example from user feedback: Elli's AGB (general terms and
  conditions) are hosted on docs.logpay.de instead of elli.eco.
  The detector flags the SLD mismatch:
  - HIGH for agb / widerruf (contract-relevant)
  - MEDIUM for dse / nutzungsbedingungen
  - INFO for cookie / impressum (best practice)
  Legal basis: GDPR Art. 28 (data processing agreement required
  for hosting) + Art. 13(1)(e) (recipients) + § 312i BGB
  (Cool-URLs).
  9/9 tests green, incl. the Elli/LogPay pattern.
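
The detection logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of cross_domain_doc_check.py: the function names (`sld`, `primary_sld`, `find_cross_domain_docs`), the tiny `COMPOUND_SUFFIXES` set, and the INFO fallback for unknown doc types are all hypothetical. Only the severity mapping, the check ID, and the finding keys are taken from the commit message and the tests.

```python
from collections import Counter
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Assumption: a deliberately tiny set of two-label public suffixes.
# A real implementation would more likely consult the Public Suffix List.
COMPOUND_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}

# Severity per doc type, as listed in the commit message.
SEVERITY_BY_DOC_TYPE = {
    "agb": "HIGH", "widerruf": "HIGH",
    "dse": "MEDIUM", "nutzungsbedingungen": "MEDIUM",
    "cookie": "INFO", "impressum": "INFO",
}


def sld(host: str) -> str:
    """Return the registrable label, e.g. 'bmw' for 'www.bmw.de'."""
    labels = host.lower().strip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in COMPOUND_SUFFIXES:
        return labels[-3]
    return labels[-2] if len(labels) >= 2 else labels[0]


def primary_sld(state: Dict) -> Optional[str]:
    """Majority vote over entries that were not auto-discovered."""
    counts = Counter(
        sld(urlparse(entry["url"]).hostname or "")
        for entry in state.get("doc_entries", [])
        if entry.get("url") and not entry.get("auto_discovered")
    )
    counts.pop("", None)  # ignore URLs without a usable hostname
    # On a tie, most_common() keeps first-seen order, so the first entry wins.
    return counts.most_common(1)[0][0] if counts else None


def find_cross_domain_docs(state: Dict) -> List[Dict]:
    """Emit one finding per document hosted under a different SLD."""
    site = primary_sld(state)
    if site is None:
        return []
    findings = []
    for entry in state.get("doc_entries", []):
        host = sld(urlparse(entry.get("url") or "").hostname or "")
        if not host or host == site:
            continue
        doc_type = entry.get("doc_type", "")
        findings.append({
            "check_id": "CROSS-DOMAIN-DOC-001",
            # Assumption: unknown doc types fall back to INFO.
            "severity": SEVERITY_BY_DOC_TYPE.get(doc_type, "INFO"),
            "doc_type": doc_type,
            "site_sld": site,
            "host_sld": host,
        })
    return findings
```

With the Elli/LogPay entries from the tests, this sketch yields a single HIGH finding for the agb document, with `site_sld` "elli" and `host_sld` "logpay".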

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-08 11:35:55 +02:00

89 lines
3.1 KiB
Python

"""Tests for B22 Cross-Domain-Legal-Doc-Detector."""
from compliance.services.cross_domain_doc_check import (
    _site_origin_sld,
    _sld,
    check_cross_domain_docs,
)


class TestSld:
    def test_simple(self):
        assert _sld("www.bmw.de") == "bmw"

    def test_compound_tld(self):
        assert _sld("docs.example.co.uk") == "example"

    def test_no_www(self):
        assert _sld("elli.eco") == "elli"


class TestPrimaryDetection:
    def test_majority_wins(self):
        state = {"doc_entries": [
            {"url": "https://elli.eco/de/impressum"},
            {"url": "https://elli.eco/de/datenschutz"},
            {"url": "https://docs.logpay.de/_docs/agb.pdf"},
        ]}
        assert _site_origin_sld(state) == "elli"

    def test_auto_discovered_excluded(self):
        # Discovery results don't influence primary detection
        state = {"doc_entries": [
            {"url": "https://elli.eco/de/impressum", "auto_discovered": False},
            {"url": "https://discovered.tld/foo", "auto_discovered": True},
        ]}
        assert _site_origin_sld(state) == "elli"


class TestCheck:
    def test_elli_logpay_pattern(self):
        state = {"doc_entries": [
            {"doc_type": "dse", "url": "https://www.elli.eco/de/datenschutz"},
            {"doc_type": "impressum",
             "url": "https://www.elli.eco/de/impressum"},
            {"doc_type": "agb",
             "url": "https://docs.logpay.de/_docs/de/"
                    "allgemeine_geschaeftsbedingungen_de_EM.pdf"},
        ]}
        findings = check_cross_domain_docs(state)
        assert len(findings) == 1
        f = findings[0]
        assert f["check_id"] == "CROSS-DOMAIN-DOC-001"
        assert f["severity"] == "HIGH"  # AGB is HIGH
        assert f["doc_type"] == "agb"
        assert f["site_sld"] == "elli"
        assert f["host_sld"] == "logpay"

    def test_same_subdomain_no_finding(self):
        # docs.bmw.de has the same SLD as www.bmw.de, so no finding
        state = {"doc_entries": [
            {"doc_type": "dse",
             "url": "https://www.bmw.de/de/datenschutz.html"},
            {"doc_type": "agb",
             "url": "https://docs.bmw.de/agb.pdf"},
        ]}
        findings = check_cross_domain_docs(state)
        assert findings == []

    def test_no_primary_no_finding(self):
        # No URLs at all
        state = {"doc_entries": []}
        assert check_cross_domain_docs(state) == []

    def test_severity_per_doc_type(self):
        state = {"doc_entries": [
            {"doc_type": "agb", "url": "https://acme.de/x"},
            {"doc_type": "dse",
             "url": "https://docs.thirdparty.com/agb"},
            {"doc_type": "impressum",
             "url": "https://www.other.com/impressum"},
        ]}
        findings = check_cross_domain_docs(state)
        sev_by_doc = {f["doc_type"]: f["severity"] for f in findings}
        # agb is on the primary domain (acme.de): no finding
        # dse on thirdparty.com → MEDIUM
        # impressum on other.com → INFO
        assert sev_by_doc.get("dse") == "MEDIUM"
        assert sev_by_doc.get("impressum") == "INFO"