feat: Browser-Matrix C2 + B11 AI-Retention + Impressum-Specialist-Agent + B1 Mobile Playwright

Task #15 Stage 1.c-e, Browser-Matrix backend integration:
  - _phase_c2_browser_matrix.py: calls consent-tester /scan-matrix when
    env BROWSER_MATRIX=true, fills state["browser_matrix"] +
    state["browser_aggregate"] + state["browser_matrix_html"]
  - V2 mail block: 🌐 Browser-Matrix table (Profile · Score ·
    sub-scores PC/RR/BD · Rating) with worst-of header
  - Orchestrator calls run_phase_c2 after run_phase_c
  KNOWN: Stage 1.b (consent_scanner browser_profile param) remains
    deferred (file is in loc-exception, hook patch refused).
    The Stage 1.a shim runs in consent-tester; all profiles currently
    run on Chromium, real engine diversity arrives with 1.b.
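
  A minimal sketch of how a worst-of aggregate could be derived from the
  per-profile rows. The actual aggregation happens inside consent-tester
  and is not part of this commit, so the helper name aggregate_worst_of
  and the sample rows are assumptions; only the keys (profile_id, score,
  worst_profile, worst_score, best_profile, best_score) are taken from
  the backend code.

```python
def aggregate_worst_of(rows: list[dict]) -> dict:
    """Hypothetical helper: pick the weakest and strongest profile by score."""
    if not rows:
        return {}
    worst = min(rows, key=lambda r: r["score"])
    best = max(rows, key=lambda r: r["score"])
    return {
        "worst_profile": worst["profile_id"],
        "worst_score": worst["score"],
        "best_profile": best["profile_id"],
        "best_score": best["score"],
    }


# Illustrative rows only, not real scan results.
sample = [
    {"profile_id": "chromium", "score": 85},
    {"profile_id": "webkit", "score": 55},
]
aggregate = aggregate_worst_of(sample)
```

  Under the worst-of rule the header of the mail block would report the
  webkit score here, even though chromium passes.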

Task #17 TH-RETENTION-002 as B11 ai_retention_granularity_check:
  - Detects AI-provider context (vertex/openai/anthropic/etc)
  - Within a ±800-char window: checks for ≥2 data categories from the
    standard list (text inputs/IP/device/session/error log/timestamp)
  - If there is 1 blanket retention period + ≥2 categories but no
    per-category differentiation → LOW
  - Smoke: Elli mock privacy policy (DSE) hits LOW
    "AI-Speicherdauer pauschal"
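
  The rule above can be sketched roughly as follows. This is not the
  code of ai_retention_granularity_check: the regexes, the function name
  check_window and the shortcut "exactly one distinct duration means no
  per-category differentiation" are all assumptions made for
  illustration.

```python
import re

# Assumed patterns; the real provider and category lists may differ.
AI_PROVIDER_RE = re.compile(r"\b(vertex|openai|anthropic)\b", re.IGNORECASE)
CATEGORY_RES = {
    "text_input": re.compile(r"texteingabe", re.IGNORECASE),
    "ip": re.compile(r"\bIP\b", re.IGNORECASE),
    "device": re.compile(r"ger(ä|ae)t", re.IGNORECASE),
    "session": re.compile(r"session|sitzung", re.IGNORECASE),
    "error_log": re.compile(r"fehlerprotokoll", re.IGNORECASE),
    "timestamp": re.compile(r"zeitstempel", re.IGNORECASE),
}
DURATION_RE = re.compile(
    r"\b\d+\s*(?:tage?n?|wochen?|monate?n?|jahre?n?)\b", re.IGNORECASE
)
WINDOW = 800  # characters on each side of the provider mention


def check_window(text: str) -> list[dict]:
    """Flag a blanket retention period near an AI-provider mention."""
    for m in AI_PROVIDER_RE.finditer(text):
        window = text[max(0, m.start() - WINDOW): m.end() + WINDOW]
        categories = [k for k, rx in CATEGORY_RES.items() if rx.search(window)]
        durations = {d.group(0).lower() for d in DURATION_RE.finditer(window)}
        # One distinct duration for two or more categories counts as "blanket".
        if len(categories) >= 2 and len(durations) == 1:
            return [{
                "severity": "LOW",
                "title": "AI-Speicherdauer pauschal",
                "categories": categories,
            }]
    return []


flat = ("Wir nutzen OpenAI. Verarbeitet werden Texteingaben, IP-Adresse "
        "und Zeitstempel. Die Daten werden 30 Tage gespeichert.")
differentiated = ("Wir nutzen OpenAI. Texteingaben werden 30 Tage, "
                  "IP-Adressen 7 Tage gespeichert.")
```

  A policy that states different periods per category, as in the second
  sample, produces no finding in this sketch.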

Task #18 Specialist agents, Phase 1 prototype:
  - compliance/services/specialist_agents/__init__.py with architecture docs
  - impressum_agent.py: 9 mandatory disclosures under § 5 TMG + § 1 DL-InfoV
    as a pattern registry (name, email, phone, HR (commercial register),
    USt-IdNr, authorized representative, supervisory authority,
    professional details, OS link)
  - business_scope-aware (OS link only for ecommerce, supervisory
    authority only for regulated_profession/financial/insurance)
  - Phase 1 is pattern-match-only (no LLM) and demonstrates the
    interface. Phase 2 replaces the patterns with system prompt + KB.
  - Smoke: a minimal Impressum correctly triggers 4 findings
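
  A scope-aware pattern registry of this kind might look like the sketch
  below. It covers only five of the nine disclosures, and the class
  name, the regexes and the finding shape are hypothetical; the real
  impressum_agent.py is not shown in this diff.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    key: str
    pattern: re.Pattern
    scopes: frozenset = frozenset()  # empty set = applies to every scope


# Assumed, deliberately simple patterns for illustration.
REGISTRY = [
    Requirement("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    Requirement("phone", re.compile(
        r"(?:tel\.?|telefon)\s*:?\s*\+?[\d\s/()-]{6,}", re.IGNORECASE)),
    Requirement("vat_id", re.compile(r"\bDE\s?\d{9}\b")),
    Requirement("os_link", re.compile(r"ec\.europa\.eu/consumers/odr"),
                frozenset({"ecommerce"})),
    Requirement("supervisory_authority",
                re.compile(r"aufsichtsbeh(?:ö|oe)rde", re.IGNORECASE),
                frozenset({"regulated_profession", "financial", "insurance"})),
]


def check_impressum(text: str, business_scope: str) -> list[dict]:
    """Return one finding per required disclosure that no pattern matches."""
    findings = []
    for req in REGISTRY:
        if req.scopes and business_scope not in req.scopes:
            continue  # disclosure not required for this business scope
        if not req.pattern.search(text):
            findings.append({"missing": req.key})
    return findings


minimal = "Muster GmbH, kontakt@example.com, Telefon: +49 30 1234567"
```

  Keeping the scope on the registry entry means the matching loop stays
  unchanged when Phase 2 swaps the patterns for an LLM-backed check.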

Task #7 B1 Playwright mobile verification:
  - consent-tester/services/mobile_reachability_scanner.py: real
    WebKit launch + p.devices['iPhone 15'] preset + de-DE locale +
    Europe/Berlin timezone
  - Footer anchor search via locator("footer >> text=/.../i") for
    13 reopen phrases
  - Tap-target bounding-box measurement (Apple HIG / WCAG ≥44x44)
  - Click behavior: DOM modal snapshot before/after, detects CMP open
  - Output: has_anchor, anchor_text, tap_target_px, click_opens_cmp,
    engine_meta, screenshot_b64 (footer crop if no anchor)
  - consent-tester/routes_mobile.py POST /scan-mobile-reachability
  - Backend _b1_wiring extended: calls the mobile endpoint first,
    falls back to static HTTP fetch. Mobile data enriches
    finding.mobile_playwright + severity bump on
    tap-target<44 / click-doesnt-open-CMP.
  KNOWN: WebKit system libs have been added to the Dockerfile (Stage 1.a
    commit) but only take effect after a CI/CD rebuild of consent-tester.
    Until then B1 falls back cleanly to the static fetch.
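
  A rough sketch of the WebKit/iPhone probe described above, assuming
  Playwright for Python is installed together with its WebKit build. The
  function names and REOPEN_RE are hypothetical: the commit does not
  list the 13 reopen phrases, so a three-phrase stand-in is used, and
  the click/CMP detection and screenshot steps are left out.

```python
import re

MIN_TAP_PX = 44  # Apple HIG / WCAG 2.5.5 target size

# Assumed stand-in for the reopen phrases, which this commit does not list.
REOPEN_RE = re.compile(r"cookie|consent|datenschutz-einstellungen", re.IGNORECASE)


def tap_target_ok(tp) -> bool:
    """True when a {"w": ..., "h": ...} box is at least 44x44 px."""
    return bool(tp) and tp["w"] >= MIN_TAP_PX and tp["h"] >= MIN_TAP_PX


async def scan_footer_anchor(url: str) -> dict:
    # Imported lazily so tap_target_ok works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.webkit.launch()
        context = await browser.new_context(
            **p.devices["iPhone 15"],
            locale="de-DE",
            timezone_id="Europe/Berlin",
        )
        page = await context.new_page()
        await page.goto(url)
        anchor = page.locator("footer").get_by_text(REOPEN_RE).first
        has_anchor = await anchor.count() > 0
        box = await anchor.bounding_box() if has_anchor else None
        await browser.close()

    # Map Playwright's width/height to the w/h keys the backend reads.
    tp = {"w": box["width"], "h": box["height"]} if box else None
    return {
        "has_anchor": has_anchor,
        "tap_target_px": tp,
        "tap_target_ok": tap_target_ok(tp),
    }
```

  It could be run with asyncio.run(scan_footer_anchor(url)) for any
  reachable homepage URL; without the WebKit system libs the launch
  fails, which is the case the static-fetch fallback covers.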

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-06-06 22:20:25 +02:00
parent e1dadc8027
commit 37093ff9e3
11 changed files with 702 additions and 7 deletions
@@ -42,6 +42,29 @@ async def run_b1(state: dict) -> None:
         return
     _update(check_id, "Mobile Consent-Reachability prüfen...", 95)
+    # Try the new Playwright WebKit + iPhone scan first (Task #7).
+    # Falls back to static HTTP fetch on error.
+    mobile = None
+    try:
+        from ._constants import CONSENT_TESTER_URL
+        async with httpx.AsyncClient(timeout=60.0) as c:
+            r = await c.post(
+                f"{CONSENT_TESTER_URL}/scan-mobile-reachability",
+                json={"url": homepage_url},
+            )
+        if r.status_code == 200:
+            mobile = r.json()
+            logger.info(
+                "B1 Mobile-Playwright: has_anchor=%s tap=%s click_opens=%s",
+                mobile.get("has_anchor"),
+                mobile.get("tap_target_px"),
+                mobile.get("click_opens_cmp"),
+            )
+    except Exception as e:
+        logger.info("B1 Mobile-Playwright fallback to static fetch: %s", e)
+    page_html = None
     try:
         async with httpx.AsyncClient(
             timeout=20.0, follow_redirects=True,
@@ -50,21 +73,53 @@ async def run_b1(state: dict) -> None:
"Version/17.5 Mobile/15E148 Safari/604.1"},
         ) as c:
             r = await c.get(homepage_url)
-        if r.status_code != 200:
-            logger.info("B1: homepage fetch %s → HTTP %d", homepage_url, r.status_code)
-            return
-        page_html = r.text
+        if r.status_code == 200:
+            page_html = r.text
     except Exception as e:
         logger.warning("B1: homepage fetch failed: %s", e)
+    if not page_html and not mobile:
         return
-    finding = evaluate_reachability(page_html, homepage_url)
+    finding = evaluate_reachability(page_html or "", homepage_url)
+    # Enrich finding with mobile-playwright details when available
+    if mobile and mobile.get("has_anchor"):
+        finding["mobile_playwright"] = {
+            "has_anchor": mobile.get("has_anchor"),
+            "anchor_text": mobile.get("anchor_text"),
+            "tap_target_px": mobile.get("tap_target_px"),
+            "click_opens_cmp": mobile.get("click_opens_cmp"),
+            "engine_meta": mobile.get("engine_meta"),
+        }
+        # Tap-target rule (Apple HIG / WCAG 2.5.5): ≥ 44 px each side
+        tp = mobile.get("tap_target_px") or {}
+        if tp and (tp.get("w", 0) < 44 or tp.get("h", 0) < 44):
+            finding["notes"] = (finding.get("notes") or []) + [
+                f"tap-target nur {tp.get('w')}×{tp.get('h')}px "
+                "(Apple HIG / WCAG verlangen ≥ 44×44)",
+            ]
+            if finding.get("passed"):
+                finding["passed"] = False
+                finding["severity"] = "MEDIUM"
+                finding["severity_reason"] = "misclassified"
+        # If anchor exists in DOM but click doesn't open CMP, bump severity
+        if mobile.get("has_anchor") and not mobile.get("click_opens_cmp"):
+            finding["notes"] = (finding.get("notes") or []) + [
+                "click auf Footer-Link öffnet CMP nicht direkt",
+            ]
+            if finding.get("severity_reason") != "factually_wrong":
+                finding["severity"] = "MEDIUM"
+                finding["severity_reason"] = "misclassified"
+            finding["passed"] = False
     state["reachability_finding"] = finding
     state["reachability_html"] = _render_block(finding)
     logger.info(
-        "B1 Reachability: passed=%s severity=%s reason=%s",
+        "B1 Reachability: passed=%s severity=%s reason=%s mobile=%s",
         finding["passed"], finding.get("severity"),
         finding.get("severity_reason"),
+        bool(mobile),
     )
@@ -9,6 +9,9 @@ from __future__ import annotations
 import html
 import logging
+from compliance.services.ai_retention_granularity_check import (
+    check_ai_retention_granularity,
+)
 from compliance.services.impressum_multi_entity_check import (
     check_multi_entity_impressum,
 )
@@ -24,6 +27,7 @@ def run_b9b10(state: dict) -> None:
     new: list[dict] = []
     new.extend(check_multi_entity_impressum(state))
     new.extend(check_transfer_mechanism(state))
+    new.extend(check_ai_retention_granularity(state))
     if not new:
         return
     extras.extend(new)
@@ -51,6 +51,8 @@ async def run_compliance_check(check_id: str, req) -> None:
     await run_phase_b(state)
     # Phase C: Step 3b-d (banner + cross-check + TCF) + Step 4
     await run_phase_c(state)
+    # Phase C-2: optional browser-matrix scan (env BROWSER_MATRIX=true)
+    await run_phase_c2(state)
     # Phase D-1/D-2: Step 5 vendor extraction + finalize
     await run_phase_d1(state)
     await run_phase_d2(state)
@@ -0,0 +1,146 @@
"""Phase C-2 — Browser-Matrix Multi-Browser Scan (Stage 1.c).
+
+After the single-browser scan in Phase C, optionally fan out to the
+consent-tester /scan-matrix endpoint that runs the same probe on
+chromium / firefox / webkit / mobile-safari and returns a worst-of
+robustness score per browser.
+
+Activated by env `BROWSER_MATRIX=true`. Default off so existing
+runs aren't slowed down 4× while we tune.
+
+The state gets these new keys:
+    state["browser_matrix"]        list[dict]  per-profile results
+    state["browser_aggregate"]     dict        worst/best score + verbal
+    state["browser_matrix_html"]   str         pre-rendered V2 block
+"""
+from __future__ import annotations
+
+import logging
+import os
+from html import escape as h
+
+import httpx
+
+from ._constants import CONSENT_TESTER_URL
+from ._helpers import _update
+
+logger = logging.getLogger(__name__)
+
+
+async def run_phase_c2(state: dict) -> None:
+    if os.environ.get("BROWSER_MATRIX", "false").lower() not in (
+        "true", "1", "yes", "on",
+    ):
+        return
+
+    check_id = state["check_id"]
+    req = state["req"]
+    banner_url = ""
+    for d in req.documents:
+        if d.url:
+            from urllib.parse import urlparse
+            p = urlparse(d.url)
+            if p.scheme and p.netloc:
+                banner_url = f"{p.scheme}://{p.netloc}"
+                break
+    if not banner_url:
+        return
+
+    _update(check_id, "Browser-Matrix: Multi-Engine-Scan...", 83)
+    profiles_env = os.environ.get("BROWSER_MATRIX_PROFILES", "")
+    profiles = [p.strip() for p in profiles_env.split(",") if p.strip()] or None
+    try:
+        async with httpx.AsyncClient(timeout=600.0) as c:
+            r = await c.post(
+                f"{CONSENT_TESTER_URL}/scan-matrix",
+                json={
+                    "url": banner_url,
+                    "timeout_per_phase": 10,
+                    "categories": [],
+                    "browser_profiles": profiles,
+                },
+            )
+        if r.status_code != 200:
+            logger.warning("browser-matrix scan HTTP %d", r.status_code)
+            return
+        data = r.json()
+    except Exception as e:
+        logger.warning("browser-matrix scan failed: %s", e)
+        return
+
+    state["browser_matrix"] = data.get("browser_matrix") or []
+    state["browser_aggregate"] = data.get("aggregate") or {}
+    state["browser_matrix_html"] = _render(
+        state["browser_matrix"], state["browser_aggregate"],
+    )
+    logger.info(
+        "browser-matrix: %d profiles, worst=%s@%s%%, best=%s@%s%%",
+        len(state["browser_matrix"]),
+        state["browser_aggregate"].get("worst_profile"),
+        state["browser_aggregate"].get("worst_score"),
+        state["browser_aggregate"].get("best_profile"),
+        state["browser_aggregate"].get("best_score"),
+    )
+
+
+def _render(rows: list[dict], aggregate: dict) -> str:
+    if not rows:
+        return ""
+    table_rows = []
+    for r in rows:
+        sev = ("fail" if r["score"] < 60
+               else "warn" if r["score"] < 80 else "pass")
+        color = ("#dc2626" if sev == "fail"
+                 else "#f59e0b" if sev == "warn" else "#15803d")
+        dims = r.get("dimensions") or {}
+        dims_str = (
+            f"PC {int(dims.get('pre_consent',0)*100)}% · "
+            f"RR {int(dims.get('reject_respect',0)*100)}% · "
+            f"BD {int(dims.get('banner_design',0)*100)}%"
+        )
+        table_rows.append(
+            "<tr>"
+            f"<td style='padding:6px 10px;border-bottom:1px solid #e5e7eb;'>"
+            f"{h(r.get('label') or r.get('profile_id') or '')}</td>"
+            f"<td style='padding:6px 10px;border-bottom:1px solid #e5e7eb;"
+            f"color:{color};font-weight:600;'>{r['score']}%</td>"
+            f"<td style='padding:6px 10px;border-bottom:1px solid #e5e7eb;"
+            f"color:#475569;font-size:12px;'>{dims_str}</td>"
+            f"<td style='padding:6px 10px;border-bottom:1px solid #e5e7eb;"
+            f"color:#475569;font-size:13px;'>{h(r.get('verbal',''))}</td>"
+            "</tr>"
+        )
+    worst = aggregate.get("worst_score", 0)
+    sev_color = ("#dc2626" if worst < 60
+                 else "#f59e0b" if worst < 80 else "#15803d")
+    head = (
+        f"<p style='margin:0 0 8px;font-size:13px;color:#475569;'>"
+        f"<strong style='color:{sev_color};'>Worst-of {worst}%</strong> "
+        f"(Profil <code>{aggregate.get('worst_profile','')}</code>) — "
+        f"Best-of {aggregate.get('best_score','')}% "
+        f"(<code>{aggregate.get('best_profile','')}</code>). "
+        "Aggregierter Score nach Worst-of-Regel: ein HIGH-Verstoß "
+        "auf einem Browser kappt den Gesamt-Score.</p>"
+    )
+    return (
+        "<div style='margin:24px 0;padding:16px;border-left:4px solid "
+        f"{sev_color};background:#f8fafc;border-radius:4px;'>"
+        "<h2 style='margin:0 0 8px;color:#1e293b;font-size:16px;'>"
+        "🌐 Browser-Matrix · Consent-Robustness pro Engine"
+        "</h2>"
+        f"{head}"
+        "<table style='width:100%;border-collapse:collapse;font-size:13px;"
+        "margin-top:8px;background:#fff;'>"
+        "<thead><tr style='background:#f1f5f9;'>"
+        "<th style='text-align:left;padding:6px 10px;'>Browser-Profil</th>"
+        "<th style='text-align:left;padding:6px 10px;'>Score</th>"
+        "<th style='text-align:left;padding:6px 10px;'>Pre-Consent · "
+        "Reject-Respekt · Banner-Design</th>"
+        "<th style='text-align:left;padding:6px 10px;'>Bewertung</th>"
+        "</tr></thead>"
+        f"<tbody>{''.join(table_rows)}</tbody></table>"
+        "</div>"
+    )