Files
breakpilot-compliance/backend-compliance/compliance/services/mail_render_v2/_compose.py
T
Benjamin Admin c908fcd5eb feat(b19): Cookie-Coherence — 3-Layer-Lookup + Vendor-Karten + CSV
Adressiert das BMW-Beispiel (740 Cookies, Salesforce als "essential"
mit 1-Jahres-Lifetime, Pseudo-Zwecke wie "Siehe dazugehörige
Datenverarbeitung"). User-Konzept "Regulation als Code".

Step 1 — cookie_library_lookup.py (3 Layer):
  1. Override = cookie_knowledge_db.py + extended (74) für
     Schrems-II / EUGH / EU-Alternative — BreakPilot-juristische-IP.
  2. Truth-Base = compliance.cookie_library (2287 aus Open Cookie
     Database, CC0). actual_category als Wahrheit.
  3. Auto-Learning = cookie_behavior_audits — Cross-Site-Konsens
     wenn ≥3 Sites denselben Cookie melden.

  Match: exact > prefix (mit Separator-Check) > wildcard. Kurze
  Library-Namen ("c", "ID") brauchen exact-match — verhindert
  False-Positive auf "completely_unknown". Trailing-Underscore
  in OCD ("guest_uuid_essential_") wird als implicit-wildcard
  interpretiert.

Step 2 — cookie_coherence_check.py (B19, 6 Finding-Typen):
  - MARKETING_AS_ESSENTIAL (HIGH): KB sagt actual=marketing, Site
    deklariert essential/erforderlich → Einwilligung wird umgangen
  - LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + >90d
  - PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung"
    / <4 Wörter (suppressed wenn Vendor-Purpose substantial ist)
  - MISSING_COUNTRY (LOW): vendor_country leer trotz KB-Hit
  - UNKNOWN_VENDOR (LOW): nicht in KB → Auto-Learning-Kandidat
  - DUPLICATE_VENDOR (MED): selber Vendor in N Kategorien =
    Stack-Aufspaltung um Marketing unter "essential" zu schmuggeln

  Jedes Finding mit recommended_action ("Cookie X aus 'erforderlich'
  raus und in 'Marketing' setzen").

Step 3 — cookie_observation_logger.py:
  Loggt nach jedem Audit alle (cookie, site, declared_purpose) in
  compliance.cookie_behavior_audits → Basis für Cross-Site-Konsens
  in Layer 3.

Step 4 — cookie_csv_exporter.py:
  cookies-full-{check_id}.csv mit 21 Spalten (Name, Vendor decl/KB,
  Cat decl/KB, Lifetime decl/KB, Country, Opt-Out, 8x FIND_* flags,
  recommended_action). UTF-8 BOM für Excel.
  ZIP-Attachment: erweitert audit_walk_zip_builder um extra_files=
  parameter; phase_e ruft mit cookies-full-...csv auf.

Step 5 — mail_render_v2/_vendor_cards.py:
  Statt 740 Cookie-Rows: Aggregation pro Vendor mit Cookie-Count +
  Issue-Count + 1-2 Beispiel-Cookies + Issue-Type-Tags. Top 30
  Vendoren in der Mail, Rest nur in CSV. Sortiert nach Issue-Score.

Step 6 — render_info_box_rechtsrahmen():
  Generic Header-Info-Box mit Art. 13 DSGVO + § 25 TDDDG + Art. 5
  + § 5 UWG + § 30/130 OWiG. Immer angezeigt, kein explicit-
  finding-mapping (User-mündigkeit).

Orchestrator + _compose: run_b19 + render_vendor_cards +
  render_info_box_rechtsrahmen ins V2-Layout.

Tests: 28/28 grün (15 lookup + 13 coherence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 23:48:04 +02:00

92 lines
3.2 KiB
Python

"""Mail-V2 compose — single entrypoint that returns the full HTML.
Call `compose_v2(state)` from the email-dispatch phase when
`MAIL_RENDER_V2=true`. Default remains the legacy compose so we can
A/B in Mailpit.
"""
from __future__ import annotations
import os
from ._blocks import (
render_attachments,
render_caveats,
render_header,
render_per_doc,
render_per_theme,
render_sofortmassnahmen,
render_toc,
)
from ._blocks_findings import (
render_critical,
render_internal_reminders,
render_manual_review,
)
from ._vendor_cards import (
render_info_box_rechtsrahmen,
render_vendor_cards,
)
from ._legacy_wrappers import render_all_legacy
from ._style import page_close, page_open
def compose_v2(state: dict) -> str:
"""Build the full audit-mail HTML in the V2 layout."""
site = state.get("site_name") or ""
parts = [
page_open(site),
render_header(state),
render_info_box_rechtsrahmen(),
render_toc(state),
render_vendor_cards(
state.get("cmp_vendors") or [],
state.get("cookie_coherence_findings") or [],
),
render_critical(state),
render_manual_review(state),
render_internal_reminders(state),
render_sofortmassnahmen(state),
render_per_doc(state),
render_per_theme(state),
# B4 — Cross-Doc Vendor-Consistency (Elli Vertex↔Iadvize pattern)
state.get("vendor_consistency_html", ""),
# B5 — AI-Act Art. 50 Transparenzpflicht
state.get("ai_act_html", ""),
# B6/B7/B8/B9/B10 — DPO + Staleness + CMP + MultiEntity + Transfer
state.get("extra_findings_html", ""),
# B12 Chatbot-Cookie-Klassifikation
state.get("chatbot_cookie_html", ""),
# B13 Widerrufsbelehrung-Reachability (B2C-Pflicht)
state.get("widerruf_reach_html", ""),
# B14 Widersprüchliche Speicherdauer im selben Doc
state.get("retention_conflict_html", ""),
# B15 AI-Act Rechtsgrundlage (LLM-Vendor auf lit. f)
state.get("ai_legal_basis_html", ""),
# B16 Footer-Label-vs-URL-Slug-Drift (SEO / Bookmarks)
state.get("url_slug_drift_html", ""),
# B17 Audit-Walk-Video (Beweis-Aufzeichnung)
state.get("audit_walk_html", ""),
# B18 Impressum-Specialist-Agent (Pattern + LLM)
state.get("impressum_agent_html", ""),
# B19 Cookie-Coherence-Check (Salesforce-as-essential etc.)
state.get("cookie_coherence_html", ""),
# Browser-Matrix (Stage 1.c)
state.get("browser_matrix_html", ""),
# All legacy build_*_html() wrapped in V2 sections — preserves
# every information block from the old renderer (Exec Summary,
# Banner-Screenshot, VVT, Redundancy, Solutions, Diff, etc.)
render_all_legacy(state),
render_caveats(state),
render_attachments(state),
page_close(state.get("check_id", ""),
os.environ.get("BUILD_SHA", "unknown")),
]
return "".join(p for p in parts if p)
def is_v2_enabled() -> bool:
return os.environ.get("MAIL_RENDER_V2", "false").lower() in (
"true", "1", "yes", "on",
)