breakpilot-compliance/backend-compliance/compliance/api/agent_doc_check_redundancy.py
Benjamin Admin 662327e8b4
CI / nodejs-build (push) Successful in 2m47s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
feat(compliance-check): MC classification + embedding + vendor redundancy + action recipes + Borlabs features
Major update based on the BMW test iterations (v1→v9):

Core compliance check
- Sonnet check_type classification (text/process/review) for all 1874 MCs
  in compliance.doc_check_controls (script + sidecar /data/mc_classification.db).
  rag_document_checker filters on check_type='text' for doc_check.
  Plus a fits_doc_type audit (v2) and a ui_only audit for DSA/e-commerce MCs
  that were filed under the wrong doc_type.
- scope_requires filter: biometric/ai_decision/child_targeting MCs are
  filtered per business_profile (e.g. FRT is skipped for BMW).
- Embedding match (BGE-M3) as phase 3 after the regex match:
  per-doc_type threshold overrides (impressum 0.50, dse/cookie 0.60) and
  short-field rescue (15-word chunks) for mandatory fields in the Impressum.
  Title + check_question serve as the embedding input for more context.
- Cookie-text routing: consent-tester returns cmp_cookie_text from the
  CMP reconstruction, and the backend prefers it over the DOM extraction
  when it is richer (BMW: 1824 vs. 600 words).
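The following is a minimal sketch of the phase-3 embedding match described above. It assumes an injected `embed` callable stands in for the real BGE-M3 client. The 0.50/0.60 thresholds, the title + check_question query and the 15-word chunk size come from the commit message. The function names, `DEFAULT_THRESHOLD`, and the choice to run the chunk rescue only after a failed whole-document match on an Impressum are assumptions.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

# Thresholds from the commit message; DEFAULT_THRESHOLD is a hypothetical fallback.
DOC_TYPE_THRESHOLDS = {"impressum": 0.50, "dse": 0.60, "cookie": 0.60}
DEFAULT_THRESHOLD = 0.65
CHUNK_WORDS = 15


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_chunks(text: str, size: int = CHUNK_WORDS) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embedding_match(
    title: str,
    check_question: str,
    doc_text: str,
    doc_type: str,
    embed: Callable[[str], Sequence[float]],
) -> tuple[bool, float]:
    """Phase-3 check of one control against one document: (matched, score)."""
    threshold = DOC_TYPE_THRESHOLDS.get(doc_type, DEFAULT_THRESHOLD)
    # Title + check_question give the embedding more context than the title alone.
    query = embed(f"{title}\n{check_question}")
    score = cosine(query, embed(doc_text))
    if score < threshold and doc_type == "impressum":
        # Short-field rescue: a mandatory field such as a register number is only
        # a few words long, so compare it against 15-word chunks where it is not
        # diluted by the rest of the page.
        chunk_scores = [cosine(query, embed(c)) for c in word_chunks(doc_text)]
        score = max([score, *chunk_scores])
    return score >= threshold, score
```

With a toy embedding, a register number buried in a long Impressum fails the whole-document comparison but is rescued by its chunk. The same text under `dse` is not rescued.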

Vendor redundancy + EU alternatives + cost savings
- vendor_redundancy.analyze(): functional categorization of the CMP vendors,
  detection of multiple providers per category, EU-alternative lookup
  (Matomo, IONOS, HERE, Friendly Captcha, Smart AdServer, ...).
- vendor_cost_estimator: tier inference from the cookie footprint (cookie count
  + premium-feature cookies + third-party share → starter/professional/
  enterprise/premier).
- Self-service advertising (Google/Meta/Pinterest/...) = 0 license costs
  (media spend only, reported separately). DSP platforms keep a narrow range.
- Tier-aware saving range: for enterprise/premier we use the upper
  40-100% band of the list prices instead of the full starter→premier range.
- Multi-function tools (Matomo Pro, SAP CX, IONOS Cloud, Userlike, Smart
  AdServer, HERE Maps, Vimeo Pro, LamaPoll): one tool replaces several
  categories at once.
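The redundancy detection and the tier-aware price band could look roughly like the sketch below. This is not `vendor_redundancy.analyze()` itself. The vendor-to-category mapping is passed in ready-made, and all names are hypothetical. "Upper 40-100% band" is interpreted here as the part of the starter..premier span that begins 40% of the way up.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical names for self-service ad platforms: no license fee, only media
# spend, which is reported separately.
SELF_SERVICE_ADS = {"Google Ads", "Meta Ads", "Pinterest Ads"}


def find_redundancies(vendor_categories: dict[str, str]) -> dict[str, list[str]]:
    """Group vendors by functional category and keep categories with >1 vendor."""
    by_category: dict[str, list[str]] = defaultdict(list)
    for vendor, category in vendor_categories.items():
        by_category[category].append(vendor)
    return {c: sorted(vs) for c, vs in by_category.items() if len(vs) > 1}


def list_price_band(vendor: str, low: int, high: int, tier: str) -> tuple[int, int]:
    """Yearly list-price band for one vendor, narrowed by the inferred tier.

    low/high span the published starter..premier prices. For enterprise and
    premier customers only the upper part of that span (40 % to 100 % of the
    way from low to high) is used.
    """
    if vendor in SELF_SERVICE_ADS:
        return (0, 0)
    if tier in ("enterprise", "premier"):
        return (round(low + 0.4 * (high - low)), high)
    return (low, high)
```

For example, a tool listed at 1.000 to 11.000 per year is estimated at 5.000 to 11.000 for an enterprise-tier site. This avoids a starter-level lower bound that such a site would never pay.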

Cookie knowledge DB + functional classification
- cookie_knowledge_db: 50 curated top cookies (Google/Meta/Adobe/MS/...)
  with vendor, exact_purpose, data_collected, IAB TCF IDs, reid_risk,
  schrems_ii_status, CJEU rulings, EU alternative.
- cookie_function_classifier: functional role per cookie (tracking_id,
  ad_pixel, session_id, ab_test, csrf, ...) + blocking_impact.
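Below is a rule-based sketch of a per-cookie functional classifier. The cookie names are real, commonly documented ones (`_ga`, `_fbp`, `_gaexp`, `PHPSESSID`, ...). The pattern table, the `classify_cookie` name and the `blocking_impact` values are hypothetical and are not taken from `cookie_function_classifier`.

```python
from __future__ import annotations

import re

# Hypothetical name patterns -> functional role (first match wins).
ROLE_PATTERNS: list[tuple[str, str]] = [
    (r"^_ga(_.*)?$", "tracking_id"),  # Google Analytics client / GA4 session ID
    (r"^_fbp$", "ad_pixel"),          # Meta Pixel browser identifier
    (r"^_gaexp$", "ab_test"),         # Google Optimize experiment membership
    (r"^(PHPSESSID|JSESSIONID|ASP\.NET_SessionId)$", "session_id"),
    (r"^(csrftoken|XSRF-TOKEN|_csrf)$", "csrf"),
]

# Hypothetical effect on the site if a cookie with this role is blocked.
BLOCKING_IMPACT = {
    "session_id": "breaks_site",
    "csrf": "breaks_site",
    "tracking_id": "none",
    "ad_pixel": "none",
    "ab_test": "minor",
}


def classify_cookie(name: str) -> dict[str, str]:
    """Return the functional role of a cookie and the impact of blocking it."""
    for pattern, role in ROLE_PATTERNS:
        if re.match(pattern, name):
            return {"role": role, "blocking_impact": BLOCKING_IMPACT[role]}
    return {"role": "unknown", "blocking_impact": "unknown"}
```

Separating role from vendor lets a report say *why* blocking is safe (`tracking_id`) or not (`session_id`), independent of who sets the cookie.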

Country inference from legal form
- cookie_link_validator: the country field is derived from the vendor name
  (A/S=DK, GmbH=DE, Inc=US, B.V.=NL, ...) plus a vendor lookup table.
  This reduces false-positive no_country flags for unambiguously EU vendors
  (Adform DK, Pinterest IE).
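Country inference from the legal form, with the lookup table consulted first, might be sketched as follows. The suffix mapping follows the commit message. The regexes, the two-entry `VENDOR_COUNTRY` table and the `infer_country` name are illustrative assumptions. A lookup table is still needed because suffixes such as "Ltd." are used in both the UK and Ireland.

```python
from __future__ import annotations

import re

# Hypothetical vendor lookup table, checked before the suffix rules.
VENDOR_COUNTRY = {"adform": "DK", "pinterest": "IE"}

# Legal-form suffix -> ISO 3166-1 alpha-2 code, per the commit message mapping.
LEGAL_FORM_COUNTRY = [
    (r"\bA/S$", "DK"),
    (r"\bGmbH$", "DE"),
    (r"\bInc\.?$", "US"),
    (r"\bB\.V\.?$", "NL"),
]


def infer_country(vendor_name: str) -> str | None:
    """Best-effort country code for a vendor name; None keeps the no_country flag."""
    name = vendor_name.strip()
    lowered = name.lower()
    for key, country in VENDOR_COUNTRY.items():
        # Whole-word match so e.g. "Badformat" does not hit "adform".
        if re.search(rf"\b{re.escape(key)}\b", lowered):
            return country
    for pattern, country in LEGAL_FORM_COUNTRY:
        if re.search(pattern, name):
            return country
    return None
```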

Action recipes + doc anchor locator
- finding_action_recipes: for each finding type (no_cookies_listed, no_country,
  broken_opt_out, "Auftragsverarbeiter erwaehnen" (mention processors),
  "Art. 22 Profiling", ...) a structured instruction with
  what/why/fix_text/where/example, ready to paste 1:1 into customer documents.
- doc_anchor_locator: embedding-based (BGE-M3 cosine); finds the matching
  paragraph in the customer's existing document for each finding.
  Per-run thread-local cache. Fallback: keyword match.
- Email rendering integrates recipe + anchor for each failed document check,
  plus a vendor-flag list with an expandable action list.
- Score explanation per vendor row (3/5 subtitle + tooltip).
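The anchor lookup uses cosine ranking, a keyword fallback and a per-thread cache. The sketch below illustrates this and assumes the embedding call is passed in and raises when the service is unavailable. `locate_anchor`, `min_score=0.5` and the 4-letter keyword cutoff are hypothetical choices, not values from `doc_anchor_locator`.

```python
from __future__ import annotations

import math
import re
import threading
from typing import Callable, Sequence

_local = threading.local()  # per-thread embedding cache


def reset_cache() -> None:
    """Start a fresh cache (call once per run)."""
    _local.cache = {}


def _cached_embed(text: str, embed: Callable[[str], Sequence[float]]) -> Sequence[float]:
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    if text not in cache:
        cache[text] = embed(text)
    return cache[text]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def _keywords(text: str) -> set[str]:
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) > 3}


def locate_anchor(
    finding: str,
    paragraphs: list[str],
    embed: Callable[[str], Sequence[float]],
    min_score: float = 0.5,
) -> int | None:
    """Index of the paragraph that best matches the finding, or None."""
    if not paragraphs:
        return None
    try:
        q = _cached_embed(finding, embed)
        scores = [_cosine(q, _cached_embed(p, embed)) for p in paragraphs]
        best = max(range(len(paragraphs)), key=scores.__getitem__)
        if scores[best] >= min_score:
            return best
    except Exception:
        pass  # embedding failed: fall back to keyword overlap
    kw = _keywords(finding)
    overlaps = [len(kw & _keywords(p)) for p in paragraphs]
    best = max(range(len(paragraphs)), key=overlaps.__getitem__)
    return best if overlaps[best] > 0 else None
```

The cache is keyed by text only, so it must be reset whenever the embedding model changes. That is why it is scoped to a single run.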

Migration pipeline (compliance check -> customer banner/documents)
- migration_to_banner.py: vendor list -> CookieBannerConfig with
  4 categories + review flags.
- migration_to_document.py: vendor list -> cookie policy + VVT register
  (record of processing activities) + privacy-policy pre-fills.
- agent_migration_routes: 3 preview endpoints (banner-preview,
  document-preview, summary). cmp_vendors are persisted in the
  check_payloads table of /data/compliance_audits.db.
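Here is a sketch of the vendor-list-to-banner mapping, assuming vendor dicts with `name`, `category` and an optional `third_party` key. The commit message only specifies "4 categories + review flags". The category names, the dataclass shape and both review rules below are assumptions, not the real `CookieBannerConfig`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed category names; the commit message only says "4 categories".
CATEGORIES = ("necessary", "functional", "statistics", "marketing")


@dataclass
class BannerConfigSketch:  # hypothetical stand-in for CookieBannerConfig
    categories: dict[str, list[str]] = field(
        default_factory=lambda: {c: [] for c in CATEGORIES}
    )
    review_flags: list[str] = field(default_factory=list)


def vendors_to_banner(vendors: list[dict]) -> BannerConfigSketch:
    cfg = BannerConfigSketch()
    for v in vendors:
        name = v.get("name") or "<unnamed>"
        cat = v.get("category")
        if cat not in CATEGORIES:
            # Unknown purpose: put it in a consent-requiring bucket and ask a
            # human to confirm the assignment.
            cfg.categories["marketing"].append(name)
            cfg.review_flags.append(f"{name}: category {cat!r} unknown, defaulted to marketing")
            continue
        if cat == "necessary" and v.get("third_party"):
            # Third parties are rarely strictly necessary; flag for review.
            cfg.review_flags.append(f"{name}: third-party vendor marked as necessary")
        cfg.categories[cat].append(name)
    return cfg
```

Defaulting unknowns to a consent-requiring category keeps the generated banner on the cautious side until a reviewer resolves the flag.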

Borlabs-parity cookie banner features
- Consent history in the banner: window.bpShowConsentHistory() + localStorage.
- Content blocker: cookie-banner-content-blocker.ts; YouTube/Maps/video
  placeholders until consent is given.
- Google Consent Mode v2 extended: wait_for_update + region=EEA/CH/GB.
- Consent-log export (CSV/JSON) via einwilligungen_export_routes.
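One possible shape for the CSV/JSON consent-log export, using only the standard library, is shown below. The record fields and the `export_consents` name are hypothetical; `einwilligungen_export_routes` may serialize differently.

```python
from __future__ import annotations

import csv
import io
import json

# Hypothetical columns for one consent record.
FIELDS = ["consent_id", "timestamp", "categories", "banner_version"]


def export_consents(records: list[dict], fmt: str) -> str:
    """Serialize consent records as CSV or JSON text."""
    if fmt == "json":
        return json.dumps(records, ensure_ascii=False, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for r in records:
            row = dict(r)
            # Flatten list-valued categories into one cell, e.g. "necessary;statistics".
            if isinstance(row.get("categories"), list):
                row["categories"] = ";".join(row["categories"])
            writer.writerow(row)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```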

Bug fixes
- canonical_control_routes: _jsonish helper for string-typed jsonb;
  similar-controls endpoint with a _has_embedding_col() cache (no more 500s).
- Control library frontend: defensive .map coercer in 2 detail views.
- Embedding-service batching (batches of 32 instead of 165 in a single call).
- KeyError 'control_id' in MC result aggregation (defensive .get).
- Master-controls click-through from /sdk/master-controls to
  /sdk/control-library?control=<id> with URL-param auto-open.
- Dockerfile: /data pre-chowned to appuser (write access for the audit DB).
- Cookie-text routing bug (cmp_reconstructed now takes precedence over DOM extraction).
- doc_type-aware MC filter (instead of all text MCs).
- Master-contract dedup (60 BMW-internal entries = 1 Adobe contract).
- The A3 v2 audit reclassified 24 UI-language MCs as 'process'.
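Two of the bug fixes lend themselves to a short sketch: sending embedding requests in batches of 32, and decoding jsonb values that arrive as strings. Both functions are hypothetical stand-ins, since the real `_jsonish` helper and the embedding client are not shown in this file. Python 3.12 also ships `itertools.batched`; the hand-written generator keeps the sketch compatible with older versions.

```python
from __future__ import annotations

import json
from typing import Any, Callable, Iterator, Sequence

BATCH_SIZE = 32  # from the commit message


def batched(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[list[float]]],
) -> list[list[float]]:
    """Embed texts batch by batch instead of in one large request; order is kept."""
    out: list[list[float]] = []
    for chunk in batched(texts):
        out.extend(embed_batch(chunk))
    return out


def jsonish(value: Any) -> Any:
    """Decode a jsonb value the driver returned as a JSON string; pass others through."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return value
    return value
```

With 165 texts, this issues five requests of 32 and one of 5 rather than a single 165-item call.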

Tests
- test_migration_mappers.py (9 tests)
- test_migration_endpoints.py (4 tests)

Scripts (one-shot)
- classify_mc_check_type.py (v1) + _v2 (PK=control_id,doc_type)
- audit_mc_doctype_fit.py (v1 fits) + _v2 (ui_only + scope_requires)

BMW run results, v1 (broken) -> v9 (all fixes):
  DSE        7.5% -> 81-83%
  Impressum  4%   -> 100% (all 6 real MCs satisfied)
  Cookie     0%   -> 79-83% (CMP text routing + embedding)
  Plus: 10 consolidation categories, estimated savings of 200k-3M per year
  Plus: action recipes + doc anchors for every fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:30:08 +02:00

142 lines
6.7 KiB
Python
"""
Email renderer for the vendor-redundancy + EU-alternatives + cost/savings block.
It is embedded in the email body below the VVT.
"""
from __future__ import annotations


def _fmt_eur(low: int, high: int) -> str:
    """Format a yearly EUR range with German thousands separators."""
    if not low and not high:
        return "im Listpreis bundled"
    if low == high:
        return f"~{low:,}".replace(",", ".")
    # Range "low–high", e.g. 1.200–3.400
    return f"{low:,}–{high:,}".replace(",", ".")


def build_redundancy_html(report: dict | None) -> str:
    """Render the redundancy / EU-alternatives / savings report as inline-styled HTML."""
    if not report:
        return ""
    s = report.get("summary") or {}
    redundancies = report.get("redundancies") or []
    eu_alts = report.get("eu_alternatives") or []
    multi = report.get("multi_function_tools") or []
    cur = s.get("estimated_current_year_eur") or [0, 0]
    sav = s.get("estimated_saving_year_eur") or [0, 0]
    pct = s.get("estimated_saving_pct") or "n/a"

    # Box header, summary counts, cost/saving estimate and caveats.
    parts = [
        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
        'max-width:700px;margin:0 auto 16px;padding:14px 18px;'
        'background:#fef3c7;border:1px solid #fcd34d;border-radius:8px">',
        '<h3 style="margin:0 0 6px;font-size:14px;color:#92400e">'
        'Optimierungspotenzial: Redundanzen + EU-Alternativen</h3>',
        f'<p style="margin:0 0 10px;font-size:11px;color:#78350f">'
        f'<strong>{s.get("redundancy_count", 0)}</strong> Kategorien mit '
        f'mehreren Anbietern · <strong>{s.get("consolidation_potential", 0)}</strong> '
        f'Anbieter konsolidierbar · '
        f'<strong>{s.get("eu_alternative_count", 0)}</strong> EU-Alternativen verfuegbar</p>',
        '<div style="background:#fff;border:1px solid #fcd34d;border-radius:6px;'
        'padding:10px 12px;margin-bottom:10px">',
        '<div style="font-size:10px;color:#94a3b8;margin-bottom:6px;text-transform:uppercase;letter-spacing:0.5px">'
        'Diese Schaetzung umfasst NUR die als redundant erkannten Tools — '
        'nicht den Gesamt-Stack der Website</div>',
        f'<div style="font-size:11px;color:#78350f">'
        f'Listpreis-Schaetzung der <strong>redundanten</strong> Tools '
        f'(Mehrfach-Anbieter in derselben Funktions-Kategorie):'
        f' <strong>{_fmt_eur(*cur)}/Jahr</strong></div>',
        f'<div style="font-size:11px;color:#16a34a;margin-top:4px">'
        f'Sparpotenzial durch Konsolidierung auf je 1 EU-Tool pro Kategorie:'
        f' <strong>{_fmt_eur(*sav)}/Jahr</strong> ({pct})</div>',
        '<div style="font-size:10px;color:#94a3b8;margin-top:8px;font-style:italic">'
        '<strong>Wichtige Einschraenkungen:</strong><br/>'
        '• Konzern-Konditionen liegen ueblicherweise 30–50% unter Listpreis — '
        'realistisches Saving entsprechend €X·0,5 bis €X·0,7.<br/>'
        '• Eintraege "<em>Eigene Marke — Tool</em>" (z.B. "BMW AG — Adobe Analytics") '
        'gehoeren oft zu einem einzigen Master-Vertrag, nicht zu mehreren Lizenzen.<br/>'
        '• Media-Spend (Google Ads, Meta Ads) ist NICHT enthalten — nur Tooling-Lizenzen.<br/>'
        '• Quelle: Gartner/Forrester 2025 + oeffentliche Listpreise.'
        '</div></div>',
    ]

    # Table with one row per redundant category (at most 12), each optionally
    # followed by a consolidation-hint row and a caveats row.
    if redundancies:
        parts.append(
            '<table style="width:100%;border-collapse:collapse;font-size:11px;'
            'margin-bottom:10px">'
            '<thead><tr style="background:#fde68a;color:#78350f;text-align:left">'
            '<th style="padding:6px 8px">Kategorie</th>'
            '<th style="padding:6px 8px">#</th>'
            '<th style="padding:6px 8px">Anbieter</th>'
            '<th style="padding:6px 8px">EU-Empfehlung</th>'
            '<th style="padding:6px 8px;text-align:right">Saving / Jahr</th>'
            '</tr></thead><tbody>'
        )
        for r in redundancies[:12]:
            # Show at most 6 vendor names and summarize the rest.
            vendors_str = ", ".join(r.get("vendors", [])[:6])
            if len(r.get("vendors", [])) > 6:
                vendors_str += f" (+{len(r['vendors']) - 6} weitere)"
            sav_r = r.get("estimated_saving_year_eur") or [0, 0]
            parts.append(
                f'<tr style="border-top:1px solid #fde68a;vertical-align:top">'
                f'<td style="padding:5px 8px;color:#78350f;font-weight:600">{r["category_label"]}</td>'
                f'<td style="padding:5px 8px;text-align:center">{r["count"]}</td>'
                f'<td style="padding:5px 8px;color:#1e293b;font-size:10px">{vendors_str}</td>'
                f'<td style="padding:5px 8px;color:#16a34a;font-size:10px">{r.get("suggested_eu_tool") or ""}</td>'
                f'<td style="padding:5px 8px;text-align:right;color:#16a34a;font-weight:600">'
                f'{_fmt_eur(*sav_r)}</td></tr>'
            )
            hint = r.get("consolidation_hint")
            if hint:
                parts.append(
                    f'<tr><td colspan="5" style="padding:0 8px 8px;color:#94a3b8;font-size:10px;font-style:italic">'
                    f'Hinweis: {hint}</td></tr>'
                )
            caveats = r.get("caveats") or []
            if caveats:
                parts.append(
                    f'<tr><td colspan="5" style="padding:0 8px 8px;color:#94a3b8;font-size:10px">'
                    f'<strong>Moegliche Gruende fuer Mehrfach-Einsatz:</strong> '
                    + "; ".join(caveats) + '</td></tr>'
                )
        parts.append('</tbody></table>')

    # Multi-function tools (at most 4) that could replace several categories at once.
    if multi:
        parts.append(
            '<div style="margin-top:8px"><strong style="font-size:11px;color:#78350f">'
            'Multi-Funktions-Tools (1 Tool ersetzt mehrere Kategorien):</strong>'
            '<ul style="margin:6px 0 0 18px;padding:0;font-size:11px;color:#78350f">'
        )
        for t in multi[:4]:
            cats = ", ".join(t.get("replaces_categories", []))
            parts.append(
                f'<li style="margin-bottom:3px"><strong>{t["name"]}</strong>'
                f' ({t["country"]}) — ersetzt <em>{cats}</em>'
                f' ({t.get("potential_replacements", 0)} Anbieter heute)</li>'
            )
        parts.append('</ul></div>')

    # Collapsible list of EU alternatives per vendor (at most 20, first suggestion only).
    if eu_alts:
        parts.append(
            '<details style="margin-top:8px"><summary style="font-size:11px;color:#78350f;'
            'cursor:pointer">EU-Alternativen pro Anbieter (Details)</summary>'
            '<ul style="margin:6px 0 0 18px;padding:0;font-size:10px;color:#475569">'
        )
        for e in eu_alts[:20]:
            first_alt = (e.get("alternatives") or [{}])[0]
            parts.append(
                f'<li style="margin-bottom:3px"><strong>{e["current_vendor"]}</strong>'
                f' → {first_alt.get("name", "")} ({first_alt.get("country", "")})'
                f' <span style="color:#94a3b8">— {first_alt.get("notes", "")}</span></li>'
            )
        parts.append('</ul></details>')

    parts.append('</div>')
    return "".join(parts)
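The caveat in the rendered email scales a list-price saving X by 0.5 to 0.7, because group (Konzern) terms are usually 30 to 50% below list price. Below is a minimal sketch of that adjustment together with a range formatter that uses German thousands separators. Both helper names are hypothetical and not part of this module.

```python
from __future__ import annotations


def fmt_eur_range(low: int, high: int) -> str:
    """German thousands separators, e.g. (1200, 3400) -> '1.200–3.400'."""
    if low == high:
        return f"~{low:,}".replace(",", ".")
    return f"{low:,}–{high:,}".replace(",", ".")


def realistic_saving(list_low: int, list_high: int) -> tuple[int, int]:
    """Scale a list-price saving range for 30-50 % group discounts (x0.5 .. x0.7)."""
    return round(list_low * 0.5), round(list_high * 0.7)
```

For example, a list-price saving of 200.000–3.000.000 per year would shrink to 100.000–2.100.000.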