breakpilot-compliance/backend-compliance/compliance/services/vendor_redundancy.py
Benjamin Admin 662327e8b4
feat(compliance-check): MC-Classification + Embedding + Vendor Redundancy + Action-Recipes + Borlabs-Features
Major update based on the BMW test iterations (v1→v9):

Core Compliance-Check
- Sonnet check_type classification: text/process/review for all 1874 MCs
  in compliance.doc_check_controls (script + sidecar /data/mc_classification.db).
  rag_document_checker filters on check_type='text' for doc_check.
  Plus a fits_doc_type audit (v2) + ui_only audit for DSA/e-commerce MCs
  filed under the wrong doc_type.
- scope_requires filter: biometric/ai_decision/child_targeting MCs are
  filtered by business_profile (FRT skipped for BMW etc.).
- Embedding match (BGE-M3) as phase 3 after the regex match:
  per-doc_type threshold override (impressum 0.50, dse/cookie 0.60),
  short-field rescue (15-word chunks) for mandatory fields in the Impressum.
  Title + check_question as embedding input for more context.
- Cookie-text routing: consent-tester returns cmp_cookie_text from the
  CMP reconstruct; the backend prefers it over the DOM extraction
  when it is richer (BMW: 1824 vs 600 words).
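
The per-doc_type threshold override and the cookie-text routing described above can be sketched roughly as follows. This is a minimal illustration and not the code in rag_document_checker: the function names, the DEFAULT_THRESHOLD value, and the use of a plain word count as the measure of "richer" are assumptions. Only the override values 0.50 and 0.60 come from the commit message.

```python
from typing import Optional

# Assumed fallback; the commit message does not state the default threshold.
DEFAULT_THRESHOLD = 0.65

# Per-doc_type overrides as listed in the commit message.
THRESHOLD_OVERRIDES = {"impressum": 0.50, "dse": 0.60, "cookie": 0.60}


def embedding_threshold(doc_type: str) -> float:
    """Return the similarity threshold to apply for a given doc_type."""
    return THRESHOLD_OVERRIDES.get(doc_type, DEFAULT_THRESHOLD)


def is_embedding_match(score: float, doc_type: str) -> bool:
    """A chunk counts as a match when its score reaches the threshold."""
    return score >= embedding_threshold(doc_type)


def pick_cookie_text(cmp_cookie_text: Optional[str], dom_text: Optional[str]) -> str:
    """Prefer the CMP-reconstructed text when it has more words than the DOM text."""
    cmp_text = cmp_cookie_text or ""
    dom = dom_text or ""
    if len(cmp_text.split()) > len(dom.split()):
        return cmp_text
    return dom
```

With these assumptions, a score of 0.55 would pass for an Impressum but not for a DSE or cookie document.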

Vendor Redundancy + EU Alternatives + Cost Saving
- vendor_redundancy.analyze(): functional categorisation of the CMP vendors,
  detection of multiple vendors per category, EU-alternative lookup
  (Matomo, IONOS, HERE, Friendly Captcha, Smart AdServer, ...).
- vendor_cost_estimator: tier inference from the cookie footprint (cookie count
  + premium-feature cookies + third-party share → starter/professional/
  enterprise/premier).
- Self-service advertising (Google/Meta/Pinterest/...) = 0 licence cost
  (media spend only, tracked separately). DSP platforms keep a narrow range.
- Tier-aware saving range: for enterprise/premier we use the
  upper 40-100% band of the list prices, not starter→premier.
- Multi-function tools (Matomo Pro, SAP CX, IONOS Cloud, Userlike, Smart
  AdServer, HERE Maps, Vimeo Pro, LamaPoll): one tool replaces several
  categories at once.
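
As a worked example of the tier-aware range, the sketch below applies the tier fractions to a single list-price range. The fractions and the Adobe Analytics range (120k to 600k EUR/year) are copied from vendor_redundancy.py. The function name tier_adjusted_range and the TIER_BOUNDS dict are hypothetical and only illustrate the arithmetic. Note that the file splits the band into 0.40-0.85 for enterprise and 0.70-1.00 for premier.

```python
# Fractions per company tier, as they appear in _company_tier_bounds().
TIER_BOUNDS = {
    "premier": (0.70, 1.00),
    "enterprise": (0.40, 0.85),
    "professional": (0.20, 0.60),
}
STARTER_BOUNDS = (0.05, 0.40)  # fallback for 'sme' / starter


def tier_adjusted_range(low: int, high: int, company_tier: str) -> tuple:
    """Narrow a (low, high) list-price range to the band used for a tier."""
    low_frac, high_frac = TIER_BOUNDS.get(company_tier.lower(), STARTER_BOUNDS)
    span = high - low
    return int(low + span * low_frac), int(low + span * high_frac)


if __name__ == "__main__":
    # Adobe Analytics list-price range from _COST_LOOKUP.
    print(tier_adjusted_range(120_000, 600_000, "enterprise"))
    print(tier_adjusted_range(120_000, 600_000, "starter"))
```

For an enterprise customer the band starts at 40% of the span above the low list price, so the starter end of the range no longer pulls the lower bound down.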

Cookie Knowledge DB + Functional Classification
- cookie_knowledge_db: 50 curated top cookies (Google/Meta/Adobe/MS/...)
  with vendor, exact_purpose, data_collected, IAB-TCF-IDs, reid_risk,
  schrems_ii_status, CJEU rulings, EU alternative.
- cookie_function_classifier: functional role per cookie (tracking_id,
  ad_pixel, session_id, ab_test, csrf, ...) + blocking_impact.

Country Inference from Legal Form
- cookie_link_validator: the country field is derived from the vendor name
  (A/S=DK, GmbH=DE, Inc=US, B.V.=NL, ...) plus a vendor lookup table.
  Reduces false-positive no_country flags for clearly EU-based vendors
  (Adform DK, Pinterest IE).
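
The legal-form inference above might look roughly like the following sketch. It is not the code in cookie_link_validator: the function name infer_country, the regex patterns, the lookup-before-suffix order, and the single lookup entry are assumptions. Only the suffix-to-country pairs and the Pinterest/IE example come from the commit message.

```python
import re
from typing import Optional

# Legal-form suffix → country, as listed in the commit message.
_LEGAL_FORM_COUNTRY = [
    (re.compile(r"\bA/S$", re.IGNORECASE), "DK"),
    (re.compile(r"\bGmbH$", re.IGNORECASE), "DE"),
    (re.compile(r"\bInc\.?$", re.IGNORECASE), "US"),
    (re.compile(r"\bB\.?V\.?$", re.IGNORECASE), "NL"),
]

# Hypothetical vendor lookup table for names without a telling suffix.
_VENDOR_COUNTRY = {"pinterest": "IE"}


def infer_country(vendor_name: Optional[str]) -> Optional[str]:
    """Guess a vendor's country from a lookup table, then from its legal form."""
    name = (vendor_name or "").strip()
    lowered = name.lower()
    for key, country in _VENDOR_COUNTRY.items():
        if key in lowered:
            return country
    for pattern, country in _LEGAL_FORM_COUNTRY:
        if pattern.search(name):
            return country
    return None  # caller keeps the no_country flag
```

Returning None instead of a default leaves the decision to raise a no_country flag with the caller.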

Action-Recipes + Doc-Anchor-Locator
- finding_action_recipes: for each finding type (no_cookies_listed, no_country,
  broken_opt_out, "Auftragsverarbeiter erwaehnen", "Art. 22 Profiling",
  ...) a structured instruction with what/why/fix_text/where/example.
  Meant to be pasted 1:1 into customer documents.
- doc_anchor_locator: embedding-based (BGE-M3 cosine); finds the matching
  paragraph in the existing customer document for each finding.
  Per-run thread-local cache. Fallback: keyword match.
- Email rendering integrates recipe + anchor for each failed doc check,
  plus a vendor flag list with a collapsible action list.
- Score explanation per vendor row (3/5 subtitle + tooltip).
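
The anchor lookup with keyword fallback can be illustrated with the sketch below. It assumes the paragraph and finding vectors have already been computed (for example by BGE-M3). The function names, the min_score default, and the return shape are hypothetical, and the thread-local cache is left out.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def locate_anchor(finding_vec, finding_keywords, paragraphs, min_score=0.5):
    """Return (paragraph_index, method) for the best-matching paragraph.

    `paragraphs` is a list of (text, vector) pairs. The embedding match
    is tried first; the keyword match is only used as a fallback.
    """
    best_idx: Optional[int] = None
    best_score = 0.0
    for idx, (_text, vec) in enumerate(paragraphs):
        score = cosine(finding_vec, vec)
        if score > best_score:
            best_idx, best_score = idx, score
    if best_idx is not None and best_score >= min_score:
        return best_idx, "embedding"
    for idx, (text, _vec) in enumerate(paragraphs):
        lowered = text.lower()
        if any(kw.lower() in lowered for kw in finding_keywords):
            return idx, "keyword"
    return None, "none"
```

Reporting which method produced the anchor lets the email rendering treat keyword-only anchors as less reliable, if that distinction is wanted.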

Migration Pipeline (Compliance-Check -> Customer Banner/Documents)
- migration_to_banner.py: vendor list -> CookieBannerConfig with
  4 categories + review flags.
- migration_to_document.py: vendor list -> cookie policy + VVT register
  (record of processing activities) + privacy-policy pre-fills.
- agent_migration_routes: 3 preview endpoints (banner-preview,
  document-preview, summary). cmp_vendors are persisted in the
  check_payloads table of /data/compliance_audits.db.

Borlabs-Parity Cookie-Banner Features
- Consent history in the banner: window.bpShowConsentHistory() + localStorage.
- Content blocker: cookie-banner-content-blocker.ts, YouTube/Maps/video
  placeholders until consent is given.
- Google Consent Mode v2 extended: wait_for_update + region=EEA/CH/GB.
- Consent-log export (CSV/JSON) via einwilligungen_export_routes.

Bug Fixes
- canonical_control_routes: _jsonish helper for string-typed jsonb,
  similar-controls endpoint with _has_embedding_col() cache (no more 500s).
- Control-Library frontend: defensive .map coercer in 2 detail views.
- Embedding-service batching (batches of 32 instead of 165 in a single call).
- KeyError 'control_id' in MC result aggregation (defensive .get).
- Master-controls click-through from /sdk/master-controls to
  /sdk/control-library?control=<id> with URL-param auto-open.
- Dockerfile: /data pre-chowned to appuser (audit DB write permission).
- Cookie-text routing bug (cmp_reconstructed > DOM-extraction).
- doc_type-aware MC filter (instead of all text MCs).
- Master-contract dedup (60 BMW-internal entries = 1 Adobe contract).
- The A3-v2 audit reclassified 24 UI-language MCs as 'process'.
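
The embedding-service batching fix listed above amounts to chunking the input before each call. A minimal sketch, assuming an embed_fn callable that takes a list of texts and returns one vector per text (the name and signature are hypothetical, not the service's actual API):

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed texts in fixed-size batches and return vectors in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed_fn(batch))
    return vectors
```

With 165 inputs and a batch size of 32 this issues six calls: five full batches and one batch of five.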

Tests
- test_migration_mappers.py (9 Tests)
- test_migration_endpoints.py (4 Tests)

Scripts (one-shot)
- classify_mc_check_type.py (v1) + _v2 (PK=control_id,doc_type)
- audit_mc_doctype_fit.py (v1 fits) + _v2 (ui_only + scope_requires)

BMW run summary v1 (broken) -> v9 (all fixes):
  DSE       7.5% -> 81-83%
  Impressum 4%   -> 100% (all 6 real MCs satisfied)
  Cookie    0%   -> 79-83% (CMP text routing + embedding)
  Plus: 10 consolidation categories, estimated saving 200k-3M / year
  Plus: action recipes + doc anchors for every fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:30:08 +02:00

728 lines
33 KiB
Python

"""
Vendor Redundancy + EU-Alternatives Analyzer.
Input: list of vendors from the CMP capture (e.g. BMW: 90 vendors).
Output: four structured lists that are rendered in the email and the
migration modal:
  1. functional_categories : vendor → functional class (analytics,
                             advertising, cdn, captcha, chat, …);
                             returned under the key `by_category`
  2. redundancies          : categories with ≥2 vendors doing the same job
                             → consolidation potential
  3. eu_alternatives       : for each US vendor, a matching EU replacement
                             from a curated lookup table (Matomo instead of
                             Adobe Analytics, IONOS instead of AWS, etc.)
  4. multi_function_tools  : EU tools that cover several categories
                             (e.g. SAP CX = analytics + CRM + marketing)
"""
from __future__ import annotations
import logging
import re
from collections import defaultdict
from typing import Iterable
logger = logging.getLogger(__name__)
# ─── Categorisation ───────────────────────────────────────────────────
# Substring match (lowercase) → category. First match wins.
_CATEGORY_RULES: list[tuple[str, str]] = [
# Web Analytics / Behavior
("adobe analytics", "web_analytics"),
("adobe target", "personalisation"),
("adobe campaign", "marketing_automation"),
("adobe staging library", "tag_management"),
("adobelaunch", "tag_management"),
("google analytics", "web_analytics"),
("matomo", "web_analytics"),
("hotjar", "web_analytics"),
("content square", "web_analytics"),
("contentsquare", "web_analytics"),
("dynatrace", "monitoring"),
("performance analytics", "web_analytics"),
("form analytics", "web_analytics"),
("form campaign analytics","web_analytics"),
("psyma", "survey"),
("qualtrics", "survey"),
# Tag Management
("google tag manager", "tag_management"),
("gtm", "tag_management"),
# Advertising / Retargeting
("google ads", "advertising"),
("google advertising", "advertising"),
("doubleclick", "advertising"),
("googleads", "advertising"),
("meta pixel", "advertising"),
("meta platforms", "advertising"),
("facebook", "advertising"),
("adform", "advertising"),
("criteo", "advertising"),
("outbrain", "advertising"),
("taboola", "advertising"),
("teads", "advertising"),
("pinterest", "advertising"),
("linkedin insight", "advertising"),
("youtube performance", "advertising"),
("youtube player", "external_media"),
("amazon advertising", "advertising"),
("instagram", "advertising"),
("dotaki", "advertising"),
# Video / Embeds
("youtube", "external_media"),
("vimeo", "external_media"),
("jw player", "external_media"),
("jw video", "external_media"),
("jwplayer", "external_media"),
("jwconnatix", "external_media"),
# Maps / Geo
("google maps", "maps"),
("google geolocation", "maps"),
("geolocation", "maps"),
# CDN / Infrastructure
("akamai", "cdn"),
("amazon web services", "cloud_infra"),
("aws", "cloud_infra"),
("baqend", "cdn"),
("speedkit", "cdn"),
("speedcurve", "monitoring"),
("salesforce", "crm"),
# Chat / Support
("genesys", "chat"),
("ckm", "chat"),
("chat widget", "chat"),
# Captcha / Bot-Protection
("hcaptcha", "captcha"),
("recaptcha", "captcha"),
# Sales / Lead-Tracking
("salesviewer", "lead_tracking"),
# Marketing/Sales overlay
("nayoki", "social_aggregator"),
# Site-internal functions
("infrastructure", "site_infra"),
("infrastrukturbereit", "site_infra"),
("javaserverpages", "site_infra"),
("single sign-on", "auth"),
("mybmw account", "auth"),
("sso", "auth"),
("consent", "consent_management"),
("session", "site_infra"),
("scroll", "site_infra"),
("sticky", "site_infra"),
("sidebar", "site_infra"),
("dealer search", "site_feature"),
("test drive", "site_feature"),
("vehicle configurator", "site_feature"),
("stocklocator", "site_feature"),
("eshop", "site_feature"),
("shop", "site_feature"),
("language", "site_infra"),
("sprach", "site_infra"),
("region", "site_infra"),
("ip popup", "site_infra"),
("popup", "site_infra"),
]
def classify_vendor(name: str) -> str:
    """Map a vendor name to a functional category."""
    n = (name or "").lower()
    for needle, cat in _CATEGORY_RULES:
        if needle in n:
            return cat
    return "other"
# ─── EU alternatives ─────────────────────────────────────────────────
# Curated list: for each US/non-EU vendor, one or more matching EU replacements.
# Sources: Matomo comparison, etracker SoMo study, IONOS packages,
# Friendly Captcha whitepaper, SAP CX suite, Brevo / CleverReach DE lists.
_EU_ALTERNATIVES: dict[str, list[dict]] = {
"adobe analytics": [
{"name": "Matomo (On-Premise)", "vendor": "InnoCraft", "country": "DE-self-hosted",
"license": "GPL", "notes": "100% DSGVO, keine 3rd-Country, gleicher Funktionsumfang"},
{"name": "etracker Analytics", "vendor": "etracker GmbH", "country": "DE",
"license": "Commercial", "notes": "DSGVO-konform aus Hamburg, IP-Anonymisierung"},
{"name": "Mapp Intelligence", "vendor": "Mapp Digital", "country": "DE",
"license": "Commercial", "notes": "Enterprise-Alternative, Server in DE"},
],
"google analytics": [
{"name": "Matomo", "vendor": "InnoCraft", "country": "DE-self-hosted",
"license": "GPL", "notes": "Direkter Drop-in-Ersatz mit GA-Migrationspfad"},
{"name": "Plausible Analytics", "vendor": "Plausible Insights", "country": "EE",
"license": "AGPL/Commercial", "notes": "Cookielos, ohne Einwilligung nutzbar"},
{"name": "Fathom Analytics EU", "vendor": "Fathom", "country": "DE-Region",
"license": "Commercial", "notes": "Cookielos, EU-Hosting"},
],
"content square": [
{"name": "Mouseflow EU", "vendor": "Mouseflow ApS", "country": "DK",
"license": "Commercial", "notes": "Session-Recording + Heatmaps EU-Hosting"},
{"name": "Hotjar EU", "vendor": "Hotjar Ltd", "country": "MT",
"license": "Commercial", "notes": "EU-DataCenter (Frankfurt), Einwilligung erforderlich"},
],
"dynatrace": [
{"name": "Dynatrace EU", "vendor": "Dynatrace", "country": "AT",
"license": "Commercial", "notes": "Bereits EU (Linz). Cluster auf EU einstellen"},
],
"speedcurve": [
{"name": "SpeedCurve EU", "vendor": "SpeedCurve", "country": "EU-tenant",
"license": "Commercial", "notes": "Region-Tenant explizit konfigurieren"},
{"name": "Calibre", "vendor": "Calibre", "country": "AU/EU",
"license": "Commercial", "notes": "Performance Monitoring, EU-Region"},
],
"akamai": [
{"name": "Bunny CDN", "vendor": "BunnyWay d.o.o.", "country": "SI",
"license": "Commercial", "notes": "Slowenischer CDN, EU-Backbone"},
{"name": "Cloudflare EU-Only", "vendor": "Cloudflare", "country": "Multi",
"license": "Commercial", "notes": "EU-Datacenter erzwingbar via 'Geo Steering'"},
{"name": "IONOS CDN", "vendor": "IONOS SE", "country": "DE",
"license": "Commercial", "notes": "100% DE-Hosting"},
],
"amazon web services": [
{"name": "IONOS Cloud", "vendor": "IONOS SE", "country": "DE",
"license": "Commercial", "notes": "DE-Hosting, BSI C5-zertifiziert"},
{"name": "OVHcloud", "vendor": "OVH SAS", "country": "FR",
"license": "Commercial", "notes": "FR-Hosting, SecNumCloud-zertifiziert"},
{"name": "Hetzner Cloud", "vendor": "Hetzner Online GmbH", "country": "DE",
"license": "Commercial", "notes": "DE/FI-Hosting, sehr kostenguenstig"},
{"name": "STACKIT", "vendor": "Schwarz IT (Lidl-Gruppe)", "country": "DE",
"license": "Commercial", "notes": "Souveraener DE-Cloud, fuer Enterprise"},
],
"salesforce": [
{"name": "SAP Customer Experience", "vendor": "SAP SE", "country": "DE",
"license": "Commercial", "notes": "Vollstaendige CRM-Suite EU-Hosting"},
{"name": "weclapp", "vendor": "weclapp SE", "country": "DE",
"license": "Commercial", "notes": "Cloud-CRM aus Marburg"},
],
"adobe campaign": [
{"name": "CleverReach", "vendor": "CleverReach GmbH", "country": "DE",
"license": "Commercial", "notes": "E-Mail-Marketing DE-Hosting"},
{"name": "Brevo (Sendinblue)", "vendor": "Brevo", "country": "FR",
"license": "Commercial", "notes": "Marketing-Automation EU-Hosting"},
{"name": "Inxmail", "vendor": "Inxmail GmbH", "country": "DE",
"license": "Commercial", "notes": "Enterprise-E-Mail-Marketing aus Freiburg"},
],
"google ads": [
{"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
"license": "Commercial", "notes": "FR-Hosting, Programmatic + Direct-Sold"},
{"name": "Bing Ads (Microsoft Advertising EU)", "vendor": "Microsoft", "country": "Multi",
"license": "Commercial", "notes": "EU-Datacenter optional"},
],
"google maps": [
{"name": "HERE Maps", "vendor": "HERE Technologies", "country": "DE",
"license": "Commercial", "notes": "Berliner Anbieter, professionelle Karten + Routing"},
{"name": "OpenStreetMap (self-host)", "vendor": "OSM Foundation", "country": "DE-self-host",
"license": "ODbL", "notes": "Frei, OSM-Tiles self-hosted oder via Maptiler EU"},
{"name": "Maptiler Cloud EU", "vendor": "MapTiler", "country": "CH",
"license": "Commercial", "notes": "Schweizer Anbieter, EU-Tiles"},
],
"criteo": [ # criteo IS EU but use as example for retargeting alts
{"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
"license": "Commercial", "notes": "Retargeting + Display, FR-Hosting"},
],
"hcaptcha": [
{"name": "Friendly Captcha", "vendor": "Friendly Captcha GmbH", "country": "DE",
"license": "Commercial", "notes": "100% DSGVO, ohne Cookie, Hosting in DE"},
{"name": "Turnstile (Cloudflare EU-Only)", "vendor": "Cloudflare", "country": "Multi",
"license": "Commercial", "notes": "Ohne Cookie, EU-Region erzwingbar"},
],
"qualtrics": [
{"name": "LamaPoll", "vendor": "Lamano GmbH", "country": "DE",
"license": "Commercial", "notes": "DSGVO-Surveys aus Berlin"},
{"name": "evasys", "vendor": "evasys GmbH", "country": "DE",
"license": "Commercial", "notes": "Enterprise-Survey-Plattform aus Lueneburg"},
],
"meta pixel": [
{"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
"license": "Commercial", "notes": "EU-Alternative fuer Conversion-Tracking"},
],
"facebook": [
{"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
"license": "Commercial", "notes": "Programmatic ohne Meta"},
],
"linkedin insight": [
{"name": "Xing Insights", "vendor": "New Work SE", "country": "DE",
"license": "Commercial", "notes": "DE/AT/CH B2B-Targeting aus Hamburg"},
],
"outbrain": [
{"name": "Plista", "vendor": "Plista GmbH", "country": "DE",
"license": "Commercial", "notes": "Native Advertising aus Berlin"},
],
"taboola": [
{"name": "Plista", "vendor": "Plista GmbH", "country": "DE",
"license": "Commercial", "notes": "Native Advertising aus Berlin"},
],
"genesys": [
{"name": "Userlike", "vendor": "Userlike UG", "country": "DE",
"license": "Commercial", "notes": "Live-Chat aus Koeln, BSI-konform"},
{"name": "LiveZilla / EasyChat EU", "vendor": "LiveZilla GmbH", "country": "DE",
"license": "Commercial", "notes": "DSGVO-Live-Chat"},
],
"salesviewer": [
{"name": "Leadinfo", "vendor": "Leadinfo BV", "country": "NL",
"license": "Commercial", "notes": "B2B-Webvisitor-Tracking EU"},
{"name": "Albacross EU", "vendor": "Albacross", "country": "SE",
"license": "Commercial", "notes": "EU-Tenant verfuegbar"},
],
"youtube": [
{"name": "Vimeo Pro EU", "vendor": "Vimeo", "country": "Multi",
"license": "Commercial", "notes": "EU-Region waehlbar, weniger Tracking"},
{"name": "Self-hosted video (BunnyStream)", "vendor": "BunnyWay", "country": "SI",
"license": "Commercial", "notes": "Eigene Player + CDN ohne Drittanbieter"},
],
"amazon advertising": [
{"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
"license": "Commercial", "notes": "Retail-Media-Alternative FR"},
],
"instagram": [
{"name": "Pinterest EU + Owned-Channels", "vendor": "Mix", "country": "Multi",
"license": "Commercial", "notes": "Owned-Channels (Newsletter via CleverReach)"},
],
}
# ─── Cost assumptions (public list prices, estimates) ─────────────
#
# Format: (low_year_eur, high_year_eur, tier_assumption)
# Tier: 'sme' = <100 employees, 'mid' = 100-1000, 'ent' = >1000.
# Sources: public list prices + industry benchmarks (Gartner,
# Forrester 2025). Actual contract terms can deviate by 30-70%
# (volume discounts, bundling). The output explicitly labels these
# values as an estimate range ('Schaetzbereich').
_COST_LOOKUP: dict[str, tuple[int, int, str]] = {
"adobe analytics": (120_000, 600_000, "ent"),
"adobe target": ( 80_000, 350_000, "ent"),
"adobe campaign": ( 60_000, 250_000, "ent"),
"adobe staging library":( 0, 0, "ent"), # bundled
"google analytics": ( 0, 150_000, "ent"), # GA4 free, GA360 ~150k
"matomo": ( 6_000, 30_000, "mid"), # Cloud/On-Prem
"hotjar": ( 3_600, 18_000, "mid"),
"content square": ( 60_000, 300_000, "ent"),
"contentsquare": ( 60_000, 300_000, "ent"),
"dynatrace": ( 50_000, 400_000, "ent"), # per-host pricing
"performance analytics":( 5_000, 40_000, "mid"),
"qualtrics": ( 25_000, 150_000, "ent"),
# Self-service advertising: NO tool licence, only media spend (tracked separately).
# We count 0 here because the savings potential on the licence is 0.
# Consolidation would only reduce media spend, which is a separate topic.
"google ads": ( 0, 0, "ent"),
"google advertising": ( 0, 0, "ent"),
"doubleclick": ( 0, 0, "ent"),
"meta pixel": ( 0, 0, "ent"),
"facebook": ( 0, 0, "ent"),
"amazon advertising": ( 0, 0, "ent"),
"youtube performance": ( 0, 0, "ent"),
"youtube player": ( 0, 0, "ent"),
"instagram": ( 0, 0, "ent"),
# Real DSP/platform licences: here the customer pays a SaaS fee
# ON TOP of the media spend. Range deliberately kept narrow (factor of at most 4x).
"adform": ( 80_000, 300_000, "ent"),
"criteo": ( 50_000, 200_000, "ent"),
"outbrain": ( 30_000, 120_000, "ent"),
"taboola": ( 30_000, 120_000, "ent"),
"teads": ( 25_000, 100_000, "ent"),
"pinterest": ( 15_000, 60_000, "ent"),
"linkedin insight": ( 10_000, 50_000, "ent"),
"google maps": ( 2_000, 30_000, "mid"),
"akamai": ( 50_000, 500_000, "ent"),
"amazon web services": (100_000, 3_000_000, "ent"),
"baqend": ( 6_000, 60_000, "mid"),
"speedkit": ( 6_000, 60_000, "mid"),
"speedcurve": ( 2_400, 24_000, "mid"),
"salesforce": (100_000, 1_500_000, "ent"), # CRM seats
"genesys": ( 80_000, 800_000, "ent"), # contact-center seats
"ckm": ( 15_000, 120_000, "mid"),
"hcaptcha": ( 0, 12_000, "sme"), # free tier OR pro
"salesviewer": ( 3_600, 18_000, "mid"),
"youtube": ( 0, 50_000, "ent"), # embed kostenlos, Production-Kosten variieren
}
# ─── EU-alternative costs (same tier logic) ───────────────────────
_EU_ALT_COSTS: dict[str, tuple[int, int]] = {
"Matomo (On-Premise)": ( 3_000, 15_000),
"Matomo (Pro / Cloud EU)": ( 6_000, 30_000),
"Matomo": ( 6_000, 30_000),
"etracker Analytics": ( 10_000, 60_000),
"Mapp Intelligence": ( 40_000, 200_000),
"Plausible Analytics": ( 240, 6_000),
"Fathom Analytics EU": ( 240, 6_000),
"Mouseflow EU": ( 12_000, 60_000),
"Hotjar EU": ( 3_600, 18_000),
"Dynatrace EU": ( 50_000, 400_000), # gleicher Preis, nur Region
"SpeedCurve EU": ( 2_400, 24_000),
"Calibre": ( 3_600, 30_000),
"Bunny CDN": ( 1_200, 12_000),
"Cloudflare EU-Only": ( 6_000, 80_000),
"IONOS CDN": ( 3_000, 30_000),
"IONOS Cloud": ( 30_000, 600_000),
"OVHcloud": ( 30_000, 600_000),
"Hetzner Cloud": ( 6_000, 120_000),
"STACKIT": ( 50_000, 800_000),
"SAP Customer Experience": ( 80_000, 1_200_000),
"weclapp": ( 12_000, 80_000),
"CleverReach": ( 2_400, 24_000),
"Brevo (Sendinblue)": ( 600, 24_000),
"Inxmail": ( 8_000, 60_000),
"Smart AdServer (Equativ)": ( 30_000, 300_000),
"Bing Ads (Microsoft Advertising EU)": ( 30_000, 3_000_000),
"HERE Maps": ( 1_200, 24_000),
"OpenStreetMap (self-host)": ( 0, 6_000), # nur Server-Kosten
"Maptiler Cloud EU": ( 600, 12_000),
"Friendly Captcha": ( 600, 9_600),
"Turnstile (Cloudflare EU-Only)": ( 0, 6_000),
"LamaPoll": ( 1_200, 24_000),
"evasys": ( 6_000, 60_000),
"Xing Insights": ( 6_000, 60_000),
"Plista": ( 20_000, 150_000),
"Userlike": ( 1_200, 30_000),
"LiveZilla / EasyChat EU": ( 600, 12_000),
"Leadinfo": ( 1_200, 12_000),
"Albacross EU": ( 3_600, 24_000),
"Vimeo Pro EU": ( 900, 6_000),
"Self-hosted video (BunnyStream)": ( 600, 12_000),
"Pinterest EU + Owned-Channels": ( 600, 24_000),
}
# ─── Known reasons for duplicates (consolidation should NOT be recommended) ─
_DUPLICATION_CAVEATS = {
"web_analytics": [
"A/B-Vergleich verschiedener Anbieter waehrend Migration",
"Marketing nutzt Adobe, Produkt nutzt Matomo — Inhouse-Politik",
"Regional split (Adobe fuer DE, GA fuer International)",
],
"advertising": [
"Brand-Kampagne vs Performance-Kampagne (verschiedene DSPs)",
"Saisonal: Black Friday/Super Bowl nutzt mehr Kanaele",
"Markenspezifisch: BMW M-Modelle anders targetet als 1er-Serie",
],
"cdn": [
"Multi-CDN-Strategie fuer Ausfallsicherheit (Akamai + Cloudflare)",
"Event-CDN-Spike (Auto-Show, Modell-Launch) braucht Skalierung",
"Regionale Latenz-Optimierung (Akamai APAC, AWS US)",
],
"marketing_automation": [
"Salesforce Marketing Cloud fuer B2C, Adobe Campaign fuer B2B",
"Lead-Generierung (Adobe) vs Loyalitaet (Salesforce)",
],
"monitoring": [
"APM (Dynatrace) misst Backend, RUM (SpeedCurve) misst Frontend",
],
"captcha": [
"Stufenweise Migration zu cookieless Captcha",
],
}
def _company_tier_bounds(company_tier: str | None) -> tuple[float, float]:
    """Fraction of the list-price range to use, depending on the company tier.

    For 'enterprise' / 'premier' we use the UPPER part of the range
    (40-85% and 70-100% respectively) instead of starter→premier.
    """
    t = (company_tier or "professional").lower()
    if t == "premier":
        return (0.70, 1.00)
    if t == "enterprise":
        return (0.40, 0.85)
    if t == "professional":
        return (0.20, 0.60)
    return (0.05, 0.40)  # 'sme' / starter
def _estimate_savings_for_redundancy(
    redundancy: dict, vendors: Iterable[dict],
    company_tier: str = "enterprise",
) -> dict:
    """Estimate range per redundancy: current cost + EU consolidation saving.

    Takes company_tier into account: for a corporation like BMW we do not
    want to show the starter range as well. The realistic range results
    from tier_bounds × (low, high).
    """
    low_frac, high_frac = _company_tier_bounds(company_tier)
    current_low = current_high = 0
    matched_vendors = []
    cat_vendors = [v for v in vendors if v.get("name") in redundancy.get("vendors", [])]
    for v in cat_vendors:
        name = (v.get("name") or "").lower()
        for k, (lo, hi, _tier) in _COST_LOOKUP.items():
            if k in name:
                # Tier-aware: take low_frac..high_frac of the pricing range
                span = hi - lo
                current_low += int(lo + span * low_frac)
                current_high += int(lo + span * high_frac)
                matched_vendors.append(v.get("name"))
                break
    # Consolidation: a single EU tool replaces all vendors in the category
    suggested_eu = None
    suggested_low = suggested_high = 0
    # 1. Multi-function tool that covers this category.
    # NOTE: some tool names (e.g. "SAP Customer Experience Suite",
    # "Userlike Suite") have no identical key in _EU_ALT_COSTS. In that
    # case the lookup returns None and the suggested estimate stays [0, 0].
    for tool in _MULTI_FUNCTION_TOOLS:
        if redundancy["category"] in tool["covers"]:
            suggested_eu = tool["name"]
            cost = _EU_ALT_COSTS.get(tool["name"])
            if cost:
                suggested_low, suggested_high = cost
            break
    # 2. Otherwise: EU alternative from the lookup entries, BUT ONLY FOR
    #    VENDORS IN THE CURRENT CATEGORY (otherwise Userlike shows up for advertising)
    if not suggested_eu:
        for v in cat_vendors:
            n = (v.get("name") or "").lower()
            for k, alts in _EU_ALTERNATIVES.items():
                if k in n and alts:
                    suggested_eu = alts[0]["name"]
                    cost = _EU_ALT_COSTS.get(alts[0]["name"])
                    if cost:
                        suggested_low, suggested_high = cost
                    break
            if suggested_eu:
                break
    saving_low = max(0, current_low - suggested_high)
    saving_high = max(0, current_high - suggested_low)
    return {
        "current_estimate_year_eur": [current_low, current_high],
        "suggested_eu_tool": suggested_eu,
        "suggested_estimate_year_eur": [suggested_low, suggested_high],
        "estimated_saving_year_eur": [saving_low, saving_high],
        "caveats": _DUPLICATION_CAVEATS.get(redundancy["category"], []),
        "cost_disclaimer": (
            "Schaetzbereich auf Basis oeffentlicher Listenpreise. Tatsaechliche "
            "Vertragspreise koennen 30-70% niedriger liegen (Volumen, Bundling, "
            "Konzern-Konditionen). Bitte mit der jeweiligen Einkaufsabteilung verifizieren."
        ),
    }
# ─── Multi-function tools (consolidation anchor points) ────────────
_MULTI_FUNCTION_TOOLS = [
{
"name": "Matomo (Pro / Cloud EU)",
"vendor": "InnoCraft",
"country": "DE-self-host / EU",
"covers": ["web_analytics", "tag_management", "personalisation"],
"notes": "Ersetzt Adobe Analytics + GTM + Adobe Target in einem Tool. "
"100% DSGVO ohne Einwilligung wenn IP anonymisiert.",
},
{
"name": "SAP Customer Experience Suite",
"vendor": "SAP SE",
"country": "DE",
"covers": ["crm", "marketing_automation", "personalisation", "survey"],
"notes": "Ersetzt Salesforce + Adobe Campaign + Qualtrics. EU-Hosting, "
"tiefe ERP-Integration.",
},
{
"name": "IONOS Cloud (Compute + CDN + Storage + DNS)",
"vendor": "IONOS SE",
"country": "DE",
"covers": ["cloud_infra", "cdn", "monitoring"],
"notes": "Ersetzt AWS + Akamai + zusaetzliches Monitoring in einer "
"DE-Cloud (BSI C5).",
},
{
"name": "Userlike Suite",
"vendor": "Userlike UG",
"country": "DE",
"covers": ["chat", "consent_management"],
"notes": "Ersetzt Genesys Chat. Bietet eigenes Consent-Modul.",
},
{
"name": "Smart AdServer (Equativ)",
"vendor": "Equativ",
"country": "FR",
"covers": ["advertising"],
"notes": "Ersetzt Mehrfach-DSPs (Adform/Criteo/Outbrain/Taboola/Meta) "
"durch Programmatic+Direct-Sold EU-Stack.",
},
{
"name": "HERE Maps",
"vendor": "HERE Technologies",
"country": "DE",
"covers": ["maps"],
"notes": "Berliner Anbieter, professionelle Karten + Routing.",
},
{
"name": "Vimeo Pro EU (oder self-hosted BunnyStream)",
"vendor": "Vimeo / BunnyWay",
"country": "Multi / SI",
"covers": ["external_media"],
"notes": "Ersetzt YouTube-Embeds + JW Player in einem Player.",
},
{
"name": "LamaPoll",
"vendor": "Lamano GmbH",
"country": "DE",
"covers": ["survey"],
"notes": "DSGVO-Surveys aus Berlin. Ersetzt Qualtrics / Psyma.",
},
]
# ─── Analysis ────────────────────────────────────────────────────────
def analyze(vendors: Iterable[dict], company_tier: str = "enterprise") -> dict:
    """Main entry. Returns categorised view + redundancies + EU options.

    `company_tier` (starter|professional|enterprise|premier) controls the
    cost range so that, for example, a DAX corporation does not end up
    with starter prices in the lower bound.
    """
    # Materialise once: `vendors` may be a one-shot iterable (e.g. a
    # generator) and is traversed several times below.
    all_vendors_list = list(vendors)
    by_cat: dict[str, list[dict]] = defaultdict(list)
    for v in all_vendors_list:
        cat = classify_vendor(v.get("name", ""))
        by_cat[cat].append(v)
    # Redundancies: any category with ≥2 vendors (excl. site-internal cats)
    skip_redundancy_cats = {"site_infra", "site_feature", "consent_management",
                            "auth", "other"}
    redundancies: list[dict] = []
    for cat, vs in by_cat.items():
        if cat in skip_redundancy_cats or len(vs) < 2:
            continue
        red = {
            "category": cat,
            "category_label": _CATEGORY_LABEL.get(cat, cat),
            "count": len(vs),
            "vendors": [v.get("name", "") for v in vs],
            "consolidation_hint": _CONSOLIDATION_HINT.get(cat, ""),
        }
        red.update(_estimate_savings_for_redundancy(
            red, all_vendors_list, company_tier))
        redundancies.append(red)
    redundancies.sort(key=lambda r: -(r.get("estimated_saving_year_eur") or [0, 0])[1])
    # EU alternatives lookup
    eu_alternatives: list[dict] = []
    seen = set()
    for v in all_vendors_list:
        name = v.get("name") or ""
        n_lower = name.lower()
        for k, alts in _EU_ALTERNATIVES.items():
            if k in n_lower and k not in seen:
                eu_alternatives.append({
                    "current_vendor": name,
                    "current_recipient_type": v.get("recipient_type", ""),
                    "matched_key": k,
                    "alternatives": alts,
                })
                seen.add(k)
                break
    # Multi-function tool recommendations: only if the customer has vendors
    # across the categories the tool covers
    present_cats = set(by_cat.keys())
    multi_function = []
    for tool in _MULTI_FUNCTION_TOOLS:
        covered_here = [c for c in tool["covers"] if c in present_cats]
        if len(covered_here) >= 2:
            # Collect vendor names instead of just summing counts (deduplicated)
            unique_vendors: set[str] = set()
            for c in covered_here:
                for v in by_cat[c]:
                    unique_vendors.add(v.get("name", ""))
            multi_function.append({
                **tool,
                "replaces_categories": covered_here,
                "potential_replacements": len(unique_vendors),
            })
    multi_function.sort(key=lambda t: -t["potential_replacements"])
    total_current_low = sum((r.get("current_estimate_year_eur") or [0, 0])[0] for r in redundancies)
    total_current_high = sum((r.get("current_estimate_year_eur") or [0, 0])[1] for r in redundancies)
    total_saving_low = sum((r.get("estimated_saving_year_eur") or [0, 0])[0] for r in redundancies)
    total_saving_high = sum((r.get("estimated_saving_year_eur") or [0, 0])[1] for r in redundancies)
    return {
        "summary": {
            "total_vendors": len(all_vendors_list),
            "distinct_categories": len([c for c in by_cat if c != "other"]),
            "redundancy_count": len(redundancies),
            "eu_alternative_count": len(eu_alternatives),
            "consolidation_potential": sum(r["count"] - 1 for r in redundancies),
            "estimated_current_year_eur": [total_current_low, total_current_high],
            "estimated_saving_year_eur": [total_saving_low, total_saving_high],
            "estimated_saving_pct": (
                # Both bounds use the same denominator (midpoint of the
                # current estimate); otherwise the upper bound explodes
                # when current_low is small. Capped at 95%.
                (lambda mid: (
                    f"{min(95, int(100 * total_saving_low / mid))}-"
                    f"{min(95, int(100 * total_saving_high / mid))}%"
                ))((total_current_low + total_current_high) / 2)
                if total_current_high else "n/a"
            ),
            "cost_disclaimer": (
                "Schaetzbereich auf Basis oeffentlicher Listenpreise (Gartner, Forrester 2025). "
                "Vertragspreise koennen 30-70% niedriger liegen (Volumen-Rabatte, Konzern-Konditionen, "
                "Bundling). Werte dienen als Diskussionsgrundlage mit dem Einkauf, NICHT als Angebot."
            ),
        },
        "by_category": {cat: [v.get("name", "") for v in vs]
                        for cat, vs in by_cat.items()},
        "redundancies": redundancies,
        "eu_alternatives": eu_alternatives,
        "multi_function_tools": multi_function,
    }
_CATEGORY_LABEL = {
"web_analytics": "Web-Analytics",
"advertising": "Werbung / Retargeting",
"tag_management": "Tag-Management",
"marketing_automation": "Marketing-Automation",
"personalisation": "Personalisierung",
"external_media": "Externe Medien (Video)",
"maps": "Karten / Geo",
"cdn": "CDN",
"cloud_infra": "Cloud-Infrastruktur",
"monitoring": "Performance-Monitoring",
"crm": "CRM",
"chat": "Chat / Support",
"captcha": "Bot-Schutz",
"lead_tracking": "Lead-Tracking",
"survey": "Umfragen",
"social_aggregator": "Social-Media-Aggregation",
"consent_management": "Consent-Management",
"auth": "Authentifizierung",
"site_infra": "Eigene Infrastruktur",
"site_feature": "Eigene Features",
"other": "Sonstige",
}
_CONSOLIDATION_HINT = {
"web_analytics": "Mehrere Analytics-Tools sammeln meist redundante Daten. Ein Tool genuegt — Matomo (DE) ist DSGVO-Standard.",
"advertising": "Werbe-/Retargeting-Pixel sind oft austauschbar. Konzentration auf 2-3 Kanaele senkt Drittland-Risiko.",
"external_media": "Mehrere Video-Embeds nur wenn fachlich noetig. Self-hosted (BunnyStream/Vimeo) reduziert Tracking.",
"maps": "Eine Karten-Loesung reicht. HERE Maps (DE) als EU-Alternative zu Google Maps.",
"cdn": "Ein CDN+Performance-Stack genuegt. IONOS oder Bunny vereinen mehrere Funktionen.",
"marketing_automation": "Marketing-Cloud + separates E-Mail-Tool sind oft Dopplung — SAP CX oder CleverReach allein moeglich.",
"chat": "Ein Chat-System genuegt. Userlike (DE) ersetzt Genesys-Stack.",
"monitoring": "RUM + APM koennen in einem Tool gebuendelt werden (Dynatrace EU oder Sentry-Self-host).",
"survey": "Eine Survey-Plattform genuegt — LamaPoll (DE) oder Mapp.",
}