662327e8b4
Major update based on the BMW test iterations (v1→v9):

Core compliance check
- Sonnet check_type classification: text/process/review for all 1874 MCs in compliance.doc_check_controls (script + sidecar /data/mc_classification.db). rag_document_checker filters on check_type='text' for doc_check. Plus fits_doc_type audit (v2) + ui_only audit for DSA/e-commerce MCs filed under the wrong doc_type.
- scope_requires filter: biometric/ai_decision/child_targeting MCs are filtered by business_profile (FRT skipped for BMW, etc.).
- Embedding match (BGE-M3) as phase 3 after the regex match: per-doc_type threshold override (impressum 0.50, dse/cookie 0.60), short-field rescue (15-word chunks) for mandatory fields in the Impressum. Title + check_question as embedding input for more context.
- Cookie text routing: consent-tester returns cmp_cookie_text from the CMP reconstruct; the backend prefers it over DOM extraction when it is richer (BMW 1824 vs 600 words).

Vendor redundancy + EU alternatives + cost saving
- vendor_redundancy.analyze(): functional categorisation of the CMP vendors, detection of multiple providers per category, EU-alternative lookup (Matomo, IONOS, HERE, Friendly Captcha, Smart AdServer, ...).
- vendor_cost_estimator: tier inference from the cookie footprint (cookie count + premium-feature cookies + third-party share → starter/professional/enterprise/premier).
- Self-service advertising (Google/Meta/Pinterest/...) = 0 licence cost (media spend only, tracked separately). DSP platforms keep a narrow range.
- Tier-aware saving range: for enterprise/premier we use the upper 40-100% band of the list prices, not starter→premier.
- Multi-function tools (Matomo Pro, SAP CX, IONOS Cloud, Userlike, Smart AdServer, HERE Maps, Vimeo Pro, LamaPoll): one tool replaces several categories at once.

Cookie knowledge DB + functional classification
- cookie_knowledge_db: 50 curated top cookies (Google/Meta/Adobe/MS/...) with vendor, exact_purpose, data_collected, IAB TCF IDs, reid_risk, schrems_ii_status, CJEU rulings, EU alternative.
- cookie_function_classifier: functional role per cookie (tracking_id, ad_pixel, session_id, ab_test, csrf, ...) + blocking_impact.

Country inference from legal form
- cookie_link_validator: the country field is derived from the vendor name (A/S=DK, GmbH=DE, Inc=US, B.V.=NL, ...) plus a vendor lookup table. Reduces false-positive no_country flags for clearly EU-based vendors (Adform DK, Pinterest IE).

Action recipes + doc anchor locator
- finding_action_recipes: for each finding type (no_cookies_listed, no_country, broken_opt_out, "Auftragsverarbeiter erwaehnen", "Art. 22 Profiling", ...) a structured instruction with what/why/fix_text/where/example, ready to paste 1:1 into customer documents.
- doc_anchor_locator: embedding-based (BGE-M3 cosine); finds the matching paragraph in the existing customer document for each finding. Per-run thread-local cache. Fallback: keyword match.
- Email rendering integrates recipe + anchor for each failed document check, plus a vendor flag list with a collapsible action list.
- Score explanation per vendor row (3/5 subtitle + tooltip).

Migration pipeline (compliance check -> customer banner/documents)
- migration_to_banner.py: vendor list -> CookieBannerConfig with 4 categories + review flags.
- migration_to_document.py: vendor list -> cookie policy + VVT register + privacy-policy pre-fills.
- agent_migration_routes: 3 preview endpoints (banner-preview, document-preview, summary). cmp_vendors are persisted in the check_payloads table of /data/compliance_audits.db.

Borlabs-parity cookie banner features
- Consent history in the banner: window.bpShowConsentHistory() + localStorage.
- Content blocker: cookie-banner-content-blocker.ts, YouTube/Maps/video placeholder until consent is given.
- Google Consent Mode v2 extended: wait_for_update + region=EEA/CH/GB.
- Consent log export (CSV/JSON) via einwilligungen_export_routes.

Bug fixes
- canonical_control_routes: _jsonish helper for string-typed jsonb, similar-controls endpoint with _has_embedding_col() cache (no more 500).
- Control library frontend: defensive .map coercer in 2 detail views.
- Embedding service batching (batches of 32 instead of 165 in one call).
- KeyError 'control_id' in MC result aggregation (defensive .get).
- Master-controls click-through from /sdk/master-controls to /sdk/control-library?control=<id> with URL-param auto-open.
- Dockerfile: /data pre-chowned to appuser (audit DB write permission).
- Cookie text routing bug (cmp_reconstructed > DOM extraction).
- doc_type-aware MC filter (instead of all text MCs).
- Master-contract dedup (60 BMW-internal entries = 1 Adobe contract).
- The A3-v2 audit reclassified 24 UI-language MCs as 'process'.

Tests
- test_migration_mappers.py (9 tests)
- test_migration_endpoints.py (4 tests)

Scripts (one-shot)
- classify_mc_check_type.py (v1) + _v2 (PK=control_id,doc_type)
- audit_mc_doctype_fit.py (v1 fits) + _v2 (ui_only + scope_requires)

BMW run results, v1 (broken) -> v9 (all fixes):
  DSE       7.5% -> 81-83%
  Impressum 4%   -> 100% (6 real MCs, all fulfilled)
  Cookie    0%   -> 79-83% (CMP text routing + embedding)
Plus: 10 consolidation categories, estimated saving 200k-3M / year
Plus: action recipes + doc anchors for every fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
728 lines
33 KiB
Python
"""
Vendor Redundancy + EU-Alternatives Analyzer.

Input:  list of vendors from the CMP capture (e.g. BMW: 90 vendors).
Output: four structured lists that are rendered in the email and in the
        migration modal:

  1. functional_categories : vendor → functional class (analytics,
                             advertising, cdn, captcha, chat, …)
  2. redundancies          : categories with ≥2 vendors that do the same
                             thing → consolidation potential
  3. eu_alternatives       : for each US vendor, a suitable EU replacement
                             from a curated lookup table (Matomo instead of
                             Adobe Analytics, IONOS instead of AWS, etc.)
  4. multi_function_tools  : EU tools that cover several categories
                             (e.g. SAP CX = CRM + marketing automation
                             + personalisation)
"""

from __future__ import annotations

import logging
import re
from collections import defaultdict
from typing import Iterable

logger = logging.getLogger(__name__)


# ─── Categorisation ───────────────────────────────────────────────────

# Substring match (lowercase) → category. The first match wins, so more
# specific needles must come before more general ones (for example
# "youtube performance" before "youtube"). Short needles such as "gtm",
# "aws" or "sso" also match inside longer words.
_CATEGORY_RULES: list[tuple[str, str]] = [
    # Web analytics / behaviour
    ("adobe analytics", "web_analytics"),
    ("adobe target", "personalisation"),
    ("adobe campaign", "marketing_automation"),
    ("adobe staging library", "tag_management"),
    ("adobelaunch", "tag_management"),
    ("google analytics", "web_analytics"),
    ("matomo", "web_analytics"),
    ("hotjar", "web_analytics"),
    ("content square", "web_analytics"),
    ("contentsquare", "web_analytics"),
    ("dynatrace", "monitoring"),
    ("performance analytics", "web_analytics"),
    ("form analytics", "web_analytics"),
    ("form campaign analytics", "web_analytics"),
    ("psyma", "survey"),
    ("qualtrics", "survey"),

    # Tag management
    ("google tag manager", "tag_management"),
    ("gtm", "tag_management"),

    # Advertising / retargeting
    ("google ads", "advertising"),
    ("google advertising", "advertising"),
    ("doubleclick", "advertising"),
    ("googleads", "advertising"),
    ("meta pixel", "advertising"),
    ("meta platforms", "advertising"),
    ("facebook", "advertising"),
    ("adform", "advertising"),
    ("criteo", "advertising"),
    ("outbrain", "advertising"),
    ("taboola", "advertising"),
    ("teads", "advertising"),
    ("pinterest", "advertising"),
    ("linkedin insight", "advertising"),
    ("youtube performance", "advertising"),
    ("youtube player", "external_media"),
    ("amazon advertising", "advertising"),
    ("instagram", "advertising"),
    ("dotaki", "advertising"),

    # Video / embeds
    ("youtube", "external_media"),
    ("vimeo", "external_media"),
    ("jw player", "external_media"),
    ("jw video", "external_media"),
    ("jwplayer", "external_media"),
    ("jwconnatix", "external_media"),

    # Maps / geo
    ("google maps", "maps"),
    ("google geolocation", "maps"),
    ("geolocation", "maps"),

    # CDN / infrastructure
    ("akamai", "cdn"),
    ("amazon web services", "cloud_infra"),
    ("aws", "cloud_infra"),
    ("baqend", "cdn"),
    ("speedkit", "cdn"),
    ("speedcurve", "monitoring"),

    # CRM
    ("salesforce", "crm"),

    # Chat / support
    ("genesys", "chat"),
    ("ckm", "chat"),
    ("chat widget", "chat"),

    # Captcha / bot protection
    ("hcaptcha", "captcha"),
    ("recaptcha", "captcha"),

    # Sales / lead tracking
    ("salesviewer", "lead_tracking"),

    # Marketing / sales overlay
    ("nayoki", "social_aggregator"),

    # Site-internal functions
    ("infrastructure", "site_infra"),
    ("infrastrukturbereit", "site_infra"),
    ("javaserverpages", "site_infra"),
    ("single sign-on", "auth"),
    ("mybmw account", "auth"),
    ("sso", "auth"),
    ("consent", "consent_management"),
    ("session", "site_infra"),
    ("scroll", "site_infra"),
    ("sticky", "site_infra"),
    ("sidebar", "site_infra"),
    ("dealer search", "site_feature"),
    ("test drive", "site_feature"),
    ("vehicle configurator", "site_feature"),
    ("stocklocator", "site_feature"),
    ("eshop", "site_feature"),
    ("shop", "site_feature"),
    ("language", "site_infra"),
    ("sprach", "site_infra"),
    ("region", "site_infra"),
    ("ip popup", "site_infra"),
    ("popup", "site_infra"),
]


def classify_vendor(name: str | None) -> str:
    """Map a vendor name to a functional category ("other" if no rule matches)."""
    n = (name or "").lower()
    for needle, cat in _CATEGORY_RULES:
        if needle in n:
            return cat
    return "other"


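# Illustration only: a minimal, self-contained sketch of the first-match-wins
# substring lookup used by classify_vendor above. The names RULES and
# classify, and the three-entry rule list, are hypothetical and not part of
# this module. The sketch shows why the order of the rules matters: with the
# general needle listed first, the more specific rule is never reached.

```python
from __future__ import annotations

# Hypothetical, shortened rule list (assumption, for illustration only).
RULES: list[tuple[str, str]] = [
    ("youtube performance", "advertising"),  # specific needle first
    ("youtube", "external_media"),           # general needle second
    ("google analytics", "web_analytics"),
]


def classify(name: str | None, rules: list[tuple[str, str]] = RULES) -> str:
    """Return the category of the first rule whose needle occurs in name."""
    lowered = (name or "").lower()
    for needle, category in rules:
        if needle in lowered:
            return category
    return "other"


if __name__ == "__main__":
    print(classify("YouTube Performance Tracking"))   # specific rule matches
    print(classify("YouTube Player"))                 # falls through to "youtube"
    # With the order reversed, "youtube" shadows "youtube performance".
    print(classify("YouTube Performance Tracking", list(reversed(RULES))))
```
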
# ─── EU alternatives ─────────────────────────────────────────────────

# Curated list: for each US / non-EU vendor, one or more suitable EU
# replacements.
# Sources: Matomo comparison, etracker SoMo study, IONOS packages,
# Friendly Captcha whitepaper, SAP CX suite, Brevo / CleverReach DE lists.
_EU_ALTERNATIVES: dict[str, list[dict]] = {
    "adobe analytics": [
        {"name": "Matomo (On-Premise)", "vendor": "InnoCraft", "country": "DE-self-hosted",
         "license": "GPL", "notes": "100% DSGVO, keine 3rd-Country, gleicher Funktionsumfang"},
        {"name": "etracker Analytics", "vendor": "etracker GmbH", "country": "DE",
         "license": "Commercial", "notes": "DSGVO-konform aus Hamburg, IP-Anonymisierung"},
        {"name": "Mapp Intelligence", "vendor": "Mapp Digital", "country": "DE",
         "license": "Commercial", "notes": "Enterprise-Alternative, Server in DE"},
    ],
    "google analytics": [
        {"name": "Matomo", "vendor": "InnoCraft", "country": "DE-self-hosted",
         "license": "GPL", "notes": "Direkter Drop-in-Ersatz mit GA-Migrationspfad"},
        {"name": "Plausible Analytics", "vendor": "Plausible Insights", "country": "EE",
         "license": "AGPL/Commercial", "notes": "Cookielos, ohne Einwilligung nutzbar"},
        {"name": "Fathom Analytics EU", "vendor": "Fathom", "country": "DE-Region",
         "license": "Commercial", "notes": "Cookielos, EU-Hosting"},
    ],
    "content square": [
        {"name": "Mouseflow EU", "vendor": "Mouseflow ApS", "country": "DK",
         "license": "Commercial", "notes": "Session-Recording + Heatmaps EU-Hosting"},
        {"name": "Hotjar EU", "vendor": "Hotjar Ltd", "country": "MT",
         "license": "Commercial", "notes": "EU-DataCenter (Frankfurt), Einwilligung erforderlich"},
    ],
    "dynatrace": [
        {"name": "Dynatrace EU", "vendor": "Dynatrace", "country": "AT",
         "license": "Commercial", "notes": "Bereits EU (Linz). Cluster auf EU einstellen"},
    ],
    "speedcurve": [
        {"name": "SpeedCurve EU", "vendor": "SpeedCurve", "country": "EU-tenant",
         "license": "Commercial", "notes": "Region-Tenant explizit konfigurieren"},
        {"name": "Calibre", "vendor": "Calibre", "country": "AU/EU",
         "license": "Commercial", "notes": "Performance Monitoring, EU-Region"},
    ],
    "akamai": [
        {"name": "Bunny CDN", "vendor": "BunnyWay d.o.o.", "country": "SI",
         "license": "Commercial", "notes": "Slowenischer CDN, EU-Backbone"},
        {"name": "Cloudflare EU-Only", "vendor": "Cloudflare", "country": "Multi",
         "license": "Commercial", "notes": "EU-Datacenter erzwingbar via 'Geo Steering'"},
        {"name": "IONOS CDN", "vendor": "IONOS SE", "country": "DE",
         "license": "Commercial", "notes": "100% DE-Hosting"},
    ],
    "amazon web services": [
        {"name": "IONOS Cloud", "vendor": "IONOS SE", "country": "DE",
         "license": "Commercial", "notes": "DE-Hosting, BSI C5-zertifiziert"},
        {"name": "OVHcloud", "vendor": "OVH SAS", "country": "FR",
         "license": "Commercial", "notes": "FR-Hosting, SecNumCloud-zertifiziert"},
        {"name": "Hetzner Cloud", "vendor": "Hetzner Online GmbH", "country": "DE",
         "license": "Commercial", "notes": "DE/FI-Hosting, sehr kostenguenstig"},
        {"name": "STACKIT", "vendor": "Schwarz IT (Lidl-Gruppe)", "country": "DE",
         "license": "Commercial", "notes": "Souveraener DE-Cloud, fuer Enterprise"},
    ],
    "salesforce": [
        {"name": "SAP Customer Experience", "vendor": "SAP SE", "country": "DE",
         "license": "Commercial", "notes": "Vollstaendige CRM-Suite EU-Hosting"},
        {"name": "weclapp", "vendor": "weclapp SE", "country": "DE",
         "license": "Commercial", "notes": "Cloud-CRM aus Marburg"},
    ],
    "adobe campaign": [
        {"name": "CleverReach", "vendor": "CleverReach GmbH", "country": "DE",
         "license": "Commercial", "notes": "E-Mail-Marketing DE-Hosting"},
        {"name": "Brevo (Sendinblue)", "vendor": "Brevo", "country": "FR",
         "license": "Commercial", "notes": "Marketing-Automation EU-Hosting"},
        {"name": "Inxmail", "vendor": "Inxmail GmbH", "country": "DE",
         "license": "Commercial", "notes": "Enterprise-E-Mail-Marketing aus Freiburg"},
    ],
    "google ads": [
        {"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
         "license": "Commercial", "notes": "FR-Hosting, Programmatic + Direct-Sold"},
        {"name": "Bing Ads (Microsoft Advertising EU)", "vendor": "Microsoft", "country": "Multi",
         "license": "Commercial", "notes": "EU-Datacenter optional"},
    ],
    "google maps": [
        {"name": "HERE Maps", "vendor": "HERE Technologies", "country": "DE",
         "license": "Commercial", "notes": "Berliner Anbieter, professionelle Karten + Routing"},
        {"name": "OpenStreetMap (self-host)", "vendor": "OSM Foundation", "country": "DE-self-host",
         "license": "ODbL", "notes": "Frei, OSM-Tiles self-hosted oder via Maptiler EU"},
        {"name": "Maptiler Cloud EU", "vendor": "MapTiler", "country": "CH",
         "license": "Commercial", "notes": "Schweizer Anbieter, EU-Tiles"},
    ],
    # Criteo is itself EU-based; listed as an example of retargeting alternatives.
    "criteo": [
        {"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
         "license": "Commercial", "notes": "Retargeting + Display, FR-Hosting"},
    ],
    "hcaptcha": [
        {"name": "Friendly Captcha", "vendor": "Friendly Captcha GmbH", "country": "DE",
         "license": "Commercial", "notes": "100% DSGVO, ohne Cookie, Hosting in DE"},
        {"name": "Turnstile (Cloudflare EU-Only)", "vendor": "Cloudflare", "country": "Multi",
         "license": "Commercial", "notes": "Ohne Cookie, EU-Region erzwingbar"},
    ],
    "qualtrics": [
        {"name": "LamaPoll", "vendor": "Lamano GmbH", "country": "DE",
         "license": "Commercial", "notes": "DSGVO-Surveys aus Berlin"},
        {"name": "evasys", "vendor": "evasys GmbH", "country": "DE",
         "license": "Commercial", "notes": "Enterprise-Survey-Plattform aus Lueneburg"},
    ],
    "meta pixel": [
        {"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
         "license": "Commercial", "notes": "EU-Alternative fuer Conversion-Tracking"},
    ],
    "facebook": [
        {"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
         "license": "Commercial", "notes": "Programmatic ohne Meta"},
    ],
    "linkedin insight": [
        {"name": "Xing Insights", "vendor": "New Work SE", "country": "DE",
         "license": "Commercial", "notes": "DE/AT/CH B2B-Targeting aus Hamburg"},
    ],
    "outbrain": [
        {"name": "Plista", "vendor": "Plista GmbH", "country": "DE",
         "license": "Commercial", "notes": "Native Advertising aus Berlin"},
    ],
    "taboola": [
        {"name": "Plista", "vendor": "Plista GmbH", "country": "DE",
         "license": "Commercial", "notes": "Native Advertising aus Berlin"},
    ],
    "genesys": [
        {"name": "Userlike", "vendor": "Userlike UG", "country": "DE",
         "license": "Commercial", "notes": "Live-Chat aus Koeln, BSI-konform"},
        {"name": "LiveZilla / EasyChat EU", "vendor": "LiveZilla GmbH", "country": "DE",
         "license": "Commercial", "notes": "DSGVO-Live-Chat"},
    ],
    "salesviewer": [
        {"name": "Leadinfo", "vendor": "Leadinfo BV", "country": "NL",
         "license": "Commercial", "notes": "B2B-Webvisitor-Tracking EU"},
        {"name": "Albacross EU", "vendor": "Albacross", "country": "SE",
         "license": "Commercial", "notes": "EU-Tenant verfuegbar"},
    ],
    "youtube": [
        {"name": "Vimeo Pro EU", "vendor": "Vimeo", "country": "Multi",
         "license": "Commercial", "notes": "EU-Region waehlbar, weniger Tracking"},
        {"name": "Self-hosted video (BunnyStream)", "vendor": "BunnyWay", "country": "SI",
         "license": "Commercial", "notes": "Eigene Player + CDN ohne Drittanbieter"},
    ],
    "amazon advertising": [
        {"name": "Smart AdServer (Equativ)", "vendor": "Equativ", "country": "FR",
         "license": "Commercial", "notes": "Retail-Media-Alternative FR"},
    ],
    "instagram": [
        {"name": "Pinterest EU + Owned-Channels", "vendor": "Mix", "country": "Multi",
         "license": "Commercial", "notes": "Owned-Channels (Newsletter via CleverReach)"},
    ],
}


# ─── Cost assumptions (public list prices, estimates) ──────────────
#
# Format: (low_year_eur, high_year_eur, tier_assumption)
# Tier: 'sme' = <100 employees, 'mid' = 100-1000, 'ent' = >1000.
# Sources: public list prices + industry benchmarks (Gartner,
# Forrester 2025). Actual contract terms can deviate by 30-70%
# (volume discounts, bundling). The values are explicitly marked as an
# estimated range ('Schaetzbereich') in the output.

_COST_LOOKUP: dict[str, tuple[int, int, str]] = {
    "adobe analytics": (120_000, 600_000, "ent"),
    "adobe target": (80_000, 350_000, "ent"),
    "adobe campaign": (60_000, 250_000, "ent"),
    "adobe staging library": (0, 0, "ent"),  # bundled
    "google analytics": (0, 150_000, "ent"),  # GA4 free, GA360 ~150k
    "matomo": (6_000, 30_000, "mid"),  # Cloud/On-Prem
    "hotjar": (3_600, 18_000, "mid"),
    "content square": (60_000, 300_000, "ent"),
    "contentsquare": (60_000, 300_000, "ent"),
    "dynatrace": (50_000, 400_000, "ent"),  # per-host pricing
    "performance analytics": (5_000, 40_000, "mid"),
    "qualtrics": (25_000, 150_000, "ent"),

    # Self-service advertising: NO tool licence, only media spend (tracked
    # separately). Counted as 0 here because the saving potential on the
    # licence is 0. Consolidation would only reduce media spend, which is
    # a different topic.
    "google ads": (0, 0, "ent"),
    "google advertising": (0, 0, "ent"),
    "doubleclick": (0, 0, "ent"),
    "meta pixel": (0, 0, "ent"),
    "facebook": (0, 0, "ent"),
    "amazon advertising": (0, 0, "ent"),
    "youtube performance": (0, 0, "ent"),
    "youtube player": (0, 0, "ent"),
    "instagram": (0, 0, "ent"),
    # Real DSP / platform licences: here the customer pays a SaaS fee
    # ON TOP of the media spend. The range is deliberately kept narrower
    # (high is roughly 4x low, at most 5x).
    "adform": (80_000, 300_000, "ent"),
    "criteo": (50_000, 200_000, "ent"),
    "outbrain": (30_000, 120_000, "ent"),
    "taboola": (30_000, 120_000, "ent"),
    "teads": (25_000, 100_000, "ent"),
    "pinterest": (15_000, 60_000, "ent"),
    "linkedin insight": (10_000, 50_000, "ent"),

    "google maps": (2_000, 30_000, "mid"),
    "akamai": (50_000, 500_000, "ent"),
    "amazon web services": (100_000, 3_000_000, "ent"),
    "baqend": (6_000, 60_000, "mid"),
    "speedkit": (6_000, 60_000, "mid"),
    "speedcurve": (2_400, 24_000, "mid"),

    "salesforce": (100_000, 1_500_000, "ent"),  # CRM seats
    "genesys": (80_000, 800_000, "ent"),  # contact-center seats
    "ckm": (15_000, 120_000, "mid"),
    "hcaptcha": (0, 12_000, "sme"),  # free tier OR pro

    "salesviewer": (3_600, 18_000, "mid"),
    "youtube": (0, 50_000, "ent"),  # embed is free, production costs vary
}


# ─── EU alternative costs (same tier logic) ────────────────────────
#
# Format: (low_year_eur, high_year_eur)

_EU_ALT_COSTS: dict[str, tuple[int, int]] = {
    "Matomo (On-Premise)": (3_000, 15_000),
    "Matomo (Pro / Cloud EU)": (6_000, 30_000),
    "Matomo": (6_000, 30_000),
    "etracker Analytics": (10_000, 60_000),
    "Mapp Intelligence": (40_000, 200_000),
    "Plausible Analytics": (240, 6_000),
    "Fathom Analytics EU": (240, 6_000),
    "Mouseflow EU": (12_000, 60_000),
    "Hotjar EU": (3_600, 18_000),
    "Dynatrace EU": (50_000, 400_000),  # same price, only the region differs
    "SpeedCurve EU": (2_400, 24_000),
    "Calibre": (3_600, 30_000),
    "Bunny CDN": (1_200, 12_000),
    "Cloudflare EU-Only": (6_000, 80_000),
    "IONOS CDN": (3_000, 30_000),
    "IONOS Cloud": (30_000, 600_000),
    "OVHcloud": (30_000, 600_000),
    "Hetzner Cloud": (6_000, 120_000),
    "STACKIT": (50_000, 800_000),
    "SAP Customer Experience": (80_000, 1_200_000),
    "weclapp": (12_000, 80_000),
    "CleverReach": (2_400, 24_000),
    "Brevo (Sendinblue)": (600, 24_000),
    "Inxmail": (8_000, 60_000),
    "Smart AdServer (Equativ)": (30_000, 300_000),
    "Bing Ads (Microsoft Advertising EU)": (30_000, 3_000_000),
    "HERE Maps": (1_200, 24_000),
    "OpenStreetMap (self-host)": (0, 6_000),  # server costs only
    "Maptiler Cloud EU": (600, 12_000),
    "Friendly Captcha": (600, 9_600),
    "Turnstile (Cloudflare EU-Only)": (0, 6_000),
    "LamaPoll": (1_200, 24_000),
    "evasys": (6_000, 60_000),
    "Xing Insights": (6_000, 60_000),
    "Plista": (20_000, 150_000),
    "Userlike": (1_200, 30_000),
    "LiveZilla / EasyChat EU": (600, 12_000),
    "Leadinfo": (1_200, 12_000),
    "Albacross EU": (3_600, 24_000),
    "Vimeo Pro EU": (900, 6_000),
    "Self-hosted video (BunnyStream)": (600, 12_000),
    "Pinterest EU + Owned-Channels": (600, 24_000),
}


# ─── Known reasons for duplicates (must NOT trigger a consolidation advice) ─

_DUPLICATION_CAVEATS: dict[str, list[str]] = {
    "web_analytics": [
        "A/B-Vergleich verschiedener Anbieter waehrend Migration",
        "Marketing nutzt Adobe, Produkt nutzt Matomo: Inhouse-Politik",
        "Regional split (Adobe fuer DE, GA fuer International)",
    ],
    "advertising": [
        "Brand-Kampagne vs Performance-Kampagne (verschiedene DSPs)",
        "Saisonal: Black Friday/Super Bowl nutzt mehr Kanaele",
        "Markenspezifisch: BMW M-Modelle anders targetet als 1er-Serie",
    ],
    "cdn": [
        "Multi-CDN-Strategie fuer Ausfallsicherheit (Akamai + Cloudflare)",
        "Event-CDN-Spike (Auto-Show, Modell-Launch) braucht Skalierung",
        "Regionale Latenz-Optimierung (Akamai APAC, AWS US)",
    ],
    "marketing_automation": [
        "Salesforce Marketing Cloud fuer B2C, Adobe Campaign fuer B2B",
        "Lead-Generierung (Adobe) vs Loyalitaet (Salesforce)",
    ],
    "monitoring": [
        "APM (Dynatrace) misst Backend, RUM (SpeedCurve) misst Frontend",
    ],
    "captcha": [
        "Stufenweise Migration zu cookieless Captcha",
    ],
}


def _company_tier_bounds(company_tier: str | None) -> tuple[float, float]:
    """Return which fraction of a list-price range to use for a company tier.

    For 'enterprise' and 'premier' only the upper part of the range is
    used (40-85% and 70-100% respectively) instead of the full
    starter→premier span. A missing tier is treated as 'professional';
    any unknown tier falls back to the starter bounds.
    """
    t = (company_tier or "professional").lower()
    if t == "premier":
        return (0.70, 1.00)
    if t == "enterprise":
        return (0.40, 0.85)
    if t == "professional":
        return (0.20, 0.60)
    return (0.05, 0.40)  # 'sme' / starter


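# Illustration only: a small worked example of how the tier bounds above
# narrow a list-price range. The names TIER_BOUNDS, DEFAULT_BOUNDS and
# tier_slice are hypothetical; the fractions mirror _company_tier_bounds and
# the price range is the "adobe analytics" entry of _COST_LOOKUP.
# For enterprise the slice is 120_000 + 0.40 * 480_000 = 312_000 up to
# 120_000 + 0.85 * 480_000 = 528_000.

```python
from __future__ import annotations

# Hypothetical stand-in for the tier table (assumption, for illustration).
TIER_BOUNDS: dict[str, tuple[float, float]] = {
    "premier": (0.70, 1.00),
    "enterprise": (0.40, 0.85),
    "professional": (0.20, 0.60),
}
DEFAULT_BOUNDS = (0.05, 0.40)  # 'sme' / starter


def tier_slice(low: int, high: int, company_tier: str | None) -> tuple[int, int]:
    """Return the (low, high) slice of a list-price range for one tier."""
    # A missing tier is treated as "professional", as in the function above.
    tier = (company_tier or "professional").lower()
    low_frac, high_frac = TIER_BOUNDS.get(tier, DEFAULT_BOUNDS)
    span = high - low
    return int(low + span * low_frac), int(low + span * high_frac)


if __name__ == "__main__":
    # List range 120_000..600_000 EUR per year, i.e. a span of 480_000.
    for tier in ("starter", "professional", "enterprise", "premier"):
        print(tier, tier_slice(120_000, 600_000, tier))
```
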
def _lookup_eu_cost(tool_name: str) -> tuple[int, int] | None:
    """Return the (low, high) yearly cost of an EU tool, or None if unknown.

    Assumption: suite names such as "IONOS Cloud (Compute + CDN + ...)"
    share the price range of their base entry ("IONOS Cloud") in
    _EU_ALT_COSTS. An exact match is tried first, then the longest key
    that is a prefix of the tool name. Without this fallback the suggested
    cost stays at 0 and the saving is overstated.
    """
    cost = _EU_ALT_COSTS.get(tool_name)
    if cost is None:
        prefixes = [k for k in _EU_ALT_COSTS if tool_name.startswith(k)]
        if prefixes:
            cost = _EU_ALT_COSTS[max(prefixes, key=len)]
    return cost


def _estimate_savings_for_redundancy(
    redundancy: dict,
    vendors: Iterable[dict],
    company_tier: str = "enterprise",
) -> dict:
    """Estimated range per redundancy: current cost + EU consolidation saving.

    Takes company_tier into account: for a corporation like BMW the
    starter range should not be shown. The realistic range results from
    applying the tier bounds to each (low, high) list-price range.
    """
    low_frac, high_frac = _company_tier_bounds(company_tier)
    current_low = current_high = 0
    matched_vendors = []
    cat_vendors = [v for v in vendors if v.get("name") in redundancy.get("vendors", [])]
    for v in cat_vendors:
        name = (v.get("name") or "").lower()
        for k, (lo, hi, _tier) in _COST_LOOKUP.items():
            if k in name:
                # Tier-aware: use the low_frac..high_frac slice of the price range
                span = hi - lo
                current_low += int(lo + span * low_frac)
                current_high += int(lo + span * high_frac)
                matched_vendors.append(v.get("name"))
                break

    # Consolidation: a single EU tool replaces all vendors in the category
    suggested_eu = None
    suggested_low = suggested_high = 0
    # 1. A multi-function tool that covers this category
    for tool in _MULTI_FUNCTION_TOOLS:
        if redundancy["category"] in tool["covers"]:
            suggested_eu = tool["name"]
            cost = _lookup_eu_cost(tool["name"])
            if cost:
                suggested_low, suggested_high = cost
            break
    # 2. Otherwise: an EU alternative from the lookup, but ONLY for vendors
    #    in the current category (otherwise Userlike shows up for advertising)
    if not suggested_eu:
        for v in cat_vendors:
            n = (v.get("name") or "").lower()
            for k, alts in _EU_ALTERNATIVES.items():
                if k in n and alts:
                    suggested_eu = alts[0]["name"]
                    cost = _lookup_eu_cost(alts[0]["name"])
                    if cost:
                        suggested_low, suggested_high = cost
                    break
            if suggested_eu:
                break

    # Pessimistic bound: cheapest current setup vs. most expensive replacement.
    saving_low = max(0, current_low - suggested_high)
    # Optimistic bound: most expensive current setup vs. cheapest replacement.
    saving_high = max(0, current_high - suggested_low)

    return {
        "current_estimate_year_eur": [current_low, current_high],
        "suggested_eu_tool": suggested_eu,
        "suggested_estimate_year_eur": [suggested_low, suggested_high],
        "estimated_saving_year_eur": [saving_low, saving_high],
        "caveats": _DUPLICATION_CAVEATS.get(redundancy["category"], []),
        "cost_disclaimer": (
            "Schaetzbereich auf Basis oeffentlicher Listenpreise. Tatsaechliche "
            "Vertragspreise koennen 30-70% niedriger liegen (Volumen, Bundling, "
            "Konzern-Konditionen). Bitte mit der jeweiligen Einkaufsabteilung verifizieren."
        ),
    }


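# Illustration only: the saving range above is plain interval arithmetic,
# clamped at zero. The function name saving_range is hypothetical; the
# example figures are taken from the lookup tables in this module (the
# enterprise slice of "adobe analytics" vs. "Matomo (Pro / Cloud EU)", and
# "dynatrace" vs. "Dynatrace EU" at full list range). The second case shows
# the clamp: when the replacement can cost as much as the current tool, the
# lower bound of the saving is 0 rather than negative.

```python
from __future__ import annotations


def saving_range(
    current: tuple[int, int], suggested: tuple[int, int]
) -> tuple[int, int]:
    """Return (saving_low, saving_high) for two yearly cost ranges in EUR."""
    current_low, current_high = current
    suggested_low, suggested_high = suggested
    # Pessimistic: cheapest current cost minus most expensive replacement.
    saving_low = max(0, current_low - suggested_high)
    # Optimistic: most expensive current cost minus cheapest replacement.
    saving_high = max(0, current_high - suggested_low)
    return saving_low, saving_high


if __name__ == "__main__":
    print(saving_range((312_000, 528_000), (6_000, 30_000)))
    print(saving_range((50_000, 400_000), (50_000, 400_000)))
```
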
# ─── Multi-function tools (consolidation anchors) ──────────────────

_MULTI_FUNCTION_TOOLS: list[dict] = [
    {
        "name": "Matomo (Pro / Cloud EU)",
        "vendor": "InnoCraft",
        "country": "DE-self-host / EU",
        "covers": ["web_analytics", "tag_management", "personalisation"],
        "notes": "Ersetzt Adobe Analytics + GTM + Adobe Target in einem Tool. "
                 "100% DSGVO ohne Einwilligung wenn IP anonymisiert.",
    },
    {
        "name": "SAP Customer Experience Suite",
        "vendor": "SAP SE",
        "country": "DE",
        "covers": ["crm", "marketing_automation", "personalisation", "survey"],
        "notes": "Ersetzt Salesforce + Adobe Campaign + Qualtrics. EU-Hosting, "
                 "tiefe ERP-Integration.",
    },
    {
        "name": "IONOS Cloud (Compute + CDN + Storage + DNS)",
        "vendor": "IONOS SE",
        "country": "DE",
        "covers": ["cloud_infra", "cdn", "monitoring"],
        "notes": "Ersetzt AWS + Akamai + zusaetzliches Monitoring in einer "
                 "DE-Cloud (BSI C5).",
    },
    {
        "name": "Userlike Suite",
        "vendor": "Userlike UG",
        "country": "DE",
        "covers": ["chat", "consent_management"],
        "notes": "Ersetzt Genesys Chat. Bietet eigenes Consent-Modul.",
    },
    {
        "name": "Smart AdServer (Equativ)",
        "vendor": "Equativ",
        "country": "FR",
        "covers": ["advertising"],
        "notes": "Ersetzt Mehrfach-DSPs (Adform/Criteo/Outbrain/Taboola/Meta) "
                 "durch Programmatic+Direct-Sold EU-Stack.",
    },
    {
        "name": "HERE Maps",
        "vendor": "HERE Technologies",
        "country": "DE",
        "covers": ["maps"],
        "notes": "Berliner Anbieter, professionelle Karten + Routing.",
    },
    {
        "name": "Vimeo Pro EU (oder self-hosted BunnyStream)",
        "vendor": "Vimeo / BunnyWay",
        "country": "Multi / SI",
        "covers": ["external_media"],
        "notes": "Ersetzt YouTube-Embeds + JW Player in einem Player.",
    },
    {
        "name": "LamaPoll",
        "vendor": "Lamano GmbH",
        "country": "DE",
        "covers": ["survey"],
        "notes": "DSGVO-Surveys aus Berlin. Ersetzt Qualtrics / Psyma.",
    },
]


# ─── Analyse ─────────────────────────────────────────────────────────
|
||
|
||
def analyze(vendors: Iterable[dict], company_tier: str = "enterprise") -> dict:
|
||
"""Main entry. Returns categorised view + redundancies + EU options.
|
||
|
||
`company_tier` (starter|professional|enterprise|premier) steuert die
|
||
Cost-Range so dass z.B. fuer einen DAX-Konzern nicht starter-Preise
|
||
in der unteren Schranke landen.
|
||
"""
|
||
by_cat: dict[str, list[dict]] = defaultdict(list)
|
||
for v in vendors:
|
||
cat = classify_vendor(v.get("name", ""))
|
||
by_cat[cat].append(v)
|
||
|
||
# Redundancies: any category with ≥2 vendors (excl. site-internal cats)
|
||
skip_redundancy_cats = {"site_infra", "site_feature", "consent_management",
|
||
"auth", "other"}
|
||
all_vendors_list = list(vendors)
|
||
redundancies: list[dict] = []
|
||
for cat, vs in by_cat.items():
|
||
if cat in skip_redundancy_cats or len(vs) < 2:
|
||
continue
|
||
red = {
|
||
"category": cat,
|
||
"category_label": _CATEGORY_LABEL.get(cat, cat),
|
||
"count": len(vs),
|
||
"vendors": [v.get("name", "") for v in vs],
|
||
"consolidation_hint": _CONSOLIDATION_HINT.get(cat, ""),
|
||
}
|
||
red.update(_estimate_savings_for_redundancy(
|
||
red, all_vendors_list, company_tier))
|
||
redundancies.append(red)
|
||
redundancies.sort(key=lambda r: -(r.get("estimated_saving_year_eur") or [0, 0])[1])
|
||
|
||
    # EU alternatives lookup
    eu_alternatives: list[dict] = []
    seen = set()
    for v in vendors:
        name = v.get("name") or ""
        n_lower = name.lower()
        for k, alts in _EU_ALTERNATIVES.items():
            if k in n_lower and k not in seen:
                eu_alternatives.append({
                    "current_vendor": name,
                    "current_recipient_type": v.get("recipient_type", ""),
                    "matched_key": k,
                    "alternatives": alts,
                })
                seen.add(k)
                break

    # Multi-function tool recommendations: only if the customer has vendors
    # across the categories the tool covers
    present_cats = set(by_cat.keys())
    multi_function = []
    for tool in _MULTI_FUNCTION_TOOLS:
        covered_here = [c for c in tool["covers"] if c in present_cats]
        if len(covered_here) >= 2:
            # Collect vendor names instead of just summing per-category
            # counts, so that duplicates are counted once.
            unique_vendors: set[str] = set()
            for c in covered_here:
                for v in by_cat[c]:
                    unique_vendors.add(v.get("name", ""))
            multi_function.append({
                **tool,
                "replaces_categories": covered_here,
                "potential_replacements": len(unique_vendors),
            })
    multi_function.sort(key=lambda t: -t["potential_replacements"])

    total_current_low = sum((r.get("current_estimate_year_eur") or [0, 0])[0] for r in redundancies)
    total_current_high = sum((r.get("current_estimate_year_eur") or [0, 0])[1] for r in redundancies)
    total_saving_low = sum((r.get("estimated_saving_year_eur") or [0, 0])[0] for r in redundancies)
    total_saving_high = sum((r.get("estimated_saving_year_eur") or [0, 0])[1] for r in redundancies)

    # Both bounds use the same denominator (midpoint of the current
    # estimate); otherwise the upper bound explodes when current_low is
    # small. Capped at 95%.
    if total_current_high:
        mid = (total_current_low + total_current_high) / 2
        saving_pct = (
            f"{min(95, int(100 * total_saving_low / mid))}–"
            f"{min(95, int(100 * total_saving_high / mid))}%"
        )
    else:
        saving_pct = "n/a"

    return {
        "summary": {
            "total_vendors": len(all_vendors_list),
            "distinct_categories": len([c for c in by_cat if c != "other"]),
            "redundancy_count": len(redundancies),
            "eu_alternative_count": len(eu_alternatives),
            "consolidation_potential": sum(r["count"] - 1 for r in redundancies),
            "estimated_current_year_eur": [total_current_low, total_current_high],
            "estimated_saving_year_eur": [total_saving_low, total_saving_high],
            "estimated_saving_pct": saving_pct,
            "cost_disclaimer": (
                "Estimated range based on public list prices (Gartner, Forrester 2025). "
                "Contract prices may be 30-70% lower (volume discounts, group terms, "
                "bundling). Values serve as a basis for discussion with procurement, NOT as a quote."
            ),
        },
        "by_category": {cat: [v.get("name", "") for v in vs]
                        for cat, vs in by_cat.items()},
        "redundancies": redundancies,
        "eu_alternatives": eu_alternatives,
        "multi_function_tools": multi_function,
    }


_CATEGORY_LABEL = {
    "web_analytics": "Web Analytics",
    "advertising": "Advertising / Retargeting",
    "tag_management": "Tag Management",
    "marketing_automation": "Marketing Automation",
    "personalisation": "Personalisation",
    "external_media": "External Media (Video)",
    "maps": "Maps / Geo",
    "cdn": "CDN",
    "cloud_infra": "Cloud Infrastructure",
    "monitoring": "Performance Monitoring",
    "crm": "CRM",
    "chat": "Chat / Support",
    "captcha": "Bot Protection",
    "lead_tracking": "Lead Tracking",
    "survey": "Surveys",
    "social_aggregator": "Social Media Aggregation",
    "consent_management": "Consent Management",
    "auth": "Authentication",
    "site_infra": "Own Infrastructure",
    "site_feature": "Own Features",
    "other": "Other",
}

_CONSOLIDATION_HINT = {
    "web_analytics": "Multiple analytics tools usually collect redundant data. One tool is enough; Matomo (DE) is the GDPR standard.",
    "advertising": "Advertising/retargeting pixels are often interchangeable. Focusing on 2-3 channels reduces third-country risk.",
    "external_media": "Use multiple video embeds only where functionally necessary. Self-hosted (BunnyStream/Vimeo) reduces tracking.",
    "maps": "One map solution is enough. HERE Maps (DE) as an EU alternative to Google Maps.",
    "cdn": "One CDN+performance stack is enough. IONOS or Bunny combine several functions.",
    "marketing_automation": "Marketing cloud + a separate e-mail tool are often duplication; SAP CX or CleverReach alone can suffice.",
    "chat": "One chat system is enough. Userlike (DE) replaces the Genesys stack.",
    "monitoring": "RUM + APM can be bundled in one tool (Dynatrace EU or self-hosted Sentry).",
    "survey": "One survey platform is enough: LamaPoll (DE) or Mapp.",
}