fix(vvt): correct ePaaS schema mapping + category-aware scoring

The first BMW VVT table rendered all 24 providers at 20% score because
the ePaaS extractor was reading the wrong field names. Actual schema is
nested: providers[].processings[].persistences[], NOT providers[] alone.

Correct ePaaS schema (verified against bmw.com/epaas/.../de_DE.epaas.json):
  Provider:    {id, name, description, processings[]}
  Processing:  {id, name, description, categoryId, optOutLink,
                privacyPolicyLink, persistences[]}
  Persistence: {id, name, domain, type, expiry, description}

Two structural changes:

1. One row per processing (not provider). BMW has 26 providers but ~91
   processings spread across them (Adobe alone has ACMProcessing,
   AdobeAnalytics, AdobeCampaign, AdobeTargetAnalytics, AdobeTargetPers.).
   The cookie widget displays each processing separately — VVT now
   mirrors that. Display name format: 'Provider Name — Processing Name'.

2. Read optOutLink/privacyPolicyLink from PROCESSING (where they live),
   not provider. Persistences flatten to cookies[] with name + expiry +
   description.
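
A minimal sketch of the flattening described above, assuming the
schema listed in this message. The helper name flatten_epaas, the
row key category_id, and the use of the processing description as
'purpose' are illustrative assumptions, not the extractor's actual code:

```python
from typing import Any


def flatten_epaas(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one VVT row per processing (hypothetical helper)."""
    rows: list[dict[str, Any]] = []
    for provider in doc.get("providers") or []:
        for processing in provider.get("processings") or []:
            # Persistences flatten to cookies[] with name + expiry + description.
            cookies = [
                {
                    "name": p.get("name"),
                    "expiry": p.get("expiry"),
                    "description": p.get("description"),
                }
                for p in processing.get("persistences") or []
            ]
            rows.append({
                # Display name: provider name, dash separator, processing name.
                "name": f"{provider.get('name', '')} \u2014 {processing.get('name', '')}",
                # Assumption: the processing description serves as the purpose.
                "purpose": processing.get("description"),
                "category_id": processing.get("categoryId"),
                # Links are read from the processing, not the provider.
                "opt_out_url": processing.get("optOutLink"),
                "privacy_policy_url": processing.get("privacyPolicyLink"),
                "cookies": cookies,
            })
    return rows
```

A provider with N processings yields N rows, so the row count follows
the processings rather than the providers.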

Plus category mapping:
  advertising -> marketing
  strictlyNecessary -> necessary
  statistics -> statistics
  functional -> functional
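
The mapping above could be expressed as a plain lookup table. This is
a sketch only: the names EPAAS_TO_VVT_CATEGORY and map_category are
hypothetical, and passing unknown ids through unchanged is an
assumption, not confirmed behaviour:

```python
from typing import Optional

# ePaaS categoryId -> VVT category, as listed in this commit message.
EPAAS_TO_VVT_CATEGORY = {
    "advertising": "marketing",
    "strictlyNecessary": "necessary",
    "statistics": "statistics",
    "functional": "functional",
}


def map_category(category_id: Optional[str]) -> Optional[str]:
    """Map an ePaaS categoryId to a VVT category (hypothetical helper)."""
    if category_id is None:
        return None
    # Assumption: ids missing from the table are passed through unchanged.
    return EPAAS_TO_VVT_CATEGORY.get(category_id, category_id)
```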

Category-aware scoring (cookie_link_validator.score_vendors):
- 'necessary' (technically required cookies, §25 Abs. 2 TDDDG): no opt-out
  required, no country required. Weight shifts to purpose (20) and
  cookie disclosure (50; essential cookies must list names + expiry);
  the privacy policy URL counts for 10 instead of 15.
- All other categories: opt-out URL still mandatory; a missing opt-out
  sets the 'no_opt_out_url' flag and scores 0 of that block's 25 points.
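
For orientation, the per-block weights from the diff in this commit
can be tabulated as below. The helper block_weights is hypothetical
and only restates the numbers; the name block is left out because its
weight is not visible in this diff:

```python
from typing import Optional


def block_weights(category: Optional[str]) -> dict[str, int]:
    """Max points per scoring block, excluding the name block (hypothetical)."""
    # Same category test as score_vendors uses in this commit.
    is_necessary = (category or "").lower() in ("necessary", "strictlynecessary")
    if is_necessary:
        # No country or opt-out block; cookie disclosure carries most weight.
        return {"purpose": 20, "country": 0, "opt_out": 0,
                "privacy": 10, "cookies": 50}
    return {"purpose": 20, "country": 10, "opt_out": 25,
            "privacy": 15, "cookies": 15}


necessary_total = sum(block_weights("strictlyNecessary").values())
marketing_total = sum(block_weights("marketing").values())
```

Partial credit scales with the block weight: a broken privacy URL
earns weight // 3 and named cookies without expiry earn weight // 2.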

Expected BMW result after this fix:
- ~91 rows (Adobe Analytics, Adform Retargeting, Akamai Infrastructure,
  AWS, ..., plus ~60 strictlyNecessary processings)
- Marketing rows with an opt-out URL present → ~75-90%
- Necessary rows with cookie+expiry → ~85-95%
- Rows missing fields → still flagged
Benjamin Admin
2026-05-17 11:19:31 +02:00
parent 189918b043
commit 6c7d4c7552
2 changed files with 98 additions and 58 deletions
@@ -173,8 +173,16 @@ async def validate_vendor_urls(vendors: list[dict]) -> list[dict]:


 def score_vendors(vendors: list[dict]) -> list[dict]:
-    """Compute per-vendor compliance score (0-100) and flags. Mutates."""
+    """Compute per-vendor compliance score (0-100) and flags. Mutates.
+
+    Category-aware: 'necessary' (technisch erforderliche Cookies) do NOT
+    require an opt-out — §25 Abs. 2 TDDDG. Penalising them for that would
+    be wrong; instead we require precise purpose + cookie disclosure.
+    """
     for v in vendors:
+        is_necessary = (v.get("category") or "").lower() in (
+            "necessary", "strictlynecessary",
+        )
         score = 0
         max_score = 0
         flags: list[str] = []
@@ -186,50 +194,56 @@ def score_vendors(vendors: list[dict]) -> list[dict]:
         else:
             flags.append("no_name")

-        # Purpose — 15
-        max_score += 15
+        # Purpose — 20
+        max_score += 20
         if v.get("purpose"):
-            score += 15
+            score += 20
         else:
             flags.append("no_purpose")

-        # Country (3rd-country transfer relevance) — 10
-        max_score += 10
-        if v.get("country"):
-            score += 10
-        else:
-            flags.append("no_country")
+        # Country (3rd-country transfer relevance) — only relevant for
+        # consent-based categories (otherwise irrelevant flag noise)
+        if not is_necessary:
+            max_score += 10
+            if v.get("country"):
+                score += 10
+            else:
+                flags.append("no_country")

-        # Opt-Out URL present + reachable — 25
-        max_score += 25
-        if not v.get("opt_out_url"):
-            flags.append("no_opt_out_url")
-        elif v.get("opt_out_ok") is False:
-            flags.append("broken_opt_out")
-            score += 5  # at least they tried
-        else:
-            score += 25
+        # Opt-Out URL — only for consent-based categories (§25 TDDDG)
+        if not is_necessary:
+            max_score += 25
+            if not v.get("opt_out_url"):
+                flags.append("no_opt_out_url")
+            elif v.get("opt_out_ok") is False:
+                flags.append("broken_opt_out")
+                score += 5
+            else:
+                score += 25

-        # Privacy policy URL present + reachable — 15
-        max_score += 15
+        # Privacy policy URL — relevant for all, but weight lower for necessary
+        weight = 10 if is_necessary else 15
+        max_score += weight
         if not v.get("privacy_policy_url"):
             flags.append("no_privacy_url")
         elif v.get("privacy_ok") is False:
             flags.append("broken_privacy_url")
-            score += 5
+            score += weight // 3
         else:
-            score += 15
+            score += weight

-        # Cookies disclosed (names + expiry) — 15
-        max_score += 15
+        # Cookies disclosed (names + expiry) — higher weight for necessary
+        # (since that's mostly what they offer in lieu of opt-out)
+        weight = 50 if is_necessary else 15
+        max_score += weight
         cookies = v.get("cookies") or []
         if cookies:
             named = sum(1 for c in cookies if c.get("name"))
             with_expiry = sum(1 for c in cookies if c.get("expiry"))
             if named >= 1 and with_expiry >= 1:
-                score += 15
+                score += weight
             elif named >= 1:
-                score += 8
+                score += weight // 2
                 flags.append("cookies_no_expiry")
             else:
                 flags.append("cookies_no_names")