The first BMW VVT table rendered all 24 providers at 20% score because
the ePaaS extractor was reading the wrong field names. Actual schema is
nested: providers[].processings[].persistences[], NOT providers[] alone.
Correct ePaaS schema (verified against bmw.com/epaas/.../de_DE.epaas.json):
Provider: {id, name, description, processings[]}
Processing: {id, name, description, categoryId, optOutLink,
privacyPolicyLink, persistences[]}
Persistence: {id, name, domain, type, expiry, description}
Two structural changes:
1. One row per processing (not provider). BMW has 26 providers but ~91
processings spread across them (Adobe alone has ACMProcessing,
AdobeAnalytics, AdobeCampaign, AdobeTargetAnalytics, AdobeTargetPers.).
The cookie widget displays each processing separately — VVT now
mirrors that. Display name format: 'Provider Name — Processing Name'.
2. Read optOutLink/privacyPolicyLink from PROCESSING (where they live),
not provider. Persistences flatten to cookies[] with name + expiry +
description.
Plus category mapping:
advertising -> marketing
strictlyNecessary -> necessary
statistics -> statistics
functional -> functional
Category-aware scoring (cookie_link_validator.score_vendors):
- 'necessary' (technisch erforderliche, §25 Abs. 2 TDDDG): no opt-out
required, no country required. Score weight shifts to purpose +
cookie disclosure (essential cookies must list names + expiry).
- All other categories: opt-out URL still mandatory; missing opt-out
flags 'no_opt_out_url' and zeros that block of points.
Expected BMW result after this fix:
- ~91 rows (Adobe Analytics, Adform Retargeting, Akamai Infrastructure,
AWS, ..., plus ~60 strictlyNecessary processings)
- Marketing rows with present opt-out → ~75-90%
- Necessary rows with cookie+expiry → ~85-95%
- Rows missing fields → still flagged
When a known CMP (ePaaS, OneTrust) renders the cookie policy, we now
extract structured vendor records, probe their opt-out + privacy URLs,
score each vendor (0-100), and append a 'VVT-Vorschlag' table to the
compliance email — one row per vendor, sortable by compliance score.
consent-tester:
- DSIDiscoveryResult.cmp_payloads: surfaces raw CMP JSON to callers
- DSIDiscoveryResponse: new cmp_payloads field
- discover_dsi_documents sets cmp_payloads from cmp_capture
- cmp_library/{epaas,onetrust}.py: new extract_vendors(d) returning
list[VendorRecord]
backend:
- _fetch_text() now returns (text, cmp_payloads) tuple
- doc_entries store cmp_payloads per doc (mostly cookie)
- _autodiscover_missing forwards homepage payloads to the cookie entry
- New module vendor_extractor.py: dispatches ePaaS/OneTrust/generic
schemas; dedupes vendors across multiple payloads
- cookie_link_validator.py extended with validate_vendor_urls(vendors)
and score_vendors(vendors) — 0-100 score per vendor based on name,
purpose, country, opt-out reachable, privacy URL reachable, cookies
with names + expiry
- agent_doc_check_extras.build_vvt_table_html: renders the table
- Route appends VVT HTML after the provider list, before the
document-by-document report
- Response JSON gains cmp_vendors for future frontend rendering
Example for BMW: ~30 ePaaS providers → table with Name | Kategorie |
Sitz | Cookies | Opt-Out (✓/✗) | Privacy (✓/✗) | Score. Sorted by
score ascending so the worst-compliant vendors are at the top.
cookie_checks.py:
- cookie_names_listed: now also matches CMP placeholder notation
(BMW: 'Adfpc###', 'CT###') and 'Diese Datenverarbeitung verwendet die
folgenden Cookies oder ähnliche Technologien' as list-shape signal.
Cryptic vendor names like 'audience', 'adformfrpid' are accepted via
the surrounding markup, not by hard-coding each one.
- cookie_providers_named: new pattern 'Gesetzt von: <Firma>' (BMW/ePaaS
per-cookie vendor naming) + recognition of full legal-form names
(Adform A/S, BMW AG, Adobe Systems Software Ireland Limited).
- cookie_duration_values: now matches 'Ablauf: 1 Jahr' / 'Speicherdauer:
30 Tage' (BMW format) in addition to the legacy '<n> <unit>'.
New L1 + L2 checks for controller in cookie-policy:
- cookie_controller (L1): the cookie policy must name Verantwortlich(er)
- cookie_controller_address (L2): PLZ + Ort or address keywords
- cookie_controller_contact_or_link (L2): email/phone OR link back to
Datenschutzerklärung (the practical equivalent — BMW does this)
New L2 checks (parented under opt_out):
- cookie_optout_links: detects per-provider opt-out URLs in the text
- cookie_privacy_policy_links: per-provider privacy-policy URLs
New service: cookie_link_validator.py
- extract_links(text): pulls all https?://… URLs that follow 'Opt-Out
Link:' / 'Link zur Privacy Policy:' (deduped)
- validate_links(links): probes every URL concurrently (HEAD first, GET
fallback for 405/403). 10 parallel, 8s per request, 60s batch cap.
Returns reachable=True/False + status + final_url.
- build_check_items(): renders 2 CheckItems (opt-out + privacy-policy),
each pass if ALL links 2xx/3xx, fail with up-to-5 broken-link examples.
Hook in _check_single: doc_type=='cookie' triggers the validator after
regex+MC checks. Recomputes correctness with the new L2 items.
This addresses two concrete BMW observations:
1. BMW's per-cookie structure (Name + Zweck + Ablauf, Gesetzt von: …,
Opt-Out Link: …) now recognised → 'Konkrete Cookie-Namen aufgelistet'
and 'Konkrete Speicherdauern' should pass.
2. Defective opt-out URLs surface as compliance findings rather than
silently passing — Art. 7(3) DSGVO requires a working withdrawal
path per provider.