feat(tcf-vendors): GVL cache + vendor extraction + VVT mapping

Phase 1-2 of the closed quality loop:
- GVL cache (consent-tester/services/gvl_cache.py): downloads and caches
  IAB Global Vendor List with 24h TTL, resolves vendor IDs to names,
  purposes, policy URLs, retention, country
- Vendor extraction (consent_interceptor.py): extract_tcf_vendors()
  reads __tcfapi after accept phase, resolves via GVL
- Scan response: tcf_vendors field added to /scan endpoint
- VVT mapper (vendor_vvt_mapper.py): maps TCF vendors to VVT (records of
  processing activities) format with purpose labels, legal basis
  (Rechtsgrundlage), and third-country (Drittland) detection
- Vendor cross-check (banner_cookie_cross_check.py): checks all TCF
  vendors against the DSI (privacy notice) text for missing vendors and
  undocumented third-country transfers
- Compliance check integrates Step 3d: TCF vendors vs DSI
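
A minimal sketch of the 24h TTL idea behind the GVL cache, for illustration
only. It is not the code in gvl_cache.py: the class name `GvlCacheSketch`, the
injected `fetch` and `clock` callables, and the assumption that the vendor
list JSON keeps vendors under a "vendors" object keyed by stringified ID are
all hypothetical.

```python
import time
from typing import Callable, Optional


class GvlCacheSketch:
    """Hypothetical TTL cache around a vendor-list download."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        ttl_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # downloads and parses the vendor list
        self._ttl = ttl_seconds      # 24h by default
        self._clock = clock          # injectable so expiry can be tested
        self._data: Optional[dict] = None
        self._loaded_at = 0.0

    def _vendors(self) -> dict:
        now = self._clock()
        # Refetch on first use or once the cached copy is older than the TTL.
        if self._data is None or now - self._loaded_at >= self._ttl:
            self._data = self._fetch()
            self._loaded_at = now
        # Assumption: vendors live under "vendors", keyed by string ID.
        return self._data.get("vendors", {})

    def resolve(self, vendor_id: int) -> Optional[dict]:
        """Return the cached vendor entry, or None if the ID is unknown."""
        return self._vendors().get(str(vendor_id))
```

Passing the clock in as a parameter keeps the expiry logic testable without
waiting for real time to pass.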

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-12 18:18:50 +02:00
parent 979fe20ea5
commit c867478791
7 changed files with 392 additions and 2 deletions
@@ -143,3 +143,83 @@ def cross_check_banner_vs_cookie(
    logger.info("Cross-check: %d findings (%d services, %d tracking before)",
                len(findings), len(all_tracking), len(tracking_before))
    return findings


def cross_check_vendors_vs_dsi(
    vendors: list[dict],
    dsi_text: str,
) -> list[dict]:
    """Cross-check: are all TCF vendors documented in the DSI (privacy notice)?

    Checks per vendor:
      1. Is the vendor mentioned by name?
      2. Is third-country transfer documented (if non-EU)?

    Storage duration is not checked here.

    Returns a list of CheckItem-compatible dicts.
    """
    findings: list[dict] = []
    dsi_lower = dsi_text.lower()

    for v in vendors:
        name = v.get("name", "")
        name_lower = name.lower().strip()
        if not name_lower:
            continue

        # Check whether the vendor is mentioned in the DSI: full name,
        # name without spaces, or the first word of a multi-word name.
        mentioned = any(kw in dsi_lower for kw in [
            name_lower,
            name_lower.replace(" ", ""),
            name_lower.split()[0] if " " in name_lower else name_lower,
        ])
        if not mentioned:
            findings.append({
                "id": f"vendor-{v.get('vendor_id', name_lower[:20])}",
                "label": f"Verarbeiter '{name}' fehlt in DSI",
                "passed": False,
                "severity": "HIGH",
                "level": 2,
                "parent": None,
                "skipped": False,
                "matched_text": "",
                "hint": (
                    f"Der Cookie-Banner listet '{name}' als Verarbeiter "
                    f"({v.get('zweck_kurz', 'unbekannt')}), aber die DSI "
                    f"erwaehnt diesen Dienst nicht. Art. 13(1)(e) DSGVO "
                    f"verlangt die Benennung aller Empfaenger."
                ),
                "source": "vendor_cross_check",
            })

        # Check third-country transfer documentation (only for vendors
        # that the DSI mentions at all).
        if v.get("drittland") and mentioned:
            country = v.get("land", "Drittland")
            # Vendor-specific hit: the vendor name followed by "usa" or
            # "drittland" later on the same line. A plain `in` test would
            # treat a pattern such as name + ".*usa" as literal text.
            vendor_specific = any(
                name_lower in line and kw in line.split(name_lower, 1)[1]
                for line in dsi_lower.splitlines()
                for kw in ("usa", "drittland")
            )
            # Generic hit: a transfer mechanism named anywhere in the DSI.
            mechanism_named = any(kw in dsi_lower for kw in [
                "scc", "standardvertragsklausel", "data privacy framework",
                "angemessenheitsbeschluss",
            ])
            if not (vendor_specific or mechanism_named):
                findings.append({
                    "id": f"vendor-transfer-{v.get('vendor_id', '')}",
                    "label": f"Drittlandtransfer fuer '{name}' nicht dokumentiert",
                    "passed": False,
                    "severity": "MEDIUM",
                    "level": 2,
                    "parent": None,
                    "skipped": False,
                    "matched_text": "",
                    "hint": (
                        f"'{name}' verarbeitet Daten in {country} (ausserhalb EWR). "
                        f"Die DSI muss den Transfermechanismus benennen: "
                        f"SCC (Art. 46(2)(c)) oder DPF (Angemessenheitsbeschluss)."
                    ),
                    "source": "vendor_cross_check",
                })

    logger.info("Vendor cross-check: %d findings for %d vendors",
                len(findings), len(vendors))
    return findings
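
The third-country check depends on finding a transfer keyword near the vendor
name. Python's `in` operator on strings is a literal substring test, so a
regex-style pattern such as `name + ".*usa"` only matches text that actually
contains the characters `.*`. The sketch below shows the difference with a
same-line proximity helper. The helper name `followed_by_on_same_line` and the
sample text are hypothetical and not part of this commit.

```python
def followed_by_on_same_line(text: str, name: str, keyword: str) -> bool:
    """True if `keyword` appears after `name` on the same line (case-insensitive)."""
    name, keyword = name.lower(), keyword.lower()
    if not name:
        return False
    for line in text.lower().splitlines():
        # partition() splits at the first occurrence of the vendor name;
        # `found` is empty when the name is not on this line.
        _, found, rest = line.partition(name)
        if found and keyword in rest:
            return True
    return False


dsi = "Wir nutzen Acme Ads; die Daten werden in die USA uebermittelt."

# Literal substring test: looks for the exact characters "acme ads.*usa".
print("acme ads.*usa" in dsi.lower())                     # False

# Proximity test: vendor name followed by the keyword on the same line.
print(followed_by_on_same_line(dsi, "Acme Ads", "USA"))   # True
```

Plain string operations avoid having to escape vendor names that contain
regex metacharacters; `re.search` with `re.escape(name)` would be an
equivalent alternative.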