fix(audit): P90-B1: do not discard cmp_payloads when DSE text is short

BMW run 9811eba1 had 0 cmp_vendors even though consent-tester captured
ePaaS 4x (~393KB). Root cause in _fetch_text, line 1254:

  if merged and len(merged.split()) > 100:
      return merged, cmp_payloads

If the DSE (privacy policy) or cookie URL returns only short SPA-shell text
(BMW: 10 words), the threshold condition is not met and the code falls
through to the HTTP fallback, which returns text, []. The CMP payloads
captured earlier (ePaaS JSON with all vendor data) are discarded entirely.
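
The fall-through can be reproduced in isolation. The sketch below is a
simplified stand-in for the pre-fix decision, not the actual _fetch_text
body: select_before_fix and http_fallback_text are hypothetical names, the
HTTP fallback is reduced to a plain string argument, and the payload dict is
a placeholder whose structure is assumed.

```python
def select_before_fix(merged, cmp_payloads, http_fallback_text):
    """Simplified model of the pre-fix decision (hypothetical helper)."""
    # Payloads are only returned when the merged text passes the threshold.
    if merged and len(merged.split()) > 100:
        return merged, cmp_payloads
    # Otherwise the HTTP fallback text is returned with an empty payload list.
    return http_fallback_text, []


# A 10-word SPA shell, as in the BMW case, plus one captured payload.
shell_text = " ".join(["word"] * 10)
payloads = [{"source": "ePaaS"}]  # placeholder payload, structure assumed
text, kept = select_before_fix(shell_text, payloads, "fallback text")
```

With the short shell text, kept is an empty list although a payload was
passed in, which matches the 0 cmp_vendors observed in the BMW run.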

Fix: before the HTTP fallback, check whether cmp_payloads are present. If so,
return them together with the (short) text or the reconstructed
cmp_cookie_text, even when the 100-word threshold is not met.
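
A minimal sketch of the intended decision order, assuming the text and
payloads are already available as plain values. select_after_fix is a
hypothetical helper, not a function in the codebase, and returning None here
merely stands for "continue to the HTTP fallback".

```python
from typing import Optional


def select_after_fix(
    merged: Optional[str],
    cmp_cookie_text: Optional[str],
    cmp_payloads: list,
) -> Optional[tuple]:
    """Return (text, payloads), or None to continue to the HTTP fallback."""
    # Unchanged path: enough merged text to pass the 100-word threshold.
    if merged and len(merged.split()) > 100:
        return merged, cmp_payloads
    # New path: keep captured payloads even for short or missing text.
    if cmp_payloads:
        return merged or cmp_cookie_text or "", cmp_payloads
    # No payloads captured: nothing to preserve.
    return None
```

The trailing or "" guards the case where both merged and cmp_cookie_text are
empty or None, so the caller always receives a string alongside the payloads.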

Effect: the BMW VVT table is populated (~90 vendors from the ePaaS JSON).
Mercedes and other OEMs are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-21 11:29:41 +02:00
parent 7938e377b6
commit 86b4a263d2
@@ -1256,6 +1256,20 @@ async def _fetch_text(url: str, doc_type: str = "") -> tuple[str, list[dict]]:
            logger.info("Merged %d docs from %s (%d words)",
                        len(texts), url, len(merged.split()))
            return merged, cmp_payloads
        # P90 bug fix: even when the DSE text is too short for the 100-word
        # threshold, do NOT discard the captured CMP payloads.
        # BMW bug: DSE returns a 10-word SPA shell, but the ePaaS JSON
        # (393KB) was captured. The backend needs it for
        # extract_vendors_from_payloads (VVT table).
        if cmp_payloads:
            # Resolve the text first so a None cmp_cookie_text cannot raise.
            fallback_text = merged or cmp_cookie_text or ""
            logger.info(
                "P90: keeping %d CMP payloads for %s despite "
                "short text (%d words), returning before HTTP fallback",
                len(cmp_payloads), url, len(fallback_text.split()),
            )
            return fallback_text, cmp_payloads
    except Exception as e:
        logger.warning("Consent-tester fetch failed for %s: %s", url, e)