fix(audit): parse_flat_cookie_text fuer VW-Style Flat-Tabellen
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m4s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 19s
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m4s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 19s
VW Cookie-Doc liefert die Tabelle als FLACHEN Text ohne Spalten-Trenner: 'IDE Tracking Cookies (Marketing) Beschreibung 13 Monate Permanent TAID Tracking Cookies (Marketing) ...' parse_flat_cookie_text matched mit Regex: NAME [Tracking|Session|Funktional|...] Cookies ... [13 Monate|Session|Permanent] Backend faellt bei parse_cookie_table=[] auf parse_flat zurueck. Damit holen wir aus dem 65k VW Cookie-Doc ~30-50 Cookies + Vendors deterministisch, auch wenn der HTML-Table-DOM-Extract leer ist (was passiert wenn die Tabelle aus mehreren append-Code-Pfaden geladen wird). Bonus: _extract_dom_tables Helper in dsi_discovery.py vorbereitet fuer spaeteres Einhaengen an allen 7 DiscoveredDSI.append-Stellen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -862,16 +862,19 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
|
||||
except Exception as e:
|
||||
logger.warning("html_table parse failed: %s", e)
|
||||
|
||||
# B — cookies_table_parser auch auf gecrawltem Cookie-Text
|
||||
# (nicht nur bei User-Paste). Wenn der Crawler Tab/Pipe-
|
||||
# getrennte Tabellen-Reihen erhalten hat, parsen wir sie
|
||||
# deterministisch und mergen die Vendor-Records.
|
||||
# B — cookies_table_parser auch auf gecrawltem Cookie-Text.
|
||||
# Erst Standard-Parse (Tab/Pipe-getrennt). Wenn der nichts
|
||||
# findet (kein Separator), Flat-Pattern-Parse fuer Sites wie
|
||||
# VW die ihre Tabelle als flachen Text liefern.
|
||||
if cookie_text and len(cookie_text) >= 500:
|
||||
try:
|
||||
from compliance.services.cookies_table_parser import (
|
||||
parse_cookie_table as _parse_ct,
|
||||
parse_flat_cookie_text as _parse_flat,
|
||||
)
|
||||
crawled_table_vendors = _parse_ct(cookie_text)
|
||||
if not crawled_table_vendors:
|
||||
crawled_table_vendors = _parse_flat(cookie_text)
|
||||
if crawled_table_vendors:
|
||||
existing = {(v.get("name") or "").strip().lower()
|
||||
for v in cmp_vendors}
|
||||
|
||||
Reference in New Issue
Block a user