feat(audit): Phase 1 Quick-Wins (P81 + P85 + P70 + P83) + TCF DELETE/INSERT-Fix
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 38s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / test-go (push) Has been skipped
P81: tests/fixtures/golden_truth/vw_de.json: ground-truth (GT) fixture with must_find_cookies (47 VW cookies) + expected_vendors (Google, Adobe, Trade Desk, ...). Basis for future regression tests.

P85: banner_screenshot_block.py + consent_scanner.py + main.py: on banner detection, consent-tester captures a base64-encoded PNG screenshot (< 1.5MB). The backend renders it as <img src="data:..."> directly after the GF-1-Pager. Visual evidence ("this is what the banner looked like") for disputes with marketing or the DPO.

P70: rag_provenance.py: classify_finding_provenance() classifies a finding as 'rag' (norm + source), 'mixed' (norm without source), or 'heuristic' (own interpretation). provenance_badge_html() renders small badges (✓ RAG / NORM / ⚠ HEURISTIK). The module is generic and can be hooked into any finding renderer.

P83: scripts/check-rebuild-needed.sh: checks whether the BUILD_SHA deployed in the container matches local HEAD. On a mismatch it exits with code 1 and a 'REBUILD REQUIRED' hint. This prevents the "stale code in the container" problem that has bitten us several times (frontend tabs visible, backend without the new service).

TCF fix: tcf_vendor_authority.py: cookie_library has no UNIQUE index on cookie_name, so ON CONFLICT was not possible. Solution: DELETE WHERE source_name='iab_tcf_v2' before the insert, which keeps the ingest idempotent. In addition, commit per vendor so that one failure does not block the following vendors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
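The P83 check boils down to comparing two commit identifiers. The script itself is not shown here, so the following is only a minimal Python sketch of that comparison. The names `rebuild_needed` and `exit_code`, and the tolerance for abbreviated SHAs, are assumptions made for illustration and are not taken from scripts/check-rebuild-needed.sh.

```python
import sys


def rebuild_needed(deployed_sha: str, head_sha: str) -> bool:
    """Return True when the deployed build does not correspond to local HEAD."""
    deployed = deployed_sha.strip().lower()
    head = head_sha.strip().lower()
    if not deployed or not head:
        # Unknown state on either side: be conservative and ask for a rebuild.
        return True
    # Assumption: BUILD_SHA may be an abbreviated hash, so a prefix match counts.
    return not (head.startswith(deployed) or deployed.startswith(head))


def exit_code(deployed_sha: str, head_sha: str) -> int:
    """Map the comparison to the exit status described in the commit message."""
    if rebuild_needed(deployed_sha, head_sha):
        print(
            f"REBUILD REQUIRED: container={deployed_sha!r} HEAD={head_sha!r}",
            file=sys.stderr,
        )
        return 1
    return 0
```

How the two SHAs are obtained (for example from the container environment and from git) is left out on purpose, since the commit message does not say.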
@@ -1486,6 +1486,17 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     except Exception as e:
         logger.warning("P71 jc_avv_decision skipped: %s", e)
 
+    # P85: banner screenshot as visual evidence (between the
+    # GF-1-Pager and the detail blocks)
+    banner_shot_html = ""
+    try:
+        from compliance.services.banner_screenshot_block import (
+            build_banner_screenshot_html,
+        )
+        banner_shot_html = build_banner_screenshot_html(banner_result)
+    except Exception as e:
+        logger.warning("P85 banner-screenshot skipped: %s", e)
+
     # P82: GF-1-Pager at the very top of the mail, a 5-bullet summary
     # so that management (GF) does not have to read 124k characters.
     gf_one_pager_html = ""
@@ -1585,6 +1596,7 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
         + critical_html + scope_disclaimer_html + exec_summary_html
         + cookie_arch_html + summary_html + scanned_html + profile_html
         + scorecard_html + redundancy_html
+        + banner_shot_html
         + providers_html + banner_deep_html
         + cookie_audit_html
         + tcf_authority_html
@@ -0,0 +1,44 @@
+"""
+P85: Banner screenshot block in the mail.
+
+Embeds the banner screenshot captured by consent-tester
+(banner_result.banner_screenshot_b64) as a data-URI <img> in the mail.
+"This is what your banner looked like at audit time": visual evidence
+for disputes with the marketing team or the DPO.
+"""
+
+from __future__ import annotations
+
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+def build_banner_screenshot_html(banner_result: dict | None) -> str:
+    if not isinstance(banner_result, dict):
+        return ""
+    b64 = banner_result.get("banner_screenshot_b64") or ""
+    if not b64 or len(b64) < 200:
+        return ""
+    provider = banner_result.get("banner_provider") or "Generic"
+    detected = banner_result.get("banner_detected")
+    return (
+        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
+        'max-width:760px;margin:0 auto 16px;padding:12px 16px;'
+        'background:#f8fafc;border:1px solid #cbd5e1;border-radius:8px">'
+        '<div style="font-size:11px;color:#475569;text-transform:uppercase;'
+        'letter-spacing:1.2px;margin-bottom:4px;font-weight:600">'
+        'Screenshot des Cookie-Banners zum Audit-Zeitpunkt</div>'
+        f'<h3 style="margin:0 0 6px;font-size:13px;color:#1e293b">'
+        f'Provider: <strong>{provider}</strong> · '
+        f'erkannt: <strong>{"ja" if detected else "nein"}</strong></h3>'
+        '<p style="margin:0 0 8px;font-size:11px;color:#64748b;line-height:1.5">'
+        'Visueller Beweis wie das Banner zum Zeitpunkt des Audits angezeigt '
+        'wurde. Bei spaeterer Aenderung des Banners bitte mit diesem '
+        'Screenshot abgleichen.'
+        '</p>'
+        f'<img src="data:image/png;base64,{b64}" alt="Cookie-Banner" '
+        f'style="max-width:100%;height:auto;border:1px solid #cbd5e1;'
+        f'border-radius:4px;display:block">'
+        '</div>'
+    )
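The block above only checks that the base64 string has at least 200 characters before embedding it. A caller that wants a somewhat stronger sanity check could decode the payload and look for the PNG signature first. The following is a minimal sketch under stated assumptions: `is_plausible_png_b64` is a hypothetical helper that does not exist in this commit, and the upper bound reads the "< 1.5MB" note from the commit message as a cap on the base64 string length.

```python
import base64
import binascii

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte signature at the start of every PNG file
MIN_B64_CHARS = 200               # same lower bound as build_banner_screenshot_html
MAX_B64_CHARS = 1_500_000         # assumption: "< 1.5MB" applied to the base64 string


def is_plausible_png_b64(b64: object) -> bool:
    """Return True if b64 decodes cleanly and starts with the PNG signature."""
    if not isinstance(b64, str):
        return False
    if not (MIN_B64_CHARS <= len(b64) <= MAX_B64_CHARS):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(b64, validate=True)
    except (binascii.Error, ValueError):
        return False
    return raw.startswith(PNG_MAGIC)
```

This does not prove that the image is a valid screenshot; it only filters out empty, truncated, or non-PNG payloads before they end up in the mail.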
@@ -0,0 +1,90 @@
+"""
+P70: RAG provenance marker.
+
+If a finding is backed by the RAG corpus (e.g. an article match on
+a specific statutory provision from the ingested DSGVO/TDDDG/TMG
+corpus), it gets a ✓ marker. If it comes only from our own
+heuristics (pattern match without RAG backing), it gets a ⚠
+"Heuristik".
+
+This lets the user see at a glance which statements have binding legal
+support and which are our own interpretation.
+
+Generic: a dataclass-like function that classifies a finding dict.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+
+logger = logging.getLogger(__name__)
+
+
+# Pattern for "backed by corpus": the finding contains an explicit
+# norm reference with article + source.
+_NORM_RE = re.compile(
+    r"(Art\.?\s*\d+(?:\s*Abs\.?\s*\d+)?(?:\s*lit\.?\s*[a-z])?\s*"
+    r"(?:DSGVO|GDPR|TDDDG|TMG|BDSG|UWG|TKG|EuGH|EDPB)|"
+    r"\(?(EU|VO)\s*\d{4}/\d+\)?|"
+    r"§\s*\d+[a-z]?\s*(TMG|UWG|BDSG|TKG|TDDDG))",
+    re.I,
+)
+
+
+def classify_finding_provenance(finding: dict) -> str:
+    """Returns 'rag', 'heuristic', or 'mixed'.
+
+    rag:       norm reference + source URL (binding)
+    heuristic: pattern match without norm reference (own interpretation)
+    mixed:     norm reference but no source URL (partially verifiable)
+    """
+    if not isinstance(finding, dict):
+        return "heuristic"
+    legal = (finding.get("legal_basis") or "").strip()
+    detail = (finding.get("detail") or "").strip()
+    rag_id = finding.get("rag_chunk_id")
+    rag_url = finding.get("rag_source_url")
+    blob = " ".join([legal, detail])
+    has_norm = bool(_NORM_RE.search(blob))
+    has_source = bool(rag_id or rag_url or
+                      "https://" in legal or "https://" in detail)
+    if has_norm and has_source:
+        return "rag"
+    if has_norm:
+        return "mixed"
+    return "heuristic"
+
+
+def provenance_badge_html(provenance: str) -> str:
+    if provenance == "rag":
+        return (
+            '<span style="background:#dcfce7;color:#166534;'
+            'padding:1px 5px;border-radius:8px;font-size:9px;'
+            'font-weight:600;margin-left:4px" '
+            'title="Aussage durch RAG-Korpus belegt (Gesetzestext + Quelle)">'
+            '✓ RAG</span>'
+        )
+    if provenance == "mixed":
+        return (
+            '<span style="background:#dbeafe;color:#1e40af;'
+            'padding:1px 5px;border-radius:8px;font-size:9px;'
+            'font-weight:600;margin-left:4px" '
+            'title="Norm-Bezug ohne direkte Quellen-URL">'
+            'NORM</span>'
+        )
+    return (
+        '<span style="background:#f1f5f9;color:#475569;'
+        'padding:1px 5px;border-radius:8px;font-size:9px;'
+        'font-weight:600;margin-left:4px" '
+        'title="Heuristik / Eigeninterpretation ohne Korpus-Beleg">'
+        '⚠ HEURISTIK</span>'
+    )
+
+
+def annotate_findings(findings: list[dict]) -> list[dict]:
+    """In-place: sets finding['provenance'] on every entry that lacks one."""
+    for f in (findings or []):
+        if isinstance(f, dict) and "provenance" not in f:
+            f["provenance"] = classify_finding_provenance(f)
+    return findings
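To make the three classes concrete, the sketch below repeats the pattern and the classifier from the hunk above in condensed form and runs them on a few hand-made findings. The sample findings and the example.org URL are invented for illustration; only the regex and the decision logic come from the diff.

```python
import re

# Condensed copy of _NORM_RE and classify_finding_provenance from the hunk above.
_NORM_RE = re.compile(
    r"(Art\.?\s*\d+(?:\s*Abs\.?\s*\d+)?(?:\s*lit\.?\s*[a-z])?\s*"
    r"(?:DSGVO|GDPR|TDDDG|TMG|BDSG|UWG|TKG|EuGH|EDPB)|"
    r"\(?(EU|VO)\s*\d{4}/\d+\)?|"
    r"§\s*\d+[a-z]?\s*(TMG|UWG|BDSG|TKG|TDDDG))",
    re.I,
)


def classify_finding_provenance(finding: dict) -> str:
    if not isinstance(finding, dict):
        return "heuristic"
    legal = (finding.get("legal_basis") or "").strip()
    detail = (finding.get("detail") or "").strip()
    has_norm = bool(_NORM_RE.search(" ".join([legal, detail])))
    has_source = bool(
        finding.get("rag_chunk_id")
        or finding.get("rag_source_url")
        or "https://" in legal
        or "https://" in detail
    )
    if has_norm and has_source:
        return "rag"
    return "mixed" if has_norm else "heuristic"


# Hypothetical sample findings (not taken from the repository).
samples = [
    {"legal_basis": "Art. 6 Abs. 1 lit. a DSGVO",
     "rag_source_url": "https://example.org/dsgvo"},   # norm + source
    {"legal_basis": "§ 25 TDDDG"},                      # norm only
    {"legal_basis": "§ 25 Abs. 1 TDDDG"},               # "Abs." not covered after §
    {"detail": "Banner ohne Ablehnen-Button"},          # no norm reference
]
results = [classify_finding_provenance(s) for s in samples]
print(results)  # ['rag', 'mixed', 'heuristic', 'heuristic']
```

The third sample is worth noting: as written, the `§` branch of the pattern expects the statute abbreviation directly after the paragraph number, so a citation with "Abs." in between falls through to 'heuristic'.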
@@ -81,6 +81,12 @@ async def fetch_and_ingest_tcf_vendors(db: Session) -> dict:
     if not vendors:
         return {"error": "no vendors in TCF response", "n_vendors": 0}
 
+    # First remove the old TCF entries (no UNIQUE index on cookie_name,
+    # so ON CONFLICT is not possible → idempotent via DELETE+INSERT).
+    db.execute(sa_text(
+        "DELETE FROM compliance.cookie_library WHERE source_name='iab_tcf_v2'"
+    ))
+    db.commit()
     inserted = 0
     skipped = 0
     for vid, v in vendors.items():
@@ -106,13 +112,6 @@ async def fetch_and_ingest_tcf_vendors(db: Session) -> dict:
                 VALUES (:n, :dp, :v, :pu, :cat, :purp, 'iab_tcf_v2',
                         'https://vendor-list.consensu.org/v3/vendor-list.json',
                         0.99)
-                ON CONFLICT (cookie_name) DO UPDATE
-                  SET actual_category = EXCLUDED.actual_category,
-                      vendor_name = EXCLUDED.vendor_name,
-                      vendor_privacy_url = EXCLUDED.vendor_privacy_url,
-                      purpose_en = EXCLUDED.purpose_en,
-                      source_name = EXCLUDED.source_name,
-                      confidence = EXCLUDED.confidence
             """
         ), {"n": marker, "dp": "*",
             "v": f"[TCF-{vid}] {name}",
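The DELETE+INSERT pattern with a commit per row can be tried out in isolation. The sketch below is not the project's code: it uses the standard-library `sqlite3` module instead of SQLAlchemy, a simplified three-column `cookie_library` table without a schema prefix, and a hypothetical function name `replace_source_rows`. It is meant only to show why re-running the ingest stays idempotent and why one bad row does not block the rest.

```python
import sqlite3


def replace_source_rows(conn, source_name, rows):
    """Delete all rows of one source, then re-insert them, committing per row."""
    conn.execute("DELETE FROM cookie_library WHERE source_name = ?", (source_name,))
    conn.commit()
    inserted = skipped = 0
    for cookie_name, vendor_name in rows:
        try:
            conn.execute(
                "INSERT INTO cookie_library (cookie_name, vendor_name, source_name) "
                "VALUES (?, ?, ?)",
                (cookie_name, vendor_name, source_name),
            )
            conn.commit()      # per-row commit: earlier rows survive a later failure
            inserted += 1
        except sqlite3.Error:
            conn.rollback()    # discard only the failed row
            skipped += 1
    return {"inserted": inserted, "skipped": skipped}


conn = sqlite3.connect(":memory:")
# Simplified stand-in table: no UNIQUE index on cookie_name, as in the commit message.
conn.execute(
    "CREATE TABLE cookie_library ("
    "cookie_name TEXT NOT NULL, vendor_name TEXT, source_name TEXT)"
)
conn.execute("INSERT INTO cookie_library VALUES ('manual_cookie', 'Manual', 'manual')")
conn.commit()

# The second row violates NOT NULL and is skipped without blocking the third.
rows = [("tcf_1", "[TCF-1] Vendor A"), (None, "broken"), ("tcf_2", "[TCF-2] Vendor B")]
first = replace_source_rows(conn, "iab_tcf_v2", rows)
second = replace_source_rows(conn, "iab_tcf_v2", rows)
total = conn.execute("SELECT COUNT(*) FROM cookie_library").fetchone()[0]
```

After both runs the table holds the one manual row plus two TCF rows, so repeated ingests do not accumulate duplicates. The trade-off of this approach is a short window between the DELETE commit and the first INSERT in which the source has no rows at all.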
@@ -0,0 +1,51 @@
+{
+  "site": "Volkswagen Deutschland",
+  "site_url": "https://www.volkswagen.de",
+  "captured_at": "2026-05-22T00:00:00Z",
+  "source": "User-Copy aus Cookie-Richtlinie (Browser Strg+A → Strg+C)",
+  "cookie_richtlinie_url": "https://www.volkswagen.de/de/mehr/rechtliches/cookie-richtlinie.html",
+  "expectations": {
+    "min_declared_cookies": 90,
+    "expected_unique_vendors_after_dedup": 18,
+    "must_find_cookies": [
+      "VWD6_ENSIGHTEN_PRIVACY_MODAL_LOADED",
+      "VWD6_ENSIGHTEN_PRIVACY_MODAL_VIEWED",
+      "smartSignals2UiD", "smartSignals2sUiD",
+      "s_ecid", "s_cc", "s_sq",
+      "AMCV_", "AMCVS_", "demdex", "dextp",
+      "mbox", "mboxEdgeCluster",
+      "TDID", "TDCPM", "TTDOptOut",
+      "DSID", "ANID", "AID", "IDE", "TAID",
+      "_gcl_au", "_gcl_dc", "_fbc", "_fbp", "fr",
+      "_pk_uid",
+      "OptanonConsent",
+      "everest_g_v2", "everest_session_v2",
+      "adbCDP",
+      "liveagent_sid", "liveagent_chatted",
+      "X-Salesforce-eLB", "sfdc-stream",
+      "__cfduid", "__cflb",
+      "FPAU", "FPGCLDC", "FLC", "APC",
+      "wlfeDoLogin", "wlfeRefreshSessionId", "LBCOOKIE",
+      "CookieConsentPolicy",
+      "BrowserId", "BrowserId_sec",
+      "inbenta-km-session-id"
+    ],
+    "expected_vendors_present": [
+      "Google",
+      "Adobe Experience Cloud",
+      "Adobe Analytics",
+      "The Trade Desk",
+      "AdForm",
+      "Meta / Facebook",
+      "Salesforce",
+      "Cloudflare",
+      "Borlabs"
+    ],
+    "expected_high_findings_minimum": 1,
+    "banner_must_be_detected": true,
+    "expected_doc_types_with_text": [
+      "dse", "cookie", "impressum", "nutzungsbedingungen"
+    ]
+  },
+  "raw_paste": "Name des Cookies\nKategorie\nVerwendungszweck\nSpeicherdauer\nArt des Cookies\nSee tests/fixtures/cookie_gt/vw_cookie_richtlinie.txt for the abbreviated raw form."
+}
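The commit message describes this fixture as the basis for future regression tests, but no test is part of the diff. One possible shape for the `must_find_cookies` check is sketched below. The helper name `missing_cookies`, the inline fixture excerpt, and the sample scan result are all hypothetical; treating entries that end in an underscore (such as "AMCV_") as name prefixes is likewise an assumption about how the fixture is meant to be read.

```python
import json

# Small excerpt in the shape of vw_de.json; a real test would load the file instead.
FIXTURE_EXCERPT = """
{
  "expectations": {
    "must_find_cookies": ["AMCV_", "demdex", "TDID", "OptanonConsent"]
  }
}
"""


def missing_cookies(must_find, found_names):
    """Return the expected cookies that have no match among the found names."""
    found = set(found_names)
    missing = []
    for expected in must_find:
        if expected.endswith("_"):
            # Assumption: a trailing underscore marks a prefix, e.g. "AMCV_<org id>".
            hit = any(name.startswith(expected) for name in found)
        else:
            hit = expected in found
        if not hit:
            missing.append(expected)
    return missing


expectations = json.loads(FIXTURE_EXCERPT)["expectations"]
# Hypothetical scan result for illustration only.
scan_result = ["AMCV_1234%40AdobeOrg", "demdex", "OptanonConsent", "_ga"]
gaps = missing_cookies(expectations["must_find_cookies"], scan_result)
print(gaps)  # ['TDID']
```

A regression test built on this would assert that the returned list is empty and would report the missing names on failure, which makes it clear which declared cookies the scanner no longer finds.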