feat(audit): Phase 1 Quick-Wins (P81 + P85 + P70 + P83) + TCF DELETE/INSERT-Fix

P81 - tests/fixtures/golden_truth/vw_de.json:
Golden-truth (GT) fixture with must_find_cookies (48 VW cookies) + expected_vendors_present
(Google, Adobe, Trade Desk, ...). Basis for future regression tests.

P85 - banner_screenshot_block.py + consent_scanner.py + main.py:
On banner detection, consent-tester takes a base64 PNG screenshot
(< 1.5 MB). The backend renders it as <img src="data:..."> directly after
the management (GF) one-pager. Visual evidence ('this is what the banner
looked like') for disputes with marketing or the DPO.

P70 - rag_provenance.py:
classify_finding_provenance() classifies a finding as 'rag'
(norm + source), 'mixed' (norm without source) or 'heuristic' (own
interpretation). provenance_badge_html() renders small badges
(✓ RAG / NORM / ⚠ HEURISTIK). The module is generic and can be hooked
into any finding renderer.

P83 - scripts/check-rebuild-needed.sh:
Checks whether the BUILD_SHA deployed in the container matches the local
HEAD. On mismatch it exits with status 1 and a 'REBUILD REQUIRED' hint.
Prevents the 'old code in the container' problem that has caught us
several times (frontend tabs visible, backend without the new service).

TCF fix - tcf_vendor_authority.py:
cookie_library has no UNIQUE index on cookie_name, so ON CONFLICT
was not possible. Solution: DELETE WHERE source_name='iab_tcf_v2' before
the insert, which keeps the ingest idempotent. Plus a per-vendor commit so
that one failure does not block the following vendors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-22 08:24:46 +02:00
parent 6c35bcf116
commit 8cbb513e2c
8 changed files with 269 additions and 7 deletions
@@ -1486,6 +1486,17 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
    except Exception as e:
        logger.warning("P71 jc_avv_decision skipped: %s", e)
    # P85: banner screenshot as visual evidence (between the
    # management (GF) one-pager and the detail blocks)
    banner_shot_html = ""
    try:
        from compliance.services.banner_screenshot_block import (
            build_banner_screenshot_html,
        )
        banner_shot_html = build_banner_screenshot_html(banner_result)
    except Exception as e:
        logger.warning("P85 banner-screenshot skipped: %s", e)
    # P82: management (GF) one-pager at the very top of the mail, a
    # 5-bullet summary so that management does not have to read 124k chars.
    gf_one_pager_html = ""
@@ -1585,6 +1596,7 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
+ critical_html + scope_disclaimer_html + exec_summary_html
+ cookie_arch_html + summary_html + scanned_html + profile_html
+ scorecard_html + redundancy_html
+ banner_shot_html
+ providers_html + banner_deep_html
+ cookie_audit_html
+ tcf_authority_html
@@ -0,0 +1,44 @@
"""
P85: banner screenshot block in the mail.
Embeds the banner screenshot captured by consent-tester
(banner_result.banner_screenshot_b64) into the mail as a data-URI <img>.
"This is what your banner looked like at audit time": visual evidence for
disputes with the marketing team or the DPO.
"""
from __future__ import annotations
import html
import logging
logger = logging.getLogger(__name__)
def build_banner_screenshot_html(banner_result: dict | None) -> str:
    if not isinstance(banner_result, dict):
        return ""
    b64 = banner_result.get("banner_screenshot_b64") or ""
    # Skip missing or implausibly short payloads.
    if not b64 or len(b64) < 200:
        return ""
    # Escape the provider name before interpolating it into the HTML.
    provider = html.escape(str(banner_result.get("banner_provider") or "Generic"))
    detected = banner_result.get("banner_detected")
    return (
        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
        'max-width:760px;margin:0 auto 16px;padding:12px 16px;'
        'background:#f8fafc;border:1px solid #cbd5e1;border-radius:8px">'
        '<div style="font-size:11px;color:#475569;text-transform:uppercase;'
        'letter-spacing:1.2px;margin-bottom:4px;font-weight:600">'
        'Screenshot des Cookie-Banners zum Audit-Zeitpunkt</div>'
        f'<h3 style="margin:0 0 6px;font-size:13px;color:#1e293b">'
        f'Provider: <strong>{provider}</strong> · '
        f'erkannt: <strong>{"ja" if detected else "nein"}</strong></h3>'
        '<p style="margin:0 0 8px;font-size:11px;color:#64748b;line-height:1.5">'
        'Visueller Beweis wie das Banner zum Zeitpunkt des Audits angezeigt '
        'wurde. Bei spaeterer Aenderung des Banners bitte mit diesem '
        'Screenshot abgleichen.'
        '</p>'
        f'<img src="data:image/png;base64,{b64}" alt="Cookie-Banner" '
        f'style="max-width:100%;height:auto;border:1px solid #cbd5e1;'
        f'border-radius:4px;display:block">'
        '</div>'
    )
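The length check in build_banner_screenshot_html() only rejects very short payloads. The following is a minimal sketch of a stricter pre-check, assuming the payload is meant to be a base64-encoded PNG. `looks_like_png_b64` is a hypothetical helper and not part of this commit.

```python
import base64
import binascii

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png_b64(b64: str) -> bool:
    """Hypothetical helper: True if b64 is valid base64 that decodes to PNG data."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(b64, validate=True)
    except (binascii.Error, ValueError):
        return False
    return raw.startswith(PNG_SIGNATURE)
```

A caller could run this check before building the data URI and fall back to an empty string when it fails.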
@@ -0,0 +1,90 @@
"""
P70: RAG provenance marker.
If a finding is backed by the RAG corpus (e.g. an article match on a
specific statutory provision from the ingested DSGVO/TDDDG/TMG corpus),
it gets a ✓ marker. If it only comes from our heuristics (pattern match
without RAG backing), it gets a ⚠ "Heuristik" marker.
This lets the user see at a glance which statements are backed by binding
legal sources and which are our own interpretation.
Generic: a dataclass-like function that classifies a finding dict.
"""
from __future__ import annotations
import logging
import re
logger = logging.getLogger(__name__)
# Pattern for "backed by the corpus": the finding contains an explicit
# norm reference with article + source.
_NORM_RE = re.compile(
    r"(Art\.?\s*\d+(?:\s*Abs\.?\s*\d+)?(?:\s*lit\.?\s*[a-z])?\s*"
    r"(?:DSGVO|GDPR|TDDDG|TMG|BDSG|UWG|TKG|EuGH|EDPB)|"
    r"\(?(EU|VO)\s*\d{4}/\d+\)?|"
    r"§\s*\d+[a-z]?\s*(TMG|UWG|BDSG|TKG|TDDDG))",
    re.I,
)
def classify_finding_provenance(finding: dict) -> str:
    """Returns 'rag', 'heuristic', or 'mixed'.
    rag: norm reference + source (RAG chunk ID or URL), binding
    heuristic: pattern match without a norm reference (own interpretation)
    mixed: norm reference but no source (partially verifiable)
    """
    if not isinstance(finding, dict):
        return "heuristic"
    legal = (finding.get("legal_basis") or "").strip()
    detail = (finding.get("detail") or "").strip()
    rag_id = finding.get("rag_chunk_id")
    rag_url = finding.get("rag_source_url")
    blob = " ".join([legal, detail])
    has_norm = bool(_NORM_RE.search(blob))
    has_source = bool(rag_id or rag_url or
                      "https://" in legal or "https://" in detail)
    if has_norm and has_source:
        return "rag"
    if has_norm:
        return "mixed"
    return "heuristic"
def provenance_badge_html(provenance: str) -> str:
    if provenance == "rag":
        return (
            '<span style="background:#dcfce7;color:#166534;'
            'padding:1px 5px;border-radius:8px;font-size:9px;'
            'font-weight:600;margin-left:4px" '
            'title="Aussage durch RAG-Korpus belegt (Gesetzestext + Quelle)">'
            '✓ RAG</span>'
        )
    if provenance == "mixed":
        return (
            '<span style="background:#dbeafe;color:#1e40af;'
            'padding:1px 5px;border-radius:8px;font-size:9px;'
            'font-weight:600;margin-left:4px" '
            'title="Norm-Bezug ohne direkte Quellen-URL">'
            'NORM</span>'
        )
    return (
        '<span style="background:#f1f5f9;color:#475569;'
        'padding:1px 5px;border-radius:8px;font-size:9px;'
        'font-weight:600;margin-left:4px" '
        'title="Heuristik / Eigeninterpretation ohne Korpus-Beleg">'
        '⚠ HEURISTIK</span>'
    )
def annotate_findings(findings: list[dict]) -> list[dict]:
    """In place: sets finding['provenance'] on every entry that lacks it."""
    for f in (findings or []):
        if isinstance(f, dict) and "provenance" not in f:
            f["provenance"] = classify_finding_provenance(f)
    return findings
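The three provenance classes can be checked with a few sample findings. The sketch below copies the regex from rag_provenance.py and uses a condensed stand-in for classify_finding_provenance() so that it runs standalone. The sample findings and the name `classify` are made up for illustration.

```python
import re

# Copied from rag_provenance.py so the snippet runs without the package.
NORM_RE = re.compile(
    r"(Art\.?\s*\d+(?:\s*Abs\.?\s*\d+)?(?:\s*lit\.?\s*[a-z])?\s*"
    r"(?:DSGVO|GDPR|TDDDG|TMG|BDSG|UWG|TKG|EuGH|EDPB)|"
    r"\(?(EU|VO)\s*\d{4}/\d+\)?|"
    r"§\s*\d+[a-z]?\s*(TMG|UWG|BDSG|TKG|TDDDG))",
    re.I,
)


def classify(finding: dict) -> str:
    """Condensed stand-in for classify_finding_provenance()."""
    legal = (finding.get("legal_basis") or "").strip()
    detail = (finding.get("detail") or "").strip()
    has_norm = bool(NORM_RE.search(f"{legal} {detail}"))
    has_source = bool(
        finding.get("rag_chunk_id")
        or finding.get("rag_source_url")
        or "https://" in legal
        or "https://" in detail
    )
    if has_norm and has_source:
        return "rag"
    return "mixed" if has_norm else "heuristic"


# Hypothetical sample findings, one per interesting case.
SAMPLES = {
    "norm + chunk id": {"legal_basis": "Art. 6 Abs. 1 lit. a DSGVO",
                        "rag_chunk_id": "chunk-1"},
    "norm only": {"legal_basis": "§ 25 TDDDG"},
    "no norm": {"detail": "Banner blocks the page"},
    "source without norm": {"detail": "see https://example.com"},
    # "(EU)" with a closing parenthesis before the number is not matched.
    "regulation number": {"legal_basis": "VO (EU) 2016/679"},
}

for label, finding in SAMPLES.items():
    print(f"{label}: {classify(finding)}")
```

The last sample shows a limit of the pattern as written: a regulation cited as "VO (EU) 2016/679" falls through to 'heuristic', because the second alternative expects the digits to follow "EU" or "VO" directly (apart from whitespace).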
@@ -81,6 +81,12 @@ async def fetch_and_ingest_tcf_vendors(db: Session) -> dict:
    if not vendors:
        return {"error": "no vendors in TCF response", "n_vendors": 0}
    # Remove the old TCF entries first (there is no UNIQUE index on
    # cookie_name, so ON CONFLICT is not possible; DELETE + INSERT keeps
    # the ingest idempotent).
    db.execute(sa_text(
        "DELETE FROM compliance.cookie_library WHERE source_name='iab_tcf_v2'"
    ))
    db.commit()
    inserted = 0
    skipped = 0
    for vid, v in vendors.items():
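The DELETE-then-INSERT pattern with a commit per row can be tried out in isolation. This is a minimal sketch that uses an in-memory SQLite database instead of the project's SQLAlchemy session. The reduced table layout and the names `make_demo_db` and `ingest` are assumptions for illustration only.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical, reduced cookie_library table without a UNIQUE index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cookie_library ("
        "cookie_name TEXT, vendor_name TEXT NOT NULL, source_name TEXT)"
    )
    conn.execute("INSERT INTO cookie_library VALUES ('manual-1', 'X', 'manual')")
    conn.commit()
    return conn


def ingest(conn: sqlite3.Connection, source: str, rows: list) -> tuple:
    """Replace all rows of `source`; returns (inserted, skipped)."""
    # Step 1: drop everything this source wrote earlier, so a re-run
    # cannot create duplicates even without a UNIQUE index.
    conn.execute("DELETE FROM cookie_library WHERE source_name = ?", (source,))
    conn.commit()
    inserted = skipped = 0
    for name, vendor in rows:
        try:
            conn.execute(
                "INSERT INTO cookie_library (cookie_name, vendor_name, source_name) "
                "VALUES (?, ?, ?)",
                (name, vendor, source),
            )
            # Step 2: commit per row so one bad row does not undo the others.
            conn.commit()
            inserted += 1
        except sqlite3.Error:
            conn.rollback()
            skipped += 1
    return inserted, skipped


conn = make_demo_db()
rows = [("tcf-1", "A"), ("tcf-2", None), ("tcf-3", "C")]  # None violates NOT NULL
print(ingest(conn, "iab_tcf_v2", rows))
print(ingest(conn, "iab_tcf_v2", rows))  # second run leaves the same row count
```

One trade-off of this design: because the DELETE is committed before the inserts start, a crash midway leaves the source only partially re-ingested until the next run.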
@@ -106,13 +112,6 @@ async def fetch_and_ingest_tcf_vendors(db: Session) -> dict:
VALUES (:n, :dp, :v, :pu, :cat, :purp, 'iab_tcf_v2',
'https://vendor-list.consensu.org/v3/vendor-list.json',
0.99)
ON CONFLICT (cookie_name) DO UPDATE
SET actual_category = EXCLUDED.actual_category,
vendor_name = EXCLUDED.vendor_name,
vendor_privacy_url = EXCLUDED.vendor_privacy_url,
purpose_en = EXCLUDED.purpose_en,
source_name = EXCLUDED.source_name,
confidence = EXCLUDED.confidence
"""
), {"n": marker, "dp": "*",
"v": f"[TCF-{vid}] {name}",
@@ -0,0 +1,51 @@
{
"site": "Volkswagen Deutschland",
"site_url": "https://www.volkswagen.de",
"captured_at": "2026-05-22T00:00:00Z",
"source": "User-Copy aus Cookie-Richtlinie (Browser Strg+A → Strg+C)",
"cookie_richtlinie_url": "https://www.volkswagen.de/de/mehr/rechtliches/cookie-richtlinie.html",
"expectations": {
"min_declared_cookies": 90,
"expected_unique_vendors_after_dedup": 18,
"must_find_cookies": [
"VWD6_ENSIGHTEN_PRIVACY_MODAL_LOADED",
"VWD6_ENSIGHTEN_PRIVACY_MODAL_VIEWED",
"smartSignals2UiD", "smartSignals2sUiD",
"s_ecid", "s_cc", "s_sq",
"AMCV_", "AMCVS_", "demdex", "dextp",
"mbox", "mboxEdgeCluster",
"TDID", "TDCPM", "TTDOptOut",
"DSID", "ANID", "AID", "IDE", "TAID",
"_gcl_au", "_gcl_dc", "_fbc", "_fbp", "fr",
"_pk_uid",
"OptanonConsent",
"everest_g_v2", "everest_session_v2",
"adbCDP",
"liveagent_sid", "liveagent_chatted",
"X-Salesforce-eLB", "sfdc-stream",
"__cfduid", "__cflb",
"FPAU", "FPGCLDC", "FLC", "APC",
"wlfeDoLogin", "wlfeRefreshSessionId", "LBCOOKIE",
"CookieConsentPolicy",
"BrowserId", "BrowserId_sec",
"inbenta-km-session-id"
],
"expected_vendors_present": [
"Google",
"Adobe Experience Cloud",
"Adobe Analytics",
"The Trade Desk",
"AdForm",
"Meta / Facebook",
"Salesforce",
"Cloudflare",
"Borlabs"
],
"expected_high_findings_minimum": 1,
"banner_must_be_detected": true,
"expected_doc_types_with_text": [
"dse", "cookie", "impressum", "nutzungsbedingungen"
]
},
"raw_paste": "Name des Cookies\nKategorie\nVerwendungszweck\nSpeicherdauer\nArt des Cookies\nSee tests/fixtures/cookie_gt/vw_cookie_richtlinie.txt for the abbreviated raw form."
}
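A regression test built on this fixture could compare scan output against the `expectations` block. The sketch below is one possible shape, not the project's test code. It assumes that entries ending in "_" (such as "AMCV_") are meant as name prefixes and that all other entries must match exactly. `missing_cookies` and `missing_vendors` are hypothetical names.

```python
from __future__ import annotations


def missing_cookies(must_find: list[str], found_names: list[str]) -> list[str]:
    """Return the must_find entries that no scanned cookie satisfies.

    Assumption: an entry ending in "_" is a prefix, anything else is an
    exact cookie name.
    """
    found = set(found_names)
    missing = []
    for expected in must_find:
        if expected.endswith("_"):
            hit = any(name.startswith(expected) for name in found)
        else:
            hit = expected in found
        if not hit:
            missing.append(expected)
    return missing


def missing_vendors(expected: list[str], found: list[str]) -> list[str]:
    """Case-insensitive comparison of expected vs. detected vendor names."""
    found_lower = {v.lower() for v in found}
    return [v for v in expected if v.lower() not in found_lower]
```

A test would load vw_de.json, run the scan, and assert that both functions return empty lists.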
@@ -53,6 +53,7 @@ class ScanResponse(BaseModel):
cmp_payloads: list[dict] = [] # P48: raw CMP JSON-payloads (Usercentrics/OneTrust/...) captured during scan
vendor_details: list[dict] = [] # P50: per-vendor detail-modal-extracts (Beschreibung/Cookies/Opt-Out/Privacy)
cookies_detailed: list[dict] = [] # P59b: full cookie details for behavior-validation (name,value,domain,expires,phase,declared_category)
banner_screenshot_b64: str = "" # P85: base64 PNG of the banner (initial view)
@app.get("/health")
@@ -133,6 +134,7 @@ async def scan_consent(req: ScanRequest):
cmp_payloads=result.cmp_payloads, # P48
vendor_details=result.vendor_details, # P50
cookies_detailed=result.cookies_detailed, # P59b
banner_screenshot_b64=result.banner_screenshot_b64, # P85
)
@@ -77,6 +77,10 @@ class ConsentTestResult:
    # for behavior-validation in backend. Implicit declared_category:
    # before/reject phase = essential (site claims), accept = any.
    cookies_detailed: list = field(default_factory=list)
    # P85: base64 PNG screenshot of the banner before the first click.
    # The backend embeds it as an <img> in the mail: visual evidence of
    # "this is what the banner looked like at audit time".
    banner_screenshot_b64: str = ""
async def run_consent_test(
@@ -196,6 +200,17 @@ async def run_consent_test(
    result.banner_text_violations = banner_violations["violations"]
    result.banner_has_impressum_link = banner_violations["has_impressum"]
    result.banner_has_dse_link = banner_violations["has_dse"]
    # P85: visual evidence for the mail.
    try:
        import base64 as _b64
        png = await page_a.screenshot(
            full_page=False, type="png", timeout=10000,
        )
        if png and len(png) < 1_500_000:  # raw PNG < 1.5 MB
            result.banner_screenshot_b64 = _b64.b64encode(png).decode("ascii")
            logger.info("P85: banner screenshot captured (%d bytes)", len(png))
    except Exception as _se:
        logger.warning("P85: banner screenshot failed: %s", _se)
    await ctx_a.close()
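The size cap in this hunk applies to the raw PNG bytes, not to the base64 string that ends up in the mail. The sketch below isolates that step and shows the resulting string length. `encode_screenshot` and `b64_length` are hypothetical helper names, not functions from the commit.

```python
import base64

MAX_PNG_BYTES = 1_500_000  # same limit as in the hunk above


def encode_screenshot(png: bytes, max_bytes: int = MAX_PNG_BYTES) -> str:
    """Return the PNG as ASCII base64, or "" if it is empty or too large."""
    if not png or len(png) >= max_bytes:
        return ""
    return base64.b64encode(png).decode("ascii")


def b64_length(raw_len: int) -> int:
    """Length of the padded base64 encoding of raw_len bytes."""
    return 4 * ((raw_len + 2) // 3)


# The largest accepted screenshot grows by about one third when encoded.
print(b64_length(MAX_PNG_BYTES - 1))
```

Since base64 emits 4 characters per 3 input bytes, a screenshot just under the limit yields a string of about 2 MB, which is worth keeping in mind for mail size limits.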
@@ -0,0 +1,49 @@
#!/usr/bin/env bash
# P83: prevents the "old code in the container" bug.
#
# Compares the git SHA deployed in the container with the current
# source SHA. If they differ: exit 1 with a hint to build/recreate.
#
# Example invocations:
# ./scripts/check-rebuild-needed.sh backend-compliance
# ./scripts/check-rebuild-needed.sh admin-compliance
# ./scripts/check-rebuild-needed.sh consent-tester
#
# CI usage: after git push, before the first health check.
# Locally: claude / a dev can run it via a pre-merge hook.
#
# Prerequisite: the container has a BUILD_SHA env var (set in the Dockerfile via
# ARG BUILD_SHA + ENV BUILD_SHA=$BUILD_SHA). If it is empty: warning.
set -e
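SERVICE="${1:-backend-compliance}"
# Keep the part before the first "-": backend-compliance → bp-compliance-backend.
# (${SERVICE#*-} would strip the prefix instead and yield bp-compliance-compliance.)
CONTAINER="bp-compliance-${SERVICE%%-*}"
if [[ "$SERVICE" == "consent-tester" ]]; then
  CONTAINER="bp-compliance-consent-tester"
fi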
DOCKER="${DOCKER:-/usr/local/bin/docker}"
deployed_sha=$($DOCKER exec "$CONTAINER" sh -c 'echo "${BUILD_SHA:-unknown}"' 2>/dev/null || echo "container-down")
local_sha=$(git rev-parse --short HEAD)
if [[ "$deployed_sha" == "container-down" ]]; then
  echo "❌ Container $CONTAINER is not running"
  exit 2
fi
if [[ "$deployed_sha" == "unknown" ]]; then
  echo "⚠️ $CONTAINER has no BUILD_SHA env — cannot verify."
  echo " Add to Dockerfile: ARG BUILD_SHA / ENV BUILD_SHA=\$BUILD_SHA"
  exit 0
fi
if [[ "$deployed_sha" != "$local_sha"* && "$local_sha" != "$deployed_sha"* ]]; then
  echo "$CONTAINER is on commit $deployed_sha, local is $local_sha"
  echo " REBUILD REQUIRED:"
  echo " docker compose build $SERVICE && docker compose up -d --no-deps --force-recreate $SERVICE"
  exit 1
fi
echo "$CONTAINER ($deployed_sha) matches local ($local_sha)"
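The comparison in the script accepts a match when either SHA is a prefix of the other, so a short and a long form of the same commit are treated as equal. Here is a Python sketch of that decision logic, written only to make the three exit codes explicit. The function names are hypothetical, and the SHAs are the commit and parent IDs shown on this page.

```python
def shas_match(deployed: str, local: str) -> bool:
    """True if one SHA is a prefix of the other (short vs. long form)."""
    return deployed.startswith(local) or local.startswith(deployed)


def exit_code(deployed: str, local: str) -> int:
    """Mirror of the script's exit codes."""
    if deployed == "container-down":
        return 2  # container is not running
    if deployed == "unknown":
        return 0  # no BUILD_SHA in the container: warn only
    return 0 if shas_match(deployed, local) else 1  # 1 = rebuild required


print(exit_code("8cbb513e2c", "8cbb513"))   # same commit, different lengths
print(exit_code("6c35bcf116", "8cbb513"))   # container still on the parent
```

The prefix test in both directions matters because `git rev-parse --short HEAD` may return a different length than the BUILD_SHA baked into the image.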