feat(audit): Phase 1 Quick-Wins (P81 + P85 + P70 + P83) + TCF DELETE/INSERT-Fix
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 38s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / test-go (push) Has been skipped
P81: tests/fixtures/golden_truth/vw_de.json: golden-truth (GT) fixture with must_find_cookies (47 VW cookies) + expected_vendors (Google, Adobe, Trade Desk, ...). Basis for future regression tests.

P85: banner_screenshot_block.py + consent_scanner.py + main.py: on banner detection, consent-tester takes a base64 PNG screenshot (< 1.5 MB). The backend renders it as <img src="data:..."> directly after the GF-1-Pager (management one-pager). Visual evidence of 'this is what the banner looked like' for disputes with marketing or the data protection officer (DSB).

P70: rag_provenance.py: classify_finding_provenance() classifies a finding as 'rag' (norm + source), 'mixed' (norm without source) or 'heuristic' (own interpretation). provenance_badge_html() renders small badges (✓ RAG / NORM / ⚠ HEURISTIK). The module is generic and can be hooked into any finding renderer.

P83: scripts/check-rebuild-needed.sh: checks whether the BUILD_SHA deployed in the container matches the local HEAD. On mismatch, exit 1 with a 'REBUILD REQUIRED' hint. Prevents the 'old code in the container' problem that has caught us several times (frontend tabs visible, backend without the new service).

TCF fix: tcf_vendor_authority.py: cookie_library has no UNIQUE index on cookie_name, so ON CONFLICT was impossible. Solution: DELETE WHERE source_name='iab_tcf_v2' before the insert. Idempotent. Plus a per-vendor commit so that one failure does not block the following vendors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1486,6 +1486,17 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     except Exception as e:
         logger.warning("P71 jc_avv_decision skipped: %s", e)

+    # P85: banner screenshot as visual evidence (between the
+    # GF-1-Pager and the detail blocks)
+    banner_shot_html = ""
+    try:
+        from compliance.services.banner_screenshot_block import (
+            build_banner_screenshot_html,
+        )
+        banner_shot_html = build_banner_screenshot_html(banner_result)
+    except Exception as e:
+        logger.warning("P85 banner-screenshot skipped: %s", e)
+
     # P82: GF-1-Pager at the very top of the mail, a 5-bullet summary
     # so that management (GF) does not have to read 124k characters.
     gf_one_pager_html = ""
@@ -1585,6 +1596,7 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
         + critical_html + scope_disclaimer_html + exec_summary_html
         + cookie_arch_html + summary_html + scanned_html + profile_html
         + scorecard_html + redundancy_html
+        + banner_shot_html
         + providers_html + banner_deep_html
         + cookie_audit_html
         + tcf_authority_html
@@ -0,0 +1,44 @@
+"""
+P85: banner screenshot block in the mail.
+
+Embeds the banner screenshot captured by consent-tester
+(banner_result.banner_screenshot_b64) into the mail as a data-URI <img>.
+"This is what your banner looked like at audit time": visual evidence
+for disputes with the marketing team or the data protection officer (DSB).
+"""
+
+from __future__ import annotations
+
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+def build_banner_screenshot_html(banner_result: dict | None) -> str:
+    if not isinstance(banner_result, dict):
+        return ""
+    b64 = banner_result.get("banner_screenshot_b64") or ""
+    if not b64 or len(b64) < 200:
+        return ""
+    provider = banner_result.get("banner_provider") or "Generic"
+    detected = banner_result.get("banner_detected")
+    return (
+        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
+        'max-width:760px;margin:0 auto 16px;padding:12px 16px;'
+        'background:#f8fafc;border:1px solid #cbd5e1;border-radius:8px">'
+        '<div style="font-size:11px;color:#475569;text-transform:uppercase;'
+        'letter-spacing:1.2px;margin-bottom:4px;font-weight:600">'
+        'Screenshot des Cookie-Banners zum Audit-Zeitpunkt</div>'
+        f'<h3 style="margin:0 0 6px;font-size:13px;color:#1e293b">'
+        f'Provider: <strong>{provider}</strong> · '
+        f'erkannt: <strong>{"ja" if detected else "nein"}</strong></h3>'
+        '<p style="margin:0 0 8px;font-size:11px;color:#64748b;line-height:1.5">'
+        'Visueller Beweis wie das Banner zum Zeitpunkt des Audits angezeigt '
+        'wurde. Bei spaeterer Aenderung des Banners bitte mit diesem '
+        'Screenshot abgleichen.'
+        '</p>'
+        f'<img src="data:image/png;base64,{b64}" alt="Cookie-Banner" '
+        f'style="max-width:100%;height:auto;border:1px solid #cbd5e1;'
+        f'border-radius:4px;display:block">'
+        '</div>'
+    )
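The block above interpolates `provider` into the markup as-is. If the provider name can ever originate from scanned page content (an assumption; the diff does not show where `banner_provider` is set), escaping it before interpolation would be a cheap safeguard. A minimal sketch using only the standard library; `render_provider_line` is a hypothetical helper and not part of this commit:

```python
from __future__ import annotations

import html


def render_provider_line(provider: str | None, detected: bool) -> str:
    """Hypothetical helper: builds the 'Provider / erkannt' line with the
    provider name HTML-escaped before it is placed into the markup."""
    safe_provider = html.escape(provider or "Generic", quote=True)
    return (
        f"Provider: <strong>{safe_provider}</strong> · "
        f"erkannt: <strong>{'ja' if detected else 'nein'}</strong>"
    )


if __name__ == "__main__":
    # Angle brackets in the provider name are rendered as text, not markup.
    print(render_provider_line("<b>Acme</b> CMP", True))
```

The base64 payload itself only contains characters from the base64 alphabet, so it does not need the same treatment.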
@@ -0,0 +1,90 @@
+"""
+P70: RAG provenance marker.
+
+If a finding is backed by the RAG corpus (e.g. an article match on
+a specific statutory provision from the ingested DSGVO/TDDDG/TMG
+corpus), it gets a ✓ marker. If it comes only from our own
+heuristic (pattern match without RAG backing), it gets a ⚠
+"Heuristik" marker.
+
+This lets the user see immediately which statements have binding
+legal backing and which are our own interpretation.
+
+Generic: a dataclass-like function that classifies a finding dict.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+
+logger = logging.getLogger(__name__)
+
+
+# Pattern for "backed by corpus": the finding contains an explicit
+# norm reference with article + source.
+_NORM_RE = re.compile(
+    r"(Art\.?\s*\d+(?:\s*Abs\.?\s*\d+)?(?:\s*lit\.?\s*[a-z])?\s*"
+    r"(?:DSGVO|GDPR|TDDDG|TMG|BDSG|UWG|TKG|EuGH|EDPB)|"
+    r"\(?(EU|VO)\s*\d{4}/\d+\)?|"
+    r"§\s*\d+[a-z]?\s*(TMG|UWG|BDSG|TKG|TDDDG))",
+    re.I,
+)
+
+
+def classify_finding_provenance(finding: dict) -> str:
+    """Returns 'rag', 'heuristic', or 'mixed'.
+
+    rag       : norm reference + source URL (binding)
+    heuristic : pattern match without norm reference (own interpretation)
+    mixed     : norm reference but no source URL (partially verifiable)
+    """
+    if not isinstance(finding, dict):
+        return "heuristic"
+    legal = (finding.get("legal_basis") or "").strip()
+    detail = (finding.get("detail") or "").strip()
+    rag_id = finding.get("rag_chunk_id")
+    rag_url = finding.get("rag_source_url")
+    blob = " ".join([legal, detail])
+    has_norm = bool(_NORM_RE.search(blob))
+    has_source = bool(rag_id or rag_url or
+                      "https://" in legal or "https://" in detail)
+    if has_norm and has_source:
+        return "rag"
+    if has_norm:
+        return "mixed"
+    return "heuristic"
+
+
+def provenance_badge_html(provenance: str) -> str:
+    if provenance == "rag":
+        return (
+            '<span style="background:#dcfce7;color:#166534;'
+            'padding:1px 5px;border-radius:8px;font-size:9px;'
+            'font-weight:600;margin-left:4px" '
+            'title="Aussage durch RAG-Korpus belegt (Gesetzestext + Quelle)">'
+            '✓ RAG</span>'
+        )
+    if provenance == "mixed":
+        return (
+            '<span style="background:#dbeafe;color:#1e40af;'
+            'padding:1px 5px;border-radius:8px;font-size:9px;'
+            'font-weight:600;margin-left:4px" '
+            'title="Norm-Bezug ohne direkte Quellen-URL">'
+            'NORM</span>'
+        )
+    return (
+        '<span style="background:#f1f5f9;color:#475569;'
+        'padding:1px 5px;border-radius:8px;font-size:9px;'
+        'font-weight:600;margin-left:4px" '
+        'title="Heuristik / Eigeninterpretation ohne Korpus-Beleg">'
+        '⚠ HEURISTIK</span>'
+    )
+
+
+def annotate_findings(findings: list[dict]) -> list[dict]:
+    """In place: sets finding['provenance'] on each entry that lacks one."""
+    for f in (findings or []):
+        if isinstance(f, dict) and "provenance" not in f:
+            f["provenance"] = classify_finding_provenance(f)
+    return findings
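A short usage sketch of the three outcomes. To keep it runnable on its own, the regex and the decision logic are condensed copies of the module above (`NORM_RE` and `classify` are local stand-ins); the sample findings and the chunk id are invented for illustration.

```python
import re

# Condensed copy of _NORM_RE from the module above.
NORM_RE = re.compile(
    r"(Art\.?\s*\d+(?:\s*Abs\.?\s*\d+)?(?:\s*lit\.?\s*[a-z])?\s*"
    r"(?:DSGVO|GDPR|TDDDG|TMG|BDSG|UWG|TKG|EuGH|EDPB)|"
    r"\(?(EU|VO)\s*\d{4}/\d+\)?|"
    r"§\s*\d+[a-z]?\s*(TMG|UWG|BDSG|TKG|TDDDG))",
    re.I,
)


def classify(finding: dict) -> str:
    """Stand-in for classify_finding_provenance(), same decision order."""
    legal = (finding.get("legal_basis") or "").strip()
    detail = (finding.get("detail") or "").strip()
    has_norm = bool(NORM_RE.search(f"{legal} {detail}"))
    has_source = bool(
        finding.get("rag_chunk_id")
        or finding.get("rag_source_url")
        or "https://" in legal
        or "https://" in detail
    )
    if has_norm and has_source:
        return "rag"
    return "mixed" if has_norm else "heuristic"


# Invented sample findings, one per interesting case.
samples = [
    {"legal_basis": "Art. 6 Abs. 1 lit. a DSGVO", "rag_chunk_id": "chunk-123"},
    {"legal_basis": "§ 25 TDDDG"},
    {"detail": "Banner hides the reject button"},
    {"detail": "see https://example.com"},  # source but no norm reference
]

if __name__ == "__main__":
    for s in samples:
        print(classify(s), s)
```

The last sample shows that a source URL alone is not enough: without a norm reference the finding stays `heuristic`.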
@@ -81,6 +81,12 @@ async def fetch_and_ingest_tcf_vendors(db: Session) -> dict:
     if not vendors:
         return {"error": "no vendors in TCF response", "n_vendors": 0}

+    # Remove old TCF entries first (no UNIQUE index on cookie_name,
+    # so ON CONFLICT is not possible → idempotent via DELETE+INSERT).
+    db.execute(sa_text(
+        "DELETE FROM compliance.cookie_library WHERE source_name='iab_tcf_v2'"
+    ))
+    db.commit()
     inserted = 0
     skipped = 0
     for vid, v in vendors.items():
@@ -106,13 +112,6 @@ async def fetch_and_ingest_tcf_vendors(db: Session) -> dict:
                     VALUES (:n, :dp, :v, :pu, :cat, :purp, 'iab_tcf_v2',
                             'https://vendor-list.consensu.org/v3/vendor-list.json',
                             0.99)
-                    ON CONFLICT (cookie_name) DO UPDATE
-                    SET actual_category = EXCLUDED.actual_category,
-                        vendor_name = EXCLUDED.vendor_name,
-                        vendor_privacy_url = EXCLUDED.vendor_privacy_url,
-                        purpose_en = EXCLUDED.purpose_en,
-                        source_name = EXCLUDED.source_name,
-                        confidence = EXCLUDED.confidence
                 """
             ), {"n": marker, "dp": "*",
                 "v": f"[TCF-{vid}] {name}",
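The DELETE-then-INSERT approach is idempotent because every run first removes all rows owned by the source and then writes the current list, so repeated runs converge on the same state. A minimal sketch of that property, assuming an in-memory SQLite database from the standard library instead of the PostgreSQL/SQLAlchemy setup used in the commit; the reduced table layout, the `ingest` helper and the `tcf_vendor_<id>` names are hypothetical. Unlike the commit, which commits the DELETE first and then per vendor, the sketch wraps both steps in one transaction; that is a design variation, not what the hunk above does.

```python
from __future__ import annotations

import sqlite3


def ingest(conn: sqlite3.Connection, vendors: dict[str, str]) -> int:
    """Replace all rows owned by the TCF source, then insert the current list.

    Returns the number of TCF rows present after the run.
    """
    with conn:  # one transaction: DELETE and INSERTs commit together
        conn.execute(
            "DELETE FROM cookie_library WHERE source_name = 'iab_tcf_v2'"
        )
        conn.executemany(
            "INSERT INTO cookie_library (cookie_name, vendor_name, source_name) "
            "VALUES (?, ?, 'iab_tcf_v2')",
            [(f"tcf_vendor_{vid}", name) for vid, name in vendors.items()],
        )
    row = conn.execute(
        "SELECT COUNT(*) FROM cookie_library WHERE source_name = 'iab_tcf_v2'"
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# Deliberately no UNIQUE constraint, mirroring the situation in the commit.
conn.execute(
    "CREATE TABLE cookie_library "
    "(cookie_name TEXT, vendor_name TEXT, source_name TEXT)"
)
conn.execute("INSERT INTO cookie_library VALUES ('_ga', 'Google', 'manual')")
conn.commit()

vendors = {"1": "Vendor A", "2": "Vendor B"}
first_run = ingest(conn, vendors)
second_run = ingest(conn, vendors)  # same input, same resulting row count
total_rows = conn.execute("SELECT COUNT(*) FROM cookie_library").fetchone()[0]

if __name__ == "__main__":
    print(first_run, second_run, total_rows)
```

Rows from other sources (here the `manual` row) are untouched because the DELETE is scoped by `source_name`. The trade-off of the single transaction is that one failing insert rolls back the whole run, whereas per-vendor commits keep the rows that succeeded.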
@@ -0,0 +1,51 @@
+{
+  "site": "Volkswagen Deutschland",
+  "site_url": "https://www.volkswagen.de",
+  "captured_at": "2026-05-22T00:00:00Z",
+  "source": "User-Copy aus Cookie-Richtlinie (Browser Strg+A → Strg+C)",
+  "cookie_richtlinie_url": "https://www.volkswagen.de/de/mehr/rechtliches/cookie-richtlinie.html",
+  "expectations": {
+    "min_declared_cookies": 90,
+    "expected_unique_vendors_after_dedup": 18,
+    "must_find_cookies": [
+      "VWD6_ENSIGHTEN_PRIVACY_MODAL_LOADED",
+      "VWD6_ENSIGHTEN_PRIVACY_MODAL_VIEWED",
+      "smartSignals2UiD", "smartSignals2sUiD",
+      "s_ecid", "s_cc", "s_sq",
+      "AMCV_", "AMCVS_", "demdex", "dextp",
+      "mbox", "mboxEdgeCluster",
+      "TDID", "TDCPM", "TTDOptOut",
+      "DSID", "ANID", "AID", "IDE", "TAID",
+      "_gcl_au", "_gcl_dc", "_fbc", "_fbp", "fr",
+      "_pk_uid",
+      "OptanonConsent",
+      "everest_g_v2", "everest_session_v2",
+      "adbCDP",
+      "liveagent_sid", "liveagent_chatted",
+      "X-Salesforce-eLB", "sfdc-stream",
+      "__cfduid", "__cflb",
+      "FPAU", "FPGCLDC", "FLC", "APC",
+      "wlfeDoLogin", "wlfeRefreshSessionId", "LBCOOKIE",
+      "CookieConsentPolicy",
+      "BrowserId", "BrowserId_sec",
+      "inbenta-km-session-id"
+    ],
+    "expected_vendors_present": [
+      "Google",
+      "Adobe Experience Cloud",
+      "Adobe Analytics",
+      "The Trade Desk",
+      "AdForm",
+      "Meta / Facebook",
+      "Salesforce",
+      "Cloudflare",
+      "Borlabs"
+    ],
+    "expected_high_findings_minimum": 1,
+    "banner_must_be_detected": true,
+    "expected_doc_types_with_text": [
+      "dse", "cookie", "impressum", "nutzungsbedingungen"
+    ]
+  },
+  "raw_paste": "Name des Cookies\nKategorie\nVerwendungszweck\nSpeicherdauer\nArt des Cookies\nSee tests/fixtures/cookie_gt/vw_cookie_richtlinie.txt for the abbreviated raw form."
+}
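The fixture only declares expectations; the commit does not include a test that consumes it. A minimal sketch of how a future regression check might compare `must_find_cookies` against scanned cookie names. `missing_cookies` is a hypothetical helper, and treating entries that end in `_` (such as `AMCV_`) as prefixes is an assumption based on such cookies usually carrying a site-specific suffix.

```python
from __future__ import annotations


def missing_cookies(must_find: list[str], found: list[str]) -> list[str]:
    """Return the fixture entries that have no match among the scanned names.

    Assumption: an entry ending in "_" is a prefix; every other entry
    must match a scanned cookie name exactly.
    """
    def matches(expected: str, name: str) -> bool:
        if expected.endswith("_"):
            return name.startswith(expected)
        return name == expected

    return [e for e in must_find if not any(matches(e, n) for n in found)]


if __name__ == "__main__":
    # Invented scan result for illustration.
    scanned = ["AMCV_1234%40AdobeOrg", "demdex"]
    print(missing_cookies(["AMCV_", "demdex", "TDID"], scanned))
```

A test built on this would load the JSON, run the scanner against `site_url`, and fail when the returned list is non-empty.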
@@ -53,6 +53,7 @@ class ScanResponse(BaseModel):
     cmp_payloads: list[dict] = []  # P48: raw CMP JSON payloads (Usercentrics/OneTrust/...) captured during scan
     vendor_details: list[dict] = []  # P50: per-vendor detail-modal extracts (description/cookies/opt-out/privacy)
     cookies_detailed: list[dict] = []  # P59b: full cookie details for behavior validation (name,value,domain,expires,phase,declared_category)
+    banner_screenshot_b64: str = ""  # P85: base64 PNG of the banner (initial view)


 @app.get("/health")
@@ -133,6 +134,7 @@ async def scan_consent(req: ScanRequest):
         cmp_payloads=result.cmp_payloads,  # P48
         vendor_details=result.vendor_details,  # P50
         cookies_detailed=result.cookies_detailed,  # P59b
+        banner_screenshot_b64=result.banner_screenshot_b64,  # P85
     )

@@ -77,6 +77,10 @@ class ConsentTestResult:
     # for behavior-validation in backend. Implicit declared_category:
     # before/reject phase = essential (site claims), accept = any.
     cookies_detailed: list = field(default_factory=list)
+    # P85: base64 PNG screenshot of the banner before the first click.
+    # The backend embeds it as an <img> in the mail: visual evidence of
+    # "this is what the banner looked like at audit time".
+    banner_screenshot_b64: str = ""


 async def run_consent_test(
@@ -196,6 +200,17 @@ async def run_consent_test(
             result.banner_text_violations = banner_violations["violations"]
             result.banner_has_impressum_link = banner_violations["has_impressum"]
             result.banner_has_dse_link = banner_violations["has_dse"]
+            # P85: visual evidence for the mail.
+            try:
+                import base64 as _b64
+                png = await page_a.screenshot(
+                    full_page=False, type="png", timeout=10000,
+                )
+                if png and len(png) < 1_500_000:  # < 1.5 MB
+                    result.banner_screenshot_b64 = _b64.b64encode(png).decode("ascii")
+                    logger.info("P85: banner screenshot captured (%d bytes)", len(png))
+            except Exception as _se:
+                logger.warning("P85: banner screenshot failed: %s", _se)

         await ctx_a.close()

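The 1.5 MB guard in the hunk above applies to the raw PNG bytes. Base64 encodes every 3 input bytes as 4 output characters, so the string that ends up in the response and in the mail can be about a third larger. A small sketch of the same guard plus the size arithmetic, using only the standard library; `encode_if_small` and `b64_length` are hypothetical helpers, not part of the commit.

```python
import base64

MAX_PNG_BYTES = 1_500_000  # same raw-size limit as in the hunk above


def encode_if_small(png: bytes, limit: int = MAX_PNG_BYTES) -> str:
    """Return the base64 text for png, or "" if it is empty or too large."""
    if not png or len(png) >= limit:
        return ""
    return base64.b64encode(png).decode("ascii")


def b64_length(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input: 4 * ceil(n / 3)."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    largest_allowed = MAX_PNG_BYTES - 1
    # Encoded size of the largest screenshot that still passes the guard.
    print(b64_length(largest_allowed))
```

If the mail has its own size budget, it may be worth expressing the limit in encoded characters rather than raw bytes.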
Executable
+49
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+# P83: prevents the "old code in the container" bug.
+#
+# Compares the git SHA deployed in the container with the current
+# source SHA. If they differ → exit 1 with a build/recreate hint.
+#
+# Usage examples:
+#   ./scripts/check-rebuild-needed.sh backend-compliance
+#   ./scripts/check-rebuild-needed.sh admin-compliance
+#   ./scripts/check-rebuild-needed.sh consent-tester
+#
+# CI usage: after git push, before the first health check.
+# Local: claude / dev can use it via a pre-merge hook.
+#
+# Prerequisite: the container has a BUILD_SHA env var (set in the Dockerfile via
+# ARG BUILD_SHA + ENV BUILD_SHA=$BUILD_SHA). If empty → warning.
+
+set -e
+
+SERVICE="${1:-backend-compliance}"
+CONTAINER="bp-compliance-${SERVICE%%-*}"  # backend-compliance → bp-compliance-backend
+if [[ "$SERVICE" == "consent-tester" ]]; then
+  CONTAINER="bp-compliance-consent-tester"
+fi
+
+DOCKER="${DOCKER:-/usr/local/bin/docker}"
+
+deployed_sha=$($DOCKER exec "$CONTAINER" sh -c 'echo "${BUILD_SHA:-unknown}"' 2>/dev/null || echo "container-down")
+local_sha=$(git rev-parse --short HEAD)
+
+if [[ "$deployed_sha" == "container-down" ]]; then
+  echo "❌ Container $CONTAINER is not running"
+  exit 2
+fi
+
+if [[ "$deployed_sha" == "unknown" ]]; then
+  echo "⚠️  $CONTAINER has no BUILD_SHA env — cannot verify."
+  echo "   Add to Dockerfile: ARG BUILD_SHA / ENV BUILD_SHA=\$BUILD_SHA"
+  exit 0
+fi
+
+if [[ "$deployed_sha" != "$local_sha"* && "$local_sha" != "$deployed_sha"* ]]; then
+  echo "❌ $CONTAINER is on commit $deployed_sha, local is $local_sha"
+  echo "   REBUILD REQUIRED:"
+  echo "   docker compose build $SERVICE && docker compose up -d --no-deps --force-recreate $SERVICE"
+  exit 1
+fi
+
+echo "✓ $CONTAINER ($deployed_sha) matches local ($local_sha)"
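Two pieces of logic in the script are easy to get wrong: the service-to-container mapping stated in its comment (backend-compliance → bp-compliance-backend) and the final check, which accepts a match when either SHA is a prefix of the other so that a short local SHA can match a full deployed one. A Python sketch of both, written only to make the intended behaviour testable; `container_name` and `shas_match` are hypothetical names, and the empty-string guard is an addition that the shell script does not have.

```python
def container_name(service: str) -> str:
    """Map a service name to its container name as described in the script's
    comment: keep the part before the first '-', except for consent-tester."""
    if service == "consent-tester":
        return "bp-compliance-consent-tester"
    return "bp-compliance-" + service.split("-", 1)[0]


def shas_match(deployed: str, local: str) -> bool:
    """True if one SHA is a prefix of the other (short vs. full form)."""
    if not deployed or not local:
        # Added guard: an empty string is a prefix of everything.
        return False
    return deployed.startswith(local) or local.startswith(deployed)


if __name__ == "__main__":
    # Invented SHAs for illustration.
    print(container_name("backend-compliance"))
    print(shas_match("a1b2c3d4e5f6a7b8", "a1b2c3d"))
    print(shas_match("a1b2c3d", "ffff000"))
```

In shell terms, keeping the part before the first `-` corresponds to the suffix-stripping expansion `${SERVICE%%-*}`, whereas `${SERVICE#*-}` strips the prefix and keeps what follows the first `-`.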