breakpilot-compliance/backend-compliance/compliance/services/consent_diff_for_user.py
Benjamin Admin bd65b6f318
feat(audit): Phase 2+3 — P54 + P68 + P69 + P6/P53/P55 + P31 + P80v2
P54 - consent_diff_for_user.py: USP feature for returning visitors.
compute_user_facing_diff() compares the current snapshot with the most
recent earlier one for the same site_domain and returns added_vendors /
removed_vendors, plus requires_reconsent when new marketing vendors have
been added. build_diff_banner_snippet() returns HTML for embedding in the
operator's own banner via consent-sdk.

P68 - reverse_audit.py: self-audit of our template library.
run_reverse_audit() loads all MCs from doc_check_controls and all templates
from doc_templates, then checks via pass_criteria matching which MCs are
covered by at least one template. Returns coverage_pct, uncovered_mcs
(HIGH ones first), unused_templates, and a by_doctype breakdown.

P69 - data/ecall_regulation.json: eCall Regulation (EU) 2015/758 as 7 chunks
for RAG ingest (Art. 3/6/7 + compliance_implications for automotive OEMs).
Location data outside an emergency is impermissible; value-added services
need separate consent; data must be deleted immediately after the emergency call.

P6+P53+P55 - industry_library.py: industry profiles (automotive/ecommerce/
saas/banking/healthcare) with mandatory_regulations + typical_cookie_vendors
+ vvt_required_processes + special_findings_to_watch. load_site_profile()
reads the site history from snapshots (common_provider, avg_vendors,
historical_runs). build_industry_context_block_html() renders a block at the
top of the mail: 'What we check for VW in this industry' + 'We have
already analyzed this site 3×'.

P31 - llm_cascade.py: tiered LLM cascade Qwen → OVH 120B → Anthropic
Claude Haiku with a confidence heuristic (JSON parsed, item count vs
input size). Valkey cache (redis://) with a 7-day TTL plus an in-process
fallback. If tier 1 falls below the confidence threshold → tier 2, then tier 3.
The cache drastically reduces run time on re-runs.

P80 v2 - check_replay.py: replay now runs audit_quality_checks
on the snapshot data. Old snapshots now also show in the replay
whether banner_detected is missing or vendor_extract is thin.

Bonus - P90 BMW-Final marked completed: all B1-B4 bugs fixed
(cmp_payloads keep, cookies_detailed wiring, multi-doc-fail visibility,
VVT table).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:38:08 +02:00

126 lines
4.1 KiB
Python

"""
P54 - diff banner for end users (USP feature).

USP idea: for returning visitors the banner does NOT show the standard
question but a diff notice:

    "Since your last consent we have added:
       * Microsoft Bing (advertising)
       * TikTok Pixel (marketing)
     Please consent again or adjust your settings."

Backend side (this module): provides a 'diff_for_user' structure per
snapshot that can be embedded in the operator's own banner / notice text.
The frontend banner library (the separate consent-sdk) consumes it.

Compares the vendor lists of the current snapshot and the most recent
snapshot with the same site_domain.
"""
from __future__ import annotations

import logging
from typing import Iterable

from sqlalchemy import text as sa_text
from sqlalchemy.orm import Session

logger = logging.getLogger(__name__)


def _norm_vendor_set(vendors: Iterable) -> set[str]:
    """Return the set of non-empty, stripped vendor names.

    Accepts dicts with a "name" key or plain strings; other entries are skipped.
    """
    out: set[str] = set()
    for v in (vendors or []):
        if isinstance(v, dict):
            n = (v.get("name") or "").strip()
        elif isinstance(v, str):
            n = v.strip()
        else:
            continue
        if n:
            out.add(n)
    return out


def compute_user_facing_diff(
    db: Session,
    site_domain: str,
    current_check_id: str,
    current_cmp_vendors: list,
) -> dict | None:
    """Compare the current cmp_vendors list with the most recent earlier one
    for the same site_domain.

    Returns {prev_at, added_vendors, removed_vendors, new_marketing_vendors,
    requires_reconsent}, or None if there is no earlier run, the lookup
    fails, or the vendor sets are identical.
    """
    if not site_domain:
        return None
    try:
        row = db.execute(sa_text(
            """
            SELECT cmp_vendors, created_at
            FROM compliance.compliance_check_snapshots
            WHERE site_domain = :dom AND check_id != :ex
            ORDER BY created_at DESC LIMIT 1
            """
        ), {"dom": site_domain, "ex": current_check_id}).fetchone()
    except Exception as e:
        # Log and return None so a failed lookup is treated like "no previous run".
        logger.warning("diff lookup failed: %s", e)
        return None
    if not row:
        return None
    # Assumes the driver already deserializes cmp_vendors into a list.
    prev_vendors = row[0] or []
    prev_at = row[1]
    curr_set = _norm_vendor_set(current_cmp_vendors)
    prev_set = _norm_vendor_set(prev_vendors)
    added = sorted(curr_set - prev_set)
    removed = sorted(prev_set - curr_set)
    if not added and not removed:
        return None
    # High-risk categories among the added vendors: marketing / tracking.
    added_set = set(added)
    new_marketing: list[str] = []
    for v in current_cmp_vendors or []:
        if not isinstance(v, dict):
            continue
        n = (v.get("name") or "").strip()
        cat = (v.get("category") or "").lower()
        if (
            n in added_set
            and cat in ("marketing", "tracking", "advertising")
            and n not in new_marketing  # skip duplicate entries for one vendor
        ):
            new_marketing.append(n)
    return {
        "prev_at": prev_at.isoformat() if prev_at else None,
        "added_vendors": added,
        "removed_vendors": removed,
        "new_marketing_vendors": new_marketing,
        "requires_reconsent": bool(new_marketing),
    }


def build_diff_banner_snippet(diff: dict) -> str:
    """Return an HTML snippet that the site operator can embed in their own
    cookie banner (e.g. via consent-sdk). Lists at most 8 added vendors."""
    # Local import keeps this function self-contained.
    from html import escape

    if not diff or not diff.get("added_vendors"):
        return ""
    added = diff.get("added_vendors", [])
    n_marketing = len(diff.get("new_marketing_vendors") or [])
    # Vendor names originate from scanned site data, so escape them before
    # interpolating them into HTML.
    items = "".join(f"<li>{escape(str(v))}</li>" for v in added[:8])
    reconsent_note = ""
    if diff.get("requires_reconsent"):
        # German UI text: "<n> new marketing vendor(s) since your last
        # consent, please confirm again."
        reconsent_note = (
            f'<p style="margin:6px 0 0;color:#991b1b;font-size:12px">'
            f'<strong>{n_marketing} neue{"r" if n_marketing == 1 else ""} '
            f'Marketing-Anbieter</strong> seit Ihrer letzten Zustimmung — '
            'bitte erneut bestaetigen.'
            '</p>'
        )
    return (
        '<div class="breakpilot-consent-diff" '
        'style="font-family:-apple-system,sans-serif;font-size:12px;'
        'padding:8px 12px;background:#fef3c7;border:1px solid #fde68a;'
        'border-radius:6px;margin-bottom:8px">'
        # German UI text: "Since your last consent we have added:"
        '<strong>Seit Ihrer letzten Zustimmung haben wir hinzugefuegt:</strong>'
        f'<ul style="margin:4px 0 0 18px;padding:0">{items}</ul>'
        + reconsent_note +
        '</div>'
    )
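
The set-difference step in compute_user_facing_diff() can be tried without a database. The following is a minimal sketch under stated assumptions, not the module's actual API: the names `vendor_names`, `diff_vendor_lists`, and `MARKETING_CATEGORIES` are hypothetical, and the sample vendor lists are made up for illustration. It mirrors the logic above: normalize both lists to name sets, take the two set differences, and flag re-consent when an added vendor has a marketing-type category.

```python
from __future__ import annotations

# Hypothetical constant; the category names match those checked in the module above.
MARKETING_CATEGORIES = {"marketing", "tracking", "advertising"}


def vendor_names(vendors) -> set[str]:
    """Collect non-empty names from dicts ({'name': ...}) or plain strings."""
    names: set[str] = set()
    for v in vendors or []:
        if isinstance(v, dict):
            name = (v.get("name") or "").strip()
        elif isinstance(v, str):
            name = v.strip()
        else:
            continue
        if name:
            names.add(name)
    return names


def diff_vendor_lists(previous: list, current: list) -> dict:
    """In-memory version of the diff: no DB lookup, no prev_at timestamp."""
    prev_set = vendor_names(previous)
    curr_set = vendor_names(current)
    added = sorted(curr_set - prev_set)
    removed = sorted(prev_set - curr_set)
    new_marketing: list[str] = []
    for v in current or []:
        if not isinstance(v, dict):
            continue
        name = (v.get("name") or "").strip()
        category = (v.get("category") or "").lower()
        if (
            name in added
            and category in MARKETING_CATEGORIES
            and name not in new_marketing
        ):
            new_marketing.append(name)
    return {
        "added_vendors": added,
        "removed_vendors": removed,
        "new_marketing_vendors": new_marketing,
        "requires_reconsent": bool(new_marketing),
    }


# Made-up sample data: one vendor dropped, two marketing-type vendors added.
previous = [{"name": "Google Analytics", "category": "statistics"}, "Hotjar"]
current = [
    {"name": "Google Analytics", "category": "statistics"},
    {"name": "TikTok Pixel", "category": "Marketing"},
    {"name": "Microsoft Bing", "category": "advertising"},
]
result = diff_vendor_lists(previous, current)
print(result)
```

Because the category is lower-cased before the comparison, "Marketing" and "marketing" are treated alike. Plain-string vendors carry no category, so they can appear in added_vendors but never trigger requires_reconsent.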