feat(compliance-check): MC classification + embedding + vendor redundancy + action recipes + Borlabs features
CI / nodejs-build (push) Successful in 2m47s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Major update based on the BMW test iterations (v1→v9):

Core Compliance-Check
- Sonnet check_type classification: text/process/review for all 1874 MCs in compliance.doc_check_controls (script + sidecar /data/mc_classification.db). rag_document_checker filters on check_type='text' for doc_check. Also adds a fits_doc_type audit (v2) and a ui_only audit for DSA/e-commerce MCs filed under the wrong doc_type.
- scope_requires filter: biometric/ai_decision/child_targeting MCs are filtered by business_profile (e.g. FRT is skipped for BMW).
- Embedding match (BGE-M3) as phase 3 after the regex match: per-doc_type threshold override (impressum 0.50, dse/cookie 0.60), short-field rescue (15-word chunks) for mandatory fields in the Impressum. Title + check_question are used as embedding input for more context.
- Cookie text routing: consent-tester returns cmp_cookie_text from the CMP reconstruct; the backend prefers it over DOM extraction when it is richer (BMW: 1824 vs. 600 words).

Vendor redundancy + EU alternatives + cost saving
- vendor_redundancy.analyze(): functional categorization of the CMP vendors, detection of multiple providers per category, EU-alternative lookup (Matomo, IONOS, HERE, Friendly Captcha, Smart AdServer, ...).
- vendor_cost_estimator: tier inference from the cookie footprint (cookie count + premium-feature cookies + third-party share → starter/professional/enterprise/premier).
- Self-service advertising (Google/Meta/Pinterest/...) = 0 license cost (media spend only, tracked separately). DSP platforms keep a narrow range.
- Tier-aware saving range: for enterprise/premier we use the upper 40-100% band of the list prices, not starter→premier.
- Multi-function tools (Matomo Pro, SAP CX, IONOS Cloud, Userlike, Smart AdServer, HERE Maps, Vimeo Pro, LamaPoll): one tool replaces several categories at once.

Cookie knowledge DB + functional classification
- cookie_knowledge_db: 50 curated top cookies (Google/Meta/Adobe/MS/...) with vendor, exact_purpose, data_collected, IAB TCF IDs, reid_risk, schrems_ii_status, CJEU rulings, EU alternative.
- cookie_function_classifier: functional role per cookie (tracking_id, ad_pixel, session_id, ab_test, csrf, ...) + blocking_impact.

Country inference from legal form
- cookie_link_validator: the country field is derived from the vendor name (A/S=DK, GmbH=DE, Inc=US, B.V.=NL, ...) plus a vendor lookup table. Reduces false-positive no_country flags for unambiguously EU vendors (Adform DK, Pinterest IE).

Action recipes + doc anchor locator
- finding_action_recipes: for each finding type (no_cookies_listed, no_country, broken_opt_out, "Auftragsverarbeiter erwaehnen", "Art. 22 Profiling", ...) one structured instruction with what/why/fix_text/where/example, ready to paste 1:1 into customer documents.
- doc_anchor_locator: embedding-based (BGE-M3 cosine); finds the matching paragraph in the existing customer document for each finding. Per-run thread-local cache. Fallback: keyword match.
- Email rendering integrates recipe + anchor per failed doc check, plus a vendor flag list with a collapsible action list.
- Score explanation per vendor row (3/5 subtitle + tooltip).

Migration pipeline (Compliance-Check -> customer banner/documents)
- migration_to_banner.py: vendor list -> CookieBannerConfig with 4 categories + review flags.
- migration_to_document.py: vendor list -> cookie policy + VVT register + privacy-policy pre-fills.
- agent_migration_routes: 3 preview endpoints (banner-preview, document-preview, summary). The cmp_vendors are persisted in the check_payloads table of /data/compliance_audits.db.

Borlabs-parity cookie banner features
- Consent history in the banner: window.bpShowConsentHistory() + localStorage.
- Content blocker: cookie-banner-content-blocker.ts, YouTube/Maps/video placeholder until consent is given.
- Google Consent Mode v2 extended: wait_for_update + region=EEA/CH/GB.
- Consent log export (CSV/JSON) via einwilligungen_export_routes.

Bug fixes
- canonical_control_routes: _jsonish helper for string-typed jsonb, similar-controls endpoint with _has_embedding_col() cache (no more 500).
- Control-Library frontend: defensive .map coercer in 2 detail views.
- Embedding service batching (batches of 32 instead of 165 in one call).
- KeyError 'control_id' in MC result aggregation (defensive .get).
- Master-controls click-through from /sdk/master-controls to /sdk/control-library?control=<id> with URL-param auto-open.
- Dockerfile: /data pre-chowned to appuser (audit DB write permission).
- Cookie text routing bug (cmp_reconstructed > DOM extraction).
- doc_type-aware MC filter (instead of all text MCs).
- Master contract dedup (60 BMW-internal entries = 1 Adobe contract).
- A3-v2 audit reclassified 24 UI-language MCs as 'process'.

Tests
- test_migration_mappers.py (9 tests)
- test_migration_endpoints.py (4 tests)

Scripts (one-shot)
- classify_mc_check_type.py (v1) + _v2 (PK=control_id,doc_type)
- audit_mc_doctype_fit.py (v1 fits) + _v2 (ui_only + scope_requires)

BMW run results, v1 (broken) -> v9 (all fixes):
  DSE        7.5% -> 81-83%
  Impressum  4%   -> 100% (all 6 real MCs fulfilled)
  Cookie     0%   -> 79-83% (CMP text routing + embedding)
  Plus: 10 consolidation categories, estimated saving 200k-3M / year
  Plus: action recipes + doc anchors for every fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
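
The country inference from the legal form described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code of cookie_link_validator: the function name `infer_country` and the table `LEGAL_FORM_COUNTRY` are hypothetical, and the table contains only the four suffixes the message names.

```python
from __future__ import annotations

# Hypothetical suffix table: only the four pairs named in the commit message.
LEGAL_FORM_COUNTRY = {
    "a/s": "DK",
    "gmbh": "DE",
    "inc": "US",
    "b.v.": "NL",
}


def infer_country(vendor_name: str) -> str | None:
    """Return a country code if the name ends in a known legal form."""
    tokens = vendor_name.lower().replace(",", " ").split()
    if not tokens:
        return None
    last = tokens[-1]
    # Try the token as written ("b.v."), then without a trailing dot ("inc." -> "inc").
    return LEGAL_FORM_COUNTRY.get(last) or LEGAL_FORM_COUNTRY.get(last.rstrip("."))


print(infer_country("Adform A/S"))   # DK
print(infer_country("Pinterest"))    # None: no legal form in the name
```

A name without a legal-form suffix returns None here, which is where the vendor lookup table mentioned in the message would presumably take over.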
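
The cookie text routing rule (prefer the CMP-reconstructed text when it is richer than the DOM extraction) reduces to a word-count comparison. The helper below is a sketch under that reading; `pick_richer_text` is a hypothetical name, and the real `_fetch_text` additionally restricts the rule to certain doc_types.

```python
def pick_richer_text(dom_text: str, cmp_text: str) -> str:
    """Return the CMP text only when it has strictly more words than the DOM text."""
    if cmp_text and len(cmp_text.split()) > len(dom_text.split()):
        return cmp_text
    return dom_text


dom = "Home Models Dealers Contact"                 # site chrome, 4 words
cmp = "We use cookies to measure reach and ads."    # policy text, 8 words
print(pick_richer_text(dom, cmp) == cmp)            # True
```

Counting whitespace-separated words is a coarse proxy for "richer": it cannot tell policy text from navigation text, it only assumes that the longer source is the more complete one.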
@@ -33,6 +33,7 @@ _ROUTER_MODULES = [
     "vvt_routes",
     "legal_document_routes",
     "einwilligungen_routes",
+    "einwilligungen_export_routes",
     "escalation_routes",
     "consent_template_routes",
     "notfallplan_routes",
@@ -159,6 +159,13 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     from .agent_doc_check_routes import CheckItem, DocCheckResult
     from .agent_doc_check_report import build_html_report

+    # Reset anchor-locator cache per run (avoid cross-run leak)
+    try:
+        from compliance.services.doc_anchor_locator import reset_cache
+        reset_cache()
+    except Exception:
+        pass
+
     # Step 1: Resolve texts (fetch from URL if needed) — 0-30%
     _update(check_id, "Texte werden geladen...", 1)
     doc_texts: dict[str, str] = {}
@@ -234,6 +241,20 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     # Filter out doc_types that don't apply to this business profile
     skip_types = _get_skip_types(profile)

+    # Derive business_scope hints for the MC filter (O1 — Doc-type Scope-Flag).
+    # MCs that explicitly require a feature (e.g. 'biometric_processing',
+    # 'ai_decision_making', 'child_targeting') get dropped when the
+    # detected profile doesn't declare it.
+    business_scope: set[str] = set()
+    for svc in (getattr(profile, "detected_services", []) or []):
+        business_scope.add(str(svc).lower())
+    if (getattr(profile, "business_type", "") or "").lower() == "b2c":
+        business_scope.add("b2c")
+    if getattr(profile, "has_online_shop", False):
+        business_scope.add("ecommerce")
+    if getattr(profile, "is_regulated_profession", False):
+        business_scope.add("regulated_profession")
+
     # Document checks: 40-80%
     n_entries = max(1, len(doc_entries))
     for i, entry in enumerate(doc_entries):
@@ -268,6 +289,7 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
         result = await _check_single(
             text, doc_type, label, url,
             entry["word_count"], use_agent_flag,
+            business_scope=business_scope,
         )

         # Apply profile context filter
@@ -421,9 +443,42 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
                     len(cmp_vendors))
         cmp_vendors = await validate_vendor_urls(cmp_vendors)
         cmp_vendors = score_vendors(cmp_vendors)
+        # Enrich each vendor with per-cookie functional roles
+        try:
+            from compliance.services.cookie_function_classifier import (
+                annotate_vendor_cookies,
+            )
+            cmp_vendors = [annotate_vendor_cookies(v) for v in cmp_vendors]
+        except Exception as e:
+            logger.warning("Cookie function classification skipped: %s", e)
     except Exception as e:
         logger.warning("VVT vendor extraction skipped: %s", e)

+    # Vendor-Redundanz + EU-Alternativen + Cost/Savings (O4)
+    redundancy_report = None
+    try:
+        from compliance.services.vendor_redundancy import analyze as analyze_redundancy
+        from compliance.services.vendor_cost_estimator import infer_company_tier
+        if cmp_vendors:
+            # Company-Tier aus business_profile ableiten — beeinflusst die
+            # Cost-Range so dass z.B. fuer DAX-Konzerne nicht starter-Preise
+            # die untere Schranke duruecken.
+            bp_dict = {
+                "type": getattr(profile, "business_type", ""),
+                "features": list(business_scope),
+            }
+            ctier = infer_company_tier(bp_dict)
+            redundancy_report = analyze_redundancy(cmp_vendors, company_tier=ctier)
+            logger.info(
+                "Redundanz: %d Kategorien mit Mehrfach-Anbietern, "
+                "Spar-Schaetzung %s pro Jahr (company_tier=%s)",
+                redundancy_report["summary"]["redundancy_count"],
+                redundancy_report["summary"]["estimated_saving_pct"],
+                ctier,
+            )
+    except Exception as e:
+        logger.warning("Vendor redundancy analysis skipped: %s", e)
+
     summary_html = build_management_summary(results)
     scanned_html = build_scanned_urls_html(doc_entries)
     providers_html = build_provider_list_html(banner_result, vvt_entries)
@@ -468,11 +523,18 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
         if scorecard else ""
     )

-    report_html = build_html_report(results, None)
+    report_html = build_html_report(results, None, doc_texts)
     profile_html = _build_profile_html(profile)
+
+    # O4: Vendor-Redundanz / EU-Alternativen + Cost-Savings-Block —
+    # zwischen VVT und Doc-Report einsortiert, damit Geschaeftsfuehrung
+    # die Einsparung sieht bevor sie in die Detail-Pruefung geht.
+    from .agent_doc_check_redundancy import build_redundancy_html
+    redundancy_html = build_redundancy_html(redundancy_report)
+
     full_html = (
         summary_html + scanned_html + profile_html + scorecard_html
-        + providers_html + vvt_html + report_html
+        + providers_html + vvt_html + redundancy_html + report_html
     )

     # Step 6: Send email — derive site name primarily from entered URL.
@@ -602,6 +664,7 @@ async def _fetch_text(url: str, doc_type: str = "") -> tuple[str, list[dict]]:
             payload = resp.json()
             docs = payload.get("documents", [])
             cmp_payloads = payload.get("cmp_payloads") or []
+            cmp_cookie_text = payload.get("cmp_cookie_text") or ""
             if docs:
                 texts = []
                 for doc in docs:
@@ -609,6 +672,22 @@ async def _fetch_text(url: str, doc_type: str = "") -> tuple[str, list[dict]]:
                     if t and len(t) > 50:
                         texts.append(t)
                 merged = "\n\n".join(texts)
+                # For cookie/dse/social_media: when CMP reconstruction is
+                # substantially richer than DOM extraction, use it. This
+                # fixes the BMW case where DOM yields ~600 words of
+                # navigation but the ePaaS payload reconstructs to ~1800
+                # words of actual cookie policy.
+                if (doc_type in short_extract_types
+                        and cmp_cookie_text
+                        and len(cmp_cookie_text.split()) > len(merged.split())):
+                    logger.info(
+                        "Preferring CMP-reconstructed text for %s on %s "
+                        "(%d words CMP vs %d words DOM)",
+                        doc_type, url,
+                        len(cmp_cookie_text.split()),
+                        len(merged.split()),
+                    )
+                    merged = cmp_cookie_text
                 if merged and len(merged.split()) > 100:
                     if len(texts) > 1:
                         logger.info("Merged %d docs from %s (%d words)",
@@ -727,6 +806,7 @@ async def _autodiscover_missing(

     discovered: list[dict] = []
     disc_payloads: list[dict] = []
+    disc_cookie_texts: list[str] = []
     for base in crawl_bases:
         try:
             async with httpx.AsyncClient(timeout=180.0) as client:
@@ -742,8 +822,14 @@ async def _autodiscover_missing(
                 body = resp.json()
                 discovered.extend(body.get("documents", []) or [])
                 disc_payloads.extend(body.get("cmp_payloads") or [])
-                logger.info("auto-discovery on %s: %d docs",
-                            base, len(body.get("documents", []) or []))
+                cmp_text = body.get("cmp_cookie_text") or ""
+                if cmp_text:
+                    disc_cookie_texts.append(cmp_text)
+                logger.info("auto-discovery on %s: %d docs, %d CMP payloads, "
+                            "cmp_cookie_text=%d words", base,
+                            len(body.get("documents", []) or []),
+                            len(body.get("cmp_payloads") or []),
+                            len(cmp_text.split()))
         except Exception as e:
             logger.warning("auto-discovery failed for %s: %s", base, e)

@@ -772,6 +858,19 @@ async def _autodiscover_missing(
         d = by_type.get(dt)
         if d:
             full = d.get("full_text") or d.get("text_preview") or ""
+            # For cookie: prefer the CMP-reconstructed text when it's
+            # substantially richer than the auto-discovered DOM extraction.
+            # BMW homepage CMP yields ~1800 words of authoritative policy;
+            # DOM extraction typically yields ~600 words of site chrome.
+            if dt == "cookie" and disc_cookie_texts:
+                cmp_merged = "\n\n".join(disc_cookie_texts)
+                if len(cmp_merged.split()) > len(full.split()):
+                    logger.info(
+                        "cookie: using CMP-reconstructed text (%d words) "
+                        "instead of DOM (%d words)",
+                        len(cmp_merged.split()), len(full.split()),
+                    )
+                    full = cmp_merged
             if len(full.split()) >= 100:
                 new_entry["text"] = full
                 new_entry["url"] = d.get("url", "")
@@ -829,6 +928,7 @@ def _classify_discovered_doc(title: str, url: str) -> str | None:
 async def _check_single(
     text: str, doc_type: str, label: str, url: str,
     word_count: int, use_agent: bool,
+    business_scope: set[str] | None = None,
 ):
     """Run regex + MC checks on a single document."""
     from compliance.services.doc_checks.runner import check_document_completeness
@@ -862,6 +962,7 @@ async def _check_single(
         # (top-10 FAILs) so cost stays bounded.
         mc_results = await check_document_with_controls(
             text, doc_type, label, max_controls=0, use_agent=use_agent,
+            business_scope=business_scope,
         )
         if mc_results:
             for mc in mc_results:
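
The business_scope derivation added in the hunks above can be tried in isolation with the sketch below. `derive_business_scope` repeats the diff's logic; the `Profile` dataclass is a hypothetical stand-in for the real profile object, and `mc_in_scope` is an assumed filter rule, since the diff does not show how `check_document_with_controls` consumes `business_scope`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical stand-in for the detected business profile."""
    detected_services: list[str] = field(default_factory=list)
    business_type: str = ""
    has_online_shop: bool = False
    is_regulated_profession: bool = False


def derive_business_scope(profile: object) -> set[str]:
    """Collect lower-cased scope flags, as in _run_compliance_check."""
    scope: set[str] = set()
    for svc in (getattr(profile, "detected_services", []) or []):
        scope.add(str(svc).lower())
    if (getattr(profile, "business_type", "") or "").lower() == "b2c":
        scope.add("b2c")
    if getattr(profile, "has_online_shop", False):
        scope.add("ecommerce")
    if getattr(profile, "is_regulated_profession", False):
        scope.add("regulated_profession")
    return scope


def mc_in_scope(scope_requires: list[str], business_scope: set[str]) -> bool:
    # ASSUMPTION: an MC is kept only if every feature it requires is declared.
    # An MC without requirements is always kept.
    return all(req in business_scope for req in scope_requires)


scope = derive_business_scope(Profile(business_type="B2C", has_online_shop=True))
print(sorted(scope))                                   # ['b2c', 'ecommerce']
print(mc_in_scope(["biometric_processing"], scope))    # False: MC would be dropped
```

Whether the real filter requires all or any of the listed features is not visible in this commit; `all` is chosen here because the in-code comment says an MC is dropped when the profile does not declare the feature.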
@@ -374,11 +374,52 @@ def _render_vendor_row_full(v: dict) -> str:
     )
     score_color = ("#16a34a" if score >= 80 else
                    "#d97706" if score >= 50 else "#dc2626")
+
+    # Score-Erklaerung: was wurde gewertet, was fehlt
+    # Annahme: Score = bestandene Kriterien / Gesamtkriterien * 100.
+    # Typisch 5 Kriterien fuer EXT: country, cookies, opt_out, privacy, scoring.
+    # Bei INTERNAL/GROUP: opt_out + privacy nicht gewertet (3 Kriterien).
+    n_criteria = 3 if is_own else 5
+    n_failed = len(flags) if flags else 0
+    score_tooltip = (
+        f"{n_criteria - n_failed} von {n_criteria} Kriterien erfuellt"
+        + (f" — fehlt: {', '.join(_flag_short(f) for f in flags[:3])}"
+           if flags else "")
+    )
+
+    # Inline-Aktions-Anweisungen pro Flag
+    actions_html = ""
+    if flags:
+        from compliance.services.finding_action_recipes import recipe_for
+        action_items = []
+        for f in flags:
+            rec = recipe_for(f)
+            if not rec:
+                continue
+            action_items.append(
+                f'<li style="margin-bottom:6px"><strong>{_flag_short(f)}:</strong> '
+                f'{rec.get("what", "")}<br/>'
+                f'<span style="color:#475569"><strong>Was tun:</strong> '
+                f'{rec.get("fix_text", "").splitlines()[0][:200]}</span><br/>'
+                f'<span style="color:#94a3b8;font-size:9px">Quelle: '
+                f'{rec.get("why", "")[:160]}</span></li>'
+            )
+        if action_items:
+            actions_html = (
+                f'<details style="margin-top:4px"><summary style="cursor:pointer;'
+                f'color:#dc2626;font-size:10px">Was muss ich tun? '
+                f'({len(action_items)} Action{"s" if len(action_items) != 1 else ""})</summary>'
+                f'<ul style="margin:4px 0 0 14px;padding:0;font-size:10px;color:#1e293b">'
+                + "".join(action_items)
+                + '</ul></details>'
+            )
+
     flag_str = ""
     if flags:
         flag_str = (
             f'<div style="font-size:10px;color:#94a3b8;margin-top:2px">'
             f'{", ".join(flags[:4])}</div>'
+            f'{actions_html}'
         )
     return (
         f'<tr style="border-top:1px solid #e2e8f0">'
@@ -391,11 +432,26 @@ def _render_vendor_row_full(v: dict) -> str:
         f'<td style="padding:6px 8px;text-align:center">{opt_status}</td>'
         f'<td style="padding:6px 8px;text-align:center">{privacy_status}</td>'
         f'<td style="padding:6px 8px;text-align:right;font-weight:600;'
-        f'color:{score_color};font-size:11px">{score}%</td>'
+        f'color:{score_color};font-size:11px" title="{score_tooltip}">'
+        f'{score}%<div style="font-size:9px;font-weight:400;color:#94a3b8">'
+        f'{n_criteria - n_failed}/{n_criteria}</div></td>'
         f'</tr>'
     )


+def _flag_short(f: str) -> str:
+    """Lesbare deutsche Form fuer einen Flag-Token."""
+    labels = {
+        "no_cookies_listed": "Cookies fehlen",
+        "no_country": "Sitzland fehlt",
+        "no_privacy_url": "Privacy-Link fehlt",
+        "broken_privacy_url": "Privacy-Link broken",
+        "no_opt_out_url": "Opt-Out fehlt",
+        "broken_opt_out": "Opt-Out broken",
+    }
+    return labels.get(f, f)
+
+
 def _link_status_badge(
     url: str | None,
     ok: bool | None,
@@ -0,0 +1,141 @@
+"""
+Email-Renderer fuer den Vendor-Redundanz + EU-Alternativen + Cost-/Savings-Block.
+
+Wird im Email-Body unter dem VVT eingebaut.
+"""
+
+from __future__ import annotations
+
+
+def _fmt_eur(low: int, high: int) -> str:
+    if not low and not high:
+        return "im Listpreis bundled"
+    if low == high:
+        return f"~{low:,} €".replace(",", ".")
+    return f"{low:,}–{high:,} €".replace(",", ".")
+
+
+def build_redundancy_html(report: dict | None) -> str:
+    if not report:
+        return ""
+    s = report.get("summary") or {}
+    redundancies = report.get("redundancies") or []
+    eu_alts = report.get("eu_alternatives") or []
+    multi = report.get("multi_function_tools") or []
+
+    cur = s.get("estimated_current_year_eur") or [0, 0]
+    sav = s.get("estimated_saving_year_eur") or [0, 0]
+    pct = s.get("estimated_saving_pct") or "n/a"
+
+    parts = [
+        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
+        'max-width:700px;margin:0 auto 16px;padding:14px 18px;'
+        'background:#fef3c7;border:1px solid #fcd34d;border-radius:8px">',
+        '<h3 style="margin:0 0 6px;font-size:14px;color:#92400e">'
+        'Optimierungspotenzial: Redundanzen + EU-Alternativen</h3>',
+        f'<p style="margin:0 0 10px;font-size:11px;color:#78350f">'
+        f'<strong>{s.get("redundancy_count", 0)}</strong> Kategorien mit '
+        f'mehreren Anbietern · <strong>{s.get("consolidation_potential", 0)}</strong> '
+        f'Anbieter konsolidierbar · '
+        f'<strong>{s.get("eu_alternative_count", 0)}</strong> EU-Alternativen verfuegbar</p>',
+
+        '<div style="background:#fff;border:1px solid #fcd34d;border-radius:6px;'
+        'padding:10px 12px;margin-bottom:10px">',
+
+        '<div style="font-size:10px;color:#94a3b8;margin-bottom:6px;text-transform:uppercase;letter-spacing:0.5px">'
+        'Diese Schaetzung umfasst NUR die als redundant erkannten Tools — '
+        'nicht den Gesamt-Stack der Website</div>',
+
+        f'<div style="font-size:11px;color:#78350f">'
+        f'Listpreis-Schaetzung der <strong>redundanten</strong> Tools '
+        f'(Mehrfach-Anbieter in derselben Funktions-Kategorie):'
+        f' <strong>{_fmt_eur(*cur)}/Jahr</strong></div>',
+
+        f'<div style="font-size:11px;color:#16a34a;margin-top:4px">'
+        f'Sparpotenzial durch Konsolidierung auf je 1 EU-Tool pro Kategorie:'
+        f' <strong>{_fmt_eur(*sav)}/Jahr</strong> ({pct})</div>',
+
+        '<div style="font-size:10px;color:#94a3b8;margin-top:8px;font-style:italic">'
+        '<strong>Wichtige Einschraenkungen:</strong><br/>'
+        '• Konzern-Konditionen liegen ueblicherweise 30–50% unter Listpreis — '
+        'realistisches Saving entsprechend €X·0,5 bis €X·0,7.<br/>'
+        '• Eintraege "<em>Eigene Marke — Tool</em>" (z.B. "BMW AG — Adobe Analytics") '
+        'gehoeren oft zu einem einzigen Master-Vertrag, nicht zu mehreren Lizenzen.<br/>'
+        '• Media-Spend (Google Ads, Meta Ads) ist NICHT enthalten — nur Tooling-Lizenzen.<br/>'
+        '• Quelle: Gartner/Forrester 2025 + oeffentliche Listpreise.'
+        '</div></div>',
+    ]
+
+    if redundancies:
+        parts.append(
+            '<table style="width:100%;border-collapse:collapse;font-size:11px;'
+            'margin-bottom:10px">'
+            '<thead><tr style="background:#fde68a;color:#78350f;text-align:left">'
+            '<th style="padding:6px 8px">Kategorie</th>'
+            '<th style="padding:6px 8px">#</th>'
+            '<th style="padding:6px 8px">Anbieter</th>'
+            '<th style="padding:6px 8px">EU-Empfehlung</th>'
+            '<th style="padding:6px 8px;text-align:right">Saving / Jahr</th>'
+            '</tr></thead><tbody>'
+        )
+        for r in redundancies[:12]:
+            vendors_str = ", ".join(r.get("vendors", [])[:6])
+            if len(r.get("vendors", [])) > 6:
+                vendors_str += f" (+{len(r['vendors']) - 6} weitere)"
+            sav_r = r.get("estimated_saving_year_eur") or [0, 0]
+            parts.append(
+                f'<tr style="border-top:1px solid #fde68a;vertical-align:top">'
+                f'<td style="padding:5px 8px;color:#78350f;font-weight:600">{r["category_label"]}</td>'
+                f'<td style="padding:5px 8px;text-align:center">{r["count"]}</td>'
+                f'<td style="padding:5px 8px;color:#1e293b;font-size:10px">{vendors_str}</td>'
+                f'<td style="padding:5px 8px;color:#16a34a;font-size:10px">{r.get("suggested_eu_tool") or "–"}</td>'
+                f'<td style="padding:5px 8px;text-align:right;color:#16a34a;font-weight:600">'
+                f'{_fmt_eur(*sav_r)}</td></tr>'
+            )
+            hint = r.get("consolidation_hint")
+            if hint:
+                parts.append(
+                    f'<tr><td colspan="5" style="padding:0 8px 8px;color:#94a3b8;font-size:10px;font-style:italic">'
+                    f'Hinweis: {hint}</td></tr>'
+                )
+            caveats = r.get("caveats") or []
+            if caveats:
+                parts.append(
+                    f'<tr><td colspan="5" style="padding:0 8px 8px;color:#94a3b8;font-size:10px">'
+                    f'<strong>Moegliche Gruende fuer Mehrfach-Einsatz:</strong> '
+                    + "; ".join(caveats) + '</td></tr>'
+                )
+        parts.append('</tbody></table>')
+
+    if multi:
+        parts.append(
+            '<div style="margin-top:8px"><strong style="font-size:11px;color:#78350f">'
+            'Multi-Funktions-Tools (1 Tool ersetzt mehrere Kategorien):</strong>'
+            '<ul style="margin:6px 0 0 18px;padding:0;font-size:11px;color:#78350f">'
+        )
+        for t in multi[:4]:
+            cats = ", ".join(t.get("replaces_categories", []))
+            parts.append(
+                f'<li style="margin-bottom:3px"><strong>{t["name"]}</strong>'
+                f' ({t["country"]}) — ersetzt <em>{cats}</em>'
+                f' ({t.get("potential_replacements", 0)} Anbieter heute)</li>'
+            )
+        parts.append('</ul></div>')
+
+    if eu_alts:
+        parts.append(
+            '<details style="margin-top:8px"><summary style="font-size:11px;color:#78350f;'
+            'cursor:pointer">EU-Alternativen pro Anbieter (Details)</summary>'
+            '<ul style="margin:6px 0 0 18px;padding:0;font-size:10px;color:#475569">'
+        )
+        for e in eu_alts[:20]:
+            first_alt = (e.get("alternatives") or [{}])[0]
+            parts.append(
+                f'<li style="margin-bottom:3px"><strong>{e["current_vendor"]}</strong>'
+                f' → {first_alt.get("name", "")} ({first_alt.get("country", "")})'
+                f' <span style="color:#94a3b8">— {first_alt.get("notes", "")}</span></li>'
+            )
+        parts.append('</ul></details>')
+
+    parts.append('</div>')
+    return "".join(parts)
@@ -7,8 +7,12 @@ including L1/L2 check hierarchy, progress bars, and actionable hints.

 from __future__ import annotations

+import logging
+import re
 from typing import TYPE_CHECKING

+logger = logging.getLogger(__name__)
+
 if TYPE_CHECKING:
     from .agent_doc_check_routes import CheckItem, DocCheckResult

@@ -32,12 +36,93 @@ def _icon(passed: bool, skipped: bool = False) -> str:
     return '<span style="color:#ef4444;font-weight:bold">✗</span>'


-def _hint_box(hint: str) -> str:
-    return (
+def _first_sentence(text: str, max_chars: int = 300) -> str:
+    """Erster vollstaendiger Satz statt erste Zeile — robust gegen
+    mehrzeilige Fix-Texte die mit Bullet-Listen anfangen."""
+    if not text:
+        return ""
+    # Suche Satz-Endezeichen vor max_chars
+    snippet = text[:max_chars]
+    m = re.search(r"^(.+?[\.\?\!])(?:\s|$)", snippet, re.DOTALL)
+    if m:
+        first = m.group(1).strip()
+        # Wenn der "Satz" eine Variant-Header wie "Variante A:" ist, nimm
+        # weiter — der echte Inhalt kommt erst danach
+        if re.fullmatch(r"(Variante [A-Z]\s*\([^\)]+\):?|Beispiel\s*\d*:?)",
+                        first, re.IGNORECASE):
+            rest = text[m.end():].lstrip()
+            return _first_sentence(rest, max_chars)
+        return first
+    # Kein Satz-Endezeichen — nimm bis max_chars
+    line = (text.splitlines() or [""])[0]
+    return line[:max_chars] + ("…" if len(line) > max_chars else "")
+
+
+def _hint_box(hint: str, check_label: str = "", doc_text: str = "",
+              doc_id: str | None = None) -> str:
+    """Hint-Block mit angereichertem Recipe + Doc-Anchor wenn moeglich."""
+    base = (
         f'<div style="font-size:11px;color:#dc2626;margin:2px 0 4px 20px;'
         f'padding:4px 8px;background:#fef2f2;border-radius:4px;'
-        f'border-left:3px solid #fca5a5">{hint}</div>'
+        f'border-left:3px solid #fca5a5">{hint}'
     )
+    # Recipe + Anker hinzufuegen wenn check_label bekannt
+    if check_label:
+        try:
+            from compliance.services.finding_action_recipes import recipe_for
+            from compliance.services.doc_anchor_locator import locate_anchor
+            rec = recipe_for(check_label)
+            if rec and rec.get("fix_text"):
+                first_sentence = _first_sentence(rec["fix_text"], 300)
+                full = rec["fix_text"]
+                # Statt <details> ein einfaches Inline-Block-Layout —
+                # robuster bei Plain-Text-Mail-Render
+                more = ""
+                if len(full) > len(first_sentence) + 10:
+                    more = (
+                        f'<div style="margin-top:4px;padding:6px 8px;background:#fff;'
+                        f'border:1px solid #fcd5d5;border-radius:4px;font-size:10px;'
+                        f'white-space:pre-wrap;color:#1e293b">'
+                        f'<strong style="display:block;margin-bottom:3px;color:#475569">'
+                        f'Vollstaendiger Textbaustein zum Einfuegen:</strong>'
+                        f'{full}</div>'
+                    )
+                base += (
+                    f'<div style="margin-top:6px;padding-top:6px;border-top:1px solid #fecaca">'
+                    f'<strong style="color:#7c3aed;font-size:10px">Konkrete Massnahme:</strong> '
+                    f'<span style="color:#1e293b">{first_sentence}</span>'
+                    f'{more}'
+                )
+                # Anker via Embedding-Locator (mit doc_id-Cache)
+                if doc_text:
+                    anchor = locate_anchor(check_label, doc_text, doc_id)
+                    if anchor and anchor.get("anchor_phrase") and anchor.get("confidence") != "low":
+                        conf_label = anchor.get("confidence", "")
+                        conf_badge = (
+                            f' <span style="color:#94a3b8;font-size:9px">'
+                            f'(Match-Konfidenz {conf_label}, '
+                            f'Score {anchor.get("score", "—")})</span>'
+                        )
+                        base += (
+                            f'<div style="margin-top:4px;color:#475569;font-size:10px">'
+                            f'<strong>Einfuegen:</strong> {anchor["position_hint"]}'
+                            f'{conf_badge}</div>'
+                        )
+                    elif rec.get("where"):
+                        # Kein guter Anchor-Match — zeige generischen Fallback
+                        base += (
+                            f'<div style="margin-top:4px;color:#475569;font-size:10px">'
+                            f'<strong>Einfuegen:</strong> {rec["where"]} '
+                            f'<span style="color:#94a3b8;font-size:9px">'
+                            f'(kein eindeutiger Absatz im Dokument gefunden — '
+                            f'Anweisung allgemein)</span></div>'
+                        )
+                base += '</div>'
+        except Exception as e:
+            logger.debug("Hint-box enrichment failed: %s", e)
+            pass  # Recipes optional — Hint-Box muss nie crashen
+    base += '</div>'
+    return base


 def build_management_summary(results: list[DocCheckResult]) -> str:
@@ -158,8 +243,14 @@ def _check_to_action(doc_label: str, check_label: str, hint: str) -> str:
 def build_html_report(
     results: list[DocCheckResult],
     cookie_result: dict | None,
+    doc_texts: dict[str, str] | None = None,
 ) -> str:
-    """Build HTML email report styled like the frontend."""
+    """Build HTML email report styled like the frontend.
+
+    `doc_texts` is the doc_type→text dict so hint-boxes can locate the
+    relevant Absatz in the original document for the Einfuege-Empfehlung.
+    """
+    doc_texts = doc_texts or {}
     ok_count = sum(1 for r in results if r.completeness_pct == 100)
     html = [
         '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
@@ -170,7 +261,7 @@ def build_html_report(
     ]

     for r in results:
-        _render_document(html, r)
+        _render_document(html, r, doc_texts.get(r.doc_type, ""))

     if cookie_result:
         _render_cookie_banner(html, cookie_result)
@@ -179,7 +270,7 @@ def build_html_report(
     return "\n".join(html)


-def _render_document(html: list[str], r: DocCheckResult) -> None:
+def _render_document(html: list[str], r: DocCheckResult, doc_text: str = "") -> None:
     pct = r.completeness_pct
     cpct = r.correctness_pct
     bar_color = "green" if pct >= 80 else "yellow" if pct >= 50 else "red"
@@ -244,7 +335,7 @@ def _render_document(html: list[str], r: DocCheckResult) -> None:
     else:
         html.append('<div style="padding:8px 16px 12px">')
         for c in l1_checks:
-            _render_l1_check(html, c, l2_by_parent.get(c.id, []))
+            _render_l1_check(html, c, l2_by_parent.get(c.id, []), doc_text)

         # Master-Control aggregation: with 1874 MCs evaluated per run,
         # rendering every L2 check inline produces ~600 rows per doc and
@@ -289,6 +380,7 @@ def _render_document(html: list[str], r: DocCheckResult) -> None:

 def _render_l1_check(
     html: list[str], c: CheckItem, children: list[CheckItem],
+    doc_text: str = "",
 ) -> None:
     l2_sub = [ch for ch in children if not ch.skipped]
     l2_passed = sum(1 for ch in l2_sub if ch.passed)
@@ -301,16 +393,16 @@ def _render_l1_check(
     if l2_sub:
         html.append(f' <span style="color:#9ca3af;font-size:11px">({l2_passed}/{len(l2_sub)})</span>')
     if not c.passed and c.hint:
-        html.append(_hint_box(c.hint))
+        html.append(_hint_box(c.hint, c.label, doc_text))
     html.append('</div>')

     for ch in children:
         if ch.skipped:
             continue
-        _render_l2_check(html, ch)
+        _render_l2_check(html, ch, doc_text)


-def _render_l2_check(html: list[str], ch: CheckItem) -> None:
+def _render_l2_check(html: list[str], ch: CheckItem, doc_text: str = "") -> None:
     style = "color:#dc2626;font-weight:500" if not ch.passed else "color:#6b7280"
     html.append(
         f'<div style="padding:2px 0 2px 24px;border-left:2px solid #e5e7eb;margin-left:8px">'
@@ -324,7 +416,7 @@ def _render_l2_check(html: list[str], ch: CheckItem) -> None:
             f'white-space:nowrap">"...{ch.matched_text[:80]}..."</div>'
         )
     if not ch.passed and ch.hint:
-        html.append(_hint_box(ch.hint))
+        html.append(_hint_box(ch.hint, ch.label, doc_text))
     html.append('</div>')


@@ -1808,6 +1808,32 @@ async def list_categories():
 # SIMILAR CONTROLS (Embedding-based dedup)
 # =============================================================================

+_EMBEDDING_COL_AVAILABLE: bool | None = None
+
+
+def _has_embedding_col() -> bool:
+    """Cache whether canonical_controls has the embedding column.
+
+    Returns False on systems where pgvector + embedding backfill weren't
+    set up. Saves the per-request 500 + log spam.
+    """
+    global _EMBEDDING_COL_AVAILABLE
+    if _EMBEDDING_COL_AVAILABLE is not None:
+        return _EMBEDDING_COL_AVAILABLE
+    try:
+        with SessionLocal() as db:
+            r = db.execute(text(
+                "SELECT 1 FROM information_schema.columns "
+                "WHERE table_schema='compliance' "
+                "AND table_name='canonical_controls' "
+                "AND column_name='embedding'"
+            )).fetchone()
+            _EMBEDDING_COL_AVAILABLE = bool(r)
+    except Exception:
+        _EMBEDDING_COL_AVAILABLE = False
+    return _EMBEDDING_COL_AVAILABLE
+
+
 @router.get("/controls/{control_id}/similar")
 async def find_similar_controls(
     control_id: str,
@@ -1815,6 +1841,8 @@ async def find_similar_controls(
     limit: int = Query(20, ge=1, le=100),
 ):
     """Find controls similar to the given one using embedding cosine similarity."""
+    if not _has_embedding_col():
+        return []
     with SessionLocal() as db:
         # Get the target control's embedding
         target = db.execute(
@@ -1856,7 +1884,7 @@ async def find_similar_controls(
                     "title": r.title,
                     "severity": r.severity,
                     "release_state": r.release_state,
-                    "tags": r.tags or [],
+                    "tags": _jsonish(r.tags) or [],
                     "license_rule": r.license_rule,
                     "verification_method": r.verification_method,
                     "category": r.category,
@@ -1866,6 +1894,10 @@ async def find_similar_controls(
             ]
         except Exception as e:
             logger.warning("Embedding similarity query failed (no embedding column?): %s", e)
+            try:
+                db.rollback()
+            except Exception:
+                pass
             return []


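The cached probe in `_has_embedding_col` stores `False` whenever the lookup itself raises, so a transient database error on the first request would keep the endpoint returning `[]` until the process restarts. A minimal sketch of one alternative follows, assuming a hypothetical `probe` callable that stands in for the `information_schema` query: only definitive answers are cached, and an error leaves the cache unset so the next request probes again. The names `has_column` and `_cached` are illustrative and not part of this commit.

```python
from typing import Callable, Optional

_cached: Optional[bool] = None  # None means "not yet determined"


def has_column(probe: Callable[[], bool]) -> bool:
    """Return the cached probe result without caching failures.

    `probe` is a hypothetical stand-in for the information_schema query.
    """
    global _cached
    if _cached is not None:
        return _cached
    try:
        _cached = bool(probe())
    except Exception:
        # Answer "no column" for this request only; the cache stays
        # unset so a later request retries the probe.
        return False
    return _cached
```

The trade-off is that a permanently broken probe is retried on every request, which is the log noise the original cache was meant to avoid; a retry limit or a time-based backoff would be one way to bound that.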
@@ -1946,6 +1978,22 @@ async def get_v1_matches_endpoint(control_id: str):
 # INTERNAL HELPERS
 # =============================================================================

+def _jsonish(v):
+    """Parse v as JSON if it's a string that looks like JSON, otherwise return as-is.
+
+    Some canonical_controls rows were inserted with jsonb columns containing
+    raw JSON strings (e.g. '["a","b"]' as a TEXT). The frontend expects real
+    arrays — coerce here so .map() works.
+    """
+    if isinstance(v, str) and v and v[0] in "[{":
+        try:
+            import json as _j
+            return _j.loads(v)
+        except Exception:
+            return v
+    return v
+
+
 def _control_row(r) -> dict:
     return {
         "id": str(r.id),
@@ -1954,17 +2002,17 @@ def _control_row(r) -> dict:
         "title": r.title,
         "objective": r.objective,
         "rationale": r.rationale,
-        "scope": r.scope,
-        "requirements": r.requirements,
-        "test_procedure": r.test_procedure,
-        "evidence": r.evidence,
+        "scope": _jsonish(r.scope),
+        "requirements": _jsonish(r.requirements),
+        "test_procedure": _jsonish(r.test_procedure) or [],
+        "evidence": _jsonish(r.evidence) or [],
         "severity": r.severity,
         "risk_score": float(r.risk_score) if r.risk_score is not None else None,
         "implementation_effort": r.implementation_effort,
         "evidence_confidence": float(r.evidence_confidence) if r.evidence_confidence is not None else None,
-        "open_anchors": r.open_anchors,
+        "open_anchors": _jsonish(r.open_anchors) or [],
         "release_state": r.release_state,
-        "tags": r.tags or [],
+        "tags": _jsonish(r.tags) or [],
         "license_rule": r.license_rule,
         "source_original_text": r.source_original_text,
         "source_citation": r.source_citation,

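To make the coercion behaviour of `_jsonish` concrete, here is a standalone copy of the helper (renamed `jsonish` for illustration, with the import moved to module level) together with the cases that matter for `_control_row`. The inputs are made-up examples, not rows from the database.

```python
import json


def jsonish(v):
    """Standalone copy of the `_jsonish` helper, for illustration only."""
    if isinstance(v, str) and v and v[0] in "[{":
        try:
            return json.loads(v)
        except Exception:
            return v
    return v


# A JSON array stored as text becomes a real list.
assert jsonish('["a","b"]') == ["a", "b"]
# Values that are already structured pass through unchanged.
assert jsonish(["a"]) == ["a"]
# Malformed JSON is returned as the original string.
assert jsonish("[not json") == "[not json"
# None falls through, so `jsonish(x) or []` yields [] for NULL columns.
assert (jsonish(None) or []) == []
# Leading whitespace defeats the first-character check.
assert jsonish(' ["a"]') == ' ["a"]'
```

The last two cases show the limits of the heuristic: a malformed or whitespace-prefixed string still reaches the frontend as a string, so `.map()` would fail on those rows.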
@@ -0,0 +1,181 @@
+"""
+Consent-Log Export (Borlabs parity + DPO audit requirement).
+
+Auditors routinely request an extract of all consents granted or
+withdrawn per tenant. Until now the DPO (data protection officer) had to
+write SQL by hand for this. These endpoints deliver CSV + JSON directly
+from the browser.
+
+Endpoints:
+    GET /einwilligungen/export/consents.csv
+    GET /einwilligungen/export/consents.json
+    GET /einwilligungen/export/history.csv   (change history)
+"""
+
+from __future__ import annotations
+
+import csv
+import io
+import json
+import logging
+from datetime import datetime, timezone
+
+from fastapi import APIRouter, Depends, Header, Query
+from fastapi.responses import Response
+from sqlalchemy.orm import Session
+
+from classroom_engine.database import get_db
+from ..db.einwilligungen_models import (
+    EinwilligungenConsentDB,
+    EinwilligungenConsentHistoryDB,
+)
+
+logger = logging.getLogger(__name__)
+router = APIRouter(prefix="/einwilligungen/export", tags=["einwilligungen-export"])
+
+
+def _get_tenant(x_tenant_id: str | None = Header(None, alias="X-Tenant-ID")) -> str:
+    if not x_tenant_id:
+        from .tenant_utils import get_tenant_id
+        return get_tenant_id()
+    return x_tenant_id
+
+
+def _ts() -> str:
+    return datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
+
+
+def _consent_rows(consents: list[EinwilligungenConsentDB]) -> list[dict]:
+    return [
+        {
+            "consent_id": str(c.id),
+            "user_id": c.user_id or "",
+            "data_point_id": c.data_point_id or "",
+            "granted": "yes" if c.granted else "no",
+            "purpose": c.purpose or "",
+            "consent_version": c.consent_version or "",
+            "ip_address": c.ip_address or "",
+            "user_agent": (c.user_agent or "")[:200],
+            "source": c.source or "",
+            "created_at": c.created_at.isoformat() if c.created_at else "",
+            "updated_at": c.updated_at.isoformat() if c.updated_at else "",
+            "revoked_at": c.revoked_at.isoformat() if getattr(c, "revoked_at", None) else "",
+        }
+        for c in consents
+    ]
+
+
+def _history_rows(entries: list[EinwilligungenConsentHistoryDB]) -> list[dict]:
+    return [
+        {
+            "id": str(e.id),
+            "consent_id": str(e.consent_id),
+            "action": e.action or "",
+            "consent_version": e.consent_version or "",
+            "ip_address": e.ip_address or "",
+            "user_agent": (e.user_agent or "")[:200],
+            "source": e.source or "",
+            "created_at": e.created_at.isoformat() if e.created_at else "",
+        }
+        for e in entries
+    ]
+
+
+def _csv_response(rows: list[dict], filename: str) -> Response:
+    if not rows:
+        return Response(content="", media_type="text/csv",
+                        headers={"Content-Disposition": f"attachment; filename={filename}"})
+    buf = io.StringIO()
+    w = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), quoting=csv.QUOTE_ALL)
+    w.writeheader()
+    w.writerows(rows)
+    return Response(content=buf.getvalue(), media_type="text/csv; charset=utf-8",
+                    headers={"Content-Disposition": f"attachment; filename={filename}"})
+
+
+def _json_response(payload: dict, filename: str) -> Response:
+    body = json.dumps(payload, ensure_ascii=False, indent=2, default=str)
+    return Response(content=body, media_type="application/json; charset=utf-8",
+                    headers={"Content-Disposition": f"attachment; filename={filename}"})
+
+
+@router.get("/consents.csv")
+async def export_consents_csv(
+    user_id: str | None = Query(None, description="Filter by single user"),
+    granted: bool | None = Query(None),
+    since: str | None = Query(None, description="ISO timestamp"),
+    tenant_id: str = Depends(_get_tenant),
+    db: Session = Depends(get_db),
+) -> Response:
+    """Download all consent records of this tenant as CSV (auditor-ready)."""
+    q = db.query(EinwilligungenConsentDB).filter(
+        EinwilligungenConsentDB.tenant_id == tenant_id,
+    )
+    if user_id:
+        q = q.filter(EinwilligungenConsentDB.user_id == user_id)
+    if granted is not None:
+        q = q.filter(EinwilligungenConsentDB.granted == granted)
+    if since:
+        try:
+            since_dt = datetime.fromisoformat(since.rstrip("Z"))
+            q = q.filter(EinwilligungenConsentDB.created_at >= since_dt)
+        except Exception:
+            pass
+    rows = _consent_rows(q.order_by(EinwilligungenConsentDB.created_at.desc()).all())
+    return _csv_response(rows, f"consents_{tenant_id[:8]}_{_ts()}.csv")
+
+
+@router.get("/consents.json")
+async def export_consents_json(
+    user_id: str | None = Query(None),
+    granted: bool | None = Query(None),
+    since: str | None = Query(None),
+    tenant_id: str = Depends(_get_tenant),
+    db: Session = Depends(get_db),
+) -> Response:
+    """Same data as the CSV endpoint but JSON-shaped for further processing."""
+    q = db.query(EinwilligungenConsentDB).filter(
+        EinwilligungenConsentDB.tenant_id == tenant_id,
+    )
+    if user_id:
+        q = q.filter(EinwilligungenConsentDB.user_id == user_id)
+    if granted is not None:
+        q = q.filter(EinwilligungenConsentDB.granted == granted)
+    if since:
+        try:
+            since_dt = datetime.fromisoformat(since.rstrip("Z"))
+            q = q.filter(EinwilligungenConsentDB.created_at >= since_dt)
+        except Exception:
+            pass
+    rows = _consent_rows(q.order_by(EinwilligungenConsentDB.created_at.desc()).all())
+    payload = {
+        "tenant_id": tenant_id,
+        "exported_at": datetime.now(timezone.utc).isoformat(),
+        "filter": {"user_id": user_id, "granted": granted, "since": since},
+        "count": len(rows),
+        "consents": rows,
+    }
+    return _json_response(payload, f"consents_{tenant_id[:8]}_{_ts()}.json")
+
+
+@router.get("/history.csv")
+async def export_history_csv(
+    consent_id: str | None = Query(None, description="Limit to one consent"),
+    since: str | None = Query(None),
+    tenant_id: str = Depends(_get_tenant),
+    db: Session = Depends(get_db),
+) -> Response:
+    """Download the consent-change history (Art. 7(1) duty to demonstrate consent)."""
+    q = db.query(EinwilligungenConsentHistoryDB).filter(
+        EinwilligungenConsentHistoryDB.tenant_id == tenant_id,
+    )
+    if consent_id:
+        q = q.filter(EinwilligungenConsentHistoryDB.consent_id == consent_id)
+    if since:
+        try:
+            since_dt = datetime.fromisoformat(since.rstrip("Z"))
+            q = q.filter(EinwilligungenConsentHistoryDB.created_at >= since_dt)
+        except Exception:
+            pass
+    rows = _history_rows(q.order_by(EinwilligungenConsentHistoryDB.created_at.asc()).all())
+    return _csv_response(rows, f"consent-history_{tenant_id[:8]}_{_ts()}.csv")
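
All three export endpoints strip a trailing `Z` from `since` and silently skip the filter when parsing fails, so a mistyped timestamp yields the full, unfiltered export. For an audit extract that may be surprising. A minimal sketch of a stricter parser follows, assuming a hypothetical helper `parse_since` that is not part of this commit: it maps a trailing `Z` to `+00:00`, treats naive timestamps as UTC (an assumption), and raises `ValueError` on malformed input.

```python
from datetime import datetime, timezone


def parse_since(raw: str) -> datetime:
    """Parse an ISO 8601 timestamp into an aware UTC datetime.

    Hypothetical helper, not part of the commit. Raises ValueError on
    malformed input instead of silently dropping the filter.
    """
    text = raw.strip()
    if text.endswith(("Z", "z")):
        # Older Python versions of fromisoformat() reject a literal "Z".
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)  # raises ValueError if malformed
    if dt.tzinfo is None:
        # Assumption: naive timestamps are meant as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

A handler could catch the `ValueError` and answer with a 4xx status rather than exporting everything. Whether the `created_at` columns store aware or naive timestamps is not visible in this diff, so the comparison side may need a matching conversion.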