feat(compliance-check): exec-summary + voll-audit + TDM-respect + cookie-KB-extended + saving-scan-funnel
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m43s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped

P1 - Exec summary at the top of the email report (4 KPIs + 2 CTAs, dark gradient)
P3 - no_direct_sales flag for OEM configurator sites; AGB/Widerruf/
     Nutzungsbedingungen (terms and conditions, withdrawal notice, terms of
     use) are shown as "NICHT ANWENDBAR" (not applicable, grey) instead of
     "NICHT GEFUNDEN" (not found, red)
P5 - Full-audit unification: all findings (MC + mandatory disclosures + vendor +
     redundancy) go into /data/compliance_audits.db.unified_findings; new
     /api/compliance/agent/findings/<id> endpoint + FindingsTab in the audit UI
     with filter + CSV export
P7 - Crawl hardening: TDM reservation check (robots.txt / ai.txt / header /
     meta) before every run, with a 24h cache; HeadlessChrome UA (company not
     yet founded; switch via BREAKPILOT_BRANDED_UA env); per-domain
     rate limit of 1 req/s + max 2 concurrent
P2 - Cookie knowledge DB extended additively (35 -> 74 cookies): Adobe, Meta,
     Microsoft, LinkedIn, TikTok, HubSpot, Marketo, Salesforce, Hotjar,
     FullStory, Mouseflow, Intercom, Drift, Zendesk, Cloudflare, Stripe,
     OneTrust/Cookiebot/Usercentrics, Matomo, Pinterest, Snapchat, X/Twitter,
     YouTube, Vimeo, Klaviyo, Mailchimp, Mixpanel, Segment, Amplitude,
     Optimizely, Datadog; wiring into cookie_function_classifier yields a
     compliance_risk label (kritisch/hoch/mittel/gering) per vendor
A  - k-anonymity helper (benchmark_k_anonymity) in preparation for P6
B  - Cross-tenant domain assertion in the /findings endpoint (expected_domain
     query param -> 403 on mismatch)
C  - Saving-scan funnel: /api/compliance/agent/saving-scan/start with
     validation + 24h rate limit per domain + lead persistence in
     saving_scan_leads + auto-discovery via _run_compliance_check; 6 tests
D  - Risk badge in the email vendor row

Legal guardrails (Memory feedback_oem_data_legal.md): only our own brief
assessments + source pointers, no 1:1 copies of third-party CMP texts.
TDM opt-outs are respected per § 44b UrhG. NO schema changes: everything lives
in the sidecar SQLite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-05-18 23:48:34 +02:00
parent a616b64273
commit 6c223c7c9b
23 changed files with 2685 additions and 29 deletions
@@ -166,6 +166,33 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     except Exception:
         pass
+    # P7: TDM reservation check for the base domain (§ 44b UrhG).
+    # On reserved/denied: end the run immediately, no crawl.
+    try:
+        from compliance.services.tdm_reservation_check import (
+            check_tdm_reservation, is_crawl_allowed,
+        )
+        first_url = next(
+            (d.url for d in req.documents if d.url), "",
+        )
+        if first_url:
+            tdm = await check_tdm_reservation(first_url)
+            _compliance_check_jobs[check_id]["tdm"] = tdm
+            if not is_crawl_allowed(tdm):
+                _compliance_check_jobs[check_id]["status"] = "skipped_tdm"
+                _compliance_check_jobs[check_id]["error"] = (
+                    f"TDM-Vorbehalt fuer {tdm.get('domain')} erkannt "
+                    f"(status={tdm.get('status')}) — Crawl nach § 44b "
+                    f"UrhG nicht zulaessig. Signals: "
+                    f"{[s.get('src') for s in tdm.get('signals', [])]}"
+                )
+                _compliance_check_jobs[check_id]["progress_pct"] = 100
+                logger.info("TDM-skip check_id=%s domain=%s status=%s",
+                            check_id, tdm.get("domain"), tdm.get("status"))
+                return
+    except Exception as e:
+        logger.warning("TDM-check failed (proceeding): %s", e)
     # Step 1: Resolve texts (fetch from URL if needed) — 0-30%
     _update(check_id, "Texte werden geladen...", 1)
     doc_texts: dict[str, str] = {}
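The hunk above gates the crawl on `is_crawl_allowed(tdm)`, whose implementation is not part of this excerpt. The following is a minimal sketch of how such a gate could evaluate opt-out signals, assuming a result dict shaped like the one the error message reads (`domain`, `status`, `signals` with a `src` key). The helper names `collect_signals`, `evaluate` and `crawl_allowed` and the status values are hypothetical; only the `tdm-reservation` header and meta name come from the W3C TDM Reservation Protocol. The robots.txt / ai.txt sources and the 24h cache mentioned in the commit message are left out.

```python
import re

# Hypothetical status values; the real module may use different ones.
RESERVED, OPEN = "reserved", "open"

# <meta name="tdm-reservation" content="1"> as defined by the TDM Reservation Protocol.
_META_RE = re.compile(
    r'<meta[^>]+name=["\']tdm-reservation["\'][^>]+content=["\']([01])["\']',
    re.IGNORECASE,
)


def collect_signals(headers: dict[str, str], html: str) -> list[dict]:
    """Collect TDM opt-out signals from response headers and HTML meta tags."""
    signals = []
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    if lowered.get("tdm-reservation") == "1":
        signals.append({"src": "header", "value": "1"})
    match = _META_RE.search(html)
    if match and match.group(1) == "1":
        signals.append({"src": "meta", "value": "1"})
    return signals


def evaluate(domain: str, headers: dict[str, str], html: str) -> dict:
    """Build a result dict in the (assumed) shape consumed by the worker."""
    signals = collect_signals(headers, html)
    status = RESERVED if signals else OPEN
    return {"domain": domain, "status": status, "signals": signals}


def crawl_allowed(tdm: dict) -> bool:
    """Allow the crawl only when no reservation signal was found."""
    return tdm.get("status") == OPEN


result = evaluate("example.com", {"TDM-Reservation": "1"}, "<html></html>")
print(result["status"], [s["src"] for s in result["signals"]])
```

Note that the worker in the hunk fails open: if the check itself raises, the run proceeds with a warning. Whether that is the desired behaviour for a legal guardrail is a design decision the diff does not discuss.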
@@ -526,15 +553,37 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     report_html = build_html_report(results, None, doc_texts)
     profile_html = _build_profile_html(profile)
-    # O4: Vendor redundancy / EU alternatives + cost-savings block,
-    # placed between VVT and doc report so that management sees the
-    # savings before going into the detailed review.
+    # O4: Vendor redundancy / EU alternatives + cost-savings block
     from .agent_doc_check_redundancy import build_redundancy_html
     redundancy_html = build_redundancy_html(redundancy_report)
+    # P1: Executive summary at the VERY top: CFO/management sees 4 KPIs + 2 CTAs.
+    from .agent_doc_check_exec_summary import build_exec_summary_html
+    # Determine the site name for the header (same logic as the email subject)
+    url_company_for_exec = _company_name_from_url(doc_entries)
+    domain_for_exec = _extract_domain(doc_entries)
+    site_name_for_exec = url_company_for_exec or domain_for_exec or ""
+    exec_summary_html = build_exec_summary_html(
+        scorecard=scorecard,
+        previous_scorecard=prev_scorecard,
+        cmp_vendors=cmp_vendors,
+        redundancy_report=redundancy_report,
+        site_name=site_name_for_exec,
+    )
+    # Order, sales-optimized:
+    #   1) Exec summary (KPIs + saving + CTAs)
+    #   2) summary_html (concrete tasks for management)
+    #   3) scanned_urls (source transparency)
+    #   4) profile_html (detected business model)
+    #   5) scorecard_html (MC scorecard)
+    #   6) redundancy_html (optimization potential, directly after the compliance score)
+    #   7) providers_html + vvt_html (vendor list)
+    #   8) report_html (doc review details)
     full_html = (
-        summary_html + scanned_html + profile_html + scorecard_html
-        + providers_html + vvt_html + redundancy_html + report_html
+        exec_summary_html + summary_html + scanned_html + profile_html
+        + scorecard_html + redundancy_html
+        + providers_html + vvt_html + report_html
     )
     # Step 6: Send email — derive site name primarily from entered URL.
@@ -619,6 +668,21 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
             vendors=cmp_vendors,
             profile=extracted_profile,
         )
+        # Unified findings (P5): bundle MC + Pflichtangaben + Vendor +
+        # Redundanz in one searchable table behind /agent/findings/<id>.
+        try:
+            from compliance.services.unified_findings_collector import collect
+            from compliance.services.unified_findings_store import record_findings
+            unified = collect(
+                check_id=check_id,
+                results=results,
+                cmp_vendors=cmp_vendors,
+                redundancy_report=redundancy_report,
+                doc_texts=doc_texts,
+            )
+            record_findings(check_id, unified)
+        except Exception as e:
+            logger.warning("Unified findings collect failed: %s", e)
     except Exception as e:
         logger.warning("Audit persistence skipped: %s", e)
@@ -696,11 +760,19 @@ async def _fetch_text(url: str, doc_type: str = "") -> tuple[str, list[dict]]:
     except Exception as e:
         logger.warning("Consent-tester fetch failed for %s: %s", url, e)
-    # 2. Fallback: direct HTTP fetch (works for SSR pages like BMW)
+    # 2. Fallback: direct HTTP fetch (works for SSR pages like BMW).
+    # P7: identifiable UA + per-domain rate limit.
     try:
         import re as _re
-        async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
-            resp = await client.get(url)
+        from compliance.services.compliance_user_agent import (
+            default_request_headers, DomainRateLimiter,
+        )
+        async with httpx.AsyncClient(
+            timeout=30.0, follow_redirects=True,
+            headers=default_request_headers(),
+        ) as client:
+            async with DomainRateLimiter(url):
+                resp = await client.get(url)
             if resp.status_code == 200 and "text/html" in resp.headers.get("content-type", ""):
                 html = resp.text
                 # Strip HTML tags, decode entities
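`DomainRateLimiter` is imported from `compliance_user_agent`, which is not shown in this excerpt. The sketch below illustrates one way an async context manager could enforce the limits named in the commit message (1 request per second and at most 2 concurrent requests per domain). The class body, its parameters `min_interval` and `max_concurrent`, and the class-level registries are assumptions for illustration, not the commit's implementation.

```python
import asyncio
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Per-domain limiter: bounded concurrency plus a minimum gap between starts.

    Illustrative sketch only; state is process-local and tied to one event loop.
    """

    _semaphores: dict[str, asyncio.Semaphore] = {}
    _locks: dict[str, asyncio.Lock] = {}
    _last_start: dict[str, float] = {}

    def __init__(self, url: str, min_interval: float = 1.0, max_concurrent: int = 2):
        self.domain = urlparse(url).netloc.lower()
        self.min_interval = min_interval
        self.max_concurrent = max_concurrent

    async def __aenter__(self):
        self._sem = self._semaphores.setdefault(
            self.domain, asyncio.Semaphore(self.max_concurrent))
        lock = self._locks.setdefault(self.domain, asyncio.Lock())
        await self._sem.acquire()  # cap in-flight requests per domain
        try:
            async with lock:  # serialize the spacing decision per domain
                earliest = (self._last_start.get(self.domain, float("-inf"))
                            + self.min_interval)
                delay = earliest - time.monotonic()
                if delay > 0:
                    await asyncio.sleep(delay)
                self._last_start[self.domain] = time.monotonic()
        except BaseException:
            self._sem.release()  # do not leak a slot on cancellation
            raise
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self._sem.release()
        return False


async def _demo() -> list[float]:
    starts: list[float] = []

    async def hit() -> None:
        async with DomainRateLimiter("https://example.com/a", min_interval=0.05):
            starts.append(time.monotonic())

    await asyncio.gather(hit(), hit(), hit())
    return starts


starts = asyncio.run(_demo())
print([round(b - a, 2) for a, b in zip(starts, starts[1:])])
```

Keeping the spacing decision under a lock while the semaphore bounds concurrency separates the two limits: a slow response holds a concurrency slot but does not delay the next request start beyond the minimum interval.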
@@ -1135,8 +1207,25 @@ def _company_name_from_url(doc_entries: list[dict]) -> str | None:
 def _get_skip_types(profile) -> dict[str, str]:
-    """Doc_types to skip entirely. Currently empty — we check everything
-    and flag irrelevant items as INFO instead of skipping."""
+    """Doc_types to skip entirely with a per-type reason message.
+
+    Today primarily for the OEM configurator pattern (BMW/Audi/Mercedes):
+    if the site does no direct sales, AGB/Widerruf/Nutzungsbedingungen
+    are not mandatory on the website; they are handed over by the
+    authorized dealer.
+    """
+    if getattr(profile, "no_direct_sales", False):
+        msg = (
+            "Nicht anwendbar — die Webseite schliesst keinen Direkt-"
+            "Kaufvertrag (OEM-Konfigurator-Pattern, Vertrag laeuft "
+            "ueber Vertragshaendler). AGB/Widerruf werden beim "
+            "Haendler ausgehaendigt."
+        )
+        return {
+            "agb": msg,
+            "widerruf": msg,
+            "nutzungsbedingungen": msg,
+        }
     return {}
@@ -0,0 +1,135 @@
+"""
+Executive summary block: the topmost email section.
+
+Shows the CFO / managing director the overall value of the compliance check
+in 4 numbers:
+  1) Compliance score (trend vs. previous run)
+  2) Number of vendors analyzed
+  3) Estimated annual savings potential (range)
+  4) Consolidation potential (vendors that can be reduced)
+
+Plus two big CTA buttons:
+  - "Compliance-Maengel im Detail" → jumps to the MC scorecard block (#mc-scorecard)
+  - "Konsolidierungs-Plan" → jumps to the redundancy block (#optimierungspotenzial)
+
+Goal: the board sees the ROI within 5 seconds. If curious, they scroll on
+into the detail blocks (which sit BELOW this summary).
+"""
+
+from __future__ import annotations
+
+
+def _fmt_eur_range(low: int, high: int) -> str:
+    if not low and not high:
+        return ""
+    if low == high:
+        return f"~{low:,}".replace(",", ".")
+    return f"{low:,}–{high:,}".replace(",", ".")
+
+
+def build_exec_summary_html(
+    scorecard: dict | None,
+    previous_scorecard: dict | None,
+    cmp_vendors: list[dict] | None,
+    redundancy_report: dict | None,
+    site_name: str = "",
+) -> str:
+    """Build the top-of-email Executive Summary with 4 KPIs + 2 CTAs."""
+    # 1) Compliance score
+    pct = 0
+    delta_str = ""
+    score_color = "#94a3b8"
+    if scorecard:
+        totals = scorecard.get("totals") or {}
+        pct = int(totals.get("pct", 0))
+        score_color = ("#16a34a" if pct >= 80 else
+                       "#d97706" if pct >= 50 else "#dc2626")
+        if previous_scorecard:
+            prev_pct = int((previous_scorecard.get("totals") or {}).get("pct", 0))
+            d = pct - prev_pct
+            if d:
+                trend_color = "#16a34a" if d > 0 else "#dc2626"
+                delta_str = (
+                    f'<span style="font-size:14px;color:{trend_color};margin-left:6px">'
+                    f'{"+" if d > 0 else ""}{d} pp</span>'
+                )
+    # 2) Vendor count
+    n_vendors = len(cmp_vendors or [])
+    # 3+4) Saving + consolidation
+    s = (redundancy_report or {}).get("summary") or {}
+    # "or [0, 0]" also covers a key that is present but None.
+    sav_low, sav_high = s.get("estimated_saving_year_eur") or [0, 0]
+    n_consolidation = s.get("consolidation_potential", 0)
+    sav_pct = s.get("estimated_saving_pct", "")
+    parts = [
+        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
+        'max-width:700px;margin:0 auto 18px;padding:18px 22px;'
+        'background:linear-gradient(135deg,#1e293b 0%,#0f172a 100%);'
+        'border-radius:10px;color:white">',
+        f'<div style="font-size:11px;color:#94a3b8;text-transform:uppercase;'
+        f'letter-spacing:1.5px;margin-bottom:6px">Executive Summary</div>',
+        f'<h2 style="margin:0 0 16px;font-size:18px;color:white">'
+        f'Compliance-Check {site_name}</h2>',
+        # 2x2 KPI grid
+        '<table style="width:100%;border-collapse:separate;border-spacing:8px">',
+        # Row 1: Compliance + Vendor count
+        '<tr>',
+        f'<td style="width:50%;padding:12px 14px;background:rgba(255,255,255,0.05);'
+        f'border-radius:6px;border:1px solid rgba(255,255,255,0.08)">'
+        f'<div style="font-size:10px;color:#94a3b8;text-transform:uppercase;'
+        f'letter-spacing:1px;margin-bottom:4px">DSGVO / TDDDG / TMG Score</div>'
+        f'<div style="font-size:28px;font-weight:700;color:{score_color}">'
+        f'{pct}%{delta_str}</div>'
+        f'<div style="font-size:11px;color:#cbd5e1;margin-top:2px">'
+        f'aus {int((scorecard or {}).get("totals", {}).get("total", 0))} Pflicht-Pruefungen</div>'
+        f'</td>',
+        f'<td style="width:50%;padding:12px 14px;background:rgba(255,255,255,0.05);'
+        f'border-radius:6px;border:1px solid rgba(255,255,255,0.08)">'
+        f'<div style="font-size:10px;color:#94a3b8;text-transform:uppercase;'
+        f'letter-spacing:1px;margin-bottom:4px">Identifizierte Anbieter</div>'
+        f'<div style="font-size:28px;font-weight:700;color:white">{n_vendors}</div>'
+        f'<div style="font-size:11px;color:#cbd5e1;margin-top:2px">'
+        f'davon {n_consolidation} konsolidierbar</div>'
+        f'</td>',
+        '</tr>',
+        # Row 2: Saving + CTA hint
+        '<tr>',
+        f'<td colspan="2" style="padding:14px 16px;background:linear-gradient(90deg,'
+        f'rgba(16,185,129,0.15) 0%,rgba(16,185,129,0.05) 100%);'
+        f'border-radius:6px;border:1px solid rgba(16,185,129,0.3)">'
+        f'<div style="font-size:10px;color:#86efac;text-transform:uppercase;'
+        f'letter-spacing:1px;margin-bottom:4px">'
+        f'Geschaetztes Sparpotenzial pro Jahr (Tool-Lizenzen, ohne Media-Spend)</div>'
+        f'<div style="font-size:24px;font-weight:700;color:#34d399">'
+        f'{_fmt_eur_range(sav_low, sav_high)}'
+        f'<span style="font-size:14px;color:#86efac;margin-left:8px">({sav_pct})</span></div>'
+        f'<div style="font-size:11px;color:#cbd5e1;margin-top:4px">'
+        f'durch Konsolidierung redundanter Anbieter auf je 1 EU-Tool pro '
+        f'Funktions-Kategorie. <em>Schaetzbereich, mit dem Einkauf zu verifizieren.</em>'
+        f'</div></td>',
+        '</tr>',
+        '</table>',
+        # CTAs
+        '<div style="margin-top:14px;padding-top:12px;border-top:1px solid '
+        'rgba(255,255,255,0.1);text-align:center">',
+        '<a href="#mc-scorecard" style="display:inline-block;padding:8px 16px;'
+        'background:#7c3aed;color:white;text-decoration:none;border-radius:6px;'
+        'font-size:12px;font-weight:600;margin-right:8px">'
+        'Compliance-Maengel im Detail &rarr;</a>',
+        '<a href="#optimierungspotenzial" style="display:inline-block;padding:8px 16px;'
+        'background:#10b981;color:white;text-decoration:none;border-radius:6px;'
+        'font-size:12px;font-weight:600">'
+        'Konsolidierungs-Plan &rarr;</a>',
+        '</div>',
+        '</div>',
+    ]
+    return "".join(parts)
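The savings range in the summary relies on Python's `,` thousands separator, which is then swapped for the German `.`. The following self-contained sketch shows that formatting in isolation. `fmt_eur_range` is a stand-alone copy for illustration, and the en dash between the two bounds is an assumption.

```python
def fmt_eur_range(low: int, high: int) -> str:
    """Format a savings range with German thousands separators."""
    if not low and not high:
        return ""  # no estimate available: render nothing
    if low == high:
        return f"~{low:,}".replace(",", ".")
    # "\u2013" is an en dash; assumed separator between the bounds.
    return f"{low:,}\u2013{high:,}".replace(",", ".")


print(fmt_eur_range(12000, 18500))  # 12.000–18.500
print(fmt_eur_range(4200, 4200))    # ~4.200
print(repr(fmt_eur_range(0, 0)))    # ''
```

When both bounds are zero the helper returns an empty string, so the savings KPI would show only the percentage in parentheses; callers may want to hide the row in that case.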
@@ -421,10 +421,18 @@ def _render_vendor_row_full(v: dict) -> str:
         f'{", ".join(flags[:4])}</div>'
         f'{actions_html}'
     )
+    risk = v.get("compliance_risk") or {}
+    risk_label = risk.get("label") or ""
+    risk_badge = ""
+    if risk_label and risk_label != "unklar":
+        rc = {"kritisch": ("#dc2626", "#fff"), "hoch": ("#fecaca", "#991b1b"),
+              "mittel": ("#fde68a", "#92400e"), "gering": ("#d1fae5", "#065f46")}.get(risk_label, ("#e5e7eb", "#475569"))
+        risk_badge = (f'<span style="margin-left:6px;padding:1px 5px;border-radius:3px;font-size:9px;'
+                      f'background:{rc[0]};color:{rc[1]}">Risk: {risk_label}</span>')
     return (
         f'<tr style="border-top:1px solid #e2e8f0">'
         f'<td style="padding:6px 8px;color:#1e293b;font-size:11px">'
-        f'{name}{flag_str}</td>'
+        f'{name}{risk_badge}{flag_str}</td>'
         f'<td style="padding:6px 8px;color:#475569;font-size:11px">{category}</td>'
         f'<td style="padding:6px 8px;color:#475569;font-size:11px">{country}</td>'
         f'<td style="padding:6px 8px;text-align:center;color:#475569;font-size:11px">'
@@ -28,9 +28,10 @@ def build_redundancy_html(report: dict | None) -> str:
     pct = s.get("estimated_saving_pct") or "n/a"
     parts = [
-        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
-        'max-width:700px;margin:0 auto 16px;padding:14px 18px;'
-        'background:#fef3c7;border:1px solid #fcd34d;border-radius:8px">',
+        '<div id="optimierungspotenzial" style="font-family:-apple-system,'
+        'BlinkMacSystemFont,sans-serif;max-width:700px;margin:0 auto 16px;'
+        'padding:14px 18px;background:#fef3c7;border:1px solid #fcd34d;'
+        'border-radius:8px">',
         '<h3 style="margin:0 0 6px;font-size:14px;color:#92400e">'
         'Optimierungspotenzial: Redundanzen + EU-Alternativen</h3>',
         f'<p style="margin:0 0 10px;font-size:11px;color:#78350f">'
@@ -134,7 +134,9 @@ def build_management_summary(results: list[DocCheckResult]) -> str:
     ok = [r for r in results if r.completeness_pct == 100 and not r.error]
     fixable = [r for r in results if 0 < r.completeness_pct < 100 and not r.error]
     critical = [r for r in results if r.completeness_pct == 0 and not r.error]
-    errors = [r for r in results if r.error]
+    not_applicable = [r for r in results if r.error
+                      and r.error.startswith("Nicht anwendbar")]
+    errors = [r for r in results if r.error and r not in not_applicable]
     html = [
         '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
@@ -150,17 +152,24 @@ def build_management_summary(results: list[DocCheckResult]) -> str:
         html.append('<p>Keine Dokumente geprueft.</p></div>')
         return "\n".join(html)
+    na_note = (
+        f' Zusaetzlich {len(not_applicable)} Dokument{"" if len(not_applicable) == 1 else "e"} '
+        f'als NICHT ANWENDBAR markiert (kein Direkt-Vertrieb — '
+        f'OEM-Konfigurator-Pattern).' if not_applicable else ""
+    )
     if len(ok) == total:
         html.append(
-            '<p style="color:#16a34a;font-weight:600;font-size:15px">'
-            'Alle Dokumente sind vollstaendig. Keine dringenden Massnahmen noetig.</p>'
+            f'<p style="color:#16a34a;font-weight:600;font-size:15px">'
+            f'Alle Dokumente sind vollstaendig. Keine dringenden Massnahmen noetig.'
+            f'{na_note}</p>'
         )
     else:
         html.append(
             f'<p style="font-size:14px;color:#475569">'
             f'{len(ok)} von {total} Dokumenten sind vollstaendig. '
             f'{len(fixable)} brauchen Korrekturen'
-            f'{f", {len(critical)} fehlen oder sind unbrauchbar" if critical else ""}.</p>'
+            f'{f", {len(critical)} fehlen oder sind unbrauchbar" if critical else ""}.'
+            f'{na_note}</p>'
         )
     # Concrete actions
@@ -279,10 +288,13 @@ def _render_document(html: list[str], r: DocCheckResult, doc_text: str = "") ->
         r.error.startswith("Nicht eingereicht")
         or r.error.startswith("Auf der Website nicht gefunden")
     )
+    is_not_applicable = bool(r.error) and r.error.startswith("Nicht anwendbar")
     if is_missing:
         status_label = ("NICHT GEFUNDEN"
                         if r.error.startswith("Auf der Website")
                         else "NICHT EINGEREICHT")
+    elif is_not_applicable:
+        status_label = "NICHT ANWENDBAR"
     elif r.error:
         status_label = "FEHLER"
@@ -330,6 +342,13 @@ def _render_document(html: list[str], r: DocCheckResult, doc_text: str = "") ->
             'background:#fafafa;border-top:1px solid #f3f4f6">'
             + body_msg + '</div>'
         )
+    elif is_not_applicable:
+        html.append(
+            '<div style="padding:12px 16px;color:#475569;font-size:12px;'
+            'background:#f1f5f9;border-top:1px solid #cbd5e1;border-left:'
+            '3px solid #94a3b8">'
+            + r.error + '</div>'
+        )
     elif r.error:
         html.append(f'<div style="padding:12px 16px;color:#991b1b">{r.error}</div>')
     else:
@@ -44,7 +44,7 @@ def build_scorecard_html(
     trend_str = _delta_badge(overall_pct, prev_total_pct) if prev_total_pct is not None else ""
     head = (
-        '<div style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
+        '<div id="mc-scorecard" style="font-family:-apple-system,BlinkMacSystemFont,sans-serif;'
         'max-width:700px;margin:0 auto 16px;padding:12px 16px;'
         'background:#f0f9ff;border:1px solid #bae6fd;border-radius:8px">'
         '<h3 style="margin:0 0 6px;font-size:14px;color:#0369a1">'
@@ -0,0 +1,104 @@
+"""
+Full-audit findings router: unified view across all 4 finding sources.
+
+Endpoint:
+  GET /api/compliance/agent/findings/{check_id}
+      ?source=mc|pflichtangabe|vendor|redundanz|all
+      &severity=CRITICAL|HIGH|MEDIUM|LOW|INFO|all
+      &doc_type=impressum|dse|cookie|...|all
+      &status=failed|passed|skipped|na|info|all
+      &q=<free text>
+      &limit=<int>
+
+Returns a summary plus the filtered findings list. The frontend renders the
+full-audit tab under /sdk/agent/audit/<check_id> from it.
+"""
+
+from __future__ import annotations
+
+import logging
+from urllib.parse import urlparse
+
+from fastapi import APIRouter, HTTPException, Query
+
+from compliance.services.unified_findings_store import (
+    findings_summary,
+    list_findings,
+)
+from compliance.services.compliance_audit_log import get_check_run
+
+logger = logging.getLogger(__name__)
+router = APIRouter(prefix="/compliance/agent", tags=["agent"])
+
+
+def _normalize_domain(d: str) -> str:
+    if not d:
+        return ""
+    if "://" not in d:
+        d = "https://" + d
+    host = urlparse(d).netloc.lower()
+    return host[4:] if host.startswith("www.") else host
+
+
+@router.get("/findings/{check_id}")
+def get_findings(
+    check_id: str,
+    source: str | None = Query(None, description="mc|pflichtangabe|vendor|redundanz|all"),
+    severity: str | None = Query(None, description="CRITICAL|HIGH|MEDIUM|LOW|INFO|all"),
+    doc_type: str | None = Query(None),
+    status: str | None = Query(None, description="failed|passed|skipped|na|info|all"),
+    q: str | None = Query(None, description="freitext-suche label/vendor"),
+    limit: int = Query(1000, ge=1, le=5000),
+    expected_domain: str | None = Query(
+        None, description="Hard-Assertion: Run muss zu dieser Domain gehoeren (Cross-Tenant-Schutz)",
+    ),
+) -> dict:
+    """Return aggregated findings + summary counters for a check run."""
+    # P7 follow-up: optional domain assertion. When the caller passes
+    # expected_domain, a check_id that belongs to another tenant's domain is
+    # rejected with 403. Without the parameter no check is performed.
+    if expected_domain:
+        run = get_check_run(check_id)
+        actual = _normalize_domain((run or {}).get("base_domain") or "")
+        if not run or actual != _normalize_domain(expected_domain):
+            raise HTTPException(
+                status_code=403,
+                detail=f"Cross-tenant access blocked: check_id {check_id} "
+                       f"gehoert zu Domain '{actual or '?'}', angefragt: "
+                       f"'{_normalize_domain(expected_domain)}'",
+            )
+    try:
+        summary = findings_summary(check_id)
+        findings = list_findings(
+            check_id=check_id,
+            source_type=source,
+            severity=severity,
+            doc_type=doc_type,
+            status=status,
+            q=q,
+            limit=limit,
+        )
+        return {
+            "found": summary.get("total", 0) > 0,
+            "check_id": check_id,
+            "summary": summary,
+            "filter": {
+                "source": source or "all",
+                "severity": severity or "all",
+                "doc_type": doc_type or "all",
+                "status": status or "all",
+                "q": q or "",
+                "limit": limit,
+            },
+            "count": len(findings),
+            "findings": findings,
+        }
+    except Exception as e:
+        logger.exception("get_findings failed for %s", check_id)
+        return {
+            "found": False,
+            "check_id": check_id,
+            "error": str(e)[:200],
+            "summary": {},
+            "count": 0,
+            "findings": [],
+        }
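The domain assertion in the endpoint depends on how both sides are normalized before comparison. The sketch below replays that normalization on a few inputs, using stand-alone copies named `normalize_domain` and `domain_matches` (hypothetical names; `domain_matches` mirrors the condition in the endpoint, with a plain dict standing in for the stored run).

```python
from __future__ import annotations

from urllib.parse import urlparse


def normalize_domain(value: str) -> str:
    """Lower-case host without scheme, path, or a leading 'www.'."""
    if not value:
        return ""
    if "://" not in value:
        value = "https://" + value
    host = urlparse(value).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_matches(run: dict | None, expected_domain: str) -> bool:
    """True only if the run exists and its base_domain equals the expected one."""
    if not run:
        return False
    actual = normalize_domain(run.get("base_domain") or "")
    return actual == normalize_domain(expected_domain)


run = {"base_domain": "example.com"}  # stand-in for the stored check run
print(normalize_domain("https://www.Example.com/pricing"))  # example.com
print(normalize_domain("example.com:8443"))                 # example.com:8443
print(domain_matches(run, "www.example.com"))                # True
print(domain_matches(run, "other.example"))                  # False
print(domain_matches(None, "example.com"))                   # False
```

Because `netloc` keeps a port (and any userinfo), `example.com:8443` does not match `example.com`. Whether that strictness is intended is not stated in the diff.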
@@ -0,0 +1,196 @@
+"""
+Saving-scan funnel endpoint: marketing lead to compliance check.
+
+The external form (https://breakpilot.ai/savings-scan) posts here:
+  POST /api/compliance/agent/saving-scan/start
+  Body: {"url": "...", "email": "..."}
+
+Server-side:
+  1. Validate URL + email (email regex, URL scheme).
+  2. Rate limit: max 1 full scan / domain / 24h
+     (saving_scan_allowed from compliance_user_agent).
+  3. Persist the lead (saving_scan_leads in the sidecar SQLite) for
+     later report delivery + sales follow-up.
+  4. Start the compliance check with auto-discovery (DocumentInput empty
+     except for the homepage). The existing worker runs the TDM check, then
+     discovery, then the review.
+  5. Return check_id; the frontend polls /compliance-check/<check_id>.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import re
+import sqlite3
+import uuid as _uuid
+from datetime import datetime, timezone
+from pathlib import Path
+import asyncio
+
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel, Field
+
+from compliance.services.compliance_user_agent import (
+    base_domain_of, saving_scan_allowed,
+)
+
+logger = logging.getLogger(__name__)
+router = APIRouter(prefix="/compliance/agent", tags=["agent"])
+DB_PATH = os.getenv("COMPLIANCE_AUDIT_DB", "/data/compliance_audits.db")
+_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
+_URL_RE = re.compile(r"^https?://[A-Za-z0-9.-]+(/.*)?$")
+
+
+class SavingScanRequest(BaseModel):
+    url: str = Field(..., min_length=4, max_length=400)
+    email: str = Field(..., min_length=5, max_length=200)
+    consent: bool = Field(
+        True, description="Marketing-Consent fuer Sales-Follow-Up — "
+                          "muss True sein laut Form-Checkbox.",
+    )
+
+
+class SavingScanResponse(BaseModel):
+    check_id: str
+    status: str
+    message: str = ""
+
+
+def _ensure_leads_table() -> None:
+    Path(DB_PATH).parent.mkdir(parents=True, exist_ok=True)
+    with sqlite3.connect(DB_PATH) as conn:
+        conn.executescript("""
+            CREATE TABLE IF NOT EXISTS saving_scan_leads (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                ts TEXT NOT NULL,
+                email TEXT NOT NULL,
+                url TEXT NOT NULL,
+                base_domain TEXT NOT NULL,
+                check_id TEXT,
+                consent INTEGER NOT NULL,
+                source TEXT
+            );
+            CREATE INDEX IF NOT EXISTS idx_leads_domain ON saving_scan_leads(base_domain, ts);
+            CREATE INDEX IF NOT EXISTS idx_leads_email ON saving_scan_leads(email, ts);
+        """)
+
+
+def _persist_lead(email: str, url: str, check_id: str, consent: bool) -> None:
+    try:
+        _ensure_leads_table()
+        with sqlite3.connect(DB_PATH) as conn:
+            conn.execute(
+                "INSERT INTO saving_scan_leads "
+                "(ts, email, url, base_domain, check_id, consent, source) "
+                "VALUES (?, ?, ?, ?, ?, ?, ?)",
+                (
+                    datetime.now(timezone.utc).isoformat(),
+                    email.lower().strip(),
+                    url,
+                    base_domain_of(url),
+                    check_id,
+                    1 if consent else 0,
+                    "saving_scan_form",
+                ),
+            )
+            conn.commit()
+    except Exception as e:
+        logger.warning("persist lead failed: %s", e)
+
+
+def _normalize_url(url: str) -> str:
+    """Strip the path and keep only the homepage; discovery finds the rest."""
+    if "://" not in url:
+        url = "https://" + url
+    from urllib.parse import urlparse
+    p = urlparse(url)
+    return f"{p.scheme}://{p.netloc}/"
+
+
+@router.post("/saving-scan/start", response_model=SavingScanResponse)
+async def start_saving_scan(req: SavingScanRequest) -> SavingScanResponse:
+    """Trigger compliance check from the marketing-funnel form."""
+    if not _EMAIL_RE.match(req.email):
+        raise HTTPException(400, "Ungueltige E-Mail-Adresse.")
+    if not _URL_RE.match(req.url):
+        raise HTTPException(400, "URL muss mit http:// oder https:// beginnen.")
+    if not req.consent:
+        raise HTTPException(400, "Marketing-Consent erforderlich.")
+    domain = base_domain_of(req.url)
+    if not domain:
+        raise HTTPException(400, "Konnte Domain nicht ermitteln.")
+    allowed, wait_s = saving_scan_allowed(req.url)
+    if not allowed:
+        raise HTTPException(
+            429,
+            f"Fuer '{domain}' wurde in den letzten 24h bereits ein Scan "
+            f"durchgefuehrt. Bitte in {wait_s // 3600}h {wait_s % 3600 // 60}min "
+            f"erneut versuchen.",
+        )
+    # Lazy import to avoid circular dependency at module load.
+    from compliance.api.agent_compliance_check_routes import (
+        DocumentInput,
+        ComplianceCheckRequest,
+        _run_compliance_check,
+        _compliance_check_jobs,
+    )
+    homepage = _normalize_url(req.url)
+    check_id = str(_uuid.uuid4())[:8]
+    _compliance_check_jobs[check_id] = {
+        "status": "running",
+        "progress": "Saving-Scan gestartet — Auto-Discovery laeuft...",
+        "progress_pct": 0,
+        "result": None,
+        "error": "",
+    }
+    # Single "other" entry forces auto-discovery to fill in the rest.
+    docs = [DocumentInput(doc_type="other", url=homepage)]
+    check_req = ComplianceCheckRequest(
+        documents=docs, recipient=req.email.lower().strip(),
+    )
+    _persist_lead(req.email, req.url, check_id, req.consent)
+    asyncio.create_task(_run_compliance_check(check_id, check_req))
+    logger.info("saving-scan start: check_id=%s domain=%s email=%s",
+                check_id, domain, req.email[:3] + "***")
+    return SavingScanResponse(
+        check_id=check_id,
+        status="running",
+        message=f"Scan gestartet fuer {domain}. Bericht in ~3-5 Minuten.",
+    )
+
+
+@router.get("/saving-scan/lead-count")
+def saving_scan_lead_count() -> dict:
+    """Diagnostics for the sales dashboard."""
+    try:
+        _ensure_leads_table()
+        with sqlite3.connect(DB_PATH) as conn:
+            total = conn.execute(
+                "SELECT COUNT(*) FROM saving_scan_leads",
+            ).fetchone()[0]
+            # ts is stored as ISO 8601 ("...T...+00:00"), while datetime('now')
+            # yields "YYYY-MM-DD HH:MM:SS". Wrap ts in datetime() so the
+            # comparison is chronological rather than lexicographic.
+            last_24h = conn.execute(
+                "SELECT COUNT(*) FROM saving_scan_leads "
+                "WHERE datetime(ts) > datetime('now', '-1 day')",
+            ).fetchone()[0]
+            top_domains = conn.execute(
+                "SELECT base_domain, COUNT(*) AS n FROM saving_scan_leads "
+                "GROUP BY base_domain ORDER BY n DESC LIMIT 10",
+            ).fetchall()
+        return {
+            "total_leads": total,
+            "last_24h": last_24h,
+            "top_domains": [{"domain": d, "scans": n} for d, n in top_domains],
+        }
+    except Exception as e:
+        return {"error": str(e)[:200]}
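`start_saving_scan` in this file launches the check with a bare `asyncio.create_task(...)`. The asyncio documentation notes that the event loop keeps only weak references to tasks, so a task with no other reference may be garbage-collected before it finishes. A common pattern is to hold a strong reference until the task completes. It is sketched here with a hypothetical module-level `_background_tasks` set and a `spawn` helper, neither of which exists in this commit.

```python
import asyncio

# Strong references to in-flight background tasks (hypothetical registry).
_background_tasks: set[asyncio.Task] = set()


def spawn(coro) -> asyncio.Task:
    """Schedule `coro` and keep it referenced until it is done."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    # Drop the reference once the task finishes, whatever the outcome.
    task.add_done_callback(_background_tasks.discard)
    return task


async def _demo() -> tuple[int, int, int]:
    async def job() -> int:
        await asyncio.sleep(0)
        return 42

    task = spawn(job())
    tracked_while_running = len(_background_tasks)
    result = await task
    await asyncio.sleep(0)  # give pending done-callbacks a chance to run
    return tracked_while_running, result, len(_background_tasks)


tracked, result, remaining = asyncio.run(_demo())
print(tracked, result, remaining)
```

In the endpoint this would amount to replacing the bare `asyncio.create_task(...)` call with such a helper; exceptions raised inside the task would still need to be logged by the worker itself, as the existing `_run_compliance_check` appears to do.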