feat(compliance-check): exec-summary + voll-audit + TDM-respect + cookie-KB-extended + saving-scan-funnel
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m43s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
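P1: Executive summary at the top of the email report (4 KPIs + 2 CTAs, dark gradient)
P3: no_direct_sales flag for OEM configurator sites; AGB/Widerruf/Nutzungsbedingungen
(terms and conditions, withdrawal policy, terms of use) are shown as
"NICHT ANWENDBAR" (not applicable, grey) instead of "NICHT GEFUNDEN" (not found, red)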
P5: Full-audit unification: all findings (MC + mandatory disclosures + vendor +
redundancy) are stored in /data/compliance_audits.db.unified_findings; new
/api/compliance/agent/findings/<id> endpoint + FindingsTab in the audit UI
with filter + CSV export
P7: Crawl hardening: TDM reservation check (robots.txt / ai.txt / header /
meta) before every run, with a 24h cache; HeadlessChrome UA (company not yet
founded; switch via the BREAKPILOT_BRANDED_UA env variable); per-domain
rate limit of 1 req/s + max 2 concurrent
P2: Cookie knowledge DB extended additively (35 -> 74 cookies): Adobe, Meta,
Microsoft, LinkedIn, TikTok, HubSpot, Marketo, Salesforce, Hotjar,
FullStory, Mouseflow, Intercom, Drift, Zendesk, Cloudflare, Stripe,
OneTrust/Cookiebot/Usercentrics, Matomo, Pinterest, Snapchat, X/Twitter,
YouTube, Vimeo, Klaviyo, Mailchimp, Mixpanel, Segment, Amplitude,
Optimizely, Datadog; wiring into cookie_function_classifier yields a
compliance_risk label (kritisch/hoch/mittel/gering, i.e. critical/high/medium/low) per vendor
A: k-anonymity helper (benchmark_k_anonymity) in preparation for P6
B: Cross-tenant domain assertion in the /findings endpoint (expected_domain
query param -> 403 on mismatch)
C: Saving-scan funnel: /api/compliance/agent/saving-scan/start with
validation + 24h rate limit per domain + lead persistence in
saving_scan_leads + auto-discovery via _run_compliance_check; 6 tests
D: Risk badge in the email vendor row
Legal guardrails (memory feedback_oem_data_legal.md): only our own
brief assessments + source pointers, no 1:1 copies of third-party CMP texts.
TDM opt-outs are respected per § 44b UrhG. NO schema changes: everything
lives in sidecar SQLite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
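The per-domain limit named under P7 (1 req/s, max 2 concurrent) is not part of the diff shown on this page. As a rough illustration only, such a limiter could be sketched with asyncio as follows. The class name `DomainThrottle`, its methods, and the injectable `clock` are hypothetical and not taken from the commit.

```python
import asyncio
import time
from collections import defaultdict
from urllib.parse import urlsplit


class DomainThrottle:
    """Hypothetical per-domain limiter (not the commit's implementation).

    Spaces request starts at least `min_interval` seconds apart per domain
    and allows at most `max_concurrent` requests in flight per domain.
    """

    def __init__(self, min_interval=1.0, max_concurrent=2, clock=time.monotonic):
        self._min_interval = min_interval
        self._max_concurrent = max_concurrent
        self._clock = clock  # injectable so the slot logic can be tested without sleeping
        self._next_start = defaultdict(float)  # domain -> earliest allowed start time
        self._semaphores = {}  # domain -> asyncio.Semaphore

    def reserve(self, domain):
        """Book the next free start slot for `domain`; return seconds to wait."""
        now = self._clock()
        start = max(now, self._next_start[domain])
        self._next_start[domain] = start + self._min_interval
        return start - now

    async def run(self, url, fetch):
        """Call `await fetch(url)` while honouring both limits for the URL's host."""
        domain = urlsplit(url).hostname or ""
        sem = self._semaphores.setdefault(
            domain, asyncio.Semaphore(self._max_concurrent)
        )
        async with sem:
            await asyncio.sleep(self.reserve(domain))
            return await fetch(url)
```

A caller would wrap every crawl request in `await throttle.run(url, fetch)`, where `fetch` is whatever coroutine performs the HTTP request. Because slots are booked up front, bursts queue up in one-second steps per domain while other domains stay unaffected.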
@@ -28,6 +28,12 @@ class BusinessProfile:
    needs_odr: bool = False  # online dispute resolution
    detected_services: list[str] = field(default_factory=list)
    confidence: float = 0.0
    # If True: the site itself does NOT conclude a direct-purchase contract
    # (typical for OEM configurator sites such as BMW/Audi/Mercedes; the
    # contract runs via the authorized dealer, not the manufacturer's website).
    # Consequence: AGB/Widerruf/Nutzungsbedingungen are NOT MANDATORY
    # on the website; they are handed over at the dealer instead.
    no_direct_sales: bool = False


# ── Keyword lists ────────────────────────────────────────────────────
@@ -319,4 +325,49 @@ async def detect_business_profile(documents: dict[str, str]) -> BusinessProfile:
        "steuerberater": "finance", "architekt": "craft"}
    profile.industry = prof_map.get(profile.regulated_profession_type, "unknown")

    # ── no_direct_sales (OEM configurator pattern) ───────────────
    # Manufacturer sites that only offer configuration and redirect to
    # authorized dealers (BMW/Audi/Mercedes/VW/Porsche) do NOT conclude a
    # direct purchase contract. AGB/Widerruf/Nutzungsbedingungen are not
    # mandatory there; they are handed over at the dealer.
    profile.no_direct_sales = _detect_no_direct_sales(full_text)

    return profile


# Indicators: the site primarily points to authorized dealers/branches
# instead of offering its own checkout that concludes a contract.
_NO_DIRECT_SALES_POSITIVE = [
    "vertragshaendler", "vertragshändler", "vertragspartner",
    "vertragswerkstatt", "haendlersuche", "händlersuche",
    "niederlassung", "vertretung", "autorisierter haendler",
    "autorisierter händler", "ihr haendler vor ort",
    "ihr händler vor ort", "haendler in ihrer naehe",
    "händler in ihrer nähe", "probefahrt vereinbaren",
    "anfrage an haendler", "anfrage an händler",
    "konfigurator", "fahrzeug konfigurieren",
    "ihre individuelle anfrage",
    # OEM brand names: manufacturer brands that usually sell via
    # dealers.
    "bmw vertriebs", "audi vertriebs", "mercedes-benz vertriebs",
    "volkswagen vertriebs", "porsche zentrum",
]

# Indicators AGAINST no_direct_sales: genuine online-shop features.
_DIRECT_SALES_NEGATIVE = [
    "in den warenkorb", "warenkorb hinzu", "zur kasse",
    "jetzt kaufen", "kostenpflichtig bestellen",
    "zahlungspflichtig bestellen", "sofort-kauf",
    "online bestellen", "lieferadresse", "rechnungsadresse",
]


def _detect_no_direct_sales(full_text: str) -> bool:
    """Heuristic: detects OEM configurator sites that do not sell directly."""
    text = full_text.lower()
    pos = sum(1 for k in _NO_DIRECT_SALES_POSITIVE if k in text)
    neg = sum(1 for k in _DIRECT_SALES_NEGATIVE if k in text)
    # At least 3 dealer indicators AND fewer shop indicators than
    # dealer indicators. Avoids false positives for shops that
    # additionally offer "Haendlersuche" as a store finder.
    return pos >= 3 and pos > neg
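To make the threshold rule `pos >= 3 and pos > neg` concrete, here is a small self-contained sketch. It uses trimmed copies of the two keyword lists (a subset, for illustration only) and a public function name without the leading underscore. The helper `requirement_status` and the label "GEFUNDEN" are assumptions: the commit message only names the "NICHT ANWENDBAR" and "NICHT GEFUNDEN" labels, and the report code that renders them is not shown on this page.

```python
# Trimmed subsets of the keyword lists from the diff (illustration only).
POSITIVE = ["vertragshändler", "händlersuche", "probefahrt vereinbaren", "konfigurator"]
NEGATIVE = ["in den warenkorb", "zur kasse", "jetzt kaufen"]


def detect_no_direct_sales(full_text: str) -> bool:
    """Same rule as in the diff: >= 3 dealer hits and more dealer hits than shop hits."""
    text = full_text.lower()
    pos = sum(1 for k in POSITIVE if k in text)
    neg = sum(1 for k in NEGATIVE if k in text)
    return pos >= 3 and pos > neg


def requirement_status(found: bool, no_direct_sales: bool) -> str:
    """Hypothetical mapping of a missing document to the labels named in P3."""
    if found:
        return "GEFUNDEN"  # assumed label, not named in the commit
    return "NICHT ANWENDBAR" if no_direct_sales else "NICHT GEFUNDEN"


# Manufacturer page: four dealer indicators, no shop indicators.
oem_page = "Konfigurator starten. Händlersuche: Ihr Vertragshändler. Probefahrt vereinbaren."
# Shop page with a store finder: three dealer indicators, three shop indicators.
shop_page = "Händlersuche | Konfigurator | Vertragshändler | In den Warenkorb | Zur Kasse | Jetzt kaufen"

print(detect_no_direct_sales(oem_page))   # True
print(detect_no_direct_sales(shop_page))  # False
print(requirement_status(False, detect_no_direct_sales(oem_page)))   # NICHT ANWENDBAR
print(requirement_status(False, detect_no_direct_sales(shop_page)))  # NICHT GEFUNDEN
```

The second page shows why the rule requires `pos > neg`: a shop that also advertises a dealer search reaches three dealer hits but is not flagged, because its shop indicators are just as numerous.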