feat(agent): progress_pct + 6 BMW-run improvements

Backend (agent_compliance_check_routes.py):
- progress_pct (0-100%) in the job state, distributed across all phases
  (load 0-30, profile 35-40, check 40-80, banner 80-92, report 95-100)
- Status texts unified ("Texte laden X/N", "Pruefen X/N")
- Company name for the email subject is now derived from the URL
  (bmw.de -> "BMW", mercedes-benz.de -> "Mercedes-Benz") instead of the
  unreliable extracted_profile.companyName (which often matched juris.de)
- Email report now includes the banner + TCF vendor list (build_provider_list_html)
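
The phase ranges listed above can be read as a simple linear mapping from "step X of N within a phase" to an overall percentage. The following is a minimal sketch under that assumption, not the code from agent_compliance_check_routes.py; the names PHASE_RANGES and phase_progress and the phase keys are hypothetical.

```python
# Phase ranges taken from the commit message; the dictionary keys are hypothetical labels.
PHASE_RANGES = {
    "load": (0, 30),
    "profile": (35, 40),
    "check": (40, 80),
    "banner": (80, 92),
    "report": (95, 100),
}


def phase_progress(phase: str, done: int, total: int) -> int:
    """Map step `done` of `total` within `phase` to an overall 0-100 percentage."""
    start, end = PHASE_RANGES[phase]
    if total <= 0:
        # Nothing to count in this phase: report the start of its range.
        return start
    # Clamp so that out-of-range counters never leave the phase's range.
    fraction = min(max(done / total, 0.0), 1.0)
    return round(start + (end - start) * fraction)
```

With this mapping, a status such as "Pruefen 5/10" would correspond to phase_progress("check", 5, 10), which lands in the middle of the 40-80 range.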

Backend (agent_doc_check_extras.py, new):
- build_scanned_urls_html: checked URLs as a table at the top of the report
  (makes transparent to management which sources were actually fetched)
- Cross-domain notice when there is more than one netloc (BMW: bmw.de /
  bmwgroup.com / bmwgroup.jobs; findability under Art. 12 GDPR)
- build_provider_list_html: banner box + TCF vendor table with columns
  Name | Category | Purpose | Third country | Legal basis
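
The cross-domain notice depends on counting distinct netlocs among the scanned URLs. A minimal sketch of that check, assuming the standard library's urlparse and assuming that a leading "www." should be ignored; the function names, the message wording and the example paths are hypothetical and not taken from agent_doc_check_extras.py.

```python
from urllib.parse import urlparse


def distinct_netlocs(urls):
    """Return the distinct hosts of `urls` in first-seen order, lowercased, without 'www.'."""
    seen = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # assumption: www.bmw.de and bmw.de count as one domain
        if host and host not in seen:
            seen.append(host)
    return seen


def cross_domain_hint(urls):
    """Return a notice string if the URLs span more than one host, else None."""
    hosts = distinct_netlocs(urls)
    if len(hosts) <= 1:
        return None
    return f"Documents are spread across {len(hosts)} domains: {', '.join(hosts)}"


# Hypothetical example paths on the three domains named in the commit message.
example_urls = [
    "https://www.bmw.de/datenschutz",
    "https://www.bmwgroup.com/privacy",
    "https://www.bmwgroup.jobs/privacy",
    "https://www.bmw.de/impressum",
]
```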

Backend (business_profiler.py):
- §34d GewO insurance-intermediary notices no longer count as the
  "finance" industry (this caused BMW to be misclassified as B2B/finance)
- New industry "automotive" (Fahrzeug/KFZ/Konfigurator/Modellpalette)
- B2B keywords: generic terms such as "unternehmen", "beratung",
  "consulting" removed (they matched in every corporate text)
- B2C fallback: on consumer signals ("widerruf", "kunde",
  editorial content) the classification now leans toward b2c instead of b2b

Frontend (ComplianceCheckTab.tsx):
- Progress bar with width in % and an XX% readout on the right
- Reads data.progress_pct from the polling response

Consent tester (dsi_discovery.py):
- Critical fix for cookie-policy extraction: wait_for_function until
  body.innerText > 500 chars (BMW's SPA rendering needed more time)
- _extract_text_robust: three-strategy extraction (selectors -> body
  cleanup -> P/LI/TD tags)
- _extract_text_from_iframes: reads OneTrust/Sourcepoint/Usercentrics
  iframe contents (some cookie policies live there)
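
The three-strategy extraction reads as a fallback chain: try each strategy in order and accept the first one that yields enough text. A minimal, browser-independent sketch of that pattern follows; it is not the code of _extract_text_robust. The helper name first_sufficient_text is hypothetical, and reusing the 500-character threshold from the wait condition as the acceptance threshold is an assumption.

```python
from typing import Callable, Iterable

MIN_CHARS = 500  # assumption: same threshold as the wait condition in the commit message


def first_sufficient_text(
    strategies: Iterable[Callable[[], str]], min_chars: int = MIN_CHARS
) -> str:
    """Run `strategies` in order and return the first result with at least `min_chars`.

    If no strategy reaches the threshold, return the longest text seen.
    A strategy that raises is skipped.
    """
    best = ""
    for strategy in strategies:
        try:
            text = (strategy() or "").strip()
        except Exception:
            continue
        if len(text) >= min_chars:
            return text
        if len(text) > len(best):
            best = text
    return best
```

In a real run, each strategy would wrap one extraction step (selectors, body cleanup, P/LI/TD tags); here they are plain callables so that the control flow can be checked without a browser.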

Addresses all findings from the BMW ground-truth comparison.
Benjamin Admin
2026-05-16 17:53:14 +02:00
parent 4d1e0a7f8e
commit e61e9d9e2a
6 changed files with 515 additions and 53 deletions
@@ -73,6 +73,7 @@ export function ComplianceCheckTab() {
 const [useAgent, setUseAgent] = useState(false)
 const [loading, setLoading] = useState(false)
 const [progress, setProgress] = useState('')
+const [progressPct, setProgressPct] = useState(0)
 const [results, setResults] = useState<any>(() => {
 if (typeof window === 'undefined') return null
 try { const s = localStorage.getItem(STORAGE_KEY_RESULTS); return s ? JSON.parse(s) : null } catch { return null }
@@ -109,15 +110,16 @@ export function ComplianceCheckTab() {
 if (!res.ok) continue
 const data = await res.json()
 if (data.progress) setProgress(data.progress)
+if (typeof data.progress_pct === 'number') setProgressPct(data.progress_pct)
 if (data.status === 'completed' && data.result) {
-setResults(data.result); setProgress(''); setLoading(false)
+setResults(data.result); setProgress(''); setProgressPct(0); setLoading(false)
 localStorage.setItem(STORAGE_KEY_RESULTS, JSON.stringify(data.result))
 localStorage.removeItem(STORAGE_KEY_CHECK_ID); setActiveCheckId('')
 return
 }
 if (data.status === 'failed' || data.status === 'not_found') {
 if (data.status === 'failed') setError(data.error || 'Pruefung fehlgeschlagen')
-setProgress(''); setLoading(false)
+setProgress(''); setProgressPct(0); setLoading(false)
 localStorage.removeItem(STORAGE_KEY_CHECK_ID); setActiveCheckId('')
 return
 }
@@ -177,6 +179,7 @@ export function ComplianceCheckTab() {
 setError(null)
 setResults(null)
 setProgress('Compliance-Check wird gestartet...')
+setProgressPct(0)
 try {
 const entries = DOCUMENT_TYPES
@@ -210,9 +213,11 @@ export function ComplianceCheckTab() {
 if (!pollRes.ok) { attempts++; continue }
 const pollData = await pollRes.json()
 if (pollData.progress) setProgress(pollData.progress)
+if (typeof pollData.progress_pct === 'number') setProgressPct(pollData.progress_pct)
 if (pollData.status === 'completed' && pollData.result) {
 setResults(pollData.result)
 setProgress('')
+setProgressPct(0)
 localStorage.setItem(STORAGE_KEY_RESULTS, JSON.stringify(pollData.result))
 localStorage.removeItem(STORAGE_KEY_CHECK_ID); setActiveCheckId('')
@@ -242,6 +247,7 @@ export function ComplianceCheckTab() {
 } catch (e) {
 setError(e instanceof Error ? e.message : 'Unbekannter Fehler')
 setProgress('')
+setProgressPct(0)
 } finally {
 setLoading(false)
 }
@@ -334,12 +340,21 @@ export function ComplianceCheckTab() {
 {/* Progress */}
 {progress && (
-<div className="bg-purple-50 border border-purple-200 rounded-lg p-3 text-sm text-purple-700 flex items-center gap-3">
-<svg className="animate-spin w-4 h-4 text-purple-500 shrink-0" fill="none" viewBox="0 0 24 24">
-<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
-<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
-</svg>
-{progress}
+<div className="bg-purple-50 border border-purple-200 rounded-lg p-3 text-sm text-purple-700 space-y-2">
+<div className="flex items-center gap-3">
+<svg className="animate-spin w-4 h-4 text-purple-500 shrink-0" fill="none" viewBox="0 0 24 24">
+<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
+<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
+</svg>
+<span className="flex-1">{progress}</span>
+<span className="text-xs font-mono text-purple-600 tabular-nums">{progressPct}%</span>
+</div>
+<div className="h-1.5 bg-purple-100 rounded-full overflow-hidden">
+<div
+className="h-full bg-purple-500 rounded-full transition-all duration-500 ease-out"
+style={{ width: `${Math.max(2, progressPct)}%` }}
+/>
+</div>
 </div>
 )}