Commit Graph

8 Commits

Author SHA1 Message Date
Benjamin Admin e61e9d9e2a feat(agent): progress_pct + 6 BMW-Run Verbesserungen
Backend (agent_compliance_check_routes.py):
- progress_pct (0-100%) im Job-State, ueber alle Phasen verteilt
  (Laden 0-30, Profil 35-40, Pruefen 40-80, Banner 80-92, Report 95-100)
- Status-Texte vereinheitlicht ("Texte laden X/N", "Pruefen X/N")
- Firmenname fuer Email-Subject jetzt aus URL abgeleitet
  (bmw.de -> "BMW", mercedes-benz.de -> "Mercedes-Benz") statt
  unzuverlaessigem extracted_profile.companyName (matchte oft juris.de)
- E-Mail-Report enthaelt jetzt Banner+TCF-Vendor-Liste (build_provider_list_html)

Backend (agent_doc_check_extras.py — neu):
- build_scanned_urls_html: gepruefte URLs als Tabelle oben im Report
  (transparent fuer GF, welche Quellen wirklich gezogen wurden)
- Cross-Domain-Hinweis bei >1 netloc (BMW: bmw.de / bmwgroup.com /
  bmwgroup.jobs — Auffindbarkeit nach Art. 12 DSGVO)
- build_provider_list_html: Banner-Box + TCF-Vendor-Tabelle mit Spalten
  Name | Kategorie | Zweck | Drittland | Rechtsgrundlage

Backend (business_profiler.py):
- §34d-GewO Versicherungsvermittler-Hinweise zaehlen nicht mehr als
  "finance"-Industrie (BMW wurde dadurch falsch als B2B/finance erkannt)
- Neue Industry "automotive" (Fahrzeug/KFZ/Konfigurator/Modellpalette)
- B2B-Keywords: generische Begriffe wie "unternehmen", "beratung",
  "consulting" entfernt (matchten in jedem Konzerntext)
- B2C-Fallback: bei Verbraucher-Signalen ("widerruf", "kunde",
  redaktioneller Inhalt) tendiert auf b2c statt b2b

Frontend (ComplianceCheckTab.tsx):
- Progress-Balken mit Width-% und XX%-Anzeige rechts
- liest data.progress_pct aus Polling-Response

Consent-Tester (dsi_discovery.py):
- Cookie-Policy-Extraktion kritisch fixt: wait_for_function bis
  body.innerText > 500 chars (BMW SPA-Rendering brauchte mehr Zeit)
- _extract_text_robust: 3-Strategien-Extraktion (Selektoren -> Body-
  Cleanup -> P/LI/TD-Tags)
- _extract_text_from_iframes: liest OneTrust/Sourcepoint/Usercentrics
  Iframe-Inhalte (manche Cookie-Policies leben dort)

Adressiert alle Findings aus dem BMW-Ground-Truth-Vergleich.
2026-05-16 17:53:14 +02:00
Benjamin Admin d45e08e25f fix: reduce Playwright timeout 180s→60s, increase poll limit 15→25min 2026-05-16 00:47:28 +02:00
Benjamin Admin 08fcb5f239 feat(compliance-check): scenario badges + extracted profile display
Build + Deploy / build-admin-compliance (push) Successful in 1m58s
Build + Deploy / build-backend-compliance (push) Successful in 13s
Build + Deploy / build-ai-sdk (push) Successful in 49s
Build + Deploy / build-developer-portal (push) Successful in 14s
Build + Deploy / build-tts (push) Successful in 15s
Build + Deploy / build-document-crawler (push) Successful in 12s
Build + Deploy / build-dsms-gateway (push) Successful in 11s
Build + Deploy / build-dsms-node (push) Successful in 13s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 15s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m40s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 43s
CI / test-python-backend (push) Successful in 39s
CI / test-python-document-crawler (push) Successful in 27s
CI / test-python-dsms-gateway (push) Successful in 24s
CI / validate-canonical-controls (push) Successful in 14s
Build + Deploy / trigger-orca (push) Successful in 2m34s
- Show extracted profile fields (company name, legal form, address,
  DPO, USt-IdNr) with "In Company Profile uebernehmen" button
- Show Compliance Scope hints extracted from documents
- Scenario badges per document: Neugenerierung (red), Korrekturen
  (amber), Konform (green)
- Summary line shows scenario counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-12 17:49:45 +02:00
Benjamin Admin ce77cde309 fix(compliance-check): batch LLM verification + increase poll timeout
Build + Deploy / build-admin-compliance (push) Successful in 1m52s
Build + Deploy / build-backend-compliance (push) Successful in 18s
Build + Deploy / build-ai-sdk (push) Successful in 11s
Build + Deploy / build-developer-portal (push) Successful in 11s
Build + Deploy / build-tts (push) Successful in 12s
Build + Deploy / build-document-crawler (push) Successful in 14s
Build + Deploy / build-dsms-gateway (push) Successful in 10s
Build + Deploy / build-dsms-node (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 15s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m35s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 42s
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Successful in 25s
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 16s
Build + Deploy / trigger-orca (push) Successful in 2m24s
- LLM verify now sends ALL failed checks in one batched call instead of
  one Ollama call per check (80+ calls → 1 per document)
- Increase frontend poll timeout from 6 min to 15 min

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-12 11:49:30 +02:00
Benjamin Admin a127dd971b fix(compliance-check): resume polling after navigation away
Build + Deploy / build-admin-compliance (push) Successful in 2m16s
Build + Deploy / build-backend-compliance (push) Successful in 12s
Build + Deploy / build-ai-sdk (push) Successful in 12s
Build + Deploy / build-developer-portal (push) Successful in 12s
Build + Deploy / build-tts (push) Successful in 15s
Build + Deploy / build-document-crawler (push) Successful in 13s
Build + Deploy / build-dsms-gateway (push) Successful in 13s
Build + Deploy / build-dsms-node (push) Successful in 16s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 18s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m38s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 42s
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Successful in 27s
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 13s
Build + Deploy / trigger-orca (push) Successful in 2m32s
Save active check_id to localStorage so polling resumes when the user
navigates away via sidebar and comes back. Same pattern as scan tab.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-12 11:37:06 +02:00
Benjamin Admin ed3ebbc246 fix(compliance-check): send 'documents' instead of 'entries' to backend
Build + Deploy / build-admin-compliance (push) Successful in 11s
Build + Deploy / build-backend-compliance (push) Successful in 13s
Build + Deploy / build-ai-sdk (push) Successful in 13s
Build + Deploy / build-developer-portal (push) Successful in 10s
Build + Deploy / build-tts (push) Successful in 11s
Build + Deploy / build-document-crawler (push) Successful in 11s
Build + Deploy / build-dsms-gateway (push) Successful in 12s
Build + Deploy / build-dsms-node (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 15s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m33s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 39s
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Successful in 26s
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 14s
Build + Deploy / trigger-orca (push) Successful in 2m30s
Frontend was sending field name 'entries' but backend Pydantic model
expects 'documents', causing 422 validation error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-12 09:25:36 +02:00
Benjamin Admin f3751a4efa feat(compliance-check): show business profile + banner check result in UI
Build + Deploy / build-admin-compliance (push) Successful in 1m55s
Build + Deploy / build-backend-compliance (push) Successful in 3m17s
Build + Deploy / build-ai-sdk (push) Successful in 49s
Build + Deploy / build-developer-portal (push) Successful in 1m17s
Build + Deploy / build-tts (push) Successful in 1m33s
Build + Deploy / build-document-crawler (push) Successful in 41s
Build + Deploy / build-dsms-gateway (push) Successful in 28s
Build + Deploy / build-dsms-node (push) Successful in 17s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m35s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 47s
CI / test-python-backend (push) Successful in 38s
CI / test-python-document-crawler (push) Successful in 25s
CI / test-python-dsms-gateway (push) Successful in 24s
CI / validate-canonical-controls (push) Successful in 13s
Build + Deploy / trigger-orca (push) Successful in 2m58s
Add two info boxes above the checklist results:
- Business profile (B2B/B2C, industry, regulated profession)
- Banner check status (CMP detected, violations count, cross-check hint)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-12 00:19:51 +02:00
Benjamin Admin 0d0e705117 feat: Unified Compliance-Check — 8 document types in one form
New 3-tab structure: Website-Scan, Compliance-Check, Banner-Check.

Compliance-Check Tab (replaces Dokumenten-Pruefung + Impressum-Check):
- 8 document rows: DSI, Impressum, Social Media, Cookie, AGB,
  Nutzungsbedingungen, Widerruf, DSB-Kontakt
- Each row: URL input + "Text laden" + file upload + manual text
- "Text laden" extracts via consent-tester, shows in editable textarea
- User verifies/corrects text before checking
- Empty fields = "not present" → own finding

Business Profiler (business_profiler.py):
- Detects B2B/B2C/B2G from all documents together
- Recognizes regulated professions, online shops, editorial content
- Context-aware: INFO checks become PASS/FAIL based on profile

Backend: /compliance-check + /extract-text endpoints
Frontend: ComplianceCheckTab.tsx + DocumentRow.tsx
API proxies: compliance-check/route.ts + extract-text/route.ts

Also: Impressum regex fixes (Telefon, AG, Geschaeftsfuehrung)
and INFO severity for context-dependent checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 20:56:10 +02:00