Commit Graph

434 Commits

Author SHA1 Message Date
Benjamin Admin bb183b0e75 feat(template-rules): backend foundation for profile-based document recommendations
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / test-python-backend (push) Successful in 33s
CI / test-python-document-crawler (push) Successful in 23s
CI / test-python-dsms-gateway (push) Successful in 19s
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 7s
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m27s
CI / test-go (push) Failing after 46s
CI / iace-gt-coverage (push) Successful in 25s
Introduces the sustainable backend replacement for the hardcoded inline rules in
admin-compliance/app/sdk/document-generator/templateRecommendations.ts.

What's in this commit (Phase 1.1 - 1.5 of the rustling-yawning-boot plan):

- Migration 147: 4 new tables
  - compliance_template_rules (rule shell, document_type, current_version_id)
  - compliance_template_rule_versions (lifecycle, JSONB conditions,
    source_citation, change_summary, approval timestamps)
  - compliance_template_rule_approvals (audit trail)
  - compliance_tenant_rule_overrides (per-tenant classification overrides)
  Plus partial unique index for "only one is_live=1 version per rule".

- SQLAlchemy models: TemplateRuleDB, TemplateRuleVersionDB,
  TemplateRuleApprovalDB, TenantRuleOverrideDB (compliance/db/).

- Pydantic schemas (compliance/schemas/template_rule.py): full request/response
  set including RecommendationRequest/Result with reasons and override tracking.

- TemplateRuleService (compliance/services/): CRUD + Lifecycle transitions
  (submit_for_review/approve/publish/reject) following legal_document_service.py
  pattern with _transition() helper and approval audit trail. Plus tenant
  override upsert.

- RecommendationService: condition evaluator (eq, neq, in, not_in, gte/lte/gt/lt,
  exists, truthy) over JSONB conditions, override application, reason generation
  for human-readable explanations in workspace UI.

- 18 FastAPI routes in compliance/api/template_rule_routes.py covering rule CRUD,
  version lifecycle, override management and POST /recommend evaluation endpoint.

- Seed data: 33 initial rules ported from templateRecommendations.ts in
  compliance/data/template_rule_seed_data.py, written as published versions
  on first seed run. Idempotent via rule_key.

Phase 1.6 (pytest suite) and Phase 2 (editorial UI in admin-compliance) follow
in separate commits.

[migration-approved]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 23:13:50 +02:00
Benjamin Admin 37093ff9e3 feat: Browser-Matrix C2 + B11 AI-Retention + Impressum-Specialist-Agent + B1 Mobile Playwright
Task #15 Stage 1.c-e — Browser-Matrix Backend-Integration:
  - _phase_c2_browser_matrix.py: ruft consent-tester /scan-matrix wenn
    env BROWSER_MATRIX=true, fuellt state["browser_matrix"] +
    state["browser_aggregate"] + state["browser_matrix_html"]
  - V2-Mail-Block: 🌐 Browser-Matrix Tabelle (Profile · Score ·
    Sub-Scores PC/RR/BD · Bewertung) mit Worst-of-Header
  - Orchestrator ruft run_phase_c2 nach run_phase_c
  KNOWN: Stage 1.b (consent_scanner browser_profile-Param) bleibt
    zurueckgestellt (Datei in loc-exception, Hook-Patch verweigert).
    Stage 1.a-Shim laeuft im consent-tester — alle Profile aktuell
    auf Chromium, echte Engine-Diversitaet kommt mit 1.b.

Task #17 TH-RETENTION-002 als B11 ai_retention_granularity_check:
  - Erkennt AI-Provider-Kontext (vertex/openai/anthropic/etc)
  - In +-800-char-Window: prueft ≥2 Datenkategorien aus Standard-Liste
    (Texteingaben/IP/Geraet/Session/Fehlerprotokoll/Zeitstempel)
  - Wenn 1 pauschale Speicherdauer + ≥2 Kategorien aber kein
    per-Kategorie-Differential → LOW
  - Smoke: Elli-Mock-DSE trifft LOW "AI-Speicherdauer pauschal"

Task #18 Specialist-Agents Phase-1-Prototyp:
  - compliance/services/specialist_agents/__init__.py mit Architektur-Doku
  - impressum_agent.py: 9 Pflichtangaben § 5 TMG + § 1 DL-InfoV
    als Pattern-Registry (Name, Email, Telefon, HR, USt-IdNr,
    Vertretungsberechtigt, Aufsichtsbehoerde, Berufsangaben, OS-Link)
  - business_scope-aware (OS-Link nur fuer ecommerce, Aufsichtsbehoerde
    nur fuer regulated_profession/financial/insurance)
  - Phase-1 ist Pattern-Match-only (kein LLM), demonstriert die
    Schnittstelle. Phase 2 ersetzt Pattern durch System-Prompt + KB.
  - Smoke: minimal-Impressum triggert 4 Findings korrekt

Task #7 B1 Playwright Mobile-Verifikation:
  - consent-tester/services/mobile_reachability_scanner.py: echte
    WebKit-launch + p.devices['iPhone 15'] preset + de-DE locale +
    Europe/Berlin timezone
  - Footer-Anchor-Suche via locator("footer >> text=/.../i") fuer
    13 Reopen-Phrasen
  - Tap-Target-Boundingbox-Messung (Apple HIG / WCAG ≥44x44)
  - Click-Behavior: DOM-Modal-Snapshot vor/nach, erkennt CMP-Open
  - Output: has_anchor, anchor_text, tap_target_px, click_opens_cmp,
    engine_meta, screenshot_b64 (Footer-Crop wenn kein Anchor)
  - consent-tester/routes_mobile.py POST /scan-mobile-reachability
  - Backend _b1_wiring erweitert: ruft Mobile-Endpoint zuerst,
    Fallback auf statischen HTTP-Fetch. Mobile-Daten enrichen
    finding.mobile_playwright + Severity-Bump bei
    tap-target<44 / click-doesnt-open-CMP.
  KNOWN: WebKit-System-Libs sind im Dockerfile ergaenzt (Stage 1.a-
    Commit), greifen aber erst nach CI/CD-Rebuild des consent-tester.
    Bis dahin faellt B1 sauber auf statischen Fetch zurueck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 22:20:25 +02:00
Benjamin Admin e1dadc8027 feat: Browser-Matrix Stufe 1.a + 2 weitere GT-Findings + Plausibility-LLM-Härtung
Stage 1.a Browser-Matrix (Task #15) — Multi-Engine Scaffolding:
  - consent-tester/Dockerfile: firefox + webkit + Xvfb deps
  - playwright install chromium firefox webkit
  - services/browser_profiles.py: Registry mit DEFAULT_PROFILES
    (Chromium-Headed/Firefox-Headed/WebKit-Headed/Mobile-Safari) +
    EXTRA_PROFILES (Chrome-Channel, Edge, Brave)
  - services/multi_browser_scanner.py: run_matrix() orchestriert N
    parallele Scans + worst-of-Aggregation + 3 Sub-Scores
    (Pre-Consent 50%, Reject-Respekt 30%, Banner-Design 20%) +
    Hard-Fail-Cap auf <60% bei Pre-Consent/Reject-Verstoß
  - routes_matrix.py: POST /scan-matrix Endpoint (eigenes Modul,
    damit main.py unter 500 LOC bleibt)
  KNOWN: Stage 1.a-Shim ruft alle Profile auf demselben Chromium,
    echte Engine-Diversität in Stage 1.b (consent_scanner.py Param)

Coverage-Gap 3 (Task #17): 2/3 verbleibende GT-Lücken geschlossen:
  - B9 impressum_multi_entity_check (IMPRESSUM-001): erkennt
    USt-IdNr/HR/GF-Fehlen pro Entity bei multi-entity Impressen
    (Elli: USt-IdNr nur bei Elli Mobility, fehlt bei VW Group Charging)
  - B10 transfer_mechanism_check (TRANSFER-001): pro Non-EU-Vendor
    in cmp_vendors prüft DSE auf DPF/SCCs/BCRs/Einwilligung im
    ±400-char-Window. Findet Vendors ohne benannten Mechanismus.
  - TH-RETENTION-002 (AI-Datenkategorie-Differenzierung) bleibt
    semantisch-tief, vorgesehen für Specialist-Agents Task #18.

Plausibility-LLM Empty-Response-Härtung (Task #16):
  - BATCH_SIZE 8 → 4, EXCERPT 4000 → 1500 chars, TIMEOUT 60 → 45s
  - Single-retry mit halbierter Batch wenn LLM empty content
    zurückgibt — qwen3:30b-a3b rejektiert manchmal ≥6-Item-Prompts
    unter format='json'. Falls auch Half-Batch empty: log + skip.
  - Pipeline läuft jetzt nicht mehr 10min in Timeouts.

GT-Coverage Sprung: 10/13 → 11/13 (85%). 4/4 HIGH ✓, 5/6 MEDIUM ✓,
2/3 LOW ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 21:42:27 +02:00
Benjamin Admin d0e3621192 feat(audit): V2 mail render + 5 new findings (B4/B5/B6/B7/B8) + LLM-Plausibility-Phase
Mail Render V2 (compliance/services/mail_render_v2/) — 11-Modul-Subpackage
das einen einheitlichen Audit-Mail-Output erzeugt mit:
  - Header + KPI-Kacheln (Score / Findings / Docs / Vendors)
  - TOC + Sprung-Links
  - 3-Bucket-Trennung: Kritische Befunde / Manuelle Prüfung / Interne Reminder
  - Cookie-Inventar (Name·Vendor·Kategorie·Speicherdauer·Löschfrist·Sitzland·Quelle·Status)
  - Sofortmaßnahmen-Aggregator ("Sitzland ergänzen für 11 Cookies")
  - 24 Legacy-Wrappers — alle alten build_*_html in V2-Sections
  - Scope-Filter: FIN/GOV/MED/INS/EDU/LEG aus Berichten wenn nicht relevant
  - Hint/Action-Dedup: keine doppelten Sätze pro Card mehr
Aktiviert via env MAIL_RENDER_V2=true (Default: legacy renderer).

5 neue deterministische Findings als Phase D-2b/B4/B5/B6/B7/B8:

  B4 vendor_consistency_check — Cross-Doc-Provider-Widerspruch
     (Elli: DSE nennt Vertex AI für Chatbot, /de/cookies nennt Iadvize → HIGH).
     6 Service-Types: chatbot/analytics/tag_manager/pixel/cdn/cmp.

  B5 ai_act_transparency_check — AI Act Art. 50 Transparenzpflicht
     (Elli: Vertex AI vorhanden ohne Pre-Chat-Disclosure → HIGH).
     Plus B5-Erweiterung: Rechtsgrundlage Art-6-Abs-1-lit-f bei AI → MED
     (Einwilligung empfehlen).

  B6 cross_doc_dpo_check — DPO in DSE genannt, nicht im Impressum (LOW).

  B7 doc_staleness_check — Datum-Extraktion aus DSE/AGB/Nutzungsbedingungen.
     Cap: AGB/NB 3y, DSE 2y. Älter → MEDIUM (Elli NB Stand 2018 → HIGH).

  B8 cmp_fingerprint_check — Banner detected, aber CMP-Provider generic
     (kein Usercentrics/OneTrust/Cookiebot/etc → MED).

  B3-Erweiterung detect_intra_doc_contradictions — Widersprüchliche
     Speicherdauer im SELBEN Doc (Elli: Logfile 7d vs 30d → HIGH).

LLM-Plausibility-Phase (Phase D-2b, finding_plausibility_check.py):
  - Läuft AFTER MC pipeline, BEFORE D3 render
  - Prompt mit Beispiel-IDs + 3-Phase-Mapping: exact-ID / position-fallback /
    fuzzy-tail-match
  - Stempelt llm_title / llm_severity / llm_recommendation / llm_drop auf
    jeden FAIL CheckItem
  - V2-Render zeigt "🤖 LLM-Plausibility:" Box pro Finding wenn gestempelt
  - KNOWN ISSUE: qwen3:30b-a3b liefert oft empty content auf format='json' +
    8000-char-excerpt prompts. Pipeline läuft mit stamped=0 weiter. Task #16.

Coverage gegen Elli Ground Truth (zeroclaw/docs/ground-truth/elli_eco_2026-06-06.json,
13 expected findings via WebFetch-Agent-Crawl):
  - 4/4 HIGH-Findings ✓ (COOKIE-CONSENT-UX-001 + WIDERRUFSBELEHRUNG-001 +
    VENDOR-CONSISTENCY-001 + AI-ACT-TRANSPARENCY-001)
  - 4/6 MEDIUM ✓
  - 2/3 LOW ✓
  - Total: 10/13 = 77% (Sprung von 4/13 = 31%)

Restliche 3 Gaps als Task #17: IMPRESSUM-001 (multi-entity USt-IdNr),
TRANSFER-001 (Vendor-Mechanismus DPF/SCC), TH-RETENTION-002 (AI-Retention
pro Datenkategorie).

V2-Mail-Preview in Mailpit: 'v2all@local.test' Subject '[V2 ALL] ELLI'.
Backend healthy, B1+B3+B4+B5+B6+B7+B8 alle live im Orchestrator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 21:19:49 +02:00
Benjamin Admin c2c8783fee refactor(agent-check): split routes file (2692→347 LOC) + wire B1/B3/A1 [guardrail-change]
Phase-5 split of agent_compliance_check_routes.py — the 2700-line
monolith was decomposed into 19 modules in compliance/api/agent_check/:

  - Phase A-F: resolve / profile+check / banner+TCF / vendors raw+finalize /
    HTML blocks top+mid+bot / email / persist
  - Helpers: _constants, _helpers, _fetch, _discovery, _single_check
  - Schemas + State + thin _orchestrator

A1 ZIP-Anhang nativ in _phase_e_email: evidence_zip_builder.py bundles
slices + manifest.json + audit_metadata.json (SHA256 per slice +
build_sha + source_url). smtp_sender.py erweitert um attachments-Parameter.

B1 COOKIE-CONSENT-UX-001 (Mobile Reachability): consent_reachability_check.py
parses footer anchors, classifies intent (reopen_cmp / info_only /
browser_deflect) + target (same_page_cmp / new_tab / external).
_b1_wiring.py fetches homepage with iPhone-UA + renders Art-7-Abs-3
severity-coloured block.

B3 TH-RETENTION (Cross-Doc Speicherdauer): retention_comparator.py
compares DSI claim ↔ cookie-table duration ↔ actual Max-Age/expires
with 5% tolerance + severity hierarchy (dsi_under_actual HIGH,
table_under_actual HIGH, dsi_vs_table MEDIUM, actual_under_table LOW
Safari-ITP-Hint). _b3_wiring.py + Top-10 mismatches table in mail.

Side-effects:
- Fixed silent UnboundLocalError in original Step 5 (gf_one_pager used
  audit_quality_findings before declaration, caught by surrounding
  except → block never rendered). New _phase_d3_blocks_bot.py runs
  audit-quality FIRST.
- agent_compliance_check_routes.py removed from loc-exceptions.txt
  ("Phase 5 split target" — done).

Tests: 55/55 grün (B1 22 + B3 27 + saving_scan 6).
E2E: smoke against Elli DSE+Cookie produced HIGH/missing B1 finding,
TH-RETENTION table (17 cookies / 3 ✓ / 3 ✗ / 11 ?), evidence-zip
with 2 slices + manifest + audit_metadata (12089B, SHA256-chained,
source verified), email sent (attachments=1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 14:47:25 +02:00
Benjamin Admin d2f26e70c6 perf(audit): parallel Tesseract OCR + Pipeline-Wire-In für Slicing
ocr_slices_extract_cookies nutzt jetzt ThreadPoolExecutor (4 workers).
Tesseract released die GIL, daher echtes parallelisieren möglich.
Sequenziell 32 slices ≈ 60s, parallel ~15s.

Pipeline in agent_compliance_check_routes.py: Step C ruft jetzt
capture_cookie_evidence_slices + ocr_slices_extract_cookies. Source
'tesseract_ocr' wird zu existing Vendors gemergt; neue Vendors als
eigenständige Records.

Final VW-Scan-Resultat:
- Cookies: 60 (parse_flat) → 128 (mit Tesseract) = +113%
- Vendors: 18 unique
- Adobe Analytics: 9 → 33 Cookies (Tesseract fand +24)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 06:36:16 +02:00
Benjamin Admin efeef73f90 feat(audit): overlapping evidence-slices fuer lueckenlose Beweiskette
Statt EIN full-page screenshot: full-page wird per PIL in viewport-grosse
Slices geschnitten, jede ueberlappt die vorherige um overlap_px Pixel.
Jeder Cookie erscheint in mind. einer Slice, an Slice-Grenzen sogar in
zwei → Dedup nach Name eliminiert die Doppel.

Warum nicht direkt scroll-based slicing in Playwright? VW's
Cookie-Page nutzt scroll-snap / fixed-position — alle viewport-shots
kamen identisch zurueck (Header-Overlay). PIL-cut auf dem full-page
PNG bypasst das Problem voellig.

VW smoke-test (32 slices):
  per-slice: [0, 0, 2, 5, 5, 3, 4, 7, 4, 3, 4, 5, ...]
  103 raw cookies → 79 unique nach dedup
  14 vendor records (Google 9, Adobe-Familie 17, etc.)

Jeder Slice hat eigenen Timestamp + SHA256 → ZIP-Anhang fuer
juristische Beweiskette.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:38:13 +02:00
Benjamin Admin 1784b43d72 feat(audit): Screenshot+Tesseract-OCR Cookie-Extract als Vendor-Quelle C
Statt fragiler text-Regex + LLM-Cascade-Workarounds: deterministische
Pipeline. consent-tester macht Full-Page-Screenshot der Cookie-Richtlinie
(akzeptiert Banner, klappt Accordions, brennt Timestamp ein). Backend
laesst Tesseract OCR (deu, PSM 4) drueber + anchor-basierter Parser
extrahiert {name, category, purpose, duration, type} pro Cookie.

VW-Smoke-Test:
- Vorher (parse_flat): 60 cookies / 16 vendors
- Jetzt (Tesseract): 79 cookies / 14 vendor-records (~79% GT-coverage)

Architektur:
- consent-tester: page_screenshot.py + /capture-evidence Endpoint
- backend: cookie_screenshot_ocr.py mit Tesseract-pipeline
- pipeline: nach parse_flat als komplementaere Stufe C
- Dockerfile: tesseract-ocr + deutsches Sprachpaket
- requirements: pytesseract

KEINE Textkorrektur auf Cookie-Namen (awsalb bleibt awsalb).

Timestamp im Screenshot = juristischer Beweis was wir zum Scan-Zeitpunkt
wirklich auf der Site gesehen haben.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 23:22:35 +02:00
Benjamin Admin 6dad42a8c0 perf(llm): reduce vendor-extract excerpt 50k → 20k chars
VW-Loop-Iteration 1: LLM cascade lieferte 14 vendors (Lucky-Hit via
Direct-Fallback). VW-Loop-Iteration 2: 0 vendors — qwen2.5:14b
ReadTimeout auch im 420s-Direct-Fallback (50k input + 16k output
output dauert > 7min auf M4 Pro).

Fix: max_text_chars 50000 → 20000. Erfasst die ersten ~3000 Worte der
Cookie-Tabelle (Tabellen-Kopf komplett). Vollstaendige Tabelle wird
ohnehin deterministisch von parse_flat_cookie_text geparsed. LLM ist
nur fuer Vendor-Namen die NICHT in der Tabelle stehen (z.B. aus
Prosa) und Inferenz-faehiger.

Erwartung: 60-120s LLM-call statt Timeout, reproduzierbar 10-15 LLM-
Vendors → Vendor-Normalizer-Total bleibt stabil bei 20+ statt 17.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 21:55:23 +02:00
Benjamin Admin 10c73a1a33 fix(cookies): parse_flat_cookie_text whitespace-tolerant fuer HTTP-fetch
Bisheriges _FLAT_ROW_RE erwartete textContent-Output (Cookie-Tabelle
konkateniert ohne Whitespace zwischen Zellen). Bei VW lieferte das
deterministische 10 Vendors / 35 Cookies, aber nur weil der DSE-Text-
Fallback unvollstaendige Tabellen-Fragmente enthielt.

Beim echten cookie-richtlinie.html Fetch (8086 Worte HTML→text) sind
die Spalten durch Whitespace getrennt — und der Regex hat 0 gematcht.

Fix: \s* zwischen jedem Anker und dem Cookie-Namen erlaubt. Direct-Test
auf VW: 0 → 60 Cookies / 16 Vendors (Google 13, Adobe-Familie 16, Meta,
Salesforce, Cloudflare, Akamai etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 19:17:21 +02:00
Benjamin Admin 1ccfdb5d3d fix(scan): TCF SQL column + cascade diagnose-logs
VW-Scan-Befunde aus 0a8aa16e:
1. TCF lookup failed 5x mit: column 'source' does not exist. Korrekt:
   'source_name' (siehe DELETE-Query in derselben Datei). Mit dem Fix
   funktioniert das TCF-Cross-Reference fuer alle Vendors statt 0.
2. Cascade tier-1 fail loggte leere message — jetzt mit type+model+base.
3. Cascade collapse (tier 2+3 unconfigured) wird beim ersten Aufruf
   geloggt damit der Operator den ENV-Mangel sofort sieht.
4. vendor_llm_extractor loggt jetzt START + 0-vendor-Return (vorher
   silent skip — sah aus als waere er nie aufgerufen worden).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 19:00:27 +02:00
Benjamin Admin 46278cda5b Merge branch 'main' of http://100.80.114.48:3003/pilotadmin/breakpilot-compliance 2026-05-22 11:51:27 +02:00
Benjamin Admin 6baf44ac84 fix(mc-audit): TOM/AVV case-mismatch + Ausnahmen-Pattern Wortabstand
- _PROCESS_INTERNAL_PATTERNS: Patterns wurden gegen lowercased Blob
  geprueft, aber Case-sensitive geschrieben (TOM/AVV/SCC). Matchen
  nie. Auf lowercase normalisiert.
- "Ausnahmen ... dokumentieren": Pattern war zu eng, verlangte direkte
  Adjazenz. Jetzt bis zu 60 Zeichen Wortabstand.
- Test-Suite mit 22 kuratierten DSGVO/AI-Act/eCall-MC-Labels. Alle
  gruen (vorher 2/22 FAIL — beide vom User explizit als Beispiele
  genannt: TOM, Ausnahmen).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:51:03 +02:00
Benjamin Admin cf6005a47c perf(audit): vendor_llm_extractor + mc_solution_generator nutzen P31 LLM-Cascade
CI / guardrail-integrity (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Beide rufen jetzt llm_cascade.call_with_cascade() statt direkter Qwen/OVH-
Aufrufe. Damit:
* Cache-Hit auf identische Eingaben (Valkey, 7d TTL) → ~50ms statt
  4-6min beim Re-Run derselben Cookie-Doc.
* Tiered Cascade automatisch: Qwen → OVH 120B → Anthropic Claude Haiku
  wenn lower-tier under confidence-threshold.
* Confidence-Scoring (JSON-parse + items_per_input_size) entscheidet ob
  weiter delegiert wird.

Fallback auf alte _call_ollama/_call_ovh bleibt bestehen wenn der
Cascade-Aufruf scheitert.

Erwartete Wirkung beim 2. VW-Lauf: ~10min statt ~25min (Cache-Hit auf
identische Cookie-Doc + MC-Solutions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:40:11 +02:00
Benjamin Admin b663e2508f feat(audit): P107 Branchen-Benchmark-Cockpit fuer Big-4-Demos
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m5s
CI / test-go (push) Failing after 54s
CI / iace-gt-coverage (push) Successful in 27s
CI / test-python-backend (push) Successful in 47s
CI / detect-changes (push) Successful in 13s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
benchmark_extractor.py — extract_kpis() liefert 18 KPIs pro Snapshot:
* vendors_total, vendors_us, vendors_non_eu (mit % je Vendor-Land)
* source_breakdown (llm/library/flat_pattern/table_paste/html_table_dom)
* max/avg cookies_per_vendor (Konzentrations-Mass)
* cookies_in_browser, cookies_detailed_count, cookie_doc_chars
* banner_detected, banner_provider, banner_violations
* compliance_score, data_quality_pct (wie viele unserer Datenquellen
  haben Inhalt)
* saving_low/high_eur (Heuristik: (vendors - 10) × 1k-5k)

anonymize_kpis() ersetzt site_label durch 'OEM 1/2/3' (Industry-Prefix
Map: automotive→OEM, banking→Bank, chemistry→Chem, luftfahrt→Airline).

GET /api/compliance/agent/admin/benchmark?industry=automotive&sites=
VW,BMW,Mercedes&anonymized=true — liefert kpis + summary
(n_sites, avg_vendors, total_saving_high).

Admin-Page /sdk/benchmark:
* Filter-Leiste: Industry-Dropdown, Sites-Input + 5 Preset-Gruppen
  (Automotive OEMs / Zulieferer, Chemie DAX, Luftfahrt, Banking DAX)
* Anonymize-Toggle prominent
* 5 Summary-KPI-Karten oben
* Vergleichstabelle 13 Spalten (Score, Vendors, US%, Drittland%,
  Cookies-Browser, Cookie-Doc-kB, Banner ✓/✗, Provider, Verstoesse,
  Saving €/Jahr, Daten-Qualitaet, Captured-Time)
* Red-/Amber-/Green-Indikatoren bei US%/Score/Drittland
* Big-4-Hinweis-Footer

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:37 +02:00
Benjamin Admin e2be51b0aa feat(audit): P106 MC-Audit-Type + P83 BUILD_SHA in Dockerfiles + P80 v2 full
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m42s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P106 — mc_audit_type.py: zentrales Quality-Thema.
Klassifiziert pro MC: verifiable / process_internal / doc_internal /
ambiguous. Pattern-Match auf check_question + title + fail_criteria
(Schulung, AVV abgeschlossen, TOM umgesetzt, DSFA durchgefuehrt,
Ausnahmen dokumentieren, kostenfrei zur Verfuegung, opt-out
intern ermoeglichen, …).

Interne MCs werden in der MC-Auswertung NICHT mehr als FAIL gewertet,
sondern als CHECK markiert (audit_status='check'). Sie zaehlen im
build_scorecard als skipped (nicht failed) damit der Score realistisch
ist. build_internal_checks_block_html() rendert sie als separaten
blauen Block 'Pruefungen die wir von aussen NICHT durchfuehren koennen'
nach dem MC-Scorecard.

Erwartete Wirkung: bei VW 95 FAILs → wahrscheinlich 30-40 echte
verifiable_fails + 50-60 internal_checks. GF-Mail wird drastisch
realistischer (statt 'Sie haben 95 Verstoesse' → 'Sie haben 35
extern sichtbare Themen + 60 interne Checks, bitte mit DSB klaeren').

P83 — BUILD_SHA in backend/admin/consent-tester Dockerfiles als
ARG + ENV. check-rebuild-needed.sh kann jetzt deployed vs local SHA
vergleichen + REBUILD REQUIRED melden.

P80 v2 — check_replay.py macht jetzt vollstaendigen Replay aller
post-fetch Quality-Generatoren: vendor_normalizer (Dedup),
audit_quality_checks, cookie_compliance_audit, tcf_vendor_authority,
cookie_value_entropy, cookie_network_tracer. Snapshots aus alter Zeit
zeigen jetzt im Replay den aktuellen Audit-Stand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:57:02 +02:00
Benjamin Admin bd65b6f318 feat(audit): Phase 2+3 — P54 + P68 + P69 + P6/P53/P55 + P31 + P80v2
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Failing after 59s
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 19s
CI / iace-gt-coverage (push) Successful in 27s
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P54 — consent_diff_for_user.py: USP-Feature fuer wiederkehrende Besucher.
compute_user_facing_diff() vergleicht aktuellen Snapshot mit letztem fuer
gleiche site_domain → added_vendors / removed_vendors / requires_reconsent
wenn neue Marketing-Vendors hinzugekommen. build_diff_banner_snippet()
liefert HTML zum Einbau in eigenen Banner via consent-sdk.

P68 — reverse_audit.py: Self-Audit unserer Template-Bibliothek.
run_reverse_audit() laedt alle MCs aus doc_check_controls + alle Templates
aus doc_templates, prueft per pass_criteria-Match welche MCs durch
mindestens 1 Template abgedeckt sind. Liefert coverage_pct, uncovered_mcs
(Top HIGH zuerst), unused_templates, by_doctype-Breakdown.

P69 — data/ecall_regulation.json: eCall-VO (EU) 2015/758 als 7 Chunks
fuer RAG-Ingest (Art. 3/6/7 + compliance_implications fuer Automotive-OEMs).
Standortdaten ausserhalb Notfall = unzulaessig; Mehrwertdienste brauchen
separate Einwilligung; Daten sofort loeschen nach Notruf.

P6+P53+P55 — industry_library.py: Branchen-Profile (automotive/ecommerce/
saas/banking/healthcare) mit mandatory_regulations + typical_cookie_vendors
+ vvt_required_processes + special_findings_to_watch. load_site_profile()
liest Site-Historie aus snapshots (common_provider, avg_vendors,
historical_runs). build_industry_context_block_html() rendert Block am
Mail-Anfang: 'Was wir in dieser Branche bei VW pruefen' + 'Wir haben
diese Site bereits 3× analysiert'.

P31 — llm_cascade.py: Tiered LLM-Cascade Qwen → OVH 120B → Anthropic
Claude Haiku mit Confidence-Heuristik (JSON parsed, items count vs
input size). Valkey-Cache (redis://) mit 7-Tage-TTL plus In-Process-
Fallback. Wenn Tier-1 unter Confidence-Threshold → Tier-2, dann Tier-3.
Reduziert Lauf-Zeit drastisch bei Re-Runs.

P80 v2 — check_replay.py: replay nutzt jetzt audit_quality_checks
mit den Snapshot-Daten. Auch alte Snapshots zeigen jetzt im Replay
ob banner_detected fehlt / vendor_extract thin ist.

Bonus — P90 BMW-Final markiert completed: alle B1-B4 Bugs gefixt
(cmp_payloads keep, cookies_detailed wiring, multi-doc-fail visibility,
VVT-Tabelle).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:38:08 +02:00
Benjamin Admin 8cbb513e2c feat(audit): Phase 1 Quick-Wins (P81 + P85 + P70 + P83) + TCF DELETE/INSERT-Fix
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 38s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / test-go (push) Has been skipped
P81 — tests/fixtures/golden_truth/vw_de.json:
GT-Fixture mit must_find_cookies (47 VW-Cookies) + expected_vendors
(Google, Adobe, Trade Desk, ...). Basis fuer kuenftige Regression-Tests.

P85 — banner_screenshot_block.py + consent_scanner.py + main.py:
consent-tester macht beim Banner-Detect einen base64-PNG-Screenshot
(< 1.5MB). Backend rendert ihn als <img src="data:..."> direkt nach
dem GF-1-Pager. Visueller Beweis 'so sah das Banner aus' fuer Dispute
mit Marketing/DSB.

P70 — rag_provenance.py:
classify_finding_provenance() klassifiziert ein Finding als 'rag'
(Norm + Quelle), 'mixed' (Norm ohne Quelle) oder 'heuristic' (eigene
Interpretation). provenance_badge_html() rendert kleine Badges
(✓ RAG / NORM / ⚠ HEURISTIK). Modul ist generisch, kann bei jedem
Finding-Renderer einklinkt werden.

P83 — scripts/check-rebuild-needed.sh:
Prueft ob die im Container deployten BUILD_SHA mit local HEAD
uebereinstimmen. Bei Mismatch exit 1 mit 'REBUILD REQUIRED'-Hinweis.
Verhindert das 'alter Code im Container'-Problem das uns mehrfach
erwischt hat (Frontend-Tabs sichtbar, Backend ohne neuen Service).

TCF-Fix — tcf_vendor_authority.py:
cookie_library hat keinen UNIQUE-Index auf cookie_name → ON CONFLICT
war unmoeglich. Loesung: vor Insert DELETE WHERE source_name='iab_tcf_v2'.
Idempotent. + per-Vendor-Commit damit ein Fail die naechsten nicht blockt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:46 +02:00
Benjamin Admin 6c35bcf116 fix(tcf): per-vendor commit damit ein Fail die naechsten Inserts nicht blockt
CI / detect-changes (push) Successful in 15s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 22s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-python-backend (push) Successful in 45s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
2026-05-22 07:54:22 +02:00
Benjamin Admin 19d4b12e07 fix(tcf): Schema-Mapping fuer NOT NULL constraints (domain_pattern, source_name)
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m33s
CI / test-go (push) Failing after 52s
CI / iace-gt-coverage (push) Successful in 25s
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
2026-05-22 00:32:54 +02:00
Benjamin Admin 2e87b74749 feat(audit): P103+P104+P105 Defeat-Device-Heuristik fuer Cookies
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m35s
CI / test-go (push) Failing after 51s
CI / iace-gt-coverage (push) Successful in 27s
CI / test-python-backend (push) Successful in 39s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Drei zusammenhaengende Stufen 'Cookie-Verhalten ist anders als deklariert' —
analog zum VW-Diesel-Skandal-Pattern (Pruefstand vs Realbetrieb).

P103 (Stufe 3) — cookie_value_entropy.py:
Klassifiziert Cookie-Werte als flag/short_id/long_token/uuid/hash/json_blob
via Shannon-Entropy + Regex-Patterns. Wenn ein als 'essential' deklarierter
Cookie einen 64-char-Base64-Wert hat → MEDIUM-Finding 'Defeat-Device-Heuristik'.

P104 (Stufe 4) — cookie_network_tracer.py:
Vergleicht Cookie-Domain mit Site-Hauptdomain + bekannten Tracker-Vendoren
(50 Domains gemapped: doubleclick.net, facebook.com, demdex.net, omtrdc.net,
adsrvr.org, hotjar.com, ...). Wenn ein als 'essential' deklariertes Cookie
von externer Tracker-Domain gesetzt wird → HIGH. Drittland-Cookies werden
als 'DRITTLAND US/CN/...' markiert (Schrems-II-Folge).

P105 (Stufe 5) — tcf_vendor_authority.py:
Ingest-Endpoint POST /api/compliance/agent/admin/tcf-ingest holt die
IAB TCF v2 Global Vendor List (vendor-list.consensu.org/v3) und upserted
sie in cookie_library mit source='iab_tcf_v2'. cross_reference_with_tcf
fuzzy-matched cmp_vendors gegen die TCF-Liste — wenn Vendor in TCF als
Marketing gefuehrt aber Site sagt 'Funktional' → HIGH (externe Authority
widerspricht der Deklaration).

Alle drei rendern eigene Mail-Bloecke im Bereich Cookies (nach
cookie_audit_html, vor library_mismatch_html).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 00:24:07 +02:00
Benjamin Admin 6263462ba3 feat(frontend): Tab-Layout für Audit-Ergebnisse + cookie_audit in API
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / iace-gt-coverage (push) Successful in 28s
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m40s
CI / test-go (push) Failing after 45s
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
ResultsTabsView.tsx — neue Komponente mit 7 Tabs:
  1. Übersicht (KPIs: Docs, Findings, Vendors, Score)
  2. Cookies & VVT (3-Quellen-Compliance-Vergleich +
     undokumentiert/compliant/nicht-geladen + deduplizierte Vendor-Tabelle)
  3. Datenschutzerklärung (DSE-Findings via ChecklistView)
  4. Impressum
  5. AGB / Widerruf (zwei Sections in einem Tab)
  6. Cookie-Banner (Verstoesse + Phasen-KPIs)
  7. Mail-Vorschau (PDF-Download-Link)

Sticky Tab-Header oben, Content scrollt darunter. Lange Scroll-Mail
ist damit verschwunden.

DocCheckTab nutzt ResultsTabsView statt der alten Inline-ChecklistView.

Backend liefert jetzt cookie_audit-dict in der Response (zusaetzlich
zu cmp_vendors + banner_result) damit das Cookie-Tab die 3 Listen
(undokumentiert / compliant / nicht-geladen) rendern kann.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 23:44:36 +02:00
Benjamin Admin 081e4f057a feat(audit): Cookie-Compliance-Audit (3-Quellen-Vergleich) + Vendor-Dedup + Block-Parser
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 55s
CI / iace-gt-coverage (push) Successful in 25s
CI / test-python-backend (push) Successful in 44s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m43s
ZENTRALER USP: cookie_compliance_audit.py vergleicht 3 Quellen
* DEKLARIERT in Cookie-Richtlinie (parse_cookie_table + parse_flat)
* TATSAECHLICH im Browser geladen (banner_result.phases.after_accept)
* LIBRARY-Metadaten (cookie_library lookup)

Liefert 3 Listen mit Compliance-Verdict:
* compliant (deklariert UND geladen) — gruener Block
* undeclared_in_browser (geladen NICHT deklariert) — ROTER HIGH-Block
  → Art. 13(1)(c) DSGVO + § 25 TDDDG Verstoss
* declared_not_loaded (deklariert NICHT geladen) — gelber Hinweis
  → Tabelle moeglicherweise veraltet

parse_cookie_table erweitert um Block-Format (5 Zeilen pro Cookie wie
beim User-Copy aus VW). Findet 35+ Cookies aus Copy-Paste statt 0.

vendor_normalizer.py: 50+ Aliases (Google-Familie, Adobe-Familie,
Trade Desk, AdForm, ...) + Garbage-Filter (URLs, leere Strings,
'click to select', 'Mehrere OEMs'). Mergt cookies-Listen beim Dedup.

_guess_vendor erweitert: Adobe-Familie (s_ecid/AMCV/demdex/mbox/...),
Trade Desk (TDID/TDCPM/TTDOptOut), AdForm (uid/cid/otsid),
Salesforce LiveAgent, etracker, Akamai, EDAA.

audit_quality_checks: vendor-thin-Threshold jetzt dynamisch nach
Cookie-Doc-Wörter (3k→10 / 6k→20 / 10k→30 / 15k+→40).

VW-Test-Fixture: tests/fixtures/cookie_gt/vw_cookie_richtlinie.txt
(36-Cookie-Sample fuer Regression-Tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 23:36:45 +02:00
Benjamin Admin 4434e3827b fix(audit): parse_flat_cookie_text — Anchor-Pattern fuer VW-textContent
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
VW Cookie-Doc-textContent verkettet HTML-Tabellen-Zellen OHNE Whitespace:
'Permanent/Protokoll_fbcTracking Cookies (Marketing)...'

Neues Pattern hat 2 Anker:
* Davor: typisches End-Token einer vorherigen Zelle (Permanent/Protokoll,
  Session Cookie, Persistent Cookie, TagePersistent, ...)
* Danach: Kategorie-Token (Tracking Cookies, Funktionscookie, Marketing,
  Analytics, Necessary)
Dazwischen: Cookie-Name (3-50 Zeichen, alphanum/_/-)

VW-Test (snapshot 4a465783): findet jetzt 40 unique Cookie-Namen,
aggregiert zu 6 Vendors (Google, DoubleClick, Cloudflare, Borlabs,
Meta, Unbekannter Anbieter mit 22 VW-internen Cookies).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 21:33:58 +02:00
Benjamin Admin 07cc00da11 feat(licenses): Stufe 2 — auto-attribution footer in compliance PDF
Extends CompliancePDFGenerator with a "Quellen & Lizenzen" section
appended to every generated compliance PDF.

The footer is built from compliance.canonical_controls + control_parent_links
directly (no HTTP hop to /licenses/aggregate — same DB connection
already open in the generator). It groups by license_rule and lists
the top 8 source regulations per bucket.

For Rule-2 entries (CC-BY-SA, OECD-Public, Apache, etc.) it emits the
mandatory attribution paragraph required by the underlying licenses.
For Rule 1 a brief reference list satisfies the auditability goal
without legal obligation. Rule 3 is identifier-only by design.

Architecture decision: this is a PLATFORM-level footer (which sources
the platform draws on overall), not a per-export filter of "only the
sources actually cited in THIS document". The latter would require
control-uuid tracking across all sections (TOM/VVT/DSFA/etc.) which
the current PDF generator does not surface — that's a follow-up scope.
The platform-level footer fulfils the immediate legal mandate that
attribution be present on the work, not buried in AGB/Impressum.

Part of Attribution-Renderer Task #23. Stufe 1 (overview page) +
Stufe 3 (SourceBadge component) already shipped in commit dfac940.
Stufe 4 (tech-file appendix) remains for the IACE tech-file generator
in a separate iteration.
2026-05-21 21:30:02 +02:00
Benjamin Admin 1451873194 fix(audit): parse_flat_cookie_text fuer VW-Style Flat-Tabellen
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m4s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 19s
VW Cookie-Doc liefert die Tabelle als FLACHEN Text ohne Spalten-Trenner:
'IDE Tracking Cookies (Marketing) Beschreibung 13 Monate Permanent
TAID Tracking Cookies (Marketing) ...'

parse_flat_cookie_text matched mit Regex:
  NAME [Tracking|Session|Funktional|...] Cookies ... [13 Monate|Session|Permanent]

Backend faellt bei parse_cookie_table=[] auf parse_flat zurueck. Damit
holen wir aus dem 65k VW Cookie-Doc ~30-50 Cookies + Vendors deterministisch,
auch wenn der HTML-Table-DOM-Extract leer ist (was passiert wenn die
Tabelle aus mehreren append-Code-Pfaden geladen wird).

Bonus: _extract_dom_tables Helper in dsi_discovery.py vorbereitet fuer
spaeteres Einhaengen an allen 7 DiscoveredDSI.append-Stellen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 21:24:14 +02:00
Benjamin Admin dfac940272 feat(licenses): attribution renderer — Stufe 1 (overview) + Stufe 3 (SourceBadge)
Backend
- backend-compliance/compliance/api/licenses_routes.py: three endpoints
  built on the now-complete license_rule classification
  - GET  /api/compliance/licenses/overview
       global aggregation by rule + per-source breakdown (Stufe 1)
  - POST /api/compliance/licenses/aggregate
       per-control-set aggregation for PDF footer (Stufe 2) and
       tech-file appendix (Stufe 4) — consumed later
  - GET  /api/compliance/licenses/source-info/{control_uuid}
       single-control lookup for the inline source badge (Stufe 3)
- registered in api/__init__.py via the existing safe-import loader

Frontend
- app/sdk/licenses/page.tsx (Stufe 1): the /sdk/licenses overview page.
  Renders rule legend cards + per-rule source tables. Drives the
  /licenses footer link and gives auditors a one-page view of what
  licence classes the platform is operating under.
- components/sdk/SourceBadge.tsx (Stufe 3): reusable React component.
  Small R1/R2/R3 pill with click-expand tooltip showing source
  regulation + attribution string + render-full-text policy. Will be
  embedded into IACE hazards/mitigations, VVT items, DSFA controls in
  follow-up commits.

Two stages of the four-stage renderer are now ready. Stufe 2 (PDF
auto-footer) + Stufe 4 (tech-file appendix) follow once the existing
PDF generators are extended to call /licenses/aggregate.
2026-05-21 21:00:10 +02:00
Benjamin Admin cb5dad1a2f feat(audit): A Audit-Transparenz + B Tabellen-Parse + D HTML-Tables aus DOM
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-python-backend (push) Successful in 45s
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 20s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Drei zusammenhaengende Fixes fuer den VW-Befund (6 Vendors statt 100+):

A — audit_quality_checks.py: drei systemische Vorbehalte die IMMER prominent
gezeigt werden:
* banner_detected=False trotz Cookie-Doc → HIGH 'CMP-Tool ungeladen'
* cookie_doc >= 30k chars aber cmp_vendors < 15 → HIGH/MEDIUM
  'Vendor-Liste auffaellig kurz fuer Doc-Groesse'
* submitted URL aber 0/Mini-Text → MEDIUM 'URL nicht ladbar'
Rote Audit-Vorbehalt-Box ueber dem GF-1-Pager. GF-Summary sagt
'Audit unvollstaendig' statt faelschlich 'Keine kritischen Themen'.
gf_one_pager nimmt audit_quality_findings in top_findings auf
(BEVOR andere Findings).

B — cookies_table_parser laeuft jetzt auch auf gecrawltem Cookie-Doc-
Text (nicht nur bei User-Paste). Wenn der dsi-discovery-Response Tab/
Pipe-getrennte Tabellen-Reihen liefert, parsen wir sie deterministisch.

D — consent-tester/dsi-discovery extrahiert jetzt zusaetzlich zum
Text die <table>-Elemente aus dem DOM als list[str] (Tab-getrennt pro
Zeile, mind. 2 Zellen, mind. 3 Zeilen, max 10 Tabellen pro Doc). Backend
schleust diese als 'html_table'-cmp_payload ein und jagt sie zuerst durch
cookies_table_parser → 100% deterministische Vendor-Extraktion ohne LLM.

VW-Erwartung: aus der 65k-Cookie-Tabelle werden jetzt 30-50 Vendors
deterministisch geparst statt 6 vom LLM-Cascade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 20:21:28 +02:00
Benjamin Admin e411c4f0d3 feat(audit): Text-Paste-Mode pro Row — Crawler optional umgehen
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Failing after 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m27s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 47s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Hintergrund: VW liefert ueber URL-Crawler nur 6 Vendors statt der 100+
die in der echten Cookie-Tabelle stehen. Wenn der User die Tabelle aber
direkt von der Site kopieren kann (was bei den meisten OEM-Sites moeglich
ist), umgehen wir den Crawler komplett und parsen den Text deterministisch.

Backend:
* doc_type_classifier.py — 7 Pattern-Gruppen (§5 TMG, Art.13 DSGVO,
  AGB-Klauseln, Widerrufs-Frist, Cookie-Tabellen-Header, etc). Wenn der
  User Text ins falsche Doc-Type-Feld kopiert (Impressum->DSE),
  detect_mismatch liefert detected + action ('reclassify' bei sehr hoher
  Konfidenz, 'warn' bei medium).
* cookies_table_parser.py — Tab/Pipe/Komma/Semicolon-Separator-Auto-
  Detection, Spalten-Mapping per Header-Keyword. Aggregiert Cookie-
  Eintraege zu Vendor-Records (mit _guess_vendor-Fallback). Voll
  deterministisch, kein LLM.
* doc_input_warnings.py — Mail-Block ueber dem Audit, der Mismatches +
  Auto-Reclassifies dem User transparent macht.
* Pipeline: text gewinnt ueber url (war schon im Schema vermerkt), neue
  Felder declared_doc_type / input_source / reclassify_hint in doc_entries.
  Pasted-Tabellen-Vendors haben Vorrang vor Library-Fallback + LLM-Cascade
  (sind 100% genau).

Frontend (DocCheckTab):
* Pro Row Mode-Toggle 'URL' / 'Text einfuegen' (lila wenn aktiv).
* Textarea (h-32, monospace) im text-mode mit kontext-spezifischem
  Placeholder (Cookie-Hinweis ggue. anderen Doc-Types) und Live-
  Zeichen-/Wort-Counter.
* Submit-Button accepted entries mit URL ODER text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:58:32 +02:00
Benjamin Admin 7335f64f4f feat(founding-wizard): Per-Person IP-Assignment + Prefill + E2E-Tests
CI / loc-budget (push) Failing after 20s
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 19s
CI / nodejs-build (push) Successful in 3m17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Wizard unterstuetzt jetzt 2-4 Gesellschafter mit individuellem IP-Bereich:
- Pro Gruender ein IP-Assignment-Vertrag (z.B. Benjamin: Compliance+RAG;
  Sharang: Security+Infrastruktur). Pro GF ein eigener Dienstvertrag.
- Step 1: Prefill-Button aus Unternehmensprofil + Felder Registergericht
  und HRB-Nr.
- Step 2: Rollen-Dropdown (CEO/CTO/CFO/COO/CPO/GF/Sonstige) statt freie
  Texteingabe, IP-Bereiche-Textarea pro Person.

Backend:
- generate_documents() iteriert pro Person fuer PER_PERSON_DOCS.
- _build_person_context() injiziert ASSIGNOR_*, GF_*, IP_LIST_DETAILS
  aus person.ip_areas.
- base_context() propagiert basics.register_court und basics.hrb_number.

Tests:
- 30/30 Pytest gruen (6 neue: Per-Person-Context, Slug-Helper,
  Registergericht-Propagation).
- 4 neue Playwright-E2E-Specs (hermetisch via route.fulfill, mit
  Console-/Page-Error-Traps): kompletter 8-Step-Flow, Prefill-Fehlerpfad,
  Step-Navigation/Reset, Rollen-Dropdown + IP-Areas.
- Spec setzt 'bp-sdk-cookie-consent' im addInitScript damit der
  CookieBannerOverlay nicht die Wizard-Buttons ueberlagert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:49:10 +02:00
Benjamin Admin 138d9068c4 fix(audit): VW-Cookie-Tabelle — Library-Fallback + Pattern-Extract verstaerkt
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
VW-Lehre: cmp_vendors=6 (alle LLM-grob) wurde als ausreichend gewertet,
obwohl die echte Cookie-Tabelle 30+ Eintraege hat. 3 Fixes:

1. fallback_vendors_for_run skip-Schwelle: existing_vendor_count >= 3
   war zu niedrig. Jetzt nur skip wenn < 5 Cookies UND >= 5 Vendors
   schon vorhanden.

2. Library-Fallback wird jetzt aufgerufen bei < 20 cmp_vendors (statt
   < 3). VW-typische Setups (6 LLM-grob + 30 aus Library) bekommen
   damit eine vollstaendige Vendor-Liste.

3. _extract_cookie_names_from_doc: regex-Pattern-Extract aus dem
   Cookie-Doc-Text selbst — sucht nach 'NAME Tracking Cookies (Marketing)'
   etc. Findet Cookie-Namen die NICHT im Browser-Jar landen (z.B. nur
   nach Consent geladen werden). Diese werden zusaetzlich durch die
   Library matched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:32:07 +02:00
Benjamin Admin c281464071 feat(audit): P71 JC-vs-AVV Entscheidungsbaum
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / test-python-backend (push) Successful in 39s
CI / test-python-document-crawler (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
jc_avv_decision.py: detect_ambiguous_jc_avv prueft ob DSE-Text sowohl
JC-Signale (gemeinsame Auswertung, Schwesterunternehmen, Konzern...)
als auch AVV-Signale (Auftragsverarbeiter, weisungsgebunden...) enthaelt.
Bei Treffer rendert build_jc_avv_decision_html einen Block mit 4 EDPB-
basierten Leitfragen + jeweiliger Empfehlung.

Quellen: EDPB Guidelines 7/2020, EuGH C-25/17, C-40/17.

In Mail-Render zwischen Solutions-Block und VVT eingehaengt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:31:37 +02:00
Benjamin Admin 6dc427a754 fix(audit): VW-404-Recovery + P52 LLM-Merge + P51 Banner-UX-Checks
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
VW-404-Fix: submitted_types zaehlt jetzt nur Doc-Types mit >= 200 Zeichen
echtem Text. Eine eingegebene URL die 404/Mini-Text liefert (VW cookie-
richtlinie.html) wird als 'missing' behandelt, sodass Auto-Discovery
alternative URLs auf der Homepage probiert. In-place-Update statt
Duplicate-Entry, rejected_url wird fuer Audit-Transparenz aufgehoben.

P52 LLM-Cascade Merge: vendor_llm_extractor laeuft jetzt bei < 5 Vendors
(nicht nur bei 0), und die Ergebnisse werden MIT existing cmp_vendors
gemerged statt zu ueberschreiben. VW-typische Setups (Generic CMP +
0 cmp_payloads) bekommen damit den Text-basierten Vendor-Layer dazu.

P51 — banner_consistency_checks erweitert:
* check_banner_copyability: scannt banner_html nach user-select:none /
  oncopy=return false / onselectstart. MEDIUM Finding wenn Banner-Text
  nicht kopierbar (Art. 7 (2) DSGVO).
* check_consent_history: prueft auf 'Meine Einwilligungen' / Consent-
  Historie / Datenschutz-Cockpit. MEDIUM wenn keine sichtbare Historie
  (Art. 7 (3) — Widerruf muss so einfach wie Erteilung sein).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:27:55 +02:00
Benjamin Admin 309c10c203 feat(audit): P72 MC-Scope-Filter + P73 MC-Solution-Generator
CI / detect-changes (push) Successful in 12s
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P72 — rag_document_checker LEFT JOINs canonical_controls.scope_doc_type.
_filter_by_canonical_scope wirft MCs raus deren scope explizit auf
einen inkompatiblen Doc-Type zeigt (Mapping in _SCOPE_COMPATIBLE).
Konservativ: 'other'/NULL/'process' bleiben drin — Heuristik v1 ist
noch nicht stark genug fuer hartes Filtern.

Erwartete Wirkung: ~10-15% weniger irrelevante MCs pro Doc, weil z.B.
ein TOM-MC nicht mehr als DSE-Finding auftaucht.

P73 — mc_solution_generator.py: Qwen->OVH Cascade generiert pro HIGH/
CRITICAL-Fail eine konkrete Einfuege-Empfehlung mit Anchor (wo + was)
und Aufwand-Schaetzung. JSON-Schema {solution_text, anchor_hint,
effort_min}. In-process LRU-Cache (500 entries) per (mc_id, doc_md5).

Max 3 Solutions pro Doc-Type, global Cap 8 — haelt Latenz < 60s. Bloecke
werden im Mail-Render unter VVT als 'Loesungs-Vorschlaege (KI-generiert)'
eingehaengt. Disclaimer: kein Rechts-Beratung, mit DSB pruefen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:21:19 +02:00
Benjamin Admin 4183379dc5 feat(audit): P33 3-Spalten-Vendor-Konsistenz (DSE/Cookie-Doc/Banner)
CI / detect-changes (push) Successful in 11s
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 44s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
check_three_source_vendor_consistency: scannt DSE-, Cookie-Doc- und
Banner-Vendor-Liste auf 15 typische Vendor-Signaturen (Google Analytics,
Meta Pixel, Hotjar, HubSpot, LinkedIn Insight, ...). Listet Vendors die
in mind. einer Quelle stehen, aber nicht in allen sources_with_data.

Liefert MEDIUM-Finding mit konkreter 'fehlt in: DSE, Banner-Liste'-
Liste pro Vendor. Empfehlung: zentrale Vendor-Liste pflegen + in alle
drei Dokumenttypen propagieren. (Art. 13(1)(c)+(e) DSGVO)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:11:47 +02:00
Benjamin Admin c93c88577c feat(audit): P88 PDF-Export via WeasyPrint
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
GET /api/compliance/agent/snapshots/{id}/pdf liefert application/pdf
mit dem vollen Audit-Mail-Inhalt im A4-Print-Layout (Header mit
Site/Timestamp/Snapshot-ID, Seitenzahlen unten rechts).

check_replay.py liefert jetzt zusaetzlich 'full_html' (nicht nur
500-char-preview), damit der PDF-Renderer das komplette HTML hat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:06:48 +02:00
Benjamin Admin 3207acea3e fix(audit): Replay-Pipeline um P35/P77/P78/P36 Signals-Block ergaenzen
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 44s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
check_replay.py rendert jetzt auch die Textsignal-Findings (Save-Label-
Ambiguitaet, Cookies-in-DSE-Akzeptanz, JC-Klausel positiv, Social-Embeds).
Damit hat der Replay-Test parity mit der echten Mail-Pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:04:02 +02:00
Benjamin Admin 9f06911ff9 feat(audit): Cookie-Library-Fallback fuer VW-Pattern (kein bekanntes CMP)
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
Wenn nach Standard-Extract + Phase-G + LLM-Cascade weiterhin < 3 cmp_vendors
aber >= 5 Cookies im after_accept stehen (typisch: Custom-CMP wie VW
'cookiemgmt'), matcht der Fallback die Cookie-Namen gegen die
compliance.cookie_library und rekonstruiert Vendor-Records aus den
Library-Eintraegen.

Hintergrund: VW Run de2a029e zeigt 4 Vendors trotz 28 after_accept-Cookies.
cmp_payloads ist 0 (kein bekanntes IAB-Tool erkannt) und die hinterlegte
Cookie-URL liefert 404. Die DSE ist mit 34k zwar substanziell, listet aber
keine Vendor-Tabelle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:00:49 +02:00
Benjamin Admin 338e03d3b0 feat(audit): P34 Exec-Summary Score-Einordnung — 'wo Sie stehen sollten'
CI / detect-changes (push) Successful in 10s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m46s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
_score_band_explanation: vier Baender (Sehr gut/Akzeptabel/Handlungs-
bedarf/Erhoehtes Risiko) liefern Label + erwartete Handlung. Wird als
neue Zeile unter den KPIs in der Exec-Summary gerendert (mit
score-farbiger Linkmark).

Sachlicher Ton — kein 'Vorstand muss sofort handeln', sondern
realistische Empfehlung (z.B. '70-84: Branchen-Median, einmaliges
Aufraeumen + Halbjahres-Check').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:51:34 +02:00
Benjamin Admin 4171cf0efd feat(audit): P36 Social-Media-Einbindungs-Check
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 44s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
check_social_embedding: erkennt direkte FB/Insta/Twitter/YouTube-
Embeds (connect.facebook.net, platform.twitter.com etc) vs
Heise-Shariff vs 2-Klick-Loesungen (Embetty).

Direkte Embeds ohne Schutz = HIGH (EuGH C-40/17 Fashion-ID — der
Site-Betreiber wird zum gemeinsam Verantwortlichen und braucht
Einwilligung VOR dem Drittanbieter-Call).
Shariff oder 2-Klick erkannt = INFO (positives Signal).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:45:12 +02:00
Benjamin Admin 30e43afba6 feat(audit): P86 Branchen-Benchmark + P35/P77/P78 Textsignale
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
P86 — industry_benchmark.py: zieht alle Snapshots mit derselben
scan_context.industry, berechnet Median + Percentile, rendert
'Sie 42% — Automotive-Median 58% (Stichprobe: 12)'. Min Sample 3.

P35 — banner_text 'Speichern' ohne 'Ablehnen' = MEDIUM. Mehrdeutiges
Label nach EDPB 03/2022 Deceptive-Design-Guidelines.

P77 — DSE mit prominenter Cookie-Sektion (Vendor-Hints: Speicherdauer,
Anbieter, Datenkategorie) ersetzt die Forderung nach separater
Cookie-Richtlinie. Positives Signal statt False-Positive.

P78 — Art. 26-Klausel im DSE-Text erkannt → positives Signal
'JC-Konstrukt dokumentiert'. Vermeidet False-Positive bei
Konzern-Schwester-Kooperationen.

Alle in Mail eingehaengt: Branchen-Block nach GF-1-Pager, Signale-Block
nach Konsistenz-Check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:43:15 +02:00
Benjamin Admin df8832c521 feat(audit): P75 Banner-vs-CMP + P84 Diff-Mode + P74/P96/P97 Doc-Types
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P75 — check_banner_vs_cmp_partner_count: wenn Banner-Text 'N Partner'
nennt und N < cmp_vendors * 0.6, HIGH-Finding (Art. 13(1)(e) DSGVO).
Erkennt Verharmlosung der tatsaechlichen Vendor-Anzahl.

P84 — run_diff.py: vergleicht aktuellen Lauf mit letztem Snapshot
derselben Site (set-Diff auf normalisierten Finding-Labels). Block
ueber dem GF-1-Pager: 'Seit letztem Lauf: X Findings weg, Y neue'.
USP — keiner der grossen Anbieter hat das.

P74/P96/P97 — Labels fuer legal_notice (Rechtliche Hinweise / IP /
Forward-Looking), dsa (Art. 12+17 Digital Services Act), lizenzhinweise
(OSS-Compliance) in _DOC_TYPE_LABELS registriert. Echte Pflichtangaben-
Checks kommen separat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:38:25 +02:00
Benjamin Admin 7842c95532 feat(audit): P92 CMP-Tool-Verfuegbarkeit + P94 Banner-vs-Cookie-Doc-Konsistenz
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P92 — Wenn der Nutzer 'Anpassen'/'Einstellungen' klickt und der
CMP-Settings-Bereich kein Fehlerfreies Laden zeigt (Error, Timeout,
<80 Zeichen ohne Kategorien, keine Toggles), ist das ein HIGH-
Finding. Granulare Wahl formal vorhanden, faktisch nicht
funktionsfaehig (Art. 7 (3) DSGVO + EDPB 03/2022).

P94 — Cookie-Liste im Banner-Settings vs Cookie-Richtlinie. Heuristik
extrahiert Cookie-Namen aus dem Cookie-Doc-Text (regex auf typische
camelCase/_underscored Patterns + Vendor-Prefixes _ga/_gid/ot_/uc_).
Wenn |only_in_doc| >= 5 ODER |only_in_banner| >= 3 → MEDIUM-Finding.
|only_in_doc| >= 15 UND |only_in_banner| >= 5 → HIGH.

Beide Findings landen im neuen Mail-Block 'Banner-Konsistenz-Pruefung'
(amber-yellow) zwischen Mismatch-Block und VVT. Auch in
check_replay.py eingehaengt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:31:19 +02:00
Benjamin Admin 08671adfdf feat(audit): P82 GF-1-Pager + P87 Konfidenz-Score pro Finding
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 18s
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
P82 — gf_one_pager.py: kompakte 5-Bullet-Kurzfassung ganz oben in der
Mail. Score (gross + Farbe), Delta-zu-Vorlauf, Top-Findings nach
HIGH/MEDIUM sortiert mit zustaendiger Rolle (DSB / Marketing / IT /
Legal / Web-Team) und Klassifizierungsbits aus dem Wizard.
Sachlicher Ton — keine 4%-Drohung, '4-8 Wochen' als realistischer
Zeitrahmen. Eingehaengt vor Critical-Findings-Block in Mail-Composition
und Replay-Pipeline.

P87 — finding_confidence.py: 13 Regex-Regeln liefern (confidence_pct,
reason) pro Finding-Label. Direkt im DOM beobachtbar = 95-98%,
Library-Mismatch = 82%, Textmuster-Match auf Pflichtangaben = 75-88%.
Im 1-Pager als kleines '(NN% Konfidenz)'-Tag mit Reason-Tooltip
hinter jedem Finding gerendert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:20:19 +02:00
Benjamin Admin 50fc0ecc59 feat(audit): P79 Pre-Scan-Wizard (8 Pflichtfelder) + P99 erweitert + P102 Replay-Fix
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / nodejs-lint (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m56s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P79: PreScanWizard.tsx mit 8 Pflichtfeldern (Branche, B2B/B2C,
Direkt-Vertrieb, Rechtsform, Konzern-Struktur, MA-Zahl, Besondere
Daten, Drittland). Scan-Button disabled bis alle 8 ausgefuellt. Werte
landen in scan_context und ueber Backend in compliance_check_snapshots.

P99: DOC_TYPES um dsa + legal_notice + lizenzhinweise + nutzungsbedingungen
erweitert. URL-hinzufuegen-Button war schon da.

P102 (Replay-Bug): check_replay.py liest jetzt e.get('text') statt
nur full_text — Snapshot-Schema verwendet 'text'. Library-Mismatch-
Block wird damit auch im Replay angezeigt.

Backend: ComplianceCheckRequest.scan_context optional; save_snapshot
persistiert ihn in compliance_check_snapshots.scan_context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 15:59:01 +02:00
Benjamin Admin 94057b1536 feat(audit): VW-Cookie-Bug-Fix + P101/P102 Cookie-Library-Mismatch-Findings
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
VW-Bug B1: extract_vendors_via_llm hatte max_text_chars=12000 -> bei
VW-Cookie-Doc (60k chars, 100 Cookies in Tabelle) wurden 80% abgeschnitten,
LLM extrahierte nur 1 Vendor. Fix: max_text_chars=50000, num_predict
6000->16000 fuer mehr Vendor-Output, Ollama-Timeout 120s->420s.

P101 Aggregator-Script (backend-compliance/scripts/cookie_library_enrich.py)
geht alle compliance_check_snapshots durch und extrahiert (cookie_name,
declared_category, observed_sites). Erste Auswertung ueber 8 Snapshots:
101 unique Cookies, 47 in Library, 54 unbekannt, 18 Mismatches.

P102 Cookie-Klassifikations-Pruefung als Mail-Block. Vergleicht
Site-deklarierte Kategorie vs Library + Vendor-Doku. HIGH wenn Library
sagt 'marketing' aber Site als 'essential'/'statistics' deklariert
(faktische Drittland-/Werbe-Verarbeitung versteckt). MEDIUM sonst.
In agent_compliance_check_routes Mail-Komposition + Replay-Pipeline
eingebaut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 15:47:11 +02:00
Benjamin Admin 50ed0f45af fix(replay): P80 — DocCheckResult-Import entfernt (gibt es nicht in runner)
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 36s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
Vorher hatte ich den Container hotfixed aber den Fix nicht committed.
Beim naechsten Rebuild kam der Bug aus dem Image zurueck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 12:25:04 +02:00
Benjamin Admin e1df24cad7 fix(audit): P93+P95 — Reject-Wording erweitert + Vendor-zentrisches Cookie-Format akzeptiert
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / test-python-backend (push) Successful in 38s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
P93: 'Cookies verbieten', 'Tracking ablehnen', 'verweigern' usw. zaehlen
nun als expliziter Reject-Mechanismus. EDPB 5/2020 schreibt kein bestimmtes
Wort vor — BMW False-Positive 'Kein Ablehnen-Mechanismus' weg.

P95: cookie_table-Check akzeptiert nun zwei gleichwertige Formate:
(a) klassische Tabelle, (b) Vendor-Detailseite mit Block pro Anbieter
(Name+Anschrift, Zweck, Speicherdauer aggregiert, Cookie-Namen-Liste,
Opt-Out-Link). BMW-Stil mit Adform-Block ist DSK-OH 2024 konform.
False-Positive 'tabellarisches Cookie-Verzeichnis fehlt' wird seltener.

Hinweis-Text in cookie_table umformuliert: nennt beide akzeptablen
Formate, weniger normativ.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 12:21:29 +02:00
Benjamin Admin e5b4672f2a fix(audit): P90 — auto-discovery Timeout 180s -> 300s fuer BMW-Homepage
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 39s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 12:05:41 +02:00
Benjamin Admin 0d5c76ea98 fix(audit): P90-B1 — DSI-Discovery Timeout 120s -> 240s fuer BMW-Impressum
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 13s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 38s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
BMW-fafcb090 zeigte exception 'ReadTimeout' beim consent-tester-Call fuer
anbieterkennzeichnung.html. Der Discovery-Lauf folgt 3 Sub-Documents
(Versicherungsvermittler, Aufsicht, Berufsrecht) plus ePaaS-Captures —
braucht regelmaessig >120s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 11:52:59 +02:00