breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	0d37822b7c	fix(impressum): P9 - 7 false-positive fixes in the mandatory-disclosure (Pflichtangaben) checks
#1 Provider name: a \b word boundary prevents "ag" from matching inside "samstag"; "aktiengesellschaft" is added as a full match.
#2 Authorized representatives: the parenthesized-list pattern now recognizes the BMW format "Vorstand (Milan Nedeljkovic, Jochen Goller, ...)" as well as "Vorsitzender des Aufsichtsrats: Name".
#3 V.i.S.d.P.: was already INFO, OK.
#4 OS platform (online dispute resolution)/VSBG: with no_direct_sales=True (OEM pattern) the check is now skipped as "Nicht anwendbar" (not applicable) instead of failing 0/1. The profile is now passed through check_document_completeness -> runner.
#5 Competent chamber: IHK, Handwerkskammer and Tieraerztekammer added to the pattern; severity LOW -> INFO (conditional).
#6 Share capital: was already INFO, OK.
#7 Link disclaimer: new check property "invert"=True. An anti-pattern passes when it is NOT found and fails when it is found. Previously the finding always fired; now it fires only when the text contains an illegal disclaimer.
Plus: L2 INFO checks (e.g. profession_chamber) no longer count toward correctness-pct and no longer produce DSI-DETAIL findings.
Consistent with the P8 model: INFO means "selbst pruefen" (review yourself), not "fail". Verified against the BMW Impressum text; all 7 cases are classified correctly: name=passed, representative_person=passed, profession_chamber=INFO, illegal_disclaimer=passed (no disclaimer in the text), dispute_resolution=skipped (no_direct_sales), editorial_visdp=INFO, share_capital=INFO.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 00:52:03 +02:00
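Fixes #1 and #7 can be illustrated with a minimal Python sketch. The two patterns and the run_check helper are hypothetical stand-ins; the commit does not show the real Pflichtangaben patterns or the runner's signature.

```python
import re

# Hypothetical patterns for illustration only; the real ones are not in the commit.
LEGAL_FORM = re.compile(r"\b(ag|gmbh|aktiengesellschaft)\b", re.IGNORECASE)
ILLEGAL_DISCLAIMER = re.compile(r"keine haftung für (externe )?links", re.IGNORECASE)


def run_check(pattern: re.Pattern, text: str, invert: bool = False) -> bool:
    """Return True when the check passes.

    A normal check passes when the pattern is found. An inverted
    (anti-pattern) check passes when the pattern is NOT found.
    """
    found = pattern.search(text) is not None
    return not found if invert else found
```

With the word boundary in place, "Samstag" no longer satisfies the legal-form check, and the inverted disclaimer check fails only when the disclaimer text is actually present.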
Benjamin Admin	575644c9c5	feat(audit): P8 - MC severity removed, email shows hard findings only, MC audit as a checklist
Email hardening (mc_scorecard.top_fails): a new _is_hard_finding heuristic filters conditional MCs without negative evidence out of the top findings. If matched_text is empty and the label contains "falls/sofern/wenn/soweit/ggf.", the MC is dropped from the email and appears only in the MC audit as "selbst pruefen" (review yourself). DATA-2066-A05 (free-of-charge deactivation of location data) is the prototypical example.
MC audit frontend (audit/[checkId]/page.tsx): the severity column (CRITICAL/HIGH/MEDIUM/LOW) is removed, because the MC audit is a checklist, not a severity threat. Instead:
- "Prioritaet" (priority) column with 3 tiers derived from the regulation mapping: law (DSGVO/ePrivacy/TDDDG/...) / authority guidance (EDPB/DSK/EuGH/...) / best practice (ISO/NIST/BSI)
- 3-status display: fulfilled (✓) / not fulfilled (✗) / review yourself (?), plus not applicable (—). rowReviewStatus() derives "review yourself" from an empty matched_text combined with a conditional label.
- Filter reworked to 5 statuses instead of 4
- Default filter "Nicht erfuellt" (not fulfilled), previously "Nur Fail" (fail only)
Bonus: cleaned up the f.payload.risk_label TS cast in FindingsTab (unknown -> string).
Effect:
- The email to management shows only real evidence ("DSB fehlt" (DPO missing), "Gebuehr fuer Widerruf" (fee for withdrawal))
- The MC audit is a factual checklist for the compliance officer
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 00:30:04 +02:00
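The hard-finding rule described in this commit could look roughly like the following Python sketch. The function name follows the commit, but the signature and the exact regular expression are assumptions.

```python
import re

# Conditional markers named in the commit message: falls/sofern/wenn/soweit/ggf.
_CONDITIONAL = re.compile(r"\b(falls|sofern|wenn|soweit)\b|\bggf\.", re.IGNORECASE)


def is_hard_finding(label: str, matched_text: str) -> bool:
    """A failed MC stays 'hard' unless it is conditional AND lacks negative evidence."""
    has_evidence = bool(matched_text and matched_text.strip())
    is_conditional = _CONDITIONAL.search(label) is not None
    return has_evidence or not is_conditional
```

A conditional label with an empty matched_text is filtered out of the email, while any finding with a concrete text match stays in.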
Benjamin Admin	6c223c7c9b	feat(compliance-check): exec-summary + full-audit + TDM-respect + cookie-KB-extended + saving-scan-funnel
P1 - Exec summary at the top of the email report (4 KPIs + 2 CTAs, dark gradient)
P3 - no_direct_sales flag for OEM configurator sites; AGB/Widerruf (terms and conditions / withdrawal) shown as "NICHT ANWENDBAR" (not applicable, grey) instead of "NICHT GEFUNDEN" (not found, red)
P5 - Full-audit unification: all findings (MC + mandatory disclosures + vendor + redundancy) go into /data/compliance_audits.db.unified_findings; new /api/compliance/agent/findings/<id> endpoint + FindingsTab in the audit UI with filter + CSV export
P7 - Crawl hardening: TDM reservation check (robots.txt / ai.txt / header / meta) before every run with a 24h cache; HeadlessChrome UA (company not yet founded; switch via the BREAKPILOT_BRANDED_UA env variable); per-domain rate limit of 1 req/s + max 2 concurrent
P2 - Cookie knowledge DB extended additively (35 -> 74 cookies): Adobe, Meta, Microsoft, LinkedIn, TikTok, HubSpot, Marketo, Salesforce, Hotjar, FullStory, Mouseflow, Intercom, Drift, Zendesk, Cloudflare, Stripe, OneTrust/Cookiebot/Usercentrics, Matomo, Pinterest, Snapchat,
X/Twitter, YouTube, Vimeo, Klaviyo, Mailchimp, Mixpanel, Segment, Amplitude, Optimizely, Datadog; the wire-in in cookie_function_classifier yields a compliance_risk label (kritisch/hoch/mittel/gering, i.e. critical/high/medium/low) per vendor
A - k-anonymity helper (benchmark_k_anonymity) in preparation for P6
B - Cross-tenant domain assertion in the /findings endpoint (expected_domain query param -> 403 on mismatch)
C - Saving-scan funnel: /api/compliance/agent/saving-scan/start with validation + 24h rate limit per domain + lead persistence in saving_scan_leads + auto-discovery via _run_compliance_check; 6 tests
D - Risk badge in the email vendor row
Legal guardrails (memory feedback_oem_data_legal.md): only our own brief assessments + source pointers, no 1:1 copies of third-party CMP texts. TDM opt-outs are respected per § 44b UrhG.
NO schema changes; everything lives in sidecar SQLite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 23:48:34 +02:00
Benjamin Admin	a616b64273	feat(iace): Customer-Standard-Reuse across customer's prior projects
[migration-approved] Task #22.
The IACE module is used by a single machine manufacturer, but their plants land at many different end customers. When the safety expert commissions the second or third plant at the same customer, whole classes of mitigations (company-wide PPE rules, locked-out energy isolation, customer-standard signage) are already in place there, yet they are rediscovered from scratch on every project.
Migration 031: iace_projects.customer_name TEXT + partial index. The customer is stored as a plain text field rather than a normalised iace_customers table (option A from the design discussion). A proper customer-management screen can promote this to a FK later without data loss.
Backend store_customer_standards.go:
- ListCustomerStandardSuggestions(projectID, includeVerified) collects mitigations from all non-archived prior projects sharing the same tenant_id AND case-insensitive customer_name.
Aggregates by mitigation.name (since same-named measures from different prior projects collapse into one suggestion) and surfaces: • source_project_count + source_project_names • is_customer_standard / has_verified_instances flags includeVerified=false → strictly is_customer_standard=true includeVerified=true → also status='verified' - ImportCustomerStandardSuggestion(projectID, name): for every prior (mitigation.name → hazard.name) pairing, finds matching hazards in the current project (by name) and ensures a customer-standard mitigation exists. New rows via CreateMitigation (idempotent through the UNIQUE(hazard_id, name) from migration 030); existing rows are flipped to is_relevant=true + is_customer_standard=true + status='verified' via UPDATE. Routes: GET /api/v1/iace/projects/:id/customer-standards?include_verified= POST /api/v1/iace/projects/:id/customer-standards/import body {name} Frontend: - New page /sdk/iace/[projectId]/customer-standards with: • empty-state hint pointing to Auftrag → Kundenname • per-suggestion checkbox + per-row Übernehmen button • bulk "N übernehmen" button • toggle "Auch verifizierte einbeziehen" widening the pool • per-suggestion source_project_count + status badges - Sidebar item "Kundenstandards" (building icon) placed between Verifikation and Nachweise. - Order-page now mirrors Auftraggeber.Firmenname into the top-level customer_name column on save, so the Reuse feature is fed automatically without a separate input field. The same expert effect from migration 029's is_customer_standard flag — "I already know it's covered, no evidence needed" — now becomes a cross-project asset rather than a per-project annotation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:31:30 +02:00
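The suggestion aggregation described above can be sketched in Python (the actual store is Go, and its SQL is not shown). The row layout, the list_suggestions name, and the choice to filter rows before aggregating are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    name: str
    source_project_names: set = field(default_factory=set)
    is_customer_standard: bool = False
    has_verified_instances: bool = False

    @property
    def source_project_count(self) -> int:
        return len(self.source_project_names)


def list_suggestions(rows, customer_name, include_verified=False):
    """Collapse same-named mitigations from prior projects into one suggestion each.

    rows: dicts with name, project_name, customer_name, is_customer_standard, status
    (a hypothetical shape standing in for the joined query result).
    """
    wanted = customer_name.casefold()  # case-insensitive customer match
    by_name = {}
    for r in rows:
        if r["customer_name"].casefold() != wanted:
            continue
        verified = r["status"] == "verified"
        # Strict mode keeps only customer standards; wide mode also keeps verified rows.
        if not (r["is_customer_standard"] or (include_verified and verified)):
            continue
        s = by_name.setdefault(r["name"], Suggestion(r["name"]))
        s.source_project_names.add(r["project_name"])
        s.is_customer_standard = s.is_customer_standard or r["is_customer_standard"]
        s.has_verified_instances = s.has_verified_instances or verified
    return list(by_name.values())
```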
Benjamin Admin	27384aea09	feat(cra): Phase 5 - Technical Doc + DoC Generator (Annex V + VII)
Migration 122: compliance_cra_documents with versioning + approval workflow
- doc_type whitelist: doc_eu_conformity, doc_technical, doc_cvd_policy, doc_update_policy, doc_sbom_report
- Status state machine: draft → reviewed → approved (+ superseded)
- Snapshot generation_context for audit trail
New module cra_doc_templates.py, pure-function generators (no DB access):
- doc_eu_conformity: EU DoC structured per CRA Annex VII (all 7 mandatory fields)
- doc_technical: technical documentation per CRA Annex V
- doc_cvd_policy: ISO/IEC 29147-compliant CVD policy with SLA table
- doc_update_policy: Patch/Update policy with Lifecycle + CSAF reference
- doc_sbom_report: Latest SBOM summary with top-10 components
Returns (title, markdown_content, requirements_coverage); coverage tracks how many mandatory fields are filled vs placeholders.
Backend endpoints: - POST /documents/generate — generates doc, supersedes previous version, increments version number atomically - GET /documents — lists all 5 doc types (also "not_generated" stubs) - GET /documents/{id} — full content_md - POST /documents/{id}/approve — set status + signed_by + signed_at Frontend: - /documents page: 5 doc-type cards with Generate/Re-Generate buttons, inline Markdown preview with .md download, 2-step approval flow (reviewed → approved with signature) - Optional params form: manufacturer, notified_body, security_contact - Dashboard: +1 button (Dokumente, 7 buttons total) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:10:23 +02:00
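The document lifecycle (draft → reviewed → approved, with regeneration superseding earlier versions) can be sketched as follows. This is an in-memory illustration under assumed names (transition, generate, dict-shaped documents); the real endpoints work against compliance_cra_documents and increment the version atomically, which this sketch does not attempt.

```python
ALLOWED_TRANSITIONS = {
    "draft": {"reviewed"},
    "reviewed": {"approved"},
    "approved": set(),
    "superseded": set(),
}


def transition(doc, target):
    """Move a document to the target status, rejecting skipped or backward steps."""
    if target not in ALLOWED_TRANSITIONS[doc["status"]]:
        raise ValueError(f'{doc["status"]} -> {target} is not allowed')
    doc["status"] = target
    return doc


def generate(docs, doc_type, content_md):
    """Append a new draft version and mark all prior versions of that type superseded."""
    prior = [d for d in docs if d["doc_type"] == doc_type]
    for d in prior:
        d["status"] = "superseded"
    doc = {
        "doc_type": doc_type,
        "version": max((d["version"] for d in prior), default=0) + 1,
        "status": "draft",
        "content_md": content_md,
    }
    docs.append(doc)
    return doc
```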
Benjamin Admin	cc80e59e5e	feat(cra): Phase 4 — Vulnerability Disclosure + Post-Market Monitoring Migration 121: compliance_cra_vulnerabilities table with full lifecycle tracking - Status state machine: reported → triaged → patched → disclosed (+ withdrawn) - CRA Art. 14(2) deadlines tracked: reported_to_enisa_at (24h), detailed_report_at (72h) - CVE-ID, severity, CVSS, affected_components (JSONB), embargo_until Backend endpoints in cra_routes.py: - POST /vulnerabilities — create with validation (severity, CVSS range) - GET /vulnerabilities — list with deadline-breach summary (24h/72h counters) - PATCH /vulnerabilities/{id} — update fields + auto-set lifecycle timestamps - DELETE /vulnerabilities/{id} — soft-delete (withdrawn) - GET /monitoring — combined view: CRA deadlines + vuln summary + post-market checklist Frontend: - /vuln page: intake form, vuln cards with 24h/72h-countdown buttons, status-transition flow with auto-timestamps - /monitoring page: CRA deadlines (11.06.26 / 11.09.26 / 11.12.27), breach banner if 24h/72h obligations missed, post-market checklist with deep-links - Dashboard: +2 buttons (Vulns, Monitoring) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 22:08:49 +02:00
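The 24h/72h breach counters might be computed along these lines. The commit does not say when the clock starts; this sketch assumes it starts at a created_at timestamp and that a late report still counts as a breach. The breaches helper and the dict layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EARLY_WARNING = timedelta(hours=24)    # reported_to_enisa_at deadline
DETAILED_REPORT = timedelta(hours=72)  # detailed_report_at deadline


def breaches(vuln, now):
    """Return which CRA Art. 14(2) deadlines are missed for one vulnerability.

    Assumption: both deadlines run from vuln["created_at"].
    """
    def missed(done_at, limit):
        deadline = vuln["created_at"] + limit
        # An open obligation is judged against 'now', a completed one against its timestamp.
        return (done_at or now) > deadline

    return {
        "24h": missed(vuln.get("reported_to_enisa_at"), EARLY_WARNING),
        "72h": missed(vuln.get("detailed_report_at"), DETAILED_REPORT),
    }
```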
Benjamin Admin	0a64da74bb	fix(iace/mitigations): idempotent CreateMitigation + UNIQUE(hazard_id, name)
[migration-approved]
The init-handler was non-idempotent. A second click on "Neu initialisieren in Grenzen" inserted every engine-suggested mitigation a second time; e.g. the Bremsscheibe project ended up with 5 (hazard_id, name) duplicate pairs (HMI-Usability-Pruefung, Eindeutiges visuelles Feedback, Betriebsarten-Anzeige, Sicher begrenzter Bewegungsbereich, …). 45 such duplicates accumulated across all projects.
Migration 030_iace_mitigation_unique.sql:
1. Picks one winning row per (hazard_id, name) using a stable rank:
   is_relevant DESC (expert decision wins over engine default)
   status DESC (verified > implemented > planned)
   created_at DESC (newest beats older on otherwise-equal rows)
   and deletes the losers (Bremsscheibe: 5 rows; total: 45).
2. Adds UNIQUE constraint iace_mitigations_hazard_name_uniq (hazard_id, name).
Store-Layer (CreateMitigation): INSERT … ON CONFLICT (hazard_id, name) DO NOTHING RETURNING id.
pgx.ErrNoRows from RETURNING → look up the existing row and return that. Callers (engine init + manual add) always get a usable Mitigation; the second click is silently swallowed instead of failing. Frontend dedupe in groupByTitle stays — it covers any pre-existing duplicates that survived the migration in edge cases (multi-row write in flight, etc.). With the UNIQUE constraint live, the in-memory dedupe is a belt-and-suspenders safety net rather than the load-bearing mechanism. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 19:55:13 +02:00
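The idempotent insert can be demonstrated with Python's sqlite3 module. This is a stand-in for the Go/pgx store layer: the table name and schema are simplified assumptions, and instead of relying on RETURNING and pgx.ErrNoRows the sketch always looks the row up after the insert.

```python
import sqlite3

SCHEMA = """
CREATE TABLE mitigations (
    id INTEGER PRIMARY KEY,
    hazard_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (hazard_id, name)
)
"""


def create_mitigation(conn, hazard_id, name):
    """Insert the mitigation if it is new and return the id of the surviving row."""
    conn.execute(
        "INSERT INTO mitigations (hazard_id, name) VALUES (?, ?) "
        "ON CONFLICT (hazard_id, name) DO NOTHING",
        (hazard_id, name),
    )
    # Whether or not the insert happened, exactly one row matches the unique key.
    row = conn.execute(
        "SELECT id FROM mitigations WHERE hazard_id = ? AND name = ?",
        (hazard_id, name),
    ).fetchone()
    return row[0]
```

Calling create_mitigation twice with the same (hazard_id, name) returns the same id and leaves a single row, which mirrors the "second click is silently swallowed" behaviour.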
Benjamin Admin	662327e8b4	feat(compliance-check): MC-Classification + Embedding + Vendor-Redundancy + Action-Recipes + Borlabs-Features
Major update based on the BMW test iterations (v1→v9):
Core compliance check
- Sonnet check_type classification: text/process/review for all 1874 MCs in compliance.doc_check_controls (script + sidecar /data/mc_classification.db). rag_document_checker filters on check_type='text' for doc_check. Plus a fits_doc_type audit (v2) + ui_only audit for DSA/e-commerce MCs filed under the wrong doc_type.
- scope_requires filter: biometric/ai_decision/child_targeting MCs are filtered per business_profile (FRT skipped for BMW etc.).
- Embedding match (BGE-M3) as phase 3 after the regex match: per-doc_type threshold override (impressum 0.50, dse/cookie 0.60), short-field rescue (15-word chunks) for mandatory fields in the Impressum. Title + check_question serve as embedding input for more context.
- Cookie text routing: consent-tester returns cmp_cookie_text from the CMP reconstruct; the backend prefers it over the DOM extraction when it is richer (BMW: 1824 vs 600 words).
Vendor redundancy + EU alternatives + cost saving
- vendor_redundancy.analyze(): functional categorization of the CMP vendors, detection of multiple providers per category, EU-alternative lookup (Matomo, IONOS, HERE, Friendly Captcha, Smart AdServer, ...).
- vendor_cost_estimator: tier inference from the cookie footprint (cookie count + premium-feature cookies + third-party share → starter/professional/enterprise/premier).
- Self-service advertising (Google/Meta/Pinterest/...) = 0 license cost (media spend only, tracked separately). DSP platforms keep a narrow range.
- Tier-aware saving range: for Enterprise/Premier we use the upper 40-100% band of the list prices, not the full starter→premier range.
- Multi-function tools (Matomo Pro, SAP CX, IONOS Cloud, Userlike, Smart AdServer, HERE Maps, Vimeo Pro, LamaPoll): one tool replaces several categories at once.
Cookie knowledge DB + functional classification
- cookie_knowledge_db: 50 curated top cookies (Google/Meta/Adobe/MS/...) with vendor, exact_purpose, data_collected, IAB TCF IDs, reid_risk, schrems_ii_status, EuGH (CJEU) rulings, EU alternative.
- cookie_function_classifier: functional role per cookie (tracking_id, ad_pixel, session_id, ab_test, csrf, ...) + blocking_impact.
Country inference from legal form
- cookie_link_validator: the country field is derived from the vendor name (A/S=DK, GmbH=DE, Inc=US, B.V.=NL, ...) plus a vendor lookup table. This reduces false-positive no_country flags for vendors that are unambiguously in the EU (Adform DK, Pinterest IE).
Action recipes + doc anchor locator
- finding_action_recipes: for each finding type (no_cookies_listed, no_country, broken_opt_out, "Auftragsverarbeiter erwaehnen", "Art. 22 Profiling", ...) a structured instruction with what/why/fix_text/where/example, ready to paste 1:1 into customer documents.
- doc_anchor_locator: embedding-based (BGE-M3 cosine); finds the matching paragraph in the existing customer document for each finding. Per-run thread-local cache. Fallback: keyword match.
- Email rendering integrates recipe + anchor per failed document check + a vendor flag list with a collapsible action list.
- Score explanation per vendor row (3/5 subtitle + tooltip).
Migration pipeline (compliance check -> customer banner/documents)
- migration_to_banner.py: vendor list -> CookieBannerConfig with 4 categories + review flags.
- migration_to_document.py: vendor list -> cookie policy + VVT register + privacy-policy pre-fills.
- agent_migration_routes: 3 preview endpoints (banner-preview, document-preview, summary). cmp_vendors are persisted in the check_payloads table of /data/compliance_audits.db.
Borlabs-parity cookie banner features
- Consent history in the banner: window.bpShowConsentHistory() + localStorage.
- Content blocker: cookie-banner-content-blocker.ts, a YouTube/Maps/video placeholder shown until consent is given.
- Google Consent Mode v2 extended: wait_for_update + region=EEA/CH/GB.
- Consent log export (CSV/JSON) via einwilligungen_export_routes.
Bug fixes
- canonical_control_routes: _jsonish helper for string-typed jsonb, similar-controls endpoint with _has_embedding_col() cache (no more 500s).
- Control library frontend: defensive .map coercer in 2 detail views.
- Embedding service batching (batches of 32 instead of 165 items in one call).
- KeyError 'control_id' in the MC result aggregation (defensive .get).
- Master-controls click-through from /sdk/master-controls to /sdk/control-library?control=<id> with URL-param auto-open.
- Dockerfile: /data pre-chowned to appuser (write access for the audit DB).
- Cookie text routing bug (cmp_reconstructed > DOM extraction).
- doc_type-aware MC filter (instead of all text MCs).
- Master contract dedup (60 BMW-internal entries = 1 Adobe contract).
- The A3 v2 audit reclassified 24 UI-language MCs as 'process'.
Tests
- test_migration_mappers.py (9 tests)
- test_migration_endpoints.py (4 tests)
Scripts (one-shot)
- classify_mc_check_type.py (v1) + _v2 (PK=control_id,doc_type)
- audit_mc_doctype_fit.py (v1 fits) + _v2 (ui_only + scope_requires)
BMW run results, v1 (broken) -> v9 (all fixes):
DSE (privacy policy): 7.5% -> 81-83%
Impressum: 4% -> 100% (all 6 real MCs fulfilled)
Cookie: 0% -> 79-83% (CMP text routing + embedding)
Plus: 10 consolidation categories, estimated saving 200k-3M / year
Plus: action recipes + doc anchors for every fail
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 18:30:08 +02:00
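The country inference from legal form in cookie_link_validator can be sketched as a suffix match backed by a lookup table. The four suffix mappings and the Pinterest entry come from the commit message; the infer_country name, the regular expressions, and the lookup-before-suffix order are assumptions.

```python
import re

# Legal-form suffix -> country, as listed in the commit (A/S=DK, GmbH=DE, Inc=US, B.V.=NL).
LEGAL_FORM_COUNTRY = [
    (re.compile(r"\bA/S$"), "DK"),
    (re.compile(r"\bGmbH$"), "DE"),
    (re.compile(r"\bInc\.?$"), "US"),
    (re.compile(r"\bB\.V\.$"), "NL"),
]

# Hypothetical excerpt of the vendor lookup table.
VENDOR_LOOKUP = {"pinterest": "IE"}


def infer_country(vendor_name):
    """Return an ISO country code for the vendor, or None when nothing matches."""
    name = vendor_name.strip()
    lowered = name.lower()
    for key, country in VENDOR_LOOKUP.items():
        if key in lowered:
            return country
    for pattern, country in LEGAL_FORM_COUNTRY:
        if pattern.search(name):
            return country
    return None
```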
Benjamin Admin	52fb8b91e7	Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-compliance	2026-05-18 18:09:39 +02:00
Benjamin Admin	1cf5de1d45	feat(cra): CRA Compliance module Phase 1+2+3 (intake, scope, path, requirements, backlog, sbom, checks) Phase 1 — Intake + Scope + Path: - Migration 119: compliance_cra_projects table (intake + classification + path + status state machine) - Backend service cra_routes.py: CRUD + scope-check + path-select - Deterministic Annex III/IV classifier (verbatim mapping from migration 059 wiki) - Path validation per classification (CRITICAL → notified_body mandatory) - Frontend: project list, dashboard, 3-step wizard (intake/scope/path) - Sidebar entry under "CRA Compliance" (red) Phase 2 — Annex I Requirements + Priorisierungs-Backlog: - cra_annex_i_data.py: 40 Annex-I requirements (8 categories), 9 measures (M540-M548), 3 CRA deadlines - Endpoints: /requirements (40 items), /backlog (priority-sorted with deadline pressure) - Frontend: requirements table with filters + expandable details, backlog with deadline banner + score-ranked table - Dashboard KPI cards (Critical count, days to CE deadline, etc.) + top-10 backlog snippet Phase 3 — SBOM Upload + Automated Checks: - Migration 120: compliance_cra_sboms (versioned uploads, CycloneDX + SPDX) - SBOM endpoints: POST /sbom/upload (format detection, summary extraction), GET /sboms - Checks reuse compliance_evidence_checks: init creates 6 default CRA checks, run executes - Real implementations: cra_security_txt (HTTP + Contact: line) and cra_tls_cert_check (TLS handshake) - Frontend: SBOM file upload + version list, Checks page with per-check URL input + Run button Backend-Reuse: gap_projects (intake pre-population), compliance_evidence_checks/_check_results. Tenant scoping via existing X-Tenant-ID header pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 17:56:52 +02:00
Benjamin Admin	3faa312b31	feat(iace/verification): derived view on relevant mitigations + 2 actions Task #21. The verification page used to manage a separate VerificationItem entity that the expert had to populate by hand — disjoint from the actual mitigations list. With the is_relevant flag from migration 029, the verification step has a natural definition: confirm completion for every mitigation the expert flagged as relevant for this project. Page is now a derived view on useMitigations(): filter is_relevant=true, group by title (same dedupe as Massnahmen page), expose two actions per hazard×mitigation row: 1. "Kundenstandard" — already implemented at the customer's site, no evidence file required. Sets is_customer_standard=true and status='verified'. 2. "Verifizieren…" — opens a modal asking for a textual evidence reference (Prüfprotokoll-Nr, audit reference, etc.). Calls the existing POST /mitigations/:mid/verify with verification_result. File upload is deferred to phase 2 once an object-storage backend is in place — the modal explains this. When a row is verified, a "Zurücksetzen" link reverts status to 'implemented' for accidental confirmations. Header counters: total relevant / open / verified / Kundenstandard. Maßnahmen-page polish (same commit): - "Lösch."-column header removed — the trash icon is self-explanatory - groupByTitle now additionally deduplicates by hazard_id within a group (engine occasionally emits duplicate (name, hazard_id) pairs when Reinit is clicked twice; a follow-up migration 030 will add a UNIQUE constraint to prevent these upstream) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:49:56 +02:00
Benjamin Admin	8f4f59f0e3	feat(iace/mitigations): is_relevant + is_customer_standard flags
[migration-approved]
Expert-driven workflow refinement on the Massnahmen page. The engine seeds ~80 mitigations per project, but for a concrete customer site most need a relevance decision before they're meaningful in verification:
status: 'planned' | 'implemented' | 'verified' (existing, verification track)
is_relevant bool (new) (does this apply to this site?)
is_customer_standard bool (new) (already in place at customer, no evidence)
Decision flow on the Mitigations tab:
Engine-seeded → is_relevant=false (default, waiting for expert)
Expert checks "Relevant" → is_relevant=true → surfaces in verification
Expert clicks trash → DELETE (banner warns: do not click Reinit afterwards or seeds come back)
In verification, customer_standard=true bypasses evidence upload
is_customer_standard implies is_relevant (DB CHECK constraint).
Migration 029_iace_mitigation_relevance.sql: ALTER TABLE iace_mitigations ADD COLUMN is_relevant ..., is_customer_standard ... + CHECK constraint + partial index on is_relevant for the verification page's filter.
Backend (Go):
- Mitigation struct gains two bool fields
- CreateMitigation: defaults to false/false (engine-seeded mitigations start unassessed)
- UpdateMitigation: new case clauses for both keys; setting is_customer_standard=true auto-flips is_relevant=true to satisfy the CHECK constraint
- All three SELECT statements (ListMitigations, ListMitigationsByProject, getMitigation) extended with the two new columns
Frontend:
- Maßnahmen-page columns: [Relev. ☑] [Lösch. 🗑] Title | #Hazards | P·I·V
- Group-header checkbox shows tri-state (indeterminate when partial), flips all instances in the group at once
- Banner above the table: "Markiere jede Maßnahme als Relevant oder lösche sie. Nach Löschen kein Neu initialisieren mehr drücken." (Mark every measure as relevant or delete it. Do not press re-initialize after deleting.)
- Relevant rows tinted emerald, customer-standard label visible - Legacy bulk-select state + helpers removed (the Relevant checkbox now IS the primary mass action) - useMitigations gains handleSetRelevant, handleSetCustomerStandard, handleDeleteSilent (for non-confirm bulk deletes) Future use: is_customer_standard mitigations from a prior project at the same customer can later be auto-suggested when commissioning the next plant — turning expert knowledge into reusable customer-profile data. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:35:56 +02:00
Benjamin Admin	df7d83134b	feat(agent): migrate compliance-check results to banner + documents (M1-M5) After a compliance-check run finishes, the user can now apply the extracted vendor inventory directly to their own: - CookieBanner config (admin /sdk/einwilligungen) - Cookie-Policy / VVT-Register / Privacy-Policy templates (admin /sdk/document-generator) Backend: - migration_to_banner.py: vendor list -> CookieBannerConfig with ESSENTIAL/PERFORMANCE/PERSONALIZATION/EXTERNAL_MEDIA buckets + review flags (broken opt-out URLs, missing expiry, no cookies listed) - migration_to_document.py: vendor list -> pre-fills for 3 doc templates, recipient-type aware (INTERNAL/GROUP/PROCESSOR/CONTROLLER) - agent_migration_routes.py: GET /banner-preview, /document-preview, /summary keyed on check_id - compliance_audit_log: new check_payloads table persists cmp_vendors + extracted_profile so the preview survives an app restart - tests: 9 mapper units + 4 endpoint integration tests Frontend: - MigrationPanel.tsx: modal showing banner-config diff + document pre-fills, plus links into the existing editors - ComplianceCheckTab.tsx: replaces standalone audit link with the panel; net -3 lines, stays at the 500-cap Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:06:28 +02:00
Benjamin Admin	f4c9cea770	feat(iace/mitigations): group measure rows by title, collapse 21x→1 row The "Maßnahmen" page in the Bremsscheibe project showed a flat list with heavy redundancy — e.g. "Sicherheitszeichen nach ISO 7010" appeared on 21 separate rows, one per linked hazard. Same for "Gefahrenpiktogramme", "Flucht- und Rettungswege" etc. The signal got lost in the noise. This is a presentation-only regrouping. Each Hazard×Mitigation pair stays a separate DB row with its own status, notes and edit history (option B from the discussion: instances remain independently editable). The page now collapses rows that share the same `m.title` into one group row. Group row shows: - title + ISO 12100 sub-category (if encoded in description) - count of linked hazards on the right - compact status distribution "P · I · V" (Planned/Implemented/Verified) - shared checkbox that selects all instances in the group Click expands the group and reveals the individual hazard×measure rows, each with its own StatusBadge and detail-expand for MitigationHints. State additions: - expandedGroup: Set<string> with keys `${type}:${title}` so the same title across different reduction stages stays independently togglable - groupByTitle() helper trims the title, falls back to "(ohne Titel)" - statusCounts() helper for the P·I·V breakdown Pagination semantics swapped from 50 instances/page to 50 groups/page — makes the list far easier to scan at the ~80-instance scale this project exhibits. LOC: 267 → 346 (well under the 500 hard cap). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:50:45 +02:00
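The grouping described in f4c9cea770 can be sketched as follows. This is a Python transliteration for illustration only (the commit implements it in the frontend); the row keys `type`, `title` and `status` and the choice to key the groups by `type:title` are assumptions derived from the message, not the actual component code.

```python
from collections import Counter, defaultdict

FALLBACK_TITLE = "(ohne Titel)"  # fallback named in the commit message


def group_by_title(rows):
    """Collapse hazard x mitigation rows that share a (type, trimmed title)."""
    groups = defaultdict(list)
    for row in rows:
        title = (row.get("title") or "").strip() or FALLBACK_TITLE
        # Same key shape as the expandedGroup keys: `${type}:${title}`
        groups[f"{row['type']}:{title}"].append(row)
    return dict(groups)


def status_counts(rows):
    """Compact 'P · I · V' distribution (planned / implemented / verified)."""
    counts = Counter(row["status"] for row in rows)
    return f"{counts['planned']} · {counts['implemented']} · {counts['verified']}"
```

Each instance stays a separate row inside its group, so per-instance status and notes remain independently editable.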
Benjamin Admin	6ed30dae5b	feat(agent): MC scorecard + audit drill-down + tenant trend (A1-A6) Now that all 1874 MCs run per check (Task #30 cap removal), the report was about to drown in noise. This commit adds the full aggregation / persistence / drill-down stack so each MC is actionable, not just counted. A1 mc_scorecard.py (new): build_scorecard(checks) -> per-regulation PASS/FAIL/SKIP + severity top_fails(checks, n) -> N most severe failed MCs full_audit_records(...) -> flat rows ready for sidecar SQLite A2 Email rendering: agent_doc_check_scorecard.py (new) builds an HTML scorecard table (regulation × passed/failed/HIGH/MEDIUM/score) shown at the top of the email. agent_doc_check_report._render_document now collapses the 500-MC L2 forest into 'X/Y bestanden (Z Fail)' summary plus a top-10 fails block per doc — old verbose render is gone. A3 compliance_audit_log.py (new) — sidecar SQLite at /data/compliance_audits.db (separate from compliance Postgres schema to comply with the no-new-migrations rule in CLAUDE.md): check_runs(check_id, ts, tenant_id, site_name, base_domain, doc_count, scorecard json, vvt_summary json) mc_results(check_id, doc_type, mc_id, label, passed, skipped, severity, regulation, matched_text, hint) Route persists every run after the email is sent. docker-compose.yml adds compliance-audit volume + env. A4 backfill_mc_regulation_llm.py (new) — Qwen-tagged backfill for the 1636 MCs the regex pass couldn't classify. Batches of 25, format=json, output constrained to the canonical regulation list. Run manually: docker exec bp-compliance-backend python3 \ /app/scripts/backfill_mc_regulation_llm.py [--dry-run] A5 Admin audit tab — GET /api/compliance/agent/audit/<check_id> proxied via /api/sdk/v1/agent/audit/<id>. New page /sdk/agent/audit/[checkId] renders scorecard + filterable MC table (status / doc_type / regulation, expandable rows with matched_text + hint). ComplianceCheckTab now shows 'Voll-Audit oeffnen' link. 
A6 Trend per tenant — GET /api/compliance/agent/audit/tenant/<id> returns recent runs. Email scorecard shows per-regulation delta badges ('(+12%)', '(-3%)') compared with the previous run for the same tenant + base_domain. Lookup is one SQLite query. Plumbing: rag_document_checker.py — SELECT now includes 'article'; MC results carry 'regulation' + 'article' through to CheckItem. agent_doc_check_routes.CheckItem schema gains regulation + article fields (defaults '') so old clients still parse. agent_compliance_check_routes — response gains 'check_id' so the frontend can build the audit link.	2026-05-17 13:45:58 +02:00
Benjamin Admin	6d29191e9b	fix(vvt): score INTERNAL/GROUP without opt-out/privacy penalty User feedback after BMW test: - 60 'BMW AG — XYZ' rows were rendered as ✗ for Opt-Out/Privacy and scored 38-52%. That's misleading: BMW processing for itself doesn't need a separate opt-out URL (cookie-banner is the consent mechanism) or a separate privacy policy (main DSI covers it). - Title 'Anbieter' was wrong for 60 of 90 rows (internal services). Three orthogonal fixes: 1. score_vendors becomes recipient_type aware: - INTERNAL/GROUP_COMPANY: opt_out_url, privacy_policy_url, country are NOT required (the user's main DSI + cookie-banner cover them). What IS required: name, purpose, cookies disclosed with name + expiry. Cookies-disclosure weight raised to 50 (was 15) so the VVT-relevant data is the score driver. - 'necessary' category: opt-out still skipped (§25 Abs. 2 TDDDG). - External (PROCESSOR/CONTROLLER): existing strict scoring stays. 2. _link_status_badge accepts na_label and renders a neutral em-dash with explanation tooltip instead of red ✗ when the column doesn't apply to that row. _render_vendor_row_full passes na_label based on recipient_type: - INTERNAL/GROUP -> 'Nicht erforderlich (eigene Verarbeitung)' - necessary -> 'Nicht erforderlich (§25 Abs. 2 TDDDG)' 3. Header + summary clarify the split: - h3 changed to 'Verarbeitungstaetigkeiten und Empfaenger aus der Cookie-Richtlinie' (was 'Drittanbieter aus Cookie-Richtlinie'). - Top line: '90 Verarbeitungen erfasst — 60 eigene + 30 externe Empfaenger'. - Disclaimer below: explains the INTERNAL/GROUP exemption so the reader understands why those rows don't show ✗ for missing URLs. - Section labels enriched with the relevant DSGVO article: 'Eigene Verarbeitungstaetigkeiten — fuer das VVT (Art. 30)', 'Auftragsverarbeiter — AVV erforderlich (Art. 28)', 'Joint Controller — Vereinbarung pruefen (Art. 26)'. 
Expected BMW result after fix: ~85% of the 60 BMW-AG rows jump from ~52% to 90-100% (the real issue, fehlende Cookies-Disclosure, stays flagged). The only true findings remaining are external links that return 4xx (e.g. Criteo 403, Teads 404).	2026-05-17 13:15:40 +02:00
Benjamin Admin	8a44e67293	feat(compliance-check): unlock all 1874 MCs + close gap-table items User: 'wir haben 1800 MCs erstellt um sie zu 10% zu nutzen — das ist Schwachsinn'. Fixed all 6 gaps from the audit. #1 max_controls=0 (was 20): - agent_compliance_check_routes _check_single: passes max_controls=0 to check_document_with_controls -> ALL MCs evaluated per doc_type. - 8 doc_types now use 1874 MCs instead of 160 (10x coverage). - Regex matching is cheap (<1s per doc); LLM-enrich cap of 10 stays. #2 LLM-verify fixed: - llm_verify.py was getting 0/N parsed. Causes: qwen3 thinking-mode wrapped output in <think>...</think>, /api/generate doesn't enforce JSON, prompt didn't handle code-fence wrappers. - Now uses /api/chat with format='json' (forces valid JSON). - _parse_batch_response strips <think> tags, accepts {results:[...]} AND bare [...], adds richer regex-fallback parse, logs raw head on total parse failure for diagnosis. #3 Loeschkonzept checklist (new): - doc_checks/loeschkonzept_checks.py — 9 L1 + 7 L2 checks per DIN 66398 + Art. 5(1)(e)/17/32 DSGVO: scope+responsibility, data categories, retention periods, legal basis refs (HGB/AO/BGB), deletion trigger, deletion process+technical+systems, deletion proof, exceptions + Art. 18 lock, review cycle, DSGVO references. - runner.py registered for loeschkonzept/loeschung/loeschfristen. #4 regulation backfill script: - backend-compliance/scripts/backfill_mc_regulation.py — regex-detects DSGVO/TDDDG/TMG/BGB/HGB/AO/MStV/UWG/VSBG/PAngV/GwG/BDSG/EU-VO references in MC title+question+pass_criteria, UPDATEs regulation + article fields. - Idempotent (only NULL rows), --dry-run flag, batched 200/UPDATE. 
- Run inside container: docker exec bp-compliance-backend python3 \ /app/scripts/backfill_mc_regulation.py #5 MC alias-fallback: - rag_document_checker._MC_ALIAS_FALLBACK maps doc_types without own MCs to a related set: nutzungsbedingungen->agb, social_media->dse, sub_processor/scc/tom_annex->avv, loeschfristen->loeschkonzept, eu_institution/dsb->dse. - _load_controls retries with the alias when the primary query returns 0 rows. - 14 additional doc_types now get MC coverage transparently. #6 cross-domain auto-discovery: - _autodiscover_missing builds a crawl plan: primary submitted base + up to 2 related domains sharing the owner SLD (e.g. BMW Group: bmw.de + bmwgroup.com + bmwgroup.jobs). - Detection: regex over submitted texts for https?://...<owner>... hostnames distinct from the primary base. - Each crawled base contributes documents + cmp_payloads to the discovery pool. Net effect for BMW: 1874 MCs evaluated (90 from cookie alone, was 20), the Loeschkonzept mandatory disclosures (Pflichtangaben) can now be graded, the LLM overturns false regex FAILs, and Joint-Controller policies on bmwgroup.jobs (Social Media) are now discoverable. The same wins will apply to the CRA-Compliance check.	2026-05-17 13:07:50 +02:00
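The tolerant response parsing described under #2 of 8a44e67293 might look roughly like this. It is a minimal sketch, not the code in llm_verify.py: the function name mirrors `_parse_batch_response`, the regex-fallback parse and the logging are omitted, and the result item shape is an assumption.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence marker, built here to keep this snippet fence-safe
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_batch_response(raw):
    """Return a list of result dicts, or [] when nothing parseable is found."""
    text = THINK_RE.sub("", raw).strip()      # drop thinking-mode wrappers
    if text.startswith(FENCE):                # drop an opening code fence (+ "json" tag)
        text = text[len(FENCE):]
        if text.lower().startswith("json"):
            text = text[4:]
    if text.endswith(FENCE):                  # drop the closing fence
        text = text[:-len(FENCE)]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if isinstance(data, dict):                # accept {"results": [...]}
        data = data.get("results", [])
    return data if isinstance(data, list) else []   # ... and a bare [...]
```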
Benjamin Admin	fab1e35847	feat(vvt): recipient-type classification + 3-section VVT table Per user request: BMW (and others) put their own services AND external vendors in the same cookie-policy widget. The VVT-Tabelle now groups them by Art. 30(1)(d) DSGVO recipient category so the DSB can act on the right buckets: - INTERNAL — owner processing for itself ('BMW AG — XYZ') - GROUP_COMPANY — same brand family, different legal entity ('BMW Bank') - PROCESSOR — Auftragsverarbeiter, AVV-pflichtig (Adobe, Akamai) - CONTROLLER — independent / joint controller (Meta Pixel, Google Ads, LinkedIn — they run their own profiles) - AUTHORITY — government bodies (rare in cookies) - OTHER — fallback New module vendor_classifier.py: - owner_from_url(url) — derive site-owner token (bmw.de -> 'BMW', mercedes-benz.de -> 'Mercedes-Benz') - classify(name, category, owner) — strict 5-tier heuristic: * INTERNAL: vendor name first-token is '<Owner>' / '<Owner> AG' / '<Owner> SE' / '<Owner> GmbH' / '<Owner> AG & Co. KG' * GROUP_COMPANY: starts with '<Owner> ' but isn't '<Owner> AG' * CONTROLLER: matches a known joint-controller list (Meta, Google Ads, YouTube, LinkedIn Insight, TikTok, Pinterest, Taboola, Outbrain, Criteo, Twitter, Reddit, ...) * PROCESSOR: legal-form suffix in name (GmbH, AG, Inc., A/S, B.V., S.A., Ltd., LLC, ...) 
* OTHER: anything else vendor_extractor.extract_vendors_from_payloads now takes owner_name: - Passes it through to classify() for every extracted vendor record - The route derives owner_name via _company_name_from_url(doc_entries) - LLM-extracted vendors are classified the same way (so V3 fallback also produces tagged records) agent_doc_check_extras.build_vvt_table_html rewritten: - Buckets vendors by recipient_type - Renders one section per non-empty bucket, in canonical order (RECIPIENT_TYPE_SECTIONS), each with section header + count + bad count + nested table - Within each section: sorted by compliance_score ascending - Response JSON cmp_vendors includes recipient_type so the frontend can later import per-category into the VVT module Expected BMW result: ~60 INTERNAL rows (BMW AG own services), ~25 PROCESSOR rows (Adobe, Adform, Akamai, AWS, ...), ~5 CONTROLLER rows (Meta Pixel, Google, LinkedIn, Pinterest, Outbrain, Taboola).	2026-05-17 12:31:49 +02:00
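The tiered heuristic from fab1e35847 can be illustrated with a short sketch. The lists below are abbreviated from the commit message, the matching rules (substring match for known controllers, word-boundary match for legal forms) are assumptions, and the `category` argument of the real `classify(name, category, owner)` is left out for brevity.

```python
import re

OWNER_FORMS = ("", " AG", " SE", " GmbH", " AG & Co. KG")
KNOWN_CONTROLLERS = ("meta", "google ads", "youtube", "linkedin insight", "tiktok",
                     "pinterest", "taboola", "outbrain", "criteo", "twitter", "reddit")
LEGAL_FORMS = ("GmbH", "AG", "Inc.", "A/S", "B.V.", "S.A.", "Ltd.", "LLC")


def classify(name, owner):
    """Map a vendor display name to a recipient type (illustrative only)."""
    # Display names look like 'Provider Name <em dash> Processing Name'.
    vendor = name.split(" \u2014 ")[0].strip()
    if owner:
        if vendor in {owner + form for form in OWNER_FORMS}:
            return "INTERNAL"
        if vendor.startswith(owner + " "):
            return "GROUP_COMPANY"
    lowered = vendor.lower()
    if any(known in lowered for known in KNOWN_CONTROLLERS):
        return "CONTROLLER"
    for form in LEGAL_FORMS:
        # Legal form must stand alone, not be part of a longer word.
        if re.search(rf"(?<!\w){re.escape(form)}(?!\w)", vendor):
            return "PROCESSOR"
    return "OTHER"
```

The order of the checks matters: owner matching runs first so that 'BMW AG' is not caught by the generic legal-form rule.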
Benjamin Admin	6c7d4c7552	fix(vvt): correct ePaaS schema mapping + category-aware scoring The first BMW VVT table rendered all 24 providers at 20% score because the ePaaS extractor was reading the wrong field names. Actual schema is nested: providers[].processings[].persistences[], NOT providers[] alone. Correct ePaaS schema (verified against bmw.com/epaas/.../de_DE.epaas.json): Provider: {id, name, description, processings[]} Processing: {id, name, description, categoryId, optOutLink, privacyPolicyLink, persistences[]} Persistence: {id, name, domain, type, expiry, description} Two structural changes: 1. One row per processing (not provider). BMW has 26 providers but ~91 processings spread across them (Adobe alone has ACMProcessing, AdobeAnalytics, AdobeCampaign, AdobeTargetAnalytics, AdobeTargetPers.). The cookie widget displays each processing separately — VVT now mirrors that. Display name format: 'Provider Name — Processing Name'. 2. Read optOutLink/privacyPolicyLink from PROCESSING (where they live), not provider. Persistences flatten to cookies[] with name + expiry + description. Plus category mapping: advertising -> marketing strictlyNecessary -> necessary statistics -> statistics functional -> functional Category-aware scoring (cookie_link_validator.score_vendors): - 'necessary' (technisch erforderliche, §25 Abs. 2 TDDDG): no opt-out required, no country required. Score weight shifts to purpose + cookie disclosure (essential cookies must list names + expiry). - All other categories: opt-out URL still mandatory; missing opt-out flags 'no_opt_out_url' and zeros that block of points. Expected BMW result after this fix: - ~91 rows (Adobe Analytics, Adform Retargeting, Akamai Infrastructure, AWS, ..., plus ~60 strictlyNecessary processings) - Marketing rows with present opt-out → ~75-90% - Necessary rows with cookie+expiry → ~85-95% - Rows missing fields → still flagged	2026-05-17 11:19:31 +02:00
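The nested schema from 6c7d4c7552 (providers, processings, persistences) flattens to one row per processing roughly as shown below. Field names on the input side are taken from the commit message; the output keys, the assumption that `categoryId` carries the raw category string, and the "other" fallback are illustrative.

```python
CATEGORY_MAP = {
    "advertising": "marketing",
    "strictlyNecessary": "necessary",
    "statistics": "statistics",
    "functional": "functional",
}


def flatten_epaas(payload):
    """Emit one vendor row per processing, not per provider."""
    rows = []
    for provider in payload.get("providers", []):
        for proc in provider.get("processings", []):
            rows.append({
                # Display name format: 'Provider Name <em dash> Processing Name'
                "name": f"{provider.get('name', '')} \u2014 {proc.get('name', '')}",
                "category": CATEGORY_MAP.get(proc.get("categoryId"), "other"),
                # Links live on the processing, not on the provider.
                "opt_out_url": proc.get("optOutLink"),
                "privacy_policy_url": proc.get("privacyPolicyLink"),
                "cookies": [
                    {"name": p.get("name"), "expiry": p.get("expiry"),
                     "description": p.get("description")}
                    for p in proc.get("persistences", [])
                ],
            })
    return rows
```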
Benjamin Admin	189918b043	fix(cmp): stricter heuristic + only replace DOM when CMP is strictly larger Two bugs observed in the BMW test run: 1. Generic JSON heuristic captured /de-de/login/bmw/api/flyout/data (4KB, user login fly-out data) and reconstruct_generic produced 56 words of noise. The CMP-prefer logic then 'replaced' the 185-word imprint DOM extraction with those 56 words because self_wc(185) < 300, even though cmp_wc(56) < self_wc(185). 2. The strict prefilter list was too short. Login/auth/cart endpoints often have category-shaped JSON without being cookie policies. Fixes: - dsi_discovery: replace DOM with CMP only when cmp_wc > self_wc AND it meets one of the existing conditions. Tiny captures can no longer silently destroy a bigger DOM extraction. - cmp_extractor: skip non-cookie URLs (/login, /auth, /user, /session, /cart, /checkout, /search, /flyout, /menu, /nav, /translation, /i18n, /locale, /feature-flag). - cmp_extractor: require ≥5KB payload size; real CMP policies are always larger (BMW ePaaS is ~393KB). Tiny matches drop out before reconstruction.	2026-05-17 10:50:19 +02:00
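The two guards from 189918b043 can be sketched as below. Function names are hypothetical, the skip list and the 5KB minimum come from the commit message, and the "thin DOM" condition (`self_wc < 300`) stands in for the unspecified set of existing conditions.

```python
from urllib.parse import urlparse

SKIP_PATH_FRAGMENTS = (
    "/login", "/auth", "/user", "/session", "/cart", "/checkout", "/search",
    "/flyout", "/menu", "/nav", "/translation", "/i18n", "/locale", "/feature-flag",
)
MIN_PAYLOAD_BYTES = 5 * 1024


def is_cmp_candidate(url, payload_size):
    """Prefilter: reject non-cookie endpoints and tiny payloads."""
    path = urlparse(url).path.lower()
    if any(fragment in path for fragment in SKIP_PATH_FRAGMENTS):
        return False
    return payload_size >= MIN_PAYLOAD_BYTES


def should_replace_dom(self_wc, cmp_wc, thin_dom_threshold=300):
    """Prefer the CMP text only if it is strictly larger than the DOM text."""
    return cmp_wc > self_wc and self_wc < thin_dom_threshold
```

With the figures from the bug report, `should_replace_dom(185, 56)` is false, so the 185-word DOM extraction survives.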
Benjamin Admin	873997c13b	feat(vvt): V3 — LLM vendor extraction fallback for unknown CMPs When the cookie text has no captured CMP payload (long-tail sites that don't use ePaaS/OneTrust/Cookiebot/etc.) we now fall back to a Qwen → OVH LLM cascade to extract a structured vendor list from the policy text. New module backend/compliance/services/vendor_llm_extractor.py: - extract_vendors_via_llm(cookie_text): runs Qwen first (local Ollama), then OVH if Qwen returns nothing usable. - System prompt instructs the model to return STRICT JSON only: {vendors: [{name, country, purpose, category, opt_out_url, privacy_policy_url, persistence, cookies: [...]}]} - Lenient JSON parser tolerates code-fences, prose wrappers, dict vs list. - _normalize() caps array sizes (80 vendors, 30 cookies each), validates URLs (must be http(s)), trims fields to reasonable lengths. Route integration (agent_compliance_check_routes.py): - After named-CMP extract: if cmp_vendors is empty AND the cookie text has ≥500 words (otherwise it's likely navigation chrome), invoke the LLM extractor. Progress message 'Vendor-Liste per LLM extrahieren...'. - Vendors then run through the same validate_vendor_urls + score_vendors pipeline → VVT table rendered identically regardless of source. docker-compose.yml: backend-compliance gains OLLAMA_URL, CMP_LLM_MODEL, OVH_LLM_URL/KEY/MODEL env vars (same names as consent-tester so the configuration is unified). This closes the 'every site eventually gets a VVT table' goal: - Known CMP → V1/V2 structured extraction (fast, exact) - Unknown CMP → V3 LLM extraction (slow, best-effort) - No text at all → no vendors, but other compliance checks still run.	2026-05-17 09:55:42 +02:00
Benjamin Admin	9c0cc0f59f	feat(vvt): V2 — vendor extractors for Cookiebot/Usercentrics/Didomi/TrustArc Backend vendor_extractor.py gets 4 new per-CMP dispatchers, mirroring the JSON schemas observed in each platform: - Cookiebot: 'Categories[].Cookies[]' with Vendor/Host, expiry, purpose - Usercentrics: 'services[]' with cookieMaxAgeSeconds, processingCompanyCountry - Didomi: 'app.vendors[]' with country + policyUrl - TrustArc: 'vendors[*]' + per-category 'Cookies' with provider All 6 named CMPs (ePaaS, OneTrust, Cookiebot, Usercentrics, Didomi, TrustArc) plus the generic-shape fallback are now mapped — every site hitting Phase B of the cascade gets a structured vendor list, scored opt-out links, and a VVT-Tabelle in the email.	2026-05-17 09:52:10 +02:00
Benjamin Admin	ea4dbb223f	feat(vvt): per-vendor extraction + opt-out check + VVT table in email (V1) When a known CMP (ePaaS, OneTrust) renders the cookie policy, we now extract structured vendor records, probe their opt-out + privacy URLs, score each vendor (0-100), and append a 'VVT-Vorschlag' table to the compliance email — one row per vendor, sortable by compliance score. consent-tester: - DSIDiscoveryResult.cmp_payloads: surfaces raw CMP JSON to callers - DSIDiscoveryResponse: new cmp_payloads field - discover_dsi_documents sets cmp_payloads from cmp_capture - cmp_library/{epaas,onetrust}.py: new extract_vendors(d) returning list[VendorRecord] backend: - _fetch_text() now returns (text, cmp_payloads) tuple - doc_entries store cmp_payloads per doc (mostly cookie) - _autodiscover_missing forwards homepage payloads to the cookie entry - New module vendor_extractor.py: dispatches ePaaS/OneTrust/generic schemas; dedupes vendors across multiple payloads - cookie_link_validator.py extended with validate_vendor_urls(vendors) and score_vendors(vendors) — 0-100 score per vendor based on name, purpose, country, opt-out reachable, privacy URL reachable, cookies with names + expiry - agent_doc_check_extras.build_vvt_table_html: renders the table - Route appends VVT HTML after the provider list, before the document-by-document report - Response JSON gains cmp_vendors for future frontend rendering Example for BMW: ~30 ePaaS providers → table with Name \| Kategorie \| Sitz \| Cookies \| Opt-Out (✓/✗) \| Privacy (✓/✗) \| Score. Sorted by score ascending so the worst-compliant vendors are at the top.	2026-05-17 09:50:11 +02:00
Benjamin Admin	c9c0fb5965	feat(cookie-check): enhanced patterns + active opt-out link validator cookie_checks.py: - cookie_names_listed: now also matches CMP placeholder notation (BMW: 'Adfpc###', 'CT###') and 'Diese Datenverarbeitung verwendet die folgenden Cookies oder ähnliche Technologien' as list-shape signal. Cryptic vendor names like 'audience', 'adformfrpid' are accepted via the surrounding markup, not by hard-coding each one. - cookie_providers_named: new pattern 'Gesetzt von: <Firma>' (BMW/ePaaS per-cookie vendor naming) + recognition of full legal-form names (Adform A/S, BMW AG, Adobe Systems Software Ireland Limited). - cookie_duration_values: now matches 'Ablauf: 1 Jahr' / 'Speicherdauer: 30 Tage' (BMW format) in addition to the legacy '<n> <unit>'. New L1 + L2 checks for controller in cookie-policy: - cookie_controller (L1): the cookie policy must name Verantwortlich(er) - cookie_controller_address (L2): PLZ + Ort or address keywords - cookie_controller_contact_or_link (L2): email/phone OR link back to Datenschutzerklärung (the practical equivalent — BMW does this) New L2 checks (parented under opt_out): - cookie_optout_links: detects per-provider opt-out URLs in the text - cookie_privacy_policy_links: per-provider privacy-policy URLs New service: cookie_link_validator.py - extract_links(text): pulls all https?://… URLs that follow 'Opt-Out Link:' / 'Link zur Privacy Policy:' (deduped) - validate_links(links): probes every URL concurrently (HEAD first, GET fallback for 405/403). 10 parallel, 8s per request, 60s batch cap. Returns reachable=True/False + status + final_url. - build_check_items(): renders 2 CheckItems (opt-out + privacy-policy), each pass if ALL links 2xx/3xx, fail with up-to-5 broken-link examples. Hook in _check_single: doc_type=='cookie' triggers the validator after regex+MC checks. Recomputes correctness with the new L2 items. This addresses two concrete BMW observations: 1. 
BMW's per-cookie structure (Name + Zweck + Ablauf, Gesetzt von: …, Opt-Out Link: …) now recognised → 'Konkrete Cookie-Namen aufgelistet' and 'Konkrete Speicherdauern' should pass. 2. Defective opt-out URLs surface as compliance findings rather than silently passing — Art. 7(3) DSGVO requires a working withdrawal path per provider.	2026-05-17 09:38:32 +02:00
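The link extraction step of cookie_link_validator.py (c9c0fb5965) could be sketched like this. Only the two label strings are taken from the commit message; the regex, the trailing-punctuation cleanup and the returned dict shape are assumptions, and the concurrent HEAD/GET probing is not shown.

```python
import re

LINK_RE = re.compile(
    r"(Opt-Out Link|Link zur Privacy Policy):\s*(https?://[^\s<>)]+)"
)


def extract_links(text):
    """Collect deduplicated URLs that follow one of the two known labels."""
    links = {"opt_out": [], "privacy_policy": []}
    for label, url in LINK_RE.findall(text):
        key = "opt_out" if label == "Opt-Out Link" else "privacy_policy"
        url = url.rstrip(".,;")          # strip sentence punctuation glued to the URL
        if url not in links[key]:        # dedupe while keeping first-seen order
            links[key].append(url)
    return links
```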
Benjamin Admin	4a5924b8c4	feat(iace): CRA / DIN EN 40000-1-2 cyber-resilience spur [guardrail-change] Phase 18 adds an EU Cyber Resilience Act compliance track to IACE: the engine now fires patterns that surface the manufacturer-side CRA obligations whenever a project's components carry digital elements. Patterns (HP1910-HP1918, hazard_patterns_cra.go): HP1910 Missing SBOM HP1911 Unsigned firmware/software updates HP1912 Factory-default credentials still active HP1913 No coordinated vulnerability disclosure (CVD) policy HP1914 No documented security patch SLA HP1915 Missing user-facing hardening guide HP1916 No incident-notification process to ENISA / CSIRT HP1917 No security assessment prior to placing on market HP1918 AI component without cybersecurity risk assessment Each pattern carries ClarificationQuestionsDE so the operator gets auditor-grade questions to take back to the Anlagenbauer instead of the engine inventing prose. PatternMatch carries DefaultAvoidability (P=1 for all CRA patterns), feeding the PLr graph from Phase 17. Measures (M540-M548, measures_library_cra.go): M540 SBOM (SPDX or CycloneDX) with each machine release M541 Signed updates with rollback protection M542 Forced default-password change at first boot M543 Published CVD policy (security.txt / PSIRT) M544 Documented patch SLA with CVSS-tier response times M545 User-facing hardening guide in the machine docs M546 ENISA incident-notification process (24h/72h/14d) M547 Authenticated update channel + integrity check M548 Pre-market security assessment / pen-test The library is urheberrechtlich neutral: identifiers only (Verordnung (EU) 2024/2847, DIN EN 40000-1-2 Entwurf, IEC 62443, ETSI EN 303 645, ISO/IEC 5962, ISO/IEC 29147). No normative text is reproduced — DIN/Beuth proprietary content is referenced by section number only. Category-compatibility: cyber_resilience pattern category accepts measures with HazardCategory cyber_resilience, cyber_network, or software_control. 
Updated in both the runtime helper (iace_handler_init_helpers.go) and its test-mirror (pattern_coverage_test.go); both must move in lockstep. Frontend (clarifications page): When at least one clarification references "2024/2847" or "40000-1-2" in its norm_references, a blue info-banner is rendered at the top of the page: "Cyber Resilience Act (CRA) — Hinweis zur Geltung Diese Klärungsliste enthält Fragen zur Verordnung (EU) 2024/2847 (CRA). Die CRA gilt für Produkte mit digitalen Elementen, die ab dem 11.12.2027 auf dem EU-Markt bereitgestellt werden. ..." The banner reminds the user that the CRA obligations are forward-looking while still allowing the manufacturer to bake them in now. LOC exceptions: Added three pre-existing files to .claude/rules/loc-exceptions.txt (manufacturer_safety_features.go, iace_handler_clarifications.go, routes.go). All three grew across Phases 16-17 and are tagged as Phase 5+ refactor backlog. [guardrail-change] marker required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:15:51 +02:00
Benjamin Admin	2afa5a179b	feat(iace): Risikograph EN ISO 13849-1 PLr + Methoden-Kopf im Bericht Phase 17 of the risk-assessment polish. Three pieces: A) PLr per EN ISO 13849-1 Anhang A (Risikograph) - HazardPattern.DefaultAvoidability (1 = P1, 2 = P2). Optional; defaults to P1 if unset (conservative; the operator can raise it after review). - ComputePLr(s,f,p) implements the canonical 8-leaf binary tree (S1F1P1 -> a, ..., S2F2P2 -> e). Pinned by 8 table-driven tests. - SeverityToS / ExposureToF map the existing 1-5 fields to the binary S/F at the documented threshold (3). - At project initialisation, "Risikograph EN ISO 13849-1 (Anhang A): S2 · F1 · P1 -> PLr c" is appended to every hazard's Description so the audit value is visible without leaving the hazard view. - PatternMatch carries DefaultAvoidability so the init handler can pick it up without a second pattern lookup. B) Methodology header (Methoden-Kopf) in the report - GET /clarifications.html now opens with a standardised methodology block: ISO 12100 Anhang B (hazard ID) + ISO 13849-1 Anhang A (PLr graph) + ISO 12100 6.2/6.3/6.4 (reduction hierarchy). Same wording on every export, ready for the handover to the Anlagenbauer. - Only norm identifiers; no norm text is reproduced. C) ISO12100Section in Hazard Description - When a pattern is labeled with ISO12100Section, the hazard description gets a "Klassifikation: EN ISO 12100 Anhang B, Abschnitt 6.3.5.4" suffix. Provenance for the auditor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 02:03:10 +02:00
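The 8-leaf risk graph behind `ComputePLr` can be written as a lookup table. The sketch below is in Python rather than Go and is not the project's implementation; the leaf values follow the EN ISO 13849-1 Annex A risk graph, and treating "threshold (3)" as `value >= 3` is an assumption, since the commit does not say whether the threshold is inclusive.

```python
# (S, F, P) -> required performance level, per the Annex A risk graph
PLR_TABLE = {
    (1, 1, 1): "a", (1, 1, 2): "b",
    (1, 2, 1): "b", (1, 2, 2): "c",
    (2, 1, 1): "c", (2, 1, 2): "d",
    (2, 2, 1): "d", (2, 2, 2): "e",
}


def to_binary(value, threshold=3):
    """Map a 1-5 field to the binary S/F level (assumed inclusive threshold)."""
    return 2 if value >= threshold else 1


def compute_plr(s, f, p=1):
    """Look up PLr; p defaults to P1, mirroring DefaultAvoidability when unset."""
    return PLR_TABLE[(s, f, p)]


def describe(severity, exposure, avoidability=1):
    """Build the suffix that is appended to the hazard description."""
    s, f = to_binary(severity), to_binary(exposure)
    plr = compute_plr(s, f, avoidability)
    return f"Risikograph EN ISO 13849-1 (Anhang A): S{s} · F{f} · P{avoidability} -> PLr {plr}"
```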
Benjamin Admin	71d31c914b	feat(iace): ISO 12100 Anhang B mapping — split noise/vibration + section identifier Phase 16 of the Klaerungen / risk-assessment polish. Sources from EN ISO 12100 Anhang B Tabelle B.1 are now first-class: A) HazardPattern.ISO12100Section identifier (string), persisted only as the section number (e.g. "6.3.5.5") — not the norm text. Keeps the library urheberrechtlich neutral (DIN/Beuth license). 57 patterns labeled today; rest will follow on touch. B) Category split per ISO 12100 Nr. 4 vs Nr. 5: - 16 patterns reclassified noise_vibration -> noise_hazard - 7 patterns reclassified noise_vibration -> vibration_hazard - 1 pattern (HP228 UV-/Laermexposition) kept multi-cat acceptableMeasureCategories now accepts both new aliases plus the legacy noise_vibration. Coverage test recognises both as valid. C) 5 new ISO-12100-Annex-B gap patterns (HP1900-HP1904): - HP1900 Vakuum-Verletzung (6.3.5.5) - HP1901 Federenergie / elastische Elemente (6.2.10) - HP1902 Rutschen/Stolpern auf rauer Oberflaeche (6.3.5.6) - HP1903 Hochdruckinjektion (6.3.5.4) — includes clarifying "no hand-locating of leaks" question - HP1904 Ersticken durch Brustkorbquetschung (6.3.5.2) The library now mirrors the ISO 12100 Annex B structure for the gaps the Bremse benchmark surfaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:59:16 +02:00
Benjamin Admin	b090662524	fix(compliance-check): respect auto-discovery 'not found' verdict; DSB not canonical Two related bugs in the BMW test result: 1. AGB rendered as 'MANGELHAFT 0/13' even though BMW has no public AGB: - Auto-discovery correctly returned 'not found' for AGB (no link on bmw.de matches AGB keywords). - But auto_fill_from_dsi then found the substring 'AGB' in a section of the DSI and pseudo-filled the AGB entry with a 264-word DSI fragment. - cross_search_documents would have done the same. - Both now skip entries where discovery_attempted=True AND auto_discovered=False — the 'not found' verdict stands. 2. DSB-Kontakt rendered as a separate 100% OK document with 7566 words = the entire DSI text: - GDPR practice: the DSB is named inside the DSI as an email or contact block (Art. 13(1)(b)), not as a stand-alone page. - cross_search_documents had been assigning the full DSI to the DSB row because it matched 'datenschutzbeauftragte' keywords. - DSB removed from _ALL_DOC_TYPES — no longer canonical, no longer padded as missing, no longer auto-discovered. The frontend row remains so a tenant with a separate DSB page can still submit one. After this fix BMW should render: - DSE: OK - Impressum: LUECKENHAFT (unchanged — regex gaps to fix separately) - Cookie-Richtlinie: OK - Social Media: NICHT GEFUNDEN (bmw.de does not link to it) - AGB: NICHT GEFUNDEN (correct — BMW has no public AGB) - Nutzungsbedingungen: NICHT GEFUNDEN - Widerruf: NICHT GEFUNDEN	2026-05-17 01:53:09 +02:00
Benjamin Admin	c4be077c5d	feat(iace): Klaerungen Phase 3 — DB-Tabelle + Multi-User + PDF-Export [migration-approved] Three pieces complete the Klaerungen lifecycle: 1. Migration 028: iace_clarifications + iace_clarification_comments + iace_clarification_history. Deterministic clarification_key (UNIQUE per project) so engine re-inits don't lose answers. History table logs every status/answer transition. The previous JSONB-in-metadata storage is kept as read-only fallback for pre-migration projects until a one-shot upcopy script runs. 2. Multi-User-Workflow: - assigned_to field on every clarification (free-text user kuerzel for now; an FK to users can be added in a follow-up). - Comment thread per clarification (POST .../comment, GET .../detail returns the thread). - Status-history log written by UpsertClarification when the status or answer actually changes. - Frontend Modal: Zugewiesen-an + Bearbeiter fields, comment thread with inline post, collapsible history section. 3. PDF-Export via print-friendly HTML: - GET /clarifications.html returns a standalone A4-styled document with status badges, norm references, affected hazards and a signature row at the bottom. The Bediener opens the link and uses Strg-P / Cmd-P to save as PDF. No server-side PDF dependency added. - Frontend "PDF / Druck" button next to CSV export. Backend: - internal/iace/store_clarifications.go: UpsertClarification, ListClarificationsForProject, GetClarificationByKey, AddClarificationComment, ListClarificationComments, ListClarificationHistory. - internal/api/handlers/iace_handler_clarifications.go: - AnswerClarification now writes the SQL row, falls back to legacy JSONB read on list. - PostClarificationComment, ListClarificationDetail, ExportClarificationsHTML added. Migration must be applied manually on Mac Mini and prod via psql -f /migrations/028_iace_clarifications.sql — pattern as in scripts/apply_*_migration.sh. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:39:17 +02:00
Benjamin Admin	b2b4d77877	fix(auto-discovery): compute missing against canonical 8 types, not submitted Frontend filters out empty doc rows -> req.documents only contains the N submitted entries (3 in BMW case). The old auto-discovery loop computed 'missing' as 'entries in doc_entries with empty text', which was always empty for those N entries -> discovery never fired. Fix: - missing = _ALL_DOC_TYPES - {canonical doc_types in doc_entries} - For each missing type, APPEND a new entry to doc_entries with discovery_attempted=True. If a discovered doc matched, fill text/url and set auto_discovered=True. - Check loop: skip entries with no URL and no text (let padding label them). Entries with URL but no text keep the 'Kein Text' error so the user sees fetch failures explicitly.	2026-05-17 01:28:51 +02:00
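The corrected "missing" computation from b2b4d77877 amounts to a set difference against the canonical list. In this sketch the eight type names are inferred from the discovery rules listed in 525038359a, and the entry shape and the `discovered` mapping are hypothetical.

```python
ALL_DOC_TYPES = frozenset({
    "dse", "impressum", "cookie", "agb", "nutzungsbedingungen",
    "widerruf", "social_media", "dsb",
})


def add_missing_entries(doc_entries, discovered):
    """Append one entry per canonical type the user did not submit."""
    present = {entry["doc_type"] for entry in doc_entries}
    for doc_type in sorted(ALL_DOC_TYPES - present):
        entry = {"doc_type": doc_type, "url": "", "text": "",
                 "discovery_attempted": True, "auto_discovered": False}
        hit = discovered.get(doc_type)
        if hit:  # a discovered document matched this type
            entry.update(url=hit["url"], text=hit["text"], auto_discovered=True)
        doc_entries.append(entry)
    return doc_entries
```

Entries that stay at `auto_discovered=False` keep their "not found" verdict for the later padding step.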
Benjamin Admin	f19a75d83d	feat(iace): Klaerungen Phase 2 — Sidebar-Counter + CSV-Export + Hazard-Banner Three pieces complete the Klaerungen UX: 1. Sidebar-Counter: layout.tsx polls /clarifications and shows a colored open-count badge on the "Klaerungen" nav item. Refreshes whenever the user changes route. 2. CSV-Export: new backend endpoint GET /sdk/v1/iace/projects/:id/clarifications.csv produces a UTF-8-BOM-prefixed semicolon-separated CSV (Excel-friendly) with ID, Quelle, Kategorie, Frage, Status, Antwort, Begruendung, Bearbeiter, answered_at, anzahl Gefaehrdungen, Gefaehrdungs-Namen, Norm-Refs. The frontend Klaerungen page gets a "CSV-Export" button. 3. Hazard banner instead of question text in the benchmark detail: the previous bulleted clarification list was duplicated across 48 hazards for a single FANUC question. Phase 2 replaces it with a compact status badge: "N offene Klaerung(en) — Klaerungen-Seite oeffnen" (orange) or "Alle N Klaerungen beantwortet" (green) with a direct link. Backend cleanup: iace_handler_init.go no longer appends the "Mit Anlagenbauer zu klaeren" block to Hazard.Description. The description stays focused on the scenario; clarifications live in the dedicated endpoint and answers persist across re-inits via project.metadata. The aggregated "Referenzierte Normen" line on the hazard is kept. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:25:36 +02:00
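An Excel-friendly export of the kind described in f19a75d83d (UTF-8 BOM plus semicolon separator) can be sketched in a few lines. The actual endpoint is a Go handler; the function name and the row shape here are hypothetical.

```python
import codecs
import csv
import io


def clarifications_csv(rows, columns):
    """Serialize rows to BOM-prefixed, semicolon-separated CSV bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\r\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(col, "") for col in columns])
    # The BOM lets Excel detect UTF-8 so umlauts render correctly.
    return codecs.BOM_UTF8 + buf.getvalue().encode("utf-8")
```

The semicolon is the list separator Excel expects under German regional settings, which is why it is used instead of a comma.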
Benjamin Admin	525038359a	feat(compliance-check): auto-discover missing doc types from homepage When the user leaves some doc-type rows empty, the tool now actively searches the website for them — only marks 'not found' as last resort. Flow: 1. User submits N URLs (e.g. just DSI) 2. For each canonical doc_type with no submitted URL/text, the route identifies the most-common base (scheme://netloc) from submitted URLs 3. Calls consent-tester /dsi-discovery on the homepage with max_documents=15 (180s timeout) 4. Classifies every discovered doc into a canonical doc_type via title/URL keyword rules (_DISCOVERY_RULES — covers cookie/widerruf/ social_media/agb/nutzungsbedingungen/dsb/impressum/dse) 5. Fills matching empty entries with the discovered text, marks auto_discovered=True and discovery_attempted=True Padding now differentiates: - 'Auf der Website nicht gefunden' — discovery was attempted, no doc matched. Amber badge, friendly hint to add URL manually. - 'Nicht eingereicht — Quelle nicht angegeben' — user gave NO URLs at all, nothing to crawl from. Grey badge. Email + frontend: - Status labels: NICHT GEFUNDEN (amber) vs NICHT EINGEREICHT (grey) - 'Gepruefte Quellen' table tags auto-discovered URLs with a small blue 'auto-entdeckt' badge so GF sees what tool found vs user submitted. Implementation only runs when ≥1 URL was submitted (no base to crawl from otherwise). Adds 30-90s for unsubmitted types but avoids the 'just say nicht gefunden' anti-pattern.	2026-05-17 01:14:05 +02:00
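Steps 2 and 4 of the flow above (pick the most common base, then classify discovered documents by keyword) can be sketched as below. The rule table is hypothetical and much shorter than whatever `_DISCOVERY_RULES` really contains; only the overall shape is taken from the commit message.

```python
from collections import Counter
from urllib.parse import urlsplit


def most_common_base(urls):
    """Return the most frequent scheme://netloc among urls, or None if there is none."""
    bases = []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme and parts.netloc:
            bases.append(f"{parts.scheme}://{parts.netloc}")
    if not bases:
        return None
    return Counter(bases).most_common(1)[0][0]


# Hypothetical keyword rules (first match wins); the real _DISCOVERY_RULES may differ.
RULES = [
    ("cookie", ("cookie",)),
    ("widerruf", ("widerruf",)),
    ("impressum", ("impressum", "imprint")),
    ("agb", ("agb", "geschaeftsbedingungen")),
    ("dse", ("datenschutz", "privacy")),
]


def classify(title, url):
    """Map a discovered document to a doc_type via title/URL keywords, or None."""
    haystack = f"{title} {url}".lower()
    for doc_type, keywords in RULES:
        if any(keyword in haystack for keyword in keywords):
            return doc_type
    return None
```

Ordering the rules matters in this sketch: a cookie policy hosted under a privacy path is claimed by the `cookie` rule before the broader `dse` rule sees it.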
Benjamin Admin	79efa54898	feat(iace): Klaerungen MVP, Phase 1 New page "Klaerungen" between Massnahmen and Verifikation. Backend: - internal/iace/clarifications.go: Clarification struct + ClarificationAnswer + BuildProjectClarifications(), which aggregates pattern-level + manufacturer-level questions from collectAllPatterns + GetManufacturerSafetyFeatures. Deterministic IDs ("pattern:HP1640:0", "manuf:fanuc:dual-check-safety-dcs:1") so persisted answers survive every re-init. - internal/api/handlers/iace_handler_clarifications.go: - GET /projects/:id/clarifications returns aggregated list with affected hazard names + persisted answer state, sorted (open first). - POST /projects/:id/clarifications/:cid/answer writes status/answer/reasoning/answered_by/answered_at to project.metadata.clarification_answers, with no DB schema change. Frontend: - admin-compliance/app/sdk/iace/layout.tsx: new "Klaerungen" nav item. - app/sdk/iace/[projectId]/clarifications/page.tsx: table grouped by source (FANUC / Pattern HP1640 / …), Filter Offen/Beantwortet/Alle, search field, Antwort-Modal with status/answer/Begruendung/Bearbeiter. A clarification answered once applies to ALL referenced hazards, so the operator no longer has to answer the same FANUC DCS question on 48 mechanical hazards individually. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:05:53 +02:00
Benjamin Admin	bc21480a2a	fix(compliance-check): always render 8 doc types + 4 BMW GT-gap fixes Always-show-8 (user-requested): - agent_compliance_check_routes.py: _pad_results_with_missing pads the results list to always include all 8 canonical doc_types in canonical order. Missing types get a placeholder DocCheckResult with error= 'Nicht eingereicht' + scenario='missing'. - agent_doc_check_report.py: NICHT EINGEREICHT status label (neutral), friendly grey body block instead of red error. - ChecklistView.tsx: 'Nicht eingereicht' chip (neutral grey, not red 'Fehler'); SCENARIO_LABELS adds missing entry + header chip counter. Impressum-Regression fix (#18): - _fetch_text(url, doc_type): cookie/dse/social_media -> max_documents=1 (CMP capture authoritative, sub-pages dilute). Other types -> =3 (Impressum needs Versicherungsvermittler, Aufsicht, Berufsrecht sub- pages). 15s networkidle bail keeps timing safe. ODR/Verbraucherstreitbeilegung filter (#19): - _apply_profile_filter: when profile.needs_odr=True (B2C), override the check's default B2B-oriented hint with action-oriented B2C guidance pointing at Art. 14 EU-VO 524/2013 + §36 VSBG. Previously the check contradicted itself: 'profile says B2C' + hint 'only relevant for B2C online vendors'. Registergericht regex (#20): - impressum_checks.py: accept colon/dot/dash between keyword and city (BMW writes 'registergericht: münchen hrb 42243'). Add 'sitz und registergericht: X' as separate pattern. Industry detection (#21): - business_profiler.py: 'automotive' keywords broadened (antriebs, motor, leasing, werkstatt, probefahrt, plus brand names BMW/Mercedes/ Audi/VW/Porsche/Opel). 'it_services' keywords narrowed — software/ cloud/hosting are mentioned in every privacy policy and were biasing the result toward IT for any tech-aware company.	2026-05-17 01:03:58 +02:00
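The Registergericht change (#20) in the entry above could look roughly like this. The pattern is an illustrative assumption, not the expression in `impressum_checks.py`; it only demonstrates accepting a colon, dot or dash between keyword and city, plus the optional "sitz und" prefix.

```python
import re

# Illustrative pattern (assumption): optional "sitz und" prefix, optional
# colon/dot/dash separator, then city, register type (HRA/HRB) and number.
REGISTERGERICHT = re.compile(
    r"(?:sitz\s+und\s+)?registergericht\s*[:.\-]?\s*"
    r"(?P<city>[a-zäöüß][a-zäöüß .\-]*?)\s+"
    r"(?P<reg>hr[ab])\s*(?P<number>\d+)",
    re.IGNORECASE,
)


def parse_registergericht(text):
    """Return (city, register type, number) or None if no register entry is found."""
    match = REGISTERGERICHT.search(text)
    if match is None:
        return None
    return match.group("city"), match.group("reg").upper(), match.group("number")
```

The lazy city group stops at the first `HRA`/`HRB` token, which keeps multi-word cities such as "frankfurt am main" intact.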
Benjamin Admin	74f66c4c34	fix(admin/iace/benchmark): show Klaerungsfragen + Normen on Engine column The Go init handler appends two annotated blocks to Hazard.Description ("Mit Anlagenbauer zu klaeren: ..." and "Referenzierte Normen: ...") without changing the DB schema. The benchmark detail view only rendered hazard.scenario || hazard.description, so the appended blocks were silently hidden because scenario is always populated. Split the description into three structured pieces: 1. extractScenario(): pure scenario text, stripped of trailing blocks 2. extractClarifications(): bullet list of "Mit Anlagenbauer zu klaeren" 3. extractEngineNorms(): pipe-separated norm references Each piece is rendered as its own DetailRow. The FANUC DCS clarification that already lives in the DB (48/115 hazards on the Bremse project) is now visible in the Engine column. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:42:41 +02:00
Benjamin Admin	5f2da1de88	feat(consent-tester): Phase E — self-improving CMP library cmp_discovery_log.py: - sqlite log at /data/cmp_discoveries.db: every LLM-discovered CMP pattern recorded with domain, strategy, value, sample text - Auto-promote (user-chosen 'voll automatisch' mode): when LLM returns strategy=url AND extracted text >= 800 words, write a new module /data/auto_cmp/auto_<slug>.py with derived regex matcher + reconstruct - record_discovery() called from dsi_discovery._try_llm_cascade on success cmp_library/_registry.py: - Loads both hand-written modules from services/cmp_library/ AND auto-promoted modules from /data/auto_cmp/ (CMP_AUTO_DIR env) - Auto modules use importlib.util.spec_from_file_location, no package install needed; restart consent-tester to pick up new ones dsi_discovery.py: - _try_llm_cascade now calls record_discovery() on every successful LLM analysis (cached AND fresh) main.py: - GET /cmp-discoveries — admin endpoint listing all logged discoveries - DELETE /cmp-discoveries/{id} — rollback (unlinks auto_*.py) This closes the self-improving loop: first encounter with a new CMP fires the LLM (cost) → discovery is auto-promoted → all future runs against the same vendor pattern hit Phase B (Named CMP) at <50ms with no LLM call.	2026-05-16 23:09:23 +02:00
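Loading auto-promoted modules by file path, as the entry above describes for `_registry.py`, might be sketched like this. The function name and the choice to skip files that fail to import are assumptions; only the use of `importlib.util.spec_from_file_location` and the `MATCHER` + `reconstruct()` contract come from the log.

```python
import importlib.util
from pathlib import Path


def load_auto_modules(directory):
    """Load every auto_*.py in `directory` that exposes MATCHER and reconstruct()."""
    modules = []
    for path in sorted(Path(directory).glob("auto_*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception:
            # Assumption: a broken auto-generated file is skipped, not fatal.
            continue
        if hasattr(module, "MATCHER") and callable(getattr(module, "reconstruct", None)):
            modules.append(module)
    return modules
```

Because the modules are loaded once from disk, new files only take effect on the next load, which matches the "restart consent-tester to pick up new ones" note.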
Benjamin Admin	2400aa6a9e	feat(consent-tester): Phase C+D: LLM cascade fallback (Qwen → OVH) New module consent-tester/services/cmp_llm_fallback.py: - LLMCookieExtractor: single-endpoint adapter (Ollama OR OpenAI-compat) - LLMCascade: tries Qwen (local Mac Mini Ollama) first; falls through to OVH (managed 120B) when Qwen returns no usable strategy - LLMCascade.from_env(): reads OLLAMA_URL/CMP_LLM_MODEL + OVH_LLM_URL/ OVH_LLM_KEY/OVH_LLM_MODEL from environment - LLM returns JSON {strategy: url|selector|text, value: ...} - Valkey-backed cache per netloc (cmp:hint:<netloc>, 7-day TTL), so the next run against the same domain skips the LLM entirely dsi_discovery.py: - Wired network_log collector (URL/status/content-type/size of every JSON response on the page), passed to the LLM prompt as observation - After Named CMP (Phase B) + Heuristic (Phase A) both fail AND DOM < 300 words: invoke LLMCascade.analyze(...) - _apply_llm_hint executes the LLM's strategy: refetch URL via Playwright request context, query DOM selector, or use text directly - Cache HIT path: apply cached hint, only fall back to LLM if cache is stale docker-compose.yml: - consent-tester gets env vars + cmp-data volume (for Phase E) - All LLM endpoints configurable via env, sensible defaults consent-tester/requirements.txt: - redis>=5.0 (asyncio client, Valkey-compatible) - httpx>=0.27	2026-05-16 23:06:05 +02:00
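The cascade-with-cache idea in the entry above can be sketched without any network code. Everything here is a simplified assumption: extractors are plain callables that return raw model output, the cache is a dict rather than Valkey, and TTL handling is left out. Only the `{strategy, value}` shape and the `cmp:hint:<netloc>` key come from the log.

```python
import json

VALID_STRATEGIES = {"url", "selector", "text"}


def parse_hint(raw):
    """Return {'strategy', 'value'} if raw is a usable JSON hint, else None."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    if data.get("strategy") in VALID_STRATEGIES and data.get("value"):
        return {"strategy": data["strategy"], "value": data["value"]}
    return None


def cascade(prompt, extractors, cache, netloc):
    """Try each extractor in order; cache and return the first usable hint."""
    key = f"cmp:hint:{netloc}"
    if key in cache:
        return cache[key]
    for ask in extractors:  # e.g. local model first, managed model second
        try:
            hint = parse_hint(ask(prompt))
        except Exception:
            hint = None  # an unreachable endpoint just falls through
        if hint:
            cache[key] = hint
            return hint
    return None
```

Treating "no usable strategy" and "endpoint failed" the same way keeps the fall-through rule to a single condition.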
Benjamin Admin	e9002175ac	feat(iace): manufacturer safety feature library (Stufe A — 50+ entries) Adds a curated database of safety-relevant features for the major manufacturers across mechanical/plant engineering, written entirely in own words with norm anchors. No verbatim manufacturer texts — therefore no copyright issue: - Markennennung (§ 23 MarkenG nominative use) is permitted. - Fakten ueber Produkt-Sicherheitsfunktionen are not protected by § 2 UrhG (only Werke, not facts). - NormReferences contain only the identifiers (e.g. "EN ISO 13849-1 PLd Kat.3"), never the norm text itself. Coverage (52 entries across 12 categories): Industrieroboter (10): FANUC DCS, KUKA SafeOperation, ABB SafeMove, Yaskawa FSU, Staeubli CS9, Kawasaki Cubic-S, Mitsubishi MELFA, Universal Robots PolyScope, Doosan PRS, Comau SafeNet CNC/WZM (8): DMG MORI, Mazak, TRUMPF, Okuma, Hermle, Heidenhain SPLC, GROB, Heller Pneumatik (4): Festo, SMC, AVENTICS, Parker Hydraulik (3): Bosch Rexroth, HAWE, HYDAC Safety-PLC / Sicherheitstechnik (8): PILZ, SICK, Schmersal, Euchner, Leuze, Phoenix Contact, Banner, Wieland Standard-PLC (5): Siemens, Beckhoff, Rockwell, Schneider, B&R Pressen (3): Schuler, Bruderer, AIDA Spritzguss (3): Arburg, KraussMaffei, ENGEL Verpackung (2): Krones, Bosch Packaging/Syntegon Laser/Schweissen (3): Bystronic, Amada, Fronius Foerdertechnik (2): Interroll, SEW EURODRIVE Engine integration: - LookupManufacturerFeaturesInText() scans the project narrative for any of the manufacturer aliases (case-insensitive, umlaut-tolerant). - Init-Handler appends matched feature clarifications to the relevant hazard's "Mit Anlagenbauer zu klaeren:" block — for the right HazardCategory only (e.g. FANUC DCS only on mechanical_hazard). - For a Bremse project narrative mentioning "Fanuc Robodrill", the engine now adds clarification questions like "Ist DCS am Roboter konfiguriert?" to relevant mechanical hazards automatically. 
Tests: 7 new pin tests — manufacturer count, norm prefixes, FANUC/KUKA detection in narrative, umlaut robustness (Staeubli vs Staubli). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:04:56 +02:00
Benjamin Admin	7e426c31f1	feat(consent-tester): Phase B — named CMP library + plugin architecture cmp_extractor.py refactored to thin coordinator (123 LOC, was 223). Discovers all CMP modules via cmp_library/_registry.py:load_all() at import time. Restart consent-tester to pick up new modules. New cmp_library/ folder: - _registry.py: auto-discovers all modules with MATCHER + reconstruct() - epaas.py: BMW Group ePaaS (extracted from cmp_extractor) - onetrust.py: cdn.cookielaw.org Groups/Cookies schema - cookiebot.py: consent.cookiebot.com Categories schema - usercentrics.py: api.usercentrics.eu services schema - didomi.py: sdk.privacy-center.org notice + vendors + purposes - trustarc.py: consent.trustarc.com categories + vendors Each module: - MATCHER: re.Pattern matching the CMP JSON endpoint URL - reconstruct(d: dict) -> str: builds German Markdown cookie-policy text Phase E (self-improving) will write auto_*.py files into the same folder; _registry already picks those up via pkgutil.iter_modules.	2026-05-16 22:59:48 +02:00
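The module contract named above (`MATCHER` plus `reconstruct(d: dict) -> str`) suggests a simple dispatch loop in the coordinator. The sketch below uses a hypothetical `demo_cmp` stand-in with an invented endpoint and payload shape; it does not reproduce any real vendor schema, and `reconstruct_from_response` is an illustrative name.

```python
import re
from types import SimpleNamespace

# Hypothetical stand-in for a cmp_library module; URL and payload shape are invented.
demo_cmp = SimpleNamespace(
    MATCHER=re.compile(r"https://cmp\.example\.com/.+/policy\.json$"),
    reconstruct=lambda d: "\n".join(
        f"## {c['name']}\n{c['description']}" for c in d.get("cookies", [])
    ),
)


def reconstruct_from_response(url, payload, modules):
    """Return policy text from the first module whose MATCHER matches url, else None."""
    for module in modules:
        if module.MATCHER.search(url):
            return module.reconstruct(payload)
    return None
```

With this shape, supporting a new CMP means adding one module; the coordinator itself does not change.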
Benjamin Admin	4f19310130	fix(iace): HP1654 gripper breaks through fence, DCS reference GT 1.8 specifically requires the 'sicher begrenzten Bewegungsbereich (Dual Check Safety)' (safely limited range of motion). HP1654 had only M061 'Feste trennende Schutzeinrichtung' as a mitigation. Added M494 (Safe Limited Position/Space with a DCS explanation), M501 (load rating of the safety fence) and M502 (gripper fail-safe). The clarification questions refer explicitly to DCS for FANUC, SafeMove for ABB, SafeOperation for KUKA, and the EN ISO 13849-1 PLd/Kat.3 validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:56:40 +02:00
Benjamin Admin	8283483909	feat(consent-tester): Phase A — generic JSON cookie-policy heuristic New module cmp_heuristic.py with: - looks_like_cookie_policy(data): shape-based classifier (top-level keys cookies/categories/providers/vendors/purposes/cookieList/etc. + at least 2 name+description objects, or IAB TCF v2 vendors[]+purposes[]) - reconstruct_generic(data): walks JSON, extracts name + description fields + standalone prologue/dataController/persistence fields, emits flat German Markdown text (max 5000 words, dedup) cmp_extractor.py wired so that AFTER named CMP matchers (epaas, onetrust) fail, every JSON response on the page is tested for the heuristic. If matched, payload is captured as '_heuristic' kind and reconstructed via the generic walker. This is Phase A of the 4-stage cascade (B-D follow). Unknown CMPs that return JSON now work without hand-coding each one. Pre-filter: skips response paths /api/config, /beacon, /track, /analytics, /fonts/, /log/, /heartbeat/, /.well-known/ to avoid spamming the heuristic on every Playwright load.	2026-05-16 22:56:20 +02:00
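A shape-based classifier along the lines described above might look like this. It is a reduced sketch under stated assumptions: the key set is abbreviated, the IAB TCF v2 branch is omitted, and the real `looks_like_cookie_policy` in `cmp_heuristic.py` may weigh things differently.

```python
# Assumption: abbreviated key list; the commit mentions more ("etc.").
POLICY_KEYS = {"cookies", "categories", "providers", "vendors", "purposes", "cookieList"}


def _named_objects(node):
    """Yield every dict in the JSON tree that has string name and description fields."""
    if isinstance(node, dict):
        if isinstance(node.get("name"), str) and isinstance(node.get("description"), str):
            yield node
        for value in node.values():
            yield from _named_objects(value)
    elif isinstance(node, list):
        for item in node:
            yield from _named_objects(item)


def looks_like_cookie_policy(data):
    """True if data has a policy-like top-level key and at least two name+description objects."""
    if not isinstance(data, dict):
        return False
    if not POLICY_KEYS & set(data):
        return False
    count = 0
    for _ in _named_objects(data):
        count += 1
        if count >= 2:
            return True
    return False
```

Requiring both a policy-like key and two described objects is what keeps config or tracking payloads from passing on key names alone.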
Benjamin Admin	9814b56f2f	fix(cookie-extract): max_documents=1 + faster networkidle bail (Phase 0 fix) Root cause of the recurring 603-word BMW result: - DSI discovery for cookie-policy URL was hitting 4x networkidle timeouts (60s each = ~240s total). - Backend httpx timeout (180s after the previous fix) gave up before the consent-tester finished, falling through to the raw HTTP fetch which returned BMW's SSR navigation chrome (603 words) as the 'cookie policy'. Two orthogonal fixes: 1. _fetch_text now passes max_documents=1 for user-specified URLs. We only want self-extraction of THAT page; link-following is unnecessary noise. 2. networkidle wait_until window dropped 60s -> 15s. SPAs like BMW/Daimler never reach networkidle anyway; the 60s wait was pure latency. Falls through to domcontentloaded+5s render-wait, same as before.	2026-05-16 22:53:23 +02:00
Benjamin Admin	69729ef6ac	feat(iace): norm references in mitigations + aggregated norm panel per hazard Library measures carry NormReferences (EN/IEC/ISO/DIN/TRBS/TRGS Ziff./Kap./ Pos.) but they were dropped on persist: CreateMitigationRequest only wrote Name + Description. The Fachmann benchmark file lists Normen for 34 of 60 hazards — the engine had this data already but lost it on the way to the UI. Fix without DB schema change: - Mitigation.Description gets a "Normen: EN 60204-1 Ziff. 6.2 \| EN 61140" line appended when the measure has NormReferences. Pipe separator keeps the inline panel short and grep-friendly. - After all mitigations land, the aggregated dedup'd norm list for the hazard is appended to Hazard.Description as a single "Referenzierte Normen: ..." line so the UI can show one panel per hazard without scanning every mitigation. Audit of library coverage (per-pattern) showed GT-Bremse Normen are generally present and richer: - HP1640 covers GT 2.2 (EN 60204-1 Ziff. 6.2, Ziff. 8.2.3, EN 61140 +) - HP1641 covers GT 2.4 (EN 60204-1 Ziff. 8.2.6 +) - HP1605 covers GT 1.7 (ISO 10218-1 Ziff. 5.6.2, 5.8.3 — Ziff. 5.7.3 fehlt) - HP1671 covers GT 1.30 (EN 12417 — Pos. detail fehlt) Followup: 2 fine-grained sub-paragraph references (5.7.3, Pos. 1.1.4) can be added later as measure-text updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:51:50 +02:00
Benjamin Admin	35d6422247	fix(iace): HP1632 Bersten pattern, unique zone for dedup ZoneDE 'Pneumatikkomponenten der Anlage' collides, after normalizeZoneKey, with HP1630 'Pneumatikschlaeuche der Automation' in the three-significant-word comparison. The new zone 'Berstgefaehrdete Druckwandungen Pneumatik (Leitungswand, Dichtung, Verschraubung)' has semantically distinct keywords, so dedup no longer merges it into HP1630. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:34:51 +02:00
Benjamin Admin	5ea68ebea4	feat(iace): clarification questions + HP1632 Bersten + HP1637 KSS-Aerosol fix Three lasting improvements, driven by the Bremse benchmark cases GT 1.4, GT 1.30 and GT 7.4. The engine still does not invent expert comments: comments are left out because they require an understanding of the specific plant that the engine does not have. Instead, the engine provides norm-based clarification questions and a more precise pattern vocabulary. A) HazardPattern.ClarificationQuestionsDE, a new optional field: - A pattern stores verifiable questions that the operator clarifies with the plant builder. Examples: - HP1640: "Liegt ein Pruefprotokoll nach EN 60204-1 vor?" - HP1666: "Ist die WZM als CE-konformes Subsystem integriert?" - HP1604: "Ist DCS am Roboter konfiguriert und validiert?" - The init handler appends the questions to Hazard.Description under the marker "Mit Anlagenbauer zu klaeren:". No DB schema change is needed. - 11 patterns now carry clarification questions (HP1602, HP1604, HP1611, HP1612, HP1620, HP1622, HP1637, HP1640, HP1641, HP1666, HP1685). B) HP1632 "Bersten druckbeaufschlagter Pneumatik-Komponente", a new pattern, semantically DISTINCT from HP1630 "Abspringen": - Bersten (bursting) = material/pressure failure of the component, medium escapes - Abspringen (coming loose) = connection detaches, whip effect Bremse benchmark GT 1.4 spoke of Bersten, HP1630 only of Abspringen, so a 66% frontend match was a dead end. With HP1632 the engine fires a separate hazard that scores a clean exact match on GT 1.4. C) HP1637 "Einatmen von KSS-Aerosolen", measures completed: Previously only M141 (Sicherheitszeichen), now additionally M405 (KSS-Aerosolabsaugung), M418 (AGW-Ueberwachung), M526 (WZM-Tueren geschlossen waehrend Bearbeitung), M408 (Hautschutzplan). Clarification question: "Wurde die Aerosolkonzentration nach Bearbeitungsende messtechnisch ermittelt und mit dem AGW verglichen?" 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:23:56 +02:00
Benjamin Admin	41023f6343	fix(iace): HP1671 compressed-air injury, 4 additional GT-1.30 measures HP1671 "Druckluft-Verletzung in Bearbeitungszelle" did match the GT-1.30 scenario "Einstich, Augenverletzung in Bearbeitungszelle" exactly by name and scenario, but carried only a single measure, M061 "Feste trennende Schutzeinrichtung". The expert's three specific measures (cleaning nozzle integrated into the cell / compressed air off when the door opens / load rating of the enclosure) stayed invisible: my new GT-Bremse pattern HP1712 knows these measures but does not fire in this project because of RequiredEnergyTags=["pneumatic"]. Fix: HP1671 SuggestedMeasureIDs ["M061"] -> ["M504", "M505", "M501", "M061", "M141"]. EN 12417 Kap. 5.2 / Pos. 1.1.4 is now covered by M504/M505. HP1712 stays as a backup pattern for projects with an explicit pneumatic tag. Followup: HP1671 and HP1712 are semantically redundant; consolidating them is part of the next pattern-hygiene iteration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:08:05 +02:00
Benjamin Admin	6689b37f95	fix(agent): bump _fetch_text timeout 60s->180s The dsi-discovery in consent-tester does self-extraction + follows up to 3 sub-links + waits for CMP JSON payloads. On big SPAs (BMW, Daimler) this routinely exceeds 60s. When it timed out, the HTTP fallback returned the SSR shell as text — for the BMW cookie page that's 603 words of site navigation, which then registered as 'Cookie-Richtlinie nicht im eingereichten Text' (33%). With 180s the consent-tester finishes cleanly and we get the CMP-captured 1824 words of real policy.	2026-05-16 22:00:42 +02:00
Benjamin Admin	80d62a0c5f	fix(iace): rename 58 duplicate HP-IDs in extended.go/extended2.go Background: hazard_patterns_extended.go (HP045-074) and _extended2.go (HP074-102) shared their entire ID range with the semantically-different patterns in hazard_patterns_cobot.go, hazard_patterns_press.go, hazard_patterns_operational.go and hazard_patterns_extended_dguv.go. The collision had lived unnoticed because TestGetBuiltinHazardPatterns_UniqueIDs only checks the 44 builtin patterns (HP001-HP044). Examples of the collision: - HP059 = "Kollision Mensch-Roboter" (cobot.go) vs "Kupplung — mechanisch" (extended.go) - HP060 = "Quetschen durch Werkzeug am Cobot" (cobot.go) vs "Diagnosemodul — Software" (extended.go) - HP073 = "Wartung ohne LOTO" (operational.go) vs "Hydraulikventil — hydraulisch" (extended.go) At runtime collectAllPatterns() returned both patterns under the same ID which made downstream lookups (e.g. hazardPatternMeasures map keyed by pattern_id) non-deterministic: last-loaded wins, dropping the other pattern's mitigation set silently. Rename strategy (no deletes: both patterns are real and earn their SuggestedMeasureIDs after the category-filter work): extended.go HP045..HP073 -> HP1800..HP1828 (29 IDs) extended2.go HP074..HP102 -> HP1830..HP1858 (29 IDs) cobot/press/operational/extended_dguv keep their original IDs because: - compliance_triggers.go references HP059/HP060 with the cobot meaning - pattern_engine_test.go references HP073 with the LOTO/maintenance meaning - phase3_4_test.go references HP073 the same way New regression test: - TestAllPatterns_UniqueIDs runs over collectAllPatterns() and fails if ANY pattern in the runtime set duplicates an ID. The old TestGetBuiltinHazardPatterns_UniqueIDs stays for the builtin subset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:00:06 +02:00
Benjamin Admin	6a3e96d54c	fix(iace): set-based measure-category filter + 235 pattern-author fixes Two-part sustainable fix replacing the previous "fill to 5 mitigations no matter what" behavior that the GT-Bremse benchmark proved unfaithful (e.g. HP1625 "scharfe Kanten" returning M005 "Rotationsbewegung vermeiden" via category fallback; HP1651 "Wiederanlauf Roboter" returning M054 "Sichere thermische Auslegung" via mismatched pattern reference). PART A: Set-based category filter (handlers package): - acceptableMeasureCategories: replaces 1:1 patternCatToMeasureCat with a curated set per pattern category, so e.g. safety_function_failure now accepts software_control measures (watchdogs, plausibility checks) and emc_hazard accepts both electrical and software_control measures - isCategoryCompatible: gate every measure id against the accepted set before creating a mitigation; mismatches log MEASURE-SKIP - The old category fallback is REMOVED. A hazard whose pattern has no category-compatible measure is now created with zero mitigations and logged as COVERAGE-GAP, so the operator must consult an expert. No more silent invention of generic defaults. 
PART B: 235 pattern author-error fixes across 26 files: - HP040-HP044 (AI): M101/M102/M103 (Auffangwanne/Absauganlage) -> M133 Anomalieerkennung + M214 Plausibilitaet + M213 Sensor-Redundanz + M044 Zweikanalige Steuerung + others - HP011-HP015, HP104-HP109, HP1085-HP1095, HP1281-HP1334 (electrical): M001-M005/M054/M061 placeholders -> M481/M482 Isolation + M511-M522 PE/Schutzleiter/RCD/Hauptschalter - HP110-HP1331 (material_environmental): M101-M103 -> M384-M395 Brandschutz/Laserschutz + M533/M408 SDB/PSA - HP800-HP858, HP1178-HP1264 (software/sensor/hmi): M101/M104 -> M105/M106/M107/M214 SPS/Watchdog/Plausibilitaet - HP026, HP611-HP1690 (ergonomic): M001/M082 -> M353-M360 + M530-M532 Hebehilfe/ergonomische Hoehe - HP201-HP1697 (mechanical): M054/M051 -> M002/M008/M061/M141 + M487/M488 Tueroeffnung-Stillsetzung/Wiederanlauf - Plus EMF/radiation/fire/noise/vibration/communication/cyber Coverage shift (pattern-author errors with the set filter enabled): start: 237 patterns with zero category-compatible measures after Stufe 1A: 5 (AI) after Stufe 1B: 20 (existing mechanical) after Stufe 1C: 35 (existing electrical) after Stufe 1D: 29 (material_environmental) after Stufe 1E: 29 (software/sensor/hmi) after Stufe 1F: 20 (ergonomic) after Stufe 1G: 80 (thermal/comm/radiation/fire/safety) final: 0 (28 extended.go/extended2.go duplicates fixed) New regression tests: - TestEveryPattern_HasCategoryCompatibleMeasure: every pattern in collectAllPatterns() must reference at least one category-compatible measure; gaps must be explicitly listed in AllowlistKnownGaps (currently empty). Fails CI for any new pattern that drifts. - TestAcceptableMeasureCategories: pins the set-mapping for the 7 most-bug-prone pattern categories. - TestIsCategoryCompatible_EmptyMeasureCat: protects legacy entries. 
A separate task #11 tracks 58 HP-ID duplicates between extended.go/extended2.go and cobot.go/press.go/operational.go: the patterns are semantically different and TestGetBuiltinHazardPatterns_UniqueIDs misses them because it only checks HP001-HP044. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:11:02 +02:00
Benjamin Admin	938f9a6c51	fix(cmp): tolerate variable URL segments in ePaaS policy pattern BMW ePaaS URLs use 3 segments between /policypage/ and .epaas.json: /epaas/prod/policypage/<tenant>/<config-hash>/<locale>.epaas.json The old pattern only matched 2 segments. Switch to a tolerant pattern that matches any path before .epaas.json (anchored at .epaas.json end).	2026-05-16 20:58:48 +02:00
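The difference between a fixed-segment and a tolerant ePaaS pattern, as described in the entry above, can be illustrated like this. Both expressions are assumptions written for the example (the log does not quote the real ones), and the host in the usage note is a placeholder.

```python
import re

# Assumed "old" shape: exactly two path segments after /policypage/.
STRICT = re.compile(r"/policypage/[^/]+/[^/]+\.epaas\.json$")

# Tolerant shape: any path after /policypage/, anchored at the .epaas.json end.
TOLERANT = re.compile(r"/policypage/.+\.epaas\.json$")


def is_epaas_policy_url(url, pattern=TOLERANT):
    """True if url looks like an ePaaS policy payload under the given pattern."""
    return pattern.search(url) is not None
```

For a three-segment URL such as `https://cmp.example.com/epaas/prod/policypage/tenant/abc123/de_DE.epaas.json`, the strict pattern fails while the tolerant one matches. Because of the end anchor, neither pattern in this sketch accepts a trailing query string.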


1102 Commits