Commit Graph

2 Commits

Author SHA1 Message Date
Benjamin Admin 2677bca9ca feat(iace): benchmark risk comparison (traffic lights) + misuse pattern + 1:n matcher
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Failing after 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m23s
CI / test-go (push) Failing after 37s
CI / iace-gt-coverage (push) Successful in 24s
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
#1 Risk-number comparison in the benchmark: ComputeRiskComparison derives the
tool's S/F/W/P + Fine-Kinney per matched hazard and compares to the GT values;
exposed on the benchmark response and rendered in a new RiskComparison table
with GREEN/YELLOW/RED traffic lights on the risk number R (like the Excel),
plus per-axis within-1 agreement cards.

#2 Generic misuse pattern HP2103 "Personenbefoerderung auf Hebezeug" — gated to
lift-family machine types, fires for ANY lifting device (not machine-specific).

#3 Benchmark matcher is now 1:n — one broad engine hazard may cover several
fine-grained GT sub-scenarios (foot/hand/leg crush), so coverage reflects real
risk coverage rather than 1:1 wording matches.

Validated on BOTH ground truths (robot cell + lift): leakage 0, ghosts 0,
coverage held.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 17:24:52 +02:00
Benjamin Admin 8bb90d73e5 feat(iace): benchmark system + erklaerteil + dedup-fix
Build + Deploy / build-admin-compliance (push) Successful in 2m7s
Build + Deploy / build-backend-compliance (push) Successful in 3m34s
Build + Deploy / build-ai-sdk (push) Successful in 1m6s
Build + Deploy / build-developer-portal (push) Successful in 1m7s
Build + Deploy / build-tts (push) Successful in 1m58s
Build + Deploy / build-document-crawler (push) Successful in 57s
Build + Deploy / build-dsms-gateway (push) Successful in 34s
Build + Deploy / build-dsms-node (push) Successful in 29s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m28s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 42s
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Successful in 27s
CI / test-python-dsms-gateway (push) Successful in 22s
CI / validate-canonical-controls (push) Successful in 15s
Build + Deploy / trigger-orca (push) Successful in 3m10s
- Erklaerteil-Template fuer Risikobeurteilungen (risk_assessment_template.go)
  in PDF-Export, Markdown-Export und Frontend ReportPrintView eingebaut
- Ground Truth Benchmark-System: Datenmodell, Fuzzy-Matching-Engine,
  3 API Endpoints (import-gt, benchmark, benchmark/summary)
- Frontend Benchmark-Tab mit Score-Cards, Kategorie-Breakdown,
  Hazard-Vergleichstabelle (Zugeordnet/Fehlend/Extra), Business Impact
- Erster Benchmark: 13.3% Coverage (Baseline) gegen 60 GT-Eintraege
- Dedup-Fix: seenCat[cat] -> seenCatZone[cat+zone] erlaubt mehrere
  Gefaehrdungen pro Kategorie an verschiedenen Gefahrenstellen
- Komponenten-spezifische Hazard-Namen und Zone-basierte Zuordnung

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 01:02:33 +02:00