Report the tool's risk number as a plausible range with a confidence
label instead of a false-precision point value (confidence-aware
tonality — the assessment is confirmed by the DSB / safety expert).
- risk_estimation.go: EstimateConfidence (hoch/mittel/niedrig from how the
contact mode resolved), EstimateRiskRange (S±1 and aggregate L=F+W+P ±1,
the empirically validated per-parameter accuracy), RiskLevelRange; share
the riskBandLabel thresholds with EstimateRiskLevel.
- risk_benchmark.go: RiskComparisonPair gains eng_risk_point/low/high +
level + level_range + confidence; RiskAgreement gains high_confidence_pct.
- RiskComparison.tsx: per-hazard range "low–high (level range)" + point,
confidence chip, and an aggregate confidence line; types in useBenchmark.ts.
- Unit tests for the range/confidence helpers.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
#1 Risk-number comparison in the benchmark: ComputeRiskComparison derives the
tool's S/F/W/P + Fine-Kinney per matched hazard and compares to the GT values;
exposed on the benchmark response and rendered in a new RiskComparison table
with GREEN/YELLOW/RED traffic lights on the risk number R (like the Excel),
plus per-axis within-1 agreement cards.
#2 Generic misuse pattern HP2103 "Personenbefoerderung auf Hebezeug" — gated to
lift-family machine types, fires for ANY lifting device (not machine-specific).
#3 Benchmark matcher is now 1:n — one broad engine hazard may cover several
fine-grained GT sub-scenarios (foot/hand/leg crush), so coverage reflects real
risk coverage rather than 1:1 wording matches.
Validated on BOTH ground truths (robot cell + lift): leakage 0, ghosts 0,
coverage held.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>