#1 Risk-number comparison in the benchmark: ComputeRiskComparison derives the
tool's S/F/W/P + Fine-Kinney per matched hazard and compares to the GT values;
exposed on the benchmark response and rendered in a new RiskComparison table
with GREEN/YELLOW/RED traffic lights on the risk number R (like the Excel),
plus per-axis within-1 agreement cards.
#2 Generic misuse pattern HP2103 "Personenbefoerderung auf Hebezeug" — gated to
lift-family machine types, fires for ANY lifting device (not machine-specific).
#3 Benchmark matcher is now 1:n — one broad engine hazard may cover several
fine-grained GT sub-scenarios (foot/hand/leg crush), so coverage reflects real
risk coverage rather than 1:1 wording matches.
Validated on BOTH ground truths (robot cell + lift): leakage 0, ghosts 0,
coverage held.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>