feat(iace): benchmark risk comparison (traffic lights) + misuse pattern + 1:n matcher

#1 Risk-number comparison in the benchmark: ComputeRiskComparison derives the tool's S/F/W/P + Fine-Kinney values per matched hazard and compares them to the GT values. The result is exposed on the benchmark response and rendered in a new RiskComparison table with GREEN/YELLOW/RED traffic lights on the risk number R (like the Excel), plus per-axis within-1 agreement cards.

#2 Generic misuse pattern HP2103 "Personenbefoerderung auf Hebezeug" (carrying persons on lifting equipment): gated to lift-family machine types, fires for ANY lifting device (not machine-specific).

#3 Benchmark matcher is now 1:n: one broad engine hazard may cover several fine-grained GT sub-scenarios (foot/hand/leg crush), so coverage reflects real risk coverage rather than 1:1 wording matches.

Validated on BOTH ground truths (robot cell + lift): leakage 0, ghosts 0, coverage held.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 17:24:52 +02:00
parent ef746ea8f0
commit 2677bca9ca
8 changed files with 284 additions and 1 deletions
@@ -74,8 +74,12 @@ func CompareBenchmark(gt *GroundTruth, hazards []Hazard, mitigations []Mitigatio
 	usedEng := make(map[int]bool)
 	var matched []HazardMatchPair

+	// 1:n matching: a single broad engine hazard may legitimately cover several
+	// fine-grained GT sub-scenarios (e.g. one "crush under descending load"
+	// pattern covers the GT's separate foot / hand / leg crush rows). We only
+	// block a GT entry from matching twice; an engine hazard may match several.
 	for _, p := range pairs {
-		if usedGT[p.gtIdx] || usedEng[p.engIdx] {
+		if usedGT[p.gtIdx] {
 			continue
 		}
 		usedGT[p.gtIdx] = true
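
The effect of dropping the usedEng check in the hunk above can be illustrated with a small standalone sketch. It is not the code from this commit: the candidate type, its score field, the matchOneToMany helper, and the descending sort by score are all assumptions made for illustration, since the real pair type and the ordering of pairs are not shown in the diff.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical scored pairing between one ground-truth (GT)
// entry and one engine hazard. The real pair type in the commit is not shown.
type candidate struct {
	gtIdx  int
	engIdx int
	score  float64
}

// matchOneToMany greedily assigns each GT entry to its best-scoring engine
// hazard. A GT entry is consumed once; an engine hazard may be reused, so one
// broad hazard can cover several GT rows. Sorting by score is an assumption.
func matchOneToMany(pairs []candidate) map[int]int {
	sorted := append([]candidate(nil), pairs...) // copy, leave input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].score > sorted[j].score
	})

	usedGT := make(map[int]bool)
	gtToEng := make(map[int]int)
	for _, p := range sorted {
		if usedGT[p.gtIdx] {
			continue // only the GT side is blocked from matching twice
		}
		usedGT[p.gtIdx] = true
		gtToEng[p.gtIdx] = p.engIdx
	}
	return gtToEng
}

func main() {
	pairs := []candidate{
		{gtIdx: 0, engIdx: 0, score: 0.9}, // foot crush vs. broad crush hazard
		{gtIdx: 1, engIdx: 0, score: 0.8}, // hand crush vs. the same hazard
		{gtIdx: 2, engIdx: 0, score: 0.7}, // leg crush vs. the same hazard
		{gtIdx: 2, engIdx: 1, score: 0.4}, // weaker alternative for leg crush
	}
	m := matchOneToMany(pairs)
	fmt.Println(len(m), m[0], m[1], m[2]) // prints "3 0 0 0"
}
```

With a strict 1:1 rule on the same input, engine hazard 0 would be consumed by GT row 0, leaving GT row 1 unmatched and GT row 2 paired with the weaker hazard 1. That is the kind of coverage loss the commit message describes for wording-level 1:1 matching.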