breakpilot-compliance

Benjamin_Boenisch/breakpilot-compliance

Fork 0

Commit Graph

Author	SHA1	Message	Date
Benjamin Admin	a7dc12f30f	feat(iace): risk as confidence range + label in benchmark tab Report the tool's risk number as a plausible range with a confidence label instead of a false-precision point value (confidence-aware tonality — the assessment is confirmed by the DSB / safety expert). - risk_estimation.go: EstimateConfidence (hoch/mittel/niedrig from how the contact mode resolved), EstimateRiskRange (S±1 and aggregate L=F+W+P ±1, the empirically validated per-parameter accuracy), RiskLevelRange; share the riskBandLabel thresholds with EstimateRiskLevel. - risk_benchmark.go: RiskComparisonPair gains eng_risk_point/low/high + level + level_range + confidence; RiskAgreement gains high_confidence_pct. - RiskComparison.tsx: per-hazard range "low–high (level range)" + point, confidence chip, and an aggregate confidence line; types in useBenchmark.ts. - Unit tests for the range/confidence helpers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-10 23:04:56 +02:00
Benjamin Admin	2677bca9ca	feat(iace): benchmark risk comparison (traffic lights) + misuse pattern + 1:n matcher CI / detect-changes (push) Successful in 7s Details CI / branch-name (push) Has been skipped Details CI / guardrail-integrity (push) Has been skipped Details CI / secret-scan (push) Has been skipped Details CI / dep-audit (push) Has been skipped Details CI / sbom-scan (push) Has been skipped Details CI / build-sha-integrity (push) Failing after 4s Details CI / validate-canonical-controls (push) Successful in 11s Details CI / loc-budget (push) Failing after 14s Details CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / nodejs-build (push) Successful in 2m23s Details CI / test-go (push) Failing after 37s Details CI / iace-gt-coverage (push) Successful in 24s Details CI / test-python-backend (push) Has been skipped Details CI / test-python-document-crawler (push) Has been skipped Details CI / test-python-dsms-gateway (push) Has been skipped Details #1 Risk-number comparison in the benchmark: ComputeRiskComparison derives the tool's S/F/W/P + Fine-Kinney per matched hazard and compares to the GT values; exposed on the benchmark response and rendered in a new RiskComparison table with GREEN/YELLOW/RED traffic lights on the risk number R (like the Excel), plus per-axis within-1 agreement cards. #2 Generic misuse pattern HP2103 "Personenbefoerderung auf Hebezeug" — gated to lift-family machine types, fires for ANY lifting device (not machine-specific). #3 Benchmark matcher is now 1:n — one broad engine hazard may cover several fine-grained GT sub-scenarios (foot/hand/leg crush), so coverage reflects real risk coverage rather than 1:1 wording matches. Validated on BOTH ground truths (robot cell + lift): leakage 0, ghosts 0, coverage held. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-09 17:24:52 +02:00

Author

SHA1

Message

Date

Benjamin Admin

a7dc12f30f

feat(iace): risk as confidence range + label in benchmark tab

Report the tool's risk number as a plausible range with a confidence
label instead of a false-precision point value (confidence-aware
tonality — the assessment is confirmed by the DSB / safety expert).

- risk_estimation.go: EstimateConfidence (hoch/mittel/niedrig from how the
  contact mode resolved), EstimateRiskRange (S±1 and aggregate L=F+W+P ±1,
  the empirically validated per-parameter accuracy), RiskLevelRange; share
  the riskBandLabel thresholds with EstimateRiskLevel.
- risk_benchmark.go: RiskComparisonPair gains eng_risk_point/low/high +
  level + level_range + confidence; RiskAgreement gains high_confidence_pct.
- RiskComparison.tsx: per-hazard range "low–high (level range)" + point,
  confidence chip, and an aggregate confidence line; types in useBenchmark.ts.
- Unit tests for the range/confidence helpers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-10 23:04:56 +02:00

Benjamin Admin

2677bca9ca

feat(iace): benchmark risk comparison (traffic lights) + misuse pattern + 1:n matcher

CI / detect-changes (push) Successful in 7s

Details

CI / branch-name (push) Has been skipped

Details

CI / guardrail-integrity (push) Has been skipped

Details

CI / secret-scan (push) Has been skipped

Details

CI / dep-audit (push) Has been skipped

Details

CI / sbom-scan (push) Has been skipped

Details

CI / build-sha-integrity (push) Failing after 4s

Details

CI / validate-canonical-controls (push) Successful in 11s

Details

CI / loc-budget (push) Failing after 14s

Details

CI / go-lint (push) Has been skipped

Details

CI / python-lint (push) Has been skipped

Details

CI / nodejs-lint (push) Has been skipped

Details

CI / nodejs-build (push) Successful in 2m23s

Details

CI / test-go (push) Failing after 37s

Details

CI / iace-gt-coverage (push) Successful in 24s

Details

CI / test-python-backend (push) Has been skipped

Details

CI / test-python-document-crawler (push) Has been skipped

Details

CI / test-python-dsms-gateway (push) Has been skipped

Details

#1 Risk-number comparison in the benchmark: ComputeRiskComparison derives the
tool's S/F/W/P + Fine-Kinney per matched hazard and compares to the GT values;
exposed on the benchmark response and rendered in a new RiskComparison table
with GREEN/YELLOW/RED traffic lights on the risk number R (like the Excel),
plus per-axis within-1 agreement cards.

#2 Generic misuse pattern HP2103 "Personenbefoerderung auf Hebezeug" — gated to
lift-family machine types, fires for ANY lifting device (not machine-specific).

#3 Benchmark matcher is now 1:n — one broad engine hazard may cover several
fine-grained GT sub-scenarios (foot/hand/leg crush), so coverage reflects real
risk coverage rather than 1:1 wording matches.

Validated on BOTH ground truths (robot cell + lift): leakage 0, ghosts 0,
coverage held.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-09 17:24:52 +02:00

2 Commits