breakpilot-compliance/ai-compliance-sdk/internal/iace/DATA_SOURCES.md
Benjamin Admin 02a31b711c
fix(iace): remove EN ISO 13849-1 risk-graph reproduction; own risk model
IP/copyright fix: ComputePLr reproduced the EN ISO 13849-1 Annex A risk-graph
decision table (S/F/P -> PLr a..e) and SeverityToS/ExposureToF its parameter
binning, emitted into every hazard description. Removed — we may not reproduce
DIN/Beuth norm logic.

Replaced with BreakPilot's OWN risk model:
- risk_estimation.go: probability (W) + avoidance (P) estimated from public,
  permissively-licensed accident statistics (Eurostat ESAW, CC BY 4.0) by
  contact mode, calibrated to our ground-truth corpus; own risk index + bands.
- iace_handler_init.go now emits "Risikoeinschaetzung (BreakPilot-Modell):
  S F W P -> Risiko: <level>" instead of the norm PLr string.
- DATA_SOURCES.md: data provenance + license register (ESAW CC BY 4.0; BLS/OSHA
  public domain; HSE OGL; DGUV + DIN/Beuth explicitly excluded).
- gt_risk_benchmark_test.go: first GT validation of risk numbers. Agreement with
  the professional across both ground truths: W within +-1 in 99% of cases, P in 93%.

Removed risk_graph_test.go (pinned the reproduced norm table).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 13:10:53 +02:00

Risk-estimation data sources & licenses

Provenance for the probability (W) / avoidance (P) tiers in risk_estimation.go (contactModeTable). We do not vendor any raw dataset — only the small aggregate facts used as anchors plus our own calibrated tiers live in code.

What we use and how

The tiers are derived in two steps:

  1. Anchor — the relative ordering of injury contact modes from public, permissively-licensed occupational-accident statistics (which mechanisms are more vs. less frequent).
  2. Calibrate — adjust the tier values to our own ground-truth corpus (the professional's W/P per mode). Well-sampled modes are set to the GT mean; sparse modes use conservative defaults (no overfitting to a 2-GT sample).

The numbers in code are therefore ours, not a copy of any dataset, and they do not reproduce any standard's risk-graph table, decision tree or matrix.

Primary source — Eurostat ESAW

  • Dataset: European Statistics on Accidents at Work (ESAW), contact mode of injury.
  • License: CC BY 4.0 — commercial and non-commercial reuse permitted, source acknowledgement required.
  • Attribution string: "Source: Eurostat (ESAW), CC BY 4.0". Surface this in any generated risk-assessment export that shows engine risk numbers.
  • URL: https://ec.europa.eu/eurostat/statistics-explained/index.php/Accidents_at_work_-_statistics_on_causes_and_circumstances
  • Aggregate facts used (anchor only): contact-mode shares of accidents at work, e.g. impact with stationary object ~24%, struck by moving object ~13% (non-fatal) / ~24% (fatal), trapped/crushed ~14% (fatal), contact with sharp agent ~15%. Retrieved 2026-06.

Acceptable supplements

  • US BLS / OSHA (Bureau of Labor Statistics, occupational injuries) — U.S. Government work, public domain; free for any use.
  • UK HSE (RIDDOR / kinds-of-accident) — Open Government Licence v3; commercial reuse with attribution.

Explicitly excluded

  • DGUV statistics — terms grant only editorial use and forbid modification / re-licensing; unsuitable for a commercial product. Not used.
  • DIN / Beuth / ISO / IEC standards (e.g. risk-graph tables, parameter decision trees, SIL/PL matrices) — copyrighted; not reproduced or re-implemented. Our model uses only the universal, non-protectable risk dimensions (severity, frequency, probability, avoidance).

Maintenance

When a tier in contactModeTable changes, record the source figure and the GT calibration basis here. Add this file to the repository SBOM / license register alongside software dependencies.