Executes the accident-statistics pipeline for the risk anchors:
- Refresh contactModeEvidence with real Eurostat ESAW figures (dataset hsw_ph3_08, reference year 2023): impact 24.0%/21.4%, struck-by 13.0%/23.8%, sharp 14.5%, trapped/crushed 13.8% (fatal), + new physical/mental-stress mode 24.7% → ergonomic. GT-calibrated tier VALUES unchanged; the real data confirms the ordering.
- Add the versioned source document (datasources/esaw_accident_stats_2023.md, ESAW CC BY 4.0 + OSHA public-domain context) that is ingested into the core RAG collection bp_iace_accident_stats for searchable evidence.
- Whitelist bp_iace_accident_stats in the RAG search handler so seeding can full-text search the statistics with citation at seed time.
Two-layer design: the small license-tagged code table stays the deterministic tier/citation lookup; the RAG holds the searchable source evidence.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Risk-estimation data sources & licenses
Provenance for the probability (W) / avoidance (P) tiers in risk_estimation.go
(contactModeTable). We do not vendor any raw dataset — only the small
aggregate facts used as anchors plus our own calibrated tiers live in code.
What we use and how
The tiers are derived in two steps:
- Anchor: take the relative ordering of injury contact modes from public, permissively-licensed occupational-accident statistics (which mechanisms are more vs. less frequent).
- Calibrate: adjust the tier values to our own ground-truth (GT) corpus (the professional's W/P per mode). Well-sampled modes are set to the GT mean; sparse modes use conservative defaults, to avoid overfitting to a sample of only two GTs.
The numbers in code are therefore ours, not a copy of any dataset, and they do not reproduce any standard's risk-graph table, decision tree or matrix.
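The calibration step can be sketched roughly as follows. This is a minimal illustration, not the code in risk_estimation.go: the function name calibrateTier, the minSamples threshold and all ratings are assumptions made up for the example, since the actual cut-off between "well-sampled" and "sparse" is not stated here.

```go
package main

import "fmt"

// minSamples is a hypothetical threshold separating "well-sampled" from
// "sparse" modes; the real cut-off is not given in this document.
const minSamples = 5

// calibrateTier returns the tier value for one contact mode.
// gtValues are the professional's W (or P) ratings from the ground-truth
// corpus; fallback is the conservative default used for sparse modes.
func calibrateTier(gtValues []float64, fallback float64) float64 {
	if len(gtValues) < minSamples {
		// Too few ground-truth ratings: do not fit to them.
		return fallback
	}
	sum := 0.0
	for _, v := range gtValues {
		sum += v
	}
	return sum / float64(len(gtValues))
}

func main() {
	// Illustrative ratings only, not real corpus data.
	wellSampled := []float64{3, 4, 3, 3, 2}
	sparse := []float64{4, 4}

	fmt.Println(calibrateTier(wellSampled, 2)) // GT mean of the five ratings
	fmt.Println(calibrateTier(sparse, 2))      // conservative default
}
```

The sparse mode ignores its two ratings entirely, which is the "no overfitting" behaviour described above; whether the real code rounds the mean to a discrete tier is left open.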
Primary source — Eurostat ESAW
- Dataset: European Statistics on Accidents at Work (ESAW), contact mode of injury.
- License: CC BY 4.0; commercial and non-commercial reuse permitted, source acknowledgement required.
- Attribution string: "Source: Eurostat (ESAW), CC BY 4.0". Surface this in any generated risk-assessment export that shows engine risk numbers.
- Aggregate facts used (anchor only): contact-mode shares of accidents at work. Dataset hsw_ph3_08, reference year 2023 (Figure 7, "contact - mode of injury"), EU shares:
  - Physical/mental stress: 24.7% (non-fatal)
  - Impact with stationary object (victim in motion): 24.0% (non-fatal) / 21.4% (fatal)
  - Contact with sharp/pointed/rough agent: 14.5% (non-fatal)
  - Struck by object in motion / collision: 13.0% (non-fatal) / 23.8% (fatal)
  - Trapped / crushed: 13.8% (fatal)
Retrieved 2026-06. The source document is also ingested into the core RAG collection bp_iace_accident_stats for searchable evidence at seeding time.
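For illustration, the anchor facts above could be held in a small attribution-tagged table like the one below. The struct shape, field names and the rankByNonFatal helper are assumptions for this sketch and may not match contactModeTable in risk_estimation.go; only the percentages and the attribution string come from this document. A share of 0 is used here as a stand-in for "no figure listed above".

```go
package main

import (
	"fmt"
	"sort"
)

// contactModeEvidence is a hypothetical shape for one anchor row.
type contactModeEvidence struct {
	Mode        string
	NonFatalPct float64 // EU share of non-fatal accidents; 0 = not listed
	FatalPct    float64 // EU share of fatal accidents; 0 = not listed
	Attribution string
}

const esawAttribution = "Source: Eurostat (ESAW), CC BY 4.0"

// esaw2023 holds the aggregate shares quoted in this document
// (dataset hsw_ph3_08, reference year 2023).
var esaw2023 = []contactModeEvidence{
	{"impact with stationary object", 24.0, 21.4, esawAttribution},
	{"struck by object in motion / collision", 13.0, 23.8, esawAttribution},
	{"contact with sharp/pointed/rough agent", 14.5, 0, esawAttribution},
	{"trapped / crushed", 0, 13.8, esawAttribution},
	{"physical/mental stress", 24.7, 0, esawAttribution},
}

// rankByNonFatal returns mode names ordered by descending non-fatal share.
// Modes without a listed non-fatal figure are skipped.
func rankByNonFatal(rows []contactModeEvidence) []string {
	listed := make([]contactModeEvidence, 0, len(rows))
	for _, r := range rows {
		if r.NonFatalPct > 0 {
			listed = append(listed, r)
		}
	}
	sort.SliceStable(listed, func(i, j int) bool {
		return listed[i].NonFatalPct > listed[j].NonFatalPct
	})
	names := make([]string, len(listed))
	for i, r := range listed {
		names[i] = r.Mode
	}
	return names
}

func main() {
	for i, mode := range rankByNonFatal(esaw2023) {
		fmt.Printf("%d. %s\n", i+1, mode)
	}
	// The attribution travels with every row so an export can print it.
	fmt.Println(esaw2023[0].Attribution)
}
```

Only the ordering produced by rankByNonFatal is meant to feed the anchor step; the tier values themselves still come from the GT calibration.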
Acceptable supplements
- US BLS / OSHA (Bureau of Labor Statistics, occupational injuries) — U.S. Government work, public domain; free for any use.
- UK HSE (RIDDOR / kinds-of-accident) — Open Government Licence v3; commercial reuse with attribution.
Explicitly excluded
- DGUV statistics — terms grant only editorial use and forbid modification / re-licensing; unsuitable for a commercial product. Not used.
- DIN / Beuth / ISO / IEC standards (e.g. risk-graph tables, parameter decision trees, SIL/PL matrices) — copyrighted; not reproduced or re-implemented. Our model uses only the universal, non-protectable risk dimensions (severity, frequency, probability, avoidance).
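One way to keep the accepted and excluded lists enforceable is a license allowlist checked whenever a new source is registered. The sketch below is an assumption about how that might look; the tag strings and the checkLicense function are hypothetical and do not describe existing code.

```go
package main

import "fmt"

// allowedLicenses lists the license tags this document treats as acceptable.
// The tag strings are hypothetical identifiers chosen for this sketch.
var allowedLicenses = map[string]bool{
	"CC-BY-4.0":  true, // Eurostat ESAW
	"US-PD":      true, // US BLS / OSHA, U.S. Government work
	"OGL-UK-3.0": true, // UK HSE
}

// checkLicense rejects any source whose license tag is not on the allowlist.
// Unknown tags fail closed, so a new source must be reviewed and added here.
func checkLicense(source, tag string) error {
	if !allowedLicenses[tag] {
		return fmt.Errorf("source %q: license %q is not on the allowlist", source, tag)
	}
	return nil
}

func main() {
	fmt.Println(checkLicense("Eurostat ESAW", "CC-BY-4.0"))
	fmt.Println(checkLicense("DGUV statistics", "editorial-use-only"))
}
```

Failing closed on unknown tags means an excluded source such as DGUV cannot slip in through a typo or a missing entry.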
Maintenance
When a tier in contactModeTable changes, record the source figure and the GT
calibration basis here. Add this file to the repository SBOM / license register
alongside software dependencies.