feat(iace): ESAW accident-stats RAG pipeline + real 2023 risk anchors

Executes the accident-statistics pipeline for the risk anchors:
- Refresh contactModeEvidence with real Eurostat ESAW figures
  (dataset hsw_ph3_08, reference year 2023), given as non-fatal/fatal
  shares: impact 24.0%/21.4%, struck-by 13.0%/23.8%, sharp 14.5%
  (non-fatal), trapped/crushed 13.8% (fatal), plus a new
  physical/mental-stress mode at 24.7% (non-fatal), mapped to ergonomic.
  Ground-truth (GT) calibrated tier values are unchanged; the real data
  confirms the ordering.
- Add the versioned source document (datasources/esaw_accident_stats_2023.md,
  ESAW CC BY 4.0 + OSHA public-domain context) that is ingested into the
  core RAG collection bp_iace_accident_stats for searchable evidence.
- Whitelist bp_iace_accident_stats in the RAG search handler so seeding
  can full-text search the statistics and attach the citation.

Two-layer design: the small license-tagged code table stays the deterministic
tier/citation lookup; the RAG holds the searchable source evidence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin Admin
2026-06-11 12:12:02 +02:00
parent 877d540ce1
commit 5e18df63b1
4 changed files with 95 additions and 21 deletions
@@ -0,0 +1,57 @@
# Accidents at work — contact mode of injury (EU, 2023)
Canonical, citable source document for the IACE risk-frequency/severity anchors.
This file is the versioned artifact that is ingested into the core RAG
collection `bp_iace_accident_stats` so seeding can full-text search the evidence
and surface the figure with its citation.
## Primary source — Eurostat ESAW
- **Source:** Eurostat — European Statistics on Accidents at Work (ESAW)
- **Dataset:** `hsw_ph3_08` — accidents at work by contact / mode of injury
- **Reference year:** 2023 (Statistics Explained, Figure 7)
- **License:** CC BY 4.0 (reuse permitted, source acknowledgement required)
- **Attribution:** `Quelle: Eurostat (ESAW) hsw_ph3_08, Bezugsjahr 2023, CC BY 4.0`
- **Retrieved:** 2026-06
- **URL:** https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Accidents_at_work_-_statistics_on_causes_and_circumstances
### Contact mode of injury — EU shares, 2023
| Contact mode | Non-fatal | Fatal |
|---|---|---|
| Physical or mental stress | 24.7 % | — |
| Impact with a stationary object (victim in motion) | 24.0 % | 21.4 % |
| Contact with a sharp / pointed / rough-coarse agent | 14.5 % | — |
| Being struck by an object in motion / collision | 13.0 % | 23.8 % |
| Being trapped or crushed | — | 13.8 % |
| No contact / no information | 9.6 % | 15.1 % |

Reading: the non-fatal column anchors the **frequency / probability tier (W)**
of a contact mode; the fatal column (and the fatal-vs-non-fatal gap) anchors its
typical **severity (S)**. Struck-by accounts for 13.0 % of non-fatal but 23.8 %
of fatal accidents, and trapped/crushed appears only in the fatal column
(13.8 %): both are comparatively rare among non-fatal but over-represented
among fatal accidents, i.e. lower frequency, higher severity.
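
The reading above can be sketched in Go. This is a minimal illustration, not the code in `risk_estimation.go`: the types `share` and `contactMode` and the helpers `byNonFatal` and `fatalRatio` are hypothetical names introduced here, and the "no contact / no information" row is left out. Only the percentages are taken from the table.

```go
package main

import (
	"fmt"
	"sort"
)

// share is a percentage from the table; ok is false where the source
// lists no figure for that column.
type share struct {
	pct float64
	ok  bool
}

// contactMode is a hypothetical row type for this sketch.
type contactMode struct {
	name            string
	nonFatal, fatal share
}

// esaw2023 copies the contact-mode rows of the table (EU shares, 2023).
var esaw2023 = []contactMode{
	{"physical or mental stress", share{24.7, true}, share{}},
	{"impact with stationary object", share{24.0, true}, share{21.4, true}},
	{"sharp / pointed / rough agent", share{14.5, true}, share{}},
	{"struck by object in motion", share{13.0, true}, share{23.8, true}},
	{"trapped or crushed", share{}, share{13.8, true}},
}

// byNonFatal returns mode names ordered by descending non-fatal share
// (the frequency reading), skipping modes without a non-fatal figure.
func byNonFatal(modes []contactMode) []string {
	listed := make([]contactMode, 0, len(modes))
	for _, m := range modes {
		if m.nonFatal.ok {
			listed = append(listed, m)
		}
	}
	sort.SliceStable(listed, func(i, j int) bool {
		return listed[i].nonFatal.pct > listed[j].nonFatal.pct
	})
	names := make([]string, len(listed))
	for i, m := range listed {
		names[i] = m.name
	}
	return names
}

// fatalRatio returns the fatal share divided by the non-fatal share
// (the severity reading). ok is false unless both figures are present.
func fatalRatio(m contactMode) (ratio float64, ok bool) {
	if !m.nonFatal.ok || !m.fatal.ok || m.nonFatal.pct == 0 {
		return 0, false
	}
	return m.fatal.pct / m.nonFatal.pct, true
}

func main() {
	fmt.Println("frequency order:", byNonFatal(esaw2023))
	for _, m := range esaw2023 {
		if r, ok := fatalRatio(m); ok {
			fmt.Printf("%s: fatal/non-fatal = %.2f\n", m.name, r)
		}
	}
}
```

A ratio above 1 (struck-by) marks a mode that is over-represented among fatal accidents; a ratio below 1 (impact) marks the opposite. Modes with only one column filled yield no ratio, so their severity has to be judged from the single available figure.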
## Supplementary context — US OSHA (public domain)
- **Source:** OSHA — Commonly Used Statistics (U.S. Government work, public domain)
- **Retrieved:** 2026-06 · **URL:** https://www.osha.gov/data/commonstats
- 2023: **5,283** fatal work injuries in the US (3.5 per 100,000 FTE workers).
- Most frequently violated standards: Fall Protection, Hazard Communication,
  Control of Hazardous Energy (Lockout/Tagout).

US BLS CFOI/SOII event-level tables (public domain) are an intended further
supplement; the BLS site blocks automated retrieval, so those figures are to be
added from a manually downloaded release.
## How these numbers are used
1. **Anchor (ordering):** the relative frequency/severity ordering of contact
   modes above sets the *direction* of the W and S tiers in `risk_estimation.go`
   (`contactModeTable`).
2. **Calibrate (values):** tier *values* are adjusted to BreakPilot ground truth
   (GT); well-sampled modes use the GT mean, while sparse modes use conservative
   defaults to avoid overfitting to a small GT sample.

No standard's risk-graph table, decision tree or SIL/PL matrix is reproduced.
Excluded by license: DGUV statistics, DIN/Beuth/ISO/IEC tables.
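
The calibration rule in step 2 can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function `calibrateTier`, the threshold `minSamples` and its value of 5, integer tier values, and rounding the mean to the nearest tier are all hypothetical choices made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// minSamples is a hypothetical threshold for calling a mode "well-sampled".
const minSamples = 5

// calibrateTier returns the rounded mean of the ground-truth tier
// observations when there are at least minSamples of them, and the
// conservative fallback otherwise.
func calibrateTier(gt []int, fallback int) int {
	if len(gt) < minSamples {
		return fallback // sparse mode: do not fit to a handful of points
	}
	sum := 0
	for _, v := range gt {
		sum += v
	}
	return int(math.Round(float64(sum) / float64(len(gt))))
}

func main() {
	// Six observations: well-sampled, so the GT mean is used.
	fmt.Println(calibrateTier([]int{2, 3, 3, 2, 3, 3}, 4))
	// Two observations: sparse, so the conservative default is kept.
	fmt.Println(calibrateTier([]int{1, 1}, 4))
}
```

Keeping the fallback as an explicit argument makes the conservative default visible at each call site, instead of hiding it inside the calibration logic.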