feat(iace): ESAW accident-stats RAG pipeline + real 2023 risk anchors

Executes the accident-statistics pipeline for the risk anchors:
- Refresh contactModeEvidence with real Eurostat ESAW figures
  (dataset hsw_ph3_08, reference year 2023), given as non-fatal/fatal
  shares: impact 24.0%/21.4%, struck-by 13.0%/23.8%, sharp 14.5%
  (non-fatal), trapped/crushed 13.8% (fatal). Add a new
  physical/mental-stress mode (24.7%, non-fatal), mapped to ergonomic.
  The GT-calibrated tier values are unchanged; the real data confirms
  their ordering.
- Add the versioned source document (datasources/esaw_accident_stats_2023.md,
  ESAW CC BY 4.0 + OSHA public-domain context) that is ingested into the
  core RAG collection bp_iace_accident_stats for searchable evidence.
- Whitelist bp_iace_accident_stats in the RAG search handler so the
  seeding step can full-text search the statistics and cite them.
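
The whitelisting step above can be pictured with a minimal sketch. The real handler is not shown in this commit view, so `allowedCollections` and `validateCollection` are hypothetical names; only the collection name bp_iace_accident_stats comes from the commit.

```go
package main

import "fmt"

// allowedCollections is a hypothetical allowlist. Only the entry
// "bp_iace_accident_stats" is taken from the commit; the variable
// itself is an assumption about how the handler might be organized.
var allowedCollections = map[string]struct{}{
	"bp_iace_accident_stats": {},
}

// validateCollection rejects any collection that is not on the allowlist,
// so a search request cannot reach arbitrary collections.
func validateCollection(name string) error {
	if _, ok := allowedCollections[name]; !ok {
		return fmt.Errorf("collection %q is not whitelisted for search", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"bp_iace_accident_stats", "some_other_collection"} {
		if err := validateCollection(name); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("allowed:", name)
	}
}
```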

Two-layer design: the small license-tagged code table stays the deterministic
tier/citation lookup; the RAG holds the searchable source evidence.
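
A minimal sketch of how the two layers could sit side by side, assuming a trimmed-down evidence struct and an in-memory stand-in for the RAG search. `Evidence`, `evidenceFor`, `Searcher`, and `memorySearcher` are hypothetical names for illustration; the figures and the attribution string are the ones from this commit, and the passage text is a placeholder, not the ingested document.

```go
package main

import (
	"fmt"
	"strings"
)

// Evidence is a hypothetical, reduced stand-in for RiskEvidence.
type Evidence struct {
	Mode        string
	Stat        string
	Attribution string
}

const attribution = "Quelle: Eurostat (ESAW) hsw_ph3_08, Bezugsjahr 2023, CC BY 4.0"

// Layer 1: the deterministic code table (two entries copied from the commit).
var table = map[string]Evidence{
	"crushing": {Mode: "crushing", Stat: "13,8 % (tödlich)", Attribution: attribution},
	"cutting":  {Mode: "cutting", Stat: "14,5 % (nicht-tödlich)", Attribution: attribution},
}

// evidenceFor is the tier/citation lookup. It never calls the search layer,
// so the result depends only on the compiled-in table.
func evidenceFor(mode string) (Evidence, bool) {
	ev, ok := table[mode]
	return ev, ok
}

// Layer 2: Searcher is an assumed interface for full-text search over the
// ingested source documents.
type Searcher interface {
	Search(collection, query string) []string
}

// memorySearcher is an in-memory fake that does case-insensitive substring
// matching; a real RAG backend would replace it.
type memorySearcher map[string][]string

func (m memorySearcher) Search(collection, query string) []string {
	var hits []string
	q := strings.ToLower(query)
	for _, passage := range m[collection] {
		if strings.Contains(strings.ToLower(passage), q) {
			hits = append(hits, passage)
		}
	}
	return hits
}

func main() {
	if ev, ok := evidenceFor("crushing"); ok {
		fmt.Println(ev.Stat, "|", ev.Attribution)
	}

	var s Searcher = memorySearcher{
		"bp_iace_accident_stats": {"Trapped, crushed: 13.8% of fatal accidents (placeholder passage)"},
	}
	for _, hit := range s.Search("bp_iace_accident_stats", "crushed") {
		fmt.Println("evidence:", hit)
	}
}
```

Keeping the lookup free of any search call is what makes the tier and citation reproducible; the search layer only serves a reader who wants to inspect the source text.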

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin Admin
2026-06-11 12:12:02 +02:00
parent 877d540ce1
commit 5e18df63b1
4 changed files with 95 additions and 21 deletions
@@ -9,9 +9,17 @@ import "sort"
// this table only carries the provenance so generated risk numbers are
// auditable and correctly attributed. No raw dataset is vendored; only these
// aggregate facts. Excluded by license: DGUV, DIN/Beuth/ISO/IEC. See
-// DATA_SOURCES.md. RAG/Qdrant ingestion is deliberately NOT used here: ~a dozen
-// stable aggregate facts are better served by a license-tagged code table than
-// by vector retrieval.
+// DATA_SOURCES.md.
+//
+// Two-layer design: this small license-tagged CODE table is the deterministic
+// tier/citation lookup (fast, stable, no nondeterminism). The underlying SOURCE
+// documents are additionally ingested into the core RAG collection
+// `bp_iace_accident_stats` so the seeding UI / an auditor can full-text search
+// the evidence and pull the original figure. The RAG is the evidence/search
+// layer, not the tier lookup.
+//
+// Figures below are the EU aggregate shares from Eurostat ESAW dataset
+// hsw_ph3_08, reference year 2023 (Figure 7, "contact - mode of injury").
// RiskEvidence is the public-statistics provenance for one contact mode.
type RiskEvidence struct {
@@ -25,9 +33,9 @@ type RiskEvidence struct {
}
const (
-	esawSource = "Eurostat (ESAW)"
+	esawSource = "Eurostat (ESAW, hsw_ph3_08, 2023)"
	esawLicense = "CC BY 4.0"
-	esawAttribution = "Quelle: Eurostat (ESAW), CC BY 4.0"
+	esawAttribution = "Quelle: Eurostat (ESAW) hsw_ph3_08, Bezugsjahr 2023, CC BY 4.0"
	esawRetrieved = "2026-06"
)
@@ -40,10 +48,11 @@ func esawEvidence(mode, label, stat string) RiskEvidence {
// figure is documented; other modes are anchored by the ESAW ordering and
// GT-calibrated without a single citable share, so they carry no fabricated stat.
var contactModeEvidence = map[string]RiskEvidence{
-	"impact_stationary": esawEvidence("impact_stationary", "Anstoßen an ruhendem Objekt", "~24 % der Arbeitsunfälle"),
-	"struck_by": esawEvidence("struck_by", "Getroffen von bewegtem Objekt", "~13 % (nicht-tödlich) / ~24 % (tödlich)"),
-	"crushing": esawEvidence("crushing", "Quetschen / Einklemmen", "~14 % der tödlichen Arbeitsunfälle"),
-	"cutting": esawEvidence("cutting", "Kontakt mit scharfem Gegenstand", "~15 % der Arbeitsunfälle"),
+	"impact_stationary": esawEvidence("impact_stationary", "Anstoßen an ruhendem Objekt", "24,0 % (nicht-tödlich) / 21,4 % (tödlich)"),
+	"struck_by": esawEvidence("struck_by", "Getroffen von bewegtem Objekt", "13,0 % (nicht-tödlich) / 23,8 % (tödlich)"),
+	"crushing": esawEvidence("crushing", "Eingeklemmt / zerquetscht", "13,8 % (tödlich)"),
+	"cutting": esawEvidence("cutting", "Kontakt mit scharfem/spitzem Agens", "14,5 % (nicht-tödlich)"),
+	"ergonomic": esawEvidence("ergonomic", "Physische/psychische Belastung", "24,7 % (nicht-tödlich)"),
}
// RiskEvidenceFor returns the documented public statistic for a contact mode.