breakpilot-core/control-pipeline/scripts/gpre0_run_all.sh
Benjamin Admin 8510af46eb feat(pipeline): MC Quality Overhaul — 74.5% → 92.8% accuracy, 5.3K → 13.6K MCs
Phase 0: Quality Audit script (Claude Sonnet, 1750 samples)
Phase 1: Object ontology expanded 31 → 74 tokens with descriptions + boundaries
Phase 2: 174K controls re-classified via Haiku (10 batches, $50)
  - Generic tokens removed (documentation, procedure, process)
  - L2 sub-topics added (108K + 64K controls)
  - Bad subtopics fixed (stakeholder_*, escalation fragments)
Phase 3: Re-clustering K=18704 (37K objects → 16.7K groups)
Phase 4: Direct MC generation from canonical tokens (gpre2_direct_mc.py)
Phase 5: Regulation-source split (gpre3, dry-run tested)

New features:
- Tenant-isolated document upload API (rag-service)
- BAuA crawler (Playwright, 131 PDFs downloaded)
- OSHA Technical Manual crawler (23 chapters)
- CE obligation extractor (6141 obligations from Qdrant)

RAG ingestion:
- 126 BAuA PDFs (TRBS/TRGS/ASR): 27,664 chunks
- OSHA Technical Manual: 7,241 chunks
- OSHA 1910 Subpart O (full): 745 chunks
- EuGH C-588/21 P: 216 chunks
- EU 2018/1725: 842 chunks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-10 15:08:15 +02:00


#!/bin/bash
# Run all 10 batches sequentially. Safe: if one fails, the rest don't run.
# Each batch saves corrections to JSON before applying to DB.
#
# Usage: bash /app/scripts/gpre0_run_all.sh
#        bash /app/scripts/gpre0_run_all.sh 5   # start from batch 5
set -e
START=${1:-1}
TOTAL=10
echo "=== Starting from batch $START of $TOTAL ==="
for i in $(seq "$START" "$TOTAL"); do
    echo ""
    echo "================================================================"
    echo " BATCH $i/$TOTAL - $(date)"
    echo "================================================================"
    # Capture the exit status with `||` so that `set -e` does not abort the
    # script before the failure message and resume hint are printed.
    EXIT_CODE=0
    PYTHONPATH=/app python3 /app/scripts/gpre0_validate_hints.py \
        --batch-id "$i" \
        --total-batches "$TOTAL" \
        --batch-size 20 || EXIT_CODE=$?
    if [ "$EXIT_CODE" -ne 0 ]; then
        echo "BATCH $i FAILED with exit code $EXIT_CODE"
        echo "Resume with: bash /app/scripts/gpre0_run_all.sh $i"
        exit "$EXIT_CODE"
    fi
    echo "BATCH $i DONE - $(date)"
done
echo ""
echo "ALL $TOTAL BATCHES COMPLETE!"
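
The optional start argument is handed to `seq` unchecked, so a typo such as `gpre0_run_all.sh five` or `gpre0_run_all.sh 11` would either fail inside `seq` or skip every batch without an error. A minimal sketch of a guard follows. `validate_start` is a hypothetical helper and is not part of the original script. It assumes batch ids are whole numbers from 1 to the total.

```bash
#!/bin/bash
# Hypothetical helper (not in the original script): accept the start batch
# only if it is a whole number between 1 and the total number of batches.
validate_start() {
    local start="$1" total="$2"

    # Reject empty input and anything containing a non-digit character.
    case "$start" in
        ''|*[!0-9]*)
            echo "start batch must be a positive integer, got '$start'" >&2
            return 2
            ;;
    esac

    # Reject values outside the range 1..total.
    if [ "$start" -lt 1 ] || [ "$start" -gt "$total" ]; then
        echo "start batch must be between 1 and $total, got $start" >&2
        return 2
    fi

    # Echo the accepted value so callers can capture it.
    echo "$start"
}
```

One way to use it would be to set `TOTAL` first and then assign `START=$(validate_start "${1:-1}" "$TOTAL") || exit 2`, so that a bad argument stops the run before any batch starts.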