breakpilot-compliance/.gitea/workflows/rag-ingest.yaml
Benjamin Admin 363bf9606a
fix(ci): Connect runner to breakpilot-network for RAG ingestion
- Join breakpilot-network so bp-core-rag-service is reachable
- Make RAG_URL/QDRANT_URL in script respect env vars (${VAR:-default})
- Remove complex fallback logic — fail fast if network not available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:48:13 +01:00
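
The commit message says `RAG_URL`/`QDRANT_URL` in the script now respect environment variables via `${VAR:-default}`. Below is a minimal sketch of that pattern. `resolve_urls` and the `localhost` defaults are hypothetical placeholders, not the contents of `scripts/ingest-legal-corpus.sh`. A non-empty value exported by the caller (as the `Run Ingestion` step does) wins. An unset or empty value falls back to the default, and the `:-` form does not trigger an unbound-variable error under `set -u`.

```shell
#!/usr/bin/env bash
# Sketch of the ${VAR:-default} pattern from the commit message.
# resolve_urls and the localhost defaults are hypothetical placeholders,
# not the actual values in scripts/ingest-legal-corpus.sh.
set -euo pipefail

resolve_urls() {
  # Keep the caller's value if it is set and non-empty; otherwise fall back.
  # ${VAR:-default} does not trip `set -u` when VAR is unset.
  RAG_URL="${RAG_URL:-http://localhost:8097/api/v1/documents/upload}"
  QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"
  echo "RAG_URL=$RAG_URL"
  echo "QDRANT_URL=$QDRANT_URL"
}
```

The script's own default therefore only applies when it runs outside the workflow, for example locally.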

# Gitea Actions: RAG Legal Corpus Ingestion
#
# Manually triggered workflow that ingests legal texts into Qdrant.
# Trigger: Gitea UI → Actions → "RAG Ingestion" → Run
#
# Phases: gesetze, eu, templates, datenschutz, verbraucherschutz, verify, version, all
#
# Prerequisite: the RAG service and Qdrant must be running on Hetzner.
name: RAG Ingestion

on:
  workflow_dispatch:
    inputs:
      phase:
        description: 'Ingestion Phase (gesetze, eu, templates, datenschutz, verbraucherschutz, verify, version, all)'
        required: true
        default: 'verbraucherschutz'

jobs:
  ingest:
    runs-on: docker
    container: docker:27-cli
    steps:
      - name: Setup
        run: |
          apk add --no-cache git curl bash python3 > /dev/null 2>&1
      - name: Checkout
        run: |
          git clone --depth 1 --branch main "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git" .
      - name: Join breakpilot-network
        run: |
          # Attach the runner container to breakpilot-network
          # so that bp-core-rag-service is reachable.
          CONTAINER_ID=$(cat /etc/hostname)
          echo "Runner container: $CONTAINER_ID"
          docker network connect breakpilot-network "$CONTAINER_ID" 2>/dev/null \
            && echo "Connected to breakpilot-network" \
            || echo "WARNING: breakpilot-network not available"
      - name: Run Ingestion
        # bash is required for the brace expansion and pipefail below.
        shell: bash
        env:
          # Pass the input via env instead of interpolating it into the script body.
          PHASE: ${{ github.event.inputs.phase }}
        run: |
          set -euo pipefail
          echo "=== RAG Ingestion: Phase ${PHASE} ==="
          echo ""
          export WORK_DIR="/tmp/rag-ingestion"
          export QDRANT_URL="https://qdrant-dev.breakpilot.ai"
          export RAG_URL="http://bp-core-rag-service:8097/api/v1/documents/upload"
          export SDK_URL="http://bp-compliance-ai-sdk:8090"
          mkdir -p "$WORK_DIR"/{pdfs,repos,texts}
          echo "RAG API: $RAG_URL"
          echo "Qdrant: $QDRANT_URL"
          echo "Work Dir: $WORK_DIR"
          echo ""
          # Health check: is the RAG API reachable over the container network?
          # Only a failed connection (HTTP code 000) counts as unreachable; an
          # empty POST to the upload endpoint may legitimately return a 4xx.
          HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$RAG_URL" || true)
          if [ "$HTTP_CODE" = "000" ]; then
            echo "ERROR: RAG API not reachable at $RAG_URL"
            echo "Make sure bp-core-rag-service is running and breakpilot-network exists."
            exit 1
          fi
          echo "RAG API reachable (HTTP $HTTP_CODE)."
          echo ""
          if [ "$PHASE" = "all" ]; then
            bash scripts/ingest-legal-corpus.sh
          else
            bash scripts/ingest-legal-corpus.sh --only "$PHASE"
          fi
          echo ""
          echo "=== Ingestion complete ==="
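
The `phase` input is free text, so a typo such as `datenschutzz` is passed straight to the script via `--only`, after the checkout and network steps have already run. Below is a minimal sketch of an up-front check, assuming the phase list in the header comment is complete. `validate_phase` and `ingest_args` are hypothetical helpers, not part of the repository. `ingest_args` mirrors the workflow's branching: `all` runs the script without arguments, and any other phase is passed via `--only`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for the "Run Ingestion" step: accept only the
# phases listed in the workflow header comment.
set -euo pipefail

validate_phase() {
  case "$1" in
    gesetze|eu|templates|datenschutz|verbraucherschutz|verify|version|all)
      return 0
      ;;
    *)
      echo "ERROR: unknown phase '$1'" >&2
      echo "Expected one of: gesetze eu templates datenschutz verbraucherschutz verify version all" >&2
      return 1
      ;;
  esac
}

# Print the arguments for scripts/ingest-legal-corpus.sh, one per line,
# following the same branching as the workflow ("all" means no arguments).
ingest_args() {
  # Explicit return: set -e is not reliably inherited inside $(...).
  validate_phase "$1" || return 1
  if [ "$1" != "all" ]; then
    printf '%s\n' --only "$1"
  fi
}
```

Calling `validate_phase "$PHASE"` right after `set -euo pipefail` would stop the job before the health check. The trade-off is that the `case` pattern list must be kept in sync with the header comment by hand.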