feat(ci): Add manual RAG ingestion workflow for Gitea Actions
Some checks failed
CI/CD / go-lint (push) Has been cancelled
CI/CD / python-lint (push) Has been cancelled
CI/CD / nodejs-lint (push) Has been cancelled
CI/CD / test-go-ai-compliance (push) Has been cancelled
CI/CD / test-python-backend-compliance (push) Has been cancelled
CI/CD / test-python-document-crawler (push) Has been cancelled
CI/CD / test-python-dsms-gateway (push) Has been cancelled
CI/CD / deploy-hetzner (push) Has been cancelled
Adds workflow_dispatch-triggered job to run ingest-legal-corpus.sh on Hetzner.
Supports phase selection (verbraucherschutz, gesetze, eu, etc.).

Usage: Gitea UI → Actions → "RAG Ingestion" → Run (select phase)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
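The phase selection in this commit's workflow can be sketched as a small shell helper. This is a minimal sketch, not the script's actual logic: it assumes the phase list in the workflow's header comment is exhaustive, and the function names `is_valid_phase` and `ingest_command` are hypothetical. As far as the diff shows, the workflow itself passes the input on without validating it.

```shell
# Hypothetical helper: accept only the phase names listed in the
# workflow's header comment (assumed to be the complete set).
is_valid_phase() {
    case "$1" in
        gesetze|eu|templates|datenschutz|verbraucherschutz|verify|version|all)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Hypothetical helper: print the command the "Run Ingestion" step would
# execute for a phase, mirroring its if/else on "all".
ingest_command() {
    phase="$1"
    if ! is_valid_phase "$phase"; then
        echo "unknown phase: ${phase}" >&2
        return 1
    fi
    if [ "$phase" = "all" ]; then
        echo "bash scripts/ingest-legal-corpus.sh"
    else
        echo "bash scripts/ingest-legal-corpus.sh --only ${phase}"
    fi
}
```

Rejecting unknown values before calling the script would turn a typo in the free-text input into an early, readable failure instead of whatever the script does with an unrecognised `--only` argument.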
.gitea/workflows/rag-ingest.yaml (normal file, 73 lines)
@@ -0,0 +1,73 @@
# Gitea Actions: RAG Legal Corpus Ingestion
#
# Manually triggered workflow for ingesting legal texts into Qdrant.
# Trigger: Gitea UI → Actions → "RAG Ingestion" → Run
#
# Phases: gesetze, eu, templates, datenschutz, verbraucherschutz, verify, version, all
#
# Prerequisite: the RAG service and Qdrant must be running on Hetzner.

name: RAG Ingestion

on:
  workflow_dispatch:
    inputs:
      phase:
        description: 'Ingestion Phase (gesetze, eu, templates, datenschutz, verbraucherschutz, verify, version, all)'
        required: true
        default: 'verbraucherschutz'

jobs:
  ingest:
    runs-on: docker
    container: python:3.12-slim
    steps:
      - name: Setup
        run: |
          apt-get update -qq && apt-get install -y -qq git curl > /dev/null 2>&1

      - name: Checkout
        run: |
          git clone --depth 1 --branch main ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .

      - name: Check RAG service
        run: |
          # The RAG service runs on the host, not in the container
          # Qdrant is reachable externally
          echo "Checking Qdrant..."
          curl -sf "${QDRANT_URL}/collections" > /dev/null 2>&1 \
            && echo "Qdrant: OK" \
            || echo "WARNING: Qdrant not reachable (${QDRANT_URL})"

          echo "Checking RAG API..."
          curl -sf -k "${RAG_URL}" -X POST 2>/dev/null | head -c 200 \
            && echo "" && echo "RAG API: OK" \
            || echo "WARNING: RAG API not reachable (${RAG_URL})"
        env:
          QDRANT_URL: "https://qdrant-dev.breakpilot.ai"
          RAG_URL: "https://localhost:8097/api/v1/documents/upload"

      - name: Run Ingestion
        run: |
          set -euo pipefail
          PHASE="${{ github.event.inputs.phase }}"

          echo "=== RAG Ingestion: Phase ${PHASE} ==="
          echo ""

          # Configuration for Hetzner
          export WORK_DIR="/tmp/rag-ingestion"
          export RAG_URL="https://localhost:8097/api/v1/documents/upload"
          export QDRANT_URL="https://qdrant-dev.breakpilot.ai"
          export SDK_URL="https://localhost:8093"

          mkdir -p "$WORK_DIR"/{pdfs,repos,texts}

          if [ "$PHASE" = "all" ]; then
            bash scripts/ingest-legal-corpus.sh
          else
            bash scripts/ingest-legal-corpus.sh --only "$PHASE"
          fi

          echo ""
          echo "=== Ingestion complete ==="