fix(rag): Always run download phase before ingestion phases
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 40s
CI/CD / test-python-backend-compliance (push) Successful in 37s
CI/CD / test-python-document-crawler (push) Successful in 26s
CI/CD / test-python-dsms-gateway (push) Successful in 23s
CI/CD / deploy-hetzner (push) Successful in 20s

The gesetze phase failed because it expects text files created by the
download phase. Now the workflow automatically runs download first for
any phase that depends on it. Also adds git and python3 to the alpine
container for repo cloning and text extraction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-03-11 23:13:33 +01:00
parent c14b31b3bc
commit 0b47612272

View File

@@ -78,12 +78,19 @@ jobs:
-e "SDK_URL=http://bp-compliance-ai-sdk:8090" \
alpine:3.19 \
sh -c "
apk add --no-cache curl bash coreutils > /dev/null 2>&1
apk add --no-cache curl bash coreutils git python3 > /dev/null 2>&1
mkdir -p /tmp/rag-ingestion/{pdfs,repos,texts}
cd /workspace
if [ '${PHASE}' = 'all' ]; then
bash scripts/ingest-legal-corpus.sh
elif [ '${PHASE}' = 'download' ]; then
bash scripts/ingest-legal-corpus.sh --only download
else
# Download-Phase muss immer zuerst laufen (erstellt Textdateien)
echo '=== Running download phase first ==='
bash scripts/ingest-legal-corpus.sh --only download
echo ''
echo '=== Running phase: ${PHASE} ==='
bash scripts/ingest-legal-corpus.sh --only '${PHASE}'
fi
"