fix(rag): Dedup check, BGB split, GewO timeout, arithmetic fix

- Add Qdrant dedup check in upload_file(): skip if regulation_id already exists
- Split BGB (2.7MB) into 5 targeted parts via XML extraction: AGB §§305-310, Fernabsatz §§312-312k, Kaufrecht §§433-480, Widerruf §§355-361, Digitale Produkte §§327-327u
- Lower large-file threshold 512KB→384KB (fixes GewO 432KB timeout)
- Fix arithmetic syntax error when collection_count returns "?"
- Replace EGBGB PDF (was empty) with XML extraction
- Add unzip to Alpine container for XML archives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
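
The arithmetic fix mentioned in the message could look roughly like the sketch below. It assumes the count helper falls back to the literal string "?" when the collection cannot be queried, which would make a bare `$(( ... ))` fail with a syntax error. The names `is_uint` and `count_delta` are hypothetical and do not come from the commit; the actual script may guard the expression differently.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 is a plain non-negative integer.
# Rejects empty strings, anything containing a non-digit (such as "?"), and
# leading zeros, which shell arithmetic would otherwise parse as octal.
is_uint() {
  case "$1" in
    ''|*[!0-9]*|0?*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: prints (after - before), or "?" if either value is
# not numeric, so the arithmetic expansion never sees a non-number.
count_delta() {
  if is_uint "$1" && is_uint "$2"; then
    echo $(( $2 - $1 ))
  else
    echo "?"
  fi
}

count_delta 120 150   # prints 30
count_delta "?" 150   # prints ?
```

Validating with a `case` pattern keeps the check POSIX-compatible, so it behaves the same under bash and under the BusyBox `sh` that ships with Alpine.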
@@ -78,7 +78,7 @@ jobs:
 -e "SDK_URL=http://bp-compliance-ai-sdk:8090" \
 alpine:3.19 \
 sh -c "
-apk add --no-cache curl bash coreutils git python3 > /dev/null 2>&1
+apk add --no-cache curl bash coreutils git python3 unzip > /dev/null 2>&1
 mkdir -p /tmp/rag-ingestion/{pdfs,repos,texts}
 cd /workspace
 if [ '${PHASE}' = 'all' ]; then