Commit Graph

7 Commits

Author SHA1 Message Date
Benjamin Admin
57f390190d fix(rag): Arithmetic error, dedup auth, EGBGB timeout
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 41s
CI/CD / test-python-backend-compliance (push) Successful in 41s
CI/CD / test-python-document-crawler (push) Successful in 27s
CI/CD / test-python-dsms-gateway (push) Successful in 21s
CI/CD / deploy-hetzner (push) Successful in 19s
- collection_count() returns 0 (not ?) on failure — fixes arithmetic error
- Pass QDRANT_API_KEY to ingestion container for dedup checks
- Include api-key header in collection_count() and dedup scroll queries
- Lower large-file threshold to 256KB (EGBGB 310KB was timing out)
- More targeted EGBGB XML extraction (Art. 246a + Anlage only)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:05:07 +01:00
Benjamin Admin
c88653b221 fix(rag): Dedup check, BGB split, GewO timeout, arithmetic fix
- Add Qdrant dedup check in upload_file() — skip if regulation_id already exists
- Split BGB (2.7MB) into 5 targeted parts via XML extraction:
  AGB §§305-310, Fernabsatz §§312-312k, Kaufrecht §§433-480,
  Widerruf §§355-361, Digitale Produkte §§327-327u
- Lower large-file threshold 512KB→384KB (fixes GewO 432KB timeout)
- Fix arithmetic syntax error when collection_count returns "?"
- Replace EGBGB PDF (was empty) with XML extraction
- Add unzip to Alpine container for XML archives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:39:09 +01:00
Benjamin Admin
87d06c8b20 fix(rag): Handle large file uploads + don't abort on individual failures
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 1m5s
CI/CD / test-python-backend-compliance (push) Successful in 43s
CI/CD / test-python-document-crawler (push) Successful in 33s
CI/CD / test-python-dsms-gateway (push) Successful in 27s
CI/CD / deploy-hetzner (push) Successful in 17s
- Extended timeout (15 min) for files > 500KB (BGB is 1.5MB)
- upload_file returns 0 even on failure so set -e doesn't kill script
- Failed uploads are still counted and reported in summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 23:33:28 +01:00
Benjamin Admin
363bf9606a fix(ci): Connect runner to breakpilot-network for RAG ingestion
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 38s
CI/CD / test-python-backend-compliance (push) Successful in 36s
CI/CD / test-python-document-crawler (push) Successful in 28s
CI/CD / test-python-dsms-gateway (push) Successful in 22s
CI/CD / deploy-hetzner (push) Failing after 1s
- Join breakpilot-network so bp-core-rag-service is reachable
- Make RAG_URL/QDRANT_URL in script respect env vars (${VAR:-default})
- Remove complex fallback logic — fail fast if network not available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:48:13 +01:00
Benjamin Admin
ebe7e90bd8 feat(rag): Expand Phase H to Layer 1 Safe Core (~60 documents)
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 40s
CI/CD / test-python-backend-compliance (push) Successful in 39s
CI/CD / test-python-document-crawler (push) Successful in 29s
CI/CD / test-python-dsms-gateway (push) Successful in 25s
CI/CD / deploy-hetzner (push) Failing after 1s
Phase H now includes:
- 16 German laws (PAngV, VSBG, ProdHaftG, BDSG, HGB, AO, DDG, TKG, etc.)
- 15 EUR-Lex EU laws (DSGVO, Consumer Rights Dir, Sale of Goods Dir,
  E-Commerce Dir, Unfair Terms Dir, DMA, NIS2, Product Liability Dir, etc.)
- 2 NIST frameworks (CSF 2.0, Privacy Framework 1.0)
- 1 HLEG Ethics Guidelines

Updated rag-sources.md with complete inventory of already-ingested vs
new documents, plus Layer 2-5 TODO roadmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:54:07 +01:00
Benjamin Admin
7f38df9d9c feat(scope): Split HT-H01 B2B/B2C + register Verbraucherschutz document types + RAG ingestion
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 38s
CI/CD / test-python-backend-compliance (push) Successful in 39s
CI/CD / test-python-document-crawler (push) Successful in 27s
CI/CD / test-python-dsms-gateway (push) Successful in 24s
CI/CD / deploy-hetzner (push) Has been cancelled
- Split HT-H01 into HT-H01a (B2C/Hybrid mit Verbraucherschutzpflichten) und
  HT-H01b (reiner B2B mit Basis-Pflichten). B2B-Webshops bekommen keine
  Widerrufsbelehrung/Preisangaben/Fernabsatz mehr.
- Add excludeWhen/requireWhen to HardTriggerRule for conditional trigger logic
- Register 6 neue ScopeDocumentType: widerrufsbelehrung, preisangaben,
  fernabsatz_info, streitbeilegung, produktsicherheit, ai_act_doku
- Full DOCUMENT_SCOPE_MATRIX L1-L4 for all new types
- Align HardTriggerRule interface with actual engine field names
- Add Phase H (Verbraucherschutz) to RAG ingestion script:
  10 deutsche Gesetze + 4 EU-Verordnungen + HLEG Ethics Guidelines
- Add scripts/rag-sources.md with license documentation
- 9 new tests for B2B/B2C trigger split, all 326 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:03:49 +01:00
Benjamin Admin
a228b3b528 feat: add RAG corpus versioning and source policy backend
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-ai-compliance (push) Successful in 34s
CI / test-python-backend-compliance (push) Successful in 32s
CI / test-python-document-crawler (push) Successful in 23s
CI / test-python-dsms-gateway (push) Successful in 18s
Part 1 — RAG Corpus Versioning:
- New DB table compliance_corpus_versions (migration 017)
- Go CorpusVersionStore with CRUD operations
- Assessment struct extended with corpus_version_id
- API endpoints: GET /rag/corpus-status, /rag/corpus-versions/:collection
- RAG routes (search, regulations) now registered in main.go
- Ingestion script registers corpus versions after each run
- Frontend staleness badge in SDK sidebar

Part 3 — Source Policy Backend:
- New FastAPI router with CRUD for allowed sources, PII rules,
  operations matrix, audit trail, stats, and compliance report
- SQLAlchemy models for all source policy tables (migration 001)
- Frontend API base corrected from edu-search:8088/8089 to
  backend-compliance:8002/api

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 07:58:08 +01:00