Commit Graph

2 Commits

Author SHA1 Message Date
BreakPilot Dev
e636b8cef8 feat(rag): Migrate national DPA laws from bp_dsfa_corpus to bp_legal_corpus
Move 23 sources (18 national data protection laws + 5 EDPB guidelines/SCC)
from bp_dsfa_corpus to bp_legal_corpus with vector preservation. Extend
REGULATIONS array with national_law and eu_guideline types. Mark migrated
sources in dsfa_corpus_ingestion.py to prevent re-ingestion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 23:26:18 +01:00
BreakPilot Dev
53219e3eaf feat(klausur-service): Add Tesseract OCR, DSFA RAG, TrOCR, grid detection and vocab session store
New modules:
- tesseract_vocab_extractor.py: Bounding-box OCR with multi-PSM pipeline
- grid_detection_service.py: CV-based grid/table detection for worksheets
- vocab_session_store.py: PostgreSQL persistence for vocab sessions
- trocr_api.py: TrOCR handwriting recognition endpoint
- dsfa_rag_api.py + dsfa_corpus_ingestion.py: DSFA RAG corpus search

Changes:
- Dockerfile: Install tesseract-ocr + deu/eng language packs
- requirements.txt: Add PyMuPDF, pytesseract, Pillow
- main.py: Register new routers, init DB pools + Qdrant collections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 00:00:19 +01:00