breakpilot-pwa

Archived

Author	SHA1	Message	Date
BreakPilot Dev	f927c0c205	feat(rag): Add DACH legal corpus ingestion (DE/AT/CH laws) Add 29 new regulations (7 DE + 7 AT + 4 CH + 11 P2/P3) with country metadata, legal corpus text excerpts, and updated RAG admin UI with AT/CH type colors and labels. Fix module path in deploy script. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 09:24:33 +01:00
BreakPilot Dev	e636b8cef8	feat(rag): Migrate national DPA laws from bp_dsfa_corpus to bp_legal_corpus Move 23 sources (18 national data protection laws + 5 EDPB guidelines/SCC) from bp_dsfa_corpus to bp_legal_corpus with vector preservation. Extend REGULATIONS array with national_law and eu_guideline types. Mark migrated sources in dsfa_corpus_ingestion.py to prevent re-ingestion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-10 23:26:18 +01:00
BreakPilot Dev	945b955b54	feat(ocr): Word-based image deskew for Ground Truth pipeline Straightens skewed scans before OCR extraction using the left-aligned word starts of the vocabulary column. Tesseract returns axis-aligned boxes, which bleed into neighboring lines at a skew of ~2-3 degrees; the deskew fixes this. - New function deskew_image_by_word_alignment() in cv_vocab_pipeline.py - Deskew integrated into the extract-with-boxes endpoint (before OCR) - New GET endpoint /deskewed-image/{page} for the deskewed page image - Frontend: GroundTruthPanel switches to the deskewed image after extraction - ~1s overhead from a fast Tesseract pass on a half-size image Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-10 12:14:44 +01:00
BreakPilot Dev	8c77df494b	feat(ocr): Add Ground Truth labeling UI for OCR comparison Adds a step-through tool for creating 100% correct reference data (ground truth) with position information. Users scan a page, review each vocabulary entry with image crops, confirm or correct the OCR text, and save the result as JSON. Backend: extract_entries_with_boxes() helper + 3 endpoints (extract-with-boxes, ground-truth save/load). Frontend: GroundTruthPanel component with SVG overlay, ImageCrop, keyboard shortcuts (Enter/Tab/arrows), and tab navigation in page.tsx. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-10 09:04:36 +01:00
BreakPilot Dev	53219e3eaf	feat(klausur-service): Add Tesseract OCR, DSFA RAG, TrOCR, grid detection and vocab session store New modules: - tesseract_vocab_extractor.py: Bounding-box OCR with multi-PSM pipeline - grid_detection_service.py: CV-based grid/table detection for worksheets - vocab_session_store.py: PostgreSQL persistence for vocab sessions - trocr_api.py: TrOCR handwriting recognition endpoint - dsfa_rag_api.py + dsfa_corpus_ingestion.py: DSFA RAG corpus search Changes: - Dockerfile: Install tesseract-ocr + deu/eng language packs - requirements.txt: Add PyMuPDF, pytesseract, Pillow - main.py: Register new routers, init DB pools + Qdrant collections Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-10 00:00:19 +01:00
BreakPilot Dev	fa958d31f6	feat(ocr): Add CV Document Reconstruction Pipeline for vocabulary extraction New OCR method using classical Computer Vision: high-res rendering (432 DPI), deskew, dewarp, binarization, projection-profile layout analysis, multi-pass Tesseract OCR with region-specific PSM, and Y-coordinate line alignment. Includes bugfix for convert_pdf_to_image call (line 869) and 39 unit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-09 23:52:35 +01:00
BreakPilot Dev	a7a5674818	fix(ocr-compare): Replace Ollama call in grid analysis with heuristic from comparison results Ollama crashes when two concurrent vision requests hit the 32B model (compare-ocr + analyze-grid). The grid analysis was redundantly calling Ollama again even though compare-ocr already extracted all vocabulary. - compare-ocr now saves vocabulary in session for reuse - analyze-grid builds grid from session data (no Ollama, instant response) - Grid button disabled until comparison results are available - Added export-to-editor functionality Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-09 18:08:13 +01:00
Benjamin Admin	21a844cb8a	fix: Restore all files lost during destructive rebase A previous `git pull --rebase origin main` dropped 177 local commits, losing 3400+ files across admin-v2, backend, studio-v2, website, klausur-service, and many other services. The partial restore attempt (`660295e2`) only recovered some files. This commit restores all missing files from pre-rebase ref 98933f5e while preserving post-rebase additions (night-scheduler, night-mode UI, NightModeWidget dashboard integration). Restored features include: - AI Module Sidebar (FAB), OCR Labeling, OCR Compare - GPU Dashboard, RAG Pipeline, Magic Help - Klausur-Korrektur (8 files), Abitur-Archiv (5+ files) - Companion, Zeugnisse-Crawler, Screen Flow - Full backend, studio-v2, website, klausur-service - All compliance SDKs, agent-core, voice-service - CI/CD configs, documentation, scripts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-09 09:51:32 +01:00
BreakPilot Dev	baee45b861	feat(ocr): Add Grid Detection v4 tests, docs, and SBOM update - Add comprehensive tests for grid_detection_service.py (31 tests) - mm coordinate conversion tests - Deskew calculation tests - Column detection tests - Integration tests for vocabulary tables - Add OCR-Compare documentation (OCR-Compare.md) - mm coordinate system documentation - Deskew correction documentation - Worksheet Editor integration guide - API endpoints documentation - Add TypeScript tests for ocr-integration.ts - mm to pixel conversion tests - OCR export format tests - localStorage operations tests - Update SBOM to v1.5.0 - Add OCR Grid Detection System section - Document Fabric.js (MIT) for Worksheet Editor - Document NumPy and OpenCV usage Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 21:31:35 -08:00

9 Commits
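Commit 945b955b54 estimates page skew from the left edges of words in the vocabulary column. The following is a minimal sketch of that estimation step only. It assumes per-word boxes in (left, top, width, height) pixel form, as Tesseract reports them, and it assumes a plain least-squares line fit. The function name `estimate_skew_from_word_lefts` is hypothetical. It is not the project's `deskew_image_by_word_alignment()`, whose actual method may differ.

```python
import math
from typing import Sequence, Tuple

# Assumed box format: (left, top, width, height) in pixels, one per word.
Box = Tuple[float, float, float, float]


def estimate_skew_from_word_lefts(boxes: Sequence[Box]) -> float:
    """Hypothetical helper: estimate page skew in degrees from word starts.

    Fits x_left = slope * y_center + intercept by ordinary least squares
    and returns atan(slope) in degrees, i.e. the tilt of the column's left
    margin relative to vertical. Positive means lower words start further right.
    """
    if len(boxes) < 2:
        return 0.0  # not enough points to fit a line

    xs = [left for left, _, _, _ in boxes]
    ys = [top + height / 2 for _, top, _, height in boxes]  # vertical centers
    n = len(boxes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_y == 0:
        return 0.0  # all words on one line: slope is undefined

    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_y
    return math.degrees(math.atan(slope))
```

The angle depends only on the slope, which is a ratio of distances. A pass on a half-size image, as the commit describes for speed, therefore yields the same angle as the full-resolution scan. A production version would likely also discard outliers, such as indented or wrapped entries, before fitting.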