-
f615c5f66d
feat(ocr-pipeline): generic header/footer detection via projection gap analysis
Benjamin Admin
2026-03-02 16:13:48 +01:00
-
a052f73de3
fix(ocr-pipeline): pass left_x/right_x to classify_column_types in API path
Benjamin Admin
2026-03-02 15:42:39 +01:00
-
34ccdd5fd1
feat(ocr-pipeline): filter scan artifacts in content bounds and add margin regions
Benjamin Admin
2026-03-02 15:29:18 +01:00
-
e718353d9f
feat(ocr-pipeline): 6 systematic improvements for robustness, performance & UX
Benjamin Admin
2026-03-02 14:46:38 +01:00
-
c3a924a620
fix(ocr-pipeline): merge phonetic-only rows and fix bracket noise filter
Benjamin Admin
2026-03-02 14:14:20 +01:00
-
650f15bc1b
fix(ocr-pipeline): tolerate dictionary punctuation in noise filter
Benjamin Admin
2026-03-02 13:12:40 +01:00
-
40a77a82f6
fix(ocr-pipeline): use midpoint boundaries for column word assignment
Benjamin Admin
2026-03-02 12:53:56 +01:00
-
87931c35e4
fix(ocr-pipeline): stop noise filter from stripping parenthesized words
Benjamin Admin
2026-03-02 12:51:28 +01:00
-
29b1d95acc
fix(ocr-pipeline): improve word-column assignment and LLM review accuracy
Benjamin Admin
2026-03-02 12:40:26 +01:00
-
dbf0db0c13
feat(ocr-pipeline): improve LLM review UI + add reconstruction step
Benjamin Admin
2026-03-02 12:19:21 +01:00
-
2a493890b6
feat(ocr-pipeline): add SSE streaming and phonetic filter to LLM review
Benjamin Admin
2026-03-02 11:46:06 +01:00
-
e171a736e7
fix(ocr-pipeline): increase LLM timeout to 300s and disable qwen3 thinking
Benjamin Admin
2026-03-02 11:31:03 +01:00
-
938d1d69cf
feat(ocr-pipeline): add LLM-based OCR correction step (Step 6)
Benjamin Admin
2026-03-02 11:13:17 +01:00
-
e9f368d3ec
feat(ocr-pipeline): add abbreviation allowlist to noise filter
Benjamin Admin
2026-03-02 10:46:54 +01:00
-
3028f421b4
feat(ocr-pipeline): add cell text noise filter for OCR artifacts
Benjamin Admin
2026-03-02 10:19:31 +01:00
-
2b1c499d54
fix(ocr-pipeline): filter OCR noise from image areas and artifacts
Benjamin Admin
2026-03-02 09:56:54 +01:00
-
72cc77dcf4
fix(ocr-pipeline): cells = result, no post-processing content shuffling
Benjamin Admin
2026-03-02 09:41:30 +01:00
-
e3f939a628
refactor(ocr-pipeline): make post-processing fully generic
Benjamin Admin
2026-03-02 09:27:30 +01:00
-
6bca3370e0
fix(ocr-pipeline): fix vocab post-processing destroying correct cell results
Benjamin Admin
2026-03-02 09:16:50 +01:00
-
befc44d2dd
perf(ocr-pipeline): limit cell-OCR fallback to EN/DE columns only
Benjamin Admin
2026-03-02 09:01:08 +01:00
-
6db3c02db4
fix(admin-lehrer): force unique build ID to bust browser caches
Benjamin Admin
2026-03-02 08:54:05 +01:00
-
8f2c2e8f68
feat(ocr-pipeline): hybrid word-lookup with cell-OCR fallback
Benjamin Admin
2026-03-02 08:21:12 +01:00
-
50ad06f43a
fix(ocr-pipeline): always run fresh word detection, skip stale cache
Benjamin Admin
2026-03-02 08:05:13 +01:00
-
2c4160e4c4
fix(ocr-pipeline): exclusive word-to-column assignment prevents duplicates
Benjamin Admin
2026-03-02 07:54:45 +01:00
-
9bbde1c03e
fix(ocr-pipeline): re-populate row.words for word-lookup in Step 5
Benjamin Admin
2026-03-02 07:38:33 +01:00
-
77869e32f4
feat(ocr-pipeline): use word-lookup instead of cell-OCR for cell grid
Benjamin Admin
2026-03-02 07:24:46 +01:00
-
89b5f49918
fix(ocr-pipeline): filter phantom rows with word_count=0 from cell grid
Benjamin Admin
2026-03-01 18:40:13 +01:00
-
7f27783008
feat(ocr-pipeline): add SSE streaming for word recognition (Step 5)
Benjamin Admin
2026-03-01 17:54:20 +01:00
-
a666e883da
fix(ocr-pipeline): exclude header/footer/page_ref from cell grid columns
Benjamin Admin
2026-03-01 17:33:48 +01:00
-
27b895a848
feat(ocr-pipeline): generic cell-grid with optional vocab mapping
Benjamin Admin
2026-03-01 17:22:56 +01:00
-
3bcb7aa638
fix(ocr-pipeline): remove overzealous grid row count validation
Benjamin Admin
2026-03-01 13:01:27 +01:00
-
c4f2e6554e
fix(ocr-pipeline): prevent grid from producing more rows than gap-based
Benjamin Admin
2026-03-01 12:52:41 +01:00
-
8e861e5a4d
fix(ocr-pipeline): use gap-based row height for cluster tolerance
Benjamin Admin
2026-03-01 12:34:15 +01:00
-
4970ca903e
fix(ocr-pipeline): invalidate downstream results when steps are re-run
Benjamin Admin
2026-03-01 12:24:44 +01:00
-
97d4355aa9
fix(ocr-pipeline): group words by vertical center, merge close clusters
Benjamin Admin
2026-03-01 12:14:42 +01:00
-
8ad5823fd8
feat(ocr-pipeline): word-center grid with section-break detection
Benjamin Admin
2026-03-01 12:04:08 +01:00
-
ec47045c15
feat(ocr-pipeline): uniform grid regularization for row detection (Step 7)
Benjamin Admin
2026-03-01 11:50:50 +01:00
-
ba65e47654
feat(ocr-pipeline): move oversized row splitting from Step 5 to Step 4
Benjamin Admin
2026-03-01 11:46:18 +01:00
-
8507e2e035
fix(ocr-pipeline): split oversized cells before OCR to capture all text
Benjamin Admin
2026-03-01 11:32:10 +01:00
-
854d8b431b
feat(rag-qa): add 14 missing PDF mappings for EDPB, ENISA, EDPS, TMG, UrhG
Benjamin Admin
2026-03-01 11:10:09 +01:00
-
f2521d2b9e
feat(ocr-pipeline): British/American IPA pronunciation choice
Benjamin Admin
2026-03-01 11:08:52 +01:00
-
954d21e469
fix: use local Inter font to avoid Google Fonts timeout in Docker build
Benjamin Admin
2026-02-28 21:26:34 +01:00
-
010616be5a
fix(ocr-pipeline): generic example attachment + cell padding
Benjamin Admin
2026-02-28 21:24:28 +01:00
-
e3aa8e899e
feat(rag-qa): add fullscreen mode for split-view chunk browser
Benjamin Admin
2026-02-28 21:23:32 +01:00
-
266b9dfad3
Fix PDF 404: default to bp_compliance_ce collection, add PDF existence check
Benjamin Admin
2026-02-28 21:13:26 +01:00
-
ab294d5a6f
feat(ocr-pipeline): deterministic post-processing pipeline
Benjamin Admin
2026-02-28 21:00:09 +01:00
-
b48cd8bb46
Fix ChunkBrowserQA layout: proper height constraints, remove bottom nav duplication
Benjamin Admin
2026-02-28 20:24:50 +01:00
-
d481e0087b
deps: add eng-to-ipa for IPA dictionary lookup
Benjamin Admin
2026-02-28 20:23:40 +01:00
-
f7e0f2bb4f
feat(ocr-pipeline): line breaks, hyphen rejoin & oversized row splitting
Benjamin Admin
2026-02-28 18:49:28 +01:00
-
e7fb9d59f1
Fix ChunkBrowserQA: use regulation_id from Qdrant payload instead of regulation_code
Benjamin Admin
2026-02-28 18:22:12 +01:00
-
859342300e
fix(ocr-pipeline): configure RapidOCR for German + tighter word detection
Benjamin Admin
2026-02-28 18:17:49 +01:00
-
8c42fefa77
feat(rag): add QA Split-View Chunk-Browser for ingestion verification
Benjamin Admin
2026-02-28 17:46:11 +01:00
-
984dfab975
fix(ocr-pipeline): add libgl1 for RapidOCR OpenCV dependency
Benjamin Admin
2026-02-28 17:30:12 +01:00
-
45435f226f
feat(ocr-pipeline): line grouping fix + RapidOCR integration
Benjamin Admin
2026-02-28 17:13:58 +01:00
-
4ec7c20490
feat(ocr-pipeline): add rapidocr + onnxruntime to requirements
Benjamin Admin
2026-02-28 17:08:21 +01:00
-
17604b8eb2
test: add tests for API proxy scroll/collection-count and Chunk-Browser logic
Benjamin Admin
2026-02-28 16:46:42 +01:00
-
f39314fb27
docs: add Chunk-Browser documentation
Benjamin Admin
2026-02-28 09:50:36 +01:00
-
356d39d6ee
fix(ocr-pipeline): use PSM 6 (block) for multi-line cell OCR in word grid
Benjamin Admin
2026-02-28 09:40:04 +01:00
-
491df4e1b0
feat: add Chunk-Browser tab to RAG page
Benjamin Admin
2026-02-28 09:35:52 +01:00
-
954103cdf2
feat(ocr-pipeline): add Step 5 word recognition (grid from columns × rows)
Benjamin Admin
2026-02-28 02:18:29 +01:00
-
47dc2e6f7a
feat(rag): source URLs, low-chunk warnings & IFRS/EFRAG entries
Benjamin Admin
2026-02-28 01:56:09 +01:00
-
203b3c0e2d
fix(ocr-pipeline): mask out images in row detection horizontal projection
Benjamin Admin
2026-02-28 01:39:20 +01:00
-
b58aecd081
feat(ocr-pipeline): add Step 4 row detection UI in admin frontend
Benjamin Admin
2026-02-28 01:28:05 +01:00
-
04b83d5f46
feat(ocr-pipeline): add row detection step with horizontal gap analysis
Benjamin Admin
2026-02-28 01:14:31 +01:00
-
c7ae44ff17
feat(rag): add 42 new regulations to RAG overview + update collection totals
Benjamin Admin
2026-02-28 01:04:27 +01:00
-
ce0815007e
feat(ocr-pipeline): replace clustering column detection with whitespace-gap analysis
Benjamin Admin
2026-02-28 00:36:28 +01:00
-
b03cb0a1e6
Fix Landkarte tab crash: variable name shadowed isInRag function
Benjamin Admin
2026-02-28 00:01:01 +01:00
-
5a45cbf605
Update RAG page: Chunks/Status columns use hardcoded data, Key Intersections show RAG status
Benjamin Admin
2026-02-27 23:53:21 +01:00
-
164b35c06a
fix(ocr-pipeline): tighten page_ref constraints based on live testing
Benjamin Admin
2026-02-27 23:33:11 +01:00
-
2297f66edb
feat(rag): Add RAG status indicators and 4 new EU regulations
Benjamin Admin
2026-02-27 23:23:52 +01:00
-
db8327f039
fix(ocr-pipeline): tune column detection based on GT comparison
Benjamin Admin
2026-02-27 23:16:31 +01:00
-
587b066a40
feat(ocr-pipeline): ground-truth comparison tool for column detection
Benjamin Admin
2026-02-27 22:48:37 +01:00
-
03fa186fec
fix(ocr-pipeline): increase merge distance to 6% for better column merging
Benjamin Admin
2026-02-27 20:19:09 +01:00
-
1040729874
fix(ocr-pipeline): avoid backslash in f-string for Python 3.11 compat
Benjamin Admin
2026-02-27 20:06:20 +01:00
-
4f37afa222
feat(ocr-pipeline): verticality filter for column detection
Benjamin Admin
2026-02-27 19:57:13 +01:00
-
bb879a03a8
feat(ocr-pipeline): add column_ignore type for margins/empty areas
Benjamin Admin
2026-02-27 08:51:56 +01:00
-
f535d3c967
fix(ocr-pipeline): manual editor layout + no re-detection on cached result
Benjamin Admin
2026-02-27 08:45:49 +01:00
-
7a3570fe46
feat(ocr-pipeline): manual column editor for Step 3
Benjamin Admin
2026-02-27 08:27:54 +01:00
-
1393a994f9
Flexible content-based column detection (2-phase)
Benjamin Admin
2026-02-26 23:33:35 +01:00
-
cf27a95308
feat(ocr-pipeline): word-based 5-column detection for vocabulary pages
Benjamin Admin
2026-02-26 23:08:14 +01:00
-
aa06ae0f61
feat: persistent sessions (PostgreSQL) + column detection (Step 3)
Benjamin Admin
2026-02-26 22:16:37 +01:00
-
09b820efbe
refactor(dewarp): replace displacement map with affine shear correction
Benjamin Admin
2026-02-26 18:23:04 +01:00
-
ff2bb79a91
fix(dewarp): change manual slider to percentage (0-200%) instead of raw multiplier
Benjamin Admin
2026-02-26 18:10:34 +01:00
-
fb496c5e34
perf(klausur-service): split Dockerfile into base + app layer
Benjamin Admin
2026-02-26 17:43:24 +01:00
-
9df745574b
fix(ocr-pipeline): dewarp visibility, grid on both sides, session persistence
Benjamin Admin
2026-02-26 17:29:53 +01:00
-
44e8c573af
fix: restrict deskew ground-truth question to rotation
Benjamin Admin
2026-02-26 17:16:24 +01:00
-
589d2f811a
feat: dewarp correction as step 2 in OCR pipeline (7 steps)
Benjamin Admin
2026-02-26 16:46:41 +01:00
-
d552fd8b6b
feat: OCR pipeline with 6-step wizard for page reconstruction
Benjamin Admin
2026-02-26 15:38:08 +01:00
-
e7b6654b85
docs: update CLAUDE.md for direct MacBook development workflow
Benjamin Admin
2026-02-25 23:09:42 +01:00
-
41a8f3b183
feat: add Coolify deployment configuration
Sharang Parnerkar
2026-02-25 10:43:15 +01:00
-
414e0f5ec0
feat: edu-search-service migrated, voice-service/geo-service removed
Benjamin Boenisch
2026-02-15 18:36:38 +01:00
-
d4e1d6bab6
fix: correct gradeToOberstufenPoints formula for grades < 2.0
Benjamin Boenisch
2026-02-15 17:32:24 +01:00
-
ccff37f91b
fix(ci): replace actions/checkout with manual git clone
Benjamin Boenisch
2026-02-15 16:58:30 +01:00
-
ce32850b09
fix(ci): use docker runner label instead of ubuntu-latest
Benjamin Boenisch
2026-02-15 16:53:32 +01:00
-
50d0e65ef1
ci: add Gitea Actions workflow for external CI
Benjamin Boenisch
2026-02-15 16:39:00 +01:00
-
4fd4b08f75
remove: geo-service removed completely
Benjamin Boenisch
2026-02-15 16:15:29 +01:00
-
e76ae5d510
fix: agent-core test failures in session_manager and message_bus
Benjamin Boenisch
2026-02-15 13:50:54 +01:00
-
0dae5da405
fix: install geo-service packages individually (rasterio needs GDAL)
Benjamin Boenisch
2026-02-15 13:30:00 +01:00
-
5ff2c8bad4
refactor: voice-service removed (moved to breakpilot-core)
Benjamin Boenisch
2026-02-15 13:26:07 +01:00
-
d075973a08
fix: geo-service and agent-core test imports in pipeline
Benjamin Boenisch
2026-02-15 13:15:13 +01:00