feat(ocr-pipeline): line grouping fix + RapidOCR integration

Fix A: Use _group_words_into_lines() with adaptive Y-tolerance to
correctly order words in multi-line cells (fixes word reordering bug).
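
The grouping step described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the word-box keys (`text`, `left`, `top`, `height`), the `tolerance_factor` parameter, and the choice of "a fraction of the median word height" as the adaptive tolerance are all assumptions.

```python
from statistics import median


def group_words_into_lines(words, tolerance_factor=0.5):
    """Group OCR word boxes into lines (top to bottom), each sorted left to right.

    Hypothetical sketch: `words` is a list of dicts with the assumed keys
    "text", "left", "top" and "height".
    """
    if not words:
        return []

    # Adaptive Y-tolerance (assumption): scale with the median word height
    # so the threshold follows the font size instead of a fixed pixel value.
    y_tol = max(1.0, tolerance_factor * median(w["height"] for w in words))

    def center_y(word):
        return word["top"] + word["height"] / 2

    lines = []
    for word in sorted(words, key=center_y):
        cy = center_y(word)
        if lines and abs(cy - lines[-1]["y"]) <= y_tol:
            line = lines[-1]
            line["words"].append(word)
            # Keep a running mean of the line's vertical center.
            line["y"] += (cy - line["y"]) / len(line["words"])
        else:
            lines.append({"y": cy, "words": [word]})

    # Within each line, restore reading order by horizontal position.
    return [sorted(line["words"], key=lambda w: w["left"]) for line in lines]
```

Sorting by vertical center first and by `left` only within a line is what prevents a slightly lower word on the first line from being emitted after a word on the second line.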

RapidOCR: Add as an alternative OCR engine (PaddleOCR models on ONNX
Runtime, native ARM64). The engine is selectable via a dropdown in the
UI or the ?engine= query param. Auto mode prefers RapidOCR when it is
available.
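
The engine selection described above could look roughly like this. It is a sketch under stated assumptions, not the commit's implementation: the function names `engine_from_url` and `resolve_engine`, the engine identifiers `"rapidocr"` and `"tesseract"`, and the fallback to the first available engine are all hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def engine_from_url(url):
    """Read the ?engine= query parameter, defaulting to "auto" (assumed default)."""
    values = parse_qs(urlparse(url).query).get("engine", ["auto"])
    return values[0].strip().lower()


def resolve_engine(requested, available, preferred="rapidocr"):
    """Map a requested engine name to one that is actually available.

    `available` is an ordered list of engine names (hypothetical identifiers).
    In auto mode the preferred engine wins when present; otherwise the first
    available engine is used (assumption).
    """
    choice = (requested or "auto").strip().lower()
    if choice == "auto":
        if preferred in available:
            return preferred
        if not available:
            raise ValueError("no OCR engine available")
        return available[0]
    if choice not in available:
        raise ValueError(f"OCR engine not available: {choice}")
    return choice
```

For example, `resolve_engine(engine_from_url(url), available)` would return the engine to report back in a field such as `ocr_engine`.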

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-02-28 17:13:58 +01:00
parent 4ec7c20490
commit 45435f226f
4 changed files with 180 additions and 17 deletions

@@ -143,6 +143,7 @@ export interface WordResult {
   image_width: number
   image_height: number
   duration_seconds: number
+  ocr_engine?: string
   summary: {
     total_entries: number
     with_english: number