fix: also store word_boxes for broad columns (full-page OCR)

word_boxes were set only in the cell-crop path (narrow columns), not in the full-page word-assignment path (broad columns). The Tesseract word coordinates are now stored in both paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 20:41:29 +01:00
parent 35f2706098
commit 7d19145edb
1 changed file with 14 additions and 0 deletions
@@ -458,6 +458,20 @@ def build_cell_grid_v2(
                    'ocr_engine': 'word_lookup',
                    'is_bold': False,
                }
+                # Store word bounding boxes for pixel-accurate overlay
+                if words and text.strip():
+                    cell['word_boxes'] = [
+                        {
+                            'text': w.get('text', ''),
+                            'left': w['left'],
+                            'top': w['top'],
+                            'width': w['width'],
+                            'height': w['height'],
+                            'conf': w.get('conf', 0),
+                        }
+                        for w in words
+                        if w.get('text', '').strip()
+                    ]
                cells.append(cell)

    # --- Phase 2: Narrow columns via cell-crop OCR (parallel) ---
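
The added hunk can be exercised in isolation with a small sketch. The helper name `build_word_boxes` and the sample data are hypothetical and not part of the commit; the sketch only assumes that each word is a dict carrying `left`, `top`, `width`, `height`, and optionally `text` and `conf`, as the diff suggests.

```python
from typing import Any, Dict, List


def build_word_boxes(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the list comprehension added in the diff."""
    return [
        {
            'text': w.get('text', ''),
            'left': w['left'],
            'top': w['top'],
            'width': w['width'],
            'height': w['height'],
            # A missing confidence value falls back to 0, as in the diff.
            'conf': w.get('conf', 0),
        }
        for w in words
        # Words whose text is empty or whitespace-only are skipped.
        if w.get('text', '').strip()
    ]


# Made-up sample words; the coordinates are illustrative only.
sample_words = [
    {'text': 'Haus', 'left': 10, 'top': 5, 'width': 40, 'height': 12, 'conf': 96},
    {'text': '  ', 'left': 55, 'top': 5, 'width': 4, 'height': 12, 'conf': 0},
    {'text': 'house', 'left': 60, 'top': 5, 'width': 48, 'height': 12},
]

boxes = build_word_boxes(sample_words)
```

In this sketch the whitespace-only word is dropped and the word without a `conf` key receives `0`. Note that in the diff the assignment is additionally guarded by `if words and text.strip()`, so a cell with no recognized text gets no `word_boxes` key at all.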