fix: also store word_boxes for broad columns (full-page OCR)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m3s
CI / test-python-agent-core (push) Successful in 20s
CI / test-nodejs-website (push) Successful in 21s
word_boxes were only set in the cell-crop path (narrow columns), not in the full-page word-assignment path (broad columns). The Tesseract word coordinates are now stored in both paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -458,6 +458,20 @@ def build_cell_grid_v2(
                 'ocr_engine': 'word_lookup',
                 'is_bold': False,
             }
+            # Store word bounding boxes for pixel-accurate overlay
+            if words and text.strip():
+                cell['word_boxes'] = [
+                    {
+                        'text': w.get('text', ''),
+                        'left': w['left'],
+                        'top': w['top'],
+                        'width': w['width'],
+                        'height': w['height'],
+                        'conf': w.get('conf', 0),
+                    }
+                    for w in words
+                    if w.get('text', '').strip()
+                ]
             cells.append(cell)
 
     # --- Phase 2: Narrow columns via cell-crop OCR (parallel) ---
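Review note: since both paths now store the same structure, the list comprehension could perhaps live in one shared helper. The following is only a minimal sketch, assuming the word dicts carry the keys used in the diff above (`text`, `left`, `top`, `width`, `height`, `conf`). The name `extract_word_boxes` and the sample data are hypothetical and not part of this commit.

```python
from typing import Any, Dict, List


def extract_word_boxes(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: copy box fields of all non-empty words.

    Mirrors the comprehension added in this commit; words whose text is
    empty or whitespace-only are skipped, and a missing 'conf' becomes 0.
    """
    return [
        {
            'text': w.get('text', ''),
            'left': w['left'],
            'top': w['top'],
            'width': w['width'],
            'height': w['height'],
            'conf': w.get('conf', 0),
        }
        for w in words
        if w.get('text', '').strip()
    ]


# Made-up sample input in the assumed word-dict shape.
sample_words = [
    {'text': 'Haus', 'left': 10, 'top': 20, 'width': 40, 'height': 12, 'conf': 96},
    {'text': '   ', 'left': 55, 'top': 20, 'width': 5, 'height': 12, 'conf': 0},
    {'text': 'house', 'left': 70, 'top': 21, 'width': 48, 'height': 12},
]
boxes = extract_word_boxes(sample_words)
```

With this sample, the whitespace-only entry is dropped and the word without a `conf` key gets the default. The guard `if words and text.strip()` from the diff would stay at the call site, so that `cell['word_boxes']` is only set for cells that actually contain text.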