Fundamentally rearchitect build_cell_grid_v2 to combine the best of both approaches:

- Broad columns (>15% image width): Use full-page Tesseract word assignment. Handles IPA brackets, punctuation, sentence flow, and ellipsis correctly. No garbled phonetics.
- Narrow columns (<15% image width): Use isolated cell-crop OCR to prevent neighbour bleeding from adjacent broad columns.

This eliminates the need for complex phonetic bracket replacement on broad columns since full-page Tesseract reads them correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
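
A minimal sketch of the width-based dispatch described above, for illustration only. It is not the actual build_cell_grid_v2 code: the `Column` type, the function names, and the strategy labels are hypothetical, and only the 15% threshold comes from this message. The message does not say how a column at exactly 15% is handled; the sketch assumes it counts as narrow.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Threshold taken from the commit message: 15% of the image width.
BROAD_COLUMN_RATIO = 0.15


@dataclass
class Column:
    """Hypothetical column description: left edge and width in pixels."""
    x: int
    width: int


def choose_ocr_strategy(column: Column, image_width: int) -> str:
    """Return 'full_page' for broad columns and 'cell_crop' for narrow ones."""
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    ratio = column.width / image_width
    # Assumption: a column at exactly 15% is treated as narrow.
    return "full_page" if ratio > BROAD_COLUMN_RATIO else "cell_crop"


def plan_columns(columns: List[Column], image_width: int) -> List[Tuple[Column, str]]:
    """Pair every column with the OCR strategy chosen for it."""
    return [(col, choose_ocr_strategy(col, image_width)) for col in columns]


if __name__ == "__main__":
    demo = [Column(0, 100), Column(100, 500), Column(600, 150)]
    for col, strategy in plan_columns(demo, image_width=1000):
        print(col, strategy)
```

Keeping the decision in one small function means the threshold can be tuned or tested without touching either OCR path.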