When PaddleOCR returns "!Betonung" as a single word box, the overlay
positions text starting at the "!" instead of the actual word. Split
such boxes into ["!", "Betonung"] with proportional position splitting,
matching the existing IPA bracket splitting logic.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PaddleOCR returns "badge[bxd3]" without space, but the IPA fixer
produces "badge [bˈædʒ]" with space, creating a token count mismatch
between cell.text and word_boxes. Now also split at "[" boundaries
so each IPA bracket gets its own sub-box.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PaddleOCR returns phrase-level bounding boxes (e.g. "competition
[kompa'tifn]" as one box) but the overlay slide mechanism expects
one box per word for accurate positioning. Multi-word boxes are now
split proportionally by character count with small gaps between words.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
words_first was storing word_boxes in percent coordinates while
cv_cell_grid.py uses absolute pixel coordinates. The overlay slide
mechanism divides by imgW to get percentages, so percent-in-percent
caused positions near zero. Now both grid builders use the same format.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>