fix: sort word_boxes in reading order (Y-grouped, then X-sorted)

Words on the same visual line can have slightly different top values (1-6px), so sorting by (top, left) produced the wrong word order in the frontend display. The code now uses _group_words_into_lines to group words by Y proximity first, then sorts by X within each line.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 10:41:30 +01:00
parent 2b73d9beec
commit aae8a96aa2
1 changed file with 9 additions and 1 deletion
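
The failure mode described in the commit message can be reproduced with a small sketch. The real _group_words_into_lines is not shown in this commit, so group_words_into_lines_sketch below is a hypothetical stand-in that only assumes the same call shape (a list of word dicts with 'top' and 'left', plus y_tolerance_px). The sample words and the box height bh = 20 are made up for illustration.

```python
def group_words_into_lines_sketch(words, y_tolerance_px=10):
    """Hypothetical stand-in for _group_words_into_lines.

    Groups words whose 'top' lies within y_tolerance_px of the first
    word of the current line, then sorts each line by 'left'.
    The project's actual implementation may differ.
    """
    lines = []
    for w in sorted(words, key=lambda ww: ww['top']):
        if lines and abs(w['top'] - lines[-1][0]['top']) <= y_tolerance_px:
            lines[-1].append(w)   # close enough in Y: same visual line
        else:
            lines.append([w])     # too far below: start a new line
    return [sorted(line, key=lambda ww: ww['left']) for line in lines]


# Made-up OCR output: three words on one visual line with 1-3px top jitter,
# and one word on the next line.
cell_words = [
    {'text': 'quick', 'top': 100, 'left': 60},
    {'text': 'the',   'top': 103, 'left': 10},
    {'text': 'fox',   'top': 101, 'left': 120},
    {'text': 'jumps', 'top': 140, 'left': 10},
]

# Old behaviour: plain (top, left) sort lets the jitter decide the order.
naive_order = [w['text'] for w in
               sorted(cell_words, key=lambda ww: (ww['top'], ww['left']))]

# New behaviour: group by Y first, then sort by X within each line.
bh = 20  # assumed box height in pixels
y_tol_wb = max(10, int(bh * 0.4))
reading_lines = group_words_into_lines_sketch(cell_words, y_tolerance_px=y_tol_wb)
reading_order = [w['text'] for line in reading_lines for w in line]

print(naive_order)    # ['quick', 'fox', 'the', 'jumps']
print(reading_order)  # ['the', 'quick', 'fox', 'jumps']
```

With these inputs the plain sort puts "quick" first because its top value happens to be 3px smaller than that of "the", while the grouped version restores left-to-right order within the line.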
@@ -191,8 +191,16 @@ def _build_cells(
        # but the overlay slide mechanism expects one box per word. Split multi-word
        # boxes into individual word positions proportional to character length.
        # Also split at "[" boundaries (IPA patterns like "badge[bxd3]").
        #
        # Sort in reading order: group by Y (same visual line), then sort by X.
        # Simple (top, left) sort fails when words on the same line have slightly
        # different top values (1-6px), causing wrong word order.
        y_tol_wb = max(10, int(bh * 0.4))
        reading_lines = _group_words_into_lines(cell_words, y_tolerance_px=y_tol_wb)
        ordered_cell_words = [w for line in reading_lines for w in line]
        word_boxes = []
-        for w in sorted(cell_words, key=lambda ww: (ww['top'], ww['left'])):
+        for w in ordered_cell_words:
            raw_text = w.get('text', '').strip()
            # Split by whitespace, at "[" boundaries (IPA), and after leading "!"
            # e.g. "badge[bxd3]" → ["badge", "[bxd3]"]