fix: sort word_boxes in reading order (Y-grouped, then X-sorted)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m0s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 21s
Words on the same visual line can have slightly different top values (1-6px). Sorting by (top, left) produced wrong word order in the frontend display. Now uses _group_words_into_lines to group by Y proximity first, then sort by X within each line.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
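
To illustrate the failure described in the commit message, here is a minimal sketch with made-up word boxes. The dict keys `text`, `top`, and `left` match the ones used in the diff; the pixel values are hypothetical and chosen only so that the three words sit on one visual line with a few pixels of vertical jitter.

```python
# Hypothetical OCR word boxes on ONE visual line (values are illustrative only).
words = [
    {"text": "the", "top": 103, "left": 10},
    {"text": "quick", "top": 100, "left": 60},
    {"text": "fox", "top": 105, "left": 130},
]

# The old approach: a plain lexicographic sort on (top, left).
naive = sorted(words, key=lambda w: (w["top"], w["left"]))
naive_text = [w["text"] for w in naive]

# Reading order for a single line is simply left to right.
reading_text = [w["text"] for w in sorted(words, key=lambda w: w["left"])]

print(naive_text)    # ['quick', 'the', 'fox']
print(reading_text)  # ['the', 'quick', 'fox']
```

Because `top` is compared first, a 3px difference outranks any difference in `left`, so "quick" jumps ahead of "the" even though it sits further right.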
@@ -191,8 +191,16 @@ def _build_cells(
     # but the overlay slide mechanism expects one box per word. Split multi-word
     # boxes into individual word positions proportional to character length.
     # Also split at "[" boundaries (IPA patterns like "badge[bxd3]").
+    #
+    # Sort in reading order: group by Y (same visual line), then sort by X.
+    # Simple (top, left) sort fails when words on the same line have slightly
+    # different top values (1-6px), causing wrong word order.
+    y_tol_wb = max(10, int(bh * 0.4))
+    reading_lines = _group_words_into_lines(cell_words, y_tolerance_px=y_tol_wb)
+    ordered_cell_words = [w for line in reading_lines for w in line]
+
     word_boxes = []
-    for w in sorted(cell_words, key=lambda ww: (ww['top'], ww['left'])):
+    for w in ordered_cell_words:
         raw_text = w.get('text', '').strip()
         # Split by whitespace, at "[" boundaries (IPA), and after leading "!"
         # e.g. "badge[bxd3]" → ["badge", "[bxd3]"]
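
The reordering in the hunk above can be sketched as follows. The body of `_group_words_into_lines` is not part of this diff, so `group_words_into_lines_sketch` below is a hypothetical stand-in, not the project's implementation: it assumes a word joins the current line when its `top` lies within `y_tolerance_px` of that line's first (topmost) word. `bh` is not defined in the visible hunk either; it is treated here as a height in pixels, and `y_tolerance_sketch` only mirrors the `max(10, int(bh * 0.4))` expression.

```python
from typing import Any, Dict, List

Word = Dict[str, Any]


def y_tolerance_sketch(bh: int) -> int:
    # Mirrors the expression in the diff: 40% of bh, but never below 10px.
    return max(10, int(bh * 0.4))


def group_words_into_lines_sketch(words: List[Word], y_tolerance_px: int) -> List[List[Word]]:
    """Hypothetical stand-in: group by Y proximity, then sort each line by X."""
    lines: List[List[Word]] = []
    for w in sorted(words, key=lambda ww: ww["top"]):
        # Compare against the first word of the current line (assumed rule).
        if lines and abs(w["top"] - lines[-1][0]["top"]) <= y_tolerance_px:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [sorted(line, key=lambda ww: ww["left"]) for line in lines]


# Illustrative data: two visual lines with a few pixels of vertical jitter.
cell_words = [
    {"text": "world", "top": 102, "left": 80},
    {"text": "hello", "top": 105, "left": 10},
    {"text": "again", "top": 131, "left": 70},
    {"text": "once", "top": 128, "left": 10},
]

y_tol = y_tolerance_sketch(20)  # max(10, 8) == 10
reading_lines = group_words_into_lines_sketch(cell_words, y_tolerance_px=y_tol)
ordered_text = [w["text"] for line in reading_lines for w in line]
print(ordered_text)  # ['hello', 'world', 'once', 'again']
```

Anchoring each line on its topmost word keeps the sketch short; a real implementation might compare against a running average or the box centre instead, and that choice affects how skewed lines are handled.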