fix(sub-columns): convert relative word positions to absolute coords for split
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m51s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 17s
Word 'left' values in ColumnGeometry.words are relative to the content ROI (left_x), but geo.x is in absolute image coordinates. The split position was computed from relative word positions and then compared against absolute geo.x, resulting in negative widths and no splits on real data. Pass left_x through to _detect_sub_columns to bridge the two coordinate systems.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
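The coordinate mismatch described above can be illustrated with a minimal sketch. It assumes a column geometry whose x and width are absolute image pixels and a split position measured in the ROI-relative frame of the word 'left' values. `ColumnGeo` and `sub_column_bounds` are hypothetical stand-ins, not the project's actual `ColumnGeometry` or `_detect_sub_columns`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ColumnGeo:
    """Hypothetical stand-in: x and width are absolute image pixels."""
    x: int
    width: int


def sub_column_bounds(
    geo: ColumnGeo, split_x_rel: float, left_x: int
) -> Optional[Tuple[Tuple[float, float], Tuple[float, float]]]:
    """Return ((x, width), (x, width)) for the left/right sub-columns, or None.

    split_x_rel is in the same frame as the word 'left' values, i.e.
    relative to the content ROI that starts at absolute x = left_x.
    """
    # Bridge the two coordinate systems: ROI-relative -> absolute image x.
    split_abs = split_x_rel + left_x
    left_width = split_abs - geo.x
    right_width = geo.x + geo.width - split_abs
    # A split is only usable if it falls strictly inside the column.
    if left_width <= 0 or right_width <= 0:
        return None
    return (geo.x, left_width), (split_abs, right_width)
```

With the numbers from the test below, `sub_column_bounds(ColumnGeo(310, 438), 202, left_x=0)` returns `None` because the left width would be 202 - 310 = -108, while passing `left_x=195` yields `((310, 87), (397, 351))`.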
@@ -1307,6 +1307,29 @@ class TestSubColumnDetection:
        assert len(result) == 1

    def test_sub_column_split_with_left_x_offset(self):
        """Word 'left' values are relative to left_x; geo.x is absolute.

        Real-world scenario: left_x=195, EN column at geo.x=310.
        Page refs at relative left=115-157, vocab words at relative left=216.
        Without left_x, split_x would be ~202 (< geo.x=310) → negative width → no split.
        With left_x=195, split_abs = 202 + 195 = 397, which is between geo.x(310)
        and geo.x+geo.width(748) → valid split.
        """
        content_w = 1469
        left_x = 195
        page_refs = [self._make_word(115, "p.59"), self._make_word(157, "p.60"),
                     self._make_word(157, "p.61")]
        vocab = [self._make_word(216, f"word{i}") for i in range(40)]
        all_words = page_refs + vocab
        geo = self._make_geo(x=310, width=438, words=all_words, content_w=content_w)

        result = _detect_sub_columns([geo], content_w, left_x=left_x)

        assert len(result) == 2, f"Expected 2 columns, got {len(result)}"
        assert result[0].word_count == 3
        assert result[1].word_count == 40


class TestCellsToVocabEntriesPageRef:
    """Test that page_ref cells are mapped to source_page field."""
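The test's expected counts (3 page refs, 40 vocab words) follow from partitioning the words at the split. The sketch below assumes words are assigned by comparing their relative 'left' against the relative split; `partition_words` is hypothetical, and the ~202 split value is taken from the test docstring rather than recomputed, since how `_detect_sub_columns` derives it is not shown in this diff. In this sketch both operands share the ROI-relative frame, so only comparisons against absolute geometry such as geo.x need the left_x offset.

```python
from typing import List, Tuple


def partition_words(lefts: List[int], split_x_rel: float) -> Tuple[List[int], List[int]]:
    """Hypothetical: split ROI-relative word 'left' values at a relative split.

    No left_x offset is applied because both sides of the comparison
    are already in the ROI-relative frame.
    """
    left_side = [x for x in lefts if x < split_x_rel]
    right_side = [x for x in lefts if x >= split_x_rel]
    return left_side, right_side


# Relative 'left' positions from the test scenario above.
page_ref_lefts = [115, 157, 157]
vocab_lefts = [216] * 40
left_part, right_part = partition_words(page_ref_lefts + vocab_lefts, 202)
print(len(left_part), len(right_part))  # prints "3 40"
```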