Fix colspan: use original words before split_cross_column_words
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 35s
_split_cross_column_words was destroying the colspan information by cutting word-blocks at column boundaries BEFORE _detect_colspan_cells could analyze them. _build_zone_grid now passes the original (pre-split) words to colspan detection and keeps using the split words for cell building.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1424,6 +1424,8 @@ def _build_zone_grid(
     # Split word boxes that straddle column boundaries (e.g. "sichzie"
     # spanning Col 1 + Col 2). Must happen after column detection and
     # before cell assignment.
+    # Keep original words for colspan detection (split destroys span info).
+    original_zone_words = zone_words
     if len(columns) >= 2:
         zone_words = _split_cross_column_words(zone_words, columns)
 
@@ -1431,11 +1433,11 @@ def _build_zone_grid(
     cells = _build_cells(zone_words, columns, rows, img_w, img_h)
 
     # --- Detect colspan (merged cells spanning multiple columns) ---
-    # A word-block that extends across column boundaries indicates a merged
-    # cell (like Excel cell-merge). Detect these and replace the split
-    # cells with a single spanning cell.
+    # Uses the ORIGINAL (pre-split) words to detect word-blocks that span
+    # multiple columns. _split_cross_column_words would have destroyed
+    # this information by cutting words at column boundaries.
     if len(columns) >= 2:
-        cells = _detect_colspan_cells(zone_words, columns, rows, cells, img_w, img_h)
+        cells = _detect_colspan_cells(original_zone_words, columns, rows, cells, img_w, img_h)
 
     # Prefix cell IDs with zone index
     for cell in cells:
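The ordering problem this commit addresses can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical and simplified: the `Word` and `Column` types, their field names, and the three helper functions are assumptions made for illustration and are not the repository's actual `_split_cross_column_words` or `_detect_colspan_cells` implementations. The sketch only shows why a span check has to see the words before they are cut at column boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    # Hypothetical word box: text plus horizontal extent in pixels.
    text: str
    left: int
    right: int


@dataclass(frozen=True)
class Column:
    # Hypothetical column: horizontal extent in pixels.
    x_min: int
    x_max: int


def columns_touched(word, columns):
    """Return indices of the columns that the word box overlaps horizontally."""
    return [i for i, c in enumerate(columns)
            if word.left < c.x_max and word.right > c.x_min]


def split_cross_column_words(words, columns):
    """Cut each word box at the column boundaries it straddles.

    Simplified: the text is copied to every fragment instead of being split.
    """
    result = []
    for w in words:
        touched = columns_touched(w, columns)
        if not touched:
            result.append(w)  # outside all columns: keep unchanged
            continue
        for i in touched:
            c = columns[i]
            result.append(Word(w.text, max(w.left, c.x_min), min(w.right, c.x_max)))
    return result


def detect_colspan_words(words, columns):
    """Return the words whose box overlaps two or more columns."""
    return [w for w in words if len(columns_touched(w, columns)) >= 2]


columns = [Column(0, 100), Column(100, 200)]
words = [Word("Überschrift", 20, 180), Word("a", 10, 40)]

# Keep a reference to the originals before splitting, as the commit does.
original_words = words
split_words = split_cross_column_words(words, columns)

spans_from_split = detect_colspan_words(split_words, columns)
spans_from_original = detect_colspan_words(original_words, columns)
```

In this sketch `spans_from_split` is empty because every fragment now fits inside a single column, while `spans_from_original` still contains the wide word. That mirrors the design choice in the diff: split words feed cell building, original words feed colspan detection.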