Fix colspan: use original words before split_cross_column_words
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 35s
_split_cross_column_words was destroying the colspan information by cutting word-blocks at column boundaries BEFORE _detect_colspan_cells could analyze them. _build_zone_grid now passes the original (pre-split) words to colspan detection and still uses the split words for cell building.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
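To make the failure mode concrete, here is a minimal sketch of why splitting before detection hides spanning words. The `Word` and `Column` classes and the helpers `split_at_boundaries` and `spanning_words` are hypothetical stand-ins, not the project's `_split_cross_column_words` or `_detect_colspan_cells`, whose bodies are not shown in this commit.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: int
    right: int


@dataclass
class Column:
    left: int
    right: int


def columns_touched(word, columns):
    """Return the indices of all columns the word box overlaps."""
    return [i for i, c in enumerate(columns)
            if word.left < c.right and word.right > c.left]


def split_at_boundaries(words, columns):
    """Hypothetical splitter: clip every word box to each column it overlaps.

    The text is copied unchanged to keep the toy short; a real splitter
    would also divide the text between the pieces.
    """
    pieces = []
    for w in words:
        for c in columns:
            left, right = max(w.left, c.left), min(w.right, c.right)
            if left < right:
                pieces.append(Word(w.text, left, right))
    return pieces


def spanning_words(words, columns):
    """Hypothetical detector: words whose box overlaps two or more columns."""
    return [w for w in words if len(columns_touched(w, columns)) >= 2]


columns = [Column(0, 100), Column(100, 200)]
words = [Word("Header", 20, 180), Word("a", 10, 40)]

split_words = split_at_boundaries(words, columns)
found_before_split = spanning_words(words, columns)       # sees "Header"
found_after_split = spanning_words(split_words, columns)  # sees nothing
```

After the split, every piece sits inside exactly one column, so a detector that looks for boxes crossing a boundary has nothing left to find. That is the ordering problem this commit addresses by handing the pre-split list to colspan detection.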
@@ -1424,6 +1424,8 @@ def _build_zone_grid(
     # Split word boxes that straddle column boundaries (e.g. "sichzie"
     # spanning Col 1 + Col 2). Must happen after column detection and
     # before cell assignment.
+    # Keep original words for colspan detection (split destroys span info).
+    original_zone_words = zone_words
     if len(columns) >= 2:
         zone_words = _split_cross_column_words(zone_words, columns)
 
@@ -1431,11 +1433,11 @@ def _build_zone_grid(
     cells = _build_cells(zone_words, columns, rows, img_w, img_h)
 
     # --- Detect colspan (merged cells spanning multiple columns) ---
-    # A word-block that extends across column boundaries indicates a merged
-    # cell (like Excel cell-merge). Detect these and replace the split
-    # cells with a single spanning cell.
+    # Uses the ORIGINAL (pre-split) words to detect word-blocks that span
+    # multiple columns. _split_cross_column_words would have destroyed
+    # this information by cutting words at column boundaries.
     if len(columns) >= 2:
-        cells = _detect_colspan_cells(zone_words, columns, rows, cells, img_w, img_h)
+        cells = _detect_colspan_cells(original_zone_words, columns, rows, cells, img_w, img_h)
 
     # Prefix cell IDs with zone index
     for cell in cells:
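One point worth checking when reading the diff: `original_zone_words = zone_words` binds a second name to the same list, it does not copy it. That is sufficient only if `_split_cross_column_words` returns a new list and leaves its input untouched, which the diff suggests (its result is reassigned) but does not show. The sketch below uses two hypothetical splitters over plain `(left, right)` tuples to illustrate the difference; neither is the project's function.

```python
def split_returning_new_list(words, boundary):
    """Hypothetical splitter that builds a new list; the input is untouched."""
    out = []
    for left, right in words:
        if left < boundary < right:
            out.extend([(left, boundary), (boundary, right)])
        else:
            out.append((left, right))
    return out


def split_in_place(words, boundary):
    """Hypothetical splitter that overwrites the caller's list and returns it."""
    words[:] = split_returning_new_list(words, boundary)
    return words


# Case 1: the splitter returns a new list, so a plain alias keeps the originals.
zone_words = [(20, 180)]
original_zone_words = zone_words
zone_words = split_returning_new_list(zone_words, 100)

# Case 2: the splitter mutates its argument, so an alias is split as well,
# and only an explicit copy taken beforehand keeps the pre-split boxes.
other_words = [(20, 180)]
alias = other_words
snapshot = list(other_words)
other_words = split_in_place(other_words, 100)
```

If the real function turned out to mutate the list (or the word objects inside it), taking `list(zone_words)` or a deep copy before the split would be the safer variant of the same fix.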