fix: border ghost filter + row overlap fix for box zones
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m53s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 17s
1. Add _filter_border_ghosts() to the grid editor. It removes OCR artefacts such as | that sit on box borders before row/column clustering. The tall | (h=55) was inflating row 0's y_max, causing row overlap.
2. Fix _assign_word_to_row() to prefer the closest y_center when rows overlap, instead of always returning the first matching row.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
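The body of _filter_border_ghosts() is not part of the hunk shown on this page, so the following is only a rough sketch of what such a filter might look like, not the committed implementation. The function name `filter_border_ghosts_sketch`, the `GHOST_CHARS` set, the `tolerance` parameter, and the `text`/`left`/`width` word fields and `left`/`width` box fields are all assumptions made for illustration; only `top` and `height` appear in the actual diff. For brevity it checks vertical box edges only.

```python
from typing import Dict, List

# Assumption: glyphs that OCR commonly produces from a vertical box border.
GHOST_CHARS = set('|¦')


def filter_border_ghosts_sketch(words: List[Dict], box: Dict,
                                tolerance: float = 5) -> List[Dict]:
    """Drop words made only of border-like glyphs that sit on a vertical box edge.

    Hypothetical sketch: the field names and the tolerance are assumptions.
    """
    left_edge = box['left']
    right_edge = box['left'] + box['width']
    kept = []
    for w in words:
        text = w['text'].strip()
        # A ghost candidate consists solely of border-like characters.
        is_ghost_text = bool(text) and all(ch in GHOST_CHARS for ch in text)
        # Distance from the word's horizontal center to the nearest vertical edge.
        x_center = w['left'] + w['width'] / 2
        edge_distance = min(abs(x_center - left_edge), abs(x_center - right_edge))
        if is_ghost_text and edge_distance <= tolerance:
            continue  # skip the ghost so it cannot inflate a row's y_max
        kept.append(w)
    return kept


# Hypothetical input: a tall "|" (h=55) on the left border, a real word,
# and a "|" well inside the box that should be kept.
box = {'left': 100, 'top': 0, 'width': 300, 'height': 200}
words = [
    {'text': '|', 'left': 98, 'top': 10, 'width': 4, 'height': 55},
    {'text': 'Hello', 'left': 120, 'top': 12, 'width': 40, 'height': 14},
    {'text': '|', 'left': 150, 'top': 12, 'width': 4, 'height': 14},
]
filtered = filter_border_ghosts_sketch(words, box)
```

Running the filter before clustering means the tall ghost never contributes to row 0's bounding range, which is the overlap cause described in item 1.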
@@ -134,12 +134,16 @@ def _assign_word_to_column(word: Dict, columns: List[Dict]) -> int:
 def _assign_word_to_row(word: Dict, rows: List[Dict]) -> int:
-    """Return row index for a word based on its Y-center."""
+    """Return row index for a word based on its Y-center.
+
+    When rows overlap (e.g. due to tall border-ghost characters inflating
+    a row's y_max), prefer the row whose y_center is closest.
+    """
     y_center = word['top'] + word['height'] / 2
-    # Find the row whose y_range contains this word's center
-    for row in rows:
-        if row['y_min'] <= y_center <= row['y_max']:
-            return row['index']
+    # Find all rows whose y_range contains this word's center
+    matching = [r for r in rows if r['y_min'] <= y_center <= r['y_max']]
+    if matching:
+        return min(matching, key=lambda r: abs(r['y_center'] - y_center))['index']
     # Fallback: nearest row by Y-center
     return min(rows, key=lambda r: abs(r['y_center'] - y_center))['index']
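To illustrate why the change matters, here is a small self-contained comparison of the two behaviours from the hunk above. The helper names `assign_first_match` and `assign_closest_match` and the row and word values are made up for the example; in particular, how the real code derives each row's y_center is not shown in this diff, so the numbers below are assumptions.

```python
from typing import Dict, List


def assign_first_match(word: Dict, rows: List[Dict]) -> int:
    """Old behaviour: return the first row whose range contains the center."""
    y_center = word['top'] + word['height'] / 2
    for row in rows:
        if row['y_min'] <= y_center <= row['y_max']:
            return row['index']
    return min(rows, key=lambda r: abs(r['y_center'] - y_center))['index']


def assign_closest_match(word: Dict, rows: List[Dict]) -> int:
    """New behaviour: among containing rows, pick the closest y_center."""
    y_center = word['top'] + word['height'] / 2
    matching = [r for r in rows if r['y_min'] <= y_center <= r['y_max']]
    if matching:
        return min(matching, key=lambda r: abs(r['y_center'] - y_center))['index']
    return min(rows, key=lambda r: abs(r['y_center'] - y_center))['index']


# Hypothetical rows: a tall ghost has stretched row 0's y_max to 65,
# so row 0 now overlaps row 1 (which starts at y_min=40).
rows = [
    {'index': 0, 'y_min': 10, 'y_max': 65, 'y_center': 20},
    {'index': 1, 'y_min': 40, 'y_max': 60, 'y_center': 50},
]
word = {'top': 45, 'height': 10}  # word center is y=50

old_row = assign_first_match(word, rows)    # row 0 matches first
new_row = assign_closest_match(word, rows)  # row 1 is closer to y=50
```

With these values the word's center lies inside both ranges, so the old loop returns row 0 simply because it comes first, while the new version returns row 1, whose y_center coincides with the word's.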