fix: ignore edge gaps in _split_broad_columns + return tuple on empty result
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m57s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 16s

Gaps that touch the column edge (margins) are now excluded; only
internal gaps are considered as split candidates. This fixes the
problem where trailing whitespace was wrongly chosen as the largest
gap. The early return in _run_ocr_pipeline_for_page now correctly
returns ([], rotation) instead of [].
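
The edge-gap rule can be illustrated with a minimal sketch. It assumes gaps are (start, end) pixel intervals inside a column spanning col_left to col_right; the function name `largest_internal_gap` and all parameter names are hypothetical and are not taken from `_split_broad_columns`.

```python
from typing import List, Optional, Tuple

Gap = Tuple[int, int]  # (start_x, end_x); hypothetical representation


def largest_internal_gap(
    gaps: List[Gap], col_left: int, col_right: int
) -> Optional[Gap]:
    """Return the widest gap that touches neither column edge, or None."""
    # A gap that reaches the left or right boundary is margin whitespace,
    # not a separator between two sub-columns, so it is not a candidate.
    internal = [g for g in gaps if g[0] > col_left and g[1] < col_right]
    if not internal:
        return None
    # Among the remaining internal gaps, pick the widest one.
    return max(internal, key=lambda g: g[1] - g[0])


# The trailing gap (300, 400) is the widest but touches the right edge,
# so the internal gap (140, 165) is selected instead.
best = largest_internal_gap([(0, 12), (140, 165), (300, 400)], 0, 400)
```

Returning None when no internal gap exists lets a caller leave the column unsplit, which is one plausible way to handle that case.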

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-07 22:16:29 +01:00
parent 4e8ea77140
commit e1ae5d5fa9
2 changed files with 14 additions and 23 deletions

@@ -1510,7 +1510,7 @@ async def _run_ocr_pipeline_for_page(
     if not is_vocab:
         logger.warning(f" Page {page_number + 1}: layout is not vocab table "
                        f"(types: {col_types}), returning empty")
-        return []
+        return [], rotation
     # 8. Map cells → vocab entries
     entries = _cells_to_vocab_entries(cells, columns_meta)
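
Why the return shape matters can be sketched as follows. This assumes the caller unpacks two values from the pipeline result. The synchronous functions `pipeline_old`, `pipeline_fixed`, and `unpack_ok` are hypothetical stand-ins, not code from the repository.

```python
from typing import List, Tuple


def pipeline_old(is_vocab: bool, rotation: int):
    # Hypothetical stand-in for the pre-fix behaviour: the early return
    # yields a bare list, while the normal path yields a 2-tuple.
    if not is_vocab:
        return []
    return ["entry"], rotation


def pipeline_fixed(is_vocab: bool, rotation: int) -> Tuple[List[str], int]:
    # Hypothetical stand-in for the post-fix behaviour: every path
    # returns the same (entries, rotation) shape.
    if not is_vocab:
        return [], rotation
    return ["entry"], rotation


def unpack_ok(fn) -> bool:
    """Report whether two-value unpacking survives the early return."""
    try:
        entries, rotation = fn(False, 90)
    except ValueError:
        # Unpacking an empty list into two names raises ValueError.
        return False
    return entries == [] and rotation == 90
```

With these stand-ins, `unpack_ok(pipeline_old)` is False and `unpack_ok(pipeline_fixed)` is True.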