fix: move column expansion AFTER sub-column split
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m54s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 18s

The narrow column expansion was running inside detect_column_geometry()
on the 4 main columns, but the narrowest columns (marker ~14px,
page_ref ~93px) are created AFTERWARDS by _detect_sub_columns().

Extracted expand_narrow_columns() as standalone function and call it
after sub-column splitting in the columns API endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-03-04 10:07:40 +01:00
parent e426de937c
commit 9dd77ab54a
2 changed files with 84 additions and 65 deletions

View File

@@ -51,6 +51,7 @@ from cv_vocab_pipeline import (
deskew_image_by_word_alignment,
detect_column_geometry,
detect_row_geometry,
expand_narrow_columns,
_apply_shear,
dewarp_image,
dewarp_image_manual,
@@ -802,6 +803,9 @@ async def detect_columns(session_id: str):
geometries = _detect_sub_columns(geometries, content_w, left_x=left_x,
top_y=top_y, header_y=header_y, footer_y=footer_y)
# Expand narrow columns (sub-columns are often very narrow)
geometries = expand_narrow_columns(geometries, content_w, left_x, word_dicts)
# Phase B: Content-based classification
regions = classify_column_types(geometries, content_w, top_y, w, h, bottom_y,
left_x=left_x, right_x=right_x, inv=inv)