fix(ocr-pipeline): skip edge-touching gaps in header/footer detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m50s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s

Gaps that extend to the image boundary (top/bottom edge) are not valid
content separators — they typically represent dewarp padding. Only gaps
with content on both sides qualify as header/footer boundaries.
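The rule can be illustrated with a small sketch. It assumes a per-row ink count (a horizontal projection profile of the inverted binary image, where 0 means a blank row) and a minimum gap height. The helpers `find_gaps` and `interior_gaps` and the `min_gap` threshold are hypothetical. This is not the actual `_detect_header_footer_gaps` implementation. A blank run qualifies as a separator only if it neither starts at row 0 nor ends at the last row.

```python
from typing import List, Tuple


def find_gaps(row_ink: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: half-open (start, end) runs of blank rows, each >= min_gap tall."""
    gaps: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink == 0:
            if start is None:
                start = y  # a blank run begins here
        else:
            if start is not None and y - start >= min_gap:
                gaps.append((start, y))
            start = None  # content row ends any open run
    # A run still open after the loop reaches the bottom edge of the image.
    if start is not None and len(row_ink) - start >= min_gap:
        gaps.append((start, len(row_ink)))
    return gaps


def interior_gaps(row_ink: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Keep only gaps with content on both sides (drop edge-touching padding)."""
    h = len(row_ink)
    return [(s, e) for s, e in find_gaps(row_ink, min_gap) if s > 0 and e < h]


# Mirrors the new test: body from 10 to 1700, then padding to the bottom edge.
padded = [0] * 10 + [1] * 1690 + [0] * 300
print(find_gaps(padded, 50))      # [(1700, 2000)]
print(interior_gaps(padded, 50))  # []

# A real footer: gap 1500-1600, footer line 1600-1650, then padding.
with_footer = [0] * 10 + [1] * 1490 + [0] * 100 + [1] * 50 + [0] * 350
print(interior_gaps(with_footer, 50))  # [(1500, 1600)]
```

The edge check is applied after run detection, so the trailing 300-row padding is still found as a gap but then discarded. A padding band at the top edge is discarded the same way, because its run starts at row 0.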

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-02 17:54:49 +01:00
parent f1fcc67357
commit 0532b2a797
2 changed files with 18 additions and 0 deletions

@@ -1092,6 +1092,17 @@ class TestHeaderFooterGapDetection:
        assert header_y is None
        assert footer_y is None

    def test_edge_gaps_ignored_dewarp_padding(self):
        """Trailing gap at bottom edge (dewarp padding) should not be detected as footer."""
        h, w = 2000, 800
        # Body lines from 10 to 1700
        bands = self._make_body_with_lines(h, w, 10, 1700)
        # Gap from 1700 to 2000 = bottom edge padding (no content after)
        inv = self._make_inv(h, w, bands)
        header_y, footer_y = _detect_header_footer_gaps(inv, w, h)
        # The trailing gap touches the image edge → not a valid separator
        assert footer_y is None


class TestRegionContentCheck:
    """Tests for _region_has_content() and _add_header_footer() type selection."""