fix: skip row regularization in overlay (generic for mixed content)

Pages with info boxes (which have a different row height) cause
_regularize_row_grid to distort the row positions. The new skip_regularize
parameter uses the gap-based rows instead, which follow the actual page geometry.
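
A minimal sketch of the effect described above, not the actual implementation: regularize_rows is a hypothetical stand-in for _regularize_row_grid (whose real logic is not shown in this commit), assumed here to snap all rows onto a uniform grid using the median row height. The Row tuple layout and the sample coordinates are likewise made up for illustration.

```python
from statistics import median
from typing import List, Tuple

Row = Tuple[int, int]  # (y, height) in pixels; hypothetical layout


def regularize_rows(rows: List[Row]) -> List[Row]:
    """Toy stand-in for _regularize_row_grid (assumption): force a uniform
    grid whose pitch is the median row height, starting at the first row."""
    if not rows:
        return []
    pitch = int(median(h for _, h in rows))
    top = rows[0][0]
    return [(top + i * pitch, pitch) for i in range(len(rows))]


def detect_rows_sketch(gap_rows: List[Row], skip_regularize: bool = False) -> List[Row]:
    """Return the gap-based rows unchanged when skip_regularize is set,
    otherwise pass them through the uniform-grid step."""
    if skip_regularize:
        return list(gap_rows)
    return regularize_rows(gap_rows)


# Two text rows (height 40), one info box (height 120), two more text rows.
gap_rows = [(100, 40), (140, 40), (180, 120), (300, 40), (340, 40)]

regularized = detect_rows_sketch(gap_rows)
untouched = detect_rows_sketch(gap_rows, skip_regularize=True)

# The row below the info box really starts at y=300, but the uniform grid
# places it at y=220, inside the box.
print(regularized[3], untouched[3])
```

In this toy model the uniform grid moves every row after the info box upward, which is the kind of distortion skip_regularize is meant to avoid.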

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-11 08:29:06 +01:00
parent 2df2a01a8b
commit b91f799ccf
4 changed files with 23 additions and 9 deletions

@@ -1577,7 +1577,7 @@ async def _get_columns_overlay(session_id: str) -> Response:
# ---------------------------------------------------------------------------
@router.post("/sessions/{session_id}/rows")
-async def detect_rows(session_id: str):
+async def detect_rows(session_id: str, skip_regularize: bool = False):
"""Run row detection on the cropped (or dewarped) image using horizontal gap analysis."""
if session_id not in _cache:
await _load_session_to_cache(session_id)
@@ -1686,6 +1686,7 @@ async def detect_rows(session_id: str):
combined_h = combined_inv.shape[0]
rows = detect_row_geometry(
combined_inv, combined_words, left_x, right_x, 0, combined_h,
+skip_regularize=skip_regularize,
)
# Remap y-coordinates back to absolute page coords
@@ -1702,10 +1703,12 @@ async def detect_rows(session_id: str):
r.y = abs_y
r.height = abs_y_end - abs_y
else:
-rows = detect_row_geometry(inv, word_dicts, left_x, right_x, top_y, bottom_y)
+rows = detect_row_geometry(inv, word_dicts, left_x, right_x, top_y, bottom_y,
+skip_regularize=skip_regularize)
else:
# No boxes — standard row detection
-rows = detect_row_geometry(inv, word_dicts, left_x, right_x, top_y, bottom_y)
+rows = detect_row_geometry(inv, word_dicts, left_x, right_x, top_y, bottom_y,
+skip_regularize=skip_regularize)
duration = time.time() - t0