StepAnsicht: fix row filtering for partial-width boxes

Content rows were incorrectly filtered out whenever their Y position
overlapped a box, even if the box covered only the right half of the
page. The filter now checks both Y and X overlap: a row is excluded
only if it starts within the box's horizontal range.

Fixes: rows next to Box 2 (lend, coconut, taste) were missing from
reconstruction because Box 2 (x=871, w=525) only covers the right
side, but left-side content rows at x≈148 were being filtered.
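
The check described above can be sketched in isolation as follows. This is
a minimal illustration, not the component's actual code: the BoxBound
shape, the isRowInsideBox helper, and the Y values are assumptions made up
for the example; only x=871, w=525, x≈148, and the 20 px tolerance come
from this commit.

```typescript
// Hypothetical, simplified box shape; the real component reads these
// values from its zone data.
interface BoxBound {
  yStart: number
  yEnd: number
  x: number // left edge of the box in pixels
  w: number // box width in pixels
}

// Tolerance for rows that start slightly left of the box edge (20 px in the diff).
const X_TOLERANCE_PX = 20

// A row counts as inside a box only if its Y lies in the box's vertical
// range AND its leftmost X starts within the box's horizontal range.
function isRowInsideBox(rowY: number, rowXMin: number, boxes: BoxBound[]): boolean {
  return boxes.some((bb) => {
    if (rowY < bb.yStart || rowY > bb.yEnd) return false
    const boxXMax = bb.x + bb.w
    return rowXMin >= bb.x - X_TOLERANCE_PX && rowXMin <= boxXMax
  })
}

// Box 2 from this message: x=871, w=525. The Y range is invented for illustration.
const box2: BoxBound = { yStart: 400, yEnd: 600, x: 871, w: 525 }

// Left-side row at x≈148 shares the box's Y range but is kept.
const leftRowFiltered = isRowInsideBox(500, 148, [box2])
// A row that starts inside the box's X range is still skipped.
const boxRowFiltered = isRowInsideBox(500, 900, [box2])
```

Under these assumptions, leftRowFiltered is false and boxRowFiltered is
true, so only rows that start inside the box are dropped from the
content rows.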

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-04-14 17:00:28 +02:00
parent dcb873db35
commit 1b7e095176


@@ -110,8 +110,19 @@ export function StepAnsicht({ sessionId, onNext }: StepAnsichtProps) {
   boxIdx++
 }
-// Skip rows that fall inside a box boundary
-const insideBox = boxBounds.some((bb) => ry >= bb.yStart && ry <= bb.yEnd)
+// Skip rows only if they fall FULLY inside a box (both Y and X overlap).
+// Small boxes (e.g. on the right half) don't cover left-side content rows.
+const rowCells = contentZone!.cells.filter((c) => c.row_index === row.index)
+const rowXMin = rowCells.length > 0
+  ? Math.min(...rowCells.map((c) => c.bbox_px?.x ?? contentZone!.bbox_px.x))
+  : contentZone!.bbox_px.x
+const insideBox = boxBounds.some((bb) => {
+  if (ry < bb.yStart || ry > bb.yEnd) return false
+  // Check horizontal overlap: row must be mostly inside box x-range
+  const boxXMin = bb.zone.bbox_px.x
+  const boxXMax = boxXMin + bb.zone.bbox_px.w
+  return rowXMin >= boxXMin - 20 && rowXMin <= boxXMax
+})
 if (!insideBox) {
   currentRows.push(row)
 }