Benjamin Admin 882b177fc3
fix: remove image-area artifacts + fix heading false positive for dictionary entries
Three fixes for dictionary page session 5997:

1. Heading detection: column_1 cells with article words (die/der/das)
   now count as content cells, preventing "die Zuschrift, die Zuschriften"
   from being falsely merged into a spanning heading cell.

2. Step 5j-pre: a new artifact cell filter removes short garbled text that
   OCR produces on image areas (e.g. "7 EN", "Tr", "\\", "PEE", "a="). These
   cells survive earlier filters because their rows have real content in
   other columns. Rows left empty by the removal are also cleaned up.

3. Footer "PEE" auto-fixed: the artifact filter removes the noise cell,
   the now-empty row gets cleaned up, and footer detection no longer sees it.
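
A minimal sketch of fixes 1 and 2, for illustration only and not the code in this commit. The cell layout (dicts with `col_type`, `text`, `in_image_area`), the function names, and the `max_len` threshold are all assumptions; only the article words and the sample strings come from the message above.

```python
import re

# Article words named in the commit message.
ARTICLE_WORDS = {"die", "der", "das"}


def has_article_word(text):
    """True if any alphabetic token in the text is a German article."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return any(token in ARTICLE_WORDS for token in tokens)


def is_content_cell(cell):
    """Hypothetical rule: a column_1 cell containing an article is content,
    so its row should not be merged into a spanning heading cell."""
    return cell.get("col_type") == "column_1" and has_article_word(cell.get("text", ""))


def remove_image_artifacts(rows, max_len=4):
    """Drop short cells flagged as lying on an image area, then drop empty rows.

    `in_image_area` and `max_len` are assumptions made for this sketch.
    """
    cleaned = []
    for row in rows:
        kept = [
            cell for cell in row
            if not (cell.get("in_image_area")
                    and len(cell.get("text", "").strip()) <= max_len)
        ]
        # Keep the row only if some cell still carries text.
        if any(cell.get("text", "").strip() for cell in kept):
            cleaned.append(kept)
    return cleaned
```

In this sketch a row whose only cell is the noise text "PEE" disappears entirely, which is how a later footer detection step would stop seeing it (fix 3).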

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 07:59:24 +01:00