606bef059114b37f8473b266f38c2afc4398662c
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 1m14s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m55s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 17s
1. Word-to-column assignment now uses overlap-based matching instead of center-point matching. This fixes narrow page_ref columns losing their last digit (e.g. "p.59" → "p.5") when the digit's center falls slightly past the midpoint boundary into the next column. 2. Post-OCR empty row filter: rows where ALL cells have empty text are removed after OCR. This catches inter-row gaps that had stray Tesseract artifacts giving word_count > 0 but no actual content. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Description
No description provided
Languages
TypeScript
60.2%
Python
32.9%
Go
5.5%
C#
0.8%
CSS
0.2%
Other
0.3%