Benjamin Admin 962bbbe9f6 Remove scattered debris rows and disable spanning header detection
- Add Rule 3 to junk-row filter: rows where no word is longer than
  2 chars are removed as scattered OCR debris from illustrations
- Fully disable spanning-header detection which falsely flagged IPA
  transcriptions and vocabulary entries as spanning headers
- First-row heuristic remains for genuine header detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 10:47:17 +01:00
S
Description
No description provided
15 MiB
Languages
TypeScript 57.9%
Python 33.8%
Go 6.8%
C# 0.8%
Shell 0.2%
Other 0.3%