Benjamin Admin
9da45c2a59
Fix false header detection and add decorative margin/footer filters
- Remove the all_colored spanning-header heuristic, which falsely flagged
  colored vocabulary entries (e.g. Scotland, secondary school) as headers
- Add _filter_decorative_margin: removes vertical A-Z alphabet strips
  along page margins (single-character words in a compact vertical strip)
- Add _filter_footer_words: removes page numbers in the bottom 5% of the page
- Tighten the spanning header rule: require ≥3 columns spanned and ≤3 words
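
Illustrative sketch of the three rules above, not the committed code. It
assumes each OCR word is a dict with hypothetical keys text, left, top,
width and height. Only the 5% footer band and the ≥3 columns / ≤3 words
limits come from this message. The function names (without the leading
underscore), the digit-only test for page numbers, and the margin_frac,
min_strip and max_x_spread thresholds are assumptions for illustration.

```python
from typing import Any, Dict, List

Word = Dict[str, Any]  # hypothetical OCR word: text, left, top, width, height


def filter_footer_words(words: List[Word], page_height: float,
                        footer_frac: float = 0.05) -> List[Word]:
    """Drop digit-only words whose top edge lies in the bottom footer band."""
    cutoff = page_height * (1.0 - footer_frac)
    return [
        w for w in words
        if not (w["top"] >= cutoff and w["text"].strip().isdigit())
    ]


def filter_decorative_margin(words: List[Word], page_width: float,
                             margin_frac: float = 0.10,   # assumed margin band
                             min_strip: int = 6,          # assumed minimum letters
                             max_x_spread: float = 0.02   # assumed strip width
                             ) -> List[Word]:
    """Drop single-letter words that form a narrow vertical strip in a margin."""

    def center_x(w: Word) -> float:
        return w["left"] + w["width"] / 2.0

    singles = [
        w for w in words
        if len(w["text"].strip()) == 1 and w["text"].strip().isalpha()
    ]
    margins = (
        lambda x: x <= page_width * margin_frac,          # left margin
        lambda x: x >= page_width * (1.0 - margin_frac),  # right margin
    )
    to_drop = set()
    for in_margin in margins:
        strip = [w for w in singles if in_margin(center_x(w))]
        if len(strip) < min_strip:
            continue
        xs = [center_x(w) for w in strip]
        # "Compact" here means the letters share nearly the same x position.
        if max(xs) - min(xs) <= page_width * max_x_spread:
            to_drop.update(id(w) for w in strip)
    return [w for w in words if id(w) not in to_drop]


def is_spanning_header(cell_words: List[str], columns_spanned: int) -> bool:
    """Tightened rule: at least 3 columns spanned and at most 3 words."""
    return columns_spanned >= 3 and len(cell_words) <= 3
```

With these assumptions, a one-column colored entry such as "Scotland" is
never treated as a spanning header, because it spans fewer than 3 columns.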
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>