Address 5 weaknesses found via ground-truth comparison on session df3548d1: - Add column_ignore for edge columns with < 3 words (margin detection) - Absorb tiny clusters (< 5% width) into neighbors post-merge - Restrict page_ref to left 35% of content area across all 3 levels - Loosen marker thresholds (width < 6%, words <= 15) and add strong marker score for very narrow non-edge columns (< 4%) - Add EN/DE position tiebreaker when language signals are both weak Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
79 KiB
79 KiB