Benjamin Admin 46c8c28d34 fix: border strip pre-filter + 3-column detection for vocabulary tables
The border strip filter (Step 4e) used the LARGEST x-gap which incorrectly
removed base words along with edge artifacts. Now uses a two-stage approach:
1. _filter_border_strip_words() pre-filters raw words BEFORE column detection,
   scanning from the page edge inward to find the FIRST significant gap (>30px)
2. Step 4e runs as fallback only when pre-filter didn't apply

Session 4233 now correctly detects 3 columns (base word | oder | synonyms)
instead of 2. Threshold raised from 15% to 20% to handle pages with many
edge artifacts. All 4 ground-truth sessions pass regression.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 21:01:43 +01:00
Description
No description provided
42 MiB
Languages
TypeScript 60.2%
Python 32.9%
Go 5.5%
C# 0.8%
CSS 0.2%
Other 0.3%