breakpilot-lehrer/klausur-service/backend/grid_editor_api.py at 46c8c28d3459c4a05e76f59ebf9cd2c518cbf999

Files

Benjamin Admin 46c8c28d34 fix: border strip pre-filter + 3-column detection for vocabulary tables

The border strip filter (Step 4e) used the LARGEST x-gap which incorrectly
removed base words along with edge artifacts. Now uses a two-stage approach:
1. _filter_border_strip_words() pre-filters raw words BEFORE column detection,
   scanning from the page edge inward to find the FIRST significant gap (>30px)
2. Step 4e runs as fallback only when pre-filter didn't apply

Session 4233 now correctly detects 3 columns (base word | oder | synonyms)
instead of 2. Threshold raised from 15% to 20% to handle pages with many
edge artifacts. All 4 ground-truth sessions pass regression.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-21 21:01:43 +01:00

105 KiB

Raw Blame History

View Raw

105 KiB Raw Blame History

105 KiB

Raw Blame History