Preserve alphabetic marker columns, broaden junk filter, enable IPA in grid

- _merge_inline_marker_columns: skip merge when ≥50% of words are
  alphabetic (preserves "to", "in", "der" columns)
- Rule 2 (oversized stub): widen to ≤3 words / ≤5 chars (catches "SEA &")
- IPA phonetics: map the column with the longest average text to column_en
  so fix_cell_phonetics runs in the grid editor
- ocr_pipeline_overlays: add missing split_page_into_zones import
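Here is a minimal sketch of the ≥50% check from the first bullet. It assumes a candidate marker column arrives as a list of word strings and that "alphabetic" means Python's `str.isalpha()`. The helper name `should_skip_marker_merge` is hypothetical, and the real `_merge_inline_marker_columns` may tokenize or count words differently.

```python
def should_skip_marker_merge(words: list[str], min_alpha_ratio: float = 0.5) -> bool:
    """Return True if a candidate marker column looks like real words.

    Hypothetical helper: the column is kept (not merged) when at least
    `min_alpha_ratio` of its non-empty words consist only of letters,
    e.g. "to", "in", "der".
    """
    # Drop empty tokens so blank OCR cells do not skew the ratio.
    tokens = [w.strip() for w in words if w.strip()]
    if not tokens:
        return False
    alphabetic = sum(1 for w in tokens if w.isalpha())
    return alphabetic / len(tokens) >= min_alpha_ratio


# A column of short function words is preserved,
# while a column of enumeration markers is still merged.
print(should_skip_marker_merge(["to", "in", "der"]))  # True
print(should_skip_marker_merge(["1.", "2.", "a)"]))   # False
```

Because the threshold uses `>=`, a column that is exactly half alphabetic is also preserved.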
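The widened Rule 2 thresholds (≤3 words / ≤5 chars) can be illustrated with the following sketch. It covers only the text-size part of the rule. Whatever geometric test makes a stub "oversized" is not shown in this commit view and is left out. The function name is hypothetical. Counting characters over the space-joined text is an assumption: "SEA &" is exactly 5 characters that way and 4 without the space, so it passes either way.

```python
def is_oversized_stub_text(words: list[str], max_words: int = 3, max_chars: int = 5) -> bool:
    """Hypothetical text-size check for Rule 2 (oversized stub).

    Flags a region whose text is at most `max_words` words and at most
    `max_chars` characters long, counting the spaces between words.
    """
    tokens = [w.strip() for w in words if w.strip()]
    text = " ".join(tokens)
    return len(tokens) <= max_words and len(text) <= max_chars


print(is_oversized_stub_text(["SEA", "&"]))         # True: 2 words, 5 chars
print(is_oversized_stub_text(["SEA", "&", "SKY"]))  # False: 9 chars
```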
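The IPA bullet picks the column with the longest average text as `column_en`. Here is a sketch of that selection step. It assumes each grid column is available as a list of cell strings keyed by a column id, and that blank cells should not count toward the average. The function name and the `col_*` keys are hypothetical, and how the grid editor then passes `column_en` to `fix_cell_phonetics` is not shown here.

```python
from statistics import mean
from typing import Optional


def pick_column_en(columns: dict[str, list[str]]) -> Optional[str]:
    """Return the key of the column whose non-empty cells have the longest average text."""
    best_key: Optional[str] = None
    best_avg = -1.0
    for key, cells in columns.items():
        texts = [c.strip() for c in cells if c.strip()]
        if not texts:
            continue  # an all-empty column cannot be the English column
        avg = mean(len(t) for t in texts)
        if avg > best_avg:  # strict '>' keeps the first column on ties
            best_key, best_avg = key, avg
    return best_key


columns = {
    "col_0": ["1", "2"],             # average length 1
    "col_1": ["sea", "seashore"],    # average length 5.5
    "col_2": ["Meer", "Küste"],      # average length 4.5
}
print(pick_column_en(columns))  # col_1
```

The heuristic assumes the English column, which also carries the IPA transcriptions, tends to hold the longest cell text on a vocabulary page.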

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-18 11:08:23 +01:00
parent 962bbbe9f6
commit f139d0903e
2 changed files with 67 additions and 16 deletions


@@ -25,7 +25,7 @@ from ocr_pipeline_common import (
 )
 from ocr_pipeline_session_store import get_session_db, get_session_image
 from cv_color_detect import _COLOR_HEX, _COLOR_RANGES
-from cv_box_detect import detect_boxes
+from cv_box_detect import detect_boxes, split_page_into_zones
 from ocr_pipeline_rows import _draw_box_exclusion_overlay
 
 logger = logging.getLogger(__name__)