Content zones (above/between/below boxes) now share the same column
structure: columns are detected once from ALL content-zone words, then
applied to each content zone. Box zones still detect columns independently.
This fixes the issue where narrow columns (page refs like p.55) were not
detected in small content zones above boxes, even though the same column
existed in the larger content zone below the box.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_build_cells() creates new word_box dicts, so color fields set before
grid building were lost. Now detect_word_colors() runs after cells
are built, on the final word_boxes. Recovery still runs before grid
building so recovered words participate in column/row detection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New cv_color_detect.py module:
- detect_word_colors(): annotates existing words with text color (HSV analysis)
- recover_colored_text(): finds colored text regions missed by standard OCR
(e.g. red ! markers) using HSV masks + contour detection
Integrated into build-grid: words get color/color_name fields, recovered
colored regions are merged into the word list before grid building.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only cluster left-edges of words that begin a new group within their row
(first word or preceded by a large gap). This filters out mid-phrase
word positions (IPA transcriptions, second words in multi-word entries)
that were causing too many false columns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Column detection now clusters word left-edges by X-proximity and filters
by row coverage (Y-coverage), matching the proven approach from cv_layout.py
but using precise OCR word positions instead of ink-based estimates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: new grid_editor_api.py with build-grid endpoint that detects
bordered boxes, splits page into zones, clusters columns/rows per zone
from Kombi word positions. New DB column grid_editor_result JSONB.
Frontend: GridEditor component with editable HTML tables per zone,
column bold toggle, header row toggle, undo/redo, keyboard navigation
(Tab/Enter/Arrow), image overlay verification, and save/load.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>