Two new functions:
- _is_artifact_row(): marks a row as an artifact when every detected
  token is a single character. Scanner shadows produce dots and dashes,
  not words, while a real vocabulary row is expected to contain at
  least one word of two or more characters.
- _heal_row_gaps(): after empty and artifact rows are removed, expands
  each remaining content row to the midpoint of the gap on either side,
  so OCR crops are not artificially narrow. The first row extends up to
  the content top_bound and the last row down to the content
  bottom_bound.

Both are applied in build_cell_grid() and build_cell_grid_streaming(),
after the word_count > 0 filter and before OCR.

Addresses cases like:
- Row 21: scan shadow → single-char artifacts → filtered before OCR
- Row 23: completely empty (word_count=0) → already filtered
- Row 22: real content → now expanded upward and downward into the
  space that rows 21 and 23 occupied, so the OCR crop covers the full
  row height

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>