Scanner shadow detection (range > 40, darkest < 180) fails on camera
book scans where the gutter shadow is subtle (range ~25, darkest ~214).
The new _detect_gutter_continuity() detects gutters by their distinguishing
property: the shadow runs continuously from top to bottom without
interruption. It divides the image into horizontal strips and, for each
column, computes the fraction of strips that are darker than the page
median. A gutter column has >= 75% of its strips darker. The point where
the smoothed dark fraction drops below 50% marks the crop boundary.
It is integrated as a fallback between scanner shadow detection and
binary projection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
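The gutter-continuity check described in the commit above can be sketched
roughly as follows. This is a minimal illustration, not the code in
page_crop.py: the function name, the strip count, the smoothing width, the
small brightness margin, and the edge-padded smoothing are all assumptions
made for the example. Only the 75% and 50% thresholds come from the commit
message.

```python
import numpy as np


def detect_gutter_continuity_sketch(gray, n_strips=20, min_dark_fraction=0.75,
                                    boundary_fraction=0.5, smooth_width=5,
                                    margin=2.0):
    """Return the half-open column range (left, right) of a gutter, or None.

    gray is a 2-D grayscale array (rows x columns). All parameter defaults
    except the two fractions are assumptions for this sketch.
    """
    gray = np.asarray(gray)
    width = gray.shape[1]
    page_median = float(np.median(gray))

    # Mean brightness per column inside each horizontal strip.
    strips = np.array_split(gray, n_strips, axis=0)
    strip_means = np.stack([s.mean(axis=0) for s in strips])

    # Fraction of strips in which a column is darker than the page median.
    # The margin (an assumption) keeps plain paper from counting as dark.
    dark_fraction = (strip_means < page_median - margin).mean(axis=0)

    core = int(np.argmax(dark_fraction))
    if dark_fraction[core] < min_dark_fraction:
        return None  # no column is dark in enough strips

    # Smooth with edge padding so the image borders are not pulled to zero.
    pad = smooth_width // 2  # smooth_width is expected to be odd
    kernel = np.ones(smooth_width) / smooth_width
    smoothed = np.convolve(np.pad(dark_fraction, pad, mode="edge"),
                           kernel, mode="valid")

    # Walk outward from the darkest column to the 50% transition points.
    below = smoothed < boundary_fraction
    right_hits = np.flatnonzero(below[core:])
    left_hits = np.flatnonzero(below[:core])
    right = core + int(right_hits[0]) if right_hits.size else width
    left = int(left_hits[-1]) + 1 if left_hits.size else 0
    return left, right
```

Text lines are dark in only a few strips per column, so they stay far below
the 75% threshold, while a gutter is dark in nearly every strip even when
its contrast is low. The side of the returned range that faces the page
content is the candidate crop boundary.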
1. page_crop: Score all dark runs by center-proximity × darkness ×
narrowness instead of picking the widest. Fixes ad810209 where a
wide dark area at 35% was chosen over the actual spine at 50%.
2. cv_words_first: Replace x-center-only word→column assignment with
overlap-based three-pass strategy (overlap → midpoint-range → nearest).
Fixes truncated German translations like "Schal" instead of
"Schal - die Schals" in session 079cd0d9.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
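Both changes in the commit above can be illustrated with a short sketch.
Everything here is hypothetical: the function names, the exact definition
of each score factor, and the interpretation of "midpoint-range" (the gap
between two adjacent columns, split at its midpoint) are assumptions, since
the commit message only names the factors and the pass order.

```python
def score_dark_run(start, end, mean_brightness):
    """Score a dark run given as fractions of the page width (assumed form)."""
    center = (start + end) / 2
    center_proximity = max(0.0, 1.0 - abs(center - 0.5) / 0.5)
    darkness = 1.0 - mean_brightness / 255.0
    narrowness = max(0.0, 1.0 - (end - start))
    return center_proximity * darkness * narrowness


def pick_spine(runs):
    """Pick the best-scoring (start, end, mean_brightness) run, not the widest."""
    return max(runs, key=lambda run: score_dark_run(*run))


def assign_column(word, columns):
    """Return the index of the column for a word box.

    word is (x_left, x_right); columns is a list of (x_left, x_right)
    sorted left to right and assumed not to overlap each other.
    """
    w_left, w_right = word
    indices = range(len(columns))

    # Pass 1: the column with the largest horizontal overlap wins.
    overlaps = [max(0, min(w_right, c_right) - max(w_left, c_left))
                for c_left, c_right in columns]
    best = max(indices, key=lambda i: overlaps[i])
    if overlaps[best] > 0:
        return best

    # Pass 2: the word sits in a gap; split the gap at its midpoint.
    center = (w_left + w_right) / 2
    for i in range(len(columns) - 1):
        gap_left, gap_right = columns[i][1], columns[i + 1][0]
        if gap_left <= center <= gap_right:
            return i if center < (gap_left + gap_right) / 2 else i + 1

    # Pass 3: the word lies outside all columns; take the nearest center.
    return min(indices,
               key=lambda i: abs(center - (columns[i][0] + columns[i][1]) / 2))
```

With these assumed factors, a wide dark run at 35% of the page width scores
lower than a narrow, slightly lighter run at 50%, which mirrors the
ad810209 case. In the assignment sketch, a word box that straddles a column
edge is placed by overlap even when its x-center falls outside every
column.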
The _detect_spine_shadow function was triggering on normal text content
because the shadow_range > 20 threshold was too low and convolution edge
artifacts produced artificially low values. It now requires range > 40,
darkest < 180, a narrow valley (not a text plateau), and a brightness rise
toward the page content.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
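The four conditions in the commit above can be sketched as a single
predicate over a per-column brightness profile. This is an illustrative
sketch only: the function name, the valley and rise definitions, the
page_side parameter, and the edge-padded smoothing (one possible way to
avoid convolution edge artifacts) are assumptions. Only the range > 40 and
darkest < 180 thresholds come from the commit message.

```python
import numpy as np


def looks_like_spine_shadow(profile, page_side="right", min_range=40.0,
                            max_darkest=180.0, max_valley_fraction=0.35,
                            smooth_width=5):
    """Check a 1-D column-brightness profile against the four conditions."""
    profile = np.asarray(profile, dtype=float)

    # Smooth with edge padding; zero padding would create fake dark borders.
    pad = smooth_width // 2  # smooth_width is expected to be odd
    kernel = np.ones(smooth_width) / smooth_width
    smooth = np.convolve(np.pad(profile, pad, mode="edge"), kernel,
                         mode="valid")

    darkest = smooth.min()
    shadow_range = smooth.max() - darkest

    # Conditions 1 and 2: enough contrast and a genuinely dark minimum.
    if shadow_range <= min_range or darkest >= max_darkest:
        return False

    # Condition 3: the valley must be narrow. A block of text forms a wide
    # dark plateau instead. The 25% band is an assumed definition.
    valley = smooth < darkest + 0.25 * shadow_range
    if valley.mean() > max_valley_fraction:
        return False

    # Condition 4: brightness must rise toward the page content.
    n = max(1, len(smooth) // 4)
    page_end = smooth[-n:] if page_side == "right" else smooth[:n]
    return bool(page_end.mean() >= darkest + 0.5 * shadow_range)
```

A subtle camera-scan gutter with a range of about 25 and a darkest value of
about 214 fails the first two conditions by design, which is the case a
separate continuity-based detector would need to handle.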
The column, row, and word overlays and all subsequent steps
(LLM review, reconstruction) now read image/cropped, falling back
to image/dewarped. Added tests for page_crop.py (25 tests).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>