Each quality improvement step can now be toggled independently:
- CLAHE checkbox (Step 3: image enhancement on/off)
- MaxCols dropdown (Step 2: 0=unlimited, 2-5)
- MinConf dropdown (Step 1: auto/20/30/40/50/60)
Backend: process-single-page accepts query params enhance, max_cols,
and min_conf. The response includes an active_steps dict showing
which steps are enabled.
Frontend: Toggle controls in VocabularyTab above the table.
This allows empirical A/B testing of each step on the same scan.
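
A minimal sketch of how the three toggles could be parsed and reported,
assuming the params arrive as plain strings. The function names, the
defaults, and the active_steps key names are hypothetical; only the
param names and their allowed values come from this commit.

```python
ALLOWED_MAX_COLS = {0, 2, 3, 4, 5}        # 0 means unlimited
ALLOWED_MIN_CONF = {20, 30, 40, 50, 60}   # "auto" is handled separately


def parse_step_params(query):
    """Validate the enhance / max_cols / min_conf query params (all strings)."""
    # Defaults here are assumptions, not taken from the real endpoint.
    enhance = query.get("enhance", "true").lower() in ("1", "true", "yes", "on")

    max_cols = int(query.get("max_cols", "0"))
    if max_cols not in ALLOWED_MAX_COLS:
        raise ValueError(f"max_cols must be one of {sorted(ALLOWED_MAX_COLS)}")

    raw_conf = query.get("min_conf", "auto")
    min_conf = None if raw_conf == "auto" else int(raw_conf)
    if min_conf is not None and min_conf not in ALLOWED_MIN_CONF:
        raise ValueError(f"min_conf must be 'auto' or one of {sorted(ALLOWED_MIN_CONF)}")

    return {"enhance": enhance, "max_cols": max_cols, "min_conf": min_conf}


def active_steps(params):
    """Report which steps are on (key names are illustrative)."""
    return {
        "step1_min_conf_auto": params["min_conf"] is None,
        "step2_max_cols": params["max_cols"] > 0,
        "step3_enhance": params["enhance"],
    }
```

Calling parse_step_params on the same scan with one param changed at a
time gives the A/B comparison described above.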
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 1: scan_quality.py — Laplacian blur + contrast scoring, adjusts
OCR confidence threshold (40 for good scans, 30 for degraded).
Quality report included in API response + shown in frontend.
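
The Step 1 scoring can be sketched in pure Python as below. This is not
the code in scan_quality.py: the function names and both cutoffs are
assumptions for illustration, and a real implementation would likely
use OpenCV or NumPy. Only the 40/30 thresholds come from this commit.

```python
from statistics import pstdev, pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian; low values suggest blur.

    `gray` is a 2D list of pixel intensities (0-255), at least 3x3.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def contrast(gray):
    """Standard deviation of all pixel intensities."""
    return pstdev([pixel for row in gray for pixel in row])


def pick_min_conf(gray, blur_cutoff=100.0, contrast_cutoff=40.0):
    """Return 30 for degraded scans, 40 otherwise (cutoffs are hypothetical)."""
    degraded = (
        laplacian_variance(gray) < blur_cutoff
        or contrast(gray) < contrast_cutoff
    )
    return 30 if degraded else 40
```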
Step 2: max_columns parameter in cv_words_first.py — limits column
detection to 3 for vocab tables, preventing phantom columns D/E
from degraded OCR fragments.
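
One way a column cap like this could work is sketched below. The merge
strategy (fold the sparsest columns into their nearest neighbour) and
the data shape are assumptions; cv_words_first.py may do this
differently.

```python
def limit_columns(columns, max_columns=3):
    """Cap the number of detected columns.

    `columns` is a list of dicts like {"x": 50.0, "words": ["a", "b"]},
    where "x" is the column's horizontal centre. max_columns=0 means
    unlimited.
    """
    if max_columns <= 0 or len(columns) <= max_columns:
        return sorted(columns, key=lambda col: col["x"])

    # Keep the columns holding the most words; the rest are treated as
    # phantom columns produced by stray OCR fragments.
    ranked = sorted(columns, key=lambda col: len(col["words"]), reverse=True)
    kept = [{"x": col["x"], "words": list(col["words"])} for col in ranked[:max_columns]]

    # Reassign each phantom column's words to the nearest kept column.
    for extra in ranked[max_columns:]:
        nearest = min(kept, key=lambda col: abs(col["x"] - extra["x"]))
        nearest["words"].extend(extra["words"])

    return sorted(kept, key=lambda col: col["x"])
```

The sketch appends reassigned words at the end of the target column; a
real pipeline would re-sort them by vertical position.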
Step 3: ocr_image_enhance.py — CLAHE contrast + bilateral filter
denoising + unsharp mask, only for degraded scans (gated by
quality score). Follows the pattern used in handwriting_htr_api.py.
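
The Step 3 chain could look roughly like this with OpenCV. The function
names and every numeric parameter are illustrative assumptions, not
values taken from ocr_image_enhance.py.

```python
def enhance_for_ocr(gray):
    """CLAHE, then bilateral denoising, then an unsharp mask.

    `gray` is expected to be an 8-bit single-channel image (NumPy array).
    """
    import cv2  # imported lazily so the gate below works without OpenCV

    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    contrasted = clahe.apply(gray)

    # Edge-preserving denoise: diameter 9, sigmaColor 75, sigmaSpace 75.
    denoised = cv2.bilateralFilter(contrasted, 9, 75, 75)

    # Unsharp mask: subtract a blurred copy to boost edges.
    blurred = cv2.GaussianBlur(denoised, (0, 0), 3)
    return cv2.addWeighted(denoised, 1.5, blurred, -0.5, 0)


def maybe_enhance(gray, is_degraded):
    """Only enhance scans the quality check flagged as degraded."""
    return enhance_for_ocr(gray) if is_degraded else gray
```

Good scans pass through untouched, so the enhancement cannot degrade
input that already reads well.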
Frontend: quality info shown in extraction status after processing.
Reprocess button now derives pages from vocabulary data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs fixed:
1. reprocessPages() failed silently after session resume because
successfulPages was empty. It now derives pages from each vocabulary
entry's source_page, falling back to selectedPages.
2. process-single-page endpoint built vocabulary entries WITHOUT
applying merge logic (_merge_wrapped_rows, _merge_continuation_rows).
Now applies full merge pipeline after vocabulary extraction.
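
The fallback order in fix 1 can be sketched as follows. The real code
lives in the frontend; this is the same logic written in Python for
illustration, and the function name and argument shapes are
assumptions.

```python
def derive_pages(successful_pages, vocabulary, selected_pages):
    """Pick the pages to reprocess, in order of preference."""
    # Normal case: pages recorded during this session.
    if successful_pages:
        return sorted(set(successful_pages))

    # After a session resume: recover pages from the vocabulary entries.
    from_vocab = {
        entry["source_page"]
        for entry in vocabulary
        if entry.get("source_page") is not None
    }
    if from_vocab:
        return sorted(from_vocab)

    # Last resort: whatever the user currently has selected.
    return sorted(set(selected_pages))
```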
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>