Our IPA system only has English dictionaries (Britfone MIT, eng_to_ipa
MIT). The "IPA: nur DE" option was useless at best and misleading.
Removed from dropdown, type definition, and API validation.
Syllable DE mode stays — pyphen has a German hyphenation dictionary.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dropdowns only updated state but didn't trigger buildGrid().
Now a useEffect watches ipaMode/syllableMode and rebuilds
automatically (skipping the initial mount).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend ipa_mode and syllable_mode toggles with language options:
- auto: smart detection (default)
- en: only English headword column
- de: only German definition columns
- all: all content columns
- none: skip entirely
Also improve English column auto-detection: use garbled IPA patterns
(apostrophes, colons) in addition to bracket patterns. This correctly
identifies English dictionary pages where OCR produces garbled ASCII
instead of bracket IPA.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: Remove en_col_type fallback heuristic (longest avg text) that
incorrectly identified German columns as English. IPA now only applied
when OCR bracket patterns are actually found. Add ipa_mode (auto/all/none)
and syllable_mode (auto/all/none) query params to build-grid API.
Frontend: Add IPA and Silben dropdown selects to GridToolbar. Modes
are passed as query params on rebuild. Auto = current smart detection,
All = force for all words, Aus = skip entirely.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Page-split sessions (start_step=1) have no orientation_result stored.
StepOrientation now auto-runs orientation detection when loading an
existing session that lacks a result.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Page-split now creates independent sessions (no parent_session_id),
parent marked as status='split' and hidden from list. Navigation uses
useSearchParams for URL-based step tracking (browser back/forward works).
page.tsx reduced from 684 to 443 lines via usePipelineNavigation hook.
Box sub-sessions (column detection) remain unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removed overflow-y-auto and maxHeight from the grid container div.
The page itself handles scrolling — nested scroll containers caused
the bottom rows to be cut off after editing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Syllable dividers now require CV validation: morphological vertical
line detection checks if word_box image actually shows thin isolated
pipe lines before applying pyphen. Only first word per cell gets
pipes (matching dictionary print layout).
2. Grid editor scroll: changed maxHeight from 80vh to calc(100vh-200px)
so editor remains scrollable after edits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The right panel (grid area) had no vertical overflow handling, causing
the last ~5 rows to be clipped and invisible. Added overflow-y-auto
with max-height 80vh, and removed overflow-hidden from the GridTable
wrapper that was cutting off content.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The reset useEffects in StepOrientation/Deskew/Dewarp/Crop were clearing
orientationResult when sessionId changed (e.g. during handleOrientationComplete),
causing the right side of ImageCompareView to show nothing. Using key={sessionId}
on the step components instead forces React to remount with fresh state when
switching sessions, without interfering with the upload/orientation flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Step components (Deskew, Dewarp, Crop, Orientation) had local state
guards that prevented reloading when sessionId changed via sub-session
tab clicks. Added useEffect reset hooks that clear all local state
when sessionId changes, allowing the component to properly reload
the new session's data.
Also renamed "Box N" to "Seite N" in BoxSessionTabs per user feedback.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After orientation detection, the frontend now automatically calls the
page-split endpoint. When a double-page book spread is detected, two
sub-sessions are created and each goes through the full pipeline
(deskew/dewarp/crop) independently — essential because each page of a
spread tilts differently due to the spine.
Frontend changes:
- StepOrientation: calls POST /page-split after orientation, shows
split info ("Doppelseite erkannt"), notifies parent of sub-sessions
- page.tsx: distinguishes page-split sub-sessions (current_step < 5)
from crop-based sub-sessions (current_step >= 5). Page-split subs
only skip orientation, not deskew/dewarp/crop.
- page.tsx: handleOrientationComplete opens first sub-session when
page-split was detected
Backend changes (orientation_crop_api.py):
- page-split endpoint falls back to original image when orientation
rotated a landscape spread to portrait
- start_step parameter: 1 if split from original, 2 if from oriented
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users can now right-click any cell to set text color (red, green, blue,
orange, purple, black) or remove the color bar without changing text.
A "reset" option restores the OCR-detected color. This enables accurate
Ground Truth marking when OCR assigns colors to wrong cells.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Color bar (red/colored indicator) now only shows when word_boxes
text still matches the cell text — editing the cell hides stale colors
- New "Auto-Korrektur" button: detects dominant prefix+number patterns
per column (e.g. p.70, p.71) and completes partial entries (.65 → p.65)
— requires 3+ matching entries before correcting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resize handle: wider (9px), z-40 (above z-30 buttons).
Add-column button moved to bottom-right corner to avoid overlap.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New ImageLayoutEditor: SVG overlay on original scan with draggable
column dividers, horizontal guidelines (margins/header/footer),
double-click to add columns, x-button to delete
- GridTable: MIN_COL_WIDTH 40→80px for better readability
- Arrow up/down keys navigate between rows in the grid editor
- Ctrl+Click for multi-cell selection, Ctrl+B to toggle bold on selection
- getAdjacentCell works for cells that don't exist yet (new rows/cols)
- deleteColumn now merges x-boundaries correctly
- Session restore fix: grid_editor_result/structure_result in session GET
- Footer row 3-state cycle, auto-create cells for empty footer rows
- Grid save/build/GT-mark now advance current_step=11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New document category "Woerterbuch" (frontend type + backend validation)
- Column delete: hover column header → red "x" button (with confirmation)
- Column add: hover column header → "+" button inserts after that column
- Both operations support undo/redo, update cell IDs and summary
- Available in both GridEditor and StepGridReview (Kombi last step)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New StepGridReview component: split-view (scan image left, grid right),
confidence stats, row-accept buttons, zoom controls
- Kombi Pipeline case 6 now uses StepGridReview instead of plain GridEditor
- Kombi step label changed to "Review & GT"
- Ground Truth queue page simplified to overview/navigation only
(links to Kombi pipeline for actual review work)
- Deep-link support: /ai/ocr-overlay?session=xxx&mode=kombi
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changed `typeof grid.zones[][]` to `GridZone[][]` which was causing
a silent build error, preventing the vsplit zone grouping logic from
being compiled into the production bundle.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pages with two side-by-side vocabulary columns separated by a vertical
black line are now split into independent sub-zones before row/column
detection. Each sub-zone gets its own rows, preventing misalignment from
different heading rhythms.
- _detect_vertical_dividers(): finds pipe word_boxes at consistent x
positions spanning >50% of zone height
- _split_zone_at_vertical_dividers(): creates left/right PageZone objects
with layout_hint and vsplit_group metadata
- Column union skips vsplit zones (independent column sets)
- Frontend renders vsplit zones side by side via flex layout
- PageZone gets layout_hint + vsplit_group fields
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Post-processing steps like 5h (slash-IPA conversion) modify cell.text
but not individual word_boxes. The colored per-word display showed
stale word_box text instead of the corrected cell text. Now falls
back to the plain input when texts don't match.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Footer rows (e.g. page numbers) now show "F" in amber below the row
number, mirroring the blue "H" label for headers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users can now draw rectangles on the document image in the Structure
Detection step to mark areas (e.g. header graphics, alphabet strips)
that should be excluded from OCR results during grid building.
- Backend: PUT/DELETE endpoints for exclude regions stored in structure_result
- Backend: _build_grid_core() filters all words inside user-defined exclude regions
- Frontend: Interactive rectangle drawing with visual overlay and delete buttons
- Preserve exclude regions when re-running structure detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Ground Truth button on last step of Pipeline/Kombi modes in ocr-overlay
- Prominent category picker in active session info bar (pulses when unset)
- GT badge shown when session has ground truth reference
- Backend: auto-detect pipeline from ocr_engine, store in GT snapshot
- Pipeline info shown in GT session list and regression reports
- Also pass pipeline param from ocr-pipeline StepGroundTruth
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract _build_grid_core() from build_grid() endpoint for reuse.
New ocr_pipeline_regression.py with endpoints to mark sessions as
ground truth, list them, and run regression comparisons after code
changes. Frontend button in StepGroundTruth.tsx to mark/update GT.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The frontend was checking for an existing structure_result and reusing
it, which meant the backend fix (passing word_boxes to graphic detection)
never had a chance to run on existing sessions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When union columns from multiple content zones are applied, column
boundaries can span wider than any single zone's bbox. Using
zone.bbox_px.w as the scale reference caused the total scaled width
to exceed the container, pushing the table off-screen.
Now uses the actual total column width sum as the scale reference,
guaranteeing columns always fit within the container.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: merge gaps within 5% of image width — the spine area may have
thin ink strips splitting one physical gap into multiple detected gaps.
Only use gaps >= 2% width as split points.
Frontend: StepCrop now handles multi_page crop responses without
crashing on missing original_size/cropped_size fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: add layout_metrics (avg_row_height_px, font_size_suggestion_px)
to build-grid response for faithful grid reconstruction.
Frontend: rewrite GridTable from HTML <table> to CSS Grid layout.
Column widths are now proportional to the OCR-measured x_min/x_max
positions. Row heights use the average content row height from the
scan. Column and row resize via drag handles (Excel-like).
Font: add Noto Sans (supports IPA characters) via next/font/google.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a cell has colored words (red !, blue phonetics), render each
word as a separate span with its own color instead of coloring the
entire input text with the first non-black color found.
Switches to editable input on cell selection (click).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add cv_graphic_detect.py for detecting non-text visual elements (arrows,
circles, lines, exclamation marks, icons, illustrations). Draw detected
graphics on structure overlay image and display them in the frontend
StepStructureDetection component with shape counts and individual listings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New pipeline step between Crop and Columns that visualizes detected
document structure: boxes (line-based + shading), page zones, and
color regions. Shows original image on the left, annotated overlay
on the right.
Backend: POST /detect-structure endpoint + /image/structure-overlay
Frontend: StepStructureDetection component with zone/box/color details
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add color/color_name/recovered fields to OcrWordBox type
- GridTable: show colored text + left-edge color indicator strip
- GridEditor: show color stats and recovered count in summary bar
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: new grid_editor_api.py with build-grid endpoint that detects
bordered boxes, splits page into zones, clusters columns/rows per zone
from Kombi word positions. New DB column grid_editor_result JSONB.
Frontend: GridEditor component with editable HTML tables per zone,
column bold toggle, header row toggle, undo/redo, keyboard navigation
(Tab/Enter/Arrow), image overlay verification, and save/load.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Since ocr_region_paddle() now runs RapidOCR locally (same PP-OCRv5 models),
the "PaddleOCR (Hetzner)" labels were misleading. Renamed to "PP-OCRv5 (lokal)".
Removed the Kombi-Vergleich tab since both sides would produce identical results.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add /rapid-kombi backend endpoint using local RapidOCR + Tesseract merge,
KombiCompareStep component for parallel execution and side-by-side overlay,
and wordResultOverride prop on OverlayReconstruction for direct data injection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: Add spatial overlap check (>=50% horizontal IoU) to Kombi merge
so words at the same position are deduplicated even when OCR text differs.
Frontend: Add yPct/hPct to WordPosition so each word renders at its actual
vertical position instead of all words collapsing to the cell center Y.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs both OCR engines on the preprocessed image and merges results:
word boxes matched by IoU, coordinates averaged by confidence weight.
Unmatched Tesseract words (bullets, symbols) are added for better coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The slide positioning hook was re-matching cell.text tokens against
word_boxes via fuzzy text similarity, which broke positioning for
special characters (!, bullet points, IPA). Now uses word_box
coordinates directly — exact OCR positions without re-interpretation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New 2-step mode (Upload → PaddleOCR+Overlay) alongside the existing
7-step pipeline. Backend endpoint runs PaddleOCR on the original image
and clusters words into rows/cells directly. Frontend adds a mode
toggle and PaddleDirectStep component.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace sequential 1:1 token-to-box mapping with fuzzy text matching.
Each token from cell.text finds its best matching word_box by text
similarity (normalized prefix match + substring bonus). Handles:
- Reordered boxes (different sort between text and boxes)
- IPA corrections changing token boundaries
- Token/box count mismatches
Unmatched tokens get interpolated positions from matched neighbors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When engine=paddle is selected, the backend overrides grid_method to
words_first and returns plain JSON (no SSE streaming). The frontend
was not aware of this override — it sent stream=true and tried to parse
SSE events from a JSON response, resulting in "Keine Daten".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PaddleOCR als neue engine=paddle Option in der OCR-Pipeline.
Microservice auf Hetzner (paddleocr-service/), async HTTP-Client
(paddleocr_remote.py), Frontend-Dropdown, automatisch words_first.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TEXT kommt aus cell.text (bereinigt, IPA-korrigiert).
POSITIONEN kommen aus word_boxes (exakte OCR-Koordinaten).
Tokens werden 1:1 in Leserichtung zugeordnet.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: _ocr_cell_crop speichert jetzt word_boxes mit exakten
Tesseract/RapidOCR Wort-Koordinaten (left, top, width, height)
im Cell-Ergebnis. Absolute Bildkoordinaten, bereits zurueckgemappt.
Frontend: Slide-Hook nutzt word_boxes direkt wenn vorhanden —
jedes Wort wird exakt an seiner OCR-Position platziert. Kein
Pixel-Scanning noetig. Fallback auf alten Slide wenn keine Boxes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>