StepUpload now has 3 phases:
1. File selection: drop zone / file picker → shows preview
2. Review: title input, category, file info → "Hochladen" button
3. Uploaded: shows session image → "Weiter" button
No more auto-advance after upload. User controls every step.
openSession() removed from onUploaded callback to prevent
step-reset race condition.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
openSession mapped dbStep=1 to uiStep=0 (upload), overriding handleNext's
advancement to step 1. Fix: sessions always exist post-upload, so always
skip past the upload step in openSession.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1 of the clean architecture refactor: Replaces the 751-line ocr-overlay
monolith with a modular pipeline. Each step gets its own component file.
Frontend: /ai/ocr-kombi route with 11 steps (Upload, Orientation, PageSplit,
Deskew, Dewarp, ContentCrop, OCR, Structure, GridBuild, GridReview, GroundTruth).
Session list supports document grouping for multi-page uploads.
Backend: New ocr_kombi/ module with multi-page PDF upload (splits PDF into N
sessions with shared document_group_id). DB migration adds document_group_id
and page_number columns.
Old /ai/ocr-overlay remains fully functional for A/B testing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The page_number was only shown in GridEditor.tsx (ocr-overlay), but
the OCR pipeline uses StepGridReview.tsx, which has its own summary bar.
Display the extracted page number (e.g. "S. 233") next to the
dictionary detection badge.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_filter_footer_words now returns page number info (text, y_pct, number)
instead of just removing footer words. The page number is included in
the grid result as `page_number` and displayed in the frontend summary
bar as "S. 233".
This preserves page numbers for later page concatenation in the
customer frontend while still removing them from the grid content.
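
A minimal sketch of the footer handling described above, assuming words are
dicts with `text` and `y` keys and that the footer is a fixed band at the
bottom of the page. The function name, the `footer_start_pct` parameter and
the numeric-only rule are illustrative assumptions, not the actual
`_filter_footer_words` implementation.

```python
import re


def filter_footer_words(words, page_height, footer_start_pct=92.0):
    """Return (kept_words, page_number_info); hypothetical helper.

    A purely numeric word inside the footer band is removed from the grid
    content and reported as the page number (text, y_pct, number).
    """
    kept = []
    page_number = None
    for word in words:
        y_pct = 100.0 * word["y"] / page_height
        match = re.fullmatch(r"\d{1,4}", word["text"].strip())
        if y_pct >= footer_start_pct and match and page_number is None:
            page_number = {
                "text": word["text"],
                "y_pct": round(y_pct, 1),
                "number": int(match.group()),
            }
            continue  # drop the page number from the grid content
        kept.append(word)
    return kept, page_number
```

The caller could then attach the second return value to the grid result as
`page_number` and render it as "S. 233" in the summary bar.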
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hybrid approach mirroring English IPA:
- Primary: wiki-pronunciation-dict (636k entries, CC-BY-SA, Wiktionary)
- Fallback: epitran rule-based G2P (MIT license)
IPA modes now use language-appropriate dictionaries:
- auto/en: English IPA (Britfone + eng_to_ipa)
- de: German IPA (wiki-pronunciation-dict + epitran)
- all: EN column gets English IPA, other columns get German IPA
- none: disabled
Frontend shows CC-BY-SA attribution when German IPA is active.
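
The mode table above could be expressed roughly as follows. This is a sketch
that assumes each column's language ("en" or "de") has already been detected;
`ipa_source_for_column` and the returned labels are hypothetical names, not
the backend's API.

```python
from typing import Optional


def ipa_source_for_column(ipa_mode: str, column_lang: str) -> Optional[str]:
    """Pick the IPA dictionary family for one column, or None to skip it."""
    if ipa_mode == "none":
        return None
    if ipa_mode in ("auto", "en"):
        # English IPA (Britfone + eng_to_ipa) on the English column only.
        return "english" if column_lang == "en" else None
    if ipa_mode == "de":
        # German IPA (wiki-pronunciation-dict + epitran) on German columns.
        return "german" if column_lang == "de" else None
    if ipa_mode == "all":
        # EN column gets English IPA, every other column gets German IPA.
        return "english" if column_lang == "en" else "german"
    raise ValueError(f"unknown ipa_mode: {ipa_mode!r}")
```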
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Our IPA system only has English dictionaries (Britfone MIT, eng_to_ipa
MIT). The "IPA: nur DE" option was useless at best and misleading at worst.
Removed from dropdown, type definition, and API validation.
Syllable DE mode stays — pyphen has a German hyphenation dictionary.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dropdowns only updated state but didn't trigger buildGrid().
Now a useEffect watches ipaMode/syllableMode and rebuilds
automatically (skipping the initial mount).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend ipa_mode and syllable_mode toggles with language options:
- auto: smart detection (default)
- en: only English headword column
- de: only German definition columns
- all: all content columns
- none: skip entirely
Also improve English column auto-detection: use garbled IPA patterns
(apostrophes, colons) in addition to bracket patterns. This correctly
identifies English dictionary pages where OCR produces garbled ASCII
instead of bracket IPA.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: Remove en_col_type fallback heuristic (longest avg text) that
incorrectly identified German columns as English. IPA is now only applied
when OCR bracket patterns are actually found. Add ipa_mode (auto/all/none)
and syllable_mode (auto/all/none) query params to build-grid API.
Frontend: Add IPA and Silben dropdown selects to GridToolbar. Modes
are passed as query params on rebuild. Auto = current smart detection,
All = force for all words, Aus = skip entirely.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Page-split sessions (start_step=1) have no orientation_result stored.
StepOrientation now auto-runs orientation detection when loading an
existing session that lacks a result.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Page-split now creates independent sessions (no parent_session_id),
parent marked as status='split' and hidden from list. Navigation uses
useSearchParams for URL-based step tracking (browser back/forward works).
page.tsx reduced from 684 to 443 lines via usePipelineNavigation hook.
Box sub-sessions (column detection) remain unchanged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of returning to the parent (which created a redirect loop),
handleNext now finds the next incomplete sub-session and opens it
directly. When all sub-sessions are done, it returns to the session list.
Also fixes openSession auto-redirect to prefer the first incomplete
sub-session over the most advanced one.
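
The selection rule can be sketched as below. The actual logic lives in the
frontend's handleNext/openSession; this Python version only illustrates the
idea and assumes a sub-session counts as done once its `current_step` reaches
a known final step. `next_incomplete_sub_session` and `final_step` are
hypothetical names.

```python
from typing import Optional


def next_incomplete_sub_session(sub_sessions, final_step: int) -> Optional[str]:
    """Return the id of the first sub-session that is not finished yet.

    Returns None when every sub-session is done, in which case the caller
    would go back to the session list.
    """
    for sub in sub_sessions:
        if sub["current_step"] < final_step:
            return sub["id"]
    return None
```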
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When reopening a parent session that has page-split sub-sessions,
the UI was showing the parent's pipeline step (always step 1/Orientation)
instead of navigating to the sub-sessions. Now automatically opens the
most advanced sub-session, matching the behavior of handleOrientationComplete.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removed overflow-y-auto and maxHeight from the grid container div.
The page itself handles scrolling — nested scroll containers caused
the bottom rows to be cut off after editing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Syllable dividers now require computer-vision (CV) validation:
morphological vertical-line detection checks whether the word_box image
actually shows thin, isolated pipe lines before pyphen is applied. Only
the first word per cell gets pipes (matching dictionary print layout).
2. Grid editor scroll: changed maxHeight from 80vh to calc(100vh-200px)
so editor remains scrollable after edits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The right panel (grid area) had no vertical overflow handling, causing
the last ~5 rows to be clipped and invisible. Added overflow-y-auto
with max-height 80vh, and removed overflow-hidden from the GridTable
wrapper that was cutting off content.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Page-split sub-sessions (current_step=2) had orientation marked as skipped
but uiStep remained at 0 (orientation step), causing StepOrientation to
render for a sub-session that has no orientation data. Now advances to
uiStep=1 (deskew) when orientation is skipped.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The reset useEffects in StepOrientation/Deskew/Dewarp/Crop were clearing
orientationResult when sessionId changed (e.g. during handleOrientationComplete),
causing the right side of ImageCompareView to show nothing. Using key={sessionId}
on the step components instead forces React to remount with fresh state when
switching sessions, without interfering with the upload/orientation flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Step components (Deskew, Dewarp, Crop, Orientation) had local state
guards that prevented reloading when sessionId changed via sub-session
tab clicks. Added useEffect reset hooks that clear all local state
when sessionId changes, allowing the component to properly reload
the new session's data.
Also renamed "Box N" to "Seite N" in BoxSessionTabs per user feedback.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The page-split detection was implemented only in the regular pipeline
page, not in the OCR Overlay page where the user actually tests
with Kombi mode. Now the overlay page has full sub-session support:
- openSession: handles sub_sessions, parent_session_id, skip logic
for page-split vs crop-based sub-sessions, preserves current mode
- handleOrientationComplete: async, fetches API to detect sub-sessions
- BoxSessionTabs: shown between stepper and step content
- handleNext: returns to parent after sub-session completion
- handleSessionChange/handleBoxSessionsCreated: session switching
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
handleOrientationComplete was checking subSessions from React state,
but due to batching the state was still empty when the user clicked
"Seiten verarbeiten". Now fetches session data directly from the API
to reliably detect sub-sessions and auto-open the first one.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After orientation detection, the frontend now automatically calls the
page-split endpoint. When a double-page book spread is detected, two
sub-sessions are created and each goes through the full pipeline
(deskew/dewarp/crop) independently — essential because each page of a
spread tilts differently due to the spine.
Frontend changes:
- StepOrientation: calls POST /page-split after orientation, shows
split info ("Doppelseite erkannt"), notifies parent of sub-sessions
- page.tsx: distinguishes page-split sub-sessions (current_step < 5)
from crop-based sub-sessions (current_step >= 5). Page-split subs
only skip orientation, not deskew/dewarp/crop.
- page.tsx: handleOrientationComplete opens first sub-session when
page-split was detected
Backend changes (orientation_crop_api.py):
- page-split endpoint falls back to original image when orientation
rotated a landscape spread to portrait
- start_step parameter: 1 if split from original, 2 if from oriented
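
A small sketch of the two rules above: the step threshold that separates
page-split from crop-based sub-sessions, and the start_step choice. The
function names are hypothetical, and the assumption that crop-based
sub-sessions skip all four preprocessing steps is inferred from the wording
above rather than confirmed.

```python
from typing import List

CROP_BASED_MIN_STEP = 5  # threshold quoted in the commit message


def steps_to_skip(current_step: int) -> List[str]:
    """Preprocessing steps a sub-session skips (assumed mapping)."""
    if current_step < CROP_BASED_MIN_STEP:
        # Page-split sub-session: each page still needs its own
        # deskew/dewarp/crop because the spine tilts the pages differently.
        return ["orientation"]
    return ["orientation", "deskew", "dewarp", "crop"]


def start_step_for_split(split_from_original: bool) -> int:
    """1 if the split used the original image, 2 if it used the oriented one."""
    return 1 if split_from_original else 2
```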
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users can now right-click any cell to set text color (red, green, blue,
orange, purple, black) or remove the color bar without changing text.
A "reset" option restores the OCR-detected color. This enables accurate
Ground Truth marking when OCR assigns colors to wrong cells.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Color bar (red/colored indicator) now only shows when word_boxes
text still matches the cell text — editing the cell hides stale colors
- New "Auto-Korrektur" button: detects dominant prefix+number patterns
per column (e.g. p.70, p.71) and completes partial entries (.65 → p.65)
— requires 3+ matching entries before correcting
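
The Auto-Korrektur rule could look roughly like this. It is a sketch only:
the regular expressions, the `auto_correct_column` name and the exact shape
of a "partial entry" (a leading dot followed by digits) are assumptions based
on the example above.

```python
import re
from collections import Counter

FULL_ENTRY = re.compile(r"^([A-Za-z]{1,3}\.)(\d+)$")  # e.g. "p.70"
PARTIAL_ENTRY = re.compile(r"^\.(\d+)$")              # e.g. ".65"


def auto_correct_column(entries, min_matches=3):
    """Complete partial entries using the column's dominant prefix."""
    prefixes = Counter(
        m.group(1) for e in entries if (m := FULL_ENTRY.match(e.strip()))
    )
    if not prefixes:
        return list(entries)
    prefix, count = prefixes.most_common(1)[0]
    if count < min_matches:
        return list(entries)  # not enough evidence, leave the column alone
    corrected = []
    for entry in entries:
        m = PARTIAL_ENTRY.match(entry.strip())
        corrected.append(f"{prefix}{m.group(1)}" if m else entry)
    return corrected
```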
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resize handle: wider (9px), z-40 (above z-30 buttons).
Add-column button moved to bottom-right corner to avoid overlap.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New ImageLayoutEditor: SVG overlay on original scan with draggable
column dividers, horizontal guidelines (margins/header/footer),
double-click to add columns, x-button to delete
- GridTable: MIN_COL_WIDTH 40→80px for better readability
- Arrow up/down keys navigate between rows in the grid editor
- Ctrl+Click for multi-cell selection, Ctrl+B to toggle bold on selection
- getAdjacentCell works for cells that don't exist yet (new rows/cols)
- deleteColumn now merges x-boundaries correctly
- Session restore fix: grid_editor_result/structure_result in session GET
- Footer row 3-state cycle, auto-create cells for empty footer rows
- Grid save/build/GT-mark now advance current_step to 11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New document category "Woerterbuch" (frontend type + backend validation)
- Column delete: hover column header → red "x" button (with confirmation)
- Column add: hover column header → "+" button inserts after that column
- Both operations support undo/redo, update cell IDs and summary
- Available in both GridEditor and StepGridReview (Kombi last step)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New StepGridReview component: split-view (scan image left, grid right),
confidence stats, row-accept buttons, zoom controls
- Kombi Pipeline case 6 now uses StepGridReview instead of plain GridEditor
- Kombi step label changed to "Review & GT"
- Ground Truth queue page simplified to overview/navigation only
(links to Kombi pipeline for actual review work)
- Deep-link support: /ai/ocr-overlay?session=xxx&mode=kombi
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Ground-truth: zone.columns use 'label' not 'col_type' — calling
.replace() on undefined crashed the page after grid data loaded
- Model-management: same AIToolsSidebarResponsive wrapper bug as the
other pages — does not render children
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AIToolsSidebarResponsive does not accept children — it renders only a
sidebar nav. Using it as a wrapper caused page content to never render.
Replaced with plain div, matching the pattern used by ocr-pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both pages passed `moduleId` which is not a valid prop for PagePurpose.
The component expects explicit title/purpose/audience — calling
audience.join() on undefined caused the client-side crash.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changed `typeof grid.zones[][]` to `GridZone[][]`. The old type expression
caused a silent build error that prevented the vsplit zone grouping logic
from being compiled into the production bundle.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pages with two side-by-side vocabulary columns separated by a vertical
black line are now split into independent sub-zones before row/column
detection. Each sub-zone gets its own rows, preventing misalignment from
different heading rhythms.
- _detect_vertical_dividers(): finds pipe word_boxes at consistent x
positions spanning >50% of zone height
- _split_zone_at_vertical_dividers(): creates left/right PageZone objects
with layout_hint and vsplit_group metadata
- Column union skips vsplit zones (independent column sets)
- Frontend renders vsplit zones side by side via flex layout
- PageZone gets layout_hint + vsplit_group fields
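
A sketch of the divider detection idea, assuming word boxes are dicts with
`text`, `x`, `y` and `h` in pixels. The clustering tolerance and the function
signature are illustrative assumptions and not the real
`_detect_vertical_dividers()`.

```python
def detect_vertical_dividers(word_boxes, zone_height, x_tolerance=10, min_span=0.5):
    """Return x positions of pipe columns spanning more than min_span of the zone."""
    pipes = sorted(
        (b for b in word_boxes if b["text"].strip() in ("|", "│")),
        key=lambda b: b["x"],
    )
    # Group pipes whose x positions are close to the previous pipe.
    clusters = []
    for box in pipes:
        if clusters and abs(box["x"] - clusters[-1][-1]["x"]) <= x_tolerance:
            clusters[-1].append(box)
        else:
            clusters.append([box])
    dividers = []
    for cluster in clusters:
        top = min(b["y"] for b in cluster)
        bottom = max(b["y"] + b["h"] for b in cluster)
        if (bottom - top) / zone_height > min_span:
            dividers.append(sum(b["x"] for b in cluster) / len(cluster))
    return dividers
```

Each returned x position would then be a candidate for splitting the zone
into left and right sub-zones.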
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Post-processing steps like 5h (slash-IPA conversion) modify cell.text
but not individual word_boxes. The colored per-word display showed
stale word_box text instead of the corrected cell text. Now falls
back to the plain input when texts don't match.
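
The fallback check amounts to comparing the joined word_box texts with the
cell text. A minimal sketch, with whitespace normalization added as an
assumption and `use_colored_words` as a hypothetical name (the real check is
in the frontend):

```python
def use_colored_words(cell_text, word_boxes):
    """True if the per-word colored display still matches the cell text."""
    joined = " ".join(box["text"] for box in word_boxes)
    # Normalize whitespace so spacing differences alone do not force a fallback.
    return " ".join(joined.split()) == " ".join(cell_text.split())
```

When this returns False, the cell would be rendered as plain input instead
of colored spans.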
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Footer rows (e.g. page numbers) now show "F" in amber below the row
number, mirroring the blue "H" label for headers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users can now draw rectangles on the document image in the Structure
Detection step to mark areas (e.g. header graphics, alphabet strips)
that should be excluded from OCR results during grid building.
- Backend: PUT/DELETE endpoints for exclude regions stored in structure_result
- Backend: _build_grid_core() filters all words inside user-defined exclude regions
- Frontend: Interactive rectangle drawing with visual overlay and delete buttons
- Preserve exclude regions when re-running structure detection
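
The filtering step can be sketched as follows, assuming words and regions are
pixel rectangles with `x`, `y`, `w`, `h` and that a word counts as "inside"
when its center falls within a region. The center rule and the function name
are assumptions; `_build_grid_core()` may use a different containment test.

```python
def filter_excluded_words(words, exclude_regions):
    """Drop every word whose center lies inside a user-defined exclude region."""

    def center_inside(word, region):
        cx = word["x"] + word["w"] / 2
        cy = word["y"] + word["h"] / 2
        return (
            region["x"] <= cx <= region["x"] + region["w"]
            and region["y"] <= cy <= region["y"] + region["h"]
        )

    return [
        word
        for word in words
        if not any(center_inside(word, region) for region in exclude_regions)
    ]
```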
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Ground Truth button on last step of Pipeline/Kombi modes in ocr-overlay
- Prominent category picker in active session info bar (pulses when unset)
- GT badge shown when session has ground truth reference
- Backend: auto-detect pipeline from ocr_engine, store in GT snapshot
- Pipeline info shown in GT session list and regression reports
- Also pass pipeline param from ocr-pipeline StepGroundTruth
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract _build_grid_core() from build_grid() endpoint for reuse.
New ocr_pipeline_regression.py with endpoints to mark sessions as
ground truth, list them, and run regression comparisons after code
changes. Frontend button in StepGroundTruth.tsx to mark/update GT.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The frontend was checking for an existing structure_result and reusing
it, which meant the backend fix (passing word_boxes to graphic detection)
never had a chance to run on existing sessions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When union columns from multiple content zones are applied, column
boundaries can span wider than any single zone's bbox. Using
zone.bbox_px.w as the scale reference caused the total scaled width
to exceed the container, pushing the table off-screen.
Now uses the actual total column width sum as the scale reference,
guaranteeing columns always fit within the container.
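
The arithmetic behind the fix, sketched in Python for illustration (the real
code is in the frontend; `scale_column_widths` is a hypothetical name):

```python
def scale_column_widths(column_widths_px, container_width_px):
    """Scale columns so that their sum equals the container width."""
    total = sum(column_widths_px)
    if total <= 0:
        return [0.0 for _ in column_widths_px]
    # Using the column sum as reference means the scaled widths can never
    # exceed the container, unlike scaling by a single zone's bbox width.
    scale = container_width_px / total
    return [width * scale for width in column_widths_px]
```

For example, union columns of 300, 500 and 400 px in a 600 px container are
scaled by 0.5 to 150, 250 and 200 px.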
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: merge gaps within 5% of image width — the spine area may have
thin ink strips splitting one physical gap into multiple detected gaps.
Only use gaps >= 2% width as split points.
Frontend: StepCrop now handles multi_page crop responses without
crashing on missing original_size/cropped_size fields.
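
The backend gap handling above could be sketched like this, assuming gaps are
(start_x, end_x) pixel intervals and that "within 5%" means the distance
between two neighboring gaps. `find_split_gaps` and its parameters are
hypothetical names.

```python
def find_split_gaps(gaps, image_width, merge_pct=0.05, min_gap_pct=0.02):
    """Merge nearby gaps, then keep only those wide enough to split on."""
    merged = []
    for start, end in sorted(gaps):
        if merged and start - merged[-1][1] <= merge_pct * image_width:
            # A thin ink strip split one physical gap: join the two pieces.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    min_width = min_gap_pct * image_width
    return [(start, end) for start, end in merged if end - start >= min_width]
```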
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend: add layout_metrics (avg_row_height_px, font_size_suggestion_px)
to build-grid response for faithful grid reconstruction.
Frontend: rewrite GridTable from HTML <table> to CSS Grid layout.
Column widths are now proportional to the OCR-measured x_min/x_max
positions. Row heights use the average content row height from the
scan. Column and row resize via drag handles (Excel-like).
Font: add Noto Sans (supports IPA characters) via next/font/google.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a cell has colored words (red !, blue phonetics), render each
word as a separate span with its own color instead of coloring the
entire input text with the first non-black color found.
Switches to editable input on cell selection (click).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Graphic detection needs word positions to exclude text from the ink mask.
Previously Struktur ran before OCR, causing every word to be detected as
a graphic element. Now:
- Pipeline: Struktur at index 7 (after Wörter)
- Kombi: Struktur at index 5 (after PP-OCRv5+Tesseract, before Tabelle)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>