Commit Graph

30 Commits

Author SHA1 Message Date
Benjamin Admin
610825ac14 SpreadsheetView: add bullet marker (•) for multi-line cells
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m34s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 38s
Multi-line cells (containing \n) that don't already start with a
bullet character get • prepended in the frontend. This ensures
bullet points are visible regardless of whether the backend inserted
them (depends on when boxes were last rebuilt).

Skips header rows and cells that already have •, -, or – prefix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:53:54 +02:00
Benjamin Admin
6aec4742e5 SpreadsheetView: keep bullets as single cells with text-wrap
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 27s
CI / test-nodejs-website (push) Successful in 31s
Revert row expansion — multi-line bullet cells stay as single cells
with \n and text-wrap (tb='2'). This way the text reflows when the
user resizes the column, like normal Excel behavior.

Row height auto-scales by line count (24px * lines).
Vertical alignment: top (vt=0) for multi-line cells.
Removed leading-space indentation hack (didn't work reliably).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:07:07 +02:00
Benjamin Admin
f2bc62b4f5 SpreadsheetView: bullet indentation, expanded rows, box borders
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 1m4s
Multi-line cells (\n): expanded into separate rows so each line gets
its own cell. Continuation lines (after •) indented with leading spaces.
Bullet marker lines (•) are bold.

Font-size detection: cells with word_box height >1.3x median get bold
and larger font (fs=12) for box titles.

Headers: is_header rows always bold with light background tint.

Box borders: thick colored outside border + thin inner grid lines.
Content zone: light gray grid borders.

Auto-fit column widths from longest text per column.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:15:43 +02:00
Benjamin Admin
674c9e949e SpreadsheetView: auto-fit column widths to longest text
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 22s
CI / test-go-edu-search (push) Failing after 23s
CI / test-python-klausur (push) Failing after 11s
CI / test-python-agent-core (push) Failing after 8s
CI / test-nodejs-website (push) Failing after 24s
Column widths now calculated from the longest text in each column
(~7.5px per character + padding). Takes the maximum of auto-fit
width and scaled original pixel width.

Multi-line cells: uses the longest line for width calculation.
Spanning header cells excluded from width calculation (they span
multiple columns and would inflate single-column widths).

Minimum column width: 60px.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:43:50 +02:00
Benjamin Admin
e131aa719e SpreadsheetView: formatting improvements for Excel-like display
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 21s
CI / test-go-edu-search (push) Failing after 19s
CI / test-python-klausur (push) Failing after 11s
CI / test-python-agent-core (push) Failing after 10s
CI / test-nodejs-website (push) Failing after 23s
Height: sheet height auto-calculated from row count (26px/row + toolbar),
no more cutoff at 21 rows. Row count set to exact (no padding).

Box borders: thick colored outside border + thin inner grid lines.
Content zone: light gray grid lines on all cells.

Headers: bold (bl=1) for is_header rows. Larger font detected via
word_box height comparison (>1.3x median → fs=12 + bold).

Box cells: light tinted background from box_bg_hex.
Header cells in boxes: slightly stronger tint.

Multi-line cells: text wrap enabled (tb='2'), \n preserved.
Bullet points (•) and indentation preserved in cell text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:29:50 +02:00
Benjamin Admin
17f0fdb2ed Refactor: extract _build_grid_core into grid_build_core.py + clean StepAnsicht
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 19s
CI / test-go-edu-search (push) Failing after 23s
CI / test-python-klausur (push) Failing after 10s
CI / test-python-agent-core (push) Failing after 9s
CI / test-nodejs-website (push) Failing after 26s
grid_editor_api.py: 2411 → 474 lines
- Extracted _build_grid_core() (1892 lines) into grid_build_core.py
- API file now only contains endpoints (build, save, get, gutter, box, unified)

StepAnsicht.tsx: 212 → 112 lines
- Removed useGridEditor imports (not needed for read-only spreadsheet)
- Removed unified grid fetch/build (not used with multi-sheet approach)
- Removed Spreadsheet/Grid toggle (only spreadsheet mode now)
- Simple: fetch grid-editor data → pass to SpreadsheetView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:54:55 +02:00
Benjamin Admin
d4353d76fb SpreadsheetView: multi-sheet tabs instead of unified single sheet
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 36s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 31s
Each zone becomes its own Excel sheet tab with independent column widths:
- Sheet "Vokabeln": main content zone with EN/DE/example columns
- Sheet "Pounds and euros": Box 1 with its own 4-column layout
- Sheet "German leihen": Box 2 with single column for flowing text

This solves the column-width conflict: boxes have different column
widths optimized for their content, which is impossible in a single
unified sheet (Excel limitation: column width is per-column, not per-cell).

Sheet tabs visible at bottom (showSheetTabs: true).
Box sheets get colored tab (from box_bg_hex).
First sheet active by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:51:21 +02:00
Benjamin Admin
b42f394833 Integrate Fortune Sheet spreadsheet editor in StepAnsicht
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 36s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m40s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 33s
Install @fortune-sheet/react (MIT, v1.0.4) as Excel-like spreadsheet
component. New SpreadsheetView.tsx converts unified grid data to
Fortune Sheet format (celldata, merge config, column/row sizes).

StepAnsicht now has Spreadsheet/Grid toggle:
- Spreadsheet mode: full Fortune Sheet with toolbar (bold, italic,
  color, borders, merge cells, text wrap, undo/redo)
- Grid mode: existing GridTable for quick editing

Box-origin cells get light tinted background in spreadsheet view.
Colspan cells converted to Fortune Sheet merge format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:08:03 +02:00
Benjamin Admin
c1a903537b Unified Grid: merge all zones into single Excel-like grid
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 33s
Backend (unified_grid.py):
- build_unified_grid(): merges content + box zones into one zone
- Dominant row height from median of content row spacings
- Full-width boxes: rows integrated directly
- Partial-width boxes: extra rows inserted when box has more text
  lines than standard rows fit (e.g., 7 lines in 5-row height)
- Box-origin cells tagged with source_zone_type + box_region metadata

Backend (grid_editor_api.py):
- POST /sessions/{id}/build-unified-grid → persists as unified_grid_result
- GET /sessions/{id}/unified-grid → retrieve persisted result

Frontend:
- GridEditorCell: added source_zone_type, box_region fields
- GridTable: box-origin cells get tinted background + left border
- StepAnsicht: split-view with original image (left) + editable
  unified GridTable (right). Auto-builds on first load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:37:55 +02:00
Benjamin Admin
7085c87618 StepAnsicht: dominant row height for content + proportional box rows
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 31s
Content sections: use dominant (median) row height from all content
rows instead of per-section average. This ensures uniform row height
above and below boxes (the standard case on textbook pages).

Box sections: distribute height proportionally by text line count
per row. A header (1 line) gets 1/7 of box height, a bullet with
3 lines gets 3/7. Fixes Box 2 where row 3 was cut off because
even distribution didn't account for multi-line cells.

Removed overflow:hidden from box container to prevent clipping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:43:02 +02:00
Benjamin Admin
1b7e095176 StepAnsicht: fix row filtering for partial-width boxes
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m34s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 36s
Content rows were incorrectly filtered out when their Y overlapped
with a box, even if the box only covered the right half of the page.
Now checks both Y AND X overlap — rows are only excluded if they
start within the box's horizontal range.

Fixes: rows next to Box 2 (lend, coconut, taste) were missing from
reconstruction because Box 2 (x=871, w=525) only covers the right
side, but left-side content rows at x≈148 were being filtered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:00:28 +02:00
Benjamin Admin
dcb873db35 StepAnsicht: section-based layout with averaged row heights
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 38s
CI / test-python-klausur (push) Failing after 2m28s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 40s
Major rewrite of reconstruction rendering:
- Page split into vertical sections (content/box) around box boundaries
- Content sections: uniform row height = (last_row - first_row) / (n-1)
- Box sections: rows evenly distributed within box height
- Content rows positioned absolutely at original y-coordinates
- Font size derived from row height (55% of row height)
- Multi-line cells (bullets) get expanded height with indentation
- Boxes render at exact bbox position with colored border
- Preparation for unified grid where boxes become part of main grid

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:29:40 +02:00
Benjamin Admin
fd39d13d06 StepAnsicht: use server-rendered OCR overlay image
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 40s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m38s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 24s
Replace manual word_box positioning (wild/unsnapped) with the
server-rendered words-overlay image from the OCR step endpoint.
This shows the same cleanly snapped red letters as the OCR step.

Endpoint: /sessions/{id}/image/words-overlay

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:26:54 +02:00
Benjamin Admin
c5733a171b StepAnsicht: fix font size and row spacing to match original
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 40s
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-agent-core (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
- Font: use font_size_suggestion_px * scale directly (removed 0.85 factor)
- Row height: calculate from row-to-row spacing (y_min of next row
  minus y_min of current row) instead of text height (y_max - y_min).
  This produces correct line spacing matching the original layout.
- Multi-line cells: height multiplied by line count

Content zone should now span from ~250 to ~2050 matching the original.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:24:27 +02:00
Benjamin Admin
18213f0bde StepAnsicht: split-view with coordinate grid for comparison
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 36s
Left panel: Original scan + OCR word overlay (red text at exact
word_box positions) + coordinate grid
Right panel: Reconstructed layout + same coordinate grid

Features:
- Coordinate grid toggle with 50/100/200px spacing options
- Grid lines labeled with pixel coordinates in original image space
- Both panels share the same scale for direct visual comparison
- OCR overlay shows detected text in red mono font at original positions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:00:22 +02:00
Benjamin Admin
cd8eb6ce46 Add Ansicht step (Step 12) — read-only page layout preview
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 49s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 36s
New pipeline step showing the reconstructed page with all zones
positioned at their original coordinates:
- Content zones with vocabulary grid cells
- Box zones with colored borders (from structure detection)
- Colspan cells rendered across multiple columns
- Multi-line cells (bullets) with pre-wrap whitespace
- Toggle to overlay original scan image at 15% opacity
- Proportionally scaled to viewport width
- Pure CSS positioning (no canvas/Fabric.js)

Pipeline: 14 steps (0-13), Ground Truth moved to Step 13.
Added colspan field to GridEditorCell type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:42:33 +02:00
Benjamin Admin
48de4d98cd Fix infinite loop in StepBoxGridReview auto-build
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m41s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 35s
Auto-build was triggering on every grid.zones.length change, which
happens on every rebuild (zone indices increment). Now uses a ref
to ensure auto-build fires only once. Also removed boxZones.length===0
condition that could trigger unnecessary builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:06:11 +02:00
Benjamin Admin
8b29d20940 StepBoxGridReview: show box border color from structure detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m46s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 35s
- Use box_bg_hex for border color (from Step 7 structure detection)
- Numbered color badges per box
- Show color name in box header
- Add box_bg_color/box_bg_hex to GridZone type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:18:36 +02:00
Benjamin Admin
12b194ad1a Fix StepBoxGridReview: match GridTable props interface
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m50s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 38s
GridTable expects zone (singular), onSelectCell, onCellTextChange,
onToggleColumnBold, onToggleRowHeader, onNavigate — not the
incorrect prop names from the first version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 22:39:38 +02:00
Benjamin Admin
5da9a550bf Add Box-Grid-Review step (Step 11) to OCR pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m52s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 37s
New pipeline step between Gutter Repair and Ground Truth that processes
embedded boxes (grammar tips, exercises) independently from the main grid.

Backend:
- cv_box_layout.py: classify_box_layout() detects flowing/columnar/
  bullet_list/header_only layout types per box
- build_box_zone_grid(): layout-aware grid building (single-column for
  flowing text, independent columns for tabular content)
- POST /sessions/{id}/build-box-grids endpoint with SmartSpellChecker
- Layout type overridable per box via request body

Frontend:
- StepBoxGridReview.tsx: shows each box with cropped image + editable
  GridTable. Layout type dropdown per box. Auto-builds on first load.
- Auto-skip when no boxes detected on page
- Pipeline steps updated: 13 steps (0-12), Ground Truth moved to 12

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:26:06 +02:00
Benjamin Admin
8c482ce8dd Fix Grid Build step: show grid-editor summary instead of word_result
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m31s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 23s
The Grid Build step was showing word_result.grid_shape (from the initial
OCR word clustering, often just 1 column) instead of the grid-editor
summary (zone-based, with correct column/row/cell counts). Now reads
summary.total_rows/total_columns/total_cells from the grid-editor result.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:01:18 +02:00
Benjamin Admin
2828871e42 Show detected page number in session header
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 27s
CI / test-nodejs-website (push) Successful in 28s
Extracts page_number from grid_editor_result when opening a session
and displays it as "S. 233" badge in the SessionHeader, next to the
category and GT badges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:20:53 +02:00
Benjamin Admin
611e1ee33d Add GT badge to grouped sessions and sub-pages in session list
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m29s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 34s
The GT badge was only shown on ungrouped SessionRow items. Now also
visible on document group rows (e.g. "GT 1/2") and individual pages
within expanded groups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 23:54:55 +02:00
Benjamin Admin
e6f8e12f44 Show full Grid-Review in Ground Truth step + GT badge in session list
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 37s
CI / test-python-klausur (push) Failing after 2m18s
CI / test-python-agent-core (push) Successful in 22s
CI / test-nodejs-website (push) Successful in 27s
- StepGroundTruth now shows the split view (original image + table)
  so the user can verify the final result before marking as GT
- Backend session list now returns is_ground_truth flag
- SessionList shows amber "GT" badge for marked sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:34:32 +02:00
Benjamin Admin
d1e7dd1c4a Fix gutter repair: detect short fragments + show spell alternatives
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 48s
CI / test-go-edu-search (push) Successful in 49s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 35s
- Lower min word length from 3→2 for hyphen-join candidates so fragments
  like "ve" (from "ver-künden") are no longer skipped
- Return all spellchecker candidates instead of just top-1, so user can
  pick the correct form (e.g. "stammeln" vs "stammelt")
- Frontend shows clickable alternative buttons for spell_fix suggestions
- Backend accepts text_overrides in apply endpoint for user-selected alternatives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:09:12 +02:00
Benjamin Admin
71e1b10ac7 Add gutter repair step to OCR Kombi pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m31s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 29s
New step "Wortkorrektur" between Grid-Review and Ground Truth that detects
and fixes words truncated or blurred at the book gutter (binding area) of
double-page scans. Uses pyspellchecker (DE+EN) for validation.

Two repair strategies:
- hyphen_join: words split across rows with missing chars (ve + künden → verkünden)
- spell_fix: garbled trailing chars from gutter blur (stammeli → stammeln)

Interactive frontend with per-suggestion accept/reject and batch controls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:50:16 +02:00
Benjamin Admin
0168ab1a67 Remove Hauptseite/Box tabs from Kombi pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m15s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 20s
Page-split now creates independent sessions that appear directly in
the session list. After split, the UI switches to the first child
session. BoxSessionTabs, sub-session state, and parent-child tracking
removed from Kombi code. Legacy ocr-overlay still uses BoxSessionTabs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 17:43:58 +01:00
Benjamin Admin
9f68bd3425 feat: Implement page-split step with auto-detection and sub-session naming
StepPageSplit now:
- Auto-calls POST /page-split on step entry
- Shows oriented image + detection result
- If double page: creates sub-sessions named "Title — S. 1/2"
- If single page: green badge "keine Trennung noetig"
- Manual "Weiter" button (no auto-advance)

Also:
- StepOrientation wrapper simplified (no page-split in orientation)
- StepUpload passes name back via onUploaded(sid, name)
- page.tsx: after page-split "Weiter" switches to first sub-session
- useKombiPipeline exposes setSessionName

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 17:56:45 +01:00
Benjamin Admin
469f09d1e1 fix: Redesign StepUpload for manual step control
StepUpload now has 3 phases:
1. File selection: drop zone / file picker → shows preview
2. Review: title input, category, file info → "Hochladen" button
3. Uploaded: shows session image → "Weiter" button

No more auto-advance after upload. User controls every step.
openSession() removed from onUploaded callback to prevent
step-reset race condition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 17:35:36 +01:00
Benjamin Admin
d26a9f60ab Add OCR Kombi Pipeline: modular 11-step architecture with multi-page support
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m24s
CI / test-python-agent-core (push) Successful in 22s
CI / test-nodejs-website (push) Successful in 20s
Phase 1 of the clean architecture refactor: Replaces the 751-line ocr-overlay
monolith with a modular pipeline. Each step gets its own component file.

Frontend: /ai/ocr-kombi route with 11 steps (Upload, Orientation, PageSplit,
Deskew, Dewarp, ContentCrop, OCR, Structure, GridBuild, GridReview, GroundTruth).
Session list supports document grouping for multi-page uploads.

Backend: New ocr_kombi/ module with multi-page PDF upload (splits PDF into N
sessions with shared document_group_id). DB migration adds document_group_id
and page_number columns.

Old /ai/ocr-overlay remains fully functional for A/B testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 15:55:28 +01:00