Commit Graph

213 Commits

Author SHA1 Message Date
Benjamin Admin
e131aa719e SpreadsheetView: formatting improvements for Excel-like display
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 21s
CI / test-go-edu-search (push) Failing after 19s
CI / test-python-klausur (push) Failing after 11s
CI / test-python-agent-core (push) Failing after 10s
CI / test-nodejs-website (push) Failing after 23s
Height: sheet height auto-calculated from row count (26px/row + toolbar),
no more cutoff at 21 rows. Row count set to exact (no padding).

Box borders: thick colored outside border + thin inner grid lines.
Content zone: light gray grid lines on all cells.

Headers: bold (bl=1) for is_header rows. Larger font detected via
word_box height comparison (>1.3x median → fs=12 + bold).

Box cells: light tinted background from box_bg_hex.
Header cells in boxes: slightly stronger tint.

Multi-line cells: text wrap enabled (tb='2'), \n preserved.
Bullet points (•) and indentation preserved in cell text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:29:50 +02:00
Benjamin Admin
17f0fdb2ed Refactor: extract _build_grid_core into grid_build_core.py + clean StepAnsicht
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Failing after 19s
CI / test-go-edu-search (push) Failing after 23s
CI / test-python-klausur (push) Failing after 10s
CI / test-python-agent-core (push) Failing after 9s
CI / test-nodejs-website (push) Failing after 26s
grid_editor_api.py: 2411 → 474 lines
- Extracted _build_grid_core() (1892 lines) into grid_build_core.py
- API file now only contains endpoints (build, save, get, gutter, box, unified)

StepAnsicht.tsx: 212 → 112 lines
- Removed useGridEditor imports (not needed for read-only spreadsheet)
- Removed unified grid fetch/build (not used with multi-sheet approach)
- Removed Spreadsheet/Grid toggle (only spreadsheet mode now)
- Simple: fetch grid-editor data → pass to SpreadsheetView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:54:55 +02:00
Benjamin Admin
d4353d76fb SpreadsheetView: multi-sheet tabs instead of unified single sheet
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 36s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 31s
Each zone becomes its own Excel sheet tab with independent column widths:
- Sheet "Vokabeln": main content zone with EN/DE/example columns
- Sheet "Pounds and euros": Box 1 with its own 4-column layout
- Sheet "German leihen": Box 2 with single column for flowing text

This solves the column-width conflict: boxes have different column
widths optimized for their content, which is impossible in a single
unified sheet (Excel limitation: column width is per-column, not per-cell).

Sheet tabs visible at bottom (showSheetTabs: true).
Box sheets get colored tab (from box_bg_hex).
First sheet active by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:51:21 +02:00
Benjamin Admin
b42f394833 Integrate Fortune Sheet spreadsheet editor in StepAnsicht
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 36s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m40s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 33s
Install @fortune-sheet/react (MIT, v1.0.4) as Excel-like spreadsheet
component. New SpreadsheetView.tsx converts unified grid data to
Fortune Sheet format (celldata, merge config, column/row sizes).

StepAnsicht now has Spreadsheet/Grid toggle:
- Spreadsheet mode: full Fortune Sheet with toolbar (bold, italic,
  color, borders, merge cells, text wrap, undo/redo)
- Grid mode: existing GridTable for quick editing

Box-origin cells get light tinted background in spreadsheet view.
Colspan cells converted to Fortune Sheet merge format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:08:03 +02:00
Benjamin Admin
c1a903537b Unified Grid: merge all zones into single Excel-like grid
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 33s
Backend (unified_grid.py):
- build_unified_grid(): merges content + box zones into one zone
- Dominant row height from median of content row spacings
- Full-width boxes: rows integrated directly
- Partial-width boxes: extra rows inserted when box has more text
  lines than standard rows fit (e.g., 7 lines in 5-row height)
- Box-origin cells tagged with source_zone_type + box_region metadata

Backend (grid_editor_api.py):
- POST /sessions/{id}/build-unified-grid → persists as unified_grid_result
- GET /sessions/{id}/unified-grid → retrieve persisted result

Frontend:
- GridEditorCell: added source_zone_type, box_region fields
- GridTable: box-origin cells get tinted background + left border
- StepAnsicht: split-view with original image (left) + editable
  unified GridTable (right). Auto-builds on first load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:37:55 +02:00
Benjamin Admin
7085c87618 StepAnsicht: dominant row height for content + proportional box rows
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m35s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 31s
Content sections: use dominant (median) row height from all content
rows instead of per-section average. This ensures uniform row height
above and below boxes (the standard case on textbook pages).

Box sections: distribute height proportionally by text line count
per row. A header (1 line) gets 1/7 of box height, a bullet with
3 lines gets 3/7. Fixes Box 2 where row 3 was cut off because
even distribution didn't account for multi-line cells.

Removed overflow:hidden from box container to prevent clipping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:43:02 +02:00
Benjamin Admin
1b7e095176 StepAnsicht: fix row filtering for partial-width boxes
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m34s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 36s
Content rows were incorrectly filtered out when their Y overlapped
with a box, even if the box only covered the right half of the page.
Now checks both Y AND X overlap — rows are only excluded if they
start within the box's horizontal range.

Fixes: rows next to Box 2 (lend, coconut, taste) were missing from
reconstruction because Box 2 (x=871, w=525) only covers the right
side, but left-side content rows at x≈148 were being filtered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:00:28 +02:00
Benjamin Admin
dcb873db35 StepAnsicht: section-based layout with averaged row heights
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 38s
CI / test-python-klausur (push) Failing after 2m28s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 40s
Major rewrite of reconstruction rendering:
- Page split into vertical sections (content/box) around box boundaries
- Content sections: uniform row height = (last_row - first_row) / (n-1)
- Box sections: rows evenly distributed within box height
- Content rows positioned absolutely at original y-coordinates
- Font size derived from row height (55% of row height)
- Multi-line cells (bullets) get expanded height with indentation
- Boxes render at exact bbox position with colored border
- Preparation for unified grid where boxes become part of main grid

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:29:40 +02:00
Benjamin Admin
fd39d13d06 StepAnsicht: use server-rendered OCR overlay image
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 40s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m38s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 24s
Replace manual word_box positioning (wild/unsnapped) with the
server-rendered words-overlay image from the OCR step endpoint.
This shows the same cleanly snapped red letters as the OCR step.

Endpoint: /sessions/{id}/image/words-overlay

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:26:54 +02:00
Benjamin Admin
c5733a171b StepAnsicht: fix font size and row spacing to match original
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 40s
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-agent-core (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
- Font: use font_size_suggestion_px * scale directly (removed 0.85 factor)
- Row height: calculate from row-to-row spacing (y_min of next row
  minus y_min of current row) instead of text height (y_max - y_min).
  This produces correct line spacing matching the original layout.
- Multi-line cells: height multiplied by line count

Content zone should now span from ~250 to ~2050 matching the original.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:24:27 +02:00
Benjamin Admin
18213f0bde StepAnsicht: split-view with coordinate grid for comparison
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 45s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 36s
Left panel: Original scan + OCR word overlay (red text at exact
word_box positions) + coordinate grid
Right panel: Reconstructed layout + same coordinate grid

Features:
- Coordinate grid toggle with 50/100/200px spacing options
- Grid lines labeled with pixel coordinates in original image space
- Both panels share the same scale for direct visual comparison
- OCR overlay shows detected text in red mono font at original positions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:00:22 +02:00
Benjamin Admin
cd8eb6ce46 Add Ansicht step (Step 12) — read-only page layout preview
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 49s
CI / test-python-klausur (push) Failing after 2m33s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 36s
New pipeline step showing the reconstructed page with all zones
positioned at their original coordinates:
- Content zones with vocabulary grid cells
- Box zones with colored borders (from structure detection)
- Colspan cells rendered across multiple columns
- Multi-line cells (bullets) with pre-wrap whitespace
- Toggle to overlay original scan image at 15% opacity
- Proportionally scaled to viewport width
- Pure CSS positioning (no canvas/Fabric.js)

Pipeline: 14 steps (0-13), Ground Truth moved to Step 13.
Added colspan field to GridEditorCell type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:42:33 +02:00
Benjamin Admin
2c2bdf903a Fix GridTable: replace ternary chain with IIFE for cell rendering
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m28s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 31s
Chained ternary (colored ? div : multiline ? textarea : input) caused
webpack SWC parser issues. Replaced with IIFE {(() => { if/return })()}
which is more robust and readable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:10:22 +02:00
Benjamin Admin
947ff6bdcb Fix JSX ternary nesting for textarea/input in GridTable
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m32s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 28s
Remove extra curly braces around the textarea/input ternary that
caused webpack syntax error. The ternary is now a chained condition:
hasColoredWords ? <div> : text.includes('\n') ? <textarea> : <input>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:02:22 +02:00
Benjamin Admin
92e4021898 Fix GridTable JSX syntax error in colspan rendering
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m43s
CI / test-python-agent-core (push) Successful in 33s
CI / test-nodejs-website (push) Successful in 39s
Mismatched closing tags from previous colspan edit caused webpack
build failure. Cleaned up spanning cell map() return structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:52:26 +02:00
Benjamin Admin
108f1b1a2a GridTable: render multi-line cells with textarea
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-klausur (push) Failing after 2m53s
CI / test-python-agent-core (push) Successful in 32s
CI / test-nodejs-website (push) Successful in 34s
Cells containing \n (bullet items with continuation lines) now use
<textarea> instead of <input type=text>, making all lines visible.
Row height auto-expands based on line count in the cell.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:17:29 +02:00
Benjamin Admin
48de4d98cd Fix infinite loop in StepBoxGridReview auto-build
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 35s
CI / test-python-klausur (push) Failing after 2m41s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 35s
Auto-build was triggering on every grid.zones.length change, which
happens on every rebuild (zone indices increment). Now uses a ref
to ensure auto-build fires only once. Also removed boxZones.length===0
condition that could trigger unnecessary builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:06:11 +02:00
Benjamin Admin
709e41e050 GridTable: support partial colspan (2-of-4 columns)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m16s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 31s
Previously GridTable only supported full-row spanning (one cell across
all columns). Now renders each spanning_header cell with its actual
colspan, positioned at the correct grid column. This allows rows like
"In Britain..." (colspan=2) + "In Germany..." (colspan=2) to render
side by side instead of only showing the first cell.

Also fix box row fields: is_header always set (was undefined for
flowing/bullet_list), y_min_px/y_max_px for header_only rows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:47:14 +02:00
Benjamin Admin
8b29d20940 StepBoxGridReview: show box border color from structure detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m46s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 35s
- Use box_bg_hex for border color (from Step 7 structure detection)
- Numbered color badges per box
- Show color name in box header
- Add box_bg_color/box_bg_hex to GridZone type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:18:36 +02:00
Benjamin Admin
12b194ad1a Fix StepBoxGridReview: match GridTable props interface
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m50s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 38s
GridTable expects zone (singular), onSelectCell, onCellTextChange,
onToggleColumnBold, onToggleRowHeader, onNavigate — not the
incorrect prop names from the first version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 22:39:38 +02:00
Benjamin Admin
5da9a550bf Add Box-Grid-Review step (Step 11) to OCR pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 44s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m52s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 37s
New pipeline step between Gutter Repair and Ground Truth that processes
embedded boxes (grammar tips, exercises) independently from the main grid.

Backend:
- cv_box_layout.py: classify_box_layout() detects flowing/columnar/
  bullet_list/header_only layout types per box
- build_box_zone_grid(): layout-aware grid building (single-column for
  flowing text, independent columns for tabular content)
- POST /sessions/{id}/build-box-grids endpoint with SmartSpellChecker
- Layout type overridable per box via request body

Frontend:
- StepBoxGridReview.tsx: shows each box with cropped image + editable
  GridTable. Layout type dropdown per box. Auto-builds on first load.
- Auto-skip when no boxes detected on page
- Pipeline steps updated: 13 steps (0-12), Ground Truth moved to 12

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:26:06 +02:00
Benjamin Admin
5244e10728 Fix IPA/syllable race condition: loadGrid no longer depends on buildGrid
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m55s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Has been cancelled
loadGrid depended on buildGrid (for 404 fallback), which depended on
ipaMode/syllableMode. Every mode change created a new loadGrid ref,
triggering StepGridReview's useEffect to load the OLD saved grid,
overwriting the freshly rebuilt one.

Now loadGrid only depends on sessionId. The 404 fallback builds inline
with current modes. Mode changes are handled exclusively by the
separate rebuild useEffect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:59:49 +02:00
Benjamin Admin
54b1c7d7d7 Fix IPA/syllable first-click not working (off-by-one in initialLoadDone)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 46s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m52s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 38s
The old guard checked if grid was loaded AND set initialLoadDone in
the same pass, then returned without rebuilding. This meant the first
user-triggered mode change was always swallowed.

Simplified to a mount-skip ref: skip exactly the first useEffect trigger
(component mount), rebuild on every subsequent trigger (user changes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:40:57 +02:00
Benjamin Admin
d8a2331038 Fix IPA/syllable mode change requiring double-click
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m58s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 38s
The useEffect for mode changes called buildGrid() which was a
useCallback closing over stale ipaMode/syllableMode values due to
React's asynchronous state batching. The first click triggered a
rebuild with the OLD mode; only the second click used the new one.

Now inlines the API call directly in the useEffect, reading ipaMode
and syllableMode from the effect's closure which always has the
current values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 09:32:02 +02:00
Benjamin Admin
8c482ce8dd Fix Grid Build step: show grid-editor summary instead of word_result
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 31s
CI / test-go-edu-search (push) Successful in 31s
CI / test-python-klausur (push) Failing after 2m31s
CI / test-python-agent-core (push) Successful in 21s
CI / test-nodejs-website (push) Successful in 23s
The Grid Build step was showing word_result.grid_shape (from the initial
OCR word clustering, often just 1 column) instead of the grid-editor
summary (zone-based, with correct column/row/cell counts). Now reads
summary.total_rows/total_columns/total_cells from the grid-editor result.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:01:18 +02:00
Benjamin Admin
2828871e42 Show detected page number in session header
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 42s
CI / test-python-klausur (push) Failing after 2m21s
CI / test-python-agent-core (push) Successful in 27s
CI / test-nodejs-website (push) Successful in 28s
Extracts page_number from grid_editor_result when opening a session
and displays it as "S. 233" badge in the SessionHeader, next to the
category and GT badges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 00:20:53 +02:00
Benjamin Admin
611e1ee33d Add GT badge to grouped sessions and sub-pages in session list
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 39s
CI / test-go-edu-search (push) Successful in 41s
CI / test-python-klausur (push) Failing after 2m29s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 34s
The GT badge was only shown on ungrouped SessionRow items. Now also
visible on document group rows (e.g. "GT 1/2") and individual pages
within expanded groups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 23:54:55 +02:00
Benjamin Admin
e6f8e12f44 Show full Grid-Review in Ground Truth step + GT badge in session list
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 34s
CI / test-go-edu-search (push) Successful in 37s
CI / test-python-klausur (push) Failing after 2m18s
CI / test-python-agent-core (push) Successful in 22s
CI / test-nodejs-website (push) Successful in 27s
- StepGroundTruth now shows the split view (original image + table)
  so the user can verify the final result before marking as GT
- Backend session list now returns is_ground_truth flag
- SessionList shows amber "GT" badge for marked sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:34:32 +02:00
Benjamin Admin
d1e7dd1c4a Fix gutter repair: detect short fragments + show spell alternatives
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 48s
CI / test-go-edu-search (push) Successful in 49s
CI / test-python-klausur (push) Failing after 2m37s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 35s
- Lower min word length from 3→2 for hyphen-join candidates so fragments
  like "ve" (from "ver-künden") are no longer skipped
- Return all spellchecker candidates instead of just top-1, so user can
  pick the correct form (e.g. "stammeln" vs "stammelt")
- Frontend shows clickable alternative buttons for spell_fix suggestions
- Backend accepts text_overrides in apply endpoint for user-selected alternatives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:09:12 +02:00
Benjamin Admin
71e1b10ac7 Add gutter repair step to OCR Kombi pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m31s
CI / test-python-agent-core (push) Successful in 28s
CI / test-nodejs-website (push) Successful in 29s
New step "Wortkorrektur" between Grid-Review and Ground Truth that detects
and fixes words truncated or blurred at the book gutter (binding area) of
double-page scans. Uses pyspellchecker (DE+EN) for validation.

Two repair strategies:
- hyphen_join: words split across rows with missing chars (ve + künden → verkünden)
- spell_fix: garbled trailing chars from gutter blur (stammeli → stammeln)

Interactive frontend with per-suggestion accept/reject and batch controls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:50:16 +02:00
Benjamin Admin
0168ab1a67 Remove Hauptseite/Box tabs from Kombi pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m15s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 20s
Page-split now creates independent sessions that appear directly in
the session list. After split, the UI switches to the first child
session. BoxSessionTabs, sub-session state, and parent-child tracking
removed from Kombi code. Legacy ocr-overlay still uses BoxSessionTabs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 17:43:58 +01:00
Benjamin Admin
9f68bd3425 feat: Implement page-split step with auto-detection and sub-session naming
StepPageSplit now:
- Auto-calls POST /page-split on step entry
- Shows oriented image + detection result
- If double page: creates sub-sessions named "Title — S. 1/2"
- If single page: green badge "keine Trennung noetig"
- Manual "Weiter" button (no auto-advance)

Also:
- StepOrientation wrapper simplified (no page-split in orientation)
- StepUpload passes name back via onUploaded(sid, name)
- page.tsx: after page-split "Weiter" switches to first sub-session
- useKombiPipeline exposes setSessionName

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 17:56:45 +01:00
Benjamin Admin
469f09d1e1 fix: Redesign StepUpload for manual step control
StepUpload now has 3 phases:
1. File selection: drop zone / file picker → shows preview
2. Review: title input, category, file info → "Hochladen" button
3. Uploaded: shows session image → "Weiter" button

No more auto-advance after upload. User controls every step.
openSession() removed from onUploaded callback to prevent
step-reset race condition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 17:35:36 +01:00
Benjamin Admin
3bb04b25ab fix: OCR Kombi upload race condition — openSession was resetting step to 0
openSession mapped dbStep=1 to uiStep=0 (upload), overriding handleNext's
advancement to step 1. Fix: sessions always exist post-upload, so always
skip past the upload step in openSession.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 17:10:04 +01:00
Benjamin Admin
eaade3cad2 feat: Maschinenbau-Branche + INDUSTRY_REGULATION_MAP erweitert
- Neue Branche "Maschinenbau" mit 15 Regularien (MACHINERY_REG, BLUE_GUIDE, CRA, etc.)
- BDSG zu allen DE-relevanten Branchen hinzugefuegt
- Nationale Gesetze (HGB, AO, BGB, UrhG, etc.) branchenspezifisch gemapped
- IoT erweitert: MACHINERY_REG, BLUE_GUIDE, NIS2, DE_ELEKTROG
- THEMATIC_GROUPS: Produktsicherheit um MACHINERY_REG + BLUE_GUIDE erweitert

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:59:31 +01:00
Benjamin Admin
d26a9f60ab Add OCR Kombi Pipeline: modular 11-step architecture with multi-page support
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m24s
CI / test-python-agent-core (push) Successful in 22s
CI / test-nodejs-website (push) Successful in 20s
Phase 1 of the clean architecture refactor: Replaces the 751-line ocr-overlay
monolith with a modular pipeline. Each step gets its own component file.

Frontend: /ai/ocr-kombi route with 11 steps (Upload, Orientation, PageSplit,
Deskew, Dewarp, ContentCrop, OCR, Structure, GridBuild, GridReview, GroundTruth).
Session list supports document grouping for multi-page uploads.

Backend: New ocr_kombi/ module with multi-page PDF upload (splits PDF into N
sessions with shared document_group_id). DB migration adds document_group_id
and page_number columns.

Old /ai/ocr-overlay remains fully functional for A/B testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 15:55:28 +01:00
Benjamin Admin
d26233b5b3 Add page number display to StepGridReview summary bar
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 48s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m17s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 24s
The page_number was only shown in GridEditor.tsx (ocr-overlay) but
the OCR pipeline uses StepGridReview.tsx which has its own summary bar.
Display the extracted page number (e.g. "S. 233") next to the
dictionary detection badge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:21:44 +01:00
Benjamin Admin
e019dde01b Extract page number as metadata instead of silently removing it
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 36s
CI / test-python-klausur (push) Failing after 2m9s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 21s
_filter_footer_words now returns page number info (text, y_pct, number)
instead of just removing footer words. The page number is included in
the grid result as `page_number` and displayed in the frontend summary
bar as "S. 233".

This preserves page numbers for later page concatenation in the
customer frontend while still removing them from the grid content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:52:09 +01:00
Benjamin Admin
f860eb66e6 Add German IPA support (wiki-pronunciation-dict + epitran)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m12s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 17s
Hybrid approach mirroring English IPA:
- Primary: wiki-pronunciation-dict (636k entries, CC-BY-SA, Wiktionary)
- Fallback: epitran rule-based G2P (MIT license)

IPA modes now use language-appropriate dictionaries:
- auto/en: English IPA (Britfone + eng_to_ipa)
- de: German IPA (wiki-pronunciation-dict + epitran)
- all: EN column gets English IPA, other columns get German IPA
- none: disabled

Frontend shows CC-BY-SA attribution when German IPA is active.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 22:18:20 +01:00
Benjamin Admin
47e83d90bd Remove IPA:DE option — no German IPA dictionary available
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m54s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s
Our IPA system only has English dictionaries (Britfone MIT, eng_to_ipa
MIT). The "IPA: nur DE" option was useless at best and misleading.
Removed from dropdown, type definition, and API validation.

Syllable DE mode stays — pyphen has a German hyphenation dictionary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 21:53:43 +01:00
Benjamin Admin
256df820cd Auto-rebuild grid when IPA or syllable mode dropdown changes
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m0s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 15s
The dropdowns only updated state but didn't trigger buildGrid().
Now a useEffect watches ipaMode/syllableMode and rebuilds
automatically (skipping the initial mount).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 20:43:20 +01:00
Benjamin Admin
83c058e400 Add language-specific IPA and syllable modes (de/en)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m50s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 15s
Extend ipa_mode and syllable_mode toggles with language options:
- auto: smart detection (default)
- en: only English headword column
- de: only German definition columns
- all: all content columns
- none: skip entirely

Also improve English column auto-detection: use garbled IPA patterns
(apostrophes, colons) in addition to bracket patterns. This correctly
identifies English dictionary pages where OCR produces garbled ASCII
instead of bracket IPA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 08:16:29 +01:00
Benjamin Admin
34680732f8 Add IPA and syllable mode toggles, fix false IPA on German documents
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 15s
Backend: Remove en_col_type fallback heuristic (longest avg text) that
incorrectly identified German columns as English. IPA now only applied
when OCR bracket patterns are actually found. Add ipa_mode (auto/all/none)
and syllable_mode (auto/all/none) query params to build-grid API.

Frontend: Add IPA and Silben dropdown selects to GridToolbar. Modes
are passed as query params on rebuild. Auto = current smart detection,
All = force for all words, Aus = skip entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 08:04:44 +01:00
Benjamin Admin
7fbcae954b fix: auto-trigger orientation for page-split sessions without result
Page-split sessions (start_step=1) have no orientation_result stored.
StepOrientation now auto-runs orientation detection when loading an
existing session that lacks a result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 17:19:56 +01:00
Benjamin Admin
f931091b57 refactor: independent sessions for page-split + URL-based pipeline navigation
Page-split now creates independent sessions (no parent_session_id),
parent marked as status='split' and hidden from list. Navigation uses
useSearchParams for URL-based step tracking (browser back/forward works).
page.tsx reduced from 684 to 443 lines via usePipelineNavigation hook.

Box sub-sessions (column detection) remain unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 17:05:33 +01:00
Benjamin Admin
f34340de9c Fix sub-session completion flow: navigate to next incomplete sub-session
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m51s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 15s
Instead of returning to parent (which creates a redirect loop), the
handleNext function now finds the next incomplete sub-session and opens
it directly. When all sub-sessions are done, returns to session list.

Also fixes openSession auto-redirect to prefer the first incomplete
sub-session over the most advanced one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 16:33:56 +01:00
Benjamin Admin
55de6c21d2 Fix session resume: auto-open most advanced sub-session on parent click
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m46s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 15s
When reopening a parent session that has page-split sub-sessions,
the UI was showing the parent's pipeline step (always step 1/Orientation)
instead of navigating to the sub-sessions. Now automatically opens the
most advanced sub-session, matching the behavior of handleOrientationComplete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 16:04:53 +01:00
Benjamin Admin
424e5c51d4 fix: remove nested scrollbar in grid editor
Removed overflow-y-auto and maxHeight from the grid container div.
The page itself handles scrolling — nested scroll containers caused
the bottom rows to be cut off after editing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 15:06:28 +01:00
Benjamin Admin
d9b2aa82e9 fix: CV-gated syllable insertion + grid editor scroll
1. Syllable dividers now require CV validation: morphological vertical
   line detection checks if word_box image actually shows thin isolated
   pipe lines before applying pyphen. Only first word per cell gets
   pipes (matching dictionary print layout).

2. Grid editor scroll: changed maxHeight from 80vh to calc(100vh-200px)
   so editor remains scrollable after edits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 14:31:16 +01:00
Benjamin Admin
19a5f69272 fix: make Grid Editor vertically scrollable so all rows are visible
The right panel (grid area) had no vertical overflow handling, causing
the last ~5 rows to be clipped and invisible. Added overflow-y-auto
with max-height 80vh, and removed overflow-hidden from the GridTable
wrapper that was cutting off content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 13:33:52 +01:00