breakpilot-lehrer

Author	SHA1	Message	Date
Benjamin Admin	d4353d76fb	SpreadsheetView: multi-sheet tabs instead of unified single sheet CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 36s Details CI / test-go-edu-search (push) Successful in 36s Details CI / test-python-klausur (push) Failing after 2m21s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 31s Details Each zone becomes its own Excel sheet tab with independent column widths: - Sheet "Vokabeln": main content zone with EN/DE/example columns - Sheet "Pounds and euros": Box 1 with its own 4-column layout - Sheet "German leihen": Box 2 with single column for flowing text This solves the column-width conflict: boxes have different column widths optimized for their content, which is impossible in a single unified sheet (Excel limitation: column width is per-column, not per-cell). Sheet tabs visible at bottom (showSheetTabs: true). Box sheets get colored tab (from box_bg_hex). First sheet active by default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:51:21 +02:00
Benjamin Admin	b42f394833	Integrate Fortune Sheet spreadsheet editor in StepAnsicht CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 36s Details CI / test-go-edu-search (push) Successful in 31s Details CI / test-python-klausur (push) Failing after 2m40s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 33s Details Install @fortune-sheet/react (MIT, v1.0.4) as Excel-like spreadsheet component. New SpreadsheetView.tsx converts unified grid data to Fortune Sheet format (celldata, merge config, column/row sizes). StepAnsicht now has Spreadsheet/Grid toggle: - Spreadsheet mode: full Fortune Sheet with toolbar (bold, italic, color, borders, merge cells, text wrap, undo/redo) - Grid mode: existing GridTable for quick editing Box-origin cells get light tinted background in spreadsheet view. Colspan cells converted to Fortune Sheet merge format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:08:03 +02:00
Benjamin Admin	c1a903537b	Unified Grid: merge all zones into single Excel-like grid CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 32s Details CI / test-go-edu-search (push) Successful in 45s Details CI / test-python-klausur (push) Failing after 2m35s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 33s Details Backend (unified_grid.py): - build_unified_grid(): merges content + box zones into one zone - Dominant row height from median of content row spacings - Full-width boxes: rows integrated directly - Partial-width boxes: extra rows inserted when box has more text lines than standard rows fit (e.g., 7 lines in 5-row height) - Box-origin cells tagged with source_zone_type + box_region metadata Backend (grid_editor_api.py): - POST /sessions/{id}/build-unified-grid → persists as unified_grid_result - GET /sessions/{id}/unified-grid → retrieve persisted result Frontend: - GridEditorCell: added source_zone_type, box_region fields - GridTable: box-origin cells get tinted background + left border - StepAnsicht: split-view with original image (left) + editable unified GridTable (right). Auto-builds on first load. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 23:37:55 +02:00
Benjamin Admin	7085c87618	StepAnsicht: dominant row height for content + proportional box rows CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 33s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m35s Details CI / test-python-agent-core (push) Successful in 34s Details CI / test-nodejs-website (push) Successful in 31s Details Content sections: use dominant (median) row height from all content rows instead of per-section average. This ensures uniform row height above and below boxes (the standard case on textbook pages). Box sections: distribute height proportionally by text line count per row. A header (1 line) gets 1/7 of box height, a bullet with 3 lines gets 3/7. Fixes Box 2 where row 3 was cut off because even distribution didn't account for multi-line cells. Removed overflow:hidden from box container to prevent clipping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:43:02 +02:00
Benjamin Admin	1b7e095176	StepAnsicht: fix row filtering for partial-width boxes CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 27s Details CI / test-python-klausur (push) Failing after 2m34s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 36s Details Content rows were incorrectly filtered out when their Y overlapped with a box, even if the box only covered the right half of the page. Now checks both Y AND X overlap — rows are only excluded if they start within the box's horizontal range. Fixes: rows next to Box 2 (lend, coconut, taste) were missing from reconstruction because Box 2 (x=871, w=525) only covers the right side, but left-side content rows at x≈148 were being filtered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:00:28 +02:00
Benjamin Admin	dcb873db35	StepAnsicht: section-based layout with averaged row heights CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 38s Details CI / test-go-edu-search (push) Successful in 38s Details CI / test-python-klausur (push) Failing after 2m28s Details CI / test-python-agent-core (push) Successful in 34s Details CI / test-nodejs-website (push) Successful in 40s Details Major rewrite of reconstruction rendering: - Page split into vertical sections (content/box) around box boundaries - Content sections: uniform row height = (last_row - first_row) / (n-1) - Box sections: rows evenly distributed within box height - Content rows positioned absolutely at original y-coordinates - Font size derived from row height (55% of row height) - Multi-line cells (bullets) get expanded height with indentation - Boxes render at exact bbox position with colored border - Preparation for unified grid where boxes become part of main grid Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:29:40 +02:00
Benjamin Admin	fd39d13d06	StepAnsicht: use server-rendered OCR overlay image CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 40s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m38s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 24s Details Replace manual word_box positioning (wild/unsnapped) with the server-rendered words-overlay image from the OCR step endpoint. This shows the same cleanly snapped red letters as the OCR step. Endpoint: /sessions/{id}/image/words-overlay Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 23:26:54 +02:00
Benjamin Admin	c5733a171b	StepAnsicht: fix font size and row spacing to match original CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 43s Details CI / test-go-edu-search (push) Successful in 40s Details CI / test-nodejs-website (push) Has been cancelled Details CI / test-python-agent-core (push) Has been cancelled Details CI / test-python-klausur (push) Has been cancelled Details - Font: use font_size_suggestion_px * scale directly (removed 0.85 factor) - Row height: calculate from row-to-row spacing (y_min of next row minus y_min of current row) instead of text height (y_max - y_min). This produces correct line spacing matching the original layout. - Multi-line cells: height multiplied by line count Content zone should now span from ~250 to ~2050 matching the original. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 23:24:27 +02:00
Benjamin Admin	18213f0bde	StepAnsicht: split-view with coordinate grid for comparison CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 40s Details CI / test-python-klausur (push) Failing after 2m37s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 36s Details Left panel: Original scan + OCR word overlay (red text at exact word_box positions) + coordinate grid Right panel: Reconstructed layout + same coordinate grid Features: - Coordinate grid toggle with 50/100/200px spacing options - Grid lines labeled with pixel coordinates in original image space - Both panels share the same scale for direct visual comparison - OCR overlay shows detected text in red mono font at original positions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 23:00:22 +02:00
Benjamin Admin	cd8eb6ce46	Add Ansicht step (Step 12) — read-only page layout preview CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 39s Details CI / test-go-edu-search (push) Successful in 49s Details CI / test-python-klausur (push) Failing after 2m33s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 36s Details New pipeline step showing the reconstructed page with all zones positioned at their original coordinates: - Content zones with vocabulary grid cells - Box zones with colored borders (from structure detection) - Colspan cells rendered across multiple columns - Multi-line cells (bullets) with pre-wrap whitespace - Toggle to overlay original scan image at 15% opacity - Proportionally scaled to viewport width - Pure CSS positioning (no canvas/Fabric.js) Pipeline: 14 steps (0-13), Ground Truth moved to Step 13. Added colspan field to GridEditorCell type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 22:42:33 +02:00
Benjamin Admin	2c2bdf903a	Fix GridTable: replace ternary chain with IIFE for cell rendering CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 44s Details CI / test-go-edu-search (push) Successful in 36s Details CI / test-python-klausur (push) Failing after 2m28s Details CI / test-python-agent-core (push) Successful in 36s Details CI / test-nodejs-website (push) Successful in 31s Details Chained ternary (colored ? div : multiline ? textarea : input) caused webpack SWC parser issues. Replaced with IIFE {(() => { if/return })()} which is more robust and readable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 18:10:22 +02:00
Benjamin Admin	947ff6bdcb	Fix JSX ternary nesting for textarea/input in GridTable CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 41s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m32s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 28s Details Remove extra curly braces around the textarea/input ternary that caused webpack syntax error. The ternary is now a chained condition: hasColoredWords ? <div> : text.includes('\n') ? <textarea> : <input> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 18:02:22 +02:00
Benjamin Admin	92e4021898	Fix GridTable JSX syntax error in colspan rendering CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 31s Details CI / test-go-edu-search (push) Successful in 42s Details CI / test-python-klausur (push) Failing after 2m43s Details CI / test-python-agent-core (push) Successful in 33s Details CI / test-nodejs-website (push) Successful in 39s Details Mismatched closing tags from previous colspan edit caused webpack build failure. Cleaned up spanning cell map() return structure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:52:26 +02:00
Benjamin Admin	108f1b1a2a	GridTable: render multi-line cells with textarea CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 46s Details CI / test-go-edu-search (push) Successful in 46s Details CI / test-python-klausur (push) Failing after 2m53s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 34s Details Cells containing \n (bullet items with continuation lines) now use <textarea> instead of <input type=text>, making all lines visible. Row height auto-expands based on line count in the cell. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:17:29 +02:00
Benjamin Admin	48de4d98cd	Fix infinite loop in StepBoxGridReview auto-build CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 41s Details CI / test-go-edu-search (push) Successful in 35s Details CI / test-python-klausur (push) Failing after 2m41s Details CI / test-python-agent-core (push) Successful in 37s Details CI / test-nodejs-website (push) Successful in 35s Details Auto-build was triggering on every grid.zones.length change, which happens on every rebuild (zone indices increment). Now uses a ref to ensure auto-build fires only once. Also removed boxZones.length===0 condition that could trigger unnecessary builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 17:06:11 +02:00
Benjamin Admin	b5900f1aff	Bullet indentation detection: group continuation lines into bullets CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m49s Details CI / test-python-agent-core (push) Successful in 34s Details CI / test-nodejs-website (push) Successful in 34s Details Flowing/bullet_list layout now analyzes left-edge indentation: - Lines at minimum indent = bullet start / main level - Lines indented >15px more = continuation (belongs to previous bullet) - Continuation lines merged with \n into parent bullet cell - Missing bullet markers (•) auto-added when pattern is clear Example: 7 OCR lines → 3 items (1 header + 2 bullets × 3 lines each) "German leihen" header, then two bullet groups with indented examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:57:16 +02:00
Benjamin Admin	baac98f837	Filter false-positive boxes in header/footer margins CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 55s Details CI / test-go-edu-search (push) Successful in 1m0s Details CI / test-python-klausur (push) Failing after 2m35s Details CI / test-python-agent-core (push) Successful in 27s Details CI / test-nodejs-website (push) Successful in 27s Details Boxes whose vertical center falls within top/bottom 7% of image height are filtered out (page numbers, unit headers, running footers). At typical scan resolutions, 7% ≈ 2.5cm margin. Fixes: "Box 1" containing just "3" from "Unit 3" page header being incorrectly treated as an embedded box. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 14:38:53 +02:00
Benjamin Admin	496d34d822	Fix box empty rows: add x_min_px/x_max_px to flowing/header columns CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 55s Details CI / test-go-edu-search (push) Successful in 51s Details CI / test-python-klausur (push) Failing after 2m7s Details CI / test-python-agent-core (push) Successful in 26s Details CI / test-nodejs-website (push) Successful in 31s Details GridTable calculates column widths from col.x_max_px - col.x_min_px. Flowing and header_only layouts were missing these fields, producing NaN widths which collapsed the CSS grid layout and showed empty rows with only row numbers visible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 13:01:11 +02:00
Benjamin Admin	709e41e050	GridTable: support partial colspan (2-of-4 columns) CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 39s Details CI / test-go-edu-search (push) Successful in 40s Details CI / test-python-klausur (push) Failing after 2m16s Details CI / test-python-agent-core (push) Successful in 28s Details CI / test-nodejs-website (push) Successful in 31s Details Previously GridTable only supported full-row spanning (one cell across all columns). Now renders each spanning_header cell with its actual colspan, positioned at the correct grid column. This allows rows like "In Britain..." (colspan=2) + "In Germany..." (colspan=2) to render side by side instead of only showing the first cell. Also fix box row fields: is_header always set (was undefined for flowing/bullet_list), y_min_px/y_max_px for header_only rows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:47:14 +02:00
Benjamin Admin	7b3e8c576d	Fix NameError: span_cells removed but still referenced in log CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 43s Details CI / test-go-edu-search (push) Successful in 51s Details CI / test-python-klausur (push) Failing after 2m42s Details CI / test-python-agent-core (push) Successful in 39s Details CI / test-nodejs-website (push) Successful in 38s Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:20:11 +02:00
Benjamin Admin	868f99f109	Fix colspan text + box row fields for GridTable compatibility CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 42s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m49s Details CI / test-python-agent-core (push) Successful in 42s Details CI / test-nodejs-website (push) Successful in 33s Details Colspan: use original word-block text instead of split cell texts. Prevents "euros a nd cents" from split_cross_column_words. Box rows: add is_header field (was undefined, causing GridTable rendering issues). Add y_min_px/y_max_px to header_only rows. These missing fields caused empty rows with only row numbers visible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:08:49 +02:00
Benjamin Admin	dc25f243a4	Fix colspan: use original words before split_cross_column_words CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 42s Details CI / test-go-edu-search (push) Successful in 47s Details CI / test-python-klausur (push) Failing after 2m33s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 35s Details _split_cross_column_words was destroying the colspan information by cutting word-blocks at column boundaries BEFORE _detect_colspan_cells could analyze them. Now passes original (pre-split) words to colspan detection while using split words for cell building. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:58:32 +02:00
Benjamin Admin	c62ff7cd31	Generic colspan detection for merged cells in grids and boxes CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 33s Details CI / test-go-edu-search (push) Successful in 38s Details CI / test-python-klausur (push) Failing after 2m45s Details CI / test-python-agent-core (push) Successful in 38s Details CI / test-nodejs-website (push) Successful in 34s Details New _detect_colspan_cells() in grid_editor_helpers.py: - Runs after _build_cells() for every zone (content + box) - Detects word-blocks that extend across column boundaries - Merges affected cells into spanning_header with colspan=N - Uses column midpoints to determine which columns are covered - Works for full-page scans and box zones equally Also fixes box flowing/bullet_list row height fields (y_min_px/y_max_px). Removed duplicate spanning logic from cv_box_layout.py — now uses the generic _detect_colspan_cells from grid_editor_helpers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:38:03 +02:00
Benjamin Admin	5d91698c3b	Fix box grid: row height fields + spanning cell detection CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 46s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m36s Details CI / test-python-agent-core (push) Successful in 33s Details CI / test-nodejs-website (push) Successful in 37s Details Box 3 empty rows: flowing/bullet_list rows were missing y_min_px/ y_max_px fields that GridTable uses for row height calculation. Added _px and _pct variants. Box 2 spanning cells: rows with fewer word-blocks than columns (e.g., "In Britain..." spanning 2 columns) are now detected and merged into spanning_header cells. GridTable already renders spanning_header cells across the full row width. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 09:46:43 +02:00
Benjamin Admin	5fa5767c9a	Fix box column detection: use low gap_threshold for small zones CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 42s Details CI / test-go-edu-search (push) Successful in 39s Details CI / test-python-klausur (push) Failing after 2m48s Details CI / test-python-agent-core (push) Successful in 38s Details CI / test-nodejs-website (push) Successful in 30s Details PaddleOCR returns multi-word blocks (whole phrases), so ALL inter-word gaps in small zones (boxes, ≤60 words) are column boundaries. Previous 3x-median approach produced thresholds too high to detect real columns. New approach for small zones: gap_threshold = max(median_h * 1.0, 25). This correctly detects 4 columns in "Pounds and euros" box where gaps range from 50-297px and word height is ~31px. Also includes SmartSpellChecker fixes from previous commits: - Frequency-based scoring, IPA protection, slash→l, rare-word threshold Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 07:55:29 +02:00
Benjamin Admin	693803fb7c	SmartSpellChecker: frequency scoring, IPA protection, slash→l fix CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 42s Details CI / test-go-edu-search (push) Successful in 42s Details CI / test-python-klausur (push) Failing after 2m55s Details CI / test-python-agent-core (push) Successful in 37s Details CI / test-nodejs-website (push) Successful in 31s Details Major improvements: - Frequency-based boundary repair: always tries repair, uses word frequency product to decide (Pound sand→Pounds and: 2000x better) - IPA bracket protection: words inside [brackets] are never modified, even when brackets land in tokenizer separators - Slash→l substitution: "p/" → "pl" for italic l misread as slash - Abbreviation guard uses rare-word threshold (freq < 1e-6) instead of binary known/unknown — prevents "Can I" → "Ca nI" while still fixing "ats th." → "at sth." - Tokenizer includes / character for slash-word detection 43 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 07:36:39 +02:00
Benjamin Admin	31089df36f	SmartSpellChecker: frequency-based boundary repair for valid word pairs CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 43s Details CI / test-go-edu-search (push) Successful in 40s Details CI / test-python-klausur (push) Failing after 2m42s Details CI / test-python-agent-core (push) Successful in 37s Details CI / test-nodejs-website (push) Successful in 35s Details Previously, boundary repair was skipped when both words were valid dictionary words (e.g., "Pound sand", "wit hit", "done euro"). Now uses word-frequency scoring (product of bigram frequencies) to decide if the repair produces a more common word pair. Threshold: repair accepted when new pair is >5x more frequent, or when repair produces a known abbreviation. New fixes: Pound sand→Pounds and (2000x), wit hit→with it (100000x), done euro→one euro (7x). 43 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 07:00:22 +02:00
Benjamin Admin	7b294f9150	Cap gap_threshold at 25% of zone_w for column detection CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 46s Details CI / test-go-edu-search (push) Successful in 52s Details CI / test-python-klausur (push) Failing after 2m51s Details CI / test-python-agent-core (push) Successful in 40s Details CI / test-nodejs-website (push) Successful in 34s Details In small zones (boxes), intra-phrase gaps inflate the median gap, causing gap_threshold to become too large to detect real column boundaries. Cap at 25% of zone width to prevent this. Example: Box "Pounds and euros" has 4 columns at x≈148,534,751,1137 but gap_threshold was 531 (larger than the column gaps themselves). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 23:58:15 +02:00
Benjamin Admin	8b29d20940	StepBoxGridReview: show box border color from structure detection CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 41s Details CI / test-go-edu-search (push) Successful in 45s Details CI / test-python-klausur (push) Failing after 2m46s Details CI / test-python-agent-core (push) Successful in 35s Details CI / test-nodejs-website (push) Successful in 35s Details - Use box_bg_hex for border color (from Step 7 structure detection) - Numbered color badges per box - Show color name in box header - Add box_bg_color/box_bg_hex to GridZone type Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 23:18:36 +02:00
Benjamin Admin	12b194ad1a	Fix StepBoxGridReview: match GridTable props interface CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 46s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m50s Details CI / test-python-agent-core (push) Successful in 37s Details CI / test-nodejs-website (push) Successful in 38s Details GridTable expects zone (singular), onSelectCell, onCellTextChange, onToggleColumnBold, onToggleRowHeader, onNavigate — not the incorrect prop names from the first version. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 22:39:38 +02:00
Benjamin Admin	058eadb0e4	Fix build-box-grids: use structure_result boxes + raw OCR words CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 48s Details CI / test-go-edu-search (push) Successful in 44s Details CI / test-python-klausur (push) Failing after 2m47s Details CI / test-python-agent-core (push) Successful in 33s Details CI / test-nodejs-website (push) Successful in 36s Details - Source boxes from structure_result (Step 7) instead of grid zones - Use raw_paddle_words (top/left/width/height) instead of grid cells - Create new box zones from all detected boxes (not just existing zones) - Sort zones by y-position for correct reading order - Include box background color metadata Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 21:50:28 +02:00
Benjamin Admin	5da9a550bf	Add Box-Grid-Review step (Step 11) to OCR pipeline CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 44s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m52s Details CI / test-python-agent-core (push) Successful in 36s Details CI / test-nodejs-website (push) Successful in 37s Details New pipeline step between Gutter Repair and Ground Truth that processes embedded boxes (grammar tips, exercises) independently from the main grid. Backend: - cv_box_layout.py: classify_box_layout() detects flowing/columnar/ bullet_list/header_only layout types per box - build_box_zone_grid(): layout-aware grid building (single-column for flowing text, independent columns for tabular content) - POST /sessions/{id}/build-box-grids endpoint with SmartSpellChecker - Layout type overridable per box via request body Frontend: - StepBoxGridReview.tsx: shows each box with cropped image + editable GridTable. Layout type dropdown per box. Auto-builds on first load. - Auto-skip when no boxes detected on page - Pipeline steps updated: 13 steps (0-12), Ground Truth moved to 12 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:26:06 +02:00
Benjamin Admin	52637778b9	SmartSpellChecker: boundary repair + context split + abbreviation awareness CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 51s Details CI / test-go-edu-search (push) Successful in 47s Details CI / test-python-klausur (push) Failing after 2m54s Details CI / test-python-agent-core (push) Successful in 35s Details CI / test-nodejs-website (push) Successful in 35s Details New features: - Boundary repair: "ats th." → "at sth." (shifted OCR word boundaries) Tries shifting 1-2 chars between adjacent words, accepts if result includes a known abbreviation or produces better dictionary matches - Context split: "anew book" → "a new book" (ambiguous word merges) Explicit allow/deny list for article+word patterns (alive, alone, etc.) - Abbreviation awareness: 120+ known abbreviations (sth, sb, adj, etc.) are now recognized as valid words, preventing false corrections - Quality gate: boundary repairs only accepted when result scores higher than original (known words + abbreviations) 40 tests passing, all edge cases covered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 15:41:17 +02:00
Benjamin Admin	f6372b8c69	Integrate SmartSpellChecker into build-grid finalization CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 47s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m45s Details CI / test-python-agent-core (push) Successful in 36s Details CI / test-nodejs-website (push) Successful in 40s Details SmartSpellChecker now runs during grid build (not just LLM review), so corrections are visible immediately in the grid editor. Language detection per column: - EN column detected via IPA signals (existing logic) - All other columns assumed German for vocab tables - Auto-detection for single/two-column layouts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 14:54:01 +02:00
Benjamin Admin	909d0729f6	Add SmartSpellChecker + refactor vocab-worksheet page.tsx CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m51s Details CI / test-python-agent-core (push) Successful in 36s Details CI / test-nodejs-website (push) Successful in 37s Details SmartSpellChecker (klausur-service): - Language-aware OCR post-correction without LLMs - Dual-dictionary heuristic for EN/DE language detection - Context-based a/I disambiguation via bigram lookup - Multi-digit substitution (sch00l→school) - Cross-language guard (don't false-correct DE words in EN column) - Umlaut correction (Schuler→Schüler, uber→über) - Integrated into spell_review_entries_sync() pipeline - 31 tests, 9ms/100 corrections Vocab-worksheet refactoring (studio-v2): - Split 2337-line page.tsx into 14 files - Custom hook useVocabWorksheet.ts (all state + logic) - 9 components in components/ directory - types.ts, constants.ts for shared definitions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 12:25:01 +02:00
Benjamin Admin	04fa01661c	Move IPA/syllable toggles to vocabulary tab toolbar CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 49s Details CI / test-go-edu-search (push) Successful in 43s Details CI / test-python-klausur (push) Failing after 2m51s Details CI / test-python-agent-core (push) Successful in 34s Details CI / test-nodejs-website (push) Successful in 36s Details Dropdowns are now in the vocabulary table header (after processing), not in the worksheet settings (before processing). Changing a mode automatically reprocesses all successful pages with the new settings. Same dropdown options as the OCR pipeline grid editor. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:17:14 +02:00
Benjamin Admin	bf9d24e108	Replace IPA/syllable checkboxes with full dropdowns in vocab-worksheet CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 47s Details CI / test-go-edu-search (push) Successful in 47s Details CI / test-python-klausur (push) Failing after 2m41s Details CI / test-python-agent-core (push) Successful in 39s Details CI / test-nodejs-website (push) Successful in 42s Details Vocab worksheet now has the same IPA/syllable mode options as the OCR pipeline grid editor: Auto, nur EN, nur DE, Alle, Aus. Previously only had on/off checkboxes mapping to auto/none. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:10:22 +02:00
Benjamin Admin	0f17eb3cd9	Fix IPA:Aus — strip all brackets before skipping IPA block CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 49s Details CI / test-go-edu-search (push) Successful in 35s Details CI / test-python-klausur (push) Failing after 2m53s Details CI / test-nodejs-website (push) Has been cancelled Details CI / test-python-agent-core (push) Has started running Details When ipa_mode=none, the entire IPA processing block was skipped, including the bracket-stripping logic. Now strips ALL square brackets from content columns BEFORE the skip, so IPA:Aus actually removes all IPA from the display. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:05:22 +02:00
Benjamin Admin	5244e10728	Fix IPA/syllable race condition: loadGrid no longer depends on buildGrid CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 43s Details CI / test-go-edu-search (push) Successful in 45s Details CI / test-python-klausur (push) Failing after 2m55s Details CI / test-python-agent-core (push) Successful in 35s Details CI / test-nodejs-website (push) Has been cancelled Details loadGrid depended on buildGrid (for 404 fallback), which depended on ipaMode/syllableMode. Every mode change created a new loadGrid ref, triggering StepGridReview's useEffect to load the OLD saved grid, overwriting the freshly rebuilt one. Now loadGrid only depends on sessionId. The 404 fallback builds inline with current modes. Mode changes are handled exclusively by the separate rebuild useEffect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:59:49 +02:00
Benjamin Admin	a6c5f56003	Fix IPA strip: match all square brackets, not just Unicode IPA CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 45s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m49s Details CI / test-python-agent-core (push) Successful in 29s Details CI / test-nodejs-website (push) Successful in 23s Details OCR text contains ASCII IPA approximations like [kompa'tifn] instead of Unicode [kˈɒmpətɪʃən]. The strip regex required Unicode IPA chars inside brackets and missed the ASCII ones. Now strips all [bracket] content from excluded columns since square brackets in vocab columns are always IPA. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:53:16 +02:00
Benjamin Admin	584e07eb21	Strip English IPA when mode excludes EN (nur DE / Aus) CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 47s Details CI / test-go-edu-search (push) Successful in 46s Details CI / test-python-agent-core (push) Has been cancelled Details CI / test-nodejs-website (push) Has been cancelled Details CI / test-python-klausur (push) Has been cancelled Details English IPA from the original OCR scan (e.g. [ˈgrænˌdæd]) was always shown because fix_cell_phonetics only ADDS/CORRECTS but never removes. Now strips IPA brackets containing Unicode IPA chars from the EN column when ipa_mode is "de" or "none". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:49:22 +02:00
Benjamin Admin	54b1c7d7d7	Fix IPA/syllable first-click not working (off-by-one in initialLoadDone) CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 46s Details CI / test-go-edu-search (push) Successful in 45s Details CI / test-python-klausur (push) Failing after 2m52s Details CI / test-python-agent-core (push) Successful in 36s Details CI / test-nodejs-website (push) Successful in 38s Details The old guard checked if grid was loaded AND set initialLoadDone in the same pass, then returned without rebuilding. This meant the first user-triggered mode change was always swallowed. Simplified to a mount-skip ref: skip exactly the first useEffect trigger (component mount), rebuild on every subsequent trigger (user changes). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:40:57 +02:00
Benjamin Admin	d8a2331038	Fix IPA/syllable mode change requiring double-click CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 47s Details CI / test-go-edu-search (push) Successful in 45s Details CI / test-python-klausur (push) Failing after 2m58s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 38s Details The useEffect for mode changes called buildGrid() which was a useCallback closing over stale ipaMode/syllableMode values due to React's asynchronous state batching. The first click triggered a rebuild with the OLD mode; only the second click used the new one. Now inlines the API call directly in the useEffect, reading ipaMode and syllableMode from the effect's closure which always has the current values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:32:02 +02:00
Benjamin Admin	ad78e26143	Fix word-split: handle IPA brackets, contractions, and tiebreaker CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 47s Details CI / test-go-edu-search (push) Successful in 46s Details CI / test-python-klausur (push) Failing after 2m57s Details CI / test-python-agent-core (push) Successful in 36s Details CI / test-nodejs-website (push) Successful in 41s Details 1. Strip IPA brackets [ipa] before attempting word split, so "makeadecision[dɪsˈɪʒən]" is processed as "makeadecision" 2. Handle contractions: "solet's" → split "solet" → "so let" + "'s" 3. DP tiebreaker: prefer longer first word when scores are equal ("task is" over "ta skis") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:13:02 +02:00
Benjamin Admin	4f4e6c31fa	Fix word-split tiebreaker: prefer longer first word CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 47s Details CI / test-go-edu-search (push) Successful in 39s Details CI / test-python-klausur (push) Failing after 2m44s Details CI / test-python-agent-core (push) Successful in 31s Details CI / test-nodejs-website (push) Successful in 35s Details "taskis" was split as "ta skis" instead of "task is" because both have the same DP score. Changed comparison from > to >= so that later candidates (with longer first words) win ties. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 09:05:14 +02:00
Benjamin Admin	7ffa4c90f9	Lower word-split threshold from 7 to 4 chars CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 50s Details CI / test-go-edu-search (push) Successful in 46s Details CI / test-python-klausur (push) Failing after 2m48s Details CI / test-python-agent-core (push) Successful in 37s Details CI / test-nodejs-website (push) Successful in 38s Details Short merged words like "anew" (a new), "Imadea" (I made a), "makeadecision" (make a decision) were missed because the split threshold was too high. Now processes tokens >= 4 chars. English single-letter words (a, I) are already handled by the DP algorithm which allows them as valid split points. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 08:59:02 +02:00
Benjamin Admin	656cadbb1e	Remove page-number footers from grid, promote to metadata CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 47s Details CI / test-go-edu-search (push) Successful in 40s Details CI / test-python-klausur (push) Failing after 2m55s Details CI / test-python-agent-core (push) Successful in 30s Details CI / test-nodejs-website (push) Successful in 37s Details Footer rows that are page numbers (digits or written-out like "two hundred and nine") are now removed from the grid entirely and promoted to the page_number metadata field. Non-page-number footer content stays as a visible footer row. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 08:50:20 +02:00
Benjamin Admin	757c8460c9	Detect written-out page numbers as footer rows CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 47s Details CI / test-go-edu-search (push) Successful in 44s Details CI / test-python-klausur (push) Failing after 2m46s Details CI / test-python-agent-core (push) Successful in 32s Details CI / test-nodejs-website (push) Successful in 39s Details "two hundred and nine" (22 chars) was kept as a content row because the footer detection only accepted text ≤20 chars. Now recognizes written-out number words (English + German) as page numbers regardless of length. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 08:39:43 +02:00
Benjamin Admin	501de4374a	Keep page references as visible column cells CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 42s Details CI / test-go-edu-search (push) Successful in 41s Details CI / test-python-klausur (push) Failing after 2m49s Details CI / test-python-agent-core (push) Successful in 37s Details CI / test-nodejs-website (push) Successful in 35s Details Step 5g was extracting page refs (p.55, p.70) as zone metadata and removing them from the cell table. Users want to see them as a separate column. Now keeps cells in place while still extracting metadata for the frontend header display. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 08:27:44 +02:00
Benjamin Admin	774bbc50d3	Add debug logging for empty-column-removal CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 43s Details CI / test-go-edu-search (push) Successful in 54s Details CI / test-python-klausur (push) Failing after 2m53s Details CI / test-python-agent-core (push) Successful in 39s Details CI / test-nodejs-website (push) Successful in 39s Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 22:45:22 +02:00

1 2 3 4 5 ...

587 Commits