Commit Graph

19 Commits

Author SHA1 Message Date
Benjamin Admin
f860eb66e6 Add German IPA support (wiki-pronunciation-dict + epitran)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m12s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 17s
Hybrid approach mirroring English IPA:
- Primary: wiki-pronunciation-dict (636k entries, CC-BY-SA, Wiktionary)
- Fallback: epitran rule-based G2P (MIT license)

IPA modes now use language-appropriate dictionaries:
- auto/en: English IPA (Britfone + eng_to_ipa)
- de: German IPA (wiki-pronunciation-dict + epitran)
- all: EN column gets English IPA, other columns get German IPA
- none: disabled

Frontend shows CC-BY-SA attribution when German IPA is active.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 22:18:20 +01:00
Benjamin Admin
47e83d90bd Remove IPA:DE option — no German IPA dictionary available
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m54s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s
Our IPA system only has English dictionaries (Britfone MIT, eng_to_ipa
MIT). The "IPA: nur DE" option was useless at best and misleading.
Removed from dropdown, type definition, and API validation.

Syllable DE mode stays — pyphen has a German hyphenation dictionary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 21:53:43 +01:00
Benjamin Admin
256df820cd Auto-rebuild grid when IPA or syllable mode dropdown changes
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m0s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 15s
The dropdowns only updated state but didn't trigger buildGrid().
Now a useEffect watches ipaMode/syllableMode and rebuilds
automatically (skipping the initial mount).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 20:43:20 +01:00
Benjamin Admin
83c058e400 Add language-specific IPA and syllable modes (de/en)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m50s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 15s
Extend ipa_mode and syllable_mode toggles with language options:
- auto: smart detection (default)
- en: only English headword column
- de: only German definition columns
- all: all content columns
- none: skip entirely

Also improve English column auto-detection: use garbled IPA patterns
(apostrophes, colons) in addition to bracket patterns. This correctly
identifies English dictionary pages where OCR produces garbled ASCII
instead of bracket IPA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 08:16:29 +01:00
Benjamin Admin
34680732f8 Add IPA and syllable mode toggles, fix false IPA on German documents
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 15s
Backend: Remove en_col_type fallback heuristic (longest avg text) that
incorrectly identified German columns as English. IPA now only applied
when OCR bracket patterns are actually found. Add ipa_mode (auto/all/none)
and syllable_mode (auto/all/none) query params to build-grid API.

Frontend: Add IPA and Silben dropdown selects to GridToolbar. Modes
are passed as query params on rebuild. Auto = current smart detection,
All = force for all words, Aus = skip entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 08:04:44 +01:00
Benjamin Admin
9d34c5201e feat(grid-editor): add manual cell color control via right-click menu
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 23s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m53s
CI / test-python-agent-core (push) Successful in 13s
CI / test-nodejs-website (push) Successful in 15s
Users can now right-click any cell to set text color (red, green, blue,
orange, purple, black) or remove the color bar without changing text.
A "reset" option restores the OCR-detected color. This enables accurate
Ground Truth marking when OCR assigns colors to wrong cells.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 08:51:18 +01:00
Benjamin Admin
d54814fa70 feat: color bar respects edits + column pattern auto-correction
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 23s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m56s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 15s
- Color bar (red/colored indicator) now only shows when word_boxes
  text still matches the cell text — editing the cell hides stale colors
- New "Auto-Korrektur" button: detects dominant prefix+number patterns
  per column (e.g. p.70, p.71) and completes partial entries (.65 → p.65)
  — requires 3+ matching entries before correcting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 08:38:11 +01:00
Benjamin Admin
ee0d9c881e fix: column resize handle now accessible above add/delete buttons
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 15s
Resize handle: wider (9px), z-40 (above z-30 buttons).
Add-column button moved to bottom-right corner to avoid overlap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 08:20:04 +01:00
Benjamin Admin
65f4ce1947 feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m52s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 18s
- New ImageLayoutEditor: SVG overlay on original scan with draggable
  column dividers, horizontal guidelines (margins/header/footer),
  double-click to add columns, x-button to delete
- GridTable: MIN_COL_WIDTH 40→80px for better readability
- Arrow up/down keys navigate between rows in the grid editor
- Ctrl+Click for multi-cell selection, Ctrl+B to toggle bold on selection
- getAdjacentCell works for cells that don't exist yet (new rows/cols)
- deleteColumn now merges x-boundaries correctly
- Session restore fix: grid_editor_result/structure_result in session GET
- Footer row 3-state cycle, auto-create cells for empty footer rows
- Grid save/build/GT-mark now advance current_step=11

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 07:45:39 +01:00
Benjamin Admin
4e668660a7 feat: add Woerterbuch category + column add/delete in grid editor
- New document category "Woerterbuch" (frontend type + backend validation)
- Column delete: hover column header → red "x" button (with confirmation)
- Column add: hover column header → "+" button inserts after that column
- Both operations support undo/redo, update cell IDs and summary
- Available in both GridEditor and StepGridReview (Kombi last step)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 16:27:12 +01:00
Benjamin Admin
bc1804ad18 Fix vsplit side-by-side rendering: invalid TypeScript type annotation
Changed `typeof grid.zones[][]` to `GridZone[][]` which was causing
a silent build error, preventing the vsplit zone grouping logic from
being compiled into the production bundle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 17:09:52 +01:00
Benjamin Admin
45b83560fd Vertical zone split: detect divider lines and create independent sub-zones
Pages with two side-by-side vocabulary columns separated by a vertical
black line are now split into independent sub-zones before row/column
detection. Each sub-zone gets its own rows, preventing misalignment from
different heading rhythms.

- _detect_vertical_dividers(): finds pipe word_boxes at consistent x
  positions spanning >50% of zone height
- _split_zone_at_vertical_dividers(): creates left/right PageZone objects
  with layout_hint and vsplit_group metadata
- Column union skips vsplit zones (independent column sets)
- Frontend renders vsplit zones side by side via flex layout
- PageZone gets layout_hint + vsplit_group fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 16:38:12 +01:00
Benjamin Admin
e4fa634a63 Fix GridTable: show cell.text when it diverges from word_boxes
Post-processing steps like 5h (slash-IPA conversion) modify cell.text
but not individual word_boxes. The colored per-word display showed
stale word_box text instead of the corrected cell text. Now falls
back to the plain input when texts don't match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 15:05:10 +01:00
Benjamin Admin
7dc00e737a Add footer row label (F) in grid editor, matching header (H) style
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m40s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 15s
Footer rows (e.g. page numbers) now show "F" in amber below the row
number, mirroring the blue "H" label for headers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 11:01:14 +01:00
Benjamin Admin
fd79d5e4fa fix: prevent grid table overflow when union columns exceed zone bbox
When union columns from multiple content zones are applied, column
boundaries can span wider than any single zone's bbox. Using
zone.bbox_px.w as the scale reference caused the total scaled width
to exceed the container, pushing the table off-screen.

Now uses the actual total column width sum as the scale reference,
guaranteeing columns always fit within the container.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 19:43:00 +01:00
Benjamin Admin
b1cdb2531c feat: CSS Grid editor with OCR-measured column widths and row heights
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 30s
CI / test-python-klausur (push) Failing after 1m59s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 18s
Backend: add layout_metrics (avg_row_height_px, font_size_suggestion_px)
to build-grid response for faithful grid reconstruction.

Frontend: rewrite GridTable from HTML <table> to CSS Grid layout.
Column widths are now proportional to the OCR-measured x_min/x_max
positions. Row heights use the average content row height from the
scan. Column and row resize via drag handles (Excel-like).

Font: add Noto Sans (supports IPA characters) via next/font/google.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 13:48:47 +01:00
Benjamin Admin
dfce8415d7 fix: show per-word colors in grid table instead of whole-cell coloring
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 19s
When a cell has colored words (red !, blue phonetics), render each
word as a separate span with its own color instead of coloring the
entire input text with the first non-black color found.

Switches to editable input on cell selection (click).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 08:55:43 +01:00
Benjamin Admin
4a8d43fd71 feat: display detected text colors in grid editor UI
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 2m8s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 18s
- Add color/color_name/recovered fields to OcrWordBox type
- GridTable: show colored text + left-edge color indicator strip
- GridEditor: show color stats and recovered count in summary bar

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 01:03:09 +01:00
Benjamin Admin
c3f1547e32 feat: add Excel-like grid editor for OCR overlay (Kombi mode step 6)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m1s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 17s
Backend: new grid_editor_api.py with build-grid endpoint that detects
bordered boxes, splits page into zones, clusters columns/rows per zone
from Kombi word positions. New DB column grid_editor_result JSONB.

Frontend: GridEditor component with editable HTML tables per zone,
column bold toggle, header row toggle, undo/redo, keyboard navigation
(Tab/Enter/Arrow), image overlay verification, and save/load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 23:41:03 +01:00