Commit Graph

  • 21b69e06be (main) Fix cross-column word assignment by splitting OCR merge artifacts Benjamin Admin 2026-03-28 10:54:41 +01:00
  • 0168ab1a67 Remove Hauptseite/Box tabs from Kombi pipeline Benjamin Admin 2026-03-27 17:43:58 +01:00
  • 925f4356ce Use spellchecker instead of pyphen for pipe autocorrect validation Benjamin Admin 2026-03-27 16:47:42 +01:00
  • cc4cb3bc2f Add pipe auto-correction and graphic artifact filter for grid builder Benjamin Admin 2026-03-27 16:33:38 +01:00
  • 0685fb12da Fix Bug 3: recover OCR-lost prefixes via overlap merge + chain merging Benjamin Admin 2026-03-27 15:49:52 +01:00
  • 96ea23164d Fix word-gap merge: add missing pronouns to stop words, reduce threshold Benjamin Admin 2026-03-27 15:35:12 +01:00
  • a8773d5b00 Fix 4 Grid Editor bugs: syllable modes, heading detection, word gaps Benjamin Admin 2026-03-27 15:24:35 +01:00
  • 9f68bd3425 feat: Implement page-split step with auto-detection and sub-session naming Benjamin Admin 2026-03-26 17:56:45 +01:00
  • 469f09d1e1 fix: Redesign StepUpload for manual step control Benjamin Admin 2026-03-26 17:35:36 +01:00
  • 3bb04b25ab fix: OCR Kombi upload race condition — openSession was resetting step to 0 Benjamin Admin 2026-03-26 17:10:04 +01:00
  • 85fe0a73d6 docs: Add OCR Kombi Pipeline to MkDocs and cross-reference from OCR Pipeline Benjamin Admin 2026-03-26 16:09:40 +01:00
  • eaade3cad2 feat: Add Maschinenbau (mechanical engineering) industry + extend INDUSTRY_REGULATION_MAP Benjamin Admin 2026-03-26 15:59:31 +01:00
  • d26a9f60ab Add OCR Kombi Pipeline: modular 11-step architecture with multi-page support Benjamin Admin 2026-03-26 15:55:28 +01:00
  • d26233b5b3 Add page number display to StepGridReview summary bar Benjamin Admin 2026-03-26 11:21:44 +01:00
  • e019dde01b Extract page number as metadata instead of silently removing it Benjamin Admin 2026-03-26 08:52:09 +01:00
  • 5af5d821a5 Fix 3 grid issues: artifact cells, connector col noise, footer false positive Benjamin Admin 2026-03-26 08:18:55 +01:00
  • 525de55791 Fix syllable+IPA combination: strip bracket content before IPA guard Benjamin Admin 2026-03-26 00:03:10 +01:00
  • f860eb66e6 Add German IPA support (wiki-pronunciation-dict + epitran) Benjamin Admin 2026-03-25 22:18:20 +01:00
  • a73ddce43d Fix missing PageZone import in grid_editor_helpers.py Benjamin Admin 2026-03-25 22:04:21 +01:00
  • 47e83d90bd Remove IPA:DE option — no German IPA dictionary available Benjamin Admin 2026-03-25 21:53:43 +01:00
  • 76cd1ac020 Fix false headers on sparse layouts and IPA corruption on German text Benjamin Admin 2026-03-25 21:49:05 +01:00
  • 256df820cd Auto-rebuild grid when IPA or syllable mode dropdown changes Benjamin Admin 2026-03-25 20:43:20 +01:00
  • 7773c51304 Fix en/de mode edge case on docs without detected English column Benjamin Admin 2026-03-25 08:37:15 +01:00
  • 83c058e400 Add language-specific IPA and syllable modes (de/en) Benjamin Admin 2026-03-25 08:16:29 +01:00
  • 34680732f8 Add IPA and syllable mode toggles, fix false IPA on German documents Benjamin Admin 2026-03-25 08:04:44 +01:00
  • c42924a94a Fix IPA correction persistence and false-positive prefix matching Benjamin Admin 2026-03-25 07:26:32 +01:00
  • 9ea217bdfc Fix IPA correction for dictionary pages (WIP) Benjamin Admin 2026-03-24 23:54:14 +01:00
  • 4feec7c7b7 Lower syllable pipe-ratio threshold from 5% to 1% Benjamin Admin 2026-03-24 23:17:08 +01:00
  • ed7fc99fc4 Improve syllable divider insertion for dictionary pages Benjamin Admin 2026-03-24 19:44:29 +01:00
  • 7fbcae954b fix: auto-trigger orientation for page-split sessions without result Benjamin Admin 2026-03-24 17:19:56 +01:00
  • f931091b57 refactor: independent sessions for page-split + URL-based pipeline navigation Benjamin Admin 2026-03-24 17:05:33 +01:00
  • f34340de9c Fix sub-session completion flow: navigate to next incomplete sub-session Benjamin Admin 2026-03-24 16:33:56 +01:00
  • 55de6c21d2 Fix session resume: auto-open most advanced sub-session on parent click Benjamin Admin 2026-03-24 16:04:53 +01:00
  • 52b66ebe07 Fix NameError: _text_has_garbled_ipa not imported in grid_editor_helpers Benjamin Admin 2026-03-24 15:11:29 +01:00
  • 424e5c51d4 fix: remove nested scrollbar in grid editor Benjamin Admin 2026-03-24 15:06:28 +01:00
  • 12b4c61bac refactor: extract grid helpers + generic CV-gated syllable insertion Benjamin Admin 2026-03-24 14:39:33 +01:00
  • d9b2aa82e9 fix: CV-gated syllable insertion + grid editor scroll Benjamin Admin 2026-03-24 14:31:16 +01:00
  • 364086b86e feat: auto-insert syllable dividers via pyphen on dictionary pages Benjamin Admin 2026-03-24 14:17:26 +01:00
  • fe754398c0 fix: Step 4f sidebar detection uses avg text length instead of fill ratio Benjamin Admin 2026-03-24 14:10:43 +01:00
  • be86a7d14d fix: preserve pipe syllable dividers + detect alphabet sidebar columns Benjamin Admin 2026-03-24 13:52:11 +01:00
  • 19a5f69272 fix: make Grid Editor vertically scrollable so all rows are visible Benjamin Admin 2026-03-24 13:33:52 +01:00
  • ea09fc75df fix: resolve circular import with lazy import for _build_reference_snapshot Benjamin Admin 2026-03-24 13:18:21 +01:00
  • 410d36f3de feat: save automatic grid snapshot before manual edits for GT comparison Benjamin Admin 2026-03-24 13:16:44 +01:00
  • 72ce4420cb fix: advance uiStep past skipped orientation for page-split sub-sessions Benjamin Admin 2026-03-24 12:59:36 +01:00
  • 63dfb4d06f fix: replace reset useEffects with key prop for step component remount Benjamin Admin 2026-03-24 12:20:50 +01:00
  • 08a91ba2be Fix sub-session tab switching: reset step state on sessionId change Benjamin Admin 2026-03-24 12:04:23 +01:00
  • 49a36364a8 Add double-page split support to OCR Overlay (Kombi, 7 steps) Benjamin Admin 2026-03-24 11:48:26 +01:00
  • 14fd8e0b1e Fix page-split: fetch sub-sessions from API instead of React state Benjamin Admin 2026-03-24 11:22:15 +01:00
  • 247b79674d Add double-page spread detection to frontend pipeline Benjamin Admin 2026-03-24 11:09:44 +01:00
  • 40815dafd1 feat(ocr-pipeline): add page-split endpoint for double-page book spreads Benjamin Admin 2026-03-24 10:53:06 +01:00
  • 2a21127f01 fix(ocr-pipeline): improve page crop spine detection and cell assignment Benjamin Admin 2026-03-24 09:23:30 +01:00
  • 9d34c5201e feat(grid-editor): add manual cell color control via right-click menu Benjamin Admin 2026-03-24 08:51:18 +01:00
  • d54814fa70 feat: color bar respects edits + column pattern auto-correction Benjamin Admin 2026-03-24 08:38:11 +01:00
  • d6f4944bcc fix: remove maxHeight limit on grid editor — shows all rows Benjamin Admin 2026-03-24 08:24:50 +01:00
  • ee0d9c881e fix: column resize handle now accessible above add/delete buttons Benjamin Admin 2026-03-24 08:20:04 +01:00
  • 65f4ce1947 feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns Benjamin Admin 2026-03-24 07:45:39 +01:00
  • 4e668660a7 feat: add Woerterbuch category + column add/delete in grid editor Benjamin Admin 2026-03-23 16:27:12 +01:00
  • 7a6eadde8b feat: integrate Ground Truth review into Kombi Pipeline last step Benjamin Admin 2026-03-23 15:04:23 +01:00
  • 4e809c3860 fix: ground-truth crash on col_type + remove AIToolsSidebarResponsive from model-management Benjamin Admin 2026-03-23 10:14:02 +01:00
  • dccbb909bc fix: remove AIToolsSidebarResponsive wrapper from ground-truth and regression pages Benjamin Admin 2026-03-23 09:57:52 +01:00
  • be7f5f1872 feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management Benjamin Admin 2026-03-23 09:53:02 +01:00
  • c695b659fb fix: PagePurpose props on ground-truth and regression pages Benjamin Admin 2026-03-23 09:43:10 +01:00
  • a1e079b911 feat: Sprint 1 — IPA hardening, regression framework, ground-truth review Benjamin Admin 2026-03-23 09:21:27 +01:00
  • f5d5d6c59c docs: add Vision, Roadmap, and Hardware strategy to MkDocs Benjamin Admin 2026-03-23 08:54:22 +01:00
  • 4a44ad7986 fix: hard-filter OCR words inside detected graphic regions Benjamin Admin 2026-03-22 10:18:23 +01:00
  • 7b3319be2e fix: merge syllable-split word_boxes + keep dictionary guide words Benjamin Admin 2026-03-22 08:21:00 +01:00
  • 882b177fc3 fix: remove image-area artifacts + fix heading false positive for dictionary entries Benjamin Admin 2026-03-22 07:59:24 +01:00
  • 1fae39dbb8 fix: lower secondary column threshold + strip pipe chars from word_boxes Benjamin Admin 2026-03-22 07:44:03 +01:00
  • 46c8c28d34 fix: border strip pre-filter + 3-column detection for vocabulary tables Benjamin Admin 2026-03-21 21:01:43 +01:00
  • 4000110501 fix: extend tiny symbol filter to all non-black colors, raise area to 200 Benjamin Admin 2026-03-21 18:05:31 +01:00
  • 2acf8696bf fix: correct border strip test data to avoid false internal gaps Benjamin Admin 2026-03-21 17:24:33 +01:00
  • c0e1118870 feat: detect and remove page-border decoration strip artifacts (Step 4e) Benjamin Admin 2026-03-21 17:20:45 +01:00
  • f31a7175a2 fix: normalize word_box order to reading order for frontend display (Step 5j) Benjamin Admin 2026-03-20 19:21:37 +01:00
  • bacbfd88f1 Fix word ordering in cell text rebuild (Steps 4c, 4d, 5i) Benjamin Admin 2026-03-20 18:45:33 +01:00
  • 2c63beff04 Fix bullet overlap disambiguation + raise red threshold to 90 Benjamin Admin 2026-03-20 18:21:00 +01:00
  • 82433b4bad Step 5i: Remove blue bullet/artifact and overlapping duplicate word_boxes Benjamin Admin 2026-03-20 18:17:07 +01:00
  • d889a6959e Fix red false-positive in color detection for scanned black text Benjamin Admin 2026-03-20 17:18:44 +01:00
  • bc1804ad18 Fix vsplit side-by-side rendering: invalid TypeScript type annotation Benjamin Admin 2026-03-20 17:09:52 +01:00
  • 45b83560fd Vertical zone split: detect divider lines and create independent sub-zones Benjamin Admin 2026-03-20 16:38:12 +01:00
  • e4fa634a63 Fix GridTable: show cell.text when it diverges from word_boxes Benjamin Admin 2026-03-20 15:05:10 +01:00
  • 76ba83eecb Tighten tertiary column detection: require 4+ rows and 5% coverage Benjamin Admin 2026-03-20 12:50:03 +01:00
  • 04092a0a66 Fix Step 5h: reject grammar patterns in slash-IPA, convert trailing variants Benjamin Admin 2026-03-20 12:40:28 +01:00
  • 7fafd297e7 Step 5h: convert slash-delimited IPA to bracket notation with dict lookup Benjamin Admin 2026-03-20 12:36:08 +01:00
  • 7ac09b5941 Filter pipe-character word_boxes from OCR column divider artifacts Benjamin Admin 2026-03-20 12:09:50 +01:00
  • 1f7989cfc2 Fix grammar bracket detection: split on spaces too, not just slashes Benjamin Admin 2026-03-20 11:45:35 +01:00
  • ef5aed6a98 Preserve grammar annotations (pl), (no pl) and skip articles in IPA Benjamin Admin 2026-03-20 11:42:44 +01:00
  • 7dc00e737a Add footer row label (F) in grid editor, matching header (H) style Benjamin Admin 2026-03-20 11:01:14 +01:00
  • a579c31ddb Fix IPA continuation: skip words with inline IPA, recover emptied cells Benjamin Admin 2026-03-20 09:31:54 +01:00
  • 0f9c0d2ad0 Keep footer rows in table, mark with is_footer + col_type=footer Benjamin Admin 2026-03-20 09:08:25 +01:00
  • 278067fe20 Fix page_ref extraction: only extract cells matching page-ref pattern Benjamin Admin 2026-03-20 08:55:55 +01:00
  • d76fb2a9c8 Fix page_ref + footer extraction: extract individual cells, skip IPA footers Benjamin Admin 2026-03-20 08:47:39 +01:00
  • 9681fcbd05 Strip IPA from headings + extract page_refs and footer from table Benjamin Admin 2026-03-20 08:42:53 +01:00
  • 4290f70885 Fix unbracketed IPA continuations: detect garbled IPA in single-cell rows Benjamin Admin 2026-03-20 08:30:44 +01:00
  • 5c935eec23 Refine garbled IPA filter: skip only pure-ASCII garbled text, not text with real IPA Benjamin Admin 2026-03-20 08:15:51 +01:00
  • c4a5cd2d8a Skip garbled IPA text in single-cell heading detection Benjamin Admin 2026-03-20 08:11:02 +01:00
  • bc5ab29c06 Fix false positive: exclude first/last rows from single-cell heading detection Benjamin Admin 2026-03-20 08:06:05 +01:00
  • 7c5d95b858 Fix heading col_index + detect black single-cell headings like "Theme" Benjamin Admin 2026-03-20 08:00:06 +01:00
  • 65059471cf Update OCR Pipeline docs: Grid Editor v4.7.0 with zone merging, heading detection, IPA fixes Benjamin Admin 2026-03-20 07:05:14 +01:00
  • 58c9565ba5 Fix en_col_type detection: use bracket IPA count instead of longest avg text Benjamin Admin 2026-03-20 06:50:47 +01:00
  • 92a7b85c2d Fix IPA continuation: only process fully-bracketed cells, keep phrasal verb particles Benjamin Admin 2026-03-20 00:43:51 +01:00