-
21b69e06be
Fix cross-column word assignment by splitting OCR merge artifacts
main
Benjamin Admin
2026-03-28 10:54:41 +01:00
-
0168ab1a67
Remove Hauptseite/Box tabs from Kombi pipeline
Benjamin Admin
2026-03-27 17:43:58 +01:00
-
925f4356ce
Use spellchecker instead of pyphen for pipe autocorrect validation
Benjamin Admin
2026-03-27 16:47:42 +01:00
-
cc4cb3bc2f
Add pipe auto-correction and graphic artifact filter for grid builder
Benjamin Admin
2026-03-27 16:33:38 +01:00
-
0685fb12da
Fix Bug 3: recover OCR-lost prefixes via overlap merge + chain merging
Benjamin Admin
2026-03-27 15:49:52 +01:00
-
96ea23164d
Fix word-gap merge: add missing pronouns to stop words, reduce threshold
Benjamin Admin
2026-03-27 15:35:12 +01:00
-
a8773d5b00
Fix 4 Grid Editor bugs: syllable modes, heading detection, word gaps
Benjamin Admin
2026-03-27 15:24:35 +01:00
-
9f68bd3425
feat: Implement page-split step with auto-detection and sub-session naming
Benjamin Admin
2026-03-26 17:56:45 +01:00
-
469f09d1e1
fix: Redesign StepUpload for manual step control
Benjamin Admin
2026-03-26 17:35:36 +01:00
-
3bb04b25ab
fix: OCR Kombi upload race condition — openSession was resetting step to 0
Benjamin Admin
2026-03-26 17:10:04 +01:00
-
85fe0a73d6
docs: Add OCR Kombi Pipeline to MkDocs and cross-reference from OCR Pipeline
Benjamin Admin
2026-03-26 16:09:40 +01:00
-
eaade3cad2
feat: Add mechanical engineering industry + extend INDUSTRY_REGULATION_MAP
Benjamin Admin
2026-03-26 15:59:31 +01:00
-
d26a9f60ab
Add OCR Kombi Pipeline: modular 11-step architecture with multi-page support
Benjamin Admin
2026-03-26 15:55:28 +01:00
-
d26233b5b3
Add page number display to StepGridReview summary bar
Benjamin Admin
2026-03-26 11:21:44 +01:00
-
e019dde01b
Extract page number as metadata instead of silently removing it
Benjamin Admin
2026-03-26 08:52:09 +01:00
-
5af5d821a5
Fix 3 grid issues: artifact cells, connector col noise, footer false positive
Benjamin Admin
2026-03-26 08:18:55 +01:00
-
525de55791
Fix syllable+IPA combination: strip bracket content before IPA guard
Benjamin Admin
2026-03-26 00:03:10 +01:00
-
f860eb66e6
Add German IPA support (wiki-pronunciation-dict + epitran)
Benjamin Admin
2026-03-25 22:18:20 +01:00
-
a73ddce43d
Fix missing PageZone import in grid_editor_helpers.py
Benjamin Admin
2026-03-25 22:04:21 +01:00
-
47e83d90bd
Remove IPA:DE option — no German IPA dictionary available
Benjamin Admin
2026-03-25 21:53:43 +01:00
-
76cd1ac020
Fix false headers on sparse layouts and IPA corruption on German text
Benjamin Admin
2026-03-25 21:49:05 +01:00
-
256df820cd
Auto-rebuild grid when IPA or syllable mode dropdown changes
Benjamin Admin
2026-03-25 20:43:20 +01:00
-
7773c51304
Fix en/de mode edge case on docs without detected English column
Benjamin Admin
2026-03-25 08:37:15 +01:00
-
83c058e400
Add language-specific IPA and syllable modes (de/en)
Benjamin Admin
2026-03-25 08:16:29 +01:00
-
34680732f8
Add IPA and syllable mode toggles, fix false IPA on German documents
Benjamin Admin
2026-03-25 08:04:44 +01:00
-
c42924a94a
Fix IPA correction persistence and false-positive prefix matching
Benjamin Admin
2026-03-25 07:26:32 +01:00
-
9ea217bdfc
Fix IPA correction for dictionary pages (WIP)
Benjamin Admin
2026-03-24 23:54:14 +01:00
-
4feec7c7b7
Lower syllable pipe-ratio threshold from 5% to 1%
Benjamin Admin
2026-03-24 23:17:08 +01:00
-
ed7fc99fc4
Improve syllable divider insertion for dictionary pages
Benjamin Admin
2026-03-24 19:44:29 +01:00
-
7fbcae954b
fix: auto-trigger orientation for page-split sessions without result
Benjamin Admin
2026-03-24 17:19:56 +01:00
-
f931091b57
refactor: independent sessions for page-split + URL-based pipeline navigation
Benjamin Admin
2026-03-24 17:05:33 +01:00
-
f34340de9c
Fix sub-session completion flow: navigate to next incomplete sub-session
Benjamin Admin
2026-03-24 16:33:56 +01:00
-
55de6c21d2
Fix session resume: auto-open most advanced sub-session on parent click
Benjamin Admin
2026-03-24 16:04:53 +01:00
-
52b66ebe07
Fix NameError: _text_has_garbled_ipa not imported in grid_editor_helpers
Benjamin Admin
2026-03-24 15:11:29 +01:00
-
424e5c51d4
fix: remove nested scrollbar in grid editor
Benjamin Admin
2026-03-24 15:06:28 +01:00
-
12b4c61bac
refactor: extract grid helpers + generic CV-gated syllable insertion
Benjamin Admin
2026-03-24 14:39:33 +01:00
-
d9b2aa82e9
fix: CV-gated syllable insertion + grid editor scroll
Benjamin Admin
2026-03-24 14:31:16 +01:00
-
364086b86e
feat: auto-insert syllable dividers via pyphen on dictionary pages
Benjamin Admin
2026-03-24 14:17:26 +01:00
-
fe754398c0
fix: Step 4f sidebar detection uses avg text length instead of fill ratio
Benjamin Admin
2026-03-24 14:10:43 +01:00
-
be86a7d14d
fix: preserve pipe syllable dividers + detect alphabet sidebar columns
Benjamin Admin
2026-03-24 13:52:11 +01:00
-
19a5f69272
fix: make Grid Editor vertically scrollable so all rows are visible
Benjamin Admin
2026-03-24 13:33:52 +01:00
-
ea09fc75df
fix: resolve circular import with lazy import for _build_reference_snapshot
Benjamin Admin
2026-03-24 13:18:21 +01:00
-
410d36f3de
feat: save automatic grid snapshot before manual edits for GT comparison
Benjamin Admin
2026-03-24 13:16:44 +01:00
-
72ce4420cb
fix: advance uiStep past skipped orientation for page-split sub-sessions
Benjamin Admin
2026-03-24 12:59:36 +01:00
-
63dfb4d06f
fix: replace reset useEffects with key prop for step component remount
Benjamin Admin
2026-03-24 12:20:50 +01:00
-
08a91ba2be
Fix sub-session tab switching: reset step state on sessionId change
Benjamin Admin
2026-03-24 12:04:23 +01:00
-
49a36364a8
Add double-page split support to OCR Overlay (Kombi 7 steps)
Benjamin Admin
2026-03-24 11:48:26 +01:00
-
14fd8e0b1e
Fix page-split: fetch sub-sessions from API instead of React state
Benjamin Admin
2026-03-24 11:22:15 +01:00
-
247b79674d
Add double-page spread detection to frontend pipeline
Benjamin Admin
2026-03-24 11:09:44 +01:00
-
40815dafd1
feat(ocr-pipeline): add page-split endpoint for double-page book spreads
Benjamin Admin
2026-03-24 10:53:06 +01:00
-
2a21127f01
fix(ocr-pipeline): improve page crop spine detection and cell assignment
Benjamin Admin
2026-03-24 09:23:30 +01:00
-
9d34c5201e
feat(grid-editor): add manual cell color control via right-click menu
Benjamin Admin
2026-03-24 08:51:18 +01:00
-
d54814fa70
feat: color bar respects edits + column pattern auto-correction
Benjamin Admin
2026-03-24 08:38:11 +01:00
-
d6f4944bcc
fix: remove maxHeight limit on grid editor — shows all rows
Benjamin Admin
2026-03-24 08:24:50 +01:00
-
ee0d9c881e
fix: column resize handle now accessible above add/delete buttons
Benjamin Admin
2026-03-24 08:20:04 +01:00
-
65f4ce1947
feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns
Benjamin Admin
2026-03-24 07:45:39 +01:00
-
4e668660a7
feat: add Woerterbuch category + column add/delete in grid editor
Benjamin Admin
2026-03-23 16:27:12 +01:00
-
7a6eadde8b
feat: integrate Ground Truth review into Kombi Pipeline last step
Benjamin Admin
2026-03-23 15:04:23 +01:00
-
4e809c3860
fix: ground-truth crash on col_type + remove AIToolsSidebarResponsive from model-management
Benjamin Admin
2026-03-23 10:14:02 +01:00
-
dccbb909bc
fix: remove AIToolsSidebarResponsive wrapper from ground-truth and regression pages
Benjamin Admin
2026-03-23 09:57:52 +01:00
-
be7f5f1872
feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management
Benjamin Admin
2026-03-23 09:53:02 +01:00
-
c695b659fb
fix: PagePurpose props on ground-truth and regression pages
Benjamin Admin
2026-03-23 09:43:10 +01:00
-
a1e079b911
feat: Sprint 1 — IPA hardening, regression framework, ground-truth review
Benjamin Admin
2026-03-23 09:21:27 +01:00
-
f5d5d6c59c
docs: add Vision, Roadmap, and Hardware strategy to MkDocs
Benjamin Admin
2026-03-23 08:54:22 +01:00
-
4a44ad7986
fix: hard-filter OCR words inside detected graphic regions
Benjamin Admin
2026-03-22 10:18:23 +01:00
-
7b3319be2e
fix: merge syllable-split word_boxes + keep dictionary guide words
Benjamin Admin
2026-03-22 08:21:00 +01:00
-
882b177fc3
fix: remove image-area artifacts + fix heading false positive for dictionary entries
Benjamin Admin
2026-03-22 07:59:24 +01:00
-
1fae39dbb8
fix: lower secondary column threshold + strip pipe chars from word_boxes
Benjamin Admin
2026-03-22 07:44:03 +01:00
-
46c8c28d34
fix: border strip pre-filter + 3-column detection for vocabulary tables
Benjamin Admin
2026-03-21 21:01:43 +01:00
-
4000110501
fix: extend tiny symbol filter to all non-black colors, raise area to 200
Benjamin Admin
2026-03-21 18:05:31 +01:00
-
2acf8696bf
fix: correct border strip test data to avoid false internal gaps
Benjamin Admin
2026-03-21 17:24:33 +01:00
-
c0e1118870
feat: detect and remove page-border decoration strip artifacts (Step 4e)
Benjamin Admin
2026-03-21 17:20:45 +01:00
-
f31a7175a2
fix: normalize word_box order to reading order for frontend display (Step 5j)
Benjamin Admin
2026-03-20 19:21:37 +01:00
-
bacbfd88f1
Fix word ordering in cell text rebuild (Steps 4c, 4d, 5i)
Benjamin Admin
2026-03-20 18:45:33 +01:00
-
2c63beff04
Fix bullet overlap disambiguation + raise red threshold to 90
Benjamin Admin
2026-03-20 18:21:00 +01:00
-
82433b4bad
Step 5i: Remove blue bullet/artifact and overlapping duplicate word_boxes
Benjamin Admin
2026-03-20 18:17:07 +01:00
-
d889a6959e
Fix red false-positive in color detection for scanned black text
Benjamin Admin
2026-03-20 17:18:44 +01:00
-
bc1804ad18
Fix vsplit side-by-side rendering: invalid TypeScript type annotation
Benjamin Admin
2026-03-20 17:09:52 +01:00
-
45b83560fd
Vertical zone split: detect divider lines and create independent sub-zones
Benjamin Admin
2026-03-20 16:38:12 +01:00
-
e4fa634a63
Fix GridTable: show cell.text when it diverges from word_boxes
Benjamin Admin
2026-03-20 15:05:10 +01:00
-
76ba83eecb
Tighten tertiary column detection: require 4+ rows and 5% coverage
Benjamin Admin
2026-03-20 12:50:03 +01:00
-
04092a0a66
Fix Step 5h: reject grammar patterns in slash-IPA, convert trailing variants
Benjamin Admin
2026-03-20 12:40:28 +01:00
-
7fafd297e7
Step 5h: convert slash-delimited IPA to bracket notation with dict lookup
Benjamin Admin
2026-03-20 12:36:08 +01:00
-
7ac09b5941
Filter pipe-character word_boxes from OCR column divider artifacts
Benjamin Admin
2026-03-20 12:09:50 +01:00
-
1f7989cfc2
Fix grammar bracket detection: split on spaces too, not just slashes
Benjamin Admin
2026-03-20 11:45:35 +01:00
-
ef5aed6a98
Preserve grammar annotations (pl), (no pl) and skip articles in IPA
Benjamin Admin
2026-03-20 11:42:44 +01:00
-
7dc00e737a
Add footer row label (F) in grid editor, matching header (H) style
Benjamin Admin
2026-03-20 11:01:14 +01:00
-
a579c31ddb
Fix IPA continuation: skip words with inline IPA, recover emptied cells
Benjamin Admin
2026-03-20 09:31:54 +01:00
-
0f9c0d2ad0
Keep footer rows in table, mark with is_footer + col_type=footer
Benjamin Admin
2026-03-20 09:08:25 +01:00
-
278067fe20
Fix page_ref extraction: only extract cells matching page-ref pattern
Benjamin Admin
2026-03-20 08:55:55 +01:00
-
d76fb2a9c8
Fix page_ref + footer extraction: extract individual cells, skip IPA footers
Benjamin Admin
2026-03-20 08:47:39 +01:00
-
9681fcbd05
Strip IPA from headings + extract page_refs and footer from table
Benjamin Admin
2026-03-20 08:42:53 +01:00
-
4290f70885
Fix unbracketed IPA continuations: detect garbled IPA in single-cell rows
Benjamin Admin
2026-03-20 08:30:44 +01:00
-
5c935eec23
Refine garbled IPA filter: skip only pure-ASCII garbled text, not text with real IPA
Benjamin Admin
2026-03-20 08:15:51 +01:00
-
c4a5cd2d8a
Skip garbled IPA text in single-cell heading detection
Benjamin Admin
2026-03-20 08:11:02 +01:00
-
bc5ab29c06
Fix false positive: exclude first/last rows from single-cell heading detection
Benjamin Admin
2026-03-20 08:06:05 +01:00
-
7c5d95b858
Fix heading col_index + detect black single-cell headings like "Theme"
Benjamin Admin
2026-03-20 08:00:06 +01:00
-
65059471cf
Update OCR Pipeline docs: Grid Editor v4.7.0 with zone merging, heading detection, IPA fixes
Benjamin Admin
2026-03-20 07:05:14 +01:00
-
58c9565ba5
Fix en_col_type detection: use bracket IPA count instead of longest avg text
Benjamin Admin
2026-03-20 06:50:47 +01:00
-
92a7b85c2d
Fix IPA continuation: only process fully-bracketed cells, keep phrasal verb particles
Benjamin Admin
2026-03-20 00:43:51 +01:00