-
d6f4944bcc
fix: remove maxHeight limit on grid editor — shows all rows
Benjamin Admin
2026-03-24 08:24:50 +01:00
-
ee0d9c881e
fix: column resize handle now accessible above add/delete buttons
Benjamin Admin
2026-03-24 08:20:04 +01:00
-
65f4ce1947
feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns
Benjamin Admin
2026-03-24 07:45:39 +01:00
-
4e668660a7
feat: add Woerterbuch category + column add/delete in grid editor
Benjamin Admin
2026-03-23 16:27:12 +01:00
-
7a6eadde8b
feat: integrate Ground Truth review into Kombi Pipeline last step
Benjamin Admin
2026-03-23 15:04:23 +01:00
-
4e809c3860
fix: ground-truth crash on col_type + remove AIToolsSidebarResponsive from model-management
Benjamin Admin
2026-03-23 10:14:02 +01:00
-
dccbb909bc
fix: remove AIToolsSidebarResponsive wrapper from ground-truth and regression pages
Benjamin Admin
2026-03-23 09:57:52 +01:00
-
be7f5f1872
feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management
Benjamin Admin
2026-03-23 09:53:02 +01:00
-
c695b659fb
fix: PagePurpose props on ground-truth and regression pages
Benjamin Admin
2026-03-23 09:43:10 +01:00
-
a1e079b911
feat: Sprint 1 — IPA hardening, regression framework, ground-truth review
Benjamin Admin
2026-03-23 09:21:27 +01:00
-
f5d5d6c59c
docs: add Vision, Roadmap, and Hardware strategy to MkDocs
Benjamin Admin
2026-03-23 08:54:22 +01:00
-
4a44ad7986
fix: hard-filter OCR words inside detected graphic regions
Benjamin Admin
2026-03-22 10:18:23 +01:00
-
7b3319be2e
fix: merge syllable-split word_boxes + keep dictionary guide words
Benjamin Admin
2026-03-22 08:21:00 +01:00
-
882b177fc3
fix: remove image-area artifacts + fix heading false positive for dictionary entries
Benjamin Admin
2026-03-22 07:59:24 +01:00
-
1fae39dbb8
fix: lower secondary column threshold + strip pipe chars from word_boxes
Benjamin Admin
2026-03-22 07:44:03 +01:00
-
46c8c28d34
fix: border strip pre-filter + 3-column detection for vocabulary tables
Benjamin Admin
2026-03-21 21:01:43 +01:00
-
4000110501
fix: extend tiny symbol filter to all non-black colors, raise area to 200
Benjamin Admin
2026-03-21 18:05:31 +01:00
-
2acf8696bf
fix: correct border strip test data to avoid false internal gaps
Benjamin Admin
2026-03-21 17:24:33 +01:00
-
c0e1118870
feat: detect and remove page-border decoration strip artifacts (Step 4e)
Benjamin Admin
2026-03-21 17:20:45 +01:00
-
f31a7175a2
fix: normalize word_box order to reading order for frontend display (Step 5j)
Benjamin Admin
2026-03-20 19:21:37 +01:00
-
bacbfd88f1
Fix word ordering in cell text rebuild (Steps 4c, 4d, 5i)
Benjamin Admin
2026-03-20 18:45:33 +01:00
-
2c63beff04
Fix bullet overlap disambiguation + raise red threshold to 90
Benjamin Admin
2026-03-20 18:21:00 +01:00
-
82433b4bad
Step 5i: Remove blue bullet/artifact and overlapping duplicate word_boxes
Benjamin Admin
2026-03-20 18:17:07 +01:00
-
d889a6959e
Fix red false-positive in color detection for scanned black text
Benjamin Admin
2026-03-20 17:18:44 +01:00
-
bc1804ad18
Fix vsplit side-by-side rendering: invalid TypeScript type annotation
Benjamin Admin
2026-03-20 17:09:52 +01:00
-
45b83560fd
Vertical zone split: detect divider lines and create independent sub-zones
Benjamin Admin
2026-03-20 16:38:12 +01:00
-
e4fa634a63
Fix GridTable: show cell.text when it diverges from word_boxes
Benjamin Admin
2026-03-20 15:05:10 +01:00
-
76ba83eecb
Tighten tertiary column detection: require 4+ rows and 5% coverage
Benjamin Admin
2026-03-20 12:50:03 +01:00
-
04092a0a66
Fix Step 5h: reject grammar patterns in slash-IPA, convert trailing variants
Benjamin Admin
2026-03-20 12:40:28 +01:00
-
7fafd297e7
Step 5h: convert slash-delimited IPA to bracket notation with dict lookup
Benjamin Admin
2026-03-20 12:36:08 +01:00
-
7ac09b5941
Filter pipe-character word_boxes from OCR column divider artifacts
Benjamin Admin
2026-03-20 12:09:50 +01:00
-
1f7989cfc2
Fix grammar bracket detection: split on spaces too, not just slashes
Benjamin Admin
2026-03-20 11:45:35 +01:00
-
ef5aed6a98
Preserve grammar annotations (pl), (no pl) and skip articles in IPA
Benjamin Admin
2026-03-20 11:42:44 +01:00
-
7dc00e737a
Add footer row label (F) in grid editor, matching header (H) style
Benjamin Admin
2026-03-20 11:01:14 +01:00
-
a579c31ddb
Fix IPA continuation: skip words with inline IPA, recover emptied cells
Benjamin Admin
2026-03-20 09:31:54 +01:00
-
0f9c0d2ad0
Keep footer rows in table, mark with is_footer + col_type=footer
Benjamin Admin
2026-03-20 09:08:25 +01:00
-
278067fe20
Fix page_ref extraction: only extract cells matching page-ref pattern
Benjamin Admin
2026-03-20 08:55:55 +01:00
-
d76fb2a9c8
Fix page_ref + footer extraction: extract individual cells, skip IPA footers
Benjamin Admin
2026-03-20 08:47:39 +01:00
-
9681fcbd05
Strip IPA from headings + extract page_refs and footer from table
Benjamin Admin
2026-03-20 08:42:53 +01:00
-
4290f70885
Fix unbracketed IPA continuations: detect garbled IPA in single-cell rows
Benjamin Admin
2026-03-20 08:30:44 +01:00
-
5c935eec23
Refine garbled IPA filter: skip only pure-ASCII garbled text, not text with real IPA
Benjamin Admin
2026-03-20 08:15:51 +01:00
-
c4a5cd2d8a
Skip garbled IPA text in single-cell heading detection
Benjamin Admin
2026-03-20 08:11:02 +01:00
-
bc5ab29c06
Fix false positive: exclude first/last rows from single-cell heading detection
Benjamin Admin
2026-03-20 08:06:05 +01:00
-
7c5d95b858
Fix heading col_index + detect black single-cell headings like "Theme"
Benjamin Admin
2026-03-20 08:00:06 +01:00
-
65059471cf
Update OCR Pipeline docs: Grid Editor v4.7.0 with zone merging, heading detection, IPA fixes
Benjamin Admin
2026-03-20 07:05:14 +01:00
-
58c9565ba5
Fix en_col_type detection: use bracket IPA count instead of longest avg text
Benjamin Admin
2026-03-20 06:50:47 +01:00
-
92a7b85c2d
Fix IPA continuation: only process fully-bracketed cells, keep phrasal verb particles
Benjamin Admin
2026-03-20 00:43:51 +01:00
-
5f89913a9a
Fix IPA continuation to check all columns, not just en_col_type
Benjamin Admin
2026-03-19 23:34:41 +01:00
-
3c7fc43f43
Fix test expectation: valid IPA in brackets also triggers detection
Benjamin Admin
2026-03-19 23:30:24 +01:00
-
6bfa9eed86
Fix garbled IPA detection for bracket-notation like [n, nn] and [1uedtX,1]
Benjamin Admin
2026-03-19 23:28:00 +01:00
-
7750b2a05f
Fix ghost filter for borderless boxes + remove oversized graphic artifacts
Benjamin Admin
2026-03-19 23:04:00 +01:00
-
e3395ae8cf
Fix overlay word leak, ghost filter false positive, merged zone header
Benjamin Admin
2026-03-19 13:56:04 +01:00
-
df30d4eae3
Add zone merging across images + heading detection by color/height
Benjamin Admin
2026-03-19 12:22:11 +01:00
-
2e6ab3a646
Fix IPA marker split: walk back max 3 chars for onset cluster
Benjamin Admin
2026-03-19 10:57:15 +01:00
-
cc5ee74921
Use OCR-recognized IPA when word not in dictionary
Benjamin Admin
2026-03-19 10:55:36 +01:00
-
21d37b5da1
Fix prefix matching: use alpha-only chars, min 4-char prefix
Benjamin Admin
2026-03-19 10:40:37 +01:00
-
19cbbf310a
Improve garbled IPA cleanup: trailing strip, prefix match, broader guard
Benjamin Admin
2026-03-19 10:36:25 +01:00
-
fc0ab84e40
Fix garbled IPA in continuation rows using headword lookup
Benjamin Admin
2026-03-19 10:28:14 +01:00
-
050d410ba0
Preserve IPA continuation rows in grid output
Benjamin Admin
2026-03-19 10:22:58 +01:00
-
038eaf783c
Only insert IPA when garbled phonetics exist in OCR text
Benjamin Admin
2026-03-19 09:59:21 +01:00
-
432eee3694
Auto-filter decorative margin strips and header junk
Benjamin Admin
2026-03-19 09:38:24 +01:00
-
8e4cbd84c2
Invalidate grid_editor_result when exclude regions change
Benjamin Admin
2026-03-19 09:19:09 +01:00
-
f9d71d50d1
Add exclude region marking in Structure step
Benjamin Admin
2026-03-19 09:08:30 +01:00
-
c09838e91c
Fix spine shadow false positives: require dark valley, brightness rise, trim convolution edges
Benjamin Admin
2026-03-19 08:23:50 +01:00
-
3fd6523872
Cut at spine center (darkest point) instead of shadow edge
Benjamin Admin
2026-03-19 07:54:33 +01:00
-
e56391b0c3
Add right-edge spine shadow detection for book scans
Benjamin Admin
2026-03-19 07:41:13 +01:00
-
a3e2a7f994
Add GT button to OCR overlay, prominent category picker, track pipeline
Benjamin Admin
2026-03-18 14:49:02 +01:00
-
f655db30e4
Add Ground Truth regression test system for OCR pipeline
Benjamin Admin
2026-03-18 13:46:48 +01:00
-
c894a0feeb
Improve IPA continuation row detection with phonetic heuristics
Benjamin Admin
2026-03-18 12:08:21 +01:00
-
8ef4c089cf
Remove IPA continuation rows and support hyphenated word lookup
Benjamin Admin
2026-03-18 12:05:38 +01:00
-
821e5481c2
Only apply IPA correction on vocabulary tables (≥3 columns)
Benjamin Admin
2026-03-18 11:50:03 +01:00
-
b98ea33a3a
Strip garbled OCR phonetics after IPA insertion
Benjamin Admin
2026-03-18 11:15:14 +01:00
-
f139d0903e
Preserve alphabetic marker columns, broaden junk filter, enable IPA in grid
Benjamin Admin
2026-03-18 11:08:23 +01:00
-
962bbbe9f6
Remove scattered debris rows and disable spanning header detection
Benjamin Admin
2026-03-18 10:47:17 +01:00
-
9da45c2a59
Fix false header detection and add decorative margin/footer filters
Benjamin Admin
2026-03-18 10:38:20 +01:00
-
64447ad352
Raise color sat_threshold from 50 to 55 to avoid scanner blue artifacts
Benjamin Admin
2026-03-18 09:13:09 +01:00
-
00cbf266cb
Add oversized-stub filter for large page numbers/marks in grid rows
Benjamin Admin
2026-03-18 09:05:07 +01:00
-
f9bad7beaa
Filter phantom rows from recovered color artifacts and low-conf OCR noise
Benjamin Admin
2026-03-18 09:00:43 +01:00
-
143e41ec76
add: ocr_pipeline_overlays.py for overlay rendering functions
Benjamin Admin
2026-03-18 08:46:49 +01:00
-
ec287fd12e
refactor: split ocr_pipeline_api.py (5426 lines) into 8 modules
Benjamin Admin
2026-03-18 08:42:00 +01:00
-
98f7f7d7d5
fix: NameError in paddle_kombi/rapid_kombi cache update
Benjamin Admin
2026-03-18 08:12:01 +01:00
-
a19bca6060
fix: lower color sat_threshold from 70 to 50 for green text detection
Benjamin Admin
2026-03-18 08:00:35 +01:00
-
7a76697f95
fix: always re-run structure detection instead of using cached result
Benjamin Admin
2026-03-18 07:43:44 +01:00
-
5359a4cc2b
fix: cache word_result in paddle_kombi/rapid_kombi for detect-structure
Benjamin Admin
2026-03-18 07:29:02 +01:00
-
a25214126d
fix: merge overlapping OCR words with different text (Stick/Stück)
Benjamin Admin
2026-03-18 07:00:57 +01:00
-
fd79d5e4fa
fix: prevent grid table overflow when union columns exceed zone bbox
Benjamin Admin
2026-03-17 19:43:00 +01:00
-
19b93f7762
fix: conservative column detection + smart graphic word filter
Benjamin Admin
2026-03-17 18:19:25 +01:00
-
a079ffe8e9
fix: robust colored-text detection in graphic filter
Benjamin Admin
2026-03-17 18:09:16 +01:00
-
6e1d715d0d
fix: prevent colored text from being falsely detected as graphics
Benjamin Admin
2026-03-17 17:30:35 +01:00
-
d66efdecf5
fix: NameError in detect_page_splits — 'gaps' var removed in rewrite
Benjamin Admin
2026-03-17 17:01:34 +01:00
-
d36972b464
fix: detect spine by brightness, not ink density
Benjamin Admin
2026-03-17 16:52:29 +01:00
-
f30e526917
fix: merge nearby spine gaps + handle multi-page crop in frontend
Benjamin Admin
2026-03-17 16:44:32 +01:00
-
438a4495c7
fix: swap 90°/270° rotation direction in orientation detection
Benjamin Admin
2026-03-17 16:39:15 +01:00
-
902de027f4
feat: auto-detect multi-page spreads and split into sub-sessions
Benjamin Admin
2026-03-17 16:34:06 +01:00
-
b1cdb2531c
feat: CSS Grid editor with OCR-measured column widths and row heights
Benjamin Admin
2026-03-17 13:48:47 +01:00
-
ab30e8b17a
feat: apply IPA phonetic correction in build-grid combo mode
Benjamin Admin
2026-03-17 12:53:58 +01:00
-
b0e1fbc8d6
feat: box zone artifact filter, spanning headers, parenthesis fix
Benjamin Admin
2026-03-17 11:31:55 +01:00
-
872b47f691
fix: filter words and color recoveries inside graphic/image regions
Benjamin Admin
2026-03-17 11:20:07 +01:00
-
bbf0a5720e
fix: require both horizontal AND vertical overlap for word dedup
Benjamin Admin
2026-03-17 10:57:44 +01:00
-
29d3c1caf5
fix: deduplicate overlapping words after Paddle+Tesseract merge
Benjamin Admin
2026-03-17 10:47:42 +01:00