Commit Graph

  • d6f4944bcc fix: remove maxHeight limit on grid editor — shows all rows Benjamin Admin 2026-03-24 08:24:50 +01:00
  • ee0d9c881e fix: column resize handle now accessible above add/delete buttons Benjamin Admin 2026-03-24 08:20:04 +01:00
  • 65f4ce1947 feat: ImageLayoutEditor, arrow-key nav, multi-select bold, wider columns Benjamin Admin 2026-03-24 07:45:39 +01:00
  • 4e668660a7 feat: add Woerterbuch category + column add/delete in grid editor Benjamin Admin 2026-03-23 16:27:12 +01:00
  • 7a6eadde8b feat: integrate Ground Truth review into Kombi Pipeline last step Benjamin Admin 2026-03-23 15:04:23 +01:00
  • 4e809c3860 fix: ground-truth crash on col_type + remove AIToolsSidebarResponsive from model-management Benjamin Admin 2026-03-23 10:14:02 +01:00
  • dccbb909bc fix: remove AIToolsSidebarResponsive wrapper from ground-truth and regression pages Benjamin Admin 2026-03-23 09:57:52 +01:00
  • be7f5f1872 feat: Sprint 2 — TrOCR ONNX, PP-DocLayout, Model Management Benjamin Admin 2026-03-23 09:53:02 +01:00
  • c695b659fb fix: PagePurpose props on ground-truth and regression pages Benjamin Admin 2026-03-23 09:43:10 +01:00
  • a1e079b911 feat: Sprint 1 — IPA hardening, regression framework, ground-truth review Benjamin Admin 2026-03-23 09:21:27 +01:00
  • f5d5d6c59c docs: add Vision, Roadmap, and Hardware strategy to MkDocs Benjamin Admin 2026-03-23 08:54:22 +01:00
  • 4a44ad7986 fix: hard-filter OCR words inside detected graphic regions Benjamin Admin 2026-03-22 10:18:23 +01:00
  • 7b3319be2e fix: merge syllable-split word_boxes + keep dictionary guide words Benjamin Admin 2026-03-22 08:21:00 +01:00
  • 882b177fc3 fix: remove image-area artifacts + fix heading false positive for dictionary entries Benjamin Admin 2026-03-22 07:59:24 +01:00
  • 1fae39dbb8 fix: lower secondary column threshold + strip pipe chars from word_boxes Benjamin Admin 2026-03-22 07:44:03 +01:00
  • 46c8c28d34 fix: border strip pre-filter + 3-column detection for vocabulary tables Benjamin Admin 2026-03-21 21:01:43 +01:00
  • 4000110501 fix: extend tiny symbol filter to all non-black colors, raise area to 200 Benjamin Admin 2026-03-21 18:05:31 +01:00
  • 2acf8696bf fix: correct border strip test data to avoid false internal gaps Benjamin Admin 2026-03-21 17:24:33 +01:00
  • c0e1118870 feat: detect and remove page-border decoration strip artifacts (Step 4e) Benjamin Admin 2026-03-21 17:20:45 +01:00
  • f31a7175a2 fix: normalize word_box order to reading order for frontend display (Step 5j) Benjamin Admin 2026-03-20 19:21:37 +01:00
  • bacbfd88f1 Fix word ordering in cell text rebuild (Steps 4c, 4d, 5i) Benjamin Admin 2026-03-20 18:45:33 +01:00
  • 2c63beff04 Fix bullet overlap disambiguation + raise red threshold to 90 Benjamin Admin 2026-03-20 18:21:00 +01:00
  • 82433b4bad Step 5i: Remove blue bullet/artifact and overlapping duplicate word_boxes Benjamin Admin 2026-03-20 18:17:07 +01:00
  • d889a6959e Fix red false-positive in color detection for scanned black text Benjamin Admin 2026-03-20 17:18:44 +01:00
  • bc1804ad18 Fix vsplit side-by-side rendering: invalid TypeScript type annotation Benjamin Admin 2026-03-20 17:09:52 +01:00
  • 45b83560fd Vertical zone split: detect divider lines and create independent sub-zones Benjamin Admin 2026-03-20 16:38:12 +01:00
  • e4fa634a63 Fix GridTable: show cell.text when it diverges from word_boxes Benjamin Admin 2026-03-20 15:05:10 +01:00
  • 76ba83eecb Tighten tertiary column detection: require 4+ rows and 5% coverage Benjamin Admin 2026-03-20 12:50:03 +01:00
  • 04092a0a66 Fix Step 5h: reject grammar patterns in slash-IPA, convert trailing variants Benjamin Admin 2026-03-20 12:40:28 +01:00
  • 7fafd297e7 Step 5h: convert slash-delimited IPA to bracket notation with dict lookup Benjamin Admin 2026-03-20 12:36:08 +01:00
  • 7ac09b5941 Filter pipe-character word_boxes from OCR column divider artifacts Benjamin Admin 2026-03-20 12:09:50 +01:00
  • 1f7989cfc2 Fix grammar bracket detection: split on spaces too, not just slashes Benjamin Admin 2026-03-20 11:45:35 +01:00
  • ef5aed6a98 Preserve grammar annotations (pl), (no pl) and skip articles in IPA Benjamin Admin 2026-03-20 11:42:44 +01:00
  • 7dc00e737a Add footer row label (F) in grid editor, matching header (H) style Benjamin Admin 2026-03-20 11:01:14 +01:00
  • a579c31ddb Fix IPA continuation: skip words with inline IPA, recover emptied cells Benjamin Admin 2026-03-20 09:31:54 +01:00
  • 0f9c0d2ad0 Keep footer rows in table, mark with is_footer + col_type=footer Benjamin Admin 2026-03-20 09:08:25 +01:00
  • 278067fe20 Fix page_ref extraction: only extract cells matching page-ref pattern Benjamin Admin 2026-03-20 08:55:55 +01:00
  • d76fb2a9c8 Fix page_ref + footer extraction: extract individual cells, skip IPA footers Benjamin Admin 2026-03-20 08:47:39 +01:00
  • 9681fcbd05 Strip IPA from headings + extract page_refs and footer from table Benjamin Admin 2026-03-20 08:42:53 +01:00
  • 4290f70885 Fix unbracketed IPA continuations: detect garbled IPA in single-cell rows Benjamin Admin 2026-03-20 08:30:44 +01:00
  • 5c935eec23 Refine garbled IPA filter: skip only pure-ASCII garbled text, not text with real IPA Benjamin Admin 2026-03-20 08:15:51 +01:00
  • c4a5cd2d8a Skip garbled IPA text in single-cell heading detection Benjamin Admin 2026-03-20 08:11:02 +01:00
  • bc5ab29c06 Fix false positive: exclude first/last rows from single-cell heading detection Benjamin Admin 2026-03-20 08:06:05 +01:00
  • 7c5d95b858 Fix heading col_index + detect black single-cell headings like "Theme" Benjamin Admin 2026-03-20 08:00:06 +01:00
  • 65059471cf Update OCR Pipeline docs: Grid Editor v4.7.0 with zone merging, heading detection, IPA fixes Benjamin Admin 2026-03-20 07:05:14 +01:00
  • 58c9565ba5 Fix en_col_type detection: use bracket IPA count instead of longest avg text Benjamin Admin 2026-03-20 06:50:47 +01:00
  • 92a7b85c2d Fix IPA continuation: only process fully-bracketed cells, keep phrasal verb particles Benjamin Admin 2026-03-20 00:43:51 +01:00
  • 5f89913a9a Fix IPA continuation to check all columns, not just en_col_type Benjamin Admin 2026-03-19 23:34:41 +01:00
  • 3c7fc43f43 Fix test expectation: valid IPA in brackets also triggers detection Benjamin Admin 2026-03-19 23:30:24 +01:00
  • 6bfa9eed86 Fix garbled IPA detection for bracket-notation like [n, nn] and [1uedtX,1] Benjamin Admin 2026-03-19 23:28:00 +01:00
  • 7750b2a05f Fix ghost filter for borderless boxes + remove oversized graphic artifacts Benjamin Admin 2026-03-19 23:04:00 +01:00
  • e3395ae8cf Fix overlay word leak, ghost filter false positive, merged zone header Benjamin Admin 2026-03-19 13:56:04 +01:00
  • df30d4eae3 Add zone merging across images + heading detection by color/height Benjamin Admin 2026-03-19 12:22:11 +01:00
  • 2e6ab3a646 Fix IPA marker split: walk back max 3 chars for onset cluster Benjamin Admin 2026-03-19 10:57:15 +01:00
  • cc5ee74921 Use OCR-recognized IPA when word not in dictionary Benjamin Admin 2026-03-19 10:55:36 +01:00
  • 21d37b5da1 Fix prefix matching: use alpha-only chars, min 4-char prefix Benjamin Admin 2026-03-19 10:40:37 +01:00
  • 19cbbf310a Improve garbled IPA cleanup: trailing strip, prefix match, broader guard Benjamin Admin 2026-03-19 10:36:25 +01:00
  • fc0ab84e40 Fix garbled IPA in continuation rows using headword lookup Benjamin Admin 2026-03-19 10:28:14 +01:00
  • 050d410ba0 Preserve IPA continuation rows in grid output Benjamin Admin 2026-03-19 10:22:58 +01:00
  • 038eaf783c Only insert IPA when garbled phonetics exist in OCR text Benjamin Admin 2026-03-19 09:59:21 +01:00
  • 432eee3694 Auto-filter decorative margin strips and header junk Benjamin Admin 2026-03-19 09:38:24 +01:00
  • 8e4cbd84c2 Invalidate grid_editor_result when exclude regions change Benjamin Admin 2026-03-19 09:19:09 +01:00
  • f9d71d50d1 Add exclude region marking in Structure step Benjamin Admin 2026-03-19 09:08:30 +01:00
  • c09838e91c Fix spine shadow false positives: require dark valley, brightness rise, trim convolution edges Benjamin Admin 2026-03-19 08:23:50 +01:00
  • 3fd6523872 Cut at spine center (darkest point) instead of shadow edge Benjamin Admin 2026-03-19 07:54:33 +01:00
  • e56391b0c3 Add right-edge spine shadow detection for book scans Benjamin Admin 2026-03-19 07:41:13 +01:00
  • a3e2a7f994 Add GT button to OCR overlay, prominent category picker, track pipeline Benjamin Admin 2026-03-18 14:49:02 +01:00
  • f655db30e4 Add Ground Truth regression test system for OCR pipeline Benjamin Admin 2026-03-18 13:46:48 +01:00
  • c894a0feeb Improve IPA continuation row detection with phonetic heuristics Benjamin Admin 2026-03-18 12:08:21 +01:00
  • 8ef4c089cf Remove IPA continuation rows and support hyphenated word lookup Benjamin Admin 2026-03-18 12:05:38 +01:00
  • 821e5481c2 Only apply IPA correction on vocabulary tables (≥3 columns) Benjamin Admin 2026-03-18 11:50:03 +01:00
  • b98ea33a3a Strip garbled OCR phonetics after IPA insertion Benjamin Admin 2026-03-18 11:15:14 +01:00
  • f139d0903e Preserve alphabetic marker columns, broaden junk filter, enable IPA in grid Benjamin Admin 2026-03-18 11:08:23 +01:00
  • 962bbbe9f6 Remove scattered debris rows and disable spanning header detection Benjamin Admin 2026-03-18 10:47:17 +01:00
  • 9da45c2a59 Fix false header detection and add decorative margin/footer filters Benjamin Admin 2026-03-18 10:38:20 +01:00
  • 64447ad352 Raise color sat_threshold from 50 to 55 to avoid scanner blue artifacts Benjamin Admin 2026-03-18 09:13:09 +01:00
  • 00cbf266cb Add oversized-stub filter for large page numbers/marks in grid rows Benjamin Admin 2026-03-18 09:05:07 +01:00
  • f9bad7beaa Filter phantom rows from recovered color artifacts and low-conf OCR noise Benjamin Admin 2026-03-18 09:00:43 +01:00
  • 143e41ec76 add: ocr_pipeline_overlays.py for overlay rendering functions Benjamin Admin 2026-03-18 08:46:49 +01:00
  • ec287fd12e refactor: split ocr_pipeline_api.py (5426 lines) into 8 modules Benjamin Admin 2026-03-18 08:42:00 +01:00
  • 98f7f7d7d5 fix: NameError in paddle_kombi/rapid_kombi cache update Benjamin Admin 2026-03-18 08:12:01 +01:00
  • a19bca6060 fix: lower color sat_threshold from 70 to 50 for green text detection Benjamin Admin 2026-03-18 08:00:35 +01:00
  • 7a76697f95 fix: always re-run structure detection instead of using cached result Benjamin Admin 2026-03-18 07:43:44 +01:00
  • 5359a4cc2b fix: cache word_result in paddle_kombi/rapid_kombi for detect-structure Benjamin Admin 2026-03-18 07:29:02 +01:00
  • a25214126d fix: merge overlapping OCR words with different text (Stick/Stück) Benjamin Admin 2026-03-18 07:00:57 +01:00
  • fd79d5e4fa fix: prevent grid table overflow when union columns exceed zone bbox Benjamin Admin 2026-03-17 19:43:00 +01:00
  • 19b93f7762 fix: conservative column detection + smart graphic word filter Benjamin Admin 2026-03-17 18:19:25 +01:00
  • a079ffe8e9 fix: robust colored-text detection in graphic filter Benjamin Admin 2026-03-17 18:09:16 +01:00
  • 6e1d715d0d fix: prevent colored text from being falsely detected as graphics Benjamin Admin 2026-03-17 17:30:35 +01:00
  • d66efdecf5 fix: NameError in detect_page_splits — 'gaps' var removed in rewrite Benjamin Admin 2026-03-17 17:01:34 +01:00
  • d36972b464 fix: detect spine by brightness, not ink density Benjamin Admin 2026-03-17 16:52:29 +01:00
  • f30e526917 fix: merge nearby spine gaps + handle multi-page crop in frontend Benjamin Admin 2026-03-17 16:44:32 +01:00
  • 438a4495c7 fix: swap 90°/270° rotation direction in orientation detection Benjamin Admin 2026-03-17 16:39:15 +01:00
  • 902de027f4 feat: auto-detect multi-page spreads and split into sub-sessions Benjamin Admin 2026-03-17 16:34:06 +01:00
  • b1cdb2531c feat: CSS Grid editor with OCR-measured column widths and row heights Benjamin Admin 2026-03-17 13:48:47 +01:00
  • ab30e8b17a feat: apply IPA phonetic correction in build-grid combo mode Benjamin Admin 2026-03-17 12:53:58 +01:00
  • b0e1fbc8d6 feat: box zone artifact filter, spanning headers, parenthesis fix Benjamin Admin 2026-03-17 11:31:55 +01:00
  • 872b47f691 fix: filter words and color recoveries inside graphic/image regions Benjamin Admin 2026-03-17 11:20:07 +01:00
  • bbf0a5720e fix: require both horizontal AND vertical overlap for word dedup Benjamin Admin 2026-03-17 10:57:44 +01:00
  • 29d3c1caf5 fix: deduplicate overlapping words after Paddle+Tesseract merge Benjamin Admin 2026-03-17 10:47:42 +01:00