-
f0726d9a2b
fix: shrink overlapping neighbors after narrow column expansion
Benjamin Admin
2026-03-04 11:12:13 +01:00
-
ae1f9f7494
fix: expand narrow columns into neighbor space, not just gaps
Benjamin Admin
2026-03-04 10:49:10 +01:00
-
e4aff2b27e
fix: rewrite Method D to measure vertical column drift instead of text-line slope
Benjamin Admin
2026-03-04 10:31:19 +01:00
-
9dd77ab54a
fix: move column expansion AFTER sub-column split
Benjamin Admin
2026-03-04 10:07:40 +01:00
-
e426de937c
fix: expand narrow columns + lower dewarp thresholds for small angles
Benjamin Admin
2026-03-04 09:32:47 +01:00
-
0d3f001acb
fix: always include detections in dewarp response, even when no correction applied
Benjamin Admin
2026-03-04 09:05:43 +01:00
-
c484a89b78
fix: dewarp UI shows detection details, quality gate status, confidence bars
Benjamin Admin
2026-03-04 08:39:55 +01:00
-
d5f2ce4659
fix: Fabric.js v6 API compatibility + CLAUDE.md SSH commands
Benjamin Admin
2026-03-03 23:01:19 +01:00
-
ab3ecc7c08
feat: OCR pipeline v2.1 – narrow column OCR, dewarp automation, Fabric.js editor
Benjamin Admin
2026-03-03 22:44:14 +01:00
-
970ec1f548
docs: OCR pipeline v2.0.0 – document all optimizations from 2026-03-03
Benjamin Admin
2026-03-03 18:42:25 +01:00
-
a610bc75ba
fix: rename LLM-Korrektur to Korrektur in wizard stepper and types
Benjamin Admin
2026-03-03 17:56:46 +01:00
-
153f41358b
fix: remove stale allCells dependency in emptyCellIds memo
Benjamin Admin
2026-03-03 17:39:14 +01:00
-
d1c8075da2
fix: three OCR pipeline UX improvements
Benjamin Admin
2026-03-03 17:31:55 +01:00
-
f3d61a9394
fix: extend initial Tesseract scan to full image width for word detection
Benjamin Admin
2026-03-03 17:08:03 +01:00
-
ab2423bd10
fix: protect numbered list prefixes from 1→I confusion in char fix step
Benjamin Admin
2026-03-03 16:46:45 +01:00
-
b914b6f49d
fix(columns): extend rightmost column to full image width (w) not content right_x
Benjamin Admin
2026-03-03 16:25:07 +01:00
-
123b7ada0b
fix(columns): filter phantom narrow columns + rename step to OCR-Zeichenkorrektur
Benjamin Admin
2026-03-03 16:06:59 +01:00
-
cb61fab77b
fix(rows): filter artifact rows and heal gaps for full OCR height
Benjamin Admin
2026-03-03 15:38:58 +01:00
-
6623a5d10e
fix(columns): extend rightmost column to content right edge (right_x)
Benjamin Admin
2026-03-03 15:26:38 +01:00
-
21ea458fcf
feat(ocr-review): replace LLM with rule-based spell-checker (REVIEW_ENGINE=spell)
Benjamin Admin
2026-03-03 15:04:27 +01:00
-
b1f7fee284
fix(ocr-review): add pipe→1 as valid OCR correction in _is_spurious_change
Benjamin Admin
2026-03-03 14:50:16 +01:00
-
dc5d76ecf5
fix(llm-review): think=false and logging were missing in the streaming version
Benjamin Admin
2026-03-03 14:43:42 +01:00
-
1ac47cd9b7
fix(llm-review): fix JSON parse errors caused by control characters
Benjamin Admin
2026-03-03 14:37:16 +01:00
-
fa8e38db2d
fix(llm-review): remove pre-filter, send all entries to the LLM
Benjamin Admin
2026-03-03 14:29:46 +01:00
-
f1b6246838
fix(llm-review): diagnostic logging + think=false + <think> tag stripping
Benjamin Admin
2026-03-03 14:13:08 +01:00
-
2fce92d7b1
fix(llm-review): LLM no longer translates, only corrects OCR digit errors
Benjamin Admin
2026-03-03 13:48:54 +01:00
-
7eb03ca8d1
fix(ocr-pipeline): IndentationError in auto-mode deskew block
Benjamin Admin
2026-03-03 13:21:49 +01:00
-
50e1c964ee
feat(klausur-service): OCR pipeline optimizations (Improvements 2-4)
Benjamin Admin
2026-03-03 13:13:20 +01:00
-
2e0f8632f8
feat(klausur): implement handwriting removal + Klausur-HTR
Benjamin Admin
2026-03-03 12:04:26 +01:00
-
606bef0591
fix(ocr-pipeline): overlap-based word assignment and empty row filtering
Benjamin Admin
2026-03-03 11:00:29 +01:00
-
ccba2bb887
fix(ocr-pipeline): show sub-columns in reconstruction and LLM review steps
Benjamin Admin
2026-03-03 10:36:27 +01:00
-
ef6237ffdf
refactor(coolify): externalize postgres, qdrant, S3
Sharang Parnerkar
2026-03-03 09:23:32 +01:00
-
75bca1f02d
fix(ocr-cells): align cell bboxes exactly to column/row coordinates
Benjamin Admin
2026-03-03 09:21:56 +01:00
-
4d428980c1
refactor(word-step): make table fully generic and fix marker-only row filter
Benjamin Admin
2026-03-03 08:45:24 +01:00
-
dea3349b23
fix(ocr-pipeline): preserve sub-column data in vocab table display
Benjamin Admin
2026-03-03 08:06:15 +01:00
-
0d72f2c836
fix(sub-columns): protect sub-columns from column_ignore pre-filter
Benjamin Admin
2026-03-03 07:55:53 +01:00
-
d6a8c1d821
fix(streaming): include page_ref columns in SSE metadata
Benjamin Admin
2026-03-03 07:48:07 +01:00
-
6527beae03
fix(sub-columns): exclude header/footer words from alignment clustering
Benjamin Admin
2026-03-03 07:33:54 +01:00
-
3904ddb493
fix(sub-columns): convert relative word positions to absolute coords for split
Benjamin Admin
2026-03-02 19:16:13 +01:00
-
6e1a349eed
fix(tests): adjust word counts so 10% threshold works correctly
Benjamin Admin
2026-03-02 19:00:14 +01:00
-
7252f9a956
refactor(ocr-pipeline): use left-edge alignment approach for sub-column detection
Benjamin Admin
2026-03-02 18:56:38 +01:00
-
f13116345b
fix(tests): use correct bbox_pct dict format in _cells_to_vocab_entries tests
Benjamin Admin
2026-03-02 18:26:24 +01:00
-
991984d9c3
fix(tests): pass columns_meta arg to _cells_to_vocab_entries tests
Benjamin Admin
2026-03-02 18:23:55 +01:00
-
1a246eb059
feat(ocr-pipeline): generic sub-column detection via left-edge clustering
Benjamin Admin
2026-03-02 18:18:02 +01:00
-
0532b2a797
fix(ocr-pipeline): skip edge-touching gaps in header/footer detection
Benjamin Admin
2026-03-02 17:54:49 +01:00
-
f1fcc67357
fix(ocr-pipeline): clamp gap detection to img_h to avoid dewarp padding
Benjamin Admin
2026-03-02 17:06:58 +01:00
-
c8981423d4
feat(ocr-pipeline): distinguish header/footer vs margin_top/margin_bottom
Benjamin Admin
2026-03-02 16:55:41 +01:00
-
f615c5f66d
feat(ocr-pipeline): generic header/footer detection via projection gap analysis
Benjamin Admin
2026-03-02 16:13:48 +01:00
-
a052f73de3
fix(ocr-pipeline): pass left_x/right_x to classify_column_types in API path
Benjamin Admin
2026-03-02 15:42:39 +01:00
-
34ccdd5fd1
feat(ocr-pipeline): filter scan artifacts in content bounds and add margin regions
Benjamin Admin
2026-03-02 15:29:18 +01:00
-
e718353d9f
feat(ocr-pipeline): 6 systematic improvements for robustness, performance & UX
Benjamin Admin
2026-03-02 14:46:38 +01:00
-
c3a924a620
fix(ocr-pipeline): merge phonetic-only rows and fix bracket noise filter
Benjamin Admin
2026-03-02 14:14:20 +01:00
-
650f15bc1b
fix(ocr-pipeline): tolerate dictionary punctuation in noise filter
Benjamin Admin
2026-03-02 13:12:40 +01:00
-
40a77a82f6
fix(ocr-pipeline): use midpoint boundaries for column word assignment
Benjamin Admin
2026-03-02 12:53:56 +01:00
-
87931c35e4
fix(ocr-pipeline): stop noise filter from stripping parenthesized words
Benjamin Admin
2026-03-02 12:51:28 +01:00
-
29b1d95acc
fix(ocr-pipeline): improve word-column assignment and LLM review accuracy
Benjamin Admin
2026-03-02 12:40:26 +01:00
-
dbf0db0c13
feat(ocr-pipeline): improve LLM review UI + add reconstruction step
Benjamin Admin
2026-03-02 12:19:21 +01:00
-
2a493890b6
feat(ocr-pipeline): add SSE streaming and phonetic filter to LLM review
Benjamin Admin
2026-03-02 11:46:06 +01:00
-
e171a736e7
fix(ocr-pipeline): increase LLM timeout to 300s and disable qwen3 thinking
Benjamin Admin
2026-03-02 11:31:03 +01:00
-
938d1d69cf
feat(ocr-pipeline): add LLM-based OCR correction step (Step 6)
Benjamin Admin
2026-03-02 11:13:17 +01:00
-
e9f368d3ec
feat(ocr-pipeline): add abbreviation allowlist to noise filter
Benjamin Admin
2026-03-02 10:46:54 +01:00
-
3028f421b4
feat(ocr-pipeline): add cell text noise filter for OCR artifacts
Benjamin Admin
2026-03-02 10:19:31 +01:00
-
2b1c499d54
fix(ocr-pipeline): filter OCR noise from image areas and artifacts
Benjamin Admin
2026-03-02 09:56:54 +01:00
-
72cc77dcf4
fix(ocr-pipeline): cells = result, no post-processing content shuffling
Benjamin Admin
2026-03-02 09:41:30 +01:00
-
e3f939a628
refactor(ocr-pipeline): make post-processing fully generic
Benjamin Admin
2026-03-02 09:27:30 +01:00
-
6bca3370e0
fix(ocr-pipeline): fix vocab post-processing destroying correct cell results
Benjamin Admin
2026-03-02 09:16:50 +01:00
-
befc44d2dd
perf(ocr-pipeline): limit cell-OCR fallback to EN/DE columns only
Benjamin Admin
2026-03-02 09:01:08 +01:00
-
6db3c02db4
fix(admin-lehrer): force unique build ID to bust browser caches
Benjamin Admin
2026-03-02 08:54:05 +01:00
-
8f2c2e8f68
feat(ocr-pipeline): hybrid word-lookup with cell-OCR fallback
Benjamin Admin
2026-03-02 08:21:12 +01:00
-
50ad06f43a
fix(ocr-pipeline): always run fresh word detection, skip stale cache
Benjamin Admin
2026-03-02 08:05:13 +01:00
-
2c4160e4c4
fix(ocr-pipeline): exclusive word-to-column assignment prevents duplicates
Benjamin Admin
2026-03-02 07:54:45 +01:00
-
9bbde1c03e
fix(ocr-pipeline): re-populate row.words for word-lookup in Step 5
Benjamin Admin
2026-03-02 07:38:33 +01:00
-
77869e32f4
feat(ocr-pipeline): use word-lookup instead of cell-OCR for cell grid
Benjamin Admin
2026-03-02 07:24:46 +01:00
-
89b5f49918
fix(ocr-pipeline): filter phantom rows with word_count=0 from cell grid
Benjamin Admin
2026-03-01 18:40:13 +01:00
-
7f27783008
feat(ocr-pipeline): add SSE streaming for word recognition (Step 5)
Benjamin Admin
2026-03-01 17:54:20 +01:00
-
a666e883da
fix(ocr-pipeline): exclude header/footer/page_ref from cell grid columns
Benjamin Admin
2026-03-01 17:33:48 +01:00
-
27b895a848
feat(ocr-pipeline): generic cell-grid with optional vocab mapping
Benjamin Admin
2026-03-01 17:22:56 +01:00
-
3bcb7aa638
fix(ocr-pipeline): remove overzealous grid row count validation
Benjamin Admin
2026-03-01 13:01:27 +01:00
-
c4f2e6554e
fix(ocr-pipeline): prevent grid from producing more rows than gap-based
Benjamin Admin
2026-03-01 12:52:41 +01:00
-
8e861e5a4d
fix(ocr-pipeline): use gap-based row height for cluster tolerance
Benjamin Admin
2026-03-01 12:34:15 +01:00
-
4970ca903e
fix(ocr-pipeline): invalidate downstream results when steps are re-run
Benjamin Admin
2026-03-01 12:24:44 +01:00
-
97d4355aa9
fix(ocr-pipeline): group words by vertical center, merge close clusters
Benjamin Admin
2026-03-01 12:14:42 +01:00
-
8ad5823fd8
feat(ocr-pipeline): word-center grid with section-break detection
Benjamin Admin
2026-03-01 12:04:08 +01:00
-
ec47045c15
feat(ocr-pipeline): uniform grid regularization for row detection (Step 7)
Benjamin Admin
2026-03-01 11:50:50 +01:00
-
ba65e47654
feat(ocr-pipeline): move oversized row splitting from Step 5 to Step 4
Benjamin Admin
2026-03-01 11:46:18 +01:00
-
8507e2e035
fix(ocr-pipeline): split oversized cells before OCR to capture all text
Benjamin Admin
2026-03-01 11:32:10 +01:00
-
854d8b431b
feat(rag-qa): add 14 missing PDF mappings for EDPB, ENISA, EDPS, TMG, UrhG
Benjamin Admin
2026-03-01 11:10:09 +01:00
-
f2521d2b9e
feat(ocr-pipeline): British/American IPA pronunciation choice
Benjamin Admin
2026-03-01 11:08:52 +01:00
-
954d21e469
fix: use local Inter font to avoid Google Fonts timeout in Docker build
Benjamin Admin
2026-02-28 21:26:34 +01:00
-
010616be5a
fix(ocr-pipeline): generic example attachment + cell padding
Benjamin Admin
2026-02-28 21:24:28 +01:00
-
e3aa8e899e
feat(rag-qa): add fullscreen mode for split-view chunk browser
Benjamin Admin
2026-02-28 21:23:32 +01:00
-
266b9dfad3
Fix PDF 404: default to bp_compliance_ce collection, add PDF existence check
Benjamin Admin
2026-02-28 21:13:26 +01:00
-
ab294d5a6f
feat(ocr-pipeline): deterministic post-processing pipeline
Benjamin Admin
2026-02-28 21:00:09 +01:00
-
b48cd8bb46
Fix ChunkBrowserQA layout: proper height constraints, remove bottom nav duplication
Benjamin Admin
2026-02-28 20:24:50 +01:00
-
d481e0087b
deps: add eng-to-ipa for IPA dictionary lookup
Benjamin Admin
2026-02-28 20:23:40 +01:00
-
f7e0f2bb4f
feat(ocr-pipeline): line breaks, hyphen rejoin & oversized row splitting
Benjamin Admin
2026-02-28 18:49:28 +01:00
-
e7fb9d59f1
Fix ChunkBrowserQA: use regulation_id from Qdrant payload instead of regulation_code
Benjamin Admin
2026-02-28 18:22:12 +01:00
-
859342300e
fix(ocr-pipeline): configure RapidOCR for German + tighter word detection
Benjamin Admin
2026-02-28 18:17:49 +01:00
-
8c42fefa77
feat(rag): add QA Split-View Chunk-Browser for ingestion verification
Benjamin Admin
2026-02-28 17:46:11 +01:00
-
984dfab975
fix(ocr-pipeline): add libgl1 for RapidOCR OpenCV dependency
Benjamin Admin
2026-02-28 17:30:12 +01:00