fix: leere Spalten als strukturell behandeln + 2-Spalten-Layout korrekt labeln
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m50s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m50s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s
Spalten mit <=2 Woertern und <15% Breite werden jetzt als column_marker statt als content-Spalte klassifiziert. Bei 2 breiten Content-Spalten wird die rechte als column_example statt column_de gelabelt, da die linke Spalte EN+DE kombiniert enthaelt. OSD-Zoom von 1.0 auf 2.0 erhoeht fuer zuverlaessigere Orientierungserkennung. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -1177,7 +1177,7 @@ async def upload_pdf_get_info(
|
||||
if OCR_PIPELINE_AVAILABLE:
|
||||
for pg in range(page_count):
|
||||
try:
|
||||
img_bgr = render_pdf_high_res(content, pg, zoom=1.0)
|
||||
img_bgr = render_pdf_high_res(content, pg, zoom=2.0)
|
||||
_, rotation = detect_and_fix_orientation(img_bgr)
|
||||
if rotation:
|
||||
page_rotations[pg] = rotation
|
||||
|
||||
Reference in New Issue
Block a user