75dda9ac92c657d13e3f74d107bb4dbadadacba4
EU Official Journal PDFs (AI Act, CRA, NIS2, DSGVO, etc.) use
multi-column layouts that pypdf breaks into fragmented words
("Ar tik el" instead of "Artikel"). pdfplumber handles these correctly.
Backend priority: unstructured > pdfplumber > pypdf (auto mode).
Also increases D5 re-ingestion timeout to 3600s for large PDFs.
58 embedding-service tests passing. pdfplumber: MIT license.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
No description provided
Languages
Python
38.3%
TypeScript
37.8%
Go
18.9%
HTML
3.2%
Shell
0.7%
Other
1.1%