75dda9ac92c657d13e3f74d107bb4dbadadacba4
EU Official Journal PDFs (AI Act, CRA, NIS2, DSGVO, etc.) use
multi-column layouts that pypdf breaks into fragmented words
("Ar tik el" instead of "Artikel"). pdfplumber handles these correctly.
Backend priority: unstructured > pdfplumber > pypdf (auto mode).
Also increases D5 re-ingestion timeout to 3600s for large PDFs.
58 embedding-service tests passing. pdfplumber: MIT license.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
No description provided
Languages
Python
38.7%
TypeScript
33.8%
Go
22.8%
HTML
2.9%
Shell
0.8%
Other
1%