When the same URL is used for multiple document types (e.g. /datenschutz
for DSI + Cookie + DSB), the section splitter now:
- Detects duplicate URLs and fetches the text only once
- Splits text at classified headings (Cookie, Google Analytics, etc.)
- Assigns matching sections to each doc_type
- DSI always keeps the full text
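A minimal sketch of the splitting logic described above, assuming a simple line-based heading classifier. The function names, the doc_type keys ("dsi", "cookie", "dsb"), the heading regexes, the 80-character heading limit and the empty-string fallback are all hypothetical illustrations, not taken from the actual section_splitter.py; `fetch` is an injected callable so the sketch runs without network access.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import Callable, Optional

# Hypothetical heading classifiers (doc_type -> pattern); the real rules may differ.
HEADING_PATTERNS = {
    "cookie": re.compile(r"cookie|google analytics", re.IGNORECASE),
    "dsb": re.compile(r"datenschutzbeauftragte", re.IGNORECASE),
}
MAX_HEADING_LEN = 80  # assumption: headings are short lines


def classify_heading(line: str) -> Optional[str]:
    """Return the doc_type a heading-like line belongs to, or None."""
    stripped = line.strip()
    if not stripped or len(stripped) > MAX_HEADING_LEN:
        return None
    for doc_type, pattern in HEADING_PATTERNS.items():
        if pattern.search(stripped):
            return doc_type
    return None


def split_by_doc_type(text: str) -> dict[str, str]:
    """Group lines under the most recent classified heading.

    Lines before the first classified heading belong to no section.
    """
    sections: dict[str, list[str]] = defaultdict(list)
    current: Optional[str] = None
    for line in text.splitlines():
        doc_type = classify_heading(line)
        if doc_type is not None:
            current = doc_type
        if current is not None:
            sections[current].append(line)
    return {k: "\n".join(v) for k, v in sections.items()}


def texts_for_urls(urls: dict[str, str], fetch: Callable[[str], str]) -> dict[str, str]:
    """Map doc_type -> text, fetching each distinct URL only once."""
    by_url: dict[str, list[str]] = defaultdict(list)
    for doc_type, url in urls.items():
        by_url[url].append(doc_type)

    result: dict[str, str] = {}
    for url, doc_types in by_url.items():
        text = fetch(url)  # one fetch per distinct URL
        if len(doc_types) == 1:
            result[doc_types[0]] = text  # unshared URL: no splitting needed
            continue
        sections = split_by_doc_type(text)
        for doc_type in doc_types:
            if doc_type == "dsi":
                result[doc_type] = text  # DSI always keeps the full text
            else:
                result[doc_type] = sections.get(doc_type, "")  # hypothetical fallback
    return result
```

Injecting `fetch` keeps the deduplication testable: a fake fetcher can count calls and confirm that three doc types sharing /datenschutz trigger a single request.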
Extracted into section_splitter.py (170 LOC) to keep the routes file under 500 LOC.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>