breakpilot-core/embedding-service
Benjamin Admin 6ab10415d8 feat(embedding): add structural metadata to legal chunking (Block D1)
chunk_text_legal_structured() returns metadata per chunk:
- section: "§ 312k", "Art. 5"
- section_title: "Kündigungsbutton"
- paragraph: "Abs. 1", "Nr. 3"
- paragraph_num: 1, 3
- page: (prepared for PDF integration)
- index: sequential position
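
For illustration, the per-chunk metadata listed above could be modelled roughly as follows. This is a minimal sketch: the `ChunkMetadata` class name, the field types, and the `None` defaults are assumptions, since the commit only names the fields and gives example values.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChunkMetadata:
    """Hypothetical container mirroring the fields named in the commit message."""

    index: int                           # sequential position of the chunk
    section: Optional[str] = None        # e.g. "§ 312k" or "Art. 5"
    section_title: Optional[str] = None  # e.g. "Kündigungsbutton"
    paragraph: Optional[str] = None      # e.g. "Abs. 1" or "Nr. 3"
    paragraph_num: Optional[int] = None  # numeric part of the paragraph reference
    page: Optional[int] = None           # assumed to stay unset until PDF integration


# Example built from the sample values in the commit message.
example = ChunkMetadata(
    index=0,
    section="§ 312k",
    section_title="Kündigungsbutton",
    paragraph="Abs. 1",
    paragraph_num=1,
)
example_dict = asdict(example)
```

Converting to a plain dict with `asdict` keeps the structure easy to serialize as JSON; whether the service uses a dataclass, a Pydantic model, or plain dicts is not stated in the commit.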

/chunk endpoint now returns chunks_with_metadata alongside plain chunks.
Backward compatible: existing consumers can keep using the chunks field unchanged.
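
A rough sketch of how such a backward-compatible response might be assembled. The `build_chunk_response` helper and the `"text"` key inside each metadata entry are hypothetical; only the `chunks` and `chunks_with_metadata` field names come from the commit.

```python
from typing import Any, Dict, List


def build_chunk_response(chunks_with_metadata: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the legacy `chunks` list next to the richer metadata entries.

    Assumes (hypothetically) that each entry stores its chunk text under "text".
    """
    return {
        # Legacy field: plain strings, same shape existing consumers expect.
        "chunks": [entry["text"] for entry in chunks_with_metadata],
        # New field: the full entries, including structural metadata.
        "chunks_with_metadata": chunks_with_metadata,
    }


sample_entries = [
    {"text": "Erster Absatz.", "section": "§ 312k", "paragraph": "Abs. 1", "index": 0},
    {"text": "Zweiter Absatz.", "section": "§ 312k", "paragraph": "Abs. 2", "index": 1},
]
response = build_chunk_response(sample_entries)
```

Deriving `chunks` from the metadata entries keeps the two fields in the same order, so a consumer can migrate from one to the other by index.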

New regexes: _PARAGRAPH_RE (Abs/Nr/Satz/lit), _SECTION_NUMBER_RE
New functions: _parse_section_metadata(), _extract_paragraph_ref()
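
The actual patterns and function bodies are not shown in this commit message. The sketch below is one plausible way to match the reference styles named above; the patterns, the public function names, and the return shapes are all assumptions, not the repository's implementation.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed patterns, written for illustration only.
PARAGRAPH_RE = re.compile(r"\b(Abs|Nr|Satz|lit)\.?\s*(\d+|[a-z])\b")
SECTION_NUMBER_RE = re.compile(r"(§\s*\d+[a-z]*|Art\.\s*\d+[a-z]*)")


def extract_paragraph_ref(text: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (label, number) for the first paragraph reference, or None."""
    match = PARAGRAPH_RE.search(text)
    if match is None:
        return None
    kind, value = match.group(1), match.group(2)
    # "Satz" is conventionally written without a trailing period.
    label = f"{kind} {value}" if kind == "Satz" else f"{kind}. {value}"
    # Letter references such as "lit. a" have no numeric part.
    number = int(value) if value.isdigit() else None
    return label, number


def parse_section_metadata(heading: str) -> Dict[str, Optional[str]]:
    """Split a heading like "§ 312k Kündigungsbutton" into section and title."""
    match = SECTION_NUMBER_RE.search(heading)
    if match is None:
        return {"section": None, "section_title": None}
    section = re.sub(r"\s+", " ", match.group(1))  # normalize inner whitespace
    title = heading[match.end():].strip() or None
    return {"section": section, "section_title": title}
```

The trailing `\b` in the paragraph pattern keeps words such as "Absatz" or "Satzung" from being read as abbreviated references followed by a letter.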

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 15:25:23 +02:00