Add services/recital_ingester.py, which parses EU act recitals (Erwägungsgründe)
from the eur-lex/CELLAR preamble via the id="rct_N" markers (the preamble's
table layout defeats a naive article parser) and tags them as a SEPARATE
interpretative source: source_class=recital, authority_weight=60,
use_for_primary=false, so they rank below binding articles and surface only as
interpretation context.
The ingester reuses the Parser 1 download and helpers. Add
scripts/ingest_recitals.py (skip-by-existing, no automatic re-ingest) plus
tests and a test fixture.
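
A minimal sketch of the rct_N extraction using only the standard library's
html.parser. It assumes each recital is one element carrying id="rct_N" whose
text begins with a "(N)" number cell, and that the markup is well formed (a
stray end tag inside a recital would throw off the depth count). Recital,
parse_recitals and the metadata dict layout are illustrative, not the actual
recital_ingester.py API.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple, Union

# Values copied from the commit message; the dict layout itself is assumed.
RECITAL_METADATA: Dict[str, Union[str, int, bool]] = {
    "source_class": "recital",
    "authority_weight": 60,
    "use_for_primary": False,
}

_RCT_ID = re.compile(r"^rct_(\d+)$")
_LEADING_NUMBER = re.compile(r"^\(\d+\)\s*")
# Void elements never get an end tag, so they must not change the depth count.
_VOID = {"br", "img", "hr", "meta", "link", "input", "col", "wbr"}


@dataclass
class Recital:
    number: int
    text: str
    metadata: Dict[str, Union[str, int, bool]] = field(
        default_factory=lambda: dict(RECITAL_METADATA)
    )


class _RecitalCollector(HTMLParser):
    """Collects the text under every element whose id is rct_N."""

    def __init__(self) -> None:
        super().__init__()
        self.current: Optional[int] = None  # recital number being read
        self.depth = 0  # open tags inside that recital element
        self.chunks: List[str] = []
        self.found: List[Tuple[int, str]] = []

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        if tag in _VOID:
            return
        if self.current is not None:
            self.depth += 1  # nested layout tag (table, tr, td, p, ...)
            return
        match = _RCT_ID.match(dict(attrs).get("id") or "")
        if match:
            self.current, self.depth, self.chunks = int(match.group(1)), 1, []

    def handle_endtag(self, tag: str) -> None:
        if tag in _VOID or self.current is None:
            return
        self.depth -= 1
        if self.depth == 0:  # the rct_N element itself just closed
            text = " ".join(" ".join(self.chunks).split())
            self.found.append((self.current, _LEADING_NUMBER.sub("", text)))
            self.current = None

    def handle_data(self, data: str) -> None:
        if self.current is not None:
            self.chunks.append(data)


def parse_recitals(html: str) -> List[Recital]:
    collector = _RecitalCollector()
    collector.feed(html)
    collector.close()
    return [Recital(number, text) for number, text in sorted(collector.found)]
```

Tracking nesting depth rather than matching a fixed table shape means the
layout cells inside each rct_N element are flattened into one text string, and
anything outside those elements (articles, annexes) is ignored.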

Tested: 4 unit tests over a synthetic rct_N fixture; ruff + mypy clean; real
CELLAR parse of DORA verified end-to-end (106 recitals, all tagged with the
interpretative metadata).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add services/legal_act_ingester.py, the EU eur-lex LegalActIngester engine:
CELLAR download (which sidesteps the HTTP 202 block the eur-lex website returns
for large acts such as DORA, with eur-lex kept as a fallback), parsing into
articles and annexes with full authority metadata and forward citation edges
(references_out), and a self-test gate before upload.
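
A sketch of the CELLAR-first download with eur-lex fallback. The fetchers are
injected callables returning (HTTP status, body); Fetcher, DownloadError and
download_act are hypothetical names, and no real CELLAR or eur-lex URLs are
assumed here.

```python
from typing import Callable, List, Tuple

# Hypothetical fetcher type: takes a CELEX number, returns (HTTP status, body).
# A real one would wrap an HTTP client; URLs are deliberately left out.
Fetcher = Callable[[str], Tuple[int, str]]


class DownloadError(RuntimeError):
    """Raised when neither CELLAR nor eur-lex yields a usable document."""


def download_act(celex: str, cellar: Fetcher, eurlex: Fetcher) -> Tuple[str, str]:
    """Try CELLAR first, then eur-lex; return (source name, document body)."""
    errors: List[str] = []
    for source, fetch in (("cellar", cellar), ("eurlex", eurlex)):
        try:
            status, body = fetch(celex)
        except OSError as exc:  # network failure (urllib's URLError is an OSError)
            errors.append(f"{source}: {exc}")
            continue
        # Only a 200 with content counts. Anything else, including the HTTP 202
        # block the eur-lex website returns for large acts, is a failure.
        if status == 200 and body.strip():
            return source, body
        errors.append(f"{source}: HTTP {status}")
    raise DownloadError(f"{celex}: " + "; ".join(errors))
```

Injecting the fetchers keeps the fallback order testable without network
access.
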
upload. Refactor scripts/ingest_eu_regulations.py to use it: parse-based,
per-unit upload with a skip-by-CELEX guard (no automatic re-ingest). Recitals
are intentionally left to a separate ingester (Parser 2).
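
A sketch of per-unit upload behind the skip-by-CELEX guard and the self-test
gate. Unit, self_test and ingest_act are hypothetical names, and the three
checks (at least one article, unique unit ids, no empty text) are assumed
examples, not the engine's actual self-test.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Unit:
    unit_id: str  # hypothetical ids such as "art_1" or "anx_I"
    text: str


def self_test(units: List[Unit]) -> List[str]:
    """Return problems found in a parsed act; an empty list means it passes."""
    problems: List[str] = []
    ids = [u.unit_id for u in units]
    if not any(i.startswith("art_") for i in ids):
        problems.append("no articles parsed")
    if len(ids) != len(set(ids)):
        problems.append("duplicate unit ids")
    problems.extend(f"{u.unit_id}: empty text" for u in units if not u.text.strip())
    return problems


def ingest_act(
    celex: str,
    units: List[Unit],
    existing: Set[str],
    upload: Callable[[str, Unit], None],
) -> int:
    """Upload each unit separately; return how many were uploaded."""
    if celex in existing:
        return 0  # skip-by-CELEX: an already-ingested act is never re-ingested
    problems = self_test(units)
    if problems:  # the gate: nothing is uploaded from a suspect parse
        raise ValueError(f"{celex} failed self-test: {problems}")
    for unit in units:
        upload(celex, unit)
    existing.add(celex)
    return len(units)
```

Running the gate over the whole act before the first upload means a bad parse
never leaves a partially ingested act behind.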

Tested: 7 tests over a synthetic eur-lex fixture covering the parser, metadata,
self-test and references_out; ruff + mypy clean; real CELLAR fetch of DORA
verified end-to-end (64 articles, full authority metadata).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>