feat(control-pipeline): RecitalIngester for EU act recitals (Parser 2)
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 38s
CI / test-bqas (push) Successful in 40s
Add services/recital_ingester.py, which parses EU act recitals (Erwägungsgründe) from the eur-lex/CELLAR preamble via the id="rct_N" markers. Recitals sit in a table layout that defeats a naive article parser.

Recitals are tagged as a SEPARATE interpretative source (source_class=recital, authority_weight=60, use_for_primary=false), so they rank below binding articles and surface only as interpretation context. Reuses the Parser-1 download + helpers.

Add scripts/ingest_recitals.py (skip-by-existing, no auto re-ingest) + tests/fixture.

Tested: 4 unit tests over a synthetic rct_N fixture; ruff + mypy clean; real CELLAR parse of DORA verified end-to-end (106 recitals, interpretative metadata).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
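For readers who have not opened the ingester, the idea in the message can be sketched roughly as follows. This is a minimal illustration, not the code in services/recital_ingester.py: the names `Recital`, `parse_recitals`, and `_RecitalCollector` are hypothetical, and it uses only the standard-library `html.parser`. It assumes each recital lives in a `<div>` whose id matches `rct_N`, with the "(N)" label and the body in separate table cells, as in the fixture in this commit. The metadata values are the ones stated in the message.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from html.parser import HTMLParser

RCT_ID = re.compile(r"^rct_(\d+)$")        # id="rct_N" marker on the recital <div>
LEADING_LABEL = re.compile(r"^\(\d+\)\s*")  # "(N)" label from the first table cell


def _interpretative_metadata() -> dict[str, object]:
    # Values taken from the commit message; the dict shape is an assumption.
    return {"source_class": "recital", "authority_weight": 60, "use_for_primary": False}


@dataclass
class Recital:  # hypothetical container, not the project's real type
    number: int
    text: str
    metadata: dict[str, object] = field(default_factory=_interpretative_metadata)


class _RecitalCollector(HTMLParser):
    """Collects text inside <div id="rct_N"> blocks and ignores everything else."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.recitals: list[Recital] = []
        self._number: int | None = None  # recital currently being read, if any
        self._depth = 0                  # nesting depth of <div> inside the recital
        self._chunks: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if self._number is not None:
            if tag == "div":
                self._depth += 1
            return
        if tag == "div":
            match = RCT_ID.match(dict(attrs).get("id") or "")
            if match:
                self._number = int(match.group(1))
                self._depth = 1
                self._chunks = []

    def handle_endtag(self, tag: str) -> None:
        if self._number is None or tag != "div":
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse whitespace across table cells, then drop the "(N)" label.
            text = " ".join(" ".join(self._chunks).split())
            text = LEADING_LABEL.sub("", text)
            self.recitals.append(Recital(self._number, text))
            self._number = None

    def handle_data(self, data: str) -> None:
        if self._number is not None:
            self._chunks.append(data)


def parse_recitals(html: str) -> list[Recital]:
    collector = _RecitalCollector()
    collector.feed(html)
    collector.close()
    return collector.recitals
```

Keying on the `rct_N` id rather than on paragraph classes is what keeps article text out: the fixture's `oj-normal` class appears on both recital cells and article paragraphs, so the class alone cannot separate them.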
@@ -0,0 +1,19 @@
<!DOCTYPE html>
<html><body>
<p class="oj-normal">DAS EUROPÄISCHE PARLAMENT — in Erwägung nachstehender Gründe:</p>
<div class="eli-subdivision" id="rct_1">
<table><tbody><tr>
<td><p class="oj-normal">(1)</p></td>
<td><p class="oj-normal">Dieser erste Erwaegungsgrund erklaert den Hintergrund der Verordnung ausfuehrlich und verweist auf Artikel 5.</p></td>
</tr></tbody></table>
</div>
<div class="eli-subdivision" id="rct_2">
<table><tbody><tr>
<td><p class="oj-normal">(2)</p></td>
<td><p class="oj-normal">Der zweite Erwaegungsgrund ergaenzt den ersten und nennt weitere Ziele der Regelung im Detail.</p></td>
</tr></tbody></table>
</div>
<p class="oj-ti-art">Artikel 1</p>
<p class="oj-sti-art">Gegenstand</p>
<p class="oj-normal">Der eigentliche Artikeltext, der KEIN Erwaegungsgrund ist und nicht als solcher geparst werden darf.</p>
</body></html>