breakpilot-core/control-pipeline/tests
Benjamin Admin 24c618ca2e
feat(control-pipeline): GuidanceIngester engine for supervisory guidance (Parser 3)
Add services/guidance_ingester.py — extracts guidance documents (pdfplumber for
PDF, an HTML stripper otherwise; pdfplumber is imported lazily so the module and
its tests load without it) and tags them as a SEPARATE interpretative source:
source_class=supervisory_guidance / authority_weight=70 /
bindingness=interpretative / use_for_primary=false, with references_out to the
binding norms they interpret (Art. N DSGVO / § N BDSG). Guidance therefore ranks
below binding law for obligation questions yet stays retrievable as
interpretation context.
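
A minimal sketch of the tagging step described above, not the actual contents of services/guidance_ingester.py. The function names (extract_references, tag_guidance) and the regex are assumptions for illustration; only the metadata keys and values come from this commit message. The pattern handles only the plain "Art. N DSGVO" / "§ N BDSG" forms and would miss citations with "Abs." or "lit." between the number and the law.

```python
import re

# Metadata values as stated in the commit message.
GUIDANCE_METADATA = {
    "source_class": "supervisory_guidance",
    "authority_weight": 70,
    "bindingness": "interpretative",
    "use_for_primary": False,
}

# Hypothetical pattern: group 1 is a DSGVO article, group 2 a BDSG section.
_REF_RE = re.compile(r"Art\.\s*(\d+[a-z]?)\s+DSGVO|§\s*(\d+[a-z]?)\s+BDSG")


def extract_references(text: str) -> list[str]:
    """Return the binding norms cited in text, de-duplicated, in order of appearance."""
    refs: list[str] = []
    for match in _REF_RE.finditer(text):
        if match.group(1) is not None:
            ref = f"Art. {match.group(1)} DSGVO"
        else:
            ref = f"§ {match.group(2)} BDSG"
        if ref not in refs:
            refs.append(ref)
    return refs


def tag_guidance(text: str) -> dict:
    """Attach the interpretative-source metadata and references_out to a chunk."""
    return {**GUIDANCE_METADATA, "text": text, "references_out": extract_references(text)}
```

Keeping the metadata in one constant means every guidance chunk carries the same class and weight, so the re-ranker never sees a guidance chunk without use_for_primary=false.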

supervisory_guidance is reused deliberately: the live re-ranker already weights
it 70 and 8k+ chunks use it (no classifier change, no schema drift). EDPB is the
first target; technical standards (weight 80) are a later separate class.
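
To illustrate how a weight of 70 keeps guidance below binding law without hiding it, here is a sketch under stated assumptions. How the live re-ranker actually combines weight and similarity is not described in this commit; the multiplicative score, the class names binding_law and technical_standard, and the weight 100 for binding law are all hypothetical. Only the weights 70 and 80 come from the text above.

```python
from dataclasses import dataclass

# 70 and 80 are from the commit message; 100 for binding law is an assumption.
AUTHORITY_WEIGHTS = {
    "binding_law": 100,
    "technical_standard": 80,
    "supervisory_guidance": 70,
}


@dataclass
class Chunk:
    chunk_id: str
    source_class: str
    similarity: float  # retrieval similarity in [0, 1]


def score(chunk: Chunk) -> float:
    """Assumed scoring rule: similarity scaled by authority weight."""
    return chunk.similarity * AUTHORITY_WEIGHTS[chunk.source_class] / 100


def rerank(chunks: list[Chunk]) -> list[Chunk]:
    """Sort chunks by descending weighted score."""
    return sorted(chunks, key=score, reverse=True)
```

Under this rule, a guidance chunk loses to binding law at equal similarity but can still surface when it is clearly the better match.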

Tested: 6 unit tests on the text + metadata path (PDF extraction is exercised in
the container), ruff + mypy clean.
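
The lazy pdfplumber import is what lets the unit tests cover the text path without the PDF dependency installed. A sketch of that split follows; the names strip_html and extract_pdf_text are hypothetical and the real module may be structured differently. pdfplumber.open, pdf.pages and page.extract_text are the library's documented calls.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the visible text of an HTML document with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.parts).split())


def extract_pdf_text(path: str) -> str:
    """Extract text from a PDF; pdfplumber is imported only when this is called."""
    import pdfplumber  # lazy: importing this module does not require pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```

With this layout, tests that only call strip_html never trigger the pdfplumber import, which matches PDF extraction being exercised separately in the container.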

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-24 09:41:14 +02:00