feat(ai-sdk): legal-corpus coverage + Phase-2 citation-graph assessment #33

Merged
Benjamin_Boenisch merged 3 commits from feat/coverage-legal-corpus-structure into main 2026-06-24 06:37:23 +00:00
Owner

Summary

  • Coverage: GET /sdk/v1/rag/legal-corpus + a structure section on /sdk/coverage — per-act distinct articles/annexes/recitals + chunk count, so the ingested corpus is no longer a black box.
  • Phase 2 assessment: additive assessment object on /sdk/v1/rag/search: primary_norm, connected_norms (citation graph references_out/in), cross_regime, human_review_flag, NORM-level winner_margin, score_reasoning. Per-result schema frozen (graph fields internal).
  • Opt-in graph expansion (RAG_GRAPH_EXPANSION=true, default off): top hits pull referenced norms into the pool via the precise edge. Measured to add no rank gain over binding augmentation → ships as an off-by-default recall safety net. The graph EXPLAINS retrieval, it does not expand it by default.

Test plan

  • [x] go build + go test (ucca + handlers) green
  • [x] gofmt + vet clean, LOC gate exit 0
  • [x] Live-validated on dev (MaschinenVO Anhang III + connected norms; DORA cross-regime)
  • [ ] go-lint green on CI
  • [ ] dev smoke after merge: assessment present in search response
Benjamin_Boenisch added 2 commits 2026-06-23 17:49:42 +00:00
Expose GET /sdk/v1/rag/legal-corpus, which scrolls the eur-lex legal
corpus (filtered to a few hundred points regardless of total size) and
aggregates each ingested act's composition: distinct articles, annexes,
recitals and chunk count. Surface it as a new section on /sdk/coverage so
the ingested corpus is no longer a black box — a developer SEES what each
act actually contains, not only its name.
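
The per-act aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: the `Chunk` and `ActStructure` types, their field names, and the `AggregateStructure` function are hypothetical, and the chunks are passed in as a slice rather than scrolled from the vector store.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is a hypothetical view of one stored point's payload.
type Chunk struct {
	Act  string // e.g. "CRA"
	Kind string // assumed values: "article", "annex", "recital"
	Ref  string // e.g. "13" or "I"
}

// ActStructure summarises what one ingested act contains.
type ActStructure struct {
	Act      string
	Articles int
	Annexes  int
	Recitals int
	Chunks   int
}

// AggregateStructure counts distinct articles, annexes and recitals
// per act, plus the total number of chunks for that act.
func AggregateStructure(chunks []Chunk) []ActStructure {
	type acc struct {
		seen   map[string]map[string]struct{} // kind -> set of refs
		chunks int
	}
	byAct := map[string]*acc{}
	for _, c := range chunks {
		a, ok := byAct[c.Act]
		if !ok {
			a = &acc{seen: map[string]map[string]struct{}{}}
			byAct[c.Act] = a
		}
		a.chunks++
		if a.seen[c.Kind] == nil {
			a.seen[c.Kind] = map[string]struct{}{}
		}
		a.seen[c.Kind][c.Ref] = struct{}{}
	}
	out := make([]ActStructure, 0, len(byAct))
	for act, a := range byAct {
		out = append(out, ActStructure{
			Act:      act,
			Articles: len(a.seen["article"]),
			Annexes:  len(a.seen["annex"]),
			Recitals: len(a.seen["recital"]),
			Chunks:   a.chunks,
		})
	}
	// Sort by act name so the response is stable across calls.
	sort.Slice(out, func(i, j int) bool { return out[i].Act < out[j].Act })
	return out
}

func main() {
	chunks := []Chunk{
		{"CRA", "article", "13"},
		{"CRA", "article", "13"}, // second chunk of the same article
		{"CRA", "annex", "I"},
		{"DORA", "recital", "1"},
	}
	for _, a := range AggregateStructure(chunks) {
		fmt.Printf("%+v\n", a)
	}
}
```

Two chunks of the same article count once under `Articles` but twice under `Chunks`, which is the distinction the coverage section is meant to expose.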

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(ai-sdk): citation-graph assessment + opt-in graph expansion (Phase 2)
CI / detect-changes (pull_request) Successful in 14s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Successful in 16s
CI / secret-scan (pull_request) Successful in 18s
CI / dep-audit (pull_request) Failing after 1m2s
CI / sbom-scan (pull_request) Failing after 1m10s
CI / build-sha-integrity (pull_request) Successful in 13s
CI / validate-canonical-controls (pull_request) Successful in 14s
CI / loc-budget (pull_request) Successful in 23s
CI / go-lint (pull_request) Successful in 50s
CI / python-lint (pull_request) Failing after 18s
CI / nodejs-lint (pull_request) Failing after 1m8s
CI / nodejs-build (pull_request) Successful in 3m7s
CI / test-go (pull_request) Successful in 1m6s
CI / iace-gt-coverage (pull_request) Successful in 26s
CI / test-python-backend (pull_request) Successful in 33s
CI / test-python-document-crawler (pull_request) Successful in 21s
CI / test-python-dsms-gateway (pull_request) Successful in 21s
989d9f6f91
Add an `assessment` object to the legal RAG search response: primary norm,
connected norms (from the citation graph references_out/in of the primary),
cross_regime, human_review_flag, a norm-level winner_margin and a short
reasoning string. The margin is computed over DISTINCT norms, so a long
article split into several chunks no longer fabricates uncertainty. The
per-result schema stays frozen — graph fields are internal (json:"-").
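
A rough sketch of the norm-level margin and the frozen per-result schema, under stated assumptions: the `Result` type, its field names, and `NormWinnerMargin` are hypothetical, scores are assumed non-negative, and with a single norm the margin simply equals that norm's score. Only the `json:"-"` tag comes from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is a hypothetical search hit. The graph fields carry json:"-",
// so they never appear in the serialized per-result schema.
type Result struct {
	Text          string   `json:"text"`
	Score         float64  `json:"score"`
	NormID        string   `json:"-"`
	ReferencesOut []string `json:"-"`
}

// NormWinnerMargin keeps the best score per DISTINCT norm and returns
// the top norm plus the gap to the runner-up norm. Several chunks of
// one article therefore cannot shrink the margin.
func NormWinnerMargin(results []Result) (primary string, margin float64) {
	best := map[string]float64{}
	for _, r := range results {
		if s, ok := best[r.NormID]; !ok || r.Score > s {
			best[r.NormID] = r.Score
		}
	}
	var top, second float64
	first := true
	for id, s := range best {
		switch {
		case first || s > top:
			second, top, primary = top, s, id
			first = false
		case s > second:
			second = s
		}
	}
	return primary, top - second
}

func main() {
	results := []Result{
		{Text: "chunk 1", Score: 0.75, NormID: "CRA Anhang I"},
		{Text: "chunk 2", Score: 0.74, NormID: "CRA Anhang I"}, // same norm
		{Text: "chunk 3", Score: 0.5, NormID: "CRA Art. 13"},
	}
	primary, margin := NormWinnerMargin(results)
	fmt.Println(primary, margin)

	// NormID and ReferencesOut are absent from the JSON output.
	b, _ := json.Marshal(results[0])
	fmt.Println(string(b))
}
```

A chunk-level margin on this input would be 0.01 (0.75 against 0.74), suggesting a near tie that does not exist; grouped by norm, the gap is 0.25.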

Also wire optional citation-graph expansion (RAG_GRAPH_EXPANSION=true,
default off): top hits pull their referenced norms into the candidate pool
via the precise edge (e.g. Art. 13 CRA -> Anhang I). Measured to add no
rank gain over the existing binding-law augmentation, with +1 Qdrant call
per search and reverse-edge fan-out risk, so it ships off-by-default as a
recall safety net. The graph EXPLAINS retrieval (assessment), it does not
expand it by default.
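
The opt-in expansion could look something like the sketch below. Only the `RAG_GRAPH_EXPANSION` variable and its off-by-default behaviour come from the commit message; `graphExpansionEnabled`, `ExpandPool`, and the in-memory `refsOut` map (standing in for the citation-graph lookup) are hypothetical. The sketch follows outgoing references only, as one way to sidestep the reverse-edge fan-out mentioned above.

```go
package main

import (
	"fmt"
	"os"
)

// graphExpansionEnabled treats anything other than the exact value
// "true" as off, so the feature stays disabled by default.
func graphExpansionEnabled() bool {
	return os.Getenv("RAG_GRAPH_EXPANSION") == "true"
}

// ExpandPool appends norms referenced by the first topK hits to the
// candidate pool, skipping norms that are already present.
func ExpandPool(pool []string, refsOut map[string][]string, topK int) []string {
	if topK < 0 {
		topK = 0
	}
	if topK > len(pool) {
		topK = len(pool)
	}
	seen := make(map[string]bool, len(pool))
	for _, id := range pool {
		seen[id] = true
	}
	out := append([]string(nil), pool...) // copy; the input stays untouched
	for _, id := range pool[:topK] {
		for _, ref := range refsOut[id] {
			if !seen[ref] {
				seen[ref] = true
				out = append(out, ref)
			}
		}
	}
	return out
}

func main() {
	pool := []string{"CRA Art. 13", "CRA Art. 14"}
	refsOut := map[string][]string{
		"CRA Art. 13": {"CRA Anhang I", "CRA Art. 14"},
	}
	if graphExpansionEnabled() {
		pool = ExpandPool(pool, refsOut, 1)
	}
	fmt.Println(pool)
}
```

With the variable unset, the pool passes through unchanged; with it set to `true`, `CRA Anhang I` is appended after the original hits.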

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin_Boenisch added 1 commit 2026-06-23 22:22:51 +00:00
feat(ai-sdk): demote superseded pre-eu-v1 sources in authority rerank
CI / detect-changes (pull_request) Successful in 18s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Successful in 14s
CI / secret-scan (pull_request) Successful in 16s
CI / dep-audit (pull_request) Failing after 1m3s
CI / sbom-scan (pull_request) Failing after 1m8s
CI / build-sha-integrity (pull_request) Successful in 16s
CI / validate-canonical-controls (pull_request) Successful in 14s
CI / loc-budget (pull_request) Successful in 24s
CI / go-lint (pull_request) Successful in 57s
CI / python-lint (pull_request) Failing after 20s
CI / nodejs-lint (pull_request) Failing after 1m13s
CI / nodejs-build (pull_request) Successful in 3m9s
CI / test-go (pull_request) Successful in 1m3s
CI / iace-gt-coverage (pull_request) Successful in 26s
CI / test-python-backend (pull_request) Successful in 36s
CI / test-python-document-crawler (pull_request) Successful in 20s
CI / test-python-dsms-gateway (pull_request) Successful in 18s
c28c532958
The old pre-eu-v1 corpus chunks (un-annotated CRA/AI Act/DORA/NIS2/DSGVO
duplicates + the old Machinery Directive and its guide) are tagged
status=superseded / use_for_primary=false in the vector store. Honor that
in the rerank: a superseded result takes a fixed penalty so the eu-v1 norm
wins default questions, while the old source stays in the pool (demoted,
not hidden) and remains findable for history / transition questions.

Verified on dev: "CRA Sicherheitsupdates" now returns CRA Anhang I (eu-v1)
at #1 instead of an un-annotated old chunk; MaschinenVO outranks the old
Machinery Directive/guide; superseded chunks remain retrievable lower down.
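
The demote-not-hide behaviour can be illustrated with a small sketch. The `Hit` type, `RerankWithDemotion`, and the penalty value of 0.2 are assumptions for illustration; the commit message only states that a superseded result takes a fixed penalty and stays in the pool.

```go
package main

import (
	"fmt"
	"sort"
)

// supersededPenalty is a hypothetical value; the real constant is not
// given in the commit message.
const supersededPenalty = 0.2

// Hit is a hypothetical reranker input.
type Hit struct {
	Source string
	Score  float64
	Status string // "superseded" for pre-eu-v1 chunks, otherwise empty
}

// RerankWithDemotion subtracts a fixed penalty from superseded hits and
// re-sorts by score. Nothing is dropped, so old sources remain findable.
func RerankWithDemotion(hits []Hit) []Hit {
	out := make([]Hit, len(hits))
	copy(out, hits)
	for i := range out {
		if out[i].Status == "superseded" {
			out[i].Score -= supersededPenalty
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	hits := []Hit{
		{Source: "CRA (pre-eu-v1, un-annotated)", Score: 0.80, Status: "superseded"},
		{Source: "CRA Anhang I (eu-v1)", Score: 0.70},
	}
	for _, h := range RerankWithDemotion(hits) {
		fmt.Println(h.Source)
	}
}
```

A fixed penalty, rather than a filter, is what keeps history and transition questions answerable: a superseded chunk with a sufficiently high raw score can still surface.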

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin_Boenisch merged commit 230dc05287 into main 2026-06-24 06:37:23 +00:00