feat(ai-sdk): legal-corpus coverage + Phase-2 citation-graph assessment #33

2026-06-23T17:49:36Z

Benjamin_Boenisch commented

## Summary

- **Coverage**: `GET /sdk/v1/rag/legal-corpus` plus a structure section on `/sdk/coverage`, showing per-act distinct articles/annexes/recitals and chunk count, so the ingested corpus is no longer a black box.
- **Phase 2 assessment**: additive `assessment` object on `/sdk/v1/rag/search` with `primary_norm`, `connected_norms` (citation graph references_out/in), `cross_regime`, `human_review_flag`, NORM-level `winner_margin`, and `score_reasoning`. Per-result schema frozen (graph fields internal).
- **Opt-in graph expansion** (`RAG_GRAPH_EXPANSION=true`, default off): top hits pull referenced norms into the pool via the precise edge. Measured to add no rank gain over binding augmentation → ships as an off-by-default recall safety net. The graph EXPLAINS retrieval, it does not expand it by default.

## Test plan

- [x] go build + go test (ucca + handlers) green
- [x] gofmt + vet clean, LOC gate exit 0
- [x] Live-validated on dev (MaschinenVO Anhang III + connected norms; DORA cross-regime)
- [ ] go-lint green on CI
- [ ] dev smoke after merge: `assessment` present in search response

Benjamin_Boenisch added 2 commits 2026-06-23 17:49:42 +00:00

feat(ai-sdk): legal-corpus structure endpoint + coverage page 4c99773fa1

Expose GET /sdk/v1/rag/legal-corpus, which scrolls the eur-lex legal
corpus (filtered to a few hundred points regardless of total size) and
aggregates each ingested act's composition: distinct articles, annexes,
recitals and chunk count. Surface it as a new section on /sdk/coverage so
the ingested corpus is no longer a black box — a developer SEES what each
act actually contains, not only its name.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
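
The per-act aggregation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: the `Chunk` and `ActStructure` types, their field names, and the `AggregateCorpus` function are hypothetical, and the real endpoint reads its points from the vector store rather than from an in-memory slice.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is a hypothetical view of one ingested corpus point; the field
// names are assumptions, not the SDK's actual payload schema.
type Chunk struct {
	Act  string // e.g. "CRA"
	Kind string // "article", "annex" or "recital"
	Ref  string // e.g. "Art. 13", "Anhang I"
}

// ActStructure summarises what one act contains.
type ActStructure struct {
	Act      string
	Articles int
	Annexes  int
	Recitals int
	Chunks   int
}

// AggregateCorpus counts distinct articles, annexes and recitals per act,
// plus the total number of chunks, and returns the acts sorted by name.
func AggregateCorpus(chunks []Chunk) []ActStructure {
	type key struct{ act, kind, ref string }
	seen := map[key]bool{}
	byAct := map[string]*ActStructure{}
	for _, c := range chunks {
		s, ok := byAct[c.Act]
		if !ok {
			s = &ActStructure{Act: c.Act}
			byAct[c.Act] = s
		}
		s.Chunks++ // every chunk counts, even if its norm was seen before
		k := key{c.Act, c.Kind, c.Ref}
		if seen[k] {
			continue // same norm split into several chunks: count it once
		}
		seen[k] = true
		switch c.Kind {
		case "article":
			s.Articles++
		case "annex":
			s.Annexes++
		case "recital":
			s.Recitals++
		}
	}
	out := make([]ActStructure, 0, len(byAct))
	for _, s := range byAct {
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Act < out[j].Act })
	return out
}

func main() {
	chunks := []Chunk{
		{"CRA", "article", "Art. 13"},
		{"CRA", "article", "Art. 13"}, // second chunk of the same article
		{"CRA", "annex", "Anhang I"},
		{"DORA", "recital", "Recital 1"},
	}
	for _, s := range AggregateCorpus(chunks) {
		fmt.Printf("%+v\n", s)
	}
}
```

Counting chunks separately from distinct norms is what lets the coverage page show both how much text was ingested and how many structural units it covers.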

feat(ai-sdk): citation-graph assessment + opt-in graph expansion (Phase 2)

CI / detect-changes (pull_request) Successful in 14s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Successful in 16s
CI / secret-scan (pull_request) Successful in 18s
CI / dep-audit (pull_request) Failing after 1m2s
CI / sbom-scan (pull_request) Failing after 1m10s
CI / build-sha-integrity (pull_request) Successful in 13s
CI / validate-canonical-controls (pull_request) Successful in 14s
CI / loc-budget (pull_request) Successful in 23s
CI / go-lint (pull_request) Successful in 50s
CI / python-lint (pull_request) Failing after 18s
CI / nodejs-lint (pull_request) Failing after 1m8s
CI / nodejs-build (pull_request) Successful in 3m7s
CI / test-go (pull_request) Successful in 1m6s
CI / iace-gt-coverage (pull_request) Successful in 26s
CI / test-python-backend (pull_request) Successful in 33s
CI / test-python-document-crawler (pull_request) Successful in 21s
CI / test-python-dsms-gateway (pull_request) Successful in 21s

989d9f6f91

Add an `assessment` object to the legal RAG search response: primary norm,
connected norms (from the citation graph references_out/in of the primary),
cross_regime, human_review_flag, a norm-level winner_margin and a short
reasoning string. The margin is computed over DISTINCT norms, so a long
article split into several chunks no longer fabricates uncertainty. The
per-result schema stays frozen — graph fields are internal (json:"-").
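
A minimal sketch of the norm-level margin idea, under stated assumptions: the `Result` type, its `NormID` field, and the `WinnerMargin` function are hypothetical names, and the real scoring may differ. It keeps only the best score per distinct norm before comparing the top two, and uses a `json:"-"` tag to show how an internal field stays out of the serialized result.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// Result is a hypothetical per-chunk search hit. NormID is tagged json:"-"
// so it is never serialized into the per-result response.
type Result struct {
	Text   string  `json:"text"`
	Score  float64 `json:"score"`
	NormID string  `json:"-"`
}

// WinnerMargin returns the score gap between the best and second-best
// DISTINCT norms. ok is false when fewer than two norms are present.
func WinnerMargin(results []Result) (margin float64, ok bool) {
	best := map[string]float64{}
	for _, r := range results {
		if s, found := best[r.NormID]; !found || r.Score > s {
			best[r.NormID] = r.Score // keep the strongest chunk per norm
		}
	}
	if len(best) < 2 {
		return 0, false
	}
	first, second := math.Inf(-1), math.Inf(-1)
	for _, s := range best {
		switch {
		case s > first:
			first, second = s, first
		case s > second:
			second = s
		}
	}
	return first - second, true
}

func main() {
	results := []Result{
		{Text: "Art. 13 (part 1)", Score: 0.875, NormID: "CRA Art. 13"},
		{Text: "Art. 13 (part 2)", Score: 0.8125, NormID: "CRA Art. 13"},
		{Text: "Anhang I", Score: 0.5, NormID: "CRA Anhang I"},
	}
	if m, ok := WinnerMargin(results); ok {
		fmt.Printf("norm-level margin: %.4f\n", m)
	}
	b, _ := json.Marshal(results[0])
	fmt.Println(string(b)) // NormID does not appear in the JSON
}
```

In this example a chunk-level comparison would see the two Art. 13 chunks only 0.0625 apart and report a near tie, while the norm-level margin compares Art. 13 against Anhang I.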

Also wire optional citation-graph expansion (RAG_GRAPH_EXPANSION=true,
default off): top hits pull their referenced norms into the candidate pool
via the precise edge (e.g. Art. 13 CRA -> Anhang I). Measured to add no
rank gain over the existing binding-law augmentation, with +1 Qdrant call
per search and reverse-edge fan-out risk, so it ships off-by-default as a
recall safety net. The graph EXPLAINS retrieval (assessment), it does not
expand it by default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
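
The opt-in expansion can be pictured with the sketch below. It is an illustration only: `graphExpansionEnabled`, `ExpandPool`, and the `refsOut` map are hypothetical stand-ins for the real flag handling and citation-graph lookup, the flag is assumed to be enabled only by the literal value `true`, and the sketch follows outgoing references only.

```go
package main

import (
	"fmt"
	"os"
)

// graphExpansionEnabled reads the opt-in flag. Assumption: only the exact
// value "true" enables it; unset or anything else keeps the default (off).
func graphExpansionEnabled() bool {
	return os.Getenv("RAG_GRAPH_EXPANSION") == "true"
}

// ExpandPool appends the norms referenced by the first topN hits to the
// candidate pool, skipping norms that are already present. refsOut is a
// hypothetical in-memory stand-in for the graph's outgoing edges.
func ExpandPool(pool []string, refsOut map[string][]string, topN int) []string {
	if topN < 0 {
		topN = 0
	}
	if topN > len(pool) {
		topN = len(pool)
	}
	seen := make(map[string]bool, len(pool))
	for _, id := range pool {
		seen[id] = true
	}
	out := append([]string(nil), pool...) // copy so the input stays untouched
	for _, id := range pool[:topN] {
		for _, ref := range refsOut[id] {
			if !seen[ref] {
				seen[ref] = true
				out = append(out, ref)
			}
		}
	}
	return out
}

func main() {
	pool := []string{"CRA Art. 13", "CRA Art. 14"}
	refsOut := map[string][]string{"CRA Art. 13": {"CRA Anhang I"}}
	if graphExpansionEnabled() {
		pool = ExpandPool(pool, refsOut, 1)
	}
	fmt.Println(pool)
}
```

Appending referenced norms after the existing hits leaves the original ranking untouched, which fits the role of a recall safety net rather than a ranking change.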

Benjamin_Boenisch added 1 commit 2026-06-23 22:22:51 +00:00

feat(ai-sdk): demote superseded pre-eu-v1 sources in authority rerank

CI / detect-changes (pull_request) Successful in 18s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Successful in 14s
CI / secret-scan (pull_request) Successful in 16s
CI / dep-audit (pull_request) Failing after 1m3s
CI / sbom-scan (pull_request) Failing after 1m8s
CI / build-sha-integrity (pull_request) Successful in 16s
CI / validate-canonical-controls (pull_request) Successful in 14s
CI / loc-budget (pull_request) Successful in 24s
CI / go-lint (pull_request) Successful in 57s
CI / python-lint (pull_request) Failing after 20s
CI / nodejs-lint (pull_request) Failing after 1m13s
CI / nodejs-build (pull_request) Successful in 3m9s
CI / test-go (pull_request) Successful in 1m3s
CI / iace-gt-coverage (pull_request) Successful in 26s
CI / test-python-backend (pull_request) Successful in 36s
CI / test-python-document-crawler (pull_request) Successful in 20s
CI / test-python-dsms-gateway (pull_request) Successful in 18s

c28c532958

The old pre-eu-v1 corpus chunks (un-annotated CRA/AI Act/DORA/NIS2/DSGVO
duplicates + the old Machinery Directive and its guide) are tagged
status=superseded / use_for_primary=false in the vector store. Honor that
in the rerank: a superseded result takes a fixed penalty so the eu-v1 norm
wins default questions, while the old source stays in the pool (demoted,
not hidden) and remains findable for history / transition questions.

Verified on dev: "CRA Sicherheitsupdates" now returns CRA Anhang I (eu-v1)
at #1 instead of an un-annotated old chunk; MaschinenVO outranks the old
Machinery Directive/guide; superseded chunks remain retrievable lower down.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
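
The "demoted, not hidden" behaviour can be sketched as a fixed score penalty followed by a re-sort. The `Hit` type, the `Rerank` function, and the penalty value `0.25` are assumptions made for illustration; the commit states only that the penalty is fixed, not its size.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search result. Superseded mirrors the payload tags
// status=superseded / use_for_primary=false described in the commit.
type Hit struct {
	Source     string
	Score      float64
	Superseded bool
}

// supersededPenalty is an illustrative value, not the one used in the SDK.
const supersededPenalty = 0.25

// Rerank subtracts a fixed penalty from superseded hits and sorts by the
// adjusted score. No hit is dropped, so old sources stay retrievable.
func Rerank(hits []Hit) []Hit {
	out := make([]Hit, len(hits))
	copy(out, hits)
	for i := range out {
		if out[i].Superseded {
			out[i].Score -= supersededPenalty
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	hits := []Hit{
		{Source: "CRA (pre-eu-v1, un-annotated)", Score: 0.75, Superseded: true},
		{Source: "CRA Anhang I (eu-v1)", Score: 0.625},
	}
	for i, h := range Rerank(hits) {
		fmt.Printf("#%d %s (%.3f)\n", i+1, h.Source, h.Score)
	}
}
```

A penalty, unlike a filter, keeps the old chunk in the result list, so a query that matches it strongly enough can still surface it for history or transition questions.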

Benjamin_Boenisch merged commit 230dc05287 into main

2026-06-24 06:37:23 +00:00

Reference: Benjamin_Boenisch/breakpilot-compliance#33