feat(ai-sdk): demote superseded pre-eu-v1 sources in authority rerank

The old pre-eu-v1 corpus chunks (the un-annotated CRA/AI Act/DORA/NIS2/DSGVO
duplicates plus the old Machinery Directive and its guide) are tagged
status=superseded / use_for_primary=false in the vector store. Honor that tag
in the rerank: a superseded result takes a fixed score penalty, so the eu-v1
norm wins on default questions, while the old source stays in the pool
(demoted, not hidden) and remains findable for history / transition questions.
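
The demotion described above can be sketched in isolation. This is a minimal illustration, not the code in this commit: the `hit` type and `rerank` function are hypothetical stand-ins for LegalSearchResult and the real rerank, the scores are invented, and only the 0.50 penalty value is taken from the diff.

```go
package main

import (
	"fmt"
	"sort"
)

// supersededPenalty mirrors the constant added in this commit.
const supersededPenalty = 0.50

// hit is a hypothetical, simplified stand-in for LegalSearchResult.
type hit struct {
	ID         string
	Score      float64 // similarity score from the vector store
	Superseded bool    // status=superseded / use_for_primary=false
}

// rerank applies the fixed penalty and sorts by adjusted score, descending.
// No hit is dropped: superseded sources are demoted, not hidden.
func rerank(hits []hit) []hit {
	out := make([]hit, len(hits))
	copy(out, hits)
	for i := range out {
		if out[i].Superseded {
			out[i].Score -= supersededPenalty
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	// Illustrative scores only; the old chunk starts with the higher similarity.
	hits := []hit{
		{ID: "old-cra-chunk", Score: 0.91, Superseded: true},
		{ID: "cra-annex-i-eu-v1", Score: 0.84},
	}
	for _, h := range rerank(hits) {
		fmt.Printf("%s %.2f\n", h.ID, h.Score)
	}
}
```

Because the penalty is a fixed subtraction rather than a filter, a query that only matches superseded material would still get results under this sketch, just with lower scores.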

Verified on dev: the query "CRA Sicherheitsupdates" now returns CRA Anhang I
(eu-v1) at #1 instead of an un-annotated old chunk; MaschinenVO outranks the old
Machinery Directive and its guide; superseded chunks remain retrievable further down.
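
A spot check like the one above could be turned into a repeatable assertion. The sketch below is an assumption about how that might look, not part of the project's golden harness: `rankedHit` and `checkDemotion` are hypothetical names, and the second condition only makes sense for queries where the corpus is known to contain a superseded chunk.

```go
package main

import (
	"errors"
	"fmt"
)

// rankedHit is a hypothetical, minimal view of one reranked search result.
type rankedHit struct {
	Source     string
	Superseded bool
}

// checkDemotion reports an error if a superseded source ranks first, or if
// no superseded source is retrievable at all (which would mean it was hidden).
func checkDemotion(ranked []rankedHit) error {
	if len(ranked) == 0 {
		return errors.New("no results")
	}
	if ranked[0].Superseded {
		return fmt.Errorf("superseded source %q ranks first", ranked[0].Source)
	}
	for _, h := range ranked[1:] {
		if h.Superseded {
			return nil // demoted but still in the pool
		}
	}
	return errors.New("no superseded source in the result list")
}

func main() {
	ranked := []rankedHit{
		{Source: "CRA Anhang I (eu-v1)"},
		{Source: "CRA (un-annotated old chunk)", Superseded: true},
	}
	fmt.Println(checkDemotion(ranked)) // prints <nil>
}
```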

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin Admin
2026-06-24 00:22:37 +02:00
parent 989d9f6f91
commit c28c532958
4 changed files with 50 additions and 8 deletions
@@ -4,14 +4,15 @@ import "sort"
// Re-ranking coefficients (validated in the offline golden harness; Phase A — conservative).
const (
-	authorityCoef    = 0.40 // * weight/100
-	jurisdictionGain = 0.05 // binding/guidance from DE or EU
-	foreignPenalty   = 0.60 // foreign law on a DE/EU question (demoted, not removed)
-	unknownPenalty   = 0.08
-	domainMatchGain  = 0.15
-	offDomainPenalty = 0.10 // off-domain binding (demoted, not removed)
-	scopePenalty     = 0.25 // BDSG Teil 3 (law enforcement) on a general DP question
-	topicGain        = 0.18 // amplifier only
+	authorityCoef     = 0.40 // * weight/100
+	jurisdictionGain  = 0.05 // binding/guidance from DE or EU
+	foreignPenalty    = 0.60 // foreign law on a DE/EU question (demoted, not removed)
+	unknownPenalty    = 0.08
+	domainMatchGain   = 0.15
+	offDomainPenalty  = 0.10 // off-domain binding (demoted, not removed)
+	scopePenalty      = 0.25 // BDSG Teil 3 (law enforcement) on a general DP question
+	topicGain         = 0.18 // amplifier only
+	supersededPenalty = 0.50 // superseded legacy source (pre-eu-v1): demoted, not hidden
)
// authorityScore computes the normative relevance of a result for a query. It augments the
@@ -20,6 +21,12 @@ func authorityScore(query string, r LegalSearchResult, qDomain string, qForeign
info := classifyAuthority(r)
score := r.Score + authorityCoef*float64(info.weight)/100.0
+	if r.Superseded {
+		// Legacy source (pre-eu-v1): default questions should see the eu-v1 norm. Demoted,
+		// not removed; it stays findable for history/transition questions.
+		score -= supersededPenalty
+	}
if info.jurisdiction == "CH" && !qForeign {
score -= foreignPenalty // foreign law on a DE/EU question: demoted, not deleted
} else {