fix(ucca): Cross-Reg 0070 — beide Regelwerk-Domaenen im Router-Top-K (Known Defects 0)
CI / detect-changes (pull_request) Successful in 13s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Successful in 9s
CI / secret-scan (pull_request) Successful in 10s
CI / dep-audit (pull_request) Failing after 56s
CI / sbom-scan (pull_request) Failing after 59s
CI / build-sha-integrity (pull_request) Successful in 5s
CI / validate-canonical-controls (pull_request) Successful in 3s
CI / test-python-document-crawler (pull_request) Successful in 15s
CI / test-python-dsms-gateway (pull_request) Successful in 13s
CI / loc-budget (pull_request) Successful in 23s
CI / go-lint (pull_request) Failing after 51s
CI / python-lint (pull_request) Failing after 18s
CI / nodejs-lint (pull_request) Failing after 1m8s
CI / nodejs-build (pull_request) Successful in 3m6s
CI / test-go (pull_request) Successful in 1m3s
CI / iace-gt-coverage (pull_request) Successful in 18s
CI / test-python-backend (pull_request) Successful in 28s

Der einzige offene Retrieval-Haertefall: eine Query mit >=2 genannten Regelwerken
("CRA und Maschinenverordnung") lieferte nur die keyword-dominante Domaene (CRA),
MaschVO fiel raus. Drei zusammenwirkende Ursachen, alle behoben:

1. CodeValues-Mismatch: MaschVO heisst je Collection anders (Slice MASCHVO ·
   gesetze MVO · ce MACHINERY/MASCHINENVO), der Catalog hatte nur ["MASCHVO","MaschVO"]
   → Filter fand MaschVO nur in der Slice. Jetzt alle Varianten als CodeValues.
2. Per-Collection-Truncation: der Router gab perColl=3 → searchMultiRegulation holte
   3+3=6, schnitt auf 3 → konnte eine Domaene je Collection verlieren. Multi-Reg-Queries
   bekommen jetzt perColl = 3*len(regs).
3. Router-Score-Merge starvte die nicht-dominante Domaene. Neue balanceByRegulation()
   gruppiert den gemergten Pool per Regelwerk (exakter regulation_code-Match) und nimmt
   round-robin ueber die genannten Domaenen → jede Domaene mit Treffern ist im Top-K.
   Generisch ueber jede genannte Menge; Single-Domain-Pfad unveraendert.

Validierung: Go-Unit (balanceByRegulation: dominante CRA verdraengt MaschVO NICHT mehr);
0070-e2e gegen dev (Retrieve() → [CRA MVO CRA MVO CRA MVO CRA MASCHINENVO] = beide
Domaenen, vorher nur CRA); CB-100-Stichprobe REGR 0 (Gain-Profil unveraendert).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-06-30 15:08:18 +02:00
parent 08086ee75f
commit f2d445b891
4 changed files with 138 additions and 3 deletions
@@ -55,6 +55,15 @@ func (c *LegalRAGClient) Retrieve(ctx context.Context, query string, topK int) (
collections = append(collections, c.kbSliceCollection)
}
// Cross-regulation queries (>=2 explicitly named regulations) get a larger per-collection budget
// so each collection's multi-regulation search isn't truncated down to the keyword-dominant
// domain; the final per-regulation balancing then guarantees every named domain in the top-K.
regs := detectRegulations(query)
perColl := routerPerCollectionTopK
if len(regs) >= 2 {
perColl = routerPerCollectionTopK * len(regs)
}
// Warm the full-text indexes sequentially first so the concurrent fan-out below only READS the
// shared textIndexEnsured map (the writes happen here, serialized) — closes the cold-start map
// race deterministically. Best-effort: a missing collection just stays un-indexed (hybrid then
@@ -71,19 +80,25 @@ func (c *LegalRAGClient) Retrieve(ctx context.Context, query string, topK int) (
wg.Add(1)
go func(i int, coll string) {
defer wg.Done()
if res, err := c.searchInternal(ctx, coll, query, nil, routerPerCollectionTopK); err == nil {
if res, err := c.searchInternal(ctx, coll, query, nil, perColl); err == nil {
out[i] = res
}
}(i, coll)
}
wg.Wait()
merged := make([]LegalSearchResult, 0, len(collections)*routerPerCollectionTopK)
merged := make([]LegalSearchResult, 0, len(collections)*perColl)
for _, r := range out {
merged = append(merged, r...)
}
merged = dedupResults(merged)
sort.SliceStable(merged, func(a, b int) bool { return merged[a].Score > merged[b].Score })
// Cross-regulation: guarantee every named domain is represented (0070-class fix) instead of
// letting a global score-sort starve the non-dominant domain.
if len(regs) >= 2 {
return balanceByRegulation(merged, regs, topK), nil
}
if len(merged) > topK {
merged = merged[:topK]
}