Compare commits

...

7 Commits

Author SHA1 Message Date
Benjamin Admin 1d65d99d5f style(ucca): gocritic equalFold in balanceByRegulation (go-lint gruen)
CI / detect-changes (pull_request) Successful in 14s
CI / branch-name (pull_request) Successful in 2s
CI / guardrail-integrity (pull_request) Successful in 6s
CI / secret-scan (pull_request) Successful in 6s
CI / dep-audit (pull_request) Failing after 54s
CI / sbom-scan (pull_request) Failing after 58s
CI / build-sha-integrity (pull_request) Successful in 5s
CI / validate-canonical-controls (pull_request) Successful in 4s
CI / loc-budget (pull_request) Successful in 20s
CI / go-lint (pull_request) Successful in 43s
CI / python-lint (pull_request) Failing after 18s
CI / nodejs-lint (pull_request) Failing after 1m10s
CI / nodejs-build (pull_request) Successful in 3m1s
CI / test-go (pull_request) Successful in 1m4s
CI / iace-gt-coverage (pull_request) Successful in 16s
CI / test-python-backend (pull_request) Successful in 27s
CI / test-python-document-crawler (pull_request) Successful in 12s
CI / test-python-dsms-gateway (pull_request) Successful in 13s
strings.EqualFold(code, cv) statt code==strings.ToUpper(cv) — behebt den einzigen
gocritic-Befund auf der neuen Zeile (CI go-lint, new-from-merge-base). Verhalten
unveraendert (case-insensitive exakter regulation_code-Match); Unit + 0070-e2e bleiben gruen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-30 15:30:58 +02:00
Benjamin Admin f2d445b891 fix(ucca): Cross-Reg 0070 — beide Regelwerk-Domaenen im Router-Top-K (Known Defects 0)
CI / detect-changes (pull_request) Successful in 13s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Successful in 9s
CI / secret-scan (pull_request) Successful in 10s
CI / dep-audit (pull_request) Failing after 56s
CI / sbom-scan (pull_request) Failing after 59s
CI / build-sha-integrity (pull_request) Successful in 5s
CI / validate-canonical-controls (pull_request) Successful in 3s
CI / test-python-document-crawler (pull_request) Successful in 15s
CI / test-python-dsms-gateway (pull_request) Successful in 13s
CI / loc-budget (pull_request) Successful in 23s
CI / go-lint (pull_request) Failing after 51s
CI / python-lint (pull_request) Failing after 18s
CI / nodejs-lint (pull_request) Failing after 1m8s
CI / nodejs-build (pull_request) Successful in 3m6s
CI / test-go (pull_request) Successful in 1m3s
CI / iace-gt-coverage (pull_request) Successful in 18s
CI / test-python-backend (pull_request) Successful in 28s
Der einzige offene Retrieval-Haertefall: eine Query mit >=2 genannten Regelwerken
("CRA und Maschinenverordnung") lieferte nur die keyword-dominante Domaene (CRA),
MaschVO fiel raus. Drei zusammenwirkende Ursachen, alle behoben:

1. CodeValues-Mismatch: MaschVO heisst je Collection anders (Slice MASCHVO ·
   gesetze MVO · ce MACHINERY/MASCHINENVO), der Catalog hatte nur ["MASCHVO","MaschVO"]
   → Filter fand MaschVO nur in der Slice. Jetzt alle Varianten als CodeValues.
2. Per-Collection-Truncation: der Router gab perColl=3 → searchMultiRegulation holte
   3+3=6, schnitt auf 3 → konnte eine Domaene je Collection verlieren. Multi-Reg-Queries
   bekommen jetzt perColl = 3*len(regs).
3. Router-Score-Merge starvte die nicht-dominante Domaene. Neue balanceByRegulation()
   gruppiert den gemergten Pool per Regelwerk (exakter regulation_code-Match) und nimmt
   round-robin ueber die genannten Domaenen → jede Domaene mit Treffern ist im Top-K.
   Generisch ueber jede genannte Menge; Single-Domain-Pfad unveraendert.

Validierung: Go-Unit (balanceByRegulation: dominante CRA verdraengt MaschVO NICHT mehr);
0070-e2e gegen dev (Retrieve() → [CRA MVO CRA MVO CRA MVO CRA MASCHINENVO] = beide
Domaenen, vorher nur CRA); CB-100-Stichprobe REGR 0 (Gain-Profil unveraendert).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-30 15:08:18 +02:00
Benjamin_Boenisch 08086ee75f feat: Authority Router — Advisor collection-agnostisch, KB-2026.1 live (#46)
CI / detect-changes (push) Successful in 6s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 5s
CI / validate-canonical-controls (push) Successful in 3s
CI / loc-budget (push) Successful in 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m58s
CI / test-go (push) Successful in 1m0s
CI / iace-gt-coverage (push) Successful in 15s
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
2026-06-30 12:26:53 +00:00
Benjamin Admin 1e5aaf7103 feat(advisor): Authority Router — Advisor collection-agnostisch, KB-2026.1-Gewinn im Produktpfad
CI / detect-changes (pull_request) Successful in 13s
CI / branch-name (pull_request) Successful in 2s
CI / guardrail-integrity (pull_request) Successful in 5s
CI / secret-scan (pull_request) Successful in 11s
CI / dep-audit (pull_request) Failing after 54s
CI / sbom-scan (pull_request) Failing after 1m1s
CI / build-sha-integrity (pull_request) Successful in 11s
CI / validate-canonical-controls (pull_request) Successful in 7s
CI / loc-budget (pull_request) Successful in 23s
CI / go-lint (pull_request) Successful in 53s
CI / python-lint (pull_request) Failing after 17s
CI / nodejs-lint (pull_request) Failing after 1m6s
CI / nodejs-build (pull_request) Successful in 2m59s
CI / test-go (pull_request) Successful in 1m0s
CI / iace-gt-coverage (pull_request) Successful in 17s
CI / test-python-backend (pull_request) Successful in 26s
CI / test-python-document-crawler (pull_request) Successful in 12s
CI / test-python-dsms-gateway (pull_request) Successful in 8s
Der Advisor fan-outete bisher selbst ueber eine feste Liste expliziter Collections
(advisor-rag.ts) und umging damit das #61-Scope-Routing (das nur den Default-Pfad
routet) → der gemessene +28-Retrieval-Gewinn (CB-100: 53→81, 0 Regr) kam nie beim
Antwort-LLM an. Dieser Router zieht den Fan-out in die Retriever-Schicht:

- SDK: LegalRAGClient.Retrieve() + POST /sdk/v1/rag/retrieve {query, top_k} —
  fan-outet server-seitig ueber die Broad-Authority-Base + die KB-2026.1-Slice bei
  inKBScope, merge+dedup, sortiert nach Authority-Score (rerankByAuthority je
  Collection), top-K. Index-Warmup vor dem nebenlaeufigen Fan-out (Map-Race-frei).
  Per-Env via RAG_ROUTER_COLLECTIONS.
- admin: advisor-rag.ts ruft EINMAL /retrieve statt 6-fach expliziter Collections.
  Advisor ist collection-agnostisch (Vertrag Compiler→Collections→Retriever→Advisor);
  COMPLIANCE_COLLECTIONS/searchCollection entfernt.

Validierung: Go-Unit (Router-Selektion, dedup); e2e gegen dev-Qdrant (echter
Retrieve(), CB-100-Stichprobe stride 5): OLD-hit 11/20 → NEW-hit 15/20, GAIN 4
(alle DS-Guidance), REGR 0 — reproduziert den +28/0-Regr durch den Produktionscode.
TS-Tests auf den Single-/retrieve-Call angepasst.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-30 14:13:09 +02:00
Benjamin_Boenisch af11d21f6e feat(ucca): Blue-Green KB-2026.1 Scope-Routing (authoritative slice) (#45)
CI / detect-changes (push) Successful in 5s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 59s
CI / build-sha-integrity (push) Successful in 4s
CI / validate-canonical-controls (push) Successful in 4s
CI / loc-budget (push) Successful in 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / iace-gt-coverage (push) Successful in 15s
CI / test-python-backend (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
2026-06-30 10:01:59 +00:00
Benjamin Admin e2c74fd243 feat(ucca): Blue-Green „authoritative slice promotion" — KB-2026.1 Scope-Routing
CI / detect-changes (pull_request) Successful in 12s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Successful in 9s
CI / secret-scan (pull_request) Successful in 10s
CI / dep-audit (pull_request) Failing after 56s
CI / sbom-scan (pull_request) Failing after 1m1s
CI / build-sha-integrity (pull_request) Successful in 6s
CI / validate-canonical-controls (pull_request) Successful in 3s
CI / loc-budget (pull_request) Successful in 18s
CI / go-lint (pull_request) Successful in 52s
CI / python-lint (pull_request) Failing after 15s
CI / nodejs-lint (pull_request) Failing after 1m12s
CI / nodejs-build (pull_request) Successful in 3m4s
CI / test-go (pull_request) Successful in 1m2s
CI / iace-gt-coverage (pull_request) Successful in 19s
CI / test-python-backend (pull_request) Successful in 27s
CI / test-python-document-crawler (pull_request) Successful in 19s
CI / test-python-dsms-gateway (pull_request) Successful in 15s
Additiv (KEIN CE-Ersatz): faellt eine Query in den KB-2026.1-Scope (DP/CRA/MaschVO/
NIS2/DataAct/DORA/AIAct + EDPB/DSK-Guidance), wird die hochwertige Slice-Collection
`kb_2026_1_build` abgefragt; sonst bleibt der breite Default `bp_compliance_ce`.
Damit werden die Guidance-Intent- + Multi-Reg-Fixes (PR #42/#43) fuer den Slice LIVE,
Broad-Corpus (OWASP/NIST/ENISA/IFRS/ISO) unangetastet -> 0 Regressionen by construction.

- resolveCollection(query, requested): explizit angefragte Collection unveraendert;
  Default-Request -> Slice bei inKBScope, sonst CE. Env RAG_KB_SCOPE_ROUTING=false = Rollback
  ohne Redeploy; RAG_KB_SLICE_COLLECTION ueberschreibt den Slice-Namen.
- inKBScope: detectRegulations (in-Slice-Regelwerke) + DP-Guidance-Marker (edpb/dsk/wp/gl) +
  DP/Compliance-Topics. Bewusst NICHT die generischen Verben aus guidanceIntentSignals
  (sagt/laut) und NICHT enisa/bsi/nist/owasp (die liegen in CE) -> konservativ, in-scope->Slice.

Validierung: Unit (Scoping + resolveCollection); dev-e2e (RUN_E2E, geroutetes Search() gegen
dev): WP248/MaschVO/CRA+MaschVO -> Slice (Treffer da, fehlen in dev-ce); NIST -> CE (NIST-Treffer).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-30 11:49:34 +02:00
Benjamin_Boenisch 8ed99c255d Merge pull request 'fix(api): F821-Regression (Extract-Service-Halb-Refactor) — 7 Route-Dateien' (#44) from fix/api-f821-extract-service-regression into main
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 7s
CI / loc-budget (push) Successful in 22s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 27s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
2026-06-30 09:06:08 +00:00
11 changed files with 703 additions and 66 deletions
@@ -51,8 +51,8 @@ describe('advisor-rag', () => {
})
})
describe('queryAdvisorRAG', () => {
it('fragt alle 6 Collections ab und formatiert die Treffer', async () => {
describe('queryAdvisorRAG (Authority Router)', () => {
it('ruft den Router EINMAL auf und formatiert die Treffer', async () => {
mockFetch.mockResolvedValue({
ok: true,
json: async () => ({ results: [{ text: 'Inhalt A', regulation_short: 'DSGVO', score: 0.9 }] }),
@@ -60,19 +60,19 @@ describe('advisor-rag', () => {
const result = await mod.queryAdvisorRAG('Was ist eine DSFA?')
expect(result).toContain('[Quelle 1: DSGVO]')
expect(result).toContain('Inhalt A')
expect(mockFetch).toHaveBeenCalledTimes(mod.COMPLIANCE_COLLECTIONS.length)
expect(mockFetch).toHaveBeenCalledTimes(1)
})
it('ruft die ai-sdk /sdk/v1/rag/search mit collection + top_k auf', async () => {
it('ruft /sdk/v1/rag/retrieve mit query + top_k (ohne collection) auf', async () => {
mockFetch.mockResolvedValue({ ok: true, json: async () => ({ results: [] }) })
await mod.queryAdvisorRAG('test')
expect(mockFetch).toHaveBeenCalledWith(
expect.stringContaining('/sdk/v1/rag/search'),
expect.stringContaining('/sdk/v1/rag/retrieve'),
expect.objectContaining({ method: 'POST' }),
)
const body = JSON.parse(mockFetch.mock.calls[0][1].body)
expect(body).toMatchObject({ query: 'test', top_k: 3 })
expect(mod.COMPLIANCE_COLLECTIONS).toContain(body.collection)
expect(body).toMatchObject({ query: 'test', top_k: 8 })
expect(body.collection).toBeUndefined()
})
it('liefert leeren String wenn das RAG-Backend nicht erreichbar ist (graceful)', async () => {
@@ -80,10 +80,5 @@ describe('advisor-rag', () => {
const result = await mod.queryAdvisorRAG('test')
expect(result).toBe('')
})
it('umfasst genau die 6 Compliance-Collections', () => {
expect(mod.COMPLIANCE_COLLECTIONS).toHaveLength(6)
expect(mod.COMPLIANCE_COLLECTIONS).toContain('bp_compliance_recht')
})
})
})
+26 -38
View File
@@ -1,12 +1,13 @@
/**
* Compliance-Advisor RAG-Suche.
*
* Fragt die ai-compliance-sdk (`/sdk/v1/rag/search`) ab statt des frueheren
* `rag-service:8097` (auf prod nicht erreichbar). Die ai-sdk embeddet die Query
* mit bge-m3 (prod: ollama-embed) und sucht in den Qdrant-Compliance-Collections
* — damit profitiert der Advisor vom reicheren Embedding.
* Fragt den Authority Router der ai-compliance-sdk (`/sdk/v1/rag/retrieve`) mit NUR der
* Query ab — der Router waehlt selbst die Collections (Broad-Authority-Base + KB-2026.1-Slice
* bei in-scope), embeddet mit bge-m3 (prod: ollama-embed), merged + authority-ranked. Der
* Advisor bleibt damit collection-agnostisch (Vertrag: Compiler -> Collections -> Retriever
* -> Advisor); die fruehere Multi-Collection-Logik liegt jetzt im Retriever.
*
* Fehler je Collection werden geschluckt (graceful: Antwort ohne diesen Treffer).
* Fehler werden geschluckt (graceful: Antwort ohne RAG-Kontext).
* Fundstellen via article_label sind live ab dem Prod-Re-Ingest 2026-06.
*/
@@ -17,16 +18,6 @@ const DEFAULT_USER = '00000000-0000-0000-0000-000000000001'
const DEFAULT_TENANT =
process.env.DEFAULT_TENANT_ID || '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'
// Compliance-relevante Collections (ai-sdk-Whitelist `AllowedCollections`).
export const COMPLIANCE_COLLECTIONS = [
'bp_compliance_gesetze',
'bp_compliance_ce',
'bp_compliance_datenschutz',
'bp_dsfa_corpus',
'bp_compliance_recht',
'bp_legal_templates',
] as const
interface SdkRagResult {
text?: string
regulation_code?: string
@@ -68,39 +59,36 @@ export function mapSdkResults(results: SdkRagResult[] | undefined): ScoredPassag
.filter((p) => p.content)
}
async function searchCollection(collection: string, query: string): Promise<ScoredPassage[]> {
/**
* Authority Router: EIN collection-agnostischer Aufruf an die ai-sdk (`/sdk/v1/rag/retrieve`).
* Der Router waehlt die Collections (Broad-Authority-Base + KB-2026.1-Slice bei in-scope),
* merged + authority-ranked sie und liefert die Top-Passagen. Der Advisor weiss damit nichts
* mehr ueber einzelne Collections — die fruehere Multi-Collection-Logik liegt jetzt im Retriever.
* Fehler werden geschluckt (graceful: Antwort ohne RAG-Kontext).
*/
export async function queryAdvisorRAG(query: string): Promise<string> {
let passages: ScoredPassage[] = []
try {
const res = await fetch(`${SDK_URL}/sdk/v1/rag/search`, {
const res = await fetch(`${SDK_URL}/sdk/v1/rag/retrieve`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-User-ID': DEFAULT_USER,
'X-Tenant-ID': DEFAULT_TENANT,
},
body: JSON.stringify({ query, collection, top_k: 3 }),
signal: AbortSignal.timeout(10000),
body: JSON.stringify({ query, top_k: 8 }),
signal: AbortSignal.timeout(15000),
})
if (!res.ok) return []
const data = await res.json()
return mapSdkResults(data.results)
if (res.ok) {
const data = await res.json()
passages = mapSdkResults(data.results)
}
} catch {
return []
// graceful: keine Verbindung -> Antwort ohne RAG-Kontext
}
}
/**
* Fragt alle Compliance-Collections parallel ab und liefert die Top-8-Passagen
* als formatierten Kontextblock (oder '' wenn nichts erreichbar/gefunden).
*/
export async function queryAdvisorRAG(query: string): Promise<string> {
const settled = await Promise.all(
COMPLIANCE_COLLECTIONS.map((c) => searchCollection(c, query)),
)
const all = settled.flat()
if (all.length === 0) return ''
all.sort((a, b) => b.score - a.score)
return all
.slice(0, 8)
// Der Router liefert bereits authority-geordnete Top-K; Reihenfolge bewahren.
if (passages.length === 0) return ''
return passages
.map((r, i) => `[Quelle ${i + 1}: ${r.source}]\n${r.content}`)
.join('\n\n---\n\n')
}
@@ -82,6 +82,43 @@ func (h *RAGHandlers) Search(c *gin.Context) {
})
}
// RetrieveRequest is the Authority Router request: a query only, no collection — the router decides
// which collections to query (broad authority base + the in-scope KB-2026.1 slice).
type RetrieveRequest struct {
Query string `json:"query" binding:"required"`
TopK int `json:"top_k,omitempty"`
}
// Retrieve is the Authority Router endpoint. The Advisor calls this with ONLY a query and stays
// collection-agnostic; the router fans out over the authority base + the in-scope slice, merges by
// authority score, and returns the unified top-K. Response shape matches Search (query/results/
// count/assessment) so existing consumers parse it unchanged.
// POST /sdk/v1/rag/retrieve
func (h *RAGHandlers) Retrieve(c *gin.Context) {
var req RetrieveRequest
if err := c.ShouldBindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
if req.TopK <= 0 || req.TopK > 20 {
req.TopK = 8
}
results, err := h.ragClient.Retrieve(c.Request.Context(), req.Query, req.TopK)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "RAG retrieve failed: " + err.Error()})
return
}
c.JSON(http.StatusOK, gin.H{
"query": req.Query,
"results": results,
"count": len(results),
"assessment": ucca.Assess(results),
})
}
// ListRegulations returns the list of available regulations in the corpus.
// GET /sdk/v1/rag/regulations
func (h *RAGHandlers) ListRegulations(c *gin.Context) {
+1 -1
View File
@@ -159,6 +159,7 @@ func registerRAGRoutes(v1 *gin.RouterGroup, h *handlers.RAGHandlers) {
ragRoutes := v1.Group("/rag")
{
ragRoutes.POST("/search", h.Search)
ragRoutes.POST("/retrieve", h.Retrieve)
ragRoutes.GET("/regulations", h.ListRegulations)
ragRoutes.GET("/corpus-status", h.CorpusStatus)
ragRoutes.GET("/corpus-versions/:collection", h.CorpusVersionHistory)
@@ -358,7 +359,6 @@ func registerWhistleblowerRoutes(v1 *gin.RouterGroup, h *handlers.WhistleblowerH
}
}
func registerMaximizerRoutes(v1 *gin.RouterGroup, h *handlers.MaximizerHandlers) {
m := v1.Group("/maximizer")
{
@@ -0,0 +1,129 @@
package ucca
import (
"context"
"os"
"sort"
"strings"
"sync"
)
// routerBaseCollections is the broad authority base the Authority Router fans out over. It mirrors
// the Advisor's historical multi-collection set; the KB-2026.1 slice is added separately when the
// query is in scope. Override via RAG_ROUTER_COLLECTIONS (comma-separated) per environment.
func (c *LegalRAGClient) routerBaseCollections() []string {
if v := strings.TrimSpace(os.Getenv("RAG_ROUTER_COLLECTIONS")); v != "" {
var out []string
for _, p := range strings.Split(v, ",") {
if s := strings.TrimSpace(p); s != "" {
out = append(out, s)
}
}
if len(out) > 0 {
return out
}
}
return []string{
"bp_compliance_gesetze",
"bp_compliance_ce",
"bp_compliance_datenschutz",
"bp_dsfa_corpus",
"bp_compliance_recht",
"bp_legal_templates",
}
}
const routerPerCollectionTopK = 3
// Retrieve is the Authority Router entry point: callers (the Advisor) pass ONLY a query and stay
// collection-agnostic. The router fans out over the broad authority base and ADDS the KB-2026.1
// slice when the query is in scope (inKBScope), then merges all hits, deduplicates, and returns the
// top-K by authority score. This moves the former Advisor-side collection fan-out into the retrieval
// layer (the "Retriever" tier of the quality pyramid), so the proven KB-2026.1 slice gain reaches
// the product path without the Advisor knowing about individual collections.
//
// The merged set is ordered by the per-collection authority score that rerankByAuthority already
// produced inside searchInternal — i.e. binding-vs-guidance ordering is preserved across the merge.
// Per-collection failures (e.g. a collection absent on an environment) degrade gracefully.
func (c *LegalRAGClient) Retrieve(ctx context.Context, query string, topK int) ([]LegalSearchResult, error) {
if topK <= 0 {
topK = 8
}
collections := c.routerBaseCollections()
if c.kbScopeRoutingEnabled && c.kbSliceCollection != "" && inKBScope(query) {
collections = append(collections, c.kbSliceCollection)
}
// Cross-regulation queries (>=2 explicitly named regulations) get a larger per-collection budget
// so each collection's multi-regulation search isn't truncated down to the keyword-dominant
// domain; the final per-regulation balancing then guarantees every named domain in the top-K.
regs := detectRegulations(query)
perColl := routerPerCollectionTopK
if len(regs) >= 2 {
perColl = routerPerCollectionTopK * len(regs)
}
// Warm the full-text indexes sequentially first so the concurrent fan-out below only READS the
// shared textIndexEnsured map (the writes happen here, serialized) — closes the cold-start map
// race deterministically. Best-effort: a missing collection just stays un-indexed (hybrid then
// falls back to dense, or the per-collection search degrades to nothing).
if c.hybridEnabled {
for _, coll := range collections {
_ = c.ensureTextIndex(ctx, coll)
}
}
out := make([][]LegalSearchResult, len(collections))
var wg sync.WaitGroup
for i, coll := range collections {
wg.Add(1)
go func(i int, coll string) {
defer wg.Done()
if res, err := c.searchInternal(ctx, coll, query, nil, perColl); err == nil {
out[i] = res
}
}(i, coll)
}
wg.Wait()
merged := make([]LegalSearchResult, 0, len(collections)*perColl)
for _, r := range out {
merged = append(merged, r...)
}
merged = dedupResults(merged)
sort.SliceStable(merged, func(a, b int) bool { return merged[a].Score > merged[b].Score })
// Cross-regulation: guarantee every named domain is represented (0070-class fix) instead of
// letting a global score-sort starve the non-dominant domain.
if len(regs) >= 2 {
return balanceByRegulation(merged, regs, topK), nil
}
if len(merged) > topK {
merged = merged[:topK]
}
return merged, nil
}
// dedupResults removes duplicate passages that can appear when collections overlap, keeping the
// highest-scoring occurrence. Identity = regulation_code + article_label + a text prefix.
func dedupResults(in []LegalSearchResult) []LegalSearchResult {
pos := make(map[string]int, len(in))
out := make([]LegalSearchResult, 0, len(in))
for _, r := range in {
text := r.Text
if len(text) > 80 {
text = text[:80]
}
key := r.RegulationCode + "|" + r.ArticleLabel + "|" + text
if idx, ok := pos[key]; ok {
if r.Score > out[idx].Score {
out[idx] = r
}
continue
}
pos[key] = len(out)
out = append(out, r)
}
return out
}
@@ -0,0 +1,164 @@
package ucca
import (
"context"
"encoding/json"
"os"
"strconv"
"strings"
"testing"
)
type benchQ struct {
ID string `json:"id"`
Document string `json:"document"`
Question string `json:"question"`
}
// docTokens maps a bench question's expected document to acceptable regulation_code/label substrings.
func docTokens(document string) []string {
d := strings.ToUpper(document)
var t []string
for _, wp := range []string{"WP243", "WP248", "WP260"} {
if strings.Contains(d, wp) {
t = append(t, wp)
}
}
dns := strings.ReplaceAll(d, " ", "")
for _, gl := range []struct{ key, tok string }{{"07/2020", "GL07"}, {"05/2020", "GL05"}, {"09/2022", "GL09"}} {
if strings.Contains(dns, gl.key) {
t = append(t, gl.tok)
}
}
if strings.Contains(d, "TDDDG") {
t = append(t, "TDDDG")
}
if strings.Contains(d, "DSGVO") || strings.Contains(d, "ART. 13") || strings.Contains(d, "ART. 14") {
t = append(t, "DSGVO")
}
if strings.Contains(d, "BDSG") {
t = append(t, "BDSG")
}
if strings.Contains(d, "CRA") {
t = append(t, "CRA")
}
if strings.Contains(d, "MASCH") {
t = append(t, "MASCH", "MACHINERY", "MVO")
}
return t
}
func hitDoc(results []LegalSearchResult, toks []string) bool {
for _, r := range results {
s := strings.ReplaceAll(strings.ToUpper(r.RegulationCode+" "+r.ArticleLabel), " ", "")
for _, tk := range toks {
if strings.Contains(s, strings.ReplaceAll(tk, " ", "")) {
return true
}
}
}
return false
}
// TestMultiReg0070E2E (RUN_E2E=1) is the 0070 regression: a cross-regulation query (CRA + MaschVO)
// must return BOTH domains through the real Retrieve(), not just the keyword-dominant CRA.
func TestMultiReg0070E2E(t *testing.T) {
if os.Getenv("RUN_E2E") != "1" {
t.Skip("set RUN_E2E=1 + QDRANT_URL/OLLAMA_URL/QDRANT_API_KEY")
}
c := NewLegalRAGClient()
q := "Wie greifen CRA und Maschinenverordnung bei einer vernetzten Maschine ineinander?"
res, err := c.Retrieve(context.Background(), q, 8)
if err != nil {
t.Fatalf("retrieve: %v", err)
}
var hasCRA, hasMasch bool
var codes []string
for _, r := range res {
u := strings.ToUpper(r.RegulationCode)
codes = append(codes, u)
if strings.Contains(u, "CRA") {
hasCRA = true
}
if strings.Contains(u, "MASCH") || strings.Contains(u, "MACHIN") || u == "MVO" {
hasMasch = true
}
}
t.Logf("0070 top-8 codes: %v", codes)
if !hasCRA || !hasMasch {
t.Errorf("0070 must return BOTH domains via Retrieve(): CRA=%v MaschVO=%v", hasCRA, hasMasch)
}
}
// TestAuthorityRouterCB100 (RUN_E2E=1) drives the REAL Retrieve() over the ComplianceBench-100 against
// the live collections: NEW (scope routing on → slice added for in-scope queries) vs OLD (routing off
// → broad base only). It is the regression gate that the router actually delivers the proven slice
// gain (+28/0-regr in the offline simulation) through the production Go code path.
func TestAuthorityRouterCB100(t *testing.T) {
if os.Getenv("RUN_E2E") != "1" {
t.Skip("set RUN_E2E=1 + QDRANT_URL/OLLAMA_URL/QDRANT_API_KEY + BENCH_PATH")
}
path := os.Getenv("BENCH_PATH")
if path == "" {
path = "/tmp/compliance_bench.json"
}
raw, err := os.ReadFile(path)
if err != nil {
t.Fatalf("bench read: %v", err)
}
var doc struct {
Questions []benchQ `json:"questions"`
}
if err := json.Unmarshal(raw, &doc); err != nil {
t.Fatalf("bench parse: %v", err)
}
// BENCH_STRIDE samples every Kth question (stratified across DS/CRA/MaschVO) so the gate stays
// tractable against the remote dev Qdrant; default 1 = full CB-100.
stride := 1
if s := os.Getenv("BENCH_STRIDE"); s != "" {
if n, err := strconv.Atoi(s); err == nil && n > 0 {
stride = n
}
}
c := NewLegalRAGClient()
ctx := context.Background()
var n, oldHit, newHit, gain, regr int
for i, q := range doc.Questions {
if i%stride != 0 {
continue
}
n++
toks := docTokens(q.Document)
c.kbScopeRoutingEnabled = false
oldRes, _ := c.Retrieve(ctx, q.Question, 8)
c.kbScopeRoutingEnabled = true
newRes, _ := c.Retrieve(ctx, q.Question, 8)
oh, nh := hitDoc(oldRes, toks), hitDoc(newRes, toks)
if oh {
oldHit++
}
if nh {
newHit++
}
flip := "="
switch {
case !oh && nh:
gain++
flip = "GAIN"
case oh && !nh:
regr++
flip = "REGR"
}
t.Logf("%-9s [%-14s] OLD=%-5v NEW=%-5v %s", q.ID, q.Document, oh, nh, flip)
}
t.Logf("CB-100 sample (stride=%d) via Retrieve(): N=%d | OLD-hit %d | NEW-hit %d | GAIN %d | REGR %d",
stride, n, oldHit, newHit, gain, regr)
if newHit <= oldHit || gain < 3 {
t.Errorf("router must add slice gains: NEW(%d) must exceed OLD(%d), gain=%d", newHit, oldHit, gain)
}
if regr > 2 {
t.Errorf("too many regressions through the router: %d", regr)
}
}
@@ -0,0 +1,99 @@
package ucca
import (
"os"
"testing"
)
func TestRouterBaseCollections(t *testing.T) {
c := &LegalRAGClient{}
os.Unsetenv("RAG_ROUTER_COLLECTIONS")
def := c.routerBaseCollections()
if len(def) != 6 || def[1] != "bp_compliance_ce" {
t.Fatalf("default base collections unexpected: %v", def)
}
os.Setenv("RAG_ROUTER_COLLECTIONS", " bp_compliance_ce , kb_2026_1_build ,, ")
defer os.Unsetenv("RAG_ROUTER_COLLECTIONS")
got := c.routerBaseCollections()
if len(got) != 2 || got[0] != "bp_compliance_ce" || got[1] != "kb_2026_1_build" {
t.Fatalf("env override parse failed (trim/empty): %v", got)
}
}
func TestRouterSliceSelection(t *testing.T) {
// The router appends the slice exactly when the query is in scope (inKBScope) and routing is on.
// Mirror the selection logic so a regression in either is caught without a live Qdrant.
c := &LegalRAGClient{kbSliceCollection: "kb_2026_1_build", kbScopeRoutingEnabled: true}
sel := func(q string) bool {
colls := c.routerBaseCollections()
if c.kbScopeRoutingEnabled && c.kbSliceCollection != "" && inKBScope(q) {
colls = append(colls, c.kbSliceCollection)
}
for _, x := range colls {
if x == c.kbSliceCollection {
return true
}
}
return false
}
if !sel("Welche neun Kriterien nennt WP248 fuer ein hohes Risiko?") {
t.Error("in-scope guidance query must include the slice")
}
if sel("Was sagt NIST SP 800-53 zu Access Control?") {
t.Error("out-of-scope query must NOT include the slice")
}
c.kbScopeRoutingEnabled = false
if sel("Welche Kriterien nennt WP248?") {
t.Error("routing disabled => slice never included")
}
}
func TestBalanceByRegulation(t *testing.T) {
regs := []detectedRegulation{
{Canonical: "CRA", CodeValues: []string{"CRA"}},
{Canonical: "MaschVO", CodeValues: []string{"MASCHVO", "MVO", "MACHINERY"}},
}
// CRA dominates by score; without balancing the top-4 would be all CRA + NIST.
pool := []LegalSearchResult{
{RegulationCode: "CRA", Score: 0.99},
{RegulationCode: "CRA", Score: 0.98},
{RegulationCode: "CRA", Score: 0.97},
{RegulationCode: "NIST", Score: 0.96},
{RegulationCode: "MACHINERY", Score: 0.70},
{RegulationCode: "MVO", Score: 0.65},
}
out := balanceByRegulation(pool, regs, 4)
var hasCRA, hasMasch bool
for _, r := range out {
switch r.RegulationCode {
case "CRA":
hasCRA = true
case "MACHINERY", "MVO":
hasMasch = true
}
}
if !hasCRA || !hasMasch {
t.Errorf("both named domains must be represented: CRA=%v MaschVO=%v out=%v", hasCRA, hasMasch, out)
}
if out[0].RegulationCode != "CRA" || !(out[1].RegulationCode == "MACHINERY" || out[1].RegulationCode == "MVO") {
t.Errorf("round-robin should alternate domains, got %s then %s", out[0].RegulationCode, out[1].RegulationCode)
}
}
func TestDedupResults(t *testing.T) {
in := []LegalSearchResult{
{RegulationCode: "EDPB WP248", ArticleLabel: "III.B", Text: "lorem", Score: 0.7},
{RegulationCode: "EDPB WP248", ArticleLabel: "III.B", Text: "lorem", Score: 0.9}, // dup, higher score
{RegulationCode: "DSGVO", ArticleLabel: "Art. 35", Text: "ipsum", Score: 0.8},
}
out := dedupResults(in)
if len(out) != 2 {
t.Fatalf("expected 2 deduped, got %d", len(out))
}
for _, r := range out {
if r.RegulationCode == "EDPB WP248" && r.Score != 0.9 {
t.Errorf("dedup must keep highest score, got %v", r.Score)
}
}
}
@@ -0,0 +1,52 @@
package ucca
import "strings"
// kbScopeTopics are high-precision data-protection / compliance topic markers that place a query in
// the KB-2026.1 authoritative slice even when it does NOT name a regulation. Conservative by design:
// an unmatched query falls back to the broad CE default (no regression) — the slice is only used when
// the query is confidently in-scope.
var kbScopeTopics = []string{
// DP-Guidance-Marker, die IN der Slice liegen (EDPB/DSK/WP/GL) — bewusst NICHT die generischen
// Verben aus guidanceIntentSignals (sagt/laut/empfiehlt/auslegung) und NICHT enisa/bsi/nist/owasp
// (die liegen im breiten CE-Pool, nicht in der Slice).
"edpb", "dsk", "datenschutzausschuss", "orientierungshilfe",
"wp2", "wp 2", "wp29", "working paper", "gl 0",
"datenschutz", "dsgvo", "gdpr", "dsfa", "folgenabschätzung", "folgenabschaetzung",
"einwilligung", "auftragsverarbeit", "betroffenenrecht", "auskunftsrecht",
"verarbeitungsverzeichnis", "datenschutzbeauftragt", "verzeichnis von verarbeitung",
"cookie", "tracking", "transparenzpflicht", "datenpanne", "meldepflicht",
"technische und organisatorische maßnahmen",
"cyber resilience", "schwachstelle", "vulnerability", "sicherheitsupdate",
"maschinensicherheit", "wesentliche veränderung", "wesentliche veraenderung",
"konformitätsbewertung", "konformitaetsbewertung", "ce-kennzeichnung",
}
// inKBScope reports whether the query belongs to the KB-2026.1 authoritative slice. True when it
// names an in-slice regulation (detectRegulations), asks for guidance (EDPB/DSK/WP/GL), or hits a
// data-protection / compliance topic marker.
func inKBScope(query string) bool {
if len(detectRegulations(query)) > 0 {
return true
}
q := strings.ToLower(query)
for _, t := range kbScopeTopics {
if strings.Contains(q, t) {
return true
}
}
return false
}
// resolveCollection applies the Blue-Green „authoritative slice promotion" routing. An explicitly
// requested collection is honoured unchanged; the DEFAULT (empty) request is routed to the KB-2026.1
// slice when the query is in-scope, else to the broad CE default. Disable via RAG_KB_SCOPE_ROUTING=false.
func (c *LegalRAGClient) resolveCollection(query, requested string) string {
if requested != "" {
return requested
}
if c.kbScopeRoutingEnabled && c.kbSliceCollection != "" && inKBScope(query) {
return c.kbSliceCollection
}
return c.collection
}
@@ -0,0 +1,101 @@
package ucca
import (
"context"
"fmt"
"os"
"strings"
"testing"
)
func TestInKBScope(t *testing.T) {
inScope := []string{
"Welche neun Kriterien nennt WP248 fuer ein hohes Risiko?",
"Wie greifen CRA und Maschinenverordnung bei einer vernetzten Maschine ineinander?",
"Wann ist eine Datenschutz-Folgenabschaetzung erforderlich?",
"Welche Anforderungen stellt die DSGVO an die Einwilligung?",
"Brauche ich einen Datenschutzbeauftragten?",
"Wann muss eine aktiv ausgenutzte Schwachstelle gemeldet werden?",
}
outScope := []string{
"Welche OWASP-Kontrollen gibt es fuer Authentifizierung?",
"Was sagt NIST SP 800-53 zu Access Control?",
"Wie funktioniert ISO 27001 Zertifizierung?",
"Welche IFRS-Standards gelten fuer Leasing?",
}
for _, q := range inScope {
if !inKBScope(q) {
t.Errorf("inKBScope(%q) = false, want true", q)
}
}
for _, q := range outScope {
if inKBScope(q) {
t.Errorf("inKBScope(%q) = true, want false", q)
}
}
}
func TestResolveCollection(t *testing.T) {
c := &LegalRAGClient{collection: "bp_compliance_ce", kbSliceCollection: "kb_2026_1_build", kbScopeRoutingEnabled: true}
if got := c.resolveCollection("Welche Kriterien nennt WP248?", ""); got != "kb_2026_1_build" {
t.Errorf("in-scope default -> %s, want kb_2026_1_build", got)
}
if got := c.resolveCollection("Was sagt NIST SP 800-53?", ""); got != "bp_compliance_ce" {
t.Errorf("out-of-scope default -> %s, want bp_compliance_ce", got)
}
if got := c.resolveCollection("Welche Kriterien nennt WP248?", "explicit_coll"); got != "explicit_coll" {
t.Errorf("explicit request must be honoured -> %s", got)
}
c.kbScopeRoutingEnabled = false
if got := c.resolveCollection("Welche Kriterien nennt WP248?", ""); got != "bp_compliance_ce" {
t.Errorf("disabled routing -> %s, want bp_compliance_ce", got)
}
}
// TestKBScopeRoutingE2E (RUN_E2E=1) verifies the routing against the REAL collections: a default
// Search() of an in-scope query must hit the KB-2026.1 slice (WP248/MaschVO live there but NOT in
// the broad CE pool = clean discriminator); an out-of-scope query stays on CE.
func TestKBScopeRoutingE2E(t *testing.T) {
if os.Getenv("RUN_E2E") != "1" {
t.Skip("set RUN_E2E=1 + QDRANT_URL/OLLAMA_URL/QDRANT_API_KEY")
}
c := NewLegalRAGClient()
cases := []struct {
q string
wantToken string // expected in top-8 when routed to the slice
wantInKB bool
}{
{"Welche neun Kriterien nennt WP248 fuer ein voraussichtlich hohes Risiko?", "WP248", true},
{"Welche grundlegenden Sicherheits- und Gesundheitsschutzanforderungen enthaelt Anhang III der Maschinenverordnung?", "MASCH", true},
{"Wie greifen CRA und Maschinenverordnung bei einer vernetzten Maschine ineinander?", "MASCH", true},
{"Was sagt NIST SP 800-53 zu Access Control?", "", false},
}
for _, tc := range cases {
routed := c.resolveCollection(tc.q, "")
res, err := c.Search(context.Background(), tc.q, nil, 8)
if err != nil {
t.Fatalf("%q: %v", tc.q, err)
}
codes := map[string]bool{}
for _, r := range res {
codes[strings.ToUpper(r.RegulationCode)] = true
}
hit := false
if tc.wantToken != "" {
for cd := range codes {
if strings.Contains(cd, tc.wantToken) {
hit = true
break
}
}
}
col := make([]string, 0, len(codes))
for cd := range codes {
col = append(col, cd)
}
fmt.Printf("inKB=%-5v routed=%-16s wantTok=%-6s found=%-5v | %v\n", tc.wantInKB, routed, tc.wantToken, hit, col)
if tc.wantInKB && tc.wantToken != "" && !hit {
t.Errorf("%q routed to %s but %s not in top-8 (slice not active?)", tc.q, routed, tc.wantToken)
}
}
}
@@ -21,6 +21,12 @@ type LegalRAGClient struct {
textIndexEnsured map[string]bool
hybridEnabled bool
graphEnabled bool
// Blue-Green „authoritative slice promotion" (additiv, KEIN CE-Ersatz): faellt eine Query
// in den KB-2026.1-Scope (DP/CRA/MaschVO/NIS2/DataAct/DORA/AIAct + EDPB/DSK-Guidance), wird
// die hochwertige Slice-Collection abgefragt; sonst bleibt der breite Default (bp_compliance_ce).
kbSliceCollection string
kbScopeRoutingEnabled bool
}
// NewLegalRAGClient creates a new Legal RAG client using Ollama bge-m3 embeddings.
@@ -45,15 +51,25 @@ func NewLegalRAGClient() *LegalRAGClient {
// zur Begruendung/Vollstaendigkeit genutzt, nicht zur Pool-Expansion (Default).
graphEnabled := os.Getenv("RAG_GRAPH_EXPANSION") == "true"
// KB-2026.1 authoritative slice (Blue-Green, additiv). Routing default AN; Rollback ohne
// Redeploy ueber RAG_KB_SCOPE_ROUTING=false (dann faellt alles auf den CE-Default zurueck).
kbSlice := os.Getenv("RAG_KB_SLICE_COLLECTION")
if kbSlice == "" {
kbSlice = "kb_2026_1_build"
}
kbScopeRouting := os.Getenv("RAG_KB_SCOPE_ROUTING") != "false"
return &LegalRAGClient{
qdrantURL: qdrantURL,
qdrantAPIKey: qdrantAPIKey,
ollamaURL: ollamaURL,
embeddingModel: "bge-m3",
collection: "bp_compliance_ce",
textIndexEnsured: make(map[string]bool),
hybridEnabled: hybridEnabled,
graphEnabled: graphEnabled,
qdrantURL: qdrantURL,
qdrantAPIKey: qdrantAPIKey,
ollamaURL: ollamaURL,
embeddingModel: "bge-m3",
collection: "bp_compliance_ce",
textIndexEnsured: make(map[string]bool),
hybridEnabled: hybridEnabled,
graphEnabled: graphEnabled,
kbSliceCollection: kbSlice,
kbScopeRoutingEnabled: kbScopeRouting,
httpClient: &http.Client{
Timeout: 60 * time.Second,
},
@@ -63,15 +79,13 @@ func NewLegalRAGClient() *LegalRAGClient {
// SearchCollection queries a specific Qdrant collection for relevant passages.
// If collection is empty, it falls back to the default collection (bp_compliance_ce).
func (c *LegalRAGClient) SearchCollection(ctx context.Context, collection string, query string, regulationIDs []string, topK int) ([]LegalSearchResult, error) {
if collection == "" {
collection = c.collection
}
return c.searchInternal(ctx, collection, query, regulationIDs, topK)
return c.searchInternal(ctx, c.resolveCollection(query, collection), query, regulationIDs, topK)
}
// Search queries the compliance CE corpus for relevant passages.
// Search queries the compliance corpus for relevant passages. The target collection is resolved by
// the Blue-Green slice routing: the KB-2026.1 slice for in-scope queries, else the broad CE default.
func (c *LegalRAGClient) Search(ctx context.Context, query string, regulationIDs []string, topK int) ([]LegalSearchResult, error) {
return c.searchInternal(ctx, c.collection, query, regulationIDs, topK)
return c.searchInternal(ctx, c.resolveCollection(query, ""), query, regulationIDs, topK)
}
// searchInternal performs the actual search against a given collection.
@@ -20,7 +20,9 @@ var regulationCatalog = []struct {
CodeValues []string
}{
{"CRA", []string{"cra", "cyber resilience"}, []string{"CRA"}},
{"MaschVO", []string{"maschinenverordnung", "maschvo", "machinery regulation"}, []string{"MASCHVO", "MaschVO"}},
// MaschVO heisst je Collection anders: Slice MASCHVO · gesetze MVO · ce MACHINERY/MASCHINENVO.
// Alle Varianten als CodeValues, sonst findet der per-Reg-Filter MaschVO nur in der Slice (0070).
{"MaschVO", []string{"maschinenverordnung", "maschvo", "machinery regulation"}, []string{"MASCHVO", "MaschVO", "MVO", "MASCHINENVO", "MACHINERY"}},
{"NIS2", []string{"nis2", "nis-2", "nis 2"}, []string{"NIS2"}},
{"DORA", []string{"dora"}, []string{"DORA"}},
{"Data Act", []string{"data act", "datengesetz"}, []string{"DATA ACT", "DataAct"}},
@@ -53,6 +55,62 @@ func detectRegulations(query string) []detectedRegulation {
func hitID(h qdrantSearchHit) string { return fmt.Sprintf("%v", h.ID) }
// balanceByRegulation builds the final top-K so EVERY explicitly-named regulation with hits is
// represented, instead of letting the keyword-dominant domain (e.g. CRA) crowd out the other
// (e.g. MaschVO) in a cross-regulation query. The input pool must already be score-ordered;
// results are grouped by exact regulation_code match against each regulation's CodeValues, then
// taken round-robin across the named domains (highest-scored first within each), with any
// remaining slots filled by the leftover pool in score order. Generic; no doc-specific logic.
func balanceByRegulation(pool []LegalSearchResult, regs []detectedRegulation, topK int) []LegalSearchResult {
if topK <= 0 {
topK = 8
}
byReg := make([][]LegalSearchResult, len(regs))
matched := make([]bool, len(pool))
for ri, r := range regs {
for pi := range pool {
if matched[pi] {
continue
}
code := strings.TrimSpace(pool[pi].RegulationCode)
for _, cv := range r.CodeValues {
if strings.EqualFold(code, cv) {
byReg[ri] = append(byReg[ri], pool[pi])
matched[pi] = true
break
}
}
}
}
out := make([]LegalSearchResult, 0, topK)
idx := make([]int, len(regs))
for len(out) < topK {
progressed := false
for ri := range regs {
if idx[ri] < len(byReg[ri]) {
out = append(out, byReg[ri][idx[ri]])
idx[ri]++
progressed = true
if len(out) >= topK {
break
}
}
}
if !progressed {
break
}
}
for pi := range pool {
if len(out) >= topK {
break
}
if !matched[pi] {
out = append(out, pool[pi])
}
}
return out
}
// searchMultiRegulation retrieves each explicitly-named regulation SEPARATELY (per-regulation
// filter) and merges, so a cross-regulation query ("Wie greifen CRA und MaschVO ineinander?")
// returns BOTH domains in the prompt instead of only the keyword-dominant one. Generic over any