feat(ai-sdk): classify technical standards (NIST/OWASP/Grundschutz) as technical_standard
CI / detect-changes (pull_request) Successful in 5s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Successful in 4s
CI / secret-scan (pull_request) Successful in 6s
CI / dep-audit (pull_request) Failing after 55s
CI / sbom-scan (pull_request) Failing after 58s
CI / build-sha-integrity (pull_request) Successful in 6s
CI / validate-canonical-controls (pull_request) Successful in 4s
CI / loc-budget (pull_request) Successful in 18s
CI / go-lint (pull_request) Successful in 41s
CI / python-lint (pull_request) Failing after 13s
CI / nodejs-lint (pull_request) Failing after 1m4s
CI / nodejs-build (pull_request) Successful in 3m0s
CI / test-go (pull_request) Successful in 58s
CI / iace-gt-coverage (pull_request) Successful in 14s
CI / test-python-backend (pull_request) Successful in 25s
CI / test-python-document-crawler (pull_request) Successful in 13s
CI / test-python-dsms-gateway (pull_request) Successful in 10s

The existing NIST corpus (SP 800-82r3 etc., ingested before source_class tagging)
was classified as supervisory_guidance because "NIST" sat in guidanceMarkers, so the
control-intent lift (#36) could never surface it. Add a technical_standard class:

- authority.go: new standardMarkers (NIST/OWASP/Grundschutz/ISO 27001/CSA CCM/CIS),
  checked before guidanceMarkers (so "BSI Grundschutz" -> technical_standard, not BSI
  guidance); move NIST out of guidanceMarkers; sourceClassFromWeight maps weights
  80-99 -> technical_standard.
- authority_rerank.go: the intent-lift path (liftAboveBinding + bestBindingSemantic)
  now classifies via classifyAuthority instead of trusting the raw payload source_class,
  so the untagged legacy corpus is recognized: untagged NIST is now lifted on a
  control question ("Welche Controls passen zu Security Updates?", i.e. "Which
  controls apply to security updates?").
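The ordering rule in the first bullet can be sketched as follows. This is a reduced illustration, not the code from authority.go: `classifySource` is a hypothetical stand-in for classifyAuthority, the guidance list is abbreviated, and `containsAny` is assumed to be a plain case-sensitive substring check because the real helper is not shown in the diff.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	// Copied from the standardMarkers added in this commit.
	standardMarkers = []string{
		"NIST", "OWASP", "Grundschutz", "ISO 27001", "ISO/IEC 27001",
		"CSA CCM", "Cloud Controls Matrix", "CIS Benchmark", "CIS Control",
	}
	// Abbreviated subset of guidanceMarkers, for illustration only.
	guidanceMarkers = []string{"DSK", "EDPB", "ENISA", "BSI", "Guidance", "OECD", "CISA"}
)

// containsAny reports whether hay contains any marker as a substring.
// Assumption: the real helper may normalize case or match differently.
func containsAny(hay string, markers []string) bool {
	for _, m := range markers {
		if strings.Contains(hay, m) {
			return true
		}
	}
	return false
}

// classifySource (hypothetical name) checks standard markers before guidance
// markers, so text matching both lists is classified as a technical standard.
func classifySource(hay string) string {
	switch {
	case containsAny(hay, standardMarkers):
		return "technical_standard"
	case containsAny(hay, guidanceMarkers):
		return "supervisory_guidance"
	default:
		return "unknown"
	}
}

func main() {
	for _, hay := range []string{
		"BSI IT-Grundschutz Kompendium", // matches "BSI" and "Grundschutz"
		"NIST SP 800-82r3",
		"BSI Orientierungshilfe", // matches "BSI" only
	} {
		fmt.Printf("%-32s -> %s\n", hay, classifySource(hay))
	}
}
```

Swapping the two cases would send the Grundschutz example back to supervisory_guidance, which is the behavior this commit moves away from.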

Tested: classifier cases for NIST/Grundschutz/weight-80, and an untagged-NIST lift case.
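
For the weight-80 case, a table-driven check might look roughly like the sketch below. It is not the test from this commit: `sourceClassFromWeightSketch` is a hypothetical copy of the switch shown in the diff, and its last two branches are assumptions inferred from the struct comment ("0 foreign, 50 unknown"), since the diff is cut off after `case w <= 0:`.

```go
package main

import "fmt"

// sourceClassFromWeightSketch mirrors the threshold switch from the diff.
func sourceClassFromWeightSketch(w int) string {
	switch {
	case w >= 100:
		return "binding_law"
	case w >= 80:
		return "technical_standard"
	case w >= 70:
		return "supervisory_guidance"
	case w <= 0:
		return "foreign_law" // assumption: branch body not visible in the diff
	default:
		return "unknown" // assumption: default branch not visible in the diff
	}
}

func main() {
	cases := []struct {
		weight int
		want   string
	}{
		{100, "binding_law"},
		{80, "technical_standard"},
		{79, "supervisory_guidance"},
		{70, "supervisory_guidance"},
		{50, "unknown"},
		{0, "foreign_law"},
	}
	for _, c := range cases {
		got := sourceClassFromWeightSketch(c.weight)
		status := "ok"
		if got != c.want {
			status = "MISMATCH"
		}
		fmt.Printf("weight %3d -> %-22s %s\n", c.weight, got, status)
	}
}
```

Because the cases are evaluated top to bottom, the new `w >= 80` branch has to sit above `w >= 70`; otherwise weight 80 would still map to supervisory_guidance.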

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin Admin
2026-06-24 12:11:07 +02:00
parent 05d75e8039
commit 90e0a57799
4 changed files with 36 additions and 6 deletions
+13 -3
@@ -9,8 +9,8 @@ import (
 // authorityInfo is the normative classification of a search result, used internally
 // for re-ranking only (Phase 1 changes ordering, not the response contract).
 type authorityInfo struct {
-	weight       int    // 100 binding_law, 70 guidance, 0 foreign_law, 50 unknown
-	sourceClass  string // binding_law | supervisory_guidance | foreign_law | unknown
+	weight       int    // 100 binding, 80 technical_standard, 70 guidance, 0 foreign, 50 unknown
+	sourceClass  string // binding_law | technical_standard | supervisory_guidance | foreign_law | unknown
 	jurisdiction string // DE | EU | CH
 }
@@ -18,7 +18,13 @@ var (
 	guidanceMarkers = []string{
 		"DSK", "EDPB", "BfDI", "BFDI", "BayLfD", "Baylfb", "ENISA", "BSI", "EUCC",
 		"Standards Mapping", "Kpnr", "Orientierungshilfe", "Handreichung", "Beschluss",
-		"Leitlinie", "Guidance", "Empfehlung", "NIST", "OECD", "CISA", "Blue Guide",
+		"Leitlinie", "Guidance", "Empfehlung", "OECD", "CISA", "Blue Guide",
 	}
+	// Technical standards / control frameworks (best-practice controls). Checked BEFORE
+	// guidanceMarkers so a "BSI Grundschutz" chunk classifies as a standard, not BSI guidance.
+	standardMarkers = []string{
+		"NIST", "OWASP", "Grundschutz", "ISO 27001", "ISO/IEC 27001",
+		"CSA CCM", "Cloud Controls Matrix", "CIS Benchmark", "CIS Control",
+	}
 	foreignMarkers = []string{"RevDSG", "fedlex", "(CH)"}
 	deMarkers = []string{"BDSG", "DSK", "BfDI", "BFDI", "BayLfD", "Baylfb", "BSI"}
@@ -48,6 +54,8 @@ func classifyAuthority(r LegalSearchResult) authorityInfo {
 	switch {
 	case containsAny(hay, foreignMarkers):
 		return authorityInfo{weight: 0, sourceClass: "foreign_law", jurisdiction: "CH"}
+	case r.Category == "standard" || containsAny(hay, standardMarkers):
+		return authorityInfo{weight: 80, sourceClass: "technical_standard", jurisdiction: jur}
 	case r.Category == "guidance" || containsAny(hay, guidanceMarkers):
 		return authorityInfo{weight: 70, sourceClass: "supervisory_guidance", jurisdiction: jur}
 	case r.Category == "regulation" || r.Category == "eu_recht" || normPattern.MatchString(r.ArticleLabel):
@@ -61,6 +69,8 @@ func sourceClassFromWeight(w int) string {
 	switch {
 	case w >= 100:
 		return "binding_law"
+	case w >= 80:
+		return "technical_standard"
 	case w >= 70:
 		return "supervisory_guidance"
 	case w <= 0: