Compare commits
8 Commits
| SHA1 |
|---|
| 5f8009e844 |
| 079bb56922 |
| 24bb449a79 |
| 8af9584d09 |
| ce6b4c58e3 |
| f6d018234b |
| 32e45f0797 |
| 9d79cf1576 |
+2 -3
```diff
@@ -130,11 +130,10 @@ rsync -avz --exclude node_modules --exclude .next --exclude .git \
 
 **breakpilot-core MUSS laufen!** Dieses Projekt nutzt Core-Services:
 - Valkey (Session-Cache)
+- Vault (Secrets)
 - RAG-Service (Vektorsuche fuer Compliance-Dokumente)
 - Nginx (Reverse Proxy)
 
-Secrets liegen in Infisical (`secrets.meghsakha.com`); die Projektverknuepfung steht in `.infisical.json`. Lokal mit `infisical run --env=dev -- docker compose up` (oder `make dev`) starten — `.env`/`.env.local` werden nicht mehr verwendet.
-
 **Externe Services (Production):**
 - PostgreSQL 17 (sslmode=require) — Schemas: `compliance`, `public`
 - Qdrant @ `qdrant-dev.breakpilot.ai` (HTTPS, API-Key)
```
```diff
@@ -317,7 +316,7 @@ ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/brea
 
 ### 5. Sensitive Dateien
 **NIEMALS aendern oder committen:**
-- `.env`, `.env.local`, Infisical-Tokens, SSL-Zertifikate
+- `.env`, `.env.local`, Vault-Tokens, SSL-Zertifikate
 - `*.pdf`, `*.docx`, kompilierte Binaries, grosse Medien
 
 ---
```
```diff
@@ -92,7 +92,7 @@ Wenn Hochrisiko:
 
 - [ ] **Transit:** TLS 1.3 für alle Verbindungen
 - [ ] **Rest:** Datenbank-Verschlüsselung
-- [ ] **Secrets:** Infisical (`secrets.meghsakha.com`) für Credentials
+- [ ] **Secrets:** Vault für Credentials
 
 ### Zugriffskontrollen
 
```
```diff
@@ -136,14 +136,12 @@ jobs:
     runs-on: docker
     needs: detect-changes
     if: github.event_name == 'pull_request' && needs.detect-changes.outputs.sdk == 'true'
-    container: golangci/golangci-lint:v1.64.8-alpine
+    container: golangci/golangci-lint:v1.62-alpine
     steps:
       - name: Checkout
         run: |
           apk add --no-cache git
-          # Full clone so `main` is a local ref — new-from-merge-base needs the merge base.
-          git clone ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
-          git checkout ${GITHUB_HEAD_REF:-${GITHUB_REF_NAME}}
+          git clone --depth 1 --branch ${GITHUB_HEAD_REF:-${GITHUB_REF_NAME}} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
       - name: Lint ai-compliance-sdk
         run: |
           [ -d "ai-compliance-sdk" ] || exit 0
```
```diff
@@ -1,5 +0,0 @@
-{
-  "workspaceId": "996bda36-9e01-4071-ae8d-69a9f9ff5a23",
-  "defaultEnvironment": "",
-  "gitBranchToEnvironmentMapping": null
-}
```
```diff
@@ -1,157 +0,0 @@
-# Infisical Setup for Local Development
-
-This is the per-developer onboarding for accessing the `breakpilot-compliance` secrets while developing locally. Once this is done, **everything you launch through `make dev` (or `infisical run …`) gets the dev secrets injected as environment variables** — including any Claude Code session that spawns those commands.
-
-Secrets live in the self-hosted Infisical instance at **`secrets.meghsakha.com`**. The project link is committed in `.infisical.json`, so you don't need to know the project ID.
-
----
-
-## 1. Install the Infisical CLI
-
-**macOS (recommended):**
-
-```bash
-brew install infisical/get-cli/infisical
-```
-
-**Other platforms / manual install:**
-
-See <https://infisical.com/docs/cli/overview>. Verify with:
-
-```bash
-infisical --version
-# infisical version 0.43.x (or newer)
-```
-
----
-
-## 2. Log in to the self-hosted instance
-
-```bash
-infisical login --domain https://secrets.meghsakha.com
-```
-
-This opens a browser for SSO. The login is persisted to your OS keychain — you only do this once per machine.
-
-Sanity check:
-
-```bash
-cd ~/projects/breakpilot-compliance # wherever you cloned the repo
-infisical --domain https://secrets.meghsakha.com secrets --env=dev
-```
-
-You should see a table of secret names + values. If you get an auth error, re-run `infisical login`.
-
----
-
-## 3. Verify the project link
-
-The repo already contains `.infisical.json` pointing at the `breakpilot-compliance` project:
-
-```bash
-cat .infisical.json
-# { "workspaceId": "996bda36-9e01-4071-ae8d-69a9f9ff5a23", ... }
-```
-
-If the file is missing (rare — only if you reset the repo), recreate it:
-
-```bash
-infisical init --domain https://secrets.meghsakha.com
-```
-
-Pick the `breakpilot-compliance` project from the picker.
-
----
-
-## 4. Launch the stack
-
-```bash
-make dev
-```
-
-This runs `infisical run --env=dev -- docker compose up`. Every service in the compose stack sees its secrets as normal env vars — no `.env` file ever touches disk.
-
-Other targets:
-
-| Target | What it does |
-|--------|--------------|
-| `make dev-build` | Same as `make dev` but rebuilds images first |
-| `make dev-down` | Stop the stack (no secrets needed) |
-| `make dev-logs` | Tail logs |
-| `make dev-ps` | List running containers |
-| `make secrets` | Print all secrets in `dev` (read-only) |
-| `make secrets-set KEY=FOO VALUE=bar` | Add or update a secret in `dev` |
-
-To target a different environment:
-
-```bash
-make dev ENV=staging
-make secrets ENV=prod
-```
-
----
-
-## 5. Using secrets from Claude Code
-
-When Claude Code runs commands in this repo via its Bash tool, the commands inherit your shell's environment. Two patterns:
-
-**Pattern A — let Claude launch the stack normally**
-
-Claude just runs `make dev`. The Infisical CLI inside that command resolves secrets at run time and passes them to docker compose. Claude doesn't see plaintext secrets in its context, but the running services do.
-
-**Pattern B — let Claude run a one-off script with secrets**
-
-If Claude needs to execute a Python/Go script that requires secrets, wrap the command:
-
-```bash
-infisical run --env=dev -- python scripts/some_one_off.py
-```
-
-This works for any subprocess: pytest, alembic, go run, npm scripts. If Claude proposes a command that reads env vars and runs raw, ask it to wrap it in `infisical run --env=dev --` first.
-
-**What Claude should not do:**
-
-- `infisical export --env=dev > .env` — defeats the whole point and the `.gitignore` will still try to keep the file out.
-- `infisical secrets get KEY --env=dev --raw` and pasting the value into a code edit — secrets must stay out of the repo.
-
-If you want Claude to never accidentally dump secrets, add this to your `.claude/settings.json` permissions (project-level or user-level):
-
-```json
-{
-  "permissions": {
-    "deny": [
-      "Bash(infisical export*)",
-      "Bash(infisical secrets get*)"
-    ]
-  }
-}
-```
-
----
-
-## Troubleshooting
-
-| Symptom | Fix |
-|---------|-----|
-| `please either run infisical init or pass --projectId` | `.infisical.json` is missing or unreadable — re-run `infisical init` |
-| `unauthorized` / `please log in` | Re-run `infisical login --domain https://secrets.meghsakha.com` |
-| `make dev` says secret is empty | Check the name in `make secrets` matches what docker-compose expects, then update the service config or rename the secret in Infisical |
-| Browser SSO doesn't open | Use `infisical login --domain https://secrets.meghsakha.com --method=user` and paste the URL manually |
-
----
-
-## What the dev env contains
-
-Run `make secrets` to see the live list. As of this writing the dev env includes (at minimum):
-
-- `BREAKPILOT_DB_PASSWORD`
-- `BREAKPILOT_QDRANT_API_KEY`
-- `LITELLM_API_KEY`
-
-Every other variable in `.env.example` either has a sane default in `docker-compose.yml` or needs to be added to Infisical. To add one:
-
-```bash
-make secrets-set KEY=ANTHROPIC_API_KEY VALUE=sk-ant-xxxx
-```
-
-Or via the web UI: <https://secrets.meghsakha.com>.
```
```diff
@@ -1,57 +0,0 @@
-# breakpilot-compliance — developer workflow
-#
-# Secrets are managed in Infisical (secrets.meghsakha.com). The project
-# link lives in .infisical.json. To get started:
-# 1) infisical login --domain https://secrets.meghsakha.com (once per machine)
-# 2) make dev
-#
-# .env / .env.local are NOT used in this repo anymore. Anything that needs
-# secrets MUST be launched through `infisical run` so the values come from
-# the secrets store instead of disk.
-
-INFISICAL ?= infisical
-INFISICAL_DOMAIN ?= https://secrets.meghsakha.com
-ENV ?= dev
-
-INFISICAL_RUN := $(INFISICAL) --domain $(INFISICAL_DOMAIN) run --env=$(ENV) --
-INFISICAL_SECRETS := $(INFISICAL) --domain $(INFISICAL_DOMAIN) secrets --env=$(ENV)
-
-.PHONY: help dev dev-build dev-down dev-logs dev-ps secrets secrets-set check-loc
-
-help:
-	@echo "Targets:"
-	@echo " dev Start the full compose stack with secrets injected from Infisical"
-	@echo " dev-build Same as dev, but force a rebuild first"
-	@echo " dev-down Stop the compose stack (no secrets needed)"
-	@echo " dev-logs Tail logs from all services"
-	@echo " dev-ps Show running containers"
-	@echo " secrets List all secrets in the current env ($(ENV))"
-	@echo " secrets-set Set a secret (KEY=... VALUE=...)"
-	@echo " check-loc Run the 500-line LOC guard"
-
-dev:
-	$(INFISICAL_RUN) docker compose up
-
-dev-build:
-	$(INFISICAL_RUN) docker compose up --build
-
-dev-down:
-	docker compose down
-
-dev-logs:
-	docker compose logs -f
-
-dev-ps:
-	docker compose ps
-
-secrets:
-	$(INFISICAL_SECRETS)
-
-secrets-set:
-	@if [ -z "$(KEY)" ] || [ -z "$(VALUE)" ]; then \
-		echo "Usage: make secrets-set KEY=MY_KEY VALUE=my_value"; exit 1; \
-	fi
-	$(INFISICAL) --domain $(INFISICAL_DOMAIN) secrets set $(KEY)=$(VALUE) --env=$(ENV)
-
-check-loc:
-	bash scripts/check-loc.sh
```
````diff
@@ -42,26 +42,23 @@ All containers share the external `breakpilot-network` Docker network and depend
 
 ## Quick Start
 
-**Prerequisites:** Docker, Go 1.24+, Python 3.12+, Node.js 20+, [Infisical CLI](https://infisical.com/docs/cli/overview)
+**Prerequisites:** Docker, Go 1.24+, Python 3.12+, Node.js 20+
 
 ```bash
 git clone ssh://git@gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-compliance.git
 cd breakpilot-compliance
 
-# One-time per machine: log in to the self-hosted Infisical instance
-infisical login --domain https://secrets.meghsakha.com
+# Copy and populate secrets (never commit .env)
+cp .env.example .env
 
-# Start the full stack with secrets injected from Infisical (env=dev)
-make dev
+# Start all services
+docker compose up -d
 ```
 
-Secrets are pulled from Infisical (`secrets.meghsakha.com`) at runtime; `.env` files are not used. See [INFISICAL_SETUP.md](./INFISICAL_SETUP.md) for full onboarding, and `make help` for the rest of the targets (`dev-build`, `dev-down`, `secrets`, `secrets-set`).
-
 For the Orca/Hetzner production target (x86_64), use the override:
 
 ```bash
-make dev ENV=prod # or:
-infisical run --env=prod -- docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d
+docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d
 ```
 
 ---
````
```diff
@@ -35,25 +35,6 @@ Dies ist ein **Legal RAG**. Eine falsch zitierte Fundstelle ist schlimmer als ga
 - **Interne IDs** (Control-IDs wie SEC-xxxx, MC-/M-Nummern) gehoeren NICHT in die Nutzerantwort
   als Hauptaussage — fuehre die Pflicht im Klartext, eine ID hoechstens in Klammern nachgestellt.
 
-## Korpus-Autoritaet & Aktualitaet — der Kontext schlaegt dein Gedaechtnis (KRITISCH)
-Gesetze aendern sich nach deinem Trainingsstand. Der bereitgestellte RAG-/Controls-Kontext bildet
-den AKTUELLEN Rechtsstand ab — dein Trainingswissen kann veraltet sein. Diese Regel gilt fuer
-FAKTEN, nicht nur fuer Fundstellen (ergaenzt **Quellentreue**).
-- Rechtliche **Fakten** (Schwellenwerte, Fristen, Zahlen, ob/ab-wann eine Pflicht gilt,
-  Zustaendigkeiten) nimmst du AUSSCHLIESSLICH aus dem bereitgestellten Kontext. Dein Trainingswissen
-  dient nur fuer Sprache, Struktur und Schlussfolgerung — **niemals als Rechtsquelle**.
-- Steht ein gefragter Fakt NICHT im Kontext: gib KEINE aus dem Gedaechtnis erinnerte Zahl/Frist/
-  Schwelle aus — auch nicht beilaeufig im Fliesstext ohne Fundstelle. Sag offen, dass du ihn aus
-  deinen geprueften Quellen nicht belegen kannst, nenne Pflicht/Thema allgemein, und biete den
-  naechsten Schritt an (gezielt nachschlagen / mit DSB oder Anwalt verifizieren).
-- **Konflikt-Transparenz**: Weicht der Kontext von dem ab, was dir "gelaeufig" vorkommt, gewinnt
-  IMMER der Kontext. Mach es ruhig transparent — z.B. "Die aktuelle Quelle nennt 20; eine evtl.
-  aeltere, gelaeufige Annahme (10) gilt hier nicht."
-- **Co-Pilot-Ton, keine Roboter-Verweigerung**: formuliere "Aus meinen geprueften Quellen kann ich
-  X nicht belegen — ich kann es gezielt nachschlagen, oder du klaerst es mit deinem DSB/Anwalt"
-  statt eines harten "Nein". Du bleibst hilfreicher Begleiter, gibst dem Nutzer aber keine
-  ungesicherte Rechtsangabe als Tatsache mit.
-
 ## Kompetenzbereich
 - DSGVO Art. 1-99 + Erwaegsgruende
 - BDSG (Bundesdatenschutzgesetz)
```
```diff
@@ -80,7 +80,7 @@ export async function POST(request: NextRequest) {
   let systemContent = soulPrompt || FALLBACK_SYSTEM_PROMPT
   if (validCountry) systemContent += countryBlock(validCountry)
   if (ragContext) {
-    systemContent += `\n\n## Relevanter Kontext aus dem RAG-System (deine EINZIGEN Rechtsquellen)\n\nDies sind deine einzigen zulaessigen Rechtsquellen. Triff keine konkrete Rechtsaussage (Zahl, Frist, Schwelle, Pflicht, Fundstelle), die nicht hier oder im Controls-Block belegt ist — sonst sage offen, dass du sie aus deinen Quellen nicht belegen kannst. Verweise in deiner Antwort auf die jeweilige Quelle:\n\n${ragContext}`
+    systemContent += `\n\n## Relevanter Kontext aus dem RAG-System\n\nNutze die folgenden Quellen fuer deine Antwort. Verweise in deiner Antwort auf die jeweilige Quelle:\n\n${ragContext}`
   }
   if (controlsContext) systemContent += `\n\n${controlsContext}`
   systemContent += `\n\n## Aktueller SDK-Schritt\nDer Nutzer befindet sich im SDK-Schritt: ${currentStep}`
```
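The POST handler builds its system prompt by concatenating sections in a fixed order. The sketch below pulls that assembly into a pure helper so the section order can be unit-tested. `buildSystemContent` and `PromptParts` are hypothetical names, and the sketch assumes all inputs are already resolved. In particular, `countryBlock` is a precomputed string here, whereas the route calls a `countryBlock(validCountry)` function. The RAG header uses the wording from the `+` side of the hunk above.

```typescript
// Hypothetical, already-resolved inputs. In the route, the country block comes
// from countryBlock(validCountry) and the fallback is FALLBACK_SYSTEM_PROMPT.
interface PromptParts {
  soulPrompt?: string
  fallbackPrompt: string
  countryBlock?: string
  ragContext?: string
  controlsContext?: string
  currentStep: string
}

// Concatenates the sections in the same order as the POST handler:
// soul prompt (or fallback), country block, RAG sources, controls, SDK step.
function buildSystemContent(p: PromptParts): string {
  // `||` also falls back when soulPrompt is an empty string, as in the route.
  let systemContent = p.soulPrompt || p.fallbackPrompt
  if (p.countryBlock) systemContent += p.countryBlock
  if (p.ragContext) {
    // Header wording taken from the "+" side of the diff.
    systemContent +=
      `\n\n## Relevanter Kontext aus dem RAG-System\n\n` +
      `Nutze die folgenden Quellen fuer deine Antwort. ` +
      `Verweise in deiner Antwort auf die jeweilige Quelle:\n\n${p.ragContext}`
  }
  if (p.controlsContext) systemContent += `\n\n${p.controlsContext}`
  systemContent += `\n\n## Aktueller SDK-Schritt\nDer Nutzer befindet sich im SDK-Schritt: ${p.currentStep}`
  return systemContent
}
```

Keeping the function pure (strings in, string out) lets a test check that the RAG block lands before the SDK step, without mocking the request.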
```diff
@@ -46,28 +46,6 @@ export interface CorpusOverview {
   totals: { documents: number; catalog_sources: number }
 }
 
-// --- Ingested legal-corpus structure (from the vector store, via the Go SDK).
-// Shows WHAT each eur-lex act consists of (articles/annexes/recitals), so the
-// ingested corpus is not a black box for developers. ---
-export interface LegalActStructure {
-  regulation_short: string
-  regulation_name: string
-  articles: number
-  annexes: number
-  recitals: number
-  chunks: number
-}
-
-export interface LegalCorpus {
-  regulations: LegalActStructure[]
-  totals: {
-    regulations: number
-    articles: number
-    annexes: number
-    recitals: number
-  }
-}
-
 // --- Korpus-Dokumente: gruppieren nach Art (Gesetz/Leitfaden/Standard/Urteil)
 // + Herausgeber-Familie (DSK, EDPB, OWASP, NIST …). Deterministisch, pure. ---
 interface DocCat {
```
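The removed `LegalCorpus` interface carries both per-act counts and a `totals` block. The hypothetical `summarizeLegalCorpus` helper below derives `totals` from `regulations`. It assumes the totals are plain sums over the acts. The diff does not show how the Go SDK actually computes them, so this only illustrates how the two parts of the shape relate.

```typescript
// Shapes copied from the interfaces removed in the hunk above.
interface LegalActStructure {
  regulation_short: string
  regulation_name: string
  articles: number
  annexes: number
  recitals: number
  chunks: number
}

interface LegalCorpus {
  regulations: LegalActStructure[]
  totals: { regulations: number; articles: number; annexes: number; recitals: number }
}

// Hypothetical helper: builds `totals` by summing the per-act counts.
// Chunks are intentionally not totalled, matching the removed `totals` type.
function summarizeLegalCorpus(regulations: LegalActStructure[]): LegalCorpus {
  const totals = { regulations: regulations.length, articles: 0, annexes: 0, recitals: 0 }
  for (const r of regulations) {
    totals.articles += r.articles
    totals.annexes += r.annexes
    totals.recitals += r.recitals
  }
  return { regulations, totals }
}
```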
```diff
@@ -3,7 +3,6 @@ import Link from 'next/link'
 import {
   type UseCaseRow,
   type CorpusOverview,
-  type LegalCorpus,
   licenseTierBadgeClass,
   commercialBadgeClass,
   groupUseCases,
```
```diff
@@ -12,46 +11,28 @@ import {
 
 const BACKEND_URL =
   process.env.COMPLIANCE_BACKEND_URL || 'http://backend-compliance:8002'
-// The legal-corpus structure comes from the Go SDK (it owns the vector store).
-const SDK_URL = process.env.SDK_URL || 'http://ai-compliance-sdk:8090'
 
 export const dynamic = 'force-dynamic'
 
-// Fetched from the SDK and isolated in its own try/catch so a vector-store
-// hiccup degrades to "no structure shown" instead of blanking the whole page.
-async function fetchLegalCorpus(): Promise<LegalCorpus | null> {
-  try {
-    const res = await fetch(`${SDK_URL}/sdk/v1/rag/legal-corpus`, {
-      cache: 'no-store',
-    })
-    return res.ok ? await res.json() : null
-  } catch {
-    return null
-  }
-}
-
 async function getData(): Promise<{
   useCases: UseCaseRow[]
   corpus: CorpusOverview | null
-  legalCorpus: LegalCorpus | null
 }> {
   try {
-    const [ucRes, corpusRes, legalCorpus] = await Promise.all([
+    const [ucRes, corpusRes] = await Promise.all([
       fetch(`${BACKEND_URL}/api/compliance/v1/controls/use-cases`, {
         cache: 'no-store',
       }),
       fetch(`${BACKEND_URL}/api/compliance/v1/controls/corpus`, {
         cache: 'no-store',
       }),
-      fetchLegalCorpus(),
     ])
     return {
       useCases: ucRes.ok ? await ucRes.json() : [],
       corpus: corpusRes.ok ? await corpusRes.json() : null,
-      legalCorpus,
     }
   } catch {
-    return { useCases: [], corpus: null, legalCorpus: null }
+    return { useCases: [], corpus: null }
   }
 }
 
```
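The removed `fetchLegalCorpus()` wraps its request in its own `try/catch`. As a result, an unreachable SDK yields `null` instead of rejecting the surrounding `Promise.all` and blanking the page. The sketch below is a generic version of that pattern. It assumes a hypothetical injectable `Fetcher` type, standing in for the global `fetch`, so it runs without a network or DOM typings.

```typescript
// Hypothetical minimal stand-in for the Fetch API. The real page code calls
// the global fetch() directly.
type Fetcher = (
  url: string,
  init?: { cache?: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>

// Resolves to the parsed JSON body, or to null on a non-2xx status, a rejected
// request, or an unparsable body. It never rejects, so putting it inside
// Promise.all cannot discard the sibling requests' results.
async function fetchJsonOrNull<T>(fetcher: Fetcher, url: string): Promise<T | null> {
  try {
    const res = await fetcher(url, { cache: 'no-store' })
    return res.ok ? ((await res.json()) as T) : null
  } catch {
    return null
  }
}
```

The trade-off is that callers cannot tell "SDK down" apart from "SDK returned an error status". Both collapse to `null`, which matches the "no structure shown" behaviour described in the removed comment.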
```diff
@@ -65,7 +46,7 @@ function Stat({ label, value }: { label: string; value: string | number }) {
 }
 
 export default async function CoveragePage() {
-  const { useCases, corpus, legalCorpus } = await getData()
+  const { useCases, corpus } = await getData()
   const groups = groupUseCases(useCases)
   const totalRelevant = useCases.reduce((s, u) => s + u.atom_relevant, 0)
   const totalAtoms = useCases.reduce((s, u) => s + u.atom_total, 0)
```
```diff
@@ -240,67 +221,6 @@ export default async function CoveragePage() {
         </div>
       </section>
 
-      {legalCorpus?.regulations?.length ? (
-        <section className="space-y-2">
-          <h2 className="text-lg font-semibold text-gray-900">
-            Ingestierter Rechtskorpus – Struktur ({legalCorpus.totals.regulations}{' '}
-            Rechtsakte)
-          </h2>
-          <p className="text-xs text-gray-500">
-            Woraus jeder ingestierte eur-lex-Rechtsakt tatsächlich besteht:
-            Artikel (§), Anhänge, Erwägungsgründe und retrievbare Chunks — direkt
-            aus dem Vektorspeicher, damit kein Black-Box-Korpus entsteht.
-          </p>
-          <div className="overflow-auto rounded-lg border border-gray-200">
-            <table className="min-w-full divide-y divide-gray-200 text-sm">
-              <thead className="bg-gray-50 text-left text-xs uppercase text-gray-500">
-                <tr>
-                  <th className="px-4 py-2">Rechtsakt</th>
-                  <th className="px-4 py-2 text-right">Artikel (§)</th>
-                  <th className="px-4 py-2 text-right">Anhänge</th>
-                  <th className="px-4 py-2 text-right">Erwägungsgründe</th>
-                  <th className="px-4 py-2 text-right">Chunks</th>
-                </tr>
-              </thead>
-              <tbody className="divide-y divide-gray-100 bg-white">
-                {legalCorpus.regulations.map((r) => (
-                  <tr key={r.regulation_short}>
-                    <td className="px-4 py-2 text-gray-900">
-                      <span className="font-medium">{r.regulation_short}</span>
-                      {r.regulation_name !== r.regulation_short ? (
-                        <span className="ml-2 text-xs text-gray-500">
-                          {r.regulation_name}
-                        </span>
-                      ) : null}
-                    </td>
-                    <td className="px-4 py-2 text-right font-semibold">
-                      {r.articles.toLocaleString('de-DE')}
-                    </td>
-                    <td className="px-4 py-2 text-right">
-                      {r.annexes > 0 ? (
-                        r.annexes.toLocaleString('de-DE')
-                      ) : (
-                        <span className="text-gray-300">—</span>
-                      )}
-                    </td>
-                    <td className="px-4 py-2 text-right text-gray-500">
-                      {r.recitals > 0 ? (
-                        r.recitals.toLocaleString('de-DE')
-                      ) : (
-                        <span className="text-gray-300">—</span>
-                      )}
-                    </td>
-                    <td className="px-4 py-2 text-right text-gray-500">
-                      {r.chunks.toLocaleString('de-DE')}
-                    </td>
-                  </tr>
-                ))}
-              </tbody>
-            </table>
-          </div>
-        </section>
-      ) : null}
-
       {corpus?.license_catalog?.length ? (
         <section className="space-y-2">
           <h2 className="text-lg font-semibold text-gray-900">
```
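In the removed table, article and chunk counts are always formatted with `toLocaleString('de-DE')`. Annexes and recitals fall back to a grey placeholder span when the count is not positive. The hypothetical `formatCount` helper below captures that cell rule. It returns `null` wherever the JSX would render the placeholder, which leaves the markup choice to the caller.

```typescript
// Hypothetical helper mirroring the removed table cells. Articles and chunks are
// always printed; annexes and recitals use a placeholder when the count is not
// positive. A null return means "render the grey placeholder span instead".
function formatCount(n: number, placeholderWhenNotPositive = false): string | null {
  // Same condition as `r.annexes > 0 ? ... : placeholder` in the JSX.
  if (placeholderWhenNotPositive && !(n > 0)) return null
  // German digit grouping, as in the JSX (uses ICU locale data for de-DE).
  return n.toLocaleString('de-DE')
}
```

With full ICU data, which Node.js ships by default, `formatCount(1234)` returns `"1.234"`.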
```diff
@@ -55,7 +55,8 @@ linters-settings:
     rules:
       - name: exported
         arguments:
-          - disableStutteringCheck
+          - checkPrivateReceivers: false
+          - disableStutteringCheck: true
       - name: error-return
       - name: increment-decrement
       - name: var-declaration
```
```diff
@@ -82,6 +83,6 @@ issues:
   max-issues-per-linter: 50
   max-same-issues: 5
 
-  # New code only: lint lines changed vs main, so pre-existing debt doesn't fail CI.
-  # Needs the go-lint job to clone with a local `main` ref (see .gitea/workflows/ci.yaml).
-  new-from-merge-base: main
+  # New code only: don't fail on pre-existing issues in files we haven't touched.
+  # Remove this once a clean baseline is established.
+  new: false
```
```diff
@@ -1,24 +0,0 @@
-// Control-Mapping: CRA Annex I -> OWASP ASVS 5.0. Eine Zeile = ein Mapping (Schema: ControlMapping).
-// Reviewt 2026-06-25 (benjamin): 7 accepted, 13 rejected. accepted = Audit-Wahrheit (Advisor nutzt acceptedOnly).
-// rejected bleiben als Audit-Spur ("warum verworfen"). KEIN confidence — kuratiert = fachliche Feststellung.
-// Architekturbeweis: CRA -> OWASP fuer AppSec/Auth/Crypto/Logging; Ops/Update/Attack-Surface/Integritaet -> NIST/BSI.
-{"source_norm": "CRA Annex I Part I (2)(c) — Schutz vor unbefugtem Zugriff", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V6.3.1", "mapping_type": "supports", "mapping_status": "accepted", "provenance": "human_curated", "rationale": "V6 = Authentication.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V6 = Authentication, sauberer Treffer fuer Zugriffsschutz/Authentisierung.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(c) — Schutz vor unbefugtem Zugriff", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V6.1.1", "mapping_type": "supports", "mapping_status": "accepted", "provenance": "human_curated", "rationale": "V6 = Authentication.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V6 = Authentication, sauberer Treffer fuer Zugriffsschutz/Authentisierung.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(d) — Vertraulichkeit / Verschluesselung", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V11.2.1", "mapping_type": "supports", "mapping_status": "accepted", "provenance": "human_curated", "rationale": "V11 = Cryptography.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "Korrektur von V14: V11 = Cryptography, richtiger Bereich fuer Verschluesselung.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(d) — Vertraulichkeit / Verschluesselung", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V11.7.1", "mapping_type": "supports", "mapping_status": "accepted", "provenance": "human_curated", "rationale": "V11.7 = Key Management.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "Korrektur von V14: V11.7 = Key Management fuer Verschluesselung/Schluesselverwaltung.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(k) — Sicherheitsrelevante Ereignisse / Logging", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V16.3.3", "mapping_type": "supports", "mapping_status": "accepted", "provenance": "human_curated", "rationale": "V16 = Security Logging.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V16 = Logging, sauberer Treffer fuer sicherheitsrelevante Ereignisse.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(k) — Sicherheitsrelevante Ereignisse / Logging", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V16.3.4", "mapping_type": "supports", "mapping_status": "accepted", "provenance": "human_curated", "rationale": "V16 = Security Logging.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V16 = Logging, sauberer Treffer fuer sicherheitsrelevante Ereignisse.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(k) — Sicherheitsrelevante Ereignisse / Logging", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V16.1.1", "mapping_type": "supports", "mapping_status": "accepted", "provenance": "human_curated", "rationale": "V16 = Security Logging.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V16 = Logging, sauberer Treffer fuer sicherheitsrelevante Ereignisse.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(c) — Schutz vor unbefugtem Zugriff", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V14.2.4", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V14 = Config, kein Auth — verworfen.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(d) — Vertraulichkeit / Verschluesselung", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V14.2.4", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V14 = Config, Crypto gehoert zu V11 — verworfen.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(d) — Vertraulichkeit / Verschluesselung", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V14.3.2", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V14 = Config, Crypto gehoert zu V11 — verworfen.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(d) — Vertraulichkeit / Verschluesselung", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V14.2.3", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "V14 = Config, Crypto gehoert zu V11 — verworfen.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(e) — Integritaet", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V14.2.4", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(e) — Integritaet", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V1.2.4", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
-{"source_norm": "CRA Annex I Part I (2)(e) — Integritaet", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V6.1.1", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
```
|
|
||||||
{"source_norm": "CRA Annex I Part I (2)(l) — Sichere Updates", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V14.2.4", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
|
|
||||||
{"source_norm": "CRA Annex I Part I (2)(l) — Sichere Updates", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V2.4.1", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
|
|
||||||
{"source_norm": "CRA Annex I Part I (2)(l) — Sichere Updates", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V6.1.1", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
|
|
||||||
{"source_norm": "CRA Annex I Part I (2)(i) — Angriffsflaeche minimieren", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V6.1.1", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
|
|
||||||
{"source_norm": "CRA Annex I Part I (2)(i) — Angriffsflaeche minimieren", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V15.3.3", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
|
|
||||||
{"source_norm": "CRA Annex I Part I (2)(i) — Angriffsflaeche minimieren", "source_role": "operational_requirement", "target_framework": "OWASP ASVS", "target_control": "V8.2.4", "mapping_type": "related", "mapping_status": "rejected", "provenance": "human_curated", "rationale": "Retriever-Kandidat.", "reviewed_by": "benjamin", "review_date": "2026-06-25", "review_reason": "OWASP ASVS ist hier nicht der passende Zielstandard. Mapping ueber NIST/BSI erforderlich.", "version": "2026-06-25"}
|
|
||||||
@@ -1,16 +0,0 @@
-// Evidence-Requirements je OWASP-ASVS-Control (Schema: EvidenceRequirement). Eine Zeile = eine geforderte Evidenz.
-// Autoriert/kuratiert (nicht Retriever). Der Advisor kann eine CRA-Anforderung erst dann als erfuellt melden,
-// wenn die required Evidenzen der gemappten, accepted Controls vorliegen + frisch genug sind.
-// Stand 2026-06-25, Basis: die 7 accepted CRA->OWASP-Mappings (Auth V6, Crypto V11, Logging V16).
-{"framework": "OWASP ASVS", "control": "V6.3.1", "evidence_type": "config_export", "evidence_source": "github", "freshness_requirement": "per_release", "required": true, "rationale": "IAM-/Zugriffskonfiguration als Nachweis der Authentisierungs-Anforderung.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V6.3.1", "evidence_type": "test_report", "evidence_source": "ci", "freshness_requirement": "per_release", "required": true, "rationale": "Automatisierter Zugriffstest (CI) belegt funktionierende Zugriffskontrolle.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V6.3.1", "evidence_type": "pentest", "evidence_source": "manual_upload", "freshness_requirement": "annually", "required": false, "rationale": "Jaehrlicher PenTest der Authentisierung — vertieft, aber nicht Pflicht je Release.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V6.1.1", "evidence_type": "config_export", "evidence_source": "github", "freshness_requirement": "per_release", "required": true, "rationale": "Rollenmodell/Auth-Architektur als Nachweis.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V11.2.1", "evidence_type": "config_export", "evidence_source": "github", "freshness_requirement": "per_release", "required": true, "rationale": "Krypto-Konfiguration (zugelassene Algorithmen) als Nachweis der Verschluesselung.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V11.2.1", "evidence_type": "sbom", "evidence_source": "ci", "freshness_requirement": "per_release", "required": true, "rationale": "SBOM weist die eingesetzten Krypto-Bibliotheken/-Versionen nach.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V11.7.1", "evidence_type": "policy", "evidence_source": "manual_upload", "freshness_requirement": "annually", "required": true, "rationale": "Key-Management-Policy (Rotation, Aufbewahrung) als organisatorischer Nachweis.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V11.7.1", "evidence_type": "config_export", "evidence_source": "github", "freshness_requirement": "per_release", "required": true, "rationale": "Konfiguration der Schluesselverwaltung als technischer Nachweis.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V16.3.3", "evidence_type": "audit_log", "evidence_source": "ci", "freshness_requirement": "continuous", "required": true, "rationale": "Security-Audit-Logs belegen, dass sicherheitsrelevante Ereignisse protokolliert werden.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V16.3.3", "evidence_type": "config_export", "evidence_source": "github", "freshness_requirement": "per_release", "required": true, "rationale": "Logging-Konfiguration als Nachweis der erfassten Ereignisarten.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V16.3.4", "evidence_type": "audit_log", "evidence_source": "ci", "freshness_requirement": "continuous", "required": true, "rationale": "Security-Audit-Logs.", "version": "2026-06-25"}
-{"framework": "OWASP ASVS", "control": "V16.1.1", "evidence_type": "config_export", "evidence_source": "github", "freshness_requirement": "per_release", "required": true, "rationale": "Logging-Architektur-Konfiguration als Nachweis.", "version": "2026-06-25"}
@@ -211,13 +211,6 @@ func (h *IACEHandler) InitializeProject(c *gin.Context) {
 }
 
 for _, cat := range mp.HazardCats {
-// Native cyber/AI categories (frontend groups I+J) belong to the
-// CRA module, not the traditional CE (ISO 12100) hazard log.
-// Enforced centrally here so it holds for EVERY project.
-if isCyberSecurityCategory(cat) {
-fmt.Printf("CYBER-SKIP: cat=%s pattern=%s — routed to CRA module\n", cat, mp.PatternID)
-continue
-}
 maxForCat := categoryHazardCap(cat, len(comps))
 if catCount[cat] >= maxForCat {
 continue
@@ -298,10 +291,6 @@ func (h *IACEHandler) InitializeProject(c *gin.Context) {
 if len(mp.SuggestedMeasureIDs) > 0 {
 hazardPatternMeasures[hz.ID] = mp.SuggestedMeasureIDs
 }
-// E1: one hazard per pattern — keep only the primary (first
-// eligible) category; a secondary category would be the same
-// scenario+zone under a different label (cross-category duplicate).
-break
 }
 }
 }
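Taken together, the two `InitializeProject` hunks describe a per-pattern loop with three steps:

- skip native cyber/AI categories;
- skip categories whose per-category cap is already reached;
- stop after the first category that yields a hazard (rule E1).

The sketch below condenses that loop into a pure function. The shortened category set, the `capFor` callback (standing in for `categoryHazardCap`) and the function name are hypothetical. The other checks between the two hunks, about 80 lines not shown in the diff, are left out.

```go
package main

// nativeCyber is a shortened, hypothetical stand-in for
// nativeCyberSecurityCategories.
var nativeCyber = map[string]bool{"unauthorized_access": true, "model_drift": true}

// primaryCategory returns the category a pattern's single hazard is filed
// under: the first category that is neither a native cyber/AI topic (routed to
// CRA) nor already at its cap. ok is false when no category qualifies.
func primaryCategory(cats []string, counts map[string]int, capFor func(cat string) int) (cat string, ok bool) {
	for _, c := range cats {
		if nativeCyber[c] {
			continue // CYBER-SKIP: belongs to the CRA module
		}
		if counts[c] >= capFor(c) {
			continue // per-category cap reached
		}
		return c, true // E1: one hazard per pattern, first eligible category wins
	}
	return "", false
}
```

Returning on the first eligible category is what prevents the cross-category duplicate the E1 comment describes: the same scenario and zone filed again under a second label.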
@@ -1,45 +0,0 @@
-package handlers
-
-// Safety/Security separation for the IACE hazard log.
-//
-// The traditional CE risk assessment (Maschinenrichtlinie / EN ISO 12100) and
-// the cybersecurity assessment (Cyber Resilience Act) are two distinct steps.
-// IACE owns the traditional, physical + functional-safety hazards; the CRA
-// module (/sdk/iace/{id}/cra) owns the native cyber/AI topics and re-examines
-// which safety functions a cyber attack can re-open (see iace-safety-bridge).
-//
-// The split is by the NATURE of the hazard, not by the component: a control
-// fault, bus failure or botched update is FUNCTIONAL safety (random/systematic
-// fault) and stays in CE — independent of whether the controller is a bought-in
-// CE-marked PLC or the manufacturer's own embedded control. Only the security
-// PROPERTIES against malicious actors (access control, firmware/update
-// integrity, SBOM, vulnerability handling, default passwords) are CRA.
-//
-// Functional-safety control categories (software_control, software_fault,
-// safety_function_failure, configuration_error, communication_failure,
-// update_failure, sensor_fault, …) therefore intentionally STAY in IACE — they
-// are the safety functions whose loss the CRA bridge re-examines.
-//
-// Enforced centrally in InitializeProject so it holds for EVERY project.
-var nativeCyberSecurityCategories = map[string]bool{
-// I. Cyber / Netzwerk — security against malicious actors
-"unauthorized_access": true,
-"firmware_corruption": true,
-"cyber_resilience": true,
-"logging_audit_failure": true,
-"cyber_network": true,
-"sensor_spoofing": true,
-// J. KI-spezifisch
-"ai_specific": true,
-"ai_misclassification": true,
-"false_classification": true,
-"model_drift": true,
-"data_poisoning": true,
-"unintended_bias": true,
-}
-
-// isCyberSecurityCategory reports whether a hazard category is a native cyber/AI
-// topic that belongs to the CRA module rather than the traditional CE hazard log.
-func isCyberSecurityCategory(category string) bool {
-return nativeCyberSecurityCategories[category]
-}
@@ -1,37 +0,0 @@
-package handlers
-
-import "testing"
-
-func TestIsCyberSecurityCategory_RoutedToCRA(t *testing.T) {
-cyber := []string{
-"unauthorized_access", "firmware_corruption", "cyber_resilience",
-"logging_audit_failure", "cyber_network", "sensor_spoofing",
-"ai_specific", "ai_misclassification", "false_classification",
-"model_drift", "data_poisoning", "unintended_bias",
-}
-for _, c := range cyber {
-if !isCyberSecurityCategory(c) {
-t.Errorf("category %q must be routed to the CRA module, not the traditional IACE log", c)
-}
-}
-}
-
-func TestIsCyberSecurityCategory_StaysInIACE(t *testing.T) {
-// Physical + functional-safety categories must remain in the traditional CE
-// hazard log. communication_failure (bus failure -> loss of control) and
-// update_failure (botched update -> lost safety function) are FUNCTIONAL
-// faults, not attacks, so they stay too.
-keep := []string{
-"mechanical_hazard", "electrical_hazard", "thermal_hazard",
-"pneumatic_hydraulic", "noise_vibration", "ergonomic_hazard",
-"material_environmental", "chemical_risk", "fire_explosion",
-"software_control", "software_fault", "safety_function_failure",
-"configuration_error", "sensor_fault", "hmi_error",
-"communication_failure", "update_failure",
-}
-for _, c := range keep {
-if isCyberSecurityCategory(c) {
-t.Errorf("category %q must stay in the traditional IACE log, not be routed to CRA", c)
-}
-}
-}
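The split rule documented on `nativeCyberSecurityCategories`, and pinned down by the two tests, can be pictured as a partition over a project's categories. Security properties against attackers go to the CRA module, while functional faults stay in the CE log. This sketch uses a shortened, hypothetical copy of the category set. The real handler only exposes the predicate `isCyberSecurityCategory`.

```go
package main

// cyberSet is a shortened, hypothetical copy of the CRA-routed categories.
var cyberSet = map[string]bool{
	"unauthorized_access": true,
	"firmware_corruption": true,
	"data_poisoning":      true,
}

// splitHazardLog partitions categories into the CE (EN ISO 12100) hazard log
// and the CRA module, keeping input order within each side.
func splitHazardLog(cats []string) (ce, cra []string) {
	for _, c := range cats {
		if cyberSet[c] {
			cra = append(cra, c)
		} else {
			ce = append(ce, c)
		}
	}
	return ce, cra
}
```

With this split, `update_failure` (a botched update that loses a safety function) stays in the CE log. `firmware_corruption` is treated as an integrity attack and routed to CRA, which matches the reasoning in the file comment.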
@@ -78,7 +78,6 @@ func (h *RAGHandlers) Search(c *gin.Context) {
 "query": req.Query,
 "results": results,
 "count": len(results),
-"assessment": ucca.Assess(results),
 })
 }
 
@@ -207,32 +206,3 @@ func (h *RAGHandlers) HandleScrollChunks(c *gin.Context) {
 "total": len(chunks),
 })
 }
-
-// LegalCorpusStructure returns the composition (distinct articles, annexes,
-// recitals + chunk count) of every ingested eur-lex legal act, so the coverage
-// page can show WHAT was ingested instead of just the act name.
-// GET /sdk/v1/rag/legal-corpus
-func (h *RAGHandlers) LegalCorpusStructure(c *gin.Context) {
-acts, err := h.ragClient.CorpusStructure(c.Request.Context())
-if err != nil {
-c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to aggregate legal corpus: " + err.Error()})
-return
-}
-
-arts, anns, recs := 0, 0, 0
-for _, a := range acts {
-arts += a.Articles
-anns += a.Annexes
-recs += a.Recitals
-}
-
-c.JSON(http.StatusOK, gin.H{
-"regulations": acts,
-"totals": gin.H{
-"regulations": len(acts),
-"articles": arts,
-"annexes": anns,
-"recitals": recs,
-},
-})
-}
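`GET /sdk/v1/rag/legal-corpus` answers with the per-act list under `regulations` and the summed counts under `totals`. On failure it returns HTTP 500 with an `error` field. The sketch below decodes a successful body on the client side with `encoding/json`, using only the keys visible in `LegalCorpusStructure`. The diff does not show the element fields of `regulations`, so they are kept as raw JSON. The type and function names are hypothetical.

```go
package main

import "encoding/json"

// corpusTotals matches the "totals" object written by LegalCorpusStructure.
type corpusTotals struct {
	Regulations int `json:"regulations"`
	Articles    int `json:"articles"`
	Annexes     int `json:"annexes"`
	Recitals    int `json:"recitals"`
}

// corpusResponse decodes the totals and keeps each act as raw JSON, because
// the per-act field names are not part of the diff.
type corpusResponse struct {
	Regulations []json.RawMessage `json:"regulations"`
	Totals      corpusTotals      `json:"totals"`
}

// parseCorpus decodes a 200 response body from the endpoint.
func parseCorpus(body []byte) (corpusResponse, error) {
	var r corpusResponse
	err := json.Unmarshal(body, &r)
	return r, err
}
```

The handler computes `totals.regulations` as `len(acts)`, so a client can treat a mismatch with the length of `regulations` as a sign of a truncated or malformed response.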
@@ -161,7 +161,6 @@ func registerRAGRoutes(v1 *gin.RouterGroup, h *handlers.RAGHandlers) {
 ragRoutes.GET("/corpus-status", h.CorpusStatus)
 ragRoutes.GET("/corpus-versions/:collection", h.CorpusVersionHistory)
 ragRoutes.GET("/scroll", h.HandleScrollChunks)
-ragRoutes.GET("/legal-corpus", h.LegalCorpusStructure)
 }
 }
 
@@ -1,182 +0,0 @@
-package iace
-
-import (
-"encoding/json"
-"os"
-"path/filepath"
-"sort"
-"testing"
-)
-
-// GT #3 — commercial UNDERCOUNTER dishwasher (Winterhalter UC-M). Self-assessed
-// ground truth: we can judge what a dishwasher is. The test runs the narrative
-// through the SAME chain as production (ParseNarrative -> engine -> relevance
-// filter + cyber-skip), so keyword/gating fixes are measured on the hazard set
-// the user actually sees — not the raw pattern flood.
-
-// Condensed UC-M limits_form narrative. Deliberately includes "Cool-Ausfuehrung"
-// and "Filter" so the known false components (Kuehlaggregat, Absauganlage) are
-// reproduced and visible in the baseline.
-const warewashingNarrative = `Gewerbliche Untertisch-Geschirrspuelmaschine fuer Gastronomie-Kueche, ` +
-`vernetzt ueber LAN und WLAN (Connected Wash Internetportal). Heisswasser-Boiler mit ` +
-`Nachspueltemperatur ca. 85 Grad C, Tank mit Hygiene-Tankheizkoerper. Spuelpumpe 150-200 l/min ` +
-`mit rotierenden Spuelfeldern und Spuelarmen, Ablaufpumpe. Eingebautes Dosiergeraet fuer Reiniger ` +
-`und Klarspueler (aetzende Konzentrate). 4-fach-Laugenfiltration mit Filter. Doppelwandige Tuer ` +
-`mit Sicherheitsschalter und Rastposition (Thermostopp). Elektromotor (Drehstrom) 400 V. ` +
-`Touch-Steuerung (SPS) mit Bedienfeld und HMI, USB-Schnittstelle fuer Softwareupdates, ` +
-`PIN-geschuetzter Servicetechniker-Fernzugriff. Cool-Ausfuehrung mit kalter Nachspuelung. ` +
-`Untertischmontage. Eingreifen in die Spuelkammer moeglich. Aerosole und Daempfe der ` +
-`Reinigungschemie gelangen in die Atemzone. Manuelles Be- und Entladen der Spuelkoerbe von Hand. ` +
-`Reinigung und Wartung durch Servicetechniker. Branche Lebensmittel und Getraenke. ` +
-`Siebe und scharfe Blechkanten in der Spuelkammer. Boiler kann bei Wassermangel trockenlaufen. ` +
-`Frequenzumrichter und Elektronik mit Restspannung nach dem Abschalten. Wartung nur im ` +
-`freigeschalteten Zustand; Gefahr des unerwarteten Wiederanlaufs. Frischwasseranschluss mit ` +
-`Rueckflussverhinderer gegen Ruecksaugen in das Trinkwassernetz. Stehwasser im Boiler ` +
-`(Hygiene/Legionellen). Standsicherheit bei Untertischmontage.`
-
-// warewashingCyberCategories mirrors handlers.nativeCyberSecurityCategories —
-// native cyber/AI hazards are routed to the CRA module, not the CE hazard log.
-var warewashingCyberCategories = map[string]bool{
-"unauthorized_access": true, "firmware_corruption": true, "cyber_resilience": true,
-"logging_audit_failure": true, "cyber_network": true, "sensor_spoofing": true,
-"ai_specific": true, "ai_misclassification": true, "false_classification": true,
-"model_drift": true, "data_poisoning": true, "unintended_bias": true,
-}
-
-// warewashingEngineOutput runs the production chain and returns the filtered
-// hazards/mitigations the user would see for the UC-M.
-func warewashingEngineOutput() ([]Hazard, []Mitigation, int) {
-res := ParseNarrative(warewashingNarrative, "Gewerbliche Untertisch-Geschirrspuelmaschine (vernetzt)")
-
-var compIDs, compNames []string
-for _, c := range res.Components {
-if c.Negated {
-continue
-}
-compIDs = append(compIDs, c.LibraryID)
-compNames = append(compNames, c.NameDE)
-}
-var energyIDs []string
-for _, e := range res.EnergySources {
-energyIDs = append(energyIDs, e.SourceID)
-}
-lifecycles := append([]string{}, res.LifecyclePhases...)
-lifecycles = append(lifecycles, "normal_operation", "maintenance", "cleaning", "setup", "fault_clearing")
-
-input := MatchInput{
-ComponentLibraryIDs: compIDs,
-EnergySourceIDs: energyIDs,
-LifecyclePhases: lifecycles,
-CustomTags: res.CustomTags,
-OperationalStates: append(res.OperationalStates, "normal_operation", "cleaning", "maintenance"),
-HumanRoles: res.Roles,
-MachineTypes: []string{"food_processing", "Gewerbliche Untertisch-Geschirrspuelmaschine (vernetzt)"},
-}
-
-out := NewPatternEngine().Match(input)
-
-var kept []PatternMatch
-for _, pm := range out.MatchedPatterns {
-if !IsPatternRelevant(pm, warewashingNarrative, compNames) {
-continue
-}
-allCyber := len(pm.HazardCats) > 0
-for _, c := range pm.HazardCats {
-if !warewashingCyberCategories[c] {
-allCyber = false
-}
-}
-if allCyber {
-continue
-}
-kept = append(kept, pm)
-}
-filtered := *out
-filtered.MatchedPatterns = kept
-hazards, mitigations := patternsToHazardsAndMitigations(&filtered)
-return hazards, mitigations, len(kept)
-}
-
-func TestWarewashing_GTCoverage(t *testing.T) {
-gtPath := filepath.Join("testdata", "ground_truth_warewashing.json")
-raw, err := os.ReadFile(gtPath)
-if err != nil {
-t.Fatalf("read GT: %v", err)
-}
-var gt GroundTruth
-if err := json.Unmarshal(raw, &gt); err != nil {
-t.Fatalf("parse GT: %v", err)
-}
-
-{
-res := ParseNarrative(warewashingNarrative, "Gewerbliche Untertisch-Geschirrspuelmaschine (vernetzt)")
-var cn []string
-for _, c := range res.Components {
-if !c.Negated {
-cn = append(cn, c.NameDE)
-}
-}
-t.Logf("Parsed components: %v", cn)
-}
-
-hazards, mitigations, nPatterns := warewashingEngineOutput()
-t.Logf("Engine: %d patterns kept (relevance+cyber filter) -> %d hazards", nPatterns, len(hazards))
-
-result := CompareBenchmark(&gt, hazards, mitigations)
-precision := 0.0
-if result.TotalEngine > 0 {
-precision = float64(len(result.MatchedPairs)) / float64(result.TotalEngine)
-}
-t.Logf("=== Warewashing-GT (GT #3) Baseline ===")
-t.Logf("Recall (Coverage): %.1f%% (%d/%d matched, %d missing)",
-result.CoverageScore*100, len(result.MatchedPairs), result.TotalGT, len(result.MissingFromEngine))
-t.Logf("Precision: %.1f%% (%d engine hazards, %d extra)",
-precision*100, result.TotalEngine, len(result.ExtraInEngine))
-
-if len(result.MissingFromEngine) > 0 {
-t.Logf("--- MISSING (recall gaps) ---")
-for _, m := range result.MissingFromEngine {
-t.Logf(" MISS %s: %s", m.Nr, abbrev(m.HazardType, 60))
-}
-}
-
-// Measure completeness: which generated hazards have NO protective measure?
-t.Logf("--- Measure completeness ---")
-t.Logf("Measure coverage (GT-matched): %.0f%%", result.MeasureCoverage*100)
-withMeas := make(map[string]bool)
-for _, m := range mitigations {
-withMeas[m.HazardID.String()] = true
-}
-noMeasure := 0
-for _, h := range hazards {
-if !withMeas[h.ID.String()] {
-noMeasure++
-n := h.Name
-if n == "" {
-n = h.Scenario
-}
-t.Logf(" NO-MEASURE: [%s] %s", h.Category, abbrev(n, 60))
-}
-}
-t.Logf("Hazards without any measure: %d/%d", noMeasure, len(hazards))
-if len(result.ExtraInEngine) > 0 {
-t.Logf("--- EXTRA (false positives / precision loss) ---")
-names := make([]string, 0, len(result.ExtraInEngine))
-for _, e := range result.ExtraInEngine {
-n := e.Name
-if n == "" {
-n = e.Scenario
-}
-names = append(names, "["+e.Category+"] "+n)
-}
-sort.Strings(names)
-for _, n := range names {
-t.Logf(" EXTRA %s", abbrev(n, 85))
-}
-}
-
-// Loose smoke floor for the baseline — fixes should push recall up, not down.
-if result.CoverageScore < 0.4 {
-t.Errorf("warewashing recall below 40%% floor: %.1f%%", result.CoverageScore*100)
-}
-}
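The baseline test reports recall as `CoverageScore`, which comes from `CompareBenchmark`. It computes precision itself, as matched pairs divided by engine hazards, and uses 0 when there are no engine hazards. The sketch below computes both ratios from raw counts with the same zero-denominator handling. It also includes the 40% recall floor. Deriving recall as matched/GT is an assumption about what `CompareBenchmark` computes, and the helper names are hypothetical.

```go
package main

// benchmarkScores returns recall = matched/totalGT and precision =
// matched/totalEngine. A zero denominator yields 0 instead of NaN, as the
// test does for precision.
func benchmarkScores(matched, totalGT, totalEngine int) (recall, precision float64) {
	if totalGT > 0 {
		recall = float64(matched) / float64(totalGT)
	}
	if totalEngine > 0 {
		precision = float64(matched) / float64(totalEngine)
	}
	return recall, precision
}

// meetsRecallFloor mirrors the loose smoke floor: the test fails below 40%.
func meetsRecallFloor(recall float64) bool {
	return recall >= 0.4
}
```

The two ratios pull in opposite directions. Broader keyword gating tends to raise recall while adding EXTRA hazards that lower precision, which is why the test logs both.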
@@ -62,13 +62,6 @@ type HazardPattern struct {
 // "hazard" = source only, "hazardous_situation" = person exposed, "harm" = injury.
 // Empty = default (hazardous_situation).
 GeneratedHazardType string `json:"generated_hazard_type,omitempty"`
-// GuardableByEnclosure marks a contact/entanglement hazard that an interlocked
-// enclosure removes during normal operation. When the project emits the
-// "interlocked_enclosure" tag, such a pattern is re-scoped to maintenance/
-// cleaning (guard open) and does NOT fire as a normal-operation hazard.
-// Generic EN ISO 14120 logic — surfaced by the warewashing GT (the spray
-// arm rotates behind the interlocked door).
-GuardableByEnclosure bool `json:"guardable_by_enclosure,omitempty"`
 // RequiredFailureModes restricts this pattern to fire only when at least one
 // of the listed failure modes is relevant (by ComponentType match against project components).
 // Empty/nil = fires regardless of failure modes (backwards compatible).
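The `GuardableByEnclosure` comment describes a lifecycle re-scope. When the project carries the `interlocked_enclosure` tag, a guardable contact or entanglement pattern fires only for maintenance and cleaning (guard open), not for normal operation. The sketch below is a minimal version of that rule. The engine code that applies the flag is not part of this diff, so two things here are assumptions: the function name, and the choice to return exactly `maintenance` and `cleaning`.

```go
package main

// scopeLifecycles returns the lifecycle phases a pattern fires in. If the
// pattern is guardable and the project reports an interlocked enclosure, the
// hazard only exists with the guard open, so it is re-scoped (assumption:
// to exactly maintenance and cleaning). Otherwise the phases pass through.
func scopeLifecycles(guardable bool, projectTags map[string]bool, phases []string) []string {
	if guardable && projectTags["interlocked_enclosure"] {
		return []string{"maintenance", "cleaning"}
	}
	return phases
}
```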
@@ -37,7 +37,6 @@ func GetDGUVExtendedPatterns() []HazardPattern {
 },
 {
 ID: "HP096", NameDE: "Reibung/Abrieb durch rotierende Oberflaechen", NameEN: "Friction/abrasion by rotating surfaces",
-GuardableByEnclosure: true,
 RequiredComponentTags: []string{"rotating_part"},
 RequiredEnergyTags: []string{},
 GeneratedHazardCats: []string{"mechanical_hazard"},
@@ -89,7 +88,6 @@ func GetDGUVExtendedPatterns() []HazardPattern {
 },
 {
 ID: "HP101", NameDE: "Aufwickeln von Kleidung/Haaren", NameEN: "Winding up of clothing/hair",
-GuardableByEnclosure: true,
 RequiredComponentTags: []string{"rotating_part"},
 RequiredEnergyTags: []string{"rotational"},
 GeneratedHazardCats: []string{"mechanical_hazard"},
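Patterns such as HP096 and HP101 above are gated by `RequiredComponentTags` and `RequiredEnergyTags`. The sketch below shows that kind of tag gating under one assumption: the listed tags are combined with AND. That reading is what lets a capability tag like `dom_warewashing` confine the warewashing patterns to a single machine class. The helper is hypothetical, and how the engine derives the set of present tags is not shown here.

```go
package main

// tagsSatisfied reports whether every tag in required is present. An empty or
// nil list imposes no restriction (e.g. RequiredEnergyTags: []string{}).
func tagsSatisfied(required []string, present map[string]bool) bool {
	for _, t := range required {
		if !present[t] {
			return false
		}
	}
	return true
}
```

Under AND semantics, a non-warewashing machine that emits `steam_emission` still does not trigger a pattern that also requires `dom_warewashing`.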
@@ -1,178 +0,0 @@
|
|||||||
package iace
|
|
||||||
|
|
||||||
// GetWarewashingPatterns returns hazard patterns for commercial warewashing
|
|
||||||
// machines (gewerbliche Geschirrspuelmaschinen / Untertisch-, Hauben-, Korb-
|
|
||||||
// und Bandspuelmaschinen). These capture the machine-specific hazards a
|
|
||||||
// Fachmann immediately expects but that the generic library did not cover:
|
|
||||||
// hot-water/steam scalding on door opening, hot surfaces, hot ware, corrosive
|
|
||||||
// detergent/rinse-aid contact, door pinch and wet-floor slipping.
|
|
||||||
//
|
|
||||||
// Every pattern is gated by the capability tag "dom_warewashing" (emitted only
|
|
||||||
// by warewashing narrative keywords in keyword_dictionary.go), so none of these
|
|
||||||
// leak into unrelated machine classes.
|
|
||||||
//
|
|
||||||
// HP range: HP2200-HP2206. ISO 12100 Annex B section identifiers only (facts);
|
|
||||||
// product standard EN 60335-2-58 (commercial dishwashing machines).
|
|
||||||
func GetWarewashingPatterns() []HazardPattern {
|
|
||||||
return []HazardPattern{
|
|
||||||
{
|
|
||||||
ID: "HP2200", NameDE: "Verbruehung durch Heisswasser/Dampf beim Oeffnen der Tuer", NameEN: "Scalding by hot water/steam when opening the door",
|
|
||||||
RequiredComponentTags: []string{"dom_warewashing", "steam_emission"},
|
|
||||||
GeneratedHazardCats: []string{"thermal_hazard"},
|
|
||||||
SuggestedMeasureIDs: []string{"M2200", "M2201", "M2202", "M2208"},
|
|
||||||
Priority: 94,
|
|
||||||
ApplicableLifecycles: []string{"normal_operation", "cleaning"},
|
|
||||||
ScenarioDE: "Beim Oeffnen der Tuer waehrend oder unmittelbar nach dem Spuelgang tritt ein Schwall aus heissem Wasser und Wrasen (Dampf) aus der Spuelkammer aus und trifft Gesicht, Haende und Arme des Bedieners.",
|
|
||||||
TriggerDE: "Tuer wird vor Programmende oder bei noch vorhandenem Restdampf geoeffnet; Tuerverriegelung fehlt oder ist ueberbrueckt; Nachspueltemperatur ca. 85 Grad C.",
|
|
||||||
HarmDE: "Verbruehung 1.-2. Grades an Gesicht, Haenden und Unterarmen; Augenreizung durch heissen Dampf.",
|
|
||||||
AffectedDE: "Bedienpersonal (Spuelkraft)",
|
|
||||||
ZoneDE: "Tuer- und Beschickungsoeffnung der Spuelkammer",
|
|
||||||
ISO12100Section: "6.2.4",
|
|
||||||
DefaultSeverity: 3, DefaultExposure: 4,
|
|
||||||
},
|
|
||||||
		{
			ID: "HP2201", NameDE: "Verbrennung an heissen Oberflaechen (Boiler/Tank/Spuelkammer)", NameEN: "Burn on hot surfaces (boiler/tank/wash chamber)",
			RequiredComponentTags: []string{"dom_warewashing", "high_temperature"},
			GeneratedHazardCats: []string{"thermal_hazard"},
			SuggestedMeasureIDs: []string{"M2202", "M055", "M2208"},
			Priority: 90,
			ApplicableLifecycles: []string{"cleaning", "maintenance"},
			ScenarioDE: "Beruehrung heisser Oberflaechen von Boiler, Tankheizkoerper oder Spuelkammerwaenden bei Reinigung, Entkalkung oder Wartung fuehrt zu Kontaktverbrennungen.",
			TriggerDE: "Reinigung/Entkalkung ohne Abkuehlzeit; Eingriff in die Spuelkammer bei betriebswarmem Geraet.",
			HarmDE: "Kontaktverbrennung an Haenden und Unterarmen.",
			AffectedDE: "Reinigungspersonal, Wartungspersonal",
			ZoneDE: "Boiler, Tankheizkoerper, Spuelkammerwaende",
			ISO12100Section: "6.2.4",
			DefaultSeverity: 2, DefaultExposure: 3,
		},
		{
			ID: "HP2202", NameDE: "Verbrennung an heissem Spuelgut beim Entladen", NameEN: "Burn on hot ware when unloading",
			RequiredComponentTags: []string{"dom_warewashing", "hot_water"},
			GeneratedHazardCats: []string{"thermal_hazard"},
			SuggestedMeasureIDs: []string{"M2202", "M055", "M2208"},
			Priority: 86,
			ApplicableLifecycles: []string{"normal_operation"},
			ScenarioDE: "Geschirr, Glaeser und Bestecke sind nach dem Spuelgang durch die Heisswasser-Nachspuelung sehr heiss; beim Entladen kommt es zu Verbrennungen.",
			TriggerDE: "Sofortiges Entnehmen des Spuelguts nach Programmende ohne Abkuehl-/Trocknungszeit.",
			HarmDE: "Verbrennung an Haenden/Fingern beim Greifen heisser Teile.",
			AffectedDE: "Bedienpersonal (Spuelkraft)",
			ZoneDE: "Spuelkammer, Entnahmebereich/Korb",
			ISO12100Section: "6.2.4",
			DefaultSeverity: 2, DefaultExposure: 3,
		},
		{
			ID: "HP2203", NameDE: "Chemische Veraetzung (Haut/Augen) durch Reiniger-/Klarspueler-Konzentrat", NameEN: "Chemical burn (skin/eyes) from detergent/rinse-aid concentrate",
			RequiredComponentTags: []string{"dom_warewashing", "corrosive_chemical"},
			GeneratedHazardCats: []string{"chemical_risk"},
			SuggestedMeasureIDs: []string{"M2203", "M2204", "M2208"},
			Priority: 92,
			ApplicableLifecycles: []string{"normal_operation", "maintenance"},
			ScenarioDE: "Direkter Kontakt mit dem aetzenden (alkalischen) Reiniger- bzw. Klarspueler-Konzentrat beim Nachfuellen, Sauglanzenwechsel oder bei Leckage fuehrt zu Veraetzungen von Haut und Augen.",
			TriggerDE: "Gebinde-/Sauglanzenwechsel ohne Schutzausruestung; Umfuellen von Konzentrat; undichte Dosierleitung.",
			HarmDE: "Veraetzung von Haut und Augen (alkalische Verletzung), bleibende Augenschaeden moeglich.",
			AffectedDE: "Bedienpersonal, Reinigungspersonal beim Chemikalien-Handling",
			ZoneDE: "Dosiergeraet, Reiniger-/Klarspueler-Gebinde, Sauglanzen",
			ISO12100Section: "6.2.4",
			DefaultSeverity: 3, DefaultExposure: 3,
			ClarificationQuestionsDE: []string{
				"Liegt fuer alle eingesetzten Reiniger/Klarspueler/Entkalker ein aktuelles Sicherheitsdatenblatt (SDB) am Geraet vor?",
				"Ist ein geschlossenes Dosiersystem mit Sauglanzen vorhanden, sodass kein Umfuellen noetig ist?",
			},
		},
		{
			ID: "HP2204", NameDE: "Reizung/Veraetzung der Atemwege durch Reinigungs-Aerosole/Daempfe", NameEN: "Respiratory irritation from cleaning aerosols/vapours",
			RequiredComponentTags: []string{"dom_warewashing", "corrosive_chemical"},
			GeneratedHazardCats: []string{"chemical_risk"},
			SuggestedMeasureIDs: []string{"M2205", "M2203", "M2204"},
			Priority: 82,
			ApplicableLifecycles: []string{"normal_operation", "maintenance"},
			ScenarioDE: "Aerosole und Daempfe der Reinigungschemie (insbesondere beim Oeffnen kurz nach dem Spuelgang oder bei der Entkalkung mit Saeure) gelangen in die Atemzone und reizen Atemwege und Schleimhaeute.",
			TriggerDE: "Oeffnen bei laufender/heisser Chemie; Entkalkung mit Saeure; unzureichende Lueftung des Aufstellbereichs.",
			HarmDE: "Reizung von Atemwegen, Augen und Schleimhaeuten; bei Saeure-/Laugen-Vermischung gefaehrliche Gase.",
			AffectedDE: "Bedienpersonal, Reinigungspersonal",
			ZoneDE: "Atemzone vor der Spuelkammer, Aufstellbereich",
			ISO12100Section: "6.2.4",
			DefaultSeverity: 2, DefaultExposure: 2,
			ClarificationQuestionsDE: []string{
				"Ist der Aufstellbereich ausreichend be-/entlueftet (Kuechenlueftung)?",
				"Wird in der BA vor dem Vermischen von Reiniger und Entkalker/Saeure gewarnt?",
			},
		},
		{
			ID: "HP2205", NameDE: "Quetschen der Finger an der Tuer/Haube", NameEN: "Finger crushing at the door/hood",
			RequiredComponentTags: []string{"dom_warewashing", "access_door"},
			GeneratedHazardCats: []string{"mechanical_hazard"},
			SuggestedMeasureIDs: []string{"M2206", "M003", "M2208"},
			Priority: 78,
			ApplicableLifecycles: []string{"normal_operation"},
			ScenarioDE: "Beim Schliessen der Tuer bzw. Absenken der Haube werden Finger zwischen Tuer/Haube und Gehaeuse gequetscht.",
			TriggerDE: "Greifen in den Schliessbereich beim Schliessen; hohe Schliesskraft der Haube; scharfe Kanten.",
			HarmDE: "Quetschung und Prellung der Finger.",
			AffectedDE: "Bedienpersonal (Spuelkraft)",
			ZoneDE: "Tuer-/Haubenkante, Schliessbereich",
			ISO12100Section: "6.2.3",
			DefaultSeverity: 1, DefaultExposure: 3,
		},
		{
			ID: "HP2206", NameDE: "Ausrutschen auf nassem Boden (Wasseraustritt/Leckage)", NameEN: "Slipping on wet floor (water leakage)",
			RequiredComponentTags: []string{"dom_warewashing"},
			GeneratedHazardCats: []string{"mechanical_hazard"},
			SuggestedMeasureIDs: []string{"M2207", "M538", "M2208"},
			Priority: 76,
			ApplicableLifecycles: []string{"normal_operation", "cleaning", "maintenance"},
			ScenarioDE: "Aus der Spuelmaschine austretendes Wasser (Beschickung, Tuer oeffnen, Leckage, Tankwasserwechsel) macht den Boden im Aufstellbereich rutschig; der Bediener rutscht aus.",
			TriggerDE: "Wasseraustritt beim Oeffnen/Beschicken; undichter Ablauf; fehlender Bodenablauf.",
			HarmDE: "Sturz mit Prellungen, Knochenbruechen oder Kopfaufprall.",
			AffectedDE: "Bedienpersonal, Reinigungspersonal",
			ZoneDE: "Aufstell- und Bedienbereich der Spuelmaschine",
			ISO12100Section: "6.3.5.6",
			DefaultSeverity: 2, DefaultExposure: 3,
		},
		{
			ID: "HP2207", NameDE: "Rueckfluss / Kontamination des Trinkwassers", NameEN: "Backflow / potable-water contamination",
			RequiredComponentTags: []string{"dom_warewashing", "backflow_risk"},
			GeneratedHazardCats: []string{"material_environmental"},
			SuggestedMeasureIDs: []string{"M2209"},
			Priority: 84,
			ApplicableLifecycles: []string{"normal_operation"},
			ScenarioDE: "Verschmutztes Spuel- oder Chemiewasser wird ueber den Frischwasseranschluss in das Trinkwassernetz zurueckgesaugt und kontaminiert es (Ruecksaugen bei Unterdruck im Netz).",
			TriggerDE: "Fehlender oder defekter Rueckflussverhinderer/Systemtrenner; Unterdruck im Trinkwassernetz; kein freier Auslauf.",
			HarmDE: "Gesundheitsgefaehrdung Dritter durch kontaminiertes Trinkwasser (Chemie, Keime).",
			AffectedDE: "Verbraucher am selben Trinkwassernetz, Betreiber",
			ZoneDE: "Frischwasseranschluss, Wasserzulauf",
			ISO12100Section: "6.2.4",
			DefaultSeverity: 3, DefaultExposure: 2,
		},
		{
			ID: "HP2208", NameDE: "Schnittverletzung an scharfen Kanten/Sieben", NameEN: "Cut injury on sharp edges/screens",
			RequiredComponentTags: []string{"dom_warewashing", "sharp_edge"},
			GeneratedHazardCats: []string{"mechanical_hazard"},
			SuggestedMeasureIDs: []string{"M003"},
			Priority: 74,
			ApplicableLifecycles: []string{"cleaning", "maintenance"},
			ScenarioDE: "Schneiden an scharfen Blechkanten, Sieben oder dem Ablaufpumpen-Laufrad beim Reinigen oder Eingreifen in die Spuelkammer.",
			TriggerDE: "Entnehmen/Reinigen der Siebe; Eingreifen an scharfen Kanten ohne Schutzhandschuhe.",
			HarmDE: "Schnittwunden an Haenden und Fingern.",
			AffectedDE: "Reinigungspersonal, Bedienpersonal",
			ZoneDE: "Zugaengliche Kanten, Siebe, Spuelkammer, Ablaufpumpe",
			ISO12100Section: "6.2.2.1",
			DefaultSeverity: 1, DefaultExposure: 3,
		},
		{
			ID: "HP2209", NameDE: "Unerwarteter Wiederanlauf bei Wartung/Reinigung", NameEN: "Unexpected restart during maintenance/cleaning",
			RequiredComponentTags: []string{"dom_warewashing", "programmable"},
			RequiredLifecycles: []string{"maintenance", "cleaning", "fault_clearing"},
			GeneratedHazardCats: []string{"safety_function_failure"},
			SuggestedMeasureIDs: []string{"M042"},
			Priority: 80,
			ApplicableLifecycles: []string{"maintenance", "cleaning"},
			ScenarioDE: "Waehrend Wartung oder Reinigung laeuft die Maschine durch fehlende Freischaltung (LOTO) oder automatischen Wiederanlauf unerwartet an (Pumpe, Spuelgang).",
			TriggerDE: "Kein Freischalten/Sichern gegen Wiedereinschalten; automatischer Wiederanlauf nach Netzunterbrechung.",
			HarmDE: "Verbruehung, Quetschen oder elektrischer Schlag durch unerwartet anlaufende Maschine.",
			AffectedDE: "Wartungspersonal, Reinigungspersonal",
			ZoneDE: "Gesamte Maschine, Pumpe, Antriebe",
			ISO12100Section: "6.2.11.4",
			DefaultSeverity: 3, DefaultExposure: 2,
		},
	}
}
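Each pattern above declares RequiredComponentTags, and the accompanying tests treat dom_warewashing as a gate that keeps these patterns out of other machine classes. The following is a minimal sketch of that gating, assuming a pattern fires only when all of its required tags are present (AND semantics). It uses a hypothetical, trimmed-down hazardPattern type rather than the package's HazardPattern or PatternEngine, whose internals are not shown here.

```go
package main

import "fmt"

// hazardPattern is a hypothetical, trimmed-down stand-in for HazardPattern:
// it keeps only the fields this sketch needs.
type hazardPattern struct {
	ID                    string
	RequiredComponentTags []string
}

// fires reports whether every required tag is present in tags.
// AND semantics are an assumption, not taken from the PatternEngine source.
func fires(p hazardPattern, tags map[string]bool) bool {
	for _, t := range p.RequiredComponentTags {
		if !tags[t] {
			return false
		}
	}
	return true
}

// toSet converts a tag slice into a lookup set.
func toSet(tags []string) map[string]bool {
	s := make(map[string]bool, len(tags))
	for _, t := range tags {
		s[t] = true
	}
	return s
}

func main() {
	patterns := []hazardPattern{
		{ID: "HP2200", RequiredComponentTags: []string{"dom_warewashing", "steam_emission"}},
		{ID: "HP2201", RequiredComponentTags: []string{"dom_warewashing", "high_temperature"}},
		{ID: "HP2207", RequiredComponentTags: []string{"dom_warewashing", "backflow_risk"}},
	}
	// A dishwasher tag set without backflow_risk, and a hot non-dishwasher machine.
	dishwasher := toSet([]string{"dom_warewashing", "steam_emission", "high_temperature"})
	oven := toSet([]string{"high_temperature", "electrical_part"})
	for _, p := range patterns {
		fmt.Printf("%s dishwasher=%v oven=%v\n", p.ID, fires(p, dishwasher), fires(p, oven))
	}
}
```

Under this assumption, HP2207 stays silent for the dishwasher tag set because backflow_risk is missing. This matches the test expectations, which list only HP2200-HP2206. None of the patterns fire for the oven because dom_warewashing is absent.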
@@ -1,112 +0,0 @@
package iace

import "testing"

// firedSet runs the engine for the given custom tags and returns the set of
// fired pattern IDs.
func firedSet(customTags []string) map[string]bool {
	engine := NewPatternEngine()
	out := engine.Match(MatchInput{CustomTags: customTags})
	fired := make(map[string]bool, len(out.MatchedPatterns))
	for _, m := range out.MatchedPatterns {
		fired[m.PatternID] = true
	}
	return fired
}

// A warewashing narrative emits these capability + functional tags.
var warewashingTags = []string{
	"dom_warewashing", "steam_emission", "hot_water", "high_temperature",
	"corrosive_chemical", "access_door", "rotating_part",
}

func TestWarewashing_PatternsFireForDishwasher(t *testing.T) {
	fired := firedSet(warewashingTags)
	// HP2207-HP2209 are not listed: they additionally require backflow_risk,
	// sharp_edge or programmable, none of which is in warewashingTags.
	want := []string{"HP2200", "HP2201", "HP2202", "HP2203", "HP2204", "HP2205", "HP2206"}
	for _, id := range want {
		if !fired[id] {
			t.Errorf("expected warewashing pattern %s to fire for a dishwasher, but it did not", id)
		}
	}
}

func TestWarewashing_PatternsDoNotLeakIntoOtherMachines(t *testing.T) {
	// A machine with thermal + electrical + chemical capability but NOT a
	// dishwasher must never produce warewashing hazards (dom_warewashing gate).
	fired := firedSet([]string{"high_temperature", "electrical_part", "chemical_risk", "rotating_part", "moving_part"})
	for _, id := range []string{"HP2200", "HP2201", "HP2202", "HP2203", "HP2204", "HP2205", "HP2206"} {
		if fired[id] {
			t.Errorf("warewashing pattern %s leaked into a non-dishwasher machine", id)
		}
	}
}

func TestWarewashing_WeldingAndGlueDoNotLeakIntoDishwasher(t *testing.T) {
	// The gate-term additions must stop the welding/flame/glue burn patterns
	// from firing for a dishwasher (they previously leaked via high_temperature
	// / electrical_part). dom_welding/dom_flame/dom_glue are absent here.
	fired := firedSet(warewashingTags)
	leak := map[string]string{
		"HP530": "Lichtbogen-Verbrennung (Schweissen)",
		"HP532": "Schweissrauch",
		"HP533": "Brand durch Schweissfunken (Schweissen)",
	}
	for id, name := range leak {
		if fired[id] {
			t.Errorf("cross-domain pattern %s (%s) leaked into a dishwasher", id, name)
		}
	}
}

func TestWarewashing_MeasureIDsExist(t *testing.T) {
	lib := GetProtectiveMeasureLibrary()
	have := make(map[string]bool, len(lib))
	for _, m := range lib {
		have[m.ID] = true
	}
	for _, p := range GetWarewashingPatterns() {
		for _, mid := range p.SuggestedMeasureIDs {
			if !have[mid] {
				t.Errorf("pattern %s references measure %s which is not in the library", p.ID, mid)
			}
		}
	}
}

func TestWarewashing_NarrativeEmitsTags(t *testing.T) {
	// Closes the loop: a realistic dishwasher description must emit the tags
	// the warewashing patterns gate on (otherwise the patterns are dead).
	narrative := "Gewerbliche Untertisch-Geschirrspuelmaschine mit Heisswasser-Boiler " +
		"und Nachspuelung ca. 85 Grad C, Spuelpumpe mit rotierenden Spuelfeldern, " +
		"Dampf-/Wrasenabgabe beim Oeffnen, Reiniger und Klarspueler ueber Dosiergeraet, " +
		"Tuer mit Sicherheitsschalter, Eingreifen in die Spuelkammer."
	res := ParseNarrative(narrative, "Gewerbliche Geschirrspuelmaschine")
	got := make(map[string]bool, len(res.CustomTags))
	for _, tag := range res.CustomTags {
		got[tag] = true
	}
	for _, want := range []string{"dom_warewashing", "steam_emission", "hot_water", "corrosive_chemical", "access_door", "rotating_part"} {
		if !got[want] {
			t.Errorf("narrative did not emit expected tag %q (got %v)", want, res.CustomTags)
		}
	}
	// And it must NOT emit any welding/flame/glue domain that would re-open leaks.
	for _, bad := range []string{"dom_welding", "dom_flame", "dom_glue"} {
		if got[bad] {
			t.Errorf("dishwasher narrative unexpectedly emitted cross-domain tag %q", bad)
		}
	}
}

func TestWarewashing_NewMeasuresPresent(t *testing.T) {
	lib := GetProtectiveMeasureLibrary()
	have := make(map[string]bool, len(lib))
	for _, m := range lib {
		have[m.ID] = true
	}
	// M2209 (backflow preventer, referenced by HP2207) belongs to the same set.
	for _, mid := range []string{"M2200", "M2201", "M2202", "M2203", "M2204", "M2205", "M2206", "M2207", "M2208", "M2209"} {
		if !have[mid] {
			t.Errorf("expected warewashing measure %s to be registered in the library", mid)
		}
	}
}
@@ -88,28 +88,6 @@ func GetKeywordDictionary() []KeywordEntry {
		{Keywords: []string{"folienwickler", "wickelmaschine", "konfektioniermaschine", "folienverpackung", "wellpappe"}, ExtraTags: []string{"dom_converting"}},
		{Keywords: []string{"bergbau", "untertage", "tunnelbau", "off-grid"}, ExtraTags: []string{"dom_remote"}},
		{Keywords: []string{"asbest", "asbestsanierung", "asbestexposition"}, ExtraTags: []string{"dom_asbestos"}},
		{Keywords: []string{"gasbrenner", "brennerbetrieb", "offene flamme", "flammhaert", "abflammen", "flammrichten"}, ExtraTags: []string{"dom_flame"}},
		{Keywords: []string{"heissleim", "heissleimanlage", "schmelzkleber", "schmelzklebstoff", "klebstoffschmelzer", "leimwerk"}, ExtraTags: []string{"dom_glue"}},

		// ── Commercial dishwasher / warewashing ──────────────────────
		// dom_warewashing gates the warewashing-specific patterns
		// (hazard_patterns_warewashing.go) so they never leak into other
		// machine classes. The functional tags (hot_water, steam_emission,
		// corrosive_chemical, access_door) are the within-domain triggers.
		{Keywords: []string{"spuelmaschine", "geschirrspuelmaschine", "geschirrspueler", "haubenspuelmaschine", "untertischspuelmaschine", "korbspuelmaschine", "bandspuelmaschine", "glaeserspuelmaschine", "bistrospuelmaschine", "warewashing", "dishwasher"}, ExtraTags: []string{"dom_warewashing"}},
		{Keywords: []string{"heisswasser", "nachspuelung", "nachspueltemperatur", "spuelgang", "spuelzyklus", "thermostopp", "thermostop"}, ExtraTags: []string{"hot_water", "high_temperature"}},
		{Keywords: []string{"dampf", "wrasen", "schwaden", "brueden"}, ExtraTags: []string{"steam_emission", "high_temperature"}},
		{Keywords: []string{"boiler", "spuelboiler", "nachspuelboiler", "tankheiz", "boilerheiz"}, ComponentIDs: []string{"C094"}, ExtraTags: []string{"heating_element", "high_temperature"}},
		{Keywords: []string{"reiniger", "klarspueler", "spuelmittel", "reinigungsmittel", "reinigerkonzentrat", "spuelchemie", "dosiergeraet", "dosierpumpe", "sauglanze", "entkalker"}, ExtraTags: []string{"corrosive_chemical"}},
		// Spuelarm/Spuelfeld emit only the rotating_part capability tag. They are
		// NOT mapped to a library component — C004 is a "Drehtisch" (rotary table)
		// and that mislabels the spray arm. Keyword->component must be semantically
		// honest (generic hygiene; surfaced by the warewashing GT).
		{Keywords: []string{"spuelarm", "spuelfeld", "wascharm", "spruehfeld"}, ExtraTags: []string{"rotating_part"}},
		{Keywords: []string{"spuelkammer", "spueltuer", "geraetetuer", "haubentuer", "klapptuer"}, ExtraTags: []string{"access_door"}},
		// Fresh-water connection to the potable-water network -> backflow/back-siphonage risk (EN 1717).
		{Keywords: []string{"rueckfluss", "rueckflussverhinderer", "ruecksaug", "trinkwasser", "frischwasseranschluss", "systemtrenner"}, ExtraTags: []string{"backflow_risk"}},
		{Keywords: []string{"scharfe kante", "scharfkant", "blechkante", "scharfe blechkante", "sieb", "siebe"}, ExtraTags: []string{"sharp_edge"}},
		// Ghost closure (emit side): makes the 34 dead required tags
		// emittable, each ONLY via domain-specific keywords -> the 120
		// ghost patterns fire again, but only for their real machine (no
@@ -204,12 +182,6 @@ func GetKeywordDictionary() []KeywordEntry {
		{Keywords: []string{"lichtgitter", "lichtvorhang", "light curtain", "light grid"}, ComponentIDs: []string{"C102"}, ExtraTags: []string{"safety_device"}},
		{Keywords: []string{"sicherheitsschalter", "safety switch"}, ComponentIDs: []string{"C104"}, ExtraTags: []string{"safety_device", "interlocked"}},
		{Keywords: []string{"zuhaltung", "guard locking", "interlock"}, ComponentIDs: []string{"C105"}, ExtraTags: []string{"safety_device", "interlocked"}},
		// interlocked_enclosure signals that moving parts are inaccessible behind a
		// guard that is monitored/locked — feeds the GuardableByEnclosure re-scoping
		// (contact/entanglement becomes a maintenance/guard-open hazard, not a
		// normal-operation one). Emitted only by explicit "interlocked door/guard"
		// vocabulary so it does not trigger for machines with exposed motion.
		{Keywords: []string{"tuer mit sicherheitsschalter", "verriegelte tuer", "verriegelte haube", "verriegelte einhausung", "sicherheitstuer", "tuerverriegelung", "haube mit sicherheitsschalter"}, ExtraTags: []string{"interlocked_enclosure"}},
		{Keywords: []string{"zweihand", "two-hand", "zweihandschaltung"}, ComponentIDs: []string{"C106"}, ExtraTags: []string{"safety_device", "two_hand_control_required"}},
		{Keywords: []string{"schaltmatte", "safety mat"}, ComponentIDs: []string{"C108"}, ExtraTags: []string{"safety_device"}},
		{Keywords: []string{"seilzug", "pull wire"}, ComponentIDs: []string{"C109"}, ExtraTags: []string{"safety_device"}},
@@ -222,9 +194,7 @@ func GetKeywordDictionary() []KeywordEntry {

		// ── Extraction / environment ────────────────────────────────────
		{Keywords: []string{"absaug", "extraction", "abscheider"}, ComponentIDs: []string{"C124"}, ExtraTags: []string{"noise_source"}},
		// "filteranlage" only — bare "filter" falsely mapped any filter (Laugen-,
		// Wasser-, Oel-, Netzfilter) to the oil-mist extractor C124.
		{Keywords: []string{"filteranlage"}, ComponentIDs: []string{"C124"}, ExtraTags: []string{}},
		{Keywords: []string{"filter", "filteranlage"}, ComponentIDs: []string{"C124"}, ExtraTags: []string{}},

		// ── IT / network ───────────────────────────────────────────────
		{Keywords: []string{"switch", "netzwerk"}, ComponentIDs: []string{"C111"}, ExtraTags: []string{"networked"}},
@@ -253,10 +223,7 @@ func GetKeywordDictionary() []KeywordEntry {
		{Keywords: []string{"biege", "bend"}, ComponentIDs: []string{"C019"}, ExtraTags: []string{"high_force"}},
		{Keywords: []string{"stanz", "stamp", "punch"}, ComponentIDs: []string{"C018"}, ExtraTags: []string{"high_force", "crush_point"}},
		{Keywords: []string{"heiz", "heater", "heating"}, ComponentIDs: []string{"C094"}, EnergyIDs: []string{"EN06"}, ExtraTags: []string{"high_temperature"}},
		// Cooling UNIT only — not the bare adjectives "kuehl"/"cool", which falsely
		// matched product-variant names ("Cool-Ausfuehrung") and outputs ("kuehle
		// Glaeser"). Keyword->component must name an actual component.
		{Keywords: []string{"kuehlaggregat", "kuehlanlage", "kuehler", "kaeltemaschine", "chiller", "rueckkuehl"}, ComponentIDs: []string{"C095"}, ExtraTags: []string{}},
		{Keywords: []string{"kuehl", "cool"}, ComponentIDs: []string{"C095"}, ExtraTags: []string{}},
		{Keywords: []string{"luefter", "fan", "geblaese"}, ComponentIDs: []string{"C096"}, ExtraTags: []string{"rotating_part", "noise_source"}},
		{Keywords: []string{"spannvorrichtung", "fixture", "clamp"}, ComponentIDs: []string{"C100"}, ExtraTags: []string{"clamping_part"}},
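The narrowed cooling keywords in the hunk above, like "filteranlage" replacing bare "filter", address false positives of substring matching. The following is a minimal sketch, assuming lower-case substring matching. The helper anyContains is hypothetical; the keyword lists are copied from the two cooling entries.

```go
package main

import (
	"fmt"
	"strings"
)

// anyContains reports whether any keyword occurs as a substring of the
// lower-cased text. Lower-case substring matching is an assumption about
// how the keyword dictionary is applied.
func anyContains(text string, keywords []string) bool {
	text = strings.ToLower(text)
	for _, kw := range keywords {
		if strings.Contains(text, kw) {
			return true
		}
	}
	return false
}

func main() {
	broad := []string{"kuehl", "cool"}
	narrow := []string{"kuehlaggregat", "kuehlanlage", "kuehler", "kaeltemaschine", "chiller", "rueckkuehl"}
	// Two phrases from the comment that should NOT map to C095, and one that should.
	for _, s := range []string{"Cool-Ausfuehrung", "kuehle Glaeser", "Kuehlaggregat mit Verdichter"} {
		fmt.Printf("%q broad=%v narrow=%v\n", s, anyContains(s, broad), anyContains(s, narrow))
	}
}
```

The broad list matches all three phrases. The narrow list matches only "Kuehlaggregat mit Verdichter", which is the one phrase that actually names a cooling unit.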
@@ -22,7 +22,6 @@ func GetProtectiveMeasureLibrary() []ProtectiveMeasureEntry {
	all = append(all, getGTBremseMeasures()...) // GT-Bremse-Coverage-Gaps (M483-M522)
	all = append(all, GetCRAMeasures()...) // CRA / DIN EN 40000-1-2 cyber-resilience (M540-M548)
	all = append(all, getLiftEndstopMeasures()...) // Lift/hoist endstop (M600-M604) — bridges OSHA MD library
	all = append(all, getWarewashingMeasures()...) // Commercial dishwasher (M2200-M2209): scald/chemical/door/slip/backflow
	return all
}

@@ -1,75 +0,0 @@
package iace

// getWarewashingMeasures returns protective measures for commercial warewashing
// machines (gewerbliche Geschirrspuelmaschinen): hot-water/steam scalding,
// hot surfaces, corrosive cleaning chemicals, door pinch, wet-floor slip and
// potable-water backflow. They complement the generic thermal/mechanical/material
// measures with the machine-specific controls a specialist expects for this
// product class.
//
// M-ID range: M2200-M2209. Norm identifiers only (facts); no norm text is
// reproduced (DIN/Beuth license). Lead standard: EN 60335-2-58 (safety of
// commercial electric dishwashing machines).
func getWarewashingMeasures() []ProtectiveMeasureEntry {
	return []ProtectiveMeasureEntry{
		{ID: "M2200", ReductionType: "design", SubType: "interlock",
			Name: "Tuer-/Haubenverriegelung beendet Spuelgang vor dem Oeffnen",
			Description: "Die Tuer bzw. Haube ist so mit der Steuerung verriegelt, dass beim Oeffnen Spuelpumpe und Nachspuelung sofort abschalten und ein Oeffnen erst nach Programmende (bzw. nach Abbau des Restdampfs) freigegeben wird. Verhindert den Schwall aus Heisswasser/Wrasen und den Kontakt mit noch rotierenden Spuelfeldern.",
			HazardCategory: "thermal",
			Examples: []string{"Tuerkontaktschalter schaltet Pumpe + Heizung beim Oeffnen ab", "Rastposition mit Restdampf-Verzoegerung vor Freigabe"},
			NormReferences: []string{"EN 60335-2-58", "EN ISO 12100 — Inhaerent sichere Konstruktion"}},
		{ID: "M2201", ReductionType: "design", SubType: "thermal",
			Name: "Wrasen-/Dampfreduzierung (Kondensations- / Waermerueckgewinnungssystem)",
			Description: "Der beim Oeffnen austretende Wrasen wird durch ein Kondensations- bzw. Waermerueckgewinnungssystem reduziert, sodass beim Entnehmen kein gefaehrlicher Dampfschwall entsteht. Senkt zugleich die Restwaerme- und Feuchtebelastung am Arbeitsplatz.",
			HazardCategory: "thermal",
			Examples: []string{"Umluft-Waermerueckgewinnung reduziert austretenden Wrasen", "Kondensationshaube ueber der Spuelkammer"},
			NormReferences: []string{"EN 60335-2-58"}},
		{ID: "M2202", ReductionType: "protection", SubType: "monitoring",
			Name: "Thermostop / Temperaturueberwachung von Boiler und Tank",
			Description: "Boiler- und Tanktemperatur werden ueberwacht; ein Thermostop gibt den naechsten Schritt erst frei, wenn die Solltemperatur erreicht ist, und begrenzt die maximale Nachspueltemperatur. Schuetzt vor Verbruehung durch unkontrolliert heisses Nachspuelwasser.",
			HazardCategory: "thermal",
			Examples: []string{"Temperatursensor in Boiler und Tank mit Abschaltgrenze", "Thermostop-Funktion im Spuelprogramm"},
			NormReferences: []string{"EN 60335-2-58", "EN ISO 13732-1"}},
		{ID: "M2203", ReductionType: "design", SubType: "containment",
			Name: "Geschlossenes Dosiersystem mit Sauglanzen und Niveauueberwachung",
			Description: "Reiniger und Klarspueler werden ausschliesslich ueber ein geschlossenes Dosiersystem mit Sauglanzen aus dem Originalgebinde gefoerdert (Niveau-Ueberwachung statt Umfuellen). Direkter Haut-/Augenkontakt mit dem aetzenden Konzentrat beim Nachfuellen wird konstruktiv vermieden.",
			HazardCategory: "material_environmental",
			Examples: []string{"Sauglanze mit Leermeldung im Reiniger-Kanister", "Kein Umfuellen — Gebindewechsel ohne offenen Chemiekontakt"},
			NormReferences: []string{"EN 60335-2-58", "Verordnung (EG) Nr. 1272/2008 (CLP/GHS)"}},
		{ID: "M2204", ReductionType: "information", SubType: "ppe",
			Name: "PSA (Augen-/Hautschutz) + GHS-Kennzeichnung und Sicherheitsdatenblatt",
			Description: "Fuer Handhabung, Gebindewechsel und Entkalkung werden Augen- und Handschutz vorgeschrieben; Reiniger/Klarspueler/Entkalker sind GHS-gekennzeichnet und das Sicherheitsdatenblatt liegt am Geraet vor. Stellt die sichere Handhabung der aetzenden Konzentrate sicher.",
			HazardCategory: "material_environmental",
			Examples: []string{"Schutzbrille + chemikalienbestaendige Handschuhe bei Gebindewechsel", "GHS-Etikett und SDB im Chemikalienschrank am Geraet"},
			NormReferences: []string{"Verordnung (EG) Nr. 1272/2008 (CLP/GHS)", "TRGS 500"}},
		{ID: "M2205", ReductionType: "protection", SubType: "ventilation",
			Name: "Be-/Entlueftung bzw. geschlossene Haube gegen Chemie-Aerosole und Wrasen",
			Description: "Der Aufstellbereich ist ausreichend be- und entlueftet bzw. die Spuelkammer bleibt waehrend des Programms geschlossen, sodass Reinigungs-Aerosole und heisser Wrasen nicht in die Atemzone des Bedieners gelangen.",
			HazardCategory: "material_environmental",
			Examples: []string{"Kuechenlueftung ueber dem Spuelbereich", "Programmstart nur bei geschlossener Haube"},
			NormReferences: []string{"EN 60335-2-58", "TRGS 500"}},
		{ID: "M2206", ReductionType: "design", SubType: "geometry",
			Name: "Tuerkanten mit geringer Schliesskraft / Einklemmschutz",
			Description: "Die Tuer-/Haubenmechanik ist so gestaltet (gefuehrte Bewegung, begrenzte Schliesskraft, abgerundete Kanten), dass beim Schliessen keine Finger gequetscht werden.",
			HazardCategory: "mechanical",
			Examples: []string{"Gefuehrte Haube mit gedaempfter Schliessbewegung", "Abgerundete Tuerkanten ohne Quetschspalt"},
			NormReferences: []string{"EN 60335-2-58", "EN ISO 12100 — Geometrie und Anordnung"}},
		{ID: "M2207", ReductionType: "design", SubType: "environment",
			Name: "Rutschhemmender Bodenbelag + Ablauf/Leckagewanne im Aufstellbereich",
			Description: "Im Aufstell- und Bedienbereich der Spuelmaschine sorgen rutschhemmender Bodenbelag und ein definierter Ablauf bzw. eine Leckagewanne dafuer, dass austretendes Wasser nicht zur Sturzgefahr wird.",
			HazardCategory: "mechanical",
			Examples: []string{"Rutschhemmender Industrieboden (Bewertungsgruppe R11/R12)", "Bodenablauf bzw. Leckagewanne unter dem Geraet"},
			NormReferences: []string{"ASR A1.5/1,2", "DGUV Regel 108-003"}},
		{ID: "M2208", ReductionType: "information", SubType: "signage",
			Name: "Warnhinweis heisser Dampf/Heisswasser — Tuer erst nach Programmende oeffnen",
			Description: "Am Geraet und in der Betriebsanleitung wird vor heissem Dampf und Heisswasser gewarnt und das Oeffnen der Tuer erst nach Programmende mit vorsichtigem Anheben vorgeschrieben. Sprachneutrale Piktogramme ergaenzen den Hinweis.",
			HazardCategory: "general",
			Examples: []string{"Warnpiktogramm 'Heisser Dampf' an der Tuer", "BA-Hinweis 'Tuer nach Programmende langsam oeffnen'"},
			NormReferences: []string{"ISO 7010", "EN 60335-2-58"}},
		{ID: "M2209", ReductionType: "design", SubType: "containment",
			Name: "Rueckflussverhinderer / Systemtrenner nach EN 1717",
			Description: "Der Frischwasseranschluss ist durch einen Rueckflussverhinderer bzw. Systemtrenner der passenden Schutzklasse oder durch einen freien Auslauf gegen Ruecksaugen verschmutzten Wassers in das Trinkwassernetz gesichert.",
			HazardCategory: "material_environmental",
			Examples: []string{"Systemtrenner Typ BA nach EN 1717", "Freier Auslauf Typ AB ueber dem hoechsten Wasserstand"},
			NormReferences: []string{"EN 1717", "EN 60335-2-58"}},
	}
}
@@ -46,20 +46,6 @@ var domainGateTerms = map[string]string{
 "widerstandsschweiss": "dom_welding", "lichtbogenschweiss": "dom_welding",
 "schutzgasschweiss": "dom_welding", "punktschweiss": "dom_welding",
 "schweisselektrod": "dom_welding", "elektrodenspalt": "dom_welding",
-// Schweissen — Oberflaechenformen die bisher ungegatet leakten (z.B. in
-// thermische Hazards einer Spuelmaschine ueber high_temperature/electrical_part)
-"schweissarbeitsplatz": "dom_welding", "schweissfunke": "dom_welding",
-"schweisshelm": "dom_welding", "schweisserschutz": "dom_welding",
-"lichtbogenzone": "dom_welding", "lichtbogen-verbrennung": "dom_welding",
-"schweissrauch": "dom_welding", "schweissgeraet": "dom_welding",
-"schweisszone": "dom_welding", "schweissbrenner": "dom_welding",
-"schweissspritzer": "dom_welding", "schweissstrom": "dom_welding",
-// Offene Flamme / Brenner (Gasbrenner, Flammhaerten, Abflammen)
-"offene flamme": "dom_flame", "brennerbereich": "dom_flame",
-"flammenzone": "dom_flame", "gasbrenner": "dom_flame",
-// Heissleim / Schmelzkleber
-"heissleimanlage": "dom_glue", "klebstoffschmelzer": "dom_glue",
-"heisskleber": "dom_glue", "schmelzkleber": "dom_glue",
 // Solar / PV
 "pv-modul": "dom_solar", "photovoltaik": "dom_solar", "pv-anlage": "dom_solar",
 "dc-steckverbindung": "dom_solar", "solarmodul": "dom_solar",
@@ -67,7 +53,6 @@ var domainGateTerms = map[string]string{
 "gondel": "dom_wind", "rotorblatt": "dom_wind", "windenergieanlage": "dom_wind",
 // CNC / Zerspanung
 "drehmaschine": "dom_cnc", "fraesmaschine": "dom_cnc",
-"spanende": "dom_cnc", "spanenden bearbeitung": "dom_cnc",
 // Landwirtschaft
 "maehdrescher": "dom_agri", "ballenpresse": "dom_agri", "feldhaecksler": "dom_agri",
 // Roll-/Fahrtreppe

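The removed lines extend `domainGateTerms`, which maps lowercase word stems to capability domains. `applyDomainGates` itself is not part of this diff, so the sketch below only assumes that a stem counts as a hit when it occurs as a substring of the lowercased machine text, and that a domain-tagged pattern passes only when its domain was detected. `detectDomains`, `gatePasses` and the five-entry `gateTerms` excerpt are hypothetical names for illustration, not the project's API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// gateTerms is a hypothetical five-entry excerpt shaped like domainGateTerms:
// lowercase stem -> capability domain.
var gateTerms = map[string]string{
	"schweissrauch": "dom_welding",
	"gasbrenner":    "dom_flame",
	"heisskleber":   "dom_glue",
	"photovoltaik":  "dom_solar",
	"drehmaschine":  "dom_cnc",
}

// detectDomains returns the sorted, de-duplicated domains whose stems occur
// as substrings of the lowercased text (assumed matching rule).
func detectDomains(text string, terms map[string]string) []string {
	lower := strings.ToLower(text)
	seen := make(map[string]bool)
	for stem, dom := range terms {
		if strings.Contains(lower, stem) {
			seen[dom] = true
		}
	}
	domains := make([]string, 0, len(seen))
	for dom := range seen {
		domains = append(domains, dom)
	}
	sort.Strings(domains) // map iteration order is random; sort for stable output
	return domains
}

// gatePasses reports whether a pattern tagged with domain may be used,
// i.e. whether that domain was detected (assumed gating rule).
func gatePasses(domain string, detected []string) bool {
	for _, d := range detected {
		if d == domain {
			return true
		}
	}
	return false
}

func main() {
	weld := detectDomains("Absaugung fuer Schweissrauch an der Drehmaschine", gateTerms)
	fmt.Println(weld) // [dom_cnc dom_welding]

	dish := detectDomains("Heisswasser-Boiler einer Spuelmaschine", gateTerms)
	fmt.Println(gatePasses("dom_welding", dish)) // false
}
```

Under this reading, a welding-domain pattern is dropped for a dishwasher description that mentions none of the welding stems.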
@@ -1,44 +0,0 @@
-package iace
-
-// Interlocked-enclosure model (EN ISO 14120 / EN ISO 12100).
-//
-// A contact or entanglement hazard from a moving part is removed during NORMAL
-// operation when that part is inaccessible behind an interlocked guard. The
-// hazard then remains only when the guard is open — maintenance, cleaning or
-// fault clearing. Patterns flagged GuardableByEnclosure express this; a project
-// emits the "interlocked_enclosure" tag (interlocked door/hood, see
-// keyword_dictionary.go) to declare the guard.
-//
-// This is GENERIC: it applies to every enclosed machine (dishwasher spray arm,
-// enclosed mixer, centrifuge ...) and is regression-safe — machines that do not
-// emit interlocked_enclosure are unaffected.
-
-const (
-phaseMaintenance = "maintenance"
-phaseCleaning = "cleaning"
-phaseFaultClearing = "fault_clearing"
-)
-
-// suppressedByEnclosure reports whether a guardable hazard must be dropped: the
-// part is enclosed AND none of the project's lifecycle phases opens the guard.
-func suppressedByEnclosure(p HazardPattern, tagSet map[string]bool, lifecycles []string) bool {
-if !p.GuardableByEnclosure || !tagSet["interlocked_enclosure"] || len(lifecycles) == 0 {
-return false
-}
-for _, lc := range lifecycles {
-if lc == phaseMaintenance || lc == phaseCleaning || lc == phaseFaultClearing {
-return false // guard is open in some phase → hazard remains there
-}
-}
-return true
-}
-
-// guardedLifecycles re-scopes a guardable hazard to the guard-open phases when
-// the project declares an interlocked enclosure, so it is documented as a
-// maintenance/cleaning hazard rather than a normal-operation one.
-func guardedLifecycles(p HazardPattern, tagSet map[string]bool) []string {
-if p.GuardableByEnclosure && tagSet["interlocked_enclosure"] {
-return []string{phaseMaintenance, phaseCleaning}
-}
-return p.ApplicableLifecycles
-}

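The two helpers in the removed enclosure file are pure functions, so their behaviour can be checked in isolation. The sketch copies their logic and replaces `HazardPattern` with a hypothetical two-field stub; the phase name `normal_operation` is also an assumption, since the diff only spells out the three guard-open phases.

```go
package main

import "fmt"

// HazardPattern is a hypothetical stub carrying only the two fields that the
// removed helpers read; the real struct has many more.
type HazardPattern struct {
	GuardableByEnclosure bool
	ApplicableLifecycles []string
}

const (
	phaseMaintenance   = "maintenance"
	phaseCleaning      = "cleaning"
	phaseFaultClearing = "fault_clearing"
)

// suppressedByEnclosure: same logic as in the removed file.
func suppressedByEnclosure(p HazardPattern, tagSet map[string]bool, lifecycles []string) bool {
	if !p.GuardableByEnclosure || !tagSet["interlocked_enclosure"] || len(lifecycles) == 0 {
		return false
	}
	for _, lc := range lifecycles {
		if lc == phaseMaintenance || lc == phaseCleaning || lc == phaseFaultClearing {
			return false // a guard-open phase keeps the hazard
		}
	}
	return true
}

// guardedLifecycles: same logic as in the removed file.
func guardedLifecycles(p HazardPattern, tagSet map[string]bool) []string {
	if p.GuardableByEnclosure && tagSet["interlocked_enclosure"] {
		return []string{phaseMaintenance, phaseCleaning}
	}
	return p.ApplicableLifecycles
}

func main() {
	sprayArm := HazardPattern{GuardableByEnclosure: true, ApplicableLifecycles: []string{"normal_operation"}}
	enclosed := map[string]bool{"interlocked_enclosure": true}

	// Only normal operation declared: the guard never opens, so the hazard is dropped.
	fmt.Println(suppressedByEnclosure(sprayArm, enclosed, []string{"normal_operation"})) // true
	// Cleaning declared: the hazard is kept and re-scoped to the guard-open phases.
	fmt.Println(suppressedByEnclosure(sprayArm, enclosed, []string{"normal_operation", "cleaning"})) // false
	fmt.Println(guardedLifecycles(sprayArm, enclosed)) // [maintenance cleaning]
	// No interlocked_enclosure tag: the pattern's own lifecycles are returned unchanged.
	fmt.Println(guardedLifecycles(sprayArm, map[string]bool{})) // [normal_operation]
}
```

One asymmetry is visible here: `suppressedByEnclosure` treats `fault_clearing` as a guard-open phase, but `guardedLifecycles` re-scopes a kept hazard only to `maintenance` and `cleaning`, so a project whose sole guard-open phase is fault clearing keeps the hazard without that phase in its lifecycle list.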
@@ -223,7 +223,7 @@ func (e *PatternEngine) Match(input MatchInput) *MatchOutput {
 HumanRoles: p.HumanRoles,
 GeneratedHazardType: p.GeneratedHazardType,
 MatchedFailureModes: matchedFMs,
-ApplicableLifecycles: guardedLifecycles(p, tagSet),
+ApplicableLifecycles: p.ApplicableLifecycles,
 SuggestedMeasureIDs: p.SuggestedMeasureIDs,
 ClarificationQuestionsDE: p.ClarificationQuestionsDE,
 ISO12100Section: p.ISO12100Section,
@@ -411,11 +411,6 @@ func patternMatches(p HazardPattern, tagSet map[string]bool, input MatchInput) b
 }
 }
 
-// Interlocked-enclosure gate (guardable contact/entanglement). See pattern_enclosure.go.
-if suppressedByEnclosure(p, tagSet, input.LifecyclePhases) {
-return false
-}
-
 return true
 }
 

@@ -44,7 +44,6 @@ func collectAllPatterns() []HazardPattern {
 patterns = append(patterns, GetCRAPatterns()...) // HP1910-HP1918 CRA / DIN EN 40000-1-2 cyber-resilience spur
 patterns = append(patterns, GetSecondaryHarmDemoPatterns()...) // HP2000-HP2001 secondary harm chain demos (Cola splitter, Pharma)
 patterns = append(patterns, GetLiftEndstopPatterns()...) // HP2100-HP2102 lift body-part crush at endstops
-patterns = append(patterns, GetWarewashingPatterns()...) // HP2200-HP2206 commercial dishwasher (scald/chemical/door/slip)
 patterns = applyMachineTypeOverrides(patterns) // Fill MachineTypes on legacy patterns to prevent drift
 patterns = applyDomainGates(patterns) // Capability-domain gate: stop domain-specific patterns leaking cross-machine
 return patterns

@@ -1,383 +0,0 @@
-{
-"machine_name": "Gewerbliche Untertisch-Geschirrspuelmaschine (Winterhalter UC-M)",
-"machine_description": "Untertisch-Gewerbespuelmaschine, vernetzt (Connected Wash), Heisswasser-Boiler, Spuelpumpe mit rotierenden Spuelfeldern, Tuer mit Sicherheitsschalter, Reiniger-/Klarspueler-Dosierung.",
-"source": "Selbstbewertung GT #3 (Fachmann-Erwartung, EN 60335-2-58 + EN ISO 12100)",
-"version": "1.0",
-"entries": [
-{
-"nr": "1.1",
-"hazard_group": "Thermische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Verbrühung durch Heißwasser und Dampf",
-"hazard_cause": "Beim Öffnen der Tür während oder kurz nach dem Spülgang tritt heißes Wasser und Wrasen (Dampf) aus der Spülkammer aus und trifft Gesicht, Hände und Arme",
-"lifecycle_phases": ["Betrieb", "Reinigung"],
-"component_zone": "Tür und Beschickungsöffnung der Spülkammer",
-"risk_in": {"f": 4, "w": 3, "p": 2, "s": 3, "r": 27},
-"measures": ["Türverriegelung beendet Spülgang vor dem Öffnen", "Wrasen-/Dampfreduzierung", "Warnhinweis heißer Dampf"],
-"measure_type": "KM",
-"risk_out": {"f": 2, "w": 1, "p": 1, "s": 2, "r": 8},
-"norm_references": ["EN 60335-2-58"],
-"sufficient": true
-},
-{
-"nr": "1.2",
-"hazard_group": "Thermische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Verbrennung an heißen Oberflächen",
-"hazard_cause": "Berührung heißer Oberflächen von Boiler, Tankheizkörper oder Spülkammerwänden bei Reinigung, Entkalkung oder Wartung",
-"lifecycle_phases": ["Reinigung", "Instandhaltung"],
-"component_zone": "Boiler, Tankheizkörper, Spülkammerwände",
-"risk_in": {"f": 3, "w": 2, "p": 2, "s": 2, "r": 14},
-"measures": ["Temperaturbegrenzung zugänglicher Oberflächen", "Warnhinweis heiße Oberfläche"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 2, "r": 6},
-"norm_references": ["EN ISO 13732-1"],
-"sufficient": true
-},
-{
-"nr": "1.3",
-"hazard_group": "Thermische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Verbrennung an heißem Spülgut",
-"hazard_cause": "Geschirr und Gläser sind nach der Heißwasser-Nachspülung sehr heiß, beim Entladen kommt es zu Verbrennungen an den Händen",
-"lifecycle_phases": ["Betrieb"],
-"component_zone": "Spülkammer, Entnahmebereich, Korb",
-"risk_in": {"f": 3, "w": 3, "p": 2, "s": 2, "r": 16},
-"measures": ["Abkühl-/Trocknungszeit", "Warnhinweis heißes Spülgut"],
-"measure_type": "BI",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 2, "r": 6},
-"norm_references": ["EN 60335-2-58"],
-"sufficient": true
-},
-{
-"nr": "2.1",
-"hazard_group": "Gefährdungen durch Materialien und Substanzen",
-"hazard_group_applicable": true,
-"hazard_type": "Verätzung von Haut und Augen durch Reiniger-/Klarspüler-Konzentrat",
-"hazard_cause": "Direkter Kontakt mit dem ätzenden Reiniger- bzw. Klarspüler-Konzentrat beim Nachfüllen, Sauglanzenwechsel oder bei Leckage des Dosiergeräts",
-"lifecycle_phases": ["Betrieb", "Instandhaltung"],
-"component_zone": "Dosiergerät, Reiniger- und Klarspüler-Gebinde, Sauglanzen",
-"risk_in": {"f": 3, "w": 3, "p": 2, "s": 3, "r": 24},
-"measures": ["Geschlossenes Dosiersystem mit Sauglanzen", "PSA Augen-/Hautschutz", "GHS-Kennzeichnung und Sicherheitsdatenblatt"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["Verordnung (EG) Nr. 1272/2008", "TRGS 500"],
-"sufficient": true
-},
-{
-"nr": "2.2",
-"hazard_group": "Gefährdungen durch Materialien und Substanzen",
-"hazard_group_applicable": true,
-"hazard_type": "Reizung der Atemwege durch Reinigungs-Aerosole und Dämpfe",
-"hazard_cause": "Einatmen von Aerosolen und Dämpfen der Reinigungschemie beim Öffnen kurz nach dem Spülgang oder bei der Entkalkung mit Säure",
-"lifecycle_phases": ["Betrieb", "Instandhaltung"],
-"component_zone": "Atemzone vor der Spülkammer, Aufstellbereich",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 2, "r": 12},
-"measures": ["Be-/Entlüftung", "geschlossene Haube", "Warnung vor Vermischen von Reiniger und Säure"],
-"measure_type": "BI",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 2, "r": 6},
-"norm_references": ["TRGS 500"],
-"sufficient": true
-},
-{
-"nr": "3.1",
-"hazard_group": "Elektrische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Elektrischer Schlag in Nassumgebung",
-"hazard_cause": "Berührung spannungsführender Teile bei unzureichendem IP-Schutz, defekten Kabeldurchführungen oder Feuchtigkeit im Steuerungsgehäuse",
-"lifecycle_phases": ["Betrieb", "Reinigung", "Instandhaltung"],
-"component_zone": "Steuerungsgehäuse, Kabelübergänge, Antriebsgehäuse",
-"risk_in": {"f": 3, "w": 2, "p": 3, "s": 4, "r": 32},
-"measures": ["IP-Schutz gegen eindringendes Wasser", "Fehlerstrom-Schutzeinrichtung (RCD)"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 4, "r": 12},
-"norm_references": ["IEC 60335-1"],
-"sufficient": true
-},
-{
-"nr": "3.2",
-"hazard_group": "Elektrische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Kurzschluss und Brand bei Reinigung am Schaltschrank",
-"hazard_cause": "Reinigung ohne vorherige Freischaltung oder mit Hochdruckreiniger am elektrisch aktiven Schaltschrank führt zu Kurzschluss und Brand",
-"lifecycle_phases": ["Reinigung", "Instandhaltung"],
-"component_zone": "Schaltschrank, elektrisch aktive Komponenten",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 3, "r": 18},
-"measures": ["Netztrenneinrichtung", "Warnhinweis Reinigung nur spannungsfrei, kein Hochdruckreiniger"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["IEC 60204-1"],
-"sufficient": true
-},
-{
-"nr": "3.3",
-"hazard_group": "Elektrische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Motorüberlast mit Überhitzung",
-"hazard_cause": "Blockierter oder überlasteter Pumpenmotor überhitzt, Wicklungsbrand und Rauchentwicklung",
-"lifecycle_phases": ["Betrieb"],
-"component_zone": "Motorgehäuse, Umgebung",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 2, "r": 12},
-"measures": ["Überstromschutz", "Motorschutzschalter"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 2, "r": 6},
-"norm_references": ["IEC 60204-1"],
-"sufficient": true
-},
-{
-"nr": "4.1",
-"hazard_group": "Mechanische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Ausrutschen auf nassem Boden",
-"hazard_cause": "Aus der Spülmaschine austretendes Wasser durch Leckage oder beim Öffnen macht den Boden im Aufstellbereich rutschig, Person rutscht aus und stürzt",
-"lifecycle_phases": ["Betrieb", "Reinigung", "Instandhaltung"],
-"component_zone": "Aufstell- und Bedienbereich der Spülmaschine",
-"risk_in": {"f": 3, "w": 3, "p": 2, "s": 2, "r": 16},
-"measures": ["Rutschhemmender Bodenbelag", "Bodenablauf bzw. Leckagewanne"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 2, "r": 6},
-"norm_references": ["ASR A1.5/1,2"],
-"sufficient": true
-},
-{
-"nr": "4.2",
-"hazard_group": "Mechanische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Quetschen der Finger an der Tür/Haube",
-"hazard_cause": "Beim Schließen der Tür bzw. Absenken der Haube werden Finger zwischen Tür/Haube und Gehäuse gequetscht",
-"lifecycle_phases": ["Betrieb"],
-"component_zone": "Tür- und Haubenkante, Schließbereich",
-"risk_in": {"f": 3, "w": 2, "p": 2, "s": 1, "r": 7},
-"measures": ["Geringe Schließkraft, Einklemmschutz", "Abgerundete Türkanten"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 1, "r": 3},
-"norm_references": ["EN ISO 12100"],
-"sufficient": true
-},
-{
-"nr": "4.3",
-"hazard_group": "Mechanische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Kontakt mit rotierendem Spülarm bei geöffneter Tür",
-"hazard_cause": "Eingreifen in die Spülkammer bei noch nachlaufendem rotierendem Spülarm/Spülfeld nach dem Öffnen der Tür",
-"lifecycle_phases": ["Betrieb", "Reinigung"],
-"component_zone": "Spülkammer, Spülarm und Spülfeld",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 1, "r": 6},
-"measures": ["Türverriegelung stoppt Spülarm beim Öffnen"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 1, "r": 3},
-"norm_references": ["EN ISO 12100"],
-"sufficient": true
-},
-{
-"nr": "5.1",
-"hazard_group": "Ergonomische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Belastung des Bewegungsapparats durch wiederholte Be- und Entladung",
-"hazard_cause": "Wiederholtes Heben und Bücken beim manuellen Be- und Entladen der Spülkörbe am Untertischgerät",
-"lifecycle_phases": ["Betrieb"],
-"component_zone": "Be- und Entladestelle, Spülkorb",
-"risk_in": {"f": 4, "w": 3, "p": 2, "s": 1, "r": 9},
-"measures": ["Ergonomische Arbeitshöhe", "Be-/Entladung auf günstiger Greifhöhe"],
-"measure_type": "KM",
-"risk_out": {"f": 2, "w": 1, "p": 1, "s": 1, "r": 4},
-"norm_references": ["EN 1005-2"],
-"sufficient": true
-},
-{
-"nr": "5.2",
-"hazard_group": "Ergonomische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Zwangshaltung durch ungünstige Bedienelement-Position",
-"hazard_cause": "Bedienelemente am HMI außerhalb der ergonomisch günstigen Reichweite führen bei dauerhafter Bedienung zu Zwangshaltung",
-"lifecycle_phases": ["Betrieb"],
-"component_zone": "Bedienstand HMI, Steuerpult",
-"risk_in": {"f": 3, "w": 2, "p": 1, "s": 1, "r": 6},
-"measures": ["Bedienelemente in ergonomisch günstiger Höhe"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 1, "r": 3},
-"norm_references": ["EN 894-3"],
-"sufficient": true
-},
-{
-"nr": "6.1",
-"hazard_group": "zusätzliche Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Verlust einer Sicherheitsfunktion durch Steuerungs- oder Softwarefehler",
-"hazard_cause": "Steuerungs- oder Softwarefehler der eigenen Maschinensteuerung führt zu unkontrolliertem Verhalten oder Verlust einer Sicherheitsfunktion",
-"lifecycle_phases": ["Betrieb", "Instandhaltung"],
-"component_zone": "Gesamte Maschine, Steuerung",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 3, "r": 18},
-"measures": ["Sichere Fehlerbehandlung", "Sichere Software-Fallbacks", "Watchdog"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["EN ISO 13849-1"],
-"sufficient": true
-},
-{
-"nr": "6.2",
-"hazard_group": "zusätzliche Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Verlust der Sicherheitsfunktion nach fehlerhaftem Software-Update",
-"hazard_cause": "Korrupte oder inkompatible Firmware nach fehlerhaftem Update über die USB-Schnittstelle lässt die Steuerung undefiniert verhalten oder Sicherheitsfunktion verlieren",
-"lifecycle_phases": ["Instandhaltung"],
-"component_zone": "Gesamte Maschine, Steuerung, Update-Schnittstelle",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 3, "r": 18},
-"measures": ["Atomares Update mit Rückfall auf lauffähige Version", "Kompatibilitätsprüfung vor Update"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["EN ISO 13849-1"],
-"sufficient": true
-},
-{
-"nr": "4.4",
-"hazard_group": "Mechanische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Erfassen/Aufwickeln an rotierenden Teilen bei geöffneter Schutztür",
-"hazard_cause": "Bei geöffneter Tür im Wartungs- oder Reinigungsfall können lose Kleidung oder Haare an noch zugänglichen rotierenden Wellen erfasst und aufgewickelt werden",
-"lifecycle_phases": ["Instandhaltung", "Reinigung"],
-"component_zone": "Rotierende Wellen, Spülarm bei geöffneter Schutztür",
-"risk_in": {"f": 1, "w": 1, "p": 2, "s": 3, "r": 12},
-"measures": ["Rotation stoppt bei geöffneter Tür durch Verriegelung", "Warnhinweis"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 6},
-"norm_references": ["EN ISO 14120"],
-"sufficient": true
-},
-{
-"nr": "4.5",
-"hazard_group": "Mechanische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Reibung/Hautabschürfung an rotierenden Teilen bei geöffneter Schutztür",
-"hazard_cause": "Berührung rotierender Wellen oder Oberflächen bei geöffneter Tür im Wartungsfall führt zu Hautabschürfungen durch Reibung",
-"lifecycle_phases": ["Instandhaltung"],
-"component_zone": "Rotierende Welle bei geöffneter Schutztür",
-"risk_in": {"f": 1, "w": 1, "p": 2, "s": 2, "r": 8},
-"measures": ["Rotation stoppt bei geöffneter Tür durch Verriegelung"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 2, "r": 4},
-"norm_references": ["EN ISO 14120"],
-"sufficient": true
-},
-{
-"nr": "1.4",
-"hazard_group": "Thermische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Trockenlauf-Überhitzung von Boiler/Heizung",
-"hazard_cause": "Das Heizelement bzw. der Boiler läuft bei Wassermangel trocken, überhitzt und kann einen Brand oder eine Verbrühung durch überhitztes Wasser auslösen",
-"lifecycle_phases": ["Betrieb"],
-"component_zone": "Boiler, Tankheizkörper, Heizelement",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 3, "r": 18},
-"measures": ["Trockengehschutz / Niveauüberwachung der Heizung", "Temperaturbegrenzer (STB)"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["EN 60335-2-58", "EN 60335-1"],
-"sufficient": true
-},
-{
-"nr": "3.4",
-"hazard_group": "Elektrische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Restspannung / gespeicherte elektrische Energie nach Abschalten",
-"hazard_cause": "Nach dem Abschalten der Spannungsversorgung stehen durch Kondensatoren im Frequenzumrichter oder Netzfilter noch gefährliche Berührungsspannungen an",
-"lifecycle_phases": ["Instandhaltung", "Fehlersuche und -beseitigung"],
-"component_zone": "Frequenzumrichter, Netzfilter, Schaltschrank",
-"risk_in": {"f": 1, "w": 2, "p": 3, "s": 4, "r": 24},
-"measures": ["Sichere Energieentladung nach Abschalten", "Warnhinweis Restspannung, Entladezeit abwarten"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 4, "r": 12},
-"norm_references": ["IEC 60204-1"],
-"sufficient": true
-},
-{
-"nr": "4.6",
-"hazard_group": "Mechanische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Schnittverletzung an scharfen Kanten",
-"hazard_cause": "Schneiden an scharfen Blechkanten, Sieben oder dem Ablaufpumpen-Laufrad beim Reinigen oder Eingreifen in die Spülkammer",
-"lifecycle_phases": ["Reinigung", "Instandhaltung"],
-"component_zone": "Zugängliche Kanten, Siebe, Spülkammer, Ablaufpumpe",
-"risk_in": {"f": 3, "w": 2, "p": 2, "s": 1, "r": 7},
-"measures": ["Brechen oder Runden aller zugänglichen Kanten"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 1, "r": 3},
-"norm_references": ["EN ISO 12100"],
-"sufficient": true
-},
-{
-"nr": "4.7",
-"hazard_group": "Mechanische Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Kippen / mangelnde Standsicherheit",
-"hazard_cause": "Unzureichende Standsicherheit bei Untertischmontage, Transport oder Installation führt zum Kippen oder Umstürzen der Maschine",
-"lifecycle_phases": ["Transport", "Montage und Installation"],
-"component_zone": "Gesamte Maschine, Aufstellbereich",
-"risk_in": {"f": 1, "w": 1, "p": 2, "s": 2, "r": 8},
-"measures": ["Standsichere Aufstellung / Befestigung", "Kippsichere Konstruktion"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 2, "r": 4},
-"norm_references": ["EN ISO 12100"],
-"sufficient": true
-},
-{
-"nr": "2.3",
-"hazard_group": "Gefährdungen durch Materialien und Substanzen",
-"hazard_group_applicable": true,
-"hazard_type": "Rückfluss / Kontamination des Trinkwassers",
-"hazard_cause": "Verschmutztes Spül- oder Chemiewasser wird ohne Rückflussverhinderer in das Trinkwassernetz zurückgesaugt und kontaminiert es",
-"lifecycle_phases": ["Betrieb"],
-"component_zone": "Frischwasseranschluss, Wasserzulauf",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 3, "r": 18},
-"measures": ["Rückflussverhinderer / Systemtrenner nach EN 1717", "Freier Auslauf"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["EN 1717", "EN 60335-2-58"],
-"sufficient": true
-},
-{
-"nr": "2.4",
-"hazard_group": "Gefährdungen durch Materialien und Substanzen",
-"hazard_group_applicable": true,
-"hazard_type": "Mikrobielle Belastung / Legionellen im Stehwasser",
-"hazard_cause": "Stehwasser im Boiler oder Tank bei niedrigen Temperaturen begünstigt mikrobielles Wachstum und Legionellen, die über Aerosole eingeatmet werden",
-"lifecycle_phases": ["Betrieb", "Instandhaltung"],
-"component_zone": "Boiler, Tank, Stehwasser",
-"risk_in": {"f": 1, "w": 1, "p": 2, "s": 3, "r": 12},
-"measures": ["Thermische Desinfektion / ausreichende Wassertemperatur", "Regelmäßiger Wasserwechsel"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["EN 60335-2-58"],
-"sufficient": true
-},
-{
-"nr": "6.3",
-"hazard_group": "zusätzliche Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Versagen der Tür-/Schutzeinrichtungs-Verriegelung",
-"hazard_cause": "Die Verriegelung des Tür-Sicherheitsschalters versagt oder wird überbrückt, sodass der Zugriff in die Spülkammer bei laufendem Spülgang (Heißwasser, rotierender Spülarm) möglich wird",
-"lifecycle_phases": ["Betrieb", "Instandhaltung"],
-"component_zone": "Tür-Sicherheitsschalter, Verriegelung, Spülkammer",
-"risk_in": {"f": 3, "w": 2, "p": 2, "s": 3, "r": 21},
-"measures": ["Sichere Verriegelung mit Fehlerüberwachung (PL nach ISO 13849)", "Zwangsöffnende Kontakte"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["EN ISO 14119", "EN ISO 13849-1"],
-"sufficient": true
-},
-{
-"nr": "6.4",
-"hazard_group": "zusätzliche Gefährdungen",
-"hazard_group_applicable": true,
-"hazard_type": "Unerwarteter Wiederanlauf bei Wartung",
-"hazard_cause": "Während Wartung oder Reinigung läuft die Maschine durch fehlende Freischaltung (LOTO) oder automatischen Wiederanlauf unerwartet an",
-"lifecycle_phases": ["Instandhaltung", "Reinigung"],
-"component_zone": "Gesamte Maschine, Antriebe, Pumpe",
-"risk_in": {"f": 2, "w": 2, "p": 2, "s": 3, "r": 18},
-"measures": ["Freischalten und gegen Wiedereinschalten sichern (LOTO)", "Kein automatischer Wiederanlauf"],
-"measure_type": "KM",
-"risk_out": {"f": 1, "w": 1, "p": 1, "s": 3, "r": 9},
-"norm_references": ["IEC 60204-1", "EN ISO 12100"],
-"sufficient": true
-}
-]
-}

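Every `risk_in` object in the removed ground-truth file is consistent with R = (F + W + P) × S; for entry 1.1, (4 + 3 + 2) × 3 = 27. The file does not state this formula, so treat it as an inference from the data. Under that assumption, a small check flags three `risk_out` values that do not add up: entries 4.4 (recorded 6, computed 9), 4.5 (recorded 4, computed 6) and 4.7 (recorded 4, computed 6). The `score` type, `expectedRisk` and `mismatches` are hypothetical helpers for this sketch.

```go
package main

import "fmt"

// score mirrors one risk_in / risk_out object of the removed JSON file
// (hypothetical type; the file itself stores plain JSON objects).
type score struct {
	F, W, P, S, R int
}

// expectedRisk applies the inferred rule R = (F + W + P) * S.
func expectedRisk(s score) int {
	return (s.F + s.W + s.P) * s.S
}

// riskOut holds the risk_out values of four entries, copied from the file.
var riskOut = map[string]score{
	"1.1": {F: 2, W: 1, P: 1, S: 2, R: 8},
	"4.4": {F: 1, W: 1, P: 1, S: 3, R: 6},
	"4.5": {F: 1, W: 1, P: 1, S: 2, R: 4},
	"4.7": {F: 1, W: 1, P: 1, S: 2, R: 4},
}

// mismatches returns, in the given order, the entry numbers whose recorded
// R differs from expectedRisk.
func mismatches(entries map[string]score, order []string) []string {
	var bad []string
	for _, nr := range order {
		if s, ok := entries[nr]; ok && expectedRisk(s) != s.R {
			bad = append(bad, nr)
		}
	}
	return bad
}

func main() {
	order := []string{"1.1", "4.4", "4.5", "4.7"}
	for _, nr := range mismatches(riskOut, order) {
		s := riskOut[nr]
		fmt.Printf("%s: recorded r=%d, (f+w+p)*s=%d\n", nr, s.R, expectedRisk(s))
	}
}
```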
@@ -1,237 +0,0 @@
package ucca

import (
	"regexp"
	"strconv"
	"strings"
)

// authorityInfo is the normative classification of a search result, used internally
// for re-ranking only (Phase 1 changes ordering, not the response contract).
type authorityInfo struct {
	weight       int    // 100 binding, 80 technical_standard, 70 guidance, 0 foreign, 50 unknown
	sourceClass  string // binding_law | technical_standard | supervisory_guidance | foreign_law | unknown
	jurisdiction string // DE | EU | CH
}

var (
	guidanceMarkers = []string{
		"DSK", "EDPB", "BfDI", "BFDI", "BayLfD", "Baylfb", "ENISA", "BSI", "EUCC",
		"Standards Mapping", "Kpnr", "Orientierungshilfe", "Handreichung", "Beschluss",
		"Leitlinie", "Guidance", "Empfehlung", "OECD", "CISA", "Blue Guide",
	}
	// Technical standards / control frameworks (best-practice controls). Checked BEFORE
	// guidanceMarkers so a "BSI Grundschutz" chunk classifies as a standard, not BSI guidance.
	standardMarkers = []string{
		"NIST", "OWASP", "Grundschutz", "ISO 27001", "ISO/IEC 27001",
		"CSA CCM", "Cloud Controls Matrix", "CIS Benchmark", "CIS Control",
	}
	foreignMarkers = []string{"RevDSG", "fedlex", "(CH)"}
	deMarkers      = []string{"BDSG", "DSK", "BfDI", "BFDI", "BayLfD", "Baylfb", "BSI"}
	normPattern    = regexp.MustCompile(`(§|Art\.?)\s*\d`)
	bdsgParagraph  = regexp.MustCompile(`§\s*(\d+)`)
)

// classifyAuthority derives weight/source-class/jurisdiction. Explicitly tagged payload
// values win; otherwise it falls back to the curated category + name markers, so the
// not-yet-re-ingested (untagged) corpus is still classified deterministically.
func classifyAuthority(r LegalSearchResult) authorityInfo {
	jur := r.Jurisdiction
	if jur == "" {
		jur = inferJurisdiction(r)
	}
	hay := r.ArticleLabel + " " + r.RegulationShort + " " + r.RegulationName + " " + r.RegulationCode
	// A recognised standard NAME (NIST/OWASP/ISO 27001/CIS/CSA CCM/Grundschutz) is authoritative
	// even when the corpus mis-tagged the chunk as supervisory_guidance (weight 70) — many
	// standards were ingested with a generic guidance source_class. The name wins, so they
	// classify (and rank) as technical_standard / control_standard. binding_law is preserved.
	if r.SourceClass != "binding_law" && containsAny(hay, standardMarkers) {
		return authorityInfo{weight: 80, sourceClass: "technical_standard", jurisdiction: jur}
	}
	if r.SourceClass != "" {
		w := r.AuthorityWeight
		if w == 0 && r.SourceClass == "binding_law" {
			w = 100
		}
		return authorityInfo{weight: w, sourceClass: r.SourceClass, jurisdiction: jur}
	}
	if r.AuthorityWeight > 0 {
		return authorityInfo{weight: r.AuthorityWeight, sourceClass: sourceClassFromWeight(r.AuthorityWeight), jurisdiction: jur}
	}
	switch {
	case containsAny(hay, foreignMarkers):
		return authorityInfo{weight: 0, sourceClass: "foreign_law", jurisdiction: "CH"}
	case r.Category == "standard" || containsAny(hay, standardMarkers):
		return authorityInfo{weight: 80, sourceClass: "technical_standard", jurisdiction: jur}
	case r.Category == "guidance" || containsAny(hay, guidanceMarkers):
		return authorityInfo{weight: 70, sourceClass: "supervisory_guidance", jurisdiction: jur}
	case r.Category == "regulation" || r.Category == "eu_recht" || normPattern.MatchString(r.ArticleLabel):
		return authorityInfo{weight: 100, sourceClass: "binding_law", jurisdiction: jur}
	default:
		return authorityInfo{weight: 50, sourceClass: "unknown", jurisdiction: jur}
	}
}

func sourceClassFromWeight(w int) string {
	switch {
	case w >= 100:
		return "binding_law"
	case w >= 80:
		return "technical_standard"
	case w >= 70:
		return "supervisory_guidance"
	case w <= 0:
		return "foreign_law"
	default:
		return "unknown"
	}
}

func inferJurisdiction(r LegalSearchResult) string {
	hay := r.ArticleLabel + " " + r.RegulationShort + " " + r.RegulationName
	switch {
	case containsAny(hay, foreignMarkers):
		return "CH"
	case strings.Contains(hay, "§") || containsAny(hay, deMarkers):
		return "DE"
	default:
		return "EU"
	}
}

// --- Domain routing: separates same-authority but topically foreign norms ---

type domainDef struct {
	name     string
	regs     []string // regulation markers found in a chunk
	keywords []string // query keywords that signal this domain
}

// Deterministic order (slice, not map) — important for stable classification + tests.
var domains = []domainDef{
	{"data_protection",
		[]string{"DSGVO", "GDPR", "BDSG", "EDPB", "DSK", "BfDI", "BayLfD", "DPF"},
		[]string{"personenbezogen", "betroffene", "datenschutz", "datenschutzbeauftrag", "dsb",
			"datenpanne", "auskunft", "loesch", "lösch", "einwilligung", "besondere kategorien", "auftragsverarbeiter"}},
	{"cyber",
		[]string{"CRA", "NIS2", "NIS-2", "ENISA", "DORA", "EUCC"},
		[]string{"security update", "sicherheitsupdate", "sicherheitsaktualisierung", "schwachstelle", "sbom",
			"cybersicherheit", "konformit", "hersteller", "importeur", "haendler", "händler", "ikt-",
			"resilienz", "sicherheitsvorfall", "digitalen elementen"}},
	{"ai",
		[]string{"AI Act", "KI-VO", "KI-Verordnung"},
		[]string{"ki-system", "ki-modell", "hochrisiko", "kuenstliche intelligenz", "künstliche intelligenz"}},
	{"product_safety",
		[]string{"Maschinenverordnung", "MaschinenVO", "GPSR", "RED", "MDR"},
		nil},
}

func queryDomain(query string) string {
	ql := strings.ToLower(query)
	for _, d := range domains {
		for _, kw := range d.keywords {
			if strings.Contains(ql, kw) {
				return d.name
			}
		}
	}
	return ""
}

func chunkDomain(r LegalSearchResult) string {
	hay := r.ArticleLabel + " " + r.RegulationShort + " " + r.RegulationCode + " " + r.RegulationName
	for _, d := range domains {
		if containsAny(hay, d.regs) {
			return d.name
		}
	}
	return ""
}

// scopeClass flags special sub-regimes that must not win general questions —
// BDSG Teil 3 (§§ 45-84) implements the JI directive (law enforcement), not the general regime.
func scopeClass(r LegalSearchResult) string {
	hay := r.ArticleLabel + " " + r.RegulationShort
	if strings.Contains(hay, "BDSG") {
		if m := bdsgParagraph.FindStringSubmatch(hay); m != nil {
			if n, err := strconv.Atoi(m[1]); err == nil && n >= 45 && n <= 84 {
				return "law_enforcement"
			}
		}
	}
	return "general"
}

// --- Topic ontology: amplifier only (boost), never an override ---

type topicDef struct {
	keywords []string
	norms    []string // preferred canonical citation fragments
}

var topics = []topicDef{
	{[]string{"datenschutzbeauftrag", "dsb", "benennung"}, []string{"Art. 37", "§ 38 BDSG"}},
	{[]string{"stellung des"}, []string{"Art. 38"}},
	{[]string{"aufgaben des"}, []string{"Art. 39"}},
	{[]string{"folgenabsch", "dsfa"}, []string{"Art. 35"}},
	{[]string{"besondere kategorien"}, []string{"Art. 9", "§ 22 BDSG"}},
	{[]string{"auskunft"}, []string{"Art. 15", "§ 34 BDSG"}},
	{[]string{"loesch", "lösch"}, []string{"Art. 17", "§ 35 BDSG"}},
	{[]string{"bussgeld", "geldbusse"}, []string{"Art. 83"}},
	{[]string{"security update", "sicherheitsupdate", "schwachstelle", "sbom", "cybersicherheitsanforderung"}, []string{"CRA Anhang I"}},
	{[]string{"meldepflicht", "sicherheitsvorfall"}, []string{"Art. 14 CRA"}},
}

// resultMatchesTopic reports whether the result is a preferred norm of a topic the query hits.
func resultMatchesTopic(query string, r LegalSearchResult) bool {
	ql := strings.ToLower(query)
	hay := r.ArticleLabel + " " + r.RegulationShort
	for _, t := range topics {
		if !containsAnyLower(ql, t.keywords) {
			continue
		}
		for _, n := range t.norms {
			if normMatches(hay, n) {
				return true
			}
		}
	}
	return false
}

// normMatches checks that norm appears in hay with a non-digit boundary, so "Art. 9"
// matches "Art. 9 DSGVO" but not "Art. 90".
func normMatches(hay, norm string) bool {
	idx := strings.Index(hay, norm)
	if idx < 0 {
		return false
	}
	end := idx + len(norm)
	if end < len(hay) && hay[end] >= '0' && hay[end] <= '9' {
		return false
	}
	return true
}

func queryIsForeign(query string) bool {
	return containsAnyLower(strings.ToLower(query),
		[]string{"schweiz", "revdsg", "fedlex", " ch ", "oesterreich", "österreich"})
}

func containsAny(hay string, markers []string) bool {
	for _, m := range markers {
		if strings.Contains(hay, m) {
			return true
		}
	}
	return false
}

func containsAnyLower(haylower string, markers []string) bool {
	for _, m := range markers {
		if strings.Contains(haylower, strings.ToLower(m)) {
			return true
		}
	}
	return false
}
@@ -1,171 +0,0 @@
package ucca

import (
	"sort"
	"strings"
)

// Re-ranking coefficients (validated in the offline golden harness; Phase A — conservative).
const (
	authorityCoef     = 0.40 // * weight/100
	jurisdictionGain  = 0.05 // binding/guidance from DE or EU
	foreignPenalty    = 0.60 // foreign law on a DE/EU question (demoted, not removed)
	unknownPenalty    = 0.08
	domainMatchGain   = 0.15
	offDomainPenalty  = 0.10 // off-domain binding (demoted, not removed)
	scopePenalty      = 0.25 // BDSG Teil 3 (law enforcement) on a general DP question
	topicGain         = 0.18 // amplifier only
	supersededPenalty = 0.50 // superseded legacy source (pre-eu-v1): demoted, not hidden
	intentLiftGain    = 0.10 // epsilon a qualifying interpretative source is lifted ABOVE the best binding
	intentLiftMargin  = 0.05 // ...only if that source is semantically competitive with binding
)

// guidanceIntentSignals mark a query that EXPLICITLY asks for an interpretation /
// recommendation by a guidance body, rather than for the binding obligation. Only
// then may a (semantically competitive) guideline outrank the binding norm.
var guidanceIntentSignals = []string{
	"edpb", "europäischer datenschutzausschuss", "europaeischer datenschutzausschuss",
	"dsk", "enisa", "bsi", "leitlinie", "guideline", "orientierungshilfe",
	"auslegung", "empfiehlt", "empfehlung", "sagt", "laut",
}

// controlIntentSignals mark a query that asks HOW to implement / which controls or
// measures fit — rather than WHAT the binding obligation is. Only then may a
// (semantically competitive) technical_standard outrank the binding norm.
var controlIntentSignals = []string{
	"control", "controls", "maßnahme", "massnahme", "schutzmaßnahme",
	"best practice", "best-practice", "umsetzen", "implementier", "absicher",
	"härt", "haert", "hardening", "nist", "owasp", "grundschutz",
	"ccm", "iso 27001", "isms",
}

func queryMatchesAny(query string, signals []string) bool {
	q := strings.ToLower(query)
	for _, sig := range signals {
		if strings.Contains(q, sig) {
			return true
		}
	}
	return false
}

// queryWantsGuidance reports whether the query explicitly asks for guidance/interpretation.
func queryWantsGuidance(query string) bool { return queryMatchesAny(query, guidanceIntentSignals) }

// queryWantsControls reports whether the query asks for implementation controls/measures.
func queryWantsControls(query string) bool { return queryMatchesAny(query, controlIntentSignals) }

// bestBindingSemantic returns the highest RAW semantic score among binding-law
// results (0 if none / no intent). Used as the guard threshold so an off-topic
// interpretative source cannot ride the intent boost.
func bestBindingSemantic(results []LegalSearchResult, wantsIntent bool) float64 {
	if !wantsIntent {
		return 0
	}
	best := 0.0
	for _, r := range results {
		if classifyAuthority(r).sourceClass == "binding_law" && r.Score > best {
			best = r.Score
		}
	}
	return best
}

// authorityScore computes the normative relevance of a result for a query. It augments the
// semantic score with authority/jurisdiction/domain/scope/topic signals. Exposed for tests.
func authorityScore(query string, r LegalSearchResult, qDomain string, qForeign bool) float64 {
	info := classifyAuthority(r)
	score := r.Score + authorityCoef*float64(info.weight)/100.0

	if r.Superseded {
		// Legacy source (pre-eu-v1): default questions should see the eu-v1 norm. Demoted,
		// not removed: it stays findable for history and transition questions.
		score -= supersededPenalty
	}

	if info.jurisdiction == "CH" && !qForeign {
		score -= foreignPenalty // foreign law on a DE/EU question: demoted, not deleted
	} else {
		score += jurisdictionGain
	}
	if info.sourceClass == "unknown" {
		score -= unknownPenalty
	}
	if qDomain != "" {
		switch cd := chunkDomain(r); {
		case cd == qDomain:
			score += domainMatchGain
		case cd != "":
			score -= offDomainPenalty // off-domain binding: demoted, not deleted
		}
	}
	if qDomain == "data_protection" && scopeClass(r) == "law_enforcement" {
		score -= scopePenalty
	}
	if resultMatchesTopic(query, r) {
		score += topicGain // amplifier, not an override
	}
	return score
}

// rerankByAuthority re-orders results so binding law from the matching jurisdiction/domain
// ranks above guidance, foreign and off-domain law — WITHOUT dropping anything (guidance is
// kept as interpretation context). The computed score is written back to Score so downstream
// merges (e.g. the multi-collection advisor) preserve this order. Pure + deterministic.
func rerankByAuthority(query string, results []LegalSearchResult) []LegalSearchResult {
	if len(results) < 2 {
		return results
	}
	qDomain := queryDomain(query)
	qForeign := queryIsForeign(query)
	wantsGuidance := queryWantsGuidance(query)
	wantsControls := queryWantsControls(query)
	bestBindingSem := bestBindingSemantic(results, wantsGuidance)

	out := make([]LegalSearchResult, len(results))
	copy(out, results)
	for i := range out {
		out[i].Score = authorityScore(query, out[i], qDomain, qForeign)
	}
	// Explicit interpretation intent → a competitive guideline may outrank binding (lift
	// above the best binding FINAL). Explicit implementation intent → boost the CONTROL-POOL
	// (operational/procedural requirement, control standard, implementation guidance) over
	// the abstract obligation, soft-ordered by role. Norm questions (neither) stay untouched.
	if wantsGuidance {
		liftAboveBinding(out, results, bestBindingSem, "supervisory_guidance")
	}
	if wantsControls {
		applyControlRoles(out)
	}
	sort.SliceStable(out, func(a, b int) bool {
		return out[a].Score > out[b].Score
	})
	return out
}

// liftAboveBinding lifts a semantically-competitive interpretative source (the given
// sourceClass — supervisory_guidance or technical_standard) just ABOVE the best binding
// hit, ordered by semantic, so an EXPLICIT guidance/implementation question can return
// that source Top-1. A pure norm question (no intent → not called) keeps binding on top.
// Sources below the semantic margin are left untouched, so an off-topic source can never
// ride the override — and the lift is from the binding FINAL score, so authority/topic/
// domain bonuses cannot edge it out.
func liftAboveBinding(out, raw []LegalSearchResult, bestBindingSem float64, sourceClass string) {
	bestBindingFinal := 0.0
	for i := range out {
		if classifyAuthority(out[i]).sourceClass == "binding_law" && out[i].Score > bestBindingFinal {
			bestBindingFinal = out[i].Score
		}
	}
	for i := range out {
		// Classify (not raw payload) so the untagged legacy corpus — e.g. NIST ingested
		// before source_class tagging — is still recognized as its interpretative class.
		if classifyAuthority(out[i]).sourceClass != sourceClass || raw[i].Score < bestBindingSem-intentLiftMargin {
			continue
		}
		lifted := bestBindingFinal + intentLiftGain + (raw[i].Score - bestBindingSem)
		if lifted > out[i].Score {
			out[i].Score = lifted
		}
	}
}
@@ -1,96 +0,0 @@
package ucca

import "testing"

func bindingRes(label, reg, jur string, score float64) LegalSearchResult {
	return LegalSearchResult{ArticleLabel: label, RegulationShort: reg, SourceClass: "binding_law", AuthorityWeight: 100, Jurisdiction: jur, Score: score}
}

func guidanceRes(label, reg string, score float64) LegalSearchResult {
	return LegalSearchResult{ArticleLabel: label, RegulationShort: reg, SourceClass: "supervisory_guidance", AuthorityWeight: 70, Jurisdiction: "EU", Score: score}
}

func foreignRes(label string, score float64) LegalSearchResult {
	return LegalSearchResult{ArticleLabel: label, RegulationShort: "RevDSG", SourceClass: "foreign_law", AuthorityWeight: 0, Jurisdiction: "CH", Score: score}
}

// Acceptance criteria (Phase 1) expressed as ordering tests.
func TestRerankByAuthority_Acceptance(t *testing.T) {
	t.Run("guidance does not overtake semantically competitive binding", func(t *testing.T) {
		out := rerankByAuthority("Was gilt hier?", []LegalSearchResult{
			guidanceRes("ENISA Mapping", "ENISA", 0.72),
			bindingRes("CRA Anhang I", "CRA", "EU", 0.66),
		})
		if out[0].RegulationShort != "CRA" {
			t.Fatalf("binding must rank first over competitive guidance, got %q", out[0].RegulationShort)
		}
	})

	t.Run("foreign law demoted on DE/EU question but kept", func(t *testing.T) {
		in := []LegalSearchResult{foreignRes("RevDSG Art 1", 0.85), bindingRes("Art. 9 DSGVO", "DSGVO", "EU", 0.62)}
		out := rerankByAuthority("Welche Daten sind besonders geschuetzt?", in)
		if out[0].RegulationShort != "DSGVO" {
			t.Fatalf("binding EU must beat foreign on a DE/EU query, got %q", out[0].RegulationShort)
		}
		if len(out) != 2 {
			t.Fatalf("foreign law must be kept, got len=%d", len(out))
		}
	})

	t.Run("off-domain binding demoted but not removed", func(t *testing.T) {
		in := []LegalSearchResult{
			bindingRes("Art. 13 EU MDR", "MDR", "EU", 0.70),
			bindingRes("Art. 13 CRA", "CRA", "EU", 0.60),
		}
		out := rerankByAuthority("Welche Pflichten hat der Hersteller von Produkten mit digitalen Elementen?", in)
		if out[0].RegulationShort != "CRA" {
			t.Fatalf("on-domain CRA must beat off-domain MDR, got %q", out[0].RegulationShort)
		}
		if len(out) != 2 {
			t.Fatalf("off-domain MDR must be kept, got len=%d", len(out))
		}
	})

	t.Run("same-regime binding wins over guidance", func(t *testing.T) {
		out := rerankByAuthority("Was gilt hier?", []LegalSearchResult{
			bindingRes("Art. 13 CRA", "CRA", "EU", 0.70),
			guidanceRes("ENISA Mapping", "ENISA", 0.60),
		})
		if out[0].RegulationShort != "CRA" {
			t.Fatalf("binding must win, got %q", out[0].RegulationShort)
		}
	})

	t.Run("BDSG Teil 3 demoted below DSGVO on general DP question", func(t *testing.T) {
		in := []LegalSearchResult{
			bindingRes("§ 48 BDSG", "BDSG", "DE", 0.70), // Teil 3 (law enforcement)
			bindingRes("Art. 9 DSGVO", "DSGVO", "EU", 0.62),
		}
		out := rerankByAuthority("Was sind besondere Kategorien personenbezogener Daten?", in)
		if out[0].RegulationShort != "DSGVO" {
			t.Fatalf("DSGVO must beat BDSG Teil 3 on a general DP question, got %q", out[0].RegulationShort)
		}
	})

	t.Run("nothing is dropped and topic amplifies", func(t *testing.T) {
		in := []LegalSearchResult{
			guidanceRes("ENISA", "ENISA", 0.72),
			bindingRes("CRA Anhang I", "CRA", "EU", 0.66),
			foreignRes("RevDSG", 0.5),
		}
		out := rerankByAuthority("Anforderungen an Security Updates?", in)
		if len(out) != len(in) {
			t.Fatalf("rerank must preserve all results, got %d want %d", len(out), len(in))
		}
		if out[0].ArticleLabel != "CRA Anhang I" {
			t.Fatalf("topic+authority must lift CRA Anhang I to top, got %q", out[0].ArticleLabel)
		}
	})

	t.Run("single result returned unchanged", func(t *testing.T) {
		in := []LegalSearchResult{bindingRes("Art. 1 CRA", "CRA", "EU", 0.5)}
		if out := rerankByAuthority("x", in); len(out) != 1 {
			t.Fatalf("len=%d", len(out))
		}
	})
}
@@ -1,130 +0,0 @@
package ucca

import "testing"

func TestClassifyAuthority(t *testing.T) {
	tests := []struct {
		name    string
		result  LegalSearchResult
		wantW   int
		wantSC  string
		wantJur string
	}{
		{"tagged binding EU", LegalSearchResult{AuthorityWeight: 100, SourceClass: "binding_law", Jurisdiction: "EU"}, 100, "binding_law", "EU"},
		{"tagged guidance DE", LegalSearchResult{AuthorityWeight: 70, SourceClass: "supervisory_guidance", Jurisdiction: "DE"}, 70, "supervisory_guidance", "DE"},
		{"tagged foreign CH", LegalSearchResult{AuthorityWeight: 0, SourceClass: "foreign_law", Jurisdiction: "CH"}, 0, "foreign_law", "CH"},
		{"untagged ENISA guidance", LegalSearchResult{RegulationShort: "ENISA", ArticleLabel: "ENISA CRA Standards Mapping"}, 70, "supervisory_guidance", "EU"},
		{"untagged NIST standard", LegalSearchResult{RegulationShort: "NIST SP 800-82r3", ArticleLabel: "AU-8"}, 80, "technical_standard", "EU"},
		{"mis-tagged NIST guidance -> standard by name", LegalSearchResult{SourceClass: "supervisory_guidance", AuthorityWeight: 70, RegulationShort: "NIST SP 800-82r3", ArticleLabel: "NIST SP 800-82r3"}, 80, "technical_standard", "EU"},
		{"BSI Grundschutz standard beats BSI guidance", LegalSearchResult{RegulationShort: "BSI Grundschutz", ArticleLabel: "BSI Grundschutz Baustein"}, 80, "technical_standard", "DE"},
		{"weight-only 85 TRGS standard", LegalSearchResult{AuthorityWeight: 85, RegulationShort: "TRGS 529"}, 85, "technical_standard", "EU"},
		{"tagged technical_standard", LegalSearchResult{AuthorityWeight: 80, SourceClass: "technical_standard", Jurisdiction: "EU"}, 80, "technical_standard", "EU"},
		{"untagged CRA binding", LegalSearchResult{RegulationShort: "CRA", ArticleLabel: "Art. 13 CRA", Category: "regulation"}, 100, "binding_law", "EU"},
		{"untagged BDSG binding DE", LegalSearchResult{RegulationShort: "BDSG", ArticleLabel: "§ 38 BDSG"}, 100, "binding_law", "DE"},
		{"untagged RevDSG foreign", LegalSearchResult{RegulationShort: "RevDSG", ArticleLabel: "RevDSG (CH)"}, 0, "foreign_law", "CH"},
		{"untagged unknown", LegalSearchResult{RegulationShort: "", ArticleLabel: ""}, 50, "unknown", "EU"},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := classifyAuthority(tt.result)
			if got.weight != tt.wantW || got.sourceClass != tt.wantSC || got.jurisdiction != tt.wantJur {
				t.Errorf("classifyAuthority() = {%d %s %s}, want {%d %s %s}",
					got.weight, got.sourceClass, got.jurisdiction, tt.wantW, tt.wantSC, tt.wantJur)
			}
		})
	}
}

func TestQueryDomain(t *testing.T) {
	tests := []struct{ q, want string }{
		{"Welche Anforderungen an Security Updates?", "cyber"},
		{"Wer braucht einen Datenschutzbeauftragten?", "data_protection"},
		{"Was sind besondere Kategorien personenbezogener Daten?", "data_protection"},
		{"Welche Pflichten beim Hochrisiko-KI-System?", "ai"},
		{"Wie spaet ist es?", ""},
	}
	for _, tt := range tests {
		if got := queryDomain(tt.q); got != tt.want {
			t.Errorf("queryDomain(%q) = %q, want %q", tt.q, got, tt.want)
		}
	}
}

func TestChunkDomain(t *testing.T) {
	tests := []struct {
		name string
		r    LegalSearchResult
		want string
	}{
		{"CRA cyber", LegalSearchResult{RegulationShort: "CRA", ArticleLabel: "Art. 13 CRA"}, "cyber"},
		{"DSGVO dp", LegalSearchResult{RegulationShort: "DSGVO", ArticleLabel: "Art. 9 DSGVO"}, "data_protection"},
		{"AI Act ai", LegalSearchResult{RegulationShort: "AI Act", ArticleLabel: "Art. 10 AI Act"}, "ai"},
		{"MDR product", LegalSearchResult{RegulationShort: "MDR", ArticleLabel: "Art. 13 EU MDR"}, "product_safety"},
		{"unknown", LegalSearchResult{RegulationShort: "XYZ"}, ""},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := chunkDomain(tt.r); got != tt.want {
				t.Errorf("chunkDomain() = %q, want %q", got, tt.want)
			}
		})
	}
}

func TestScopeClass(t *testing.T) {
	tests := []struct {
		name string
		r    LegalSearchResult
		want string
	}{
		{"BDSG Teil 3 law enforcement", LegalSearchResult{RegulationShort: "BDSG", ArticleLabel: "§ 48 BDSG"}, "law_enforcement"},
		{"BDSG general part", LegalSearchResult{RegulationShort: "BDSG", ArticleLabel: "§ 38 BDSG"}, "general"},
		{"DSGVO general", LegalSearchResult{RegulationShort: "DSGVO", ArticleLabel: "Art. 9 DSGVO"}, "general"},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := scopeClass(tt.r); got != tt.want {
				t.Errorf("scopeClass() = %q, want %q", got, tt.want)
			}
		})
	}
}

func TestResultMatchesTopic(t *testing.T) {
	tests := []struct {
		name  string
		query string
		r     LegalSearchResult
		want  bool
	}{
		{"besondere Kategorien -> Art 9 match", "Was sind besondere Kategorien?", LegalSearchResult{ArticleLabel: "Art. 9 DSGVO"}, true},
		{"besondere Kategorien -> Art 90 no match", "Was sind besondere Kategorien?", LegalSearchResult{ArticleLabel: "Art. 90 DSGVO"}, false},
		{"security updates -> CRA Anhang I", "Anforderungen an Security Updates?", LegalSearchResult{ArticleLabel: "CRA Anhang I"}, true},
		{"no topic keyword", "Wie spaet ist es?", LegalSearchResult{ArticleLabel: "Art. 9 DSGVO"}, false},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := resultMatchesTopic(tt.query, tt.r); got != tt.want {
				t.Errorf("resultMatchesTopic() = %v, want %v", got, tt.want)
			}
		})
	}
}

func TestNormMatches(t *testing.T) {
	tests := []struct {
		hay, norm string
		want      bool
	}{
		{"Art. 9 DSGVO", "Art. 9", true},
		{"Art. 90 DSGVO", "Art. 9", false},
		{"§ 38 BDSG", "§ 38 BDSG", true},
		{"§ 380 BDSG", "§ 38", false},
		{"Art. 14 CRA", "Art. 14 CRA", true},
	}
	for _, tt := range tests {
		if got := normMatches(tt.hay, tt.norm); got != tt.want {
			t.Errorf("normMatches(%q,%q) = %v, want %v", tt.hay, tt.norm, got, tt.want)
		}
	}
}
@@ -1,151 +0,0 @@
package ucca

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ControlMapping is one persisted, versioned, REVIEWABLE link from a legal
// obligation/requirement to a concrete framework control — a node in the curated
// compliance graph (Regulation -> Obligation -> Control -> Evidence). The retriever only
// PROPOSES candidates (mapping_status=candidate); a human/rule decision turns the good ones
// into mapping_status=accepted, which is the audited truth the Advisor uses at runtime.
//
// There is intentionally NO probabilistic "confidence" field: once curated, a mapping is a
// professional statement, not an AI guess. The retriever's score lives only in the rationale
// of a candidate, never as structured truth.
type ControlMapping struct {
	SourceNorm      string `json:"source_norm"`      // e.g. "CRA Annex I Part I (2)(c)"
	SourceRole      string `json:"source_role"`      // source_role of the norm (operational_requirement, ...)
	TargetFramework string `json:"target_framework"` // e.g. "OWASP ASVS"
	TargetControl   string `json:"target_control"`   // e.g. "V6.3.1"
	MappingType     string `json:"mapping_type"`     // supports | partially_supports | implements | related | contradicts
	MappingStatus   string `json:"mapping_status"`   // candidate | accepted | rejected | superseded
	Provenance      string `json:"provenance"`       // retriever_candidate | human_curated | rule_based
	Rationale       string `json:"rationale"`
	ReviewedBy      string `json:"reviewed_by,omitempty"` // who decided (human or rule id)
	ReviewDate      string `json:"review_date,omitempty"` // YYYY-MM-DD
	ReviewReason    string `json:"review_reason,omitempty"`
	Version         string `json:"version"`
}

// Allowed enum values — the deterministic "rule" layer that keeps the curated store clean.
var (
	mappingTypeValues   = map[string]bool{"supports": true, "partially_supports": true, "implements": true, "related": true, "contradicts": true}
	mappingStatusValues = map[string]bool{"candidate": true, "accepted": true, "rejected": true, "superseded": true}
	provenanceValues    = map[string]bool{"retriever_candidate": true, "human_curated": true, "rule_based": true}
)

// Validate checks required fields + enum membership, and enforces the audit trail: any
// human/rule DECISION (accepted/rejected) must carry who/when/why. Fail-closed at load.
func (m ControlMapping) Validate() error {
	switch {
	case m.SourceNorm == "":
		return fmt.Errorf("control mapping: source_norm required")
	case m.TargetFramework == "":
		return fmt.Errorf("control mapping: target_framework required")
	case m.TargetControl == "":
		return fmt.Errorf("control mapping: target_control required")
	case !mappingTypeValues[m.MappingType]:
		return fmt.Errorf("control mapping: invalid mapping_type %q", m.MappingType)
	case !mappingStatusValues[m.MappingStatus]:
		return fmt.Errorf("control mapping: invalid mapping_status %q", m.MappingStatus)
	case !provenanceValues[m.Provenance]:
		return fmt.Errorf("control mapping: invalid provenance %q", m.Provenance)
	}
	if m.MappingStatus == "accepted" || m.MappingStatus == "rejected" {
		if m.ReviewedBy == "" || m.ReviewDate == "" || m.ReviewReason == "" {
			return fmt.Errorf("control mapping %s->%s: status %q requires reviewed_by + review_date + review_reason (audit trail)",
				m.SourceNorm, m.TargetControl, m.MappingStatus)
		}
	}
	return nil
}

// IsAccepted reports whether this mapping is the active audited truth.
func (m ControlMapping) IsAccepted() bool { return m.MappingStatus == "accepted" }

// ControlMappingSet is the loaded, indexed mapping store (forward + reverse lookup).
type ControlMappingSet struct {
	All          []ControlMapping
	bySourceNorm map[string][]ControlMapping
	byControl    map[string][]ControlMapping
}

func controlKey(framework, control string) string { return framework + ":" + control }

// ControlsFor returns the controls mapped to a source norm. acceptedOnly restricts to the
// audited truth (what the Advisor may treat as fact).
func (s *ControlMappingSet) ControlsFor(sourceNorm string, acceptedOnly bool) []ControlMapping {
	return filterAccepted(s.bySourceNorm[sourceNorm], acceptedOnly)
}

// ObligationsFor returns the norms mapped to a framework control (reverse lookup).
func (s *ControlMappingSet) ObligationsFor(framework, control string, acceptedOnly bool) []ControlMapping {
	return filterAccepted(s.byControl[controlKey(framework, control)], acceptedOnly)
}

func filterAccepted(in []ControlMapping, acceptedOnly bool) []ControlMapping {
	if !acceptedOnly {
		return in
	}
	out := make([]ControlMapping, 0, len(in))
	for _, m := range in {
		if m.IsAccepted() {
			out = append(out, m)
		}
	}
	return out
}

// LoadControlMappings reads every *.jsonl file under dir (one mapping per line; blank and
// //-prefixed lines ignored), validates each row, and builds the index. An invalid row
// aborts the whole load — fail-closed, because this is the audit truth, not best-effort.
func LoadControlMappings(dir string) (*ControlMappingSet, error) {
	files, err := filepath.Glob(filepath.Join(dir, "*.jsonl"))
	if err != nil {
		return nil, err
	}
	set := &ControlMappingSet{
		bySourceNorm: map[string][]ControlMapping{},
		byControl:    map[string][]ControlMapping{},
	}
	for _, f := range files {
		fh, err := os.Open(f)
		if err != nil {
			return nil, err
		}
		sc := bufio.NewScanner(fh)
		sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
		line := 0
		for sc.Scan() {
			line++
			raw := strings.TrimSpace(sc.Text())
			if raw == "" || strings.HasPrefix(raw, "//") {
				continue
			}
			var m ControlMapping
			if err := json.Unmarshal([]byte(raw), &m); err != nil {
				fh.Close()
				return nil, fmt.Errorf("%s:%d: %w", f, line, err)
			}
			if err := m.Validate(); err != nil {
				fh.Close()
				return nil, fmt.Errorf("%s:%d: %w", f, line, err)
			}
			set.All = append(set.All, m)
			set.bySourceNorm[m.SourceNorm] = append(set.bySourceNorm[m.SourceNorm], m)
			k := controlKey(m.TargetFramework, m.TargetControl)
			set.byControl[k] = append(set.byControl[k], m)
		}
		fh.Close()
		if err := sc.Err(); err != nil {
			return nil, err
		}
	}
	return set, nil
}
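The `ControlMapping` comment describes a lifecycle in which the retriever only proposes `candidate` rows and a reviewer promotes the good ones to `accepted`. A minimal sketch of that promotion step, assuming a hypothetical helper `acceptCandidate` that is not part of this package; the struct is trimmed to the fields the helper touches. It enforces the same rule as `Validate`: no acceptance without who, when, and why.

```go
package main

import (
	"errors"
	"fmt"
)

// ControlMapping is trimmed to the fields this sketch touches; the real struct has more.
type ControlMapping struct {
	SourceNorm    string
	TargetControl string
	MappingStatus string
	ReviewedBy    string
	ReviewDate    string
	ReviewReason  string
}

// acceptCandidate (hypothetical) promotes a retriever candidate to the audited truth.
// It refuses anything that is not a candidate and insists on the who/when/why audit
// trail that Validate requires for accepted rows. m is passed by value, so the
// caller's original candidate is left untouched.
func acceptCandidate(m ControlMapping, reviewer, date, reason string) (ControlMapping, error) {
	if m.MappingStatus != "candidate" {
		return m, fmt.Errorf("%s->%s: only candidates can be accepted, status is %q",
			m.SourceNorm, m.TargetControl, m.MappingStatus)
	}
	if reviewer == "" || date == "" || reason == "" {
		return m, errors.New("accept requires reviewer, date and reason (audit trail)")
	}
	m.MappingStatus = "accepted"
	m.ReviewedBy, m.ReviewDate, m.ReviewReason = reviewer, date, reason
	return m, nil
}

func main() {
	cand := ControlMapping{SourceNorm: "CRA Annex I", TargetControl: "V6.3.1", MappingStatus: "candidate"}
	acc, err := acceptCandidate(cand, "benjamin", "2026-06-25", "V6 covers authentication")
	fmt.Println(acc.MappingStatus, cand.MappingStatus, err) // accepted candidate <nil>

	_, err = acceptCandidate(cand, "", "2026-06-25", "no reviewer")
	fmt.Println(err)
}
```

Returning a modified copy keeps the candidate row intact, which would suit a store that writes the accepted decision as a new versioned row rather than editing the candidate in place; whether the real store works that way is not shown in this diff.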
@@ -1,85 +0,0 @@
package ucca

import (
	"os"
	"path/filepath"
	"testing"
)

func TestControlMapping_Validate(t *testing.T) {
	candidate := ControlMapping{SourceNorm: "CRA Annex I", TargetFramework: "OWASP ASVS", TargetControl: "V6.3.1", MappingType: "supports", MappingStatus: "candidate", Provenance: "retriever_candidate"}
	if err := candidate.Validate(); err != nil {
		t.Fatalf("valid candidate rejected: %v", err)
	}
	accepted := ControlMapping{SourceNorm: "A", TargetFramework: "X", TargetControl: "Y", MappingType: "implements", MappingStatus: "accepted", Provenance: "human_curated", ReviewedBy: "benjamin", ReviewDate: "2026-06-25", ReviewReason: "passt"}
	if err := accepted.Validate(); err != nil {
		t.Fatalf("valid accepted rejected: %v", err)
	}

	bad := []struct {
		name string
		m    ControlMapping
	}{
		{"no source_norm", ControlMapping{TargetFramework: "X", TargetControl: "Y", MappingType: "supports", MappingStatus: "candidate", Provenance: "retriever_candidate"}},
		{"bad mapping_type", ControlMapping{SourceNorm: "A", TargetFramework: "X", TargetControl: "Y", MappingType: "nope", MappingStatus: "candidate", Provenance: "retriever_candidate"}},
		{"bad mapping_status", ControlMapping{SourceNorm: "A", TargetFramework: "X", TargetControl: "Y", MappingType: "supports", MappingStatus: "maybe", Provenance: "retriever_candidate"}},
		{"bad provenance", ControlMapping{SourceNorm: "A", TargetFramework: "X", TargetControl: "Y", MappingType: "supports", MappingStatus: "candidate", Provenance: "guessed"}},
		{"accepted without audit trail", ControlMapping{SourceNorm: "A", TargetFramework: "X", TargetControl: "Y", MappingType: "supports", MappingStatus: "accepted", Provenance: "human_curated"}},
		{"rejected without reason", ControlMapping{SourceNorm: "A", TargetFramework: "X", TargetControl: "Y", MappingType: "supports", MappingStatus: "rejected", Provenance: "human_curated", ReviewedBy: "b", ReviewDate: "2026-06-25"}},
	}
	for _, tt := range bad {
		if err := tt.m.Validate(); err == nil {
			t.Errorf("%s: expected rejection", tt.name)
		}
	}
}

func TestLoadControlMappings(t *testing.T) {
	dir := t.TempDir()
	content := `// header comment, ignored
{"source_norm":"CRA Annex I","source_role":"operational_requirement","target_framework":"OWASP ASVS","target_control":"V6.3.1","mapping_type":"supports","mapping_status":"accepted","provenance":"human_curated","reviewed_by":"benjamin","review_date":"2026-06-25","review_reason":"V6=Auth passt","rationale":"r","version":"2026-06-25"}
{"source_norm":"CRA Annex I","source_role":"operational_requirement","target_framework":"OWASP ASVS","target_control":"V14.2.4","mapping_type":"related","mapping_status":"candidate","provenance":"retriever_candidate","rationale":"r","version":"2026-06-25"}

`
	if err := os.WriteFile(filepath.Join(dir, "m.jsonl"), []byte(content), 0o644); err != nil {
		t.Fatal(err)
	}
	set, err := LoadControlMappings(dir)
	if err != nil {
		t.Fatalf("load: %v", err)
	}
	if len(set.All) != 2 {
		t.Fatalf("want 2 mappings, got %d", len(set.All))
	}
	if got := set.ControlsFor("CRA Annex I", false); len(got) != 2 {
		t.Errorf("ControlsFor(all): want 2, got %d", len(got))
	}
	if got := set.ControlsFor("CRA Annex I", true); len(got) != 1 {
		t.Errorf("ControlsFor(acceptedOnly): want 1 (only accepted), got %d", len(got))
	}
	if got := set.ObligationsFor("OWASP ASVS", "V6.3.1", true); len(got) != 1 {
		t.Errorf("ObligationsFor accepted reverse lookup: want 1, got %d", len(got))
	}
}

func TestLoadControlMappings_RejectsInvalid(t *testing.T) {
	dir := t.TempDir()
	// accepted without the who/when/why audit trail must fail-closed.
	if err := os.WriteFile(filepath.Join(dir, "bad.jsonl"), []byte(`{"source_norm":"A","target_framework":"X","target_control":"Y","mapping_type":"supports","mapping_status":"accepted","provenance":"human_curated","rationale":"r","version":"v"}`), 0o644); err != nil {
		t.Fatal(err)
	}
	if _, err := LoadControlMappings(dir); err == nil {
		t.Error("accepted mapping without audit trail must fail the load (fail-closed)")
	}
}

func TestControlMappings_SeedFileValid(t *testing.T) {
	// The committed seed store must always load + validate.
	set, err := LoadControlMappings("../../data/control_mappings")
	if err != nil {
		t.Fatalf("seed control_mappings failed to load: %v", err)
	}
	if len(set.All) == 0 {
		t.Fatal("seed control_mappings is empty")
	}
}
@@ -1,174 +0,0 @@
package ucca

import "strings"

// source_role is the FUNCTIONAL role of a chunk — WHAT must be done (obligation),
// HOW to implement it (operational/procedural requirement, control standard,
// implementation guidance), or how to READ the norm (interpretation/definition).
// It is ORTHOGONAL to source_class (legal authority): source_class decides RANK,
// source_role decides CONTROL-POOL membership for implementation questions.
// Derived deterministically from markers, so the untagged corpus needs no re-tag.
const (
	roleObligation      = "obligation"              // the abstract duty (the WHAT)
	roleOperationalReq  = "operational_requirement" // concrete binding requirement (CRA Annex I)
	roleProceduralReq   = "procedural_requirement"  // a process: notification/registration/DPIA/incident report
	roleControlStandard = "control_standard"        // best-practice control catalog (NIST/OWASP/ISO/CIS)
	roleImplGuidance    = "implementation_guidance" // advisory how-to (ENISA good practices, BSI)
	roleInterpretation  = "interpretation"          // interprets the norm's MEANING (EDPB guideline)
	roleDefinition      = "definition"              // definitions / scope / recitals
)

var (
	proceduralMarkers = []string{
		"Meldung", "Meldepflicht", "Notification", "Notifizierung", "Registrierung",
		"Registration", "Konformitätserklärung", "Declaration of Conformity", "Incident",
		"Berichterstattung", "Reporting", "Folgenabschätzung", "DSFA", "DPIA", "Anzeigepflicht",
	}
	annexMarkers       = []string{"Anhang", "Annex", "Appendix", "Anlage"}
	operationalMarkers = []string{"Anforderung", "Requirement", "essential", "wesentliche"}
	implMarkers        = []string{
		"Good Practice", "Best Practice", "Standards Mapping", "Umsetzung", "Implementation",
		"Handreichung", "Maßnahmenkatalog", "ICS", "SCADA", "Technical Guideline", "TIG",
	}
	definitionMarkers = []string{"Begriffsbestimmung", "Definition"}
)

// classifyRole derives the functional source_role from chunk metadata + the authority
// class. technical_standard is always a control_standard; guidance splits into
// implementation_guidance (how-to) vs interpretation (meaning); binding splits into
// procedural / operational requirement / definition / plain obligation.
func classifyRole(r LegalSearchResult) string {
	cls := classifyAuthority(r).sourceClass
	hay := strings.ToLower(r.ArticleLabel + " " + r.RegulationShort + " " + r.RegulationName + " " + r.Article)
	switch {
	case r.IsRecital:
		return roleDefinition
	case cls == "technical_standard":
		return roleControlStandard
	case cls == "supervisory_guidance":
		if containsAnyLower(hay, implMarkers) {
			return roleImplGuidance
		}
		return roleInterpretation
	case cls == "binding_law":
		switch {
		case containsAnyLower(hay, definitionMarkers):
			return roleDefinition
		case containsAnyLower(hay, proceduralMarkers):
			return roleProceduralReq
		case containsAnyLower(hay, annexMarkers) || containsAnyLower(hay, operationalMarkers):
			return roleOperationalReq
		default:
			return roleObligation
		}
	default:
		return roleObligation
	}
}

// controlRoleBonus is the soft intra-pool preference (User 2026-06-24):
// operational_requirement > procedural_requirement > control_standard > implementation_guidance.
var controlRoleBonus = map[string]float64{
	roleOperationalReq:  0.100,
	roleProceduralReq:   0.075,
	roleControlStandard: 0.050,
	roleImplGuidance:    0.000,
}

// controlPoolGain lifts EVERY control-pool role over the non-control roles (obligation/
// interpretation/definition) on an implementation question, so the binding abstract
// obligation does not dominate by authority alone. The obligation is not removed — it
// stays visible as "Rechtsgrundlage" context below the recommended measures.
const controlPoolGain = 0.15

// applyControlRoles boosts the control-pool (the four implementation roles) for an
// EXPLICIT implementation question, soft-ordered op_req > procedural > standard > guidance.
// Replaces the earlier "lift technical_standard above binding" — controls are not only
// technical_standard, and the binding operational_requirement (e.g. CRA Annex I) should win.
func applyControlRoles(out []LegalSearchResult) {
	for i := range out {
		if bonus, ok := controlRoleBonus[classifyRole(out[i])]; ok {
			out[i].Score += controlPoolGain + bonus
		}
	}
}

// isControlPoolRole reports whether a role belongs to the control-pool surfaced on
// implementation questions (the four "how to implement" roles).
func isControlPoolRole(role string) bool {
	switch role {
	case roleOperationalReq, roleProceduralReq, roleControlStandard, roleImplGuidance:
		return true
	}
	return false
}

// controlRoleOf classifies a raw Qdrant payload into a source_role, so searchControls can
// filter its deep dense pull to the control-pool BEFORE hits are mapped to LegalSearchResult.
func controlRoleOf(payload map[string]interface{}) string {
	article := getString(payload, "article")
	if article == "" {
		article = getString(payload, "section")
	}
	return classifyRole(LegalSearchResult{
		RegulationShort: getString(payload, "regulation_short"),
		RegulationName:  getString(payload, "regulation_name_de"),
		ArticleLabel:    getString(payload, "article_label"),
		Article:         article,
		Category:        getString(payload, "category"),
		SourceClass:     getString(payload, "source_class"),
		AuthorityWeight: getInt(payload, "authority_weight"),
		IsRecital:       getBool(payload, "is_recital"),
	})
}

// ensureControlDiversity guarantees that the returned top-K of a control question surfaces at
// least one operational_requirement and one control_standard WHEN the pool contains them —
// without forcing them to Top-1. implementation_guidance (e.g. ENISA good practices) keeps its
// earned semantic lead; the rule only promotes the best hit of a missing control role into the
// top-K by overwriting the lowest-ranked redundant guidance slot. So an implementation question
// shows the relevant source ROLES (binding requirement + standard + guidance) side by side
// instead of one role flooding the list. The promoted hit's original (now duplicate) position
// stays in the tail and is dropped by the caller's truncation to topK.
func ensureControlDiversity(results []LegalSearchResult, topK int) []LegalSearchResult {
	if topK <= 0 || topK >= len(results) {
		return results // everything is already returned — nothing to promote
	}
	roleAt := make([]string, len(results))
	for i := range results {
		roleAt[i] = classifyRole(results[i])
	}
	present := make(map[string]bool, topK)
	for i := 0; i < topK; i++ {
		present[roleAt[i]] = true
	}
	for _, want := range []string{roleOperationalReq, roleControlStandard} {
		if present[want] {
			continue
		}
		src := -1
		for i := topK; i < len(results); i++ {
			if roleAt[i] == want {
				src = i
				break
			}
		}
		if src < 0 {
			continue // role absent from the whole pool — nothing to promote
		}
		dst := -1
		for j := topK - 1; j >= 0; j-- {
			if roleAt[j] == roleImplGuidance {
				dst = j
				break
			}
		}
		if dst < 0 {
			continue // no redundant guidance to sacrifice — leave the head untouched
		}
		results[dst] = results[src]
		roleAt[dst] = want
		present[want] = true
	}
	return results
}
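To make the scoring in `applyControlRoles` concrete: each control-pool role receives `controlPoolGain` (0.15) plus its role bonus, so the total lift is 0.25 for `operational_requirement`, 0.225 for `procedural_requirement`, 0.20 for `control_standard`, 0.15 for `implementation_guidance`, and 0 for obligation, interpretation and definition. The sketch below replays that arithmetic on string roles; `boost` is a hypothetical single-result version of the loop, and the raw scores 0.80, 0.70 and 0.62 are made-up inputs.

```go
package main

import "fmt"

const controlPoolGain = 0.15

// Same bonus table as controlRoleBonus, keyed by the role strings.
var controlRoleBonus = map[string]float64{
	"operational_requirement": 0.100,
	"procedural_requirement":  0.075,
	"control_standard":        0.050,
	"implementation_guidance": 0.000,
}

// boost applies the applyControlRoles rule to one score: control-pool roles get the
// pool gain plus their bonus, every other role keeps its raw score.
func boost(role string, score float64) float64 {
	if bonus, ok := controlRoleBonus[role]; ok {
		return score + controlPoolGain + bonus
	}
	return score
}

func main() {
	fmt.Printf("%.3f\n", boost("obligation", 0.80))              // 0.800
	fmt.Printf("%.3f\n", boost("implementation_guidance", 0.70)) // 0.850
	fmt.Printf("%.3f\n", boost("operational_requirement", 0.62)) // 0.870
}
```

With these inputs the binding obligation (0.80) falls below both the guidance hit (0.85) and the operational requirement (0.87), which is the reordering the `controlPoolGain` comment describes; the obligation is still returned, only lower.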
@@ -1,134 +0,0 @@
package ucca

import "testing"

func TestClassifyRole(t *testing.T) {
	tests := []struct {
		name string
		r    LegalSearchResult
		want string
	}{
		{"NIST -> control_standard", LegalSearchResult{RegulationShort: "NIST SP 800-82r3", ArticleLabel: "AU-8"}, roleControlStandard},
		{"OWASP -> control_standard", LegalSearchResult{RegulationShort: "OWASP ASVS"}, roleControlStandard},
		{"CRA Anhang -> operational_requirement", LegalSearchResult{RegulationShort: "CRA", ArticleLabel: "CRA Anhang I", Category: "regulation"}, roleOperationalReq},
		{"CRA Meldepflicht -> procedural_requirement", LegalSearchResult{RegulationShort: "CRA", ArticleLabel: "Art. 14 CRA Meldepflicht", Category: "regulation"}, roleProceduralReq},
		{"ENISA Good Practices -> implementation_guidance", LegalSearchResult{RegulationShort: "ENISA Supply Chain Good Practices"}, roleImplGuidance},
		{"EDPB Leitlinie -> interpretation", LegalSearchResult{RegulationShort: "EDPB DPO", ArticleLabel: "WP243 Leitlinien Datenschutzbeauftragte"}, roleInterpretation},
		{"DORA article -> obligation", LegalSearchResult{RegulationShort: "DORA", ArticleLabel: "Art. 5 DORA", Category: "regulation"}, roleObligation},
		{"DSGVO Begriffsbestimmungen -> definition", LegalSearchResult{RegulationShort: "DSGVO", ArticleLabel: "Art. 4 DSGVO Begriffsbestimmungen", Category: "regulation"}, roleDefinition},
		{"recital -> definition", LegalSearchResult{RegulationShort: "CRA", IsRecital: true}, roleDefinition},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := classifyRole(tt.r); got != tt.want {
				t.Errorf("classifyRole() = %q, want %q", got, tt.want)
			}
		})
	}
}

func TestApplyControlRoles_PoolPreference(t *testing.T) {
	// op_req > procedural > control_standard > impl_guidance; non-control roles get no boost.
	roles := []struct {
		r        LegalSearchResult
		wantGain float64
	}{
		{LegalSearchResult{ArticleLabel: "CRA Anhang I", Category: "regulation"}, controlPoolGain + 0.100},
		{LegalSearchResult{ArticleLabel: "Art. 14 CRA Meldepflicht", Category: "regulation"}, controlPoolGain + 0.075},
		{LegalSearchResult{RegulationShort: "NIST SP 800-53"}, controlPoolGain + 0.050},
		{LegalSearchResult{RegulationShort: "ENISA Good Practices"}, controlPoolGain + 0.000},
		{LegalSearchResult{ArticleLabel: "Art. 5 DORA", Category: "regulation"}, 0.0}, // obligation: no boost
	}
	for _, rc := range roles {
		out := []LegalSearchResult{rc.r}
		out[0].Score = 1.0
		applyControlRoles(out)
		if got := out[0].Score - 1.0; got < rc.wantGain-1e-9 || got > rc.wantGain+1e-9 {
			t.Errorf("role %q: gain %.3f, want %.3f", classifyRole(rc.r), got, rc.wantGain)
		}
	}
}

func TestIsControlPoolRole(t *testing.T) {
	for _, r := range []string{roleOperationalReq, roleProceduralReq, roleControlStandard, roleImplGuidance} {
		if !isControlPoolRole(r) {
			t.Errorf("%q should be in the control-pool", r)
		}
	}
	for _, r := range []string{roleObligation, roleInterpretation, roleDefinition} {
		if isControlPoolRole(r) {
			t.Errorf("%q should NOT be in the control-pool", r)
		}
	}
}

func TestControlRoleOf_Payload(t *testing.T) {
	// searchControls filters its deep dense pull by classifying the raw Qdrant payload.
	nist := map[string]interface{}{"regulation_short": "NIST SP 800-82r3", "article": "AU-8"}
	if got := controlRoleOf(nist); got != roleControlStandard {
		t.Errorf("untagged NIST payload role = %q, want control_standard", got)
	}
	craAnnex := map[string]interface{}{"regulation_short": "CRA", "article": "Anhang-I", "category": "regulation"}
	if got := controlRoleOf(craAnnex); got != roleOperationalReq {
		t.Errorf("CRA Anhang payload role = %q, want operational_requirement", got)
	}
	dora := map[string]interface{}{"regulation_short": "DORA", "article_label": "Art. 5 DORA", "category": "regulation"}
	if got := controlRoleOf(dora); isControlPoolRole(got) {
		t.Errorf("DORA abstract article role = %q must be excluded from the control-pool", got)
	}
}

func headHasRole(head []LegalSearchResult, role string) bool {
	for _, r := range head {
		if classifyRole(r) == role {
			return true
		}
	}
	return false
}

func TestEnsureControlDiversity(t *testing.T) {
	ig := func(n string) LegalSearchResult {
		return LegalSearchResult{RegulationShort: "ENISA " + n + " Good Practices"}
	}
	opReq := LegalSearchResult{RegulationShort: "CRA", ArticleLabel: "CRA Anhang I", Category: "regulation"}
	std := LegalSearchResult{RegulationShort: "NIST SP 800-53"}

	t.Run("injects missing op_req + control_standard, guidance keeps Top-1", func(t *testing.T) {
		out := ensureControlDiversity([]LegalSearchResult{ig("A"), ig("B"), ig("C"), std, opReq}, 3)
		head := out[:3]
		if classifyRole(head[0]) != roleImplGuidance {
			t.Errorf("Top-1 should stay implementation_guidance, got %q", classifyRole(head[0]))
		}
		if !headHasRole(head, roleOperationalReq) {
			t.Error("top-K must contain an operational_requirement after diversity")
		}
		if !headHasRole(head, roleControlStandard) {
			t.Error("top-K must contain a control_standard after diversity")
		}
	})

	t.Run("no-op when both roles already present", func(t *testing.T) {
		out := ensureControlDiversity([]LegalSearchResult{opReq, std, ig("A"), ig("B")}, 3)
		if classifyRole(out[0]) != roleOperationalReq || classifyRole(out[1]) != roleControlStandard {
			t.Error("already-diverse top-K must be left untouched")
		}
	})

	t.Run("absent role is not forced (no panic)", func(t *testing.T) {
		out := ensureControlDiversity([]LegalSearchResult{ig("A"), ig("B"), ig("C"), std}, 3)
		if !headHasRole(out[:3], roleControlStandard) {
			t.Error("present control_standard should be injected")
		}
		if headHasRole(out[:3], roleOperationalReq) {
			t.Error("operational_requirement absent from the pool must NOT appear")
		}
	})

	t.Run("topK covering the whole pool is unchanged", func(t *testing.T) {
		out := ensureControlDiversity([]LegalSearchResult{ig("A"), opReq}, 5)
		if len(out) != 2 || classifyRole(out[0]) != roleImplGuidance {
			t.Error("topK >= len must return results unchanged")
		}
	})
}
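`ensureControlDiversity` promotes a missing role into the head only by overwriting the lowest-ranked `implementation_guidance` slot. The sketch below replays the first sub-test on plain role labels so the slot arithmetic is easy to follow; `promoteMissingRoles` is a hypothetical name, and unlike the original it copies its input and truncates to `topK` itself instead of leaving that to the caller.

```go
package main

import "fmt"

// promoteMissingRoles mirrors ensureControlDiversity on role labels: for each of
// operational_requirement and control_standard that is missing from the first topK
// entries, copy its first tail occurrence over the lowest-ranked guidance slot.
func promoteMissingRoles(roles []string, topK int) []string {
	if topK <= 0 || topK >= len(roles) {
		return roles // nothing outside the head to promote
	}
	out := append([]string(nil), roles...) // copy; the original mutates in place
	present := map[string]bool{}
	for _, r := range out[:topK] {
		present[r] = true
	}
	for _, want := range []string{"operational_requirement", "control_standard"} {
		if present[want] {
			continue
		}
		src := -1
		for i := topK; i < len(out); i++ {
			if out[i] == want {
				src = i
				break
			}
		}
		if src < 0 {
			continue // role absent from the whole pool
		}
		dst := -1
		for j := topK - 1; j >= 0; j-- {
			if out[j] == "implementation_guidance" {
				dst = j
				break
			}
		}
		if dst < 0 {
			continue // no guidance slot to sacrifice
		}
		out[dst] = out[src]
		present[want] = true
	}
	return out[:topK]
}

func main() {
	pool := []string{"implementation_guidance", "implementation_guidance", "implementation_guidance",
		"control_standard", "operational_requirement"}
	fmt.Println(promoteMissingRoles(pool, 3))
}
```

For this input the operational requirement first overwrites slot 2, then the control standard overwrites slot 1, so guidance keeps Top-1 and the head becomes `[implementation_guidance control_standard operational_requirement]`.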
@@ -1,117 +0,0 @@
|
|||||||
package ucca
|
|
||||||
|
|
||||||
import (
|
|
||||||
"bufio"
|
|
||||||
"encoding/json"
|
|
||||||
"fmt"
|
|
||||||
"os"
|
|
||||||
"path/filepath"
|
|
||||||
"strings"
|
|
||||||
)
|
|
||||||
|
|
||||||
// EvidenceRequirement is the last edge of the compliance graph: it says WHAT concrete
|
|
||||||
// evidence proves a framework control is met, and how fresh that evidence must be. This is
|
|
||||||
// what lets the Advisor eventually state "the CRA requirement is fulfilled" — not because a
|
|
||||||
// document exists, but because the required, current evidence is present. Authored/curated,
|
|
||||||
// not retriever-generated.
|
|
||||||
type EvidenceRequirement struct {
|
|
||||||
Framework string `json:"framework"` // e.g. "OWASP ASVS"
|
|
||||||
Control string `json:"control"` // e.g. "V6.3.1"
|
|
||||||
EvidenceType string `json:"evidence_type"` // sbom|test_report|config_export|repo_scan|policy|ticket|audit_log|pentest
|
|
||||||
EvidenceSource string `json:"evidence_source"` // github|ci|scanner|manual_upload
|
|
||||||
FreshnessRequirement string `json:"freshness_requirement"` // per_release|quarterly|annually|continuous
|
|
||||||
Required bool `json:"required"`
|
|
||||||
Rationale string `json:"rationale"`
|
|
||||||
Version string `json:"version"`
|
|
||||||
}
|
|
||||||
|
|
||||||
// Allowed enum values — the rule layer that keeps the evidence catalog clean.
|
|
||||||
var (
|
|
||||||
evidenceTypeValues = map[string]bool{"sbom": true, "test_report": true, "config_export": true, "repo_scan": true, "policy": true, "ticket": true, "audit_log": true, "pentest": true}
|
|
||||||
evidenceSourceValues = map[string]bool{"github": true, "ci": true, "scanner": true, "manual_upload": true}
|
|
||||||
freshnessValues = map[string]bool{"per_release": true, "quarterly": true, "annually": true, "continuous": true}
|
|
||||||
)
|
|
||||||
|
|
||||||
// Validate checks required fields + enum membership. Fail-closed at load.
|
|
||||||
func (e EvidenceRequirement) Validate() error {
|
|
||||||
switch {
|
|
||||||
case e.Framework == "":
|
|
||||||
return fmt.Errorf("evidence requirement: framework required")
|
|
||||||
case e.Control == "":
|
|
||||||
return fmt.Errorf("evidence requirement: control required")
|
|
||||||
case !evidenceTypeValues[e.EvidenceType]:
|
|
||||||
return fmt.Errorf("evidence requirement: invalid evidence_type %q", e.EvidenceType)
```go
	case !evidenceSourceValues[e.EvidenceSource]:
		return fmt.Errorf("evidence requirement: invalid evidence_source %q", e.EvidenceSource)
	case !freshnessValues[e.FreshnessRequirement]:
		return fmt.Errorf("evidence requirement: invalid freshness_requirement %q", e.FreshnessRequirement)
	}
	return nil
}

// EvidenceRequirementSet is the loaded, indexed evidence catalog.
type EvidenceRequirementSet struct {
	All       []EvidenceRequirement
	byControl map[string][]EvidenceRequirement
}

// For returns all evidence requirements declared for a framework control.
func (s *EvidenceRequirementSet) For(framework, control string) []EvidenceRequirement {
	return s.byControl[controlKey(framework, control)]
}

// RequiredFor returns only the required evidence for a control — the minimum that must be
// present before the control may be treated as met.
func (s *EvidenceRequirementSet) RequiredFor(framework, control string) []EvidenceRequirement {
	out := make([]EvidenceRequirement, 0)
	for _, e := range s.byControl[controlKey(framework, control)] {
		if e.Required {
			out = append(out, e)
		}
	}
	return out
}

// LoadEvidenceRequirements reads every *.jsonl file directly in dir (non-recursive; one
// requirement per line; blank and //-prefixed lines ignored), validates each, and builds
// the per-control index. An invalid row aborts the load — fail-closed.
func LoadEvidenceRequirements(dir string) (*EvidenceRequirementSet, error) {
	files, err := filepath.Glob(filepath.Join(dir, "*.jsonl"))
	if err != nil {
		return nil, err
	}
	set := &EvidenceRequirementSet{byControl: map[string][]EvidenceRequirement{}}
	for _, f := range files {
		fh, err := os.Open(f)
		if err != nil {
			return nil, err
		}
		sc := bufio.NewScanner(fh)
		sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
		line := 0
		for sc.Scan() {
			line++
			raw := strings.TrimSpace(sc.Text())
			if raw == "" || strings.HasPrefix(raw, "//") {
				continue
			}
			var e EvidenceRequirement
			if err := json.Unmarshal([]byte(raw), &e); err != nil {
				fh.Close()
				return nil, fmt.Errorf("%s:%d: %w", f, line, err)
			}
			if err := e.Validate(); err != nil {
				fh.Close()
				return nil, fmt.Errorf("%s:%d: %w", f, line, err)
			}
			set.All = append(set.All, e)
			k := controlKey(e.Framework, e.Control)
			set.byControl[k] = append(set.byControl[k], e)
		}
		fh.Close()
		if err := sc.Err(); err != nil {
			return nil, err
		}
	}
	return set, nil
}
```
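
The JSONL convention used by `LoadEvidenceRequirements` (one object per line, blank and `//` lines skipped, first bad row aborts with a `file:line` prefix) can be exercised in isolation. This is a minimal sketch, not the source's loader: `requirement`, `indexKey`, `loadJSONL` and `sampleJSONL` are hypothetical names, the struct keeps only three fields, validation is reduced to a non-empty framework/control check, and the input is an in-memory string instead of files found via `filepath.Glob`.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// requirement is a hypothetical, trimmed-down stand-in for EvidenceRequirement;
// json.Unmarshal ignores any other fields a row carries.
type requirement struct {
	Framework string `json:"framework"`
	Control   string `json:"control"`
	Required  bool   `json:"required"`
}

// indexKey is a hypothetical composite key; the source's controlKey is not shown.
func indexKey(framework, control string) string {
	return framework + "\x00" + control
}

// loadJSONL parses one JSON object per line, skipping blank and //-prefixed lines.
// The first malformed or invalid row aborts the whole load (fail-closed), and the
// error is prefixed with name:line so the offending row is easy to locate.
func loadJSONL(name, content string) (map[string][]requirement, error) {
	idx := map[string][]requirement{}
	sc := bufio.NewScanner(strings.NewReader(content))
	line := 0
	for sc.Scan() {
		line++
		raw := strings.TrimSpace(sc.Text())
		if raw == "" || strings.HasPrefix(raw, "//") {
			continue
		}
		var e requirement
		if err := json.Unmarshal([]byte(raw), &e); err != nil {
			return nil, fmt.Errorf("%s:%d: %w", name, line, err)
		}
		if e.Framework == "" || e.Control == "" {
			return nil, fmt.Errorf("%s:%d: framework and control are required", name, line)
		}
		k := indexKey(e.Framework, e.Control)
		idx[k] = append(idx[k], e)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return idx, nil
}

// sampleJSONL is modeled on the TestLoadEvidenceRequirements fixture, with a blank line added.
var sampleJSONL = `// header
{"framework":"OWASP ASVS","control":"V6.3.1","required":true}

{"framework":"OWASP ASVS","control":"V6.3.1","required":false}
`

func main() {
	idx, err := loadJSONL("e.jsonl", sampleJSONL)
	fmt.Println(len(idx[indexKey("OWASP ASVS", "V6.3.1")]), err) // 2 <nil>

	_, err = loadJSONL("bad.jsonl", `{"framework":"X"}`)
	fmt.Println(err) // bad.jsonl:1: framework and control are required
}
```

The source additionally raises the scanner's line limit to 1 MiB with `sc.Buffer`; without that, `bufio.Scanner` stops at its default 64 KiB token size and reports `bufio.ErrTooLong` for longer rows.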
@@ -1,84 +0,0 @@
```go
package ucca

import (
	"os"
	"path/filepath"
	"testing"
)

func TestEvidenceRequirement_Validate(t *testing.T) {
	valid := EvidenceRequirement{Framework: "OWASP ASVS", Control: "V6.3.1", EvidenceType: "config_export", EvidenceSource: "github", FreshnessRequirement: "per_release", Required: true}
	if err := valid.Validate(); err != nil {
		t.Fatalf("valid rejected: %v", err)
	}
	bad := []struct {
		name string
		e    EvidenceRequirement
	}{
		{"no control", EvidenceRequirement{Framework: "X", EvidenceType: "sbom", EvidenceSource: "ci", FreshnessRequirement: "per_release"}},
		{"bad evidence_type", EvidenceRequirement{Framework: "X", Control: "Y", EvidenceType: "screenshot", EvidenceSource: "ci", FreshnessRequirement: "per_release"}},
		{"bad evidence_source", EvidenceRequirement{Framework: "X", Control: "Y", EvidenceType: "sbom", EvidenceSource: "email", FreshnessRequirement: "per_release"}},
		{"bad freshness", EvidenceRequirement{Framework: "X", Control: "Y", EvidenceType: "sbom", EvidenceSource: "ci", FreshnessRequirement: "weekly"}},
	}
	for _, tt := range bad {
		if err := tt.e.Validate(); err == nil {
			t.Errorf("%s: expected rejection", tt.name)
		}
	}
}

func TestLoadEvidenceRequirements(t *testing.T) {
	dir := t.TempDir()
	content := `// header
{"framework":"OWASP ASVS","control":"V6.3.1","evidence_type":"config_export","evidence_source":"github","freshness_requirement":"per_release","required":true,"version":"2026-06-25"}
{"framework":"OWASP ASVS","control":"V6.3.1","evidence_type":"pentest","evidence_source":"manual_upload","freshness_requirement":"annually","required":false,"version":"2026-06-25"}
`
	if err := os.WriteFile(filepath.Join(dir, "e.jsonl"), []byte(content), 0o644); err != nil {
		t.Fatal(err)
	}
	set, err := LoadEvidenceRequirements(dir)
	if err != nil {
		t.Fatalf("load: %v", err)
	}
	if len(set.All) != 2 {
		t.Fatalf("want 2, got %d", len(set.All))
	}
	if got := set.For("OWASP ASVS", "V6.3.1"); len(got) != 2 {
		t.Errorf("For: want 2, got %d", len(got))
	}
	if got := set.RequiredFor("OWASP ASVS", "V6.3.1"); len(got) != 1 {
		t.Errorf("RequiredFor: want 1 (pentest is optional), got %d", len(got))
	}
}

func TestEvidenceRequirements_SeedFileValid(t *testing.T) {
	set, err := LoadEvidenceRequirements("../../data/evidence_requirements")
	if err != nil {
		t.Fatalf("seed evidence_requirements failed to load: %v", err)
	}
	if len(set.All) == 0 {
		t.Fatal("seed evidence_requirements is empty")
	}
}

// TestGraph_AcceptedControlsHaveEvidence wires the two layers: every control an accepted
// CRA->OWASP mapping points to must have >=1 required evidence — the Obligation -> Control ->
// Evidence chain must be connected, no dangling control nodes.
func TestGraph_AcceptedControlsHaveEvidence(t *testing.T) {
	maps, err := LoadControlMappings("../../data/control_mappings")
	if err != nil {
		t.Fatal(err)
	}
	ev, err := LoadEvidenceRequirements("../../data/evidence_requirements")
	if err != nil {
		t.Fatal(err)
	}
	for _, m := range maps.All {
		if !m.IsAccepted() {
			continue
		}
		if len(ev.RequiredFor(m.TargetFramework, m.TargetControl)) == 0 {
			t.Errorf("accepted control %s %s has no required evidence (dangling graph node)", m.TargetFramework, m.TargetControl)
		}
	}
}
```
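
The connectivity invariant checked by `TestGraph_AcceptedControlsHaveEvidence` reduces to a set difference: accepted mapping targets minus controls that have at least one required evidence row. A minimal sketch under stated assumptions: `mapping` and `danglingControls` are hypothetical, and a plain `framework|control` count map stands in for `EvidenceRequirementSet.RequiredFor`.

```go
package main

import (
	"fmt"
	"sort"
)

// mapping is a hypothetical stand-in for an accepted CRA->OWASP control mapping.
type mapping struct {
	TargetFramework, TargetControl string
	Accepted                       bool
}

// danglingControls returns the accepted mapping targets that have no required
// evidence. requiredCount is a hypothetical index "framework|control" -> number of
// required evidence rows, playing the role of RequiredFor in the source.
func danglingControls(maps []mapping, requiredCount map[string]int) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range maps {
		if !m.Accepted {
			continue // only accepted mappings create graph edges
		}
		k := m.TargetFramework + "|" + m.TargetControl
		if requiredCount[k] == 0 && !seen[k] {
			seen[k] = true
			out = append(out, k)
		}
	}
	sort.Strings(out) // stable order for reporting
	return out
}

func main() {
	maps := []mapping{
		{"OWASP ASVS", "V6.3.1", true},
		{"OWASP ASVS", "V2.1.1", true},  // accepted, but no evidence row
		{"OWASP ASVS", "V9.9.9", false}, // not accepted, ignored
	}
	req := map[string]int{"OWASP ASVS|V6.3.1": 1}
	fmt.Println(danglingControls(maps, req)) // [OWASP ASVS|V2.1.1]
}
```

Unlike the test, which reports every offending mapping through `t.Errorf`, the sketch deduplicates and sorts the targets so the same dangling control is listed once.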
@@ -1,167 +0,0 @@
```go
package ucca

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"sort"
)

// LegalActStructure is the composition of one ingested eur-lex legal act — how
// many distinct articles, annexes and recitals it consists of (plus the raw
// chunk count). Backs the coverage page so the ingested corpus is not a black
// box: a developer SEES what each act actually contains, not only its name.
type LegalActStructure struct {
	RegulationShort string `json:"regulation_short"`
	RegulationName  string `json:"regulation_name"`
	Articles        int    `json:"articles"`
	Annexes         int    `json:"annexes"`
	Recitals        int    `json:"recitals"`
	Chunks          int    `json:"chunks"`
}

const eurlexSource = "eur-lex.europa.eu"

// legalStructureCollections lists the collections holding the clean eur-lex legal
// corpus (chunks tagged with chunk_scope = section | annex | recital).
var legalStructureCollections = []string{"bp_compliance_ce", "bp_compliance_datenschutz"}

// chunkScopeBucket maps a Qdrant chunk_scope to the structure field it feeds.
var chunkScopeBucket = map[string]string{"section": "articles", "annex": "annexes", "recital": "recitals"}

// CorpusStructure scrolls the eur-lex legal corpus across the legal collections
// and aggregates the per-act composition. The source filter keeps it to a few
// hundred points regardless of total corpus size. Read-only; a collection that
// fails to scroll is skipped rather than failing the whole call.
func (c *LegalRAGClient) CorpusStructure(ctx context.Context) ([]LegalActStructure, error) {
	var all []qdrantScrollPoint
	for _, coll := range legalStructureCollections {
		pts, err := c.scrollLegalCorpus(ctx, coll)
		if err != nil {
			continue
		}
		all = append(all, pts...)
	}
	return aggregateStructure(all), nil
}

// aggregateStructure counts distinct article labels per (regulation, scope).
// Pure → unit-testable without a vector store.
func aggregateStructure(points []qdrantScrollPoint) []LegalActStructure {
	distinct := map[string]map[string]map[string]struct{}{}
	names := map[string]string{}
	chunks := map[string]int{}
	order := []string{}

	for _, pt := range points {
		reg := getString(pt.Payload, "regulation_short")
		if reg == "" {
			continue
		}
		if _, seen := names[reg]; !seen {
			name := getString(pt.Payload, "regulation_name_de")
			if name == "" {
				name = reg
			}
			names[reg] = name
			distinct[reg] = map[string]map[string]struct{}{}
			order = append(order, reg)
		}
		chunks[reg]++
		bucket, ok := chunkScopeBucket[getString(pt.Payload, "chunk_scope")]
		article := getString(pt.Payload, "article")
		if !ok || article == "" {
			continue
		}
		if distinct[reg][bucket] == nil {
			distinct[reg][bucket] = map[string]struct{}{}
		}
		distinct[reg][bucket][article] = struct{}{}
	}

	out := make([]LegalActStructure, 0, len(order))
	for _, reg := range order {
		out = append(out, LegalActStructure{
			RegulationShort: reg,
			RegulationName:  names[reg],
			Articles:        len(distinct[reg]["articles"]),
			Annexes:         len(distinct[reg]["annexes"]),
			Recitals:        len(distinct[reg]["recitals"]),
			Chunks:          chunks[reg],
		})
	}
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Articles != out[j].Articles {
			return out[i].Articles > out[j].Articles
		}
		return out[i].RegulationShort < out[j].RegulationShort
	})
	return out
}
```
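
`aggregateStructure` keeps one nested set per regulation and bucket. The same counts can be expressed with a single flat set keyed by a comparable struct, which makes the (regulation, scope, article) identity explicit. A sketch under assumptions: `chunk`, `actCounts` and `countDistinct` are hypothetical names, and the scope values `section`, `annex` and `recital` are taken from `chunkScopeBucket`.

```go
package main

import (
	"fmt"
	"sort"
)

// chunk is a hypothetical minimal payload; as a comparable struct it doubles as
// the (regulation, scope, article) key of one distinct norm.
type chunk struct{ Reg, Scope, Article string }

// actCounts is a hypothetical stand-in for LegalActStructure.
type actCounts struct {
	Reg                         string
	Articles, Annexes, Recitals int
	Chunks                      int
}

// countDistinct counts every chunk per act, but each (regulation, scope, article)
// triple only once; chunks with no article or an unknown scope add to Chunks only.
func countDistinct(chunks []chunk) []actCounts {
	seen := map[chunk]bool{}
	byReg := map[string]*actCounts{}
	var order []string
	for _, c := range chunks {
		if c.Reg == "" {
			continue // no regulation: skipped entirely
		}
		a, ok := byReg[c.Reg]
		if !ok {
			a = &actCounts{Reg: c.Reg}
			byReg[c.Reg] = a
			order = append(order, c.Reg)
		}
		a.Chunks++
		if c.Article == "" || seen[c] {
			continue
		}
		seen[c] = true
		switch c.Scope {
		case "section":
			a.Articles++
		case "annex":
			a.Annexes++
		case "recital":
			a.Recitals++
		}
	}
	out := make([]actCounts, 0, len(order))
	for _, r := range order {
		out = append(out, *byReg[r])
	}
	// Most articles first; ties broken alphabetically, as in aggregateStructure.
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Articles != out[j].Articles {
			return out[i].Articles > out[j].Articles
		}
		return out[i].Reg < out[j].Reg
	})
	return out
}

func main() {
	fmt.Println(countDistinct([]chunk{
		{"CRA", "section", "13"}, {"CRA", "section", "13"}, {"CRA", "section", "14"},
		{"CRA", "annex", "Anhang-I"}, {"CRA", "annex", "Anhang-VII"},
		{"DORA", "section", "6"}, {"DORA", "section", "19"}, {"DORA", "recital", ""},
		{"", "section", "1"},
	}))
	// [{CRA 2 2 0 5} {DORA 2 0 0 3}]
}
```

Both formulations yield the same counts; the flat key trades one small map per bucket for a single larger map.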
```go
// scrollLegalCorpus pages through one collection, filtered to the eur-lex legal
// corpus, returning minimal-payload points (no text/vectors).
func (c *LegalRAGClient) scrollLegalCorpus(ctx context.Context, collection string) ([]qdrantScrollPoint, error) {
	var all []qdrantScrollPoint
	var offset interface{}
	for {
		points, next, err := c.scrollLegalPage(ctx, collection, offset)
		if err != nil {
			return nil, err
		}
		all = append(all, points...)
		if next == nil {
			break
		}
		offset = next
	}
	return all, nil
}

// scrollLegalPage fetches one page of the filtered scroll and returns the
// points plus the next-page offset (nil when exhausted).
func (c *LegalRAGClient) scrollLegalPage(ctx context.Context, collection string, offset interface{}) ([]qdrantScrollPoint, interface{}, error) {
	reqBody := map[string]interface{}{
		"limit":        500,
		"with_payload": map[string]interface{}{"include": []string{"regulation_short", "regulation_name_de", "chunk_scope", "article"}},
		"with_vectors": false,
		"filter": map[string]interface{}{
			"must": []map[string]interface{}{
				{"key": "source", "match": map[string]interface{}{"value": eurlexSource}},
			},
		},
	}
	if offset != nil {
		reqBody["offset"] = offset
	}
	jsonBody, err := json.Marshal(reqBody)
	if err != nil {
		return nil, nil, err
	}
	url := fmt.Sprintf("%s/collections/%s/points/scroll", c.qdrantURL, collection)
	req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(jsonBody))
	if err != nil {
		return nil, nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if c.qdrantAPIKey != "" {
		req.Header.Set("api-key", c.qdrantAPIKey)
	}
	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, nil, err
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return nil, nil, fmt.Errorf("qdrant returned %d: %s", resp.StatusCode, string(body))
	}
	var scrollResp qdrantScrollResponse
	if err := json.NewDecoder(resp.Body).Decode(&scrollResp); err != nil {
		return nil, nil, err
	}
	return scrollResp.Result.Points, scrollResp.Result.NextPageOffset, nil
}
```
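
`scrollLegalCorpus` follows the next-page offset returned by each scroll call until it comes back nil. The loop shape can be exercised without a vector store by injecting the page fetcher. A minimal sketch: `page`, `fetchFunc`, `scrollAll` and `errTooManyPages` are hypothetical, and the `maxPages` guard is an addition that the source loop does not have.

```go
package main

import (
	"errors"
	"fmt"
)

// page is one scroll response: points plus the offset of the next page (nil when
// exhausted), mirroring the Points / NextPageOffset pair the source decodes.
type page struct {
	Points []string
	Next   interface{}
}

// fetchFunc is a hypothetical stand-in for scrollLegalPage: it returns the page
// that starts at offset (nil = first page).
type fetchFunc func(offset interface{}) (page, error)

// errTooManyPages is returned when the page budget runs out before a nil offset.
var errTooManyPages = errors.New("scroll: page limit exceeded")

// scrollAll follows next-page offsets until the store reports no further page,
// giving up after maxPages fetches (an assumption, not in the source).
func scrollAll(fetch fetchFunc, maxPages int) ([]string, error) {
	var all []string
	var offset interface{}
	for i := 0; i < maxPages; i++ {
		p, err := fetch(offset)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Points...)
		if p.Next == nil {
			return all, nil
		}
		offset = p.Next
	}
	return nil, errTooManyPages
}

func main() {
	// Three in-memory pages; integer offsets stand in for Qdrant point IDs.
	fetch := func(offset interface{}) (page, error) {
		switch offset {
		case nil:
			return page{Points: []string{"a", "b"}, Next: 2}, nil
		case 2:
			return page{Points: []string{"c", "d"}, Next: 4}, nil
		default:
			return page{Points: []string{"e"}}, nil
		}
	}
	all, err := scrollAll(fetch, 100)
	fmt.Println(all, err) // [a b c d e] <nil>
}
```

The guard is a design choice: if a server ever kept returning a non-nil offset, the unguarded loop in the source would not terminate on its own (only the request context could stop it).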
@@ -1,50 +0,0 @@
```go
package ucca

import "testing"

func structPoint(reg, name, scope, article string) qdrantScrollPoint {
	return qdrantScrollPoint{Payload: map[string]interface{}{
		"regulation_short":   reg,
		"regulation_name_de": name,
		"chunk_scope":        scope,
		"article":            article,
	}}
}

func TestAggregateStructure_CountsDistinctPerScope(t *testing.T) {
	points := []qdrantScrollPoint{
		structPoint("CRA", "Cyber Resilience Act", "section", "13"),
		structPoint("CRA", "Cyber Resilience Act", "section", "13"), // duplicate article → still 1
		structPoint("CRA", "Cyber Resilience Act", "section", "14"),
		structPoint("CRA", "Cyber Resilience Act", "annex", "Anhang-I"),
		structPoint("CRA", "Cyber Resilience Act", "annex", "Anhang-VII"),
		structPoint("DORA", "", "section", "6"),  // first sighting has no name →
		structPoint("DORA", "", "section", "19"), // regulation_name falls back to short
		structPoint("DORA", "", "recital", ""),   // empty article → ignored for distinct
		structPoint("", "x", "section", "1"),     // missing regulation → skipped entirely
	}

	got := aggregateStructure(points)

	if len(got) != 2 {
		t.Fatalf("want 2 acts, got %d (%+v)", len(got), got)
	}
	// CRA and DORA both have 2 articles → the tie is broken by name, so CRA sorts first.
	cra := got[0]
	if cra.RegulationShort != "CRA" || cra.Articles != 2 || cra.Annexes != 2 || cra.Recitals != 0 || cra.Chunks != 5 {
		t.Errorf("CRA wrong: %+v", cra)
	}
	dora := got[1]
	if dora.RegulationShort != "DORA" || dora.Articles != 2 || dora.Chunks != 3 {
		t.Errorf("DORA wrong: %+v", dora)
	}
	if dora.RegulationName != "DORA" {
		t.Errorf("DORA name fallback failed: %q", dora.RegulationName)
	}
}

func TestAggregateStructure_Empty(t *testing.T) {
	if got := aggregateStructure(nil); len(got) != 0 {
		t.Errorf("want empty, got %+v", got)
	}
}
```
@@ -1,134 +0,0 @@
```go
package ucca

import (
	"fmt"
	"strings"
)

const (
	assessConnectedCap    = 12   // cap connected norms surfaced in the assessment
	assessCrossRegimeTopN = 5    // window over which "cross regime" is judged
	assessReviewMargin    = 0.05 // a tighter winner gap → recommend human review
)

// Assess builds the auditable explanation layer over a ranked result set:
// primary norm, the norms it connects to (citation graph), cross-regime, a
// human-review flag, the winner margin and a short reasoning string. Pure →
// unit-testable. It EXPLAINS the ranking, it does not change it. Returns nil for
// an empty result set.
func Assess(results []LegalSearchResult) *LegalAssessment {
	if len(results) == 0 {
		return nil
	}
	// Norm-level view: collapse multiple chunks of the same article/annex so the
	// margin and cross-regime are judged between DISTINCT norms, not near-identical
	// chunks of one norm (which would make every winner margin ~0).
	norms := distinctNorms(results)
	p := norms[0]

	primary := primaryLabel(p)
	connected := dedupStrings(p.ReferencesOut, p.ReferencesIn, p.CitationUnit)
	if len(connected) > assessConnectedCap {
		connected = connected[:assessConnectedCap]
	}

	window := norms
	if len(window) > assessCrossRegimeTopN {
		window = window[:assessCrossRegimeTopN]
	}
	regimes := make(map[string]bool)
	for _, r := range window {
		if r.RegulationShort != "" {
			regimes[r.RegulationShort] = true
		}
	}
	crossRegime := len(regimes) > 1

	margin := 0.0
	if len(norms) > 1 {
		margin = norms[0].Score - norms[1].Score
	}

	primaryBinding := p.SourceClass == "binding_law"
	humanReview := margin < assessReviewMargin || crossRegime || !primaryBinding

	return &LegalAssessment{
		PrimaryNorm:       primary,
		PrimaryRegulation: p.RegulationShort,
		ConnectedNorms:    connected,
		CrossRegime:       crossRegime,
		HumanReviewFlag:   humanReview,
		WinnerMargin:      margin,
		ScoreReasoning:    assessReasoning(p, margin, crossRegime, primaryBinding),
	}
}

func primaryLabel(p LegalSearchResult) string {
	if p.CitationUnit != "" {
		return p.CitationUnit
	}
	if p.ArticleLabel != "" {
		return p.ArticleLabel
	}
	return strings.TrimSpace(p.RegulationShort + " " + p.Article)
}

// assessReasoning renders a short, human-readable justification (German).
func assessReasoning(p LegalSearchResult, margin float64, crossRegime, primaryBinding bool) string {
	label := primaryLabel(p)
	parts := make([]string, 0, 4)
	if primaryBinding {
		parts = append(parts, fmt.Sprintf("Primärtreffer %s: bindendes Recht (Autorität %d).", label, p.AuthorityWeight))
	} else {
		parts = append(parts, fmt.Sprintf("Primärtreffer %s ist keine bindende Norm (Leitlinie/Standard) — Quelle prüfen.", label))
	}
	if margin > 0 {
		parts = append(parts, fmt.Sprintf("Vorsprung %.2f vor #2.", margin))
	}
	if margin < assessReviewMargin {
		parts = append(parts, "Knapper Vorsprung — Alternativtreffer prüfen.")
	}
	if crossRegime {
		parts = append(parts, "Mehrere Regime betroffen — Querbezug prüfen.")
	}
	return strings.Join(parts, " ")
}

// distinctNorms collapses results that share a citation (multiple chunks of the
// same article/annex) to the first — i.e. highest-ranked — occurrence. Results
// without any citation identity are each kept, since they cannot be matched.
func distinctNorms(results []LegalSearchResult) []LegalSearchResult {
	seen := make(map[string]bool, len(results))
	out := make([]LegalSearchResult, 0, len(results))
	for _, r := range results {
		key := r.CitationUnit
		if key == "" {
			key = r.ArticleLabel
		}
		if key != "" {
			if seen[key] {
				continue
			}
			seen[key] = true
		}
		out = append(out, r)
	}
	return out
}

// dedupStrings concatenates out+in, drops empties and the excluded value, and
// returns a stable de-duplicated slice (insertion order preserved).
func dedupStrings(out, in []string, exclude string) []string {
	seen := map[string]bool{exclude: true}
	res := make([]string, 0, len(out)+len(in))
	for _, list := range [][]string{out, in} {
		for _, s := range list {
			if s == "" || seen[s] {
				continue
			}
			seen[s] = true
			res = append(res, s)
		}
	}
	return res
}
```
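
The review rule in `Assess` can be traced on plain numbers. A minimal sketch that mirrors the thresholds declared above (`assessReviewMargin = 0.05`, `assessCrossRegimeTopN = 5`); `hit` and `reviewDecision` are hypothetical, the norm key is the citation only (the source falls back to `ArticleLabel`), and the input is assumed to be sorted by descending score, as `Assess` expects of its ranked results.

```go
package main

import "fmt"

// hit is a hypothetical, trimmed-down search result.
type hit struct {
	Citation, Regime, SourceClass string
	Score                         float64
}

const (
	reviewMargin = 0.05 // same value as assessReviewMargin
	crossTopN    = 5    // same value as assessCrossRegimeTopN
)

// reviewDecision collapses chunks of the same norm, measures the gap between the
// two best DISTINCT norms, checks the top window for more than one regime, and
// flags review if the gap is tight, regimes mix, or the winner is not binding law.
func reviewDecision(hits []hit) (margin float64, crossRegime, review bool) {
	if len(hits) == 0 {
		return 0, false, false // Assess returns nil here instead
	}
	seen := map[string]bool{}
	var norms []hit
	for _, h := range hits {
		if h.Citation != "" {
			if seen[h.Citation] {
				continue // later chunk of an already-seen norm
			}
			seen[h.Citation] = true
		}
		norms = append(norms, h)
	}
	if len(norms) > 1 {
		margin = norms[0].Score - norms[1].Score
	}
	window := norms
	if len(window) > crossTopN {
		window = window[:crossTopN]
	}
	regimes := map[string]bool{}
	for _, h := range window {
		if h.Regime != "" {
			regimes[h.Regime] = true
		}
	}
	crossRegime = len(regimes) > 1
	review = margin < reviewMargin || crossRegime || norms[0].SourceClass != "binding_law"
	return margin, crossRegime, review
}

func main() {
	cases := []struct {
		name string
		hits []hit
	}{
		{"two chunks of Art. 13, then Art. 14", []hit{
			{"Art. 13 CRA", "CRA", "binding_law", 1.050},
			{"Art. 13 CRA", "CRA", "binding_law", 1.049},
			{"Art. 14 CRA", "CRA", "binding_law", 0.800},
		}},
		{"tight gap", []hit{
			{"Art. 13 CRA", "CRA", "binding_law", 1.00},
			{"Art. 14 CRA", "CRA", "binding_law", 0.98},
		}},
		{"two regimes", []hit{
			{"Art. 13 CRA", "CRA", "binding_law", 1.05},
			{"Art. 6 DORA", "DORA", "binding_law", 0.70},
		}},
	}
	for _, c := range cases {
		m, cross, review := reviewDecision(c.hits)
		fmt.Printf("%-36s margin=%.2f cross=%v review=%v\n", c.name, m, cross, review)
	}
}
```

In the first case the two Art. 13 chunks collapse into one norm, so the margin is measured against Art. 14 (about 0.25) rather than between the chunks (0.001). A result set with a single distinct norm leaves the margin at 0, below the 0.05 threshold, so both the sketch and `Assess` flag it for review.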
@@ -1,112 +0,0 @@
```go
package ucca

import "testing"

func ares(reg, cu, sc string, score float64, weight int, out, in []string) LegalSearchResult {
	return LegalSearchResult{
		RegulationShort: reg, CitationUnit: cu, SourceClass: sc, Score: score,
		AuthorityWeight: weight, ReferencesOut: out, ReferencesIn: in,
	}
}

func TestAssess_Empty(t *testing.T) {
	if Assess(nil) != nil {
		t.Error("empty results → nil assessment")
	}
}

func TestAssess_BindingPrimary_NoReview(t *testing.T) {
	results := []LegalSearchResult{
		ares("CRA", "Art. 13 CRA", "binding_law", 1.05, 100,
			[]string{"CRA Anhang I", "Art. 14 CRA"}, []string{"Art. 12 CRA"}),
		ares("CRA", "Art. 14 CRA", "binding_law", 0.80, 100, nil, nil),
	}
	a := Assess(results)
	if a == nil {
		t.Fatal("nil assessment")
	}
	if a.PrimaryNorm != "Art. 13 CRA" || a.PrimaryRegulation != "CRA" {
		t.Errorf("primary wrong: %+v", a)
	}
	if len(a.ConnectedNorms) != 3 { // out(2) + in(1), self excluded, deduped
		t.Errorf("connected norms: %v", a.ConnectedNorms)
	}
	if a.CrossRegime {
		t.Error("single regime must not be cross-regime")
	}
	if a.WinnerMargin < 0.24 || a.WinnerMargin > 0.26 {
		t.Errorf("margin = %v, want ~0.25", a.WinnerMargin)
	}
	if a.HumanReviewFlag {
		t.Error("clean binding + healthy margin + single regime → no review")
	}
}

func TestAssess_CrossRegimeFlagsReview(t *testing.T) {
	a := Assess([]LegalSearchResult{
		ares("CRA", "Art. 13 CRA", "binding_law", 1.05, 100, nil, nil),
		ares("DORA", "Art. 6 DORA", "binding_law", 0.70, 100, nil, nil),
	})
	if !a.CrossRegime || !a.HumanReviewFlag {
		t.Errorf("cross-regime must flag review: %+v", a)
	}
}

func TestAssess_NonBindingFlagsReview(t *testing.T) {
	a := Assess([]LegalSearchResult{
		ares("ENISA", "ENISA SBOM", "supervisory_guidance", 0.90, 70, nil, nil),
		ares("ENISA", "ENISA X", "supervisory_guidance", 0.40, 70, nil, nil),
	})
	if !a.HumanReviewFlag {
		t.Error("non-binding primary → review")
	}
}

func TestAssess_TightMarginFlagsReview(t *testing.T) {
	a := Assess([]LegalSearchResult{
		ares("CRA", "Art. 13 CRA", "binding_law", 1.00, 100, nil, nil),
		ares("CRA", "Art. 14 CRA", "binding_law", 0.98, 100, nil, nil),
	})
	if a.WinnerMargin >= 0.05 || !a.HumanReviewFlag {
		t.Errorf("tight margin → review: %+v", a)
	}
}

func TestAssess_MarginIsNormLevelNotChunkLevel(t *testing.T) {
	// Two near-identical chunks of the SAME norm at the top, then a distinct norm.
	results := []LegalSearchResult{
		ares("CRA", "Art. 13 CRA", "binding_law", 1.050, 100, []string{"CRA Anhang I"}, nil),
		ares("CRA", "Art. 13 CRA", "binding_law", 1.049, 100, nil, nil), // same norm
		ares("CRA", "Art. 14 CRA", "binding_law", 0.800, 100, nil, nil),
	}
	a := Assess(results)
	if a.WinnerMargin < 0.24 || a.WinnerMargin > 0.26 { // Art.13 vs Art.14, not chunk vs chunk
		t.Errorf("margin must be norm-level (~0.25), got %v", a.WinnerMargin)
	}
	if a.HumanReviewFlag {
		t.Error("healthy norm-level margin → no review")
	}
}

func TestDistinctNorms(t *testing.T) {
	got := distinctNorms([]LegalSearchResult{
		{CitationUnit: "Art. 13 CRA"},
		{CitationUnit: "Art. 13 CRA"}, // duplicate norm → collapsed
		{CitationUnit: "Art. 14 CRA"},
		{CitationUnit: ""}, // no identity → kept
		{CitationUnit: ""}, // no identity → kept
	})
	if len(got) != 4 {
		t.Errorf("want 4 (2 distinct + 2 unidentified), got %d", len(got))
	}
}

func TestDedupStrings(t *testing.T) {
	got := dedupStrings([]string{"a", "b", "", "a"}, []string{"b", "c"}, "self")
	if len(got) != 3 || got[0] != "a" || got[1] != "b" || got[2] != "c" {
		t.Errorf("dedup: %v", got)
	}
	if len(dedupStrings([]string{"self"}, nil, "self")) != 0 {
		t.Error("excluded value must be dropped")
	}
}
```
```diff
@@ -20,7 +20,6 @@ type LegalRAGClient struct {
 	httpClient *http.Client
 	textIndexEnsured map[string]bool
 	hybridEnabled bool
-	graphEnabled bool
 }
 
 // NewLegalRAGClient creates a new Legal RAG client using Ollama bge-m3 embeddings.
@@ -39,11 +38,6 @@ func NewLegalRAGClient() *LegalRAGClient {
 	}
 
 	hybridEnabled := os.Getenv("RAG_HYBRID_SEARCH") != "false"
-	// Graph expansion is OPT-IN: no measured ranking benefit over the binding augmentation,
-	// +1 Qdrant call per search, risk of flooding via reverse edges. Kept as a recall safety net
-	// for later gaps (RAG_GRAPH_EXPANSION=true). By default the graph edges are used in the
-	// response for justification/completeness, not for pool expansion.
-	graphEnabled := os.Getenv("RAG_GRAPH_EXPANSION") == "true"
 
 	return &LegalRAGClient{
 		qdrantURL: qdrantURL,
@@ -53,7 +47,6 @@ func NewLegalRAGClient() *LegalRAGClient {
 		collection: "bp_compliance_ce",
 		textIndexEnsured: make(map[string]bool),
 		hybridEnabled: hybridEnabled,
-		graphEnabled: graphEnabled,
 		httpClient: &http.Client{
 			Timeout: 60 * time.Second,
 		},
```
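
The constructor hunk above reads its two switches with opposite defaults: `RAG_HYBRID_SEARCH` is on unless the value is exactly `false`, while `RAG_GRAPH_EXPANSION` is on only for exactly `true`. Because both are case-sensitive string comparisons, `RAG_HYBRID_SEARCH=False` or `=0` still leaves hybrid search enabled. The sketch contrasts the two expressions with a hypothetical `envFlag` helper (not part of the source) built on `strconv.ParseBool`, which accepts `1`, `t`, `T`, `TRUE`, `true`, `True` and the matching false spellings, and falls back to a default on empty or unparseable input.

```go
package main

import (
	"fmt"
	"strconv"
)

// The two expressions as written in the constructor.
func hybridFromSource(v string) bool { return v != "false" } // default on
func graphFromSource(v string) bool  { return v == "true" }  // opt-in

// envFlag is a hypothetical helper: every spelling strconv.ParseBool understands
// is honoured, and empty or unparseable values fall back to def.
func envFlag(v string, def bool) bool {
	b, err := strconv.ParseBool(v)
	if err != nil {
		return def
	}
	return b
}

func main() {
	for _, v := range []string{"", "false", "False", "0", "true", "TRUE"} {
		fmt.Printf("%-7q hybrid=%-5v graph=%-5v | envFlag(def on)=%-5v envFlag(def off)=%v\n",
			v, hybridFromSource(v), graphFromSource(v), envFlag(v, true), envFlag(v, false))
	}
}
```

In the client this would read `envFlag(os.Getenv("RAG_HYBRID_SEARCH"), true)` and `envFlag(os.Getenv("RAG_GRAPH_EXPANSION"), false)`; whether the looser parsing is preferable is a deployment decision the source does not address.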
```diff
@@ -100,29 +93,6 @@ func (c *LegalRAGClient) searchInternal(ctx context.Context, collection string,
 		hits = denseHits
 	}
 
-	// Stratified: SUPPLEMENT the pool with binding_law hits (do not replace it) so the mandatory
-	// source is always a candidate; guidance stays as interpretive context. Best-effort: errors
-	// in the binding query silently degrade to the semantic pool.
-	if bindingHits, bErr := c.searchBinding(ctx, collection, embedding, topK); bErr == nil {
-		hits = mergeDedupHits(hits, bindingHits)
-	}
-
-	// Control augmentation: for an explicit implementation question, pull a deep dense pool and
-	// keep only the control-pool roles; this makes NIST / CRA annex chunks (dense rank ~8-9, below
-	// the small top-K) candidates. Re-rank/applyControlRoles order them afterwards.
-	if queryWantsControls(query) {
-		if controlHits, cErr := c.searchControls(ctx, collection, embedding); cErr == nil {
-			hits = mergeDedupHits(hits, controlHits)
-		}
-	}
-
-	// Graph augmentation: pull the connected norms (references_out/in) of the top hits into the
-	// pool via the precise citation edge, e.g. Art. 13 CRA pulls in Anhang I (the actual
-	// mandatory source). Pool augmentation only; re-rank and topK still apply.
-	if c.graphEnabled {
-		hits = c.expandViaGraph(ctx, collection, hits)
-	}
-
 	results := make([]LegalSearchResult, len(hits))
 	for i, hit := range hits {
 		// Legal metadata per rag_reingest_spec.md §2: prefers the normalized fields
@@ -151,54 +121,12 @@ func (c *LegalRAGClient) searchInternal(ctx context.Context, collection string,
 			Pages: getIntSlice(hit.Payload, "pages"),
 			SourceURL: getString(hit.Payload, "source"),
 			Score: hit.Score,
-			AuthorityWeight: getInt(hit.Payload, "authority_weight"),
-			SourceClass: getString(hit.Payload, "source_class"),
-			Jurisdiction: getString(hit.Payload, "jurisdiction"),
-			CitationUnit: getString(hit.Payload, "citation_unit"),
-			ReferencesOut: getStringSlice(hit.Payload, "references_out"),
-			ReferencesIn: getStringSlice(hit.Payload, "references_in"),
-			Superseded: getString(hit.Payload, "status") == "superseded",
 		}
 	}
 
-	// Authority-aware re-ranking: binding law of the matching jurisdiction/domain moves up,
-	// guidance, foreign law and off-domain results move down (nothing is deleted). Order only,
-	// response schema unchanged. Score carries the authority score so that downstream
-	// multi-collection merges (Advisor) preserve the order.
-	results = rerankByAuthority(query, results)
-
-	// Control diversity: on an implementation question impl_guidance (ENISA) may stay top-1,
-	// but the top-K should show at least one binding operational_requirement (CRA Anhang I) and
-	// one control_standard (NIST/ISO) if present in the pool; this makes source types visible
-	// instead of artificially lifting them to top-1. Order only, before truncation.
-	if queryWantsControls(query) {
-		results = ensureControlDiversity(results, topK)
-	}
-
-	if topK > 0 && len(results) > topK {
-		results = results[:topK]
-	}
-
 	return results, nil
 }
 
-// mergeDedupHits concatenates two hit lists, keeping the first occurrence of each point ID.
-func mergeDedupHits(primary, extra []qdrantSearchHit) []qdrantSearchHit {
-	seen := make(map[string]bool, len(primary)+len(extra))
-	out := make([]qdrantSearchHit, 0, len(primary)+len(extra))
-	for _, list := range [][]qdrantSearchHit{primary, extra} {
-		for _, h := range list {
-			id := fmt.Sprint(h.ID)
-			if seen[id] {
-				continue
-			}
-			seen[id] = true
-			out = append(out, h)
-		}
-	}
-	return out
-}
-
 // FormatLegalContextForPrompt formats the legal context for inclusion in an LLM prompt.
 func (c *LegalRAGClient) FormatLegalContextForPrompt(lc *LegalContext) string {
 	if lc == nil || len(lc.Results) == 0 {
```
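
The removed augmentation steps share one pattern: an extra query supplements the semantic pool through a first-occurrence merge, and a failing query leaves the pool untouched. A minimal sketch of that pattern: `hit`, `mergeDedup`, `augment` and `errUnavailable` are hypothetical stand-ins for `qdrantSearchHit`, `mergeDedupHits` and the best-effort `if ...; err == nil` blocks.

```go
package main

import (
	"errors"
	"fmt"
)

// hit is a hypothetical stand-in for qdrantSearchHit; Qdrant point IDs may be
// integers or UUID strings, hence interface{}.
type hit struct {
	ID    interface{}
	Score float64
}

var errUnavailable = errors.New("qdrant unavailable")

// mergeDedup keeps the first occurrence of each point ID, so a hit already in the
// primary (semantic) pool keeps its position and score.
func mergeDedup(primary, extra []hit) []hit {
	seen := make(map[string]bool, len(primary)+len(extra))
	out := make([]hit, 0, len(primary)+len(extra))
	for _, list := range [][]hit{primary, extra} {
		for _, h := range list {
			id := fmt.Sprint(h.ID) // one string key for int and string IDs
			if seen[id] {
				continue
			}
			seen[id] = true
			out = append(out, h)
		}
	}
	return out
}

// augment supplements pool with an extra query's hits; on error the pool is
// returned unchanged (best-effort).
func augment(pool []hit, query func() ([]hit, error)) []hit {
	extra, err := query()
	if err != nil {
		return pool
	}
	return mergeDedup(pool, extra)
}

func main() {
	semantic := []hit{{ID: 1, Score: 0.9}, {ID: 2, Score: 0.8}}
	binding := func() ([]hit, error) { return []hit{{ID: 2, Score: 0.5}, {ID: 3, Score: 0.4}}, nil }
	failing := func() ([]hit, error) { return nil, errUnavailable }

	pool := augment(semantic, binding) // ID 2 keeps its semantic score 0.8
	pool = augment(pool, failing)      // error: pool returned unchanged
	for _, h := range pool {
		fmt.Println(h.ID, h.Score)
	}
}
```

Order in the merged pool is only provisional in the source: the authority re-rank and the top-K slice run afterwards, so the merge only decides which candidates exist and which score a duplicate keeps.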
@@ -1,162 +0,0 @@
```go
package ucca

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"sort"
)

// Graph-augmented retrieval: when a top hit cites an annex/article (references_out)
// or is cited by one (references_in), pull that connected norm into the candidate
// pool via the PRECISE citation graph instead of hoping semantic search surfaces
// it. E.g. a hit on CRA Art. 13 pulls in CRA Anhang I (the actual requirement).
// Pool-augmentation only — authority re-rank + topK slice still apply, so the
// response schema is unchanged.
const (
	graphSeedCount  = 5    // only the top hits seed the expansion
	graphMaxExpand  = 15   // cap connected norms pulled in (avoid pool explosion)
	graphHopPenalty = 0.05 // a one-hop neighbour ranks just below its seed
)
```
// expandViaGraph augments hits with the norms they cite and the norms that cite
|
|
||||||
// them. Best-effort: on any error (or nothing to expand) the original hits are
|
|
||||||
// returned unchanged.
|
|
||||||
func (c *LegalRAGClient) expandViaGraph(ctx context.Context, collection string, hits []qdrantSearchHit) []qdrantSearchHit {
|
|
||||||
if len(hits) == 0 {
|
|
||||||
return hits
|
|
||||||
}
|
|
||||||
present := make(map[string]bool, len(hits))
|
|
||||||
for _, h := range hits {
|
|
||||||
if cu := getString(h.Payload, "citation_unit"); cu != "" {
|
|
||||||
present[cu] = true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
seeds := hits
|
|
||||||
if len(seeds) > graphSeedCount {
|
|
||||||
seeds = seeds[:graphSeedCount]
|
|
||||||
}
|
|
||||||
// Forward edges only (references_out = the detail a hit explicitly points to,
|
|
||||||
// e.g. Art. 13 → Anhang I). Reverse (references_in) has high fan-out for popular
|
|
||||||
// annexes (Anhang I is cited by 23 articles) → pool flooding; it is surfaced as
|
|
||||||
// connected-norm metadata in the Phase 2 response instead of expanding the pool.
|
|
||||||
want := make(map[string]float64) // connected citation_unit -> best seeding score
|
|
||||||
for _, h := range seeds {
|
|
||||||
for _, cu := range getStringSlice(h.Payload, "references_out") {
|
|
||||||
if cu == "" || present[cu] {
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
if s, ok := want[cu]; !ok || h.Score > s {
|
|
||||||
want[cu] = h.Score
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if len(want) == 0 {
|
|
||||||
return hits
|
|
||||||
}
|
|
||||||
|
|
||||||
units := topByScore(want, graphMaxExpand)
|
|
||||||
fetched, err := c.fetchByCitationUnits(ctx, collection, units)
|
|
||||||
if err != nil || len(fetched) == 0 {
|
|
||||||
return hits
|
|
||||||
}
|
|
||||||
neighbours := make([]qdrantSearchHit, 0, len(fetched))
|
|
||||||
for cu, pt := range fetched {
|
|
||||||
neighbours = append(neighbours, qdrantSearchHit{ID: pt.ID, Score: want[cu] - graphHopPenalty, Payload: pt.Payload})
|
|
||||||
}
|
|
||||||
return mergeDedupHits(hits, neighbours)
|
|
||||||
}
|
|
||||||
|
|
||||||
// topByScore returns up to n keys with the highest values. Deterministic: ties
|
|
||||||
// broken by the key string so the cap is stable across runs.
|
|
||||||
func topByScore(m map[string]float64, n int) []string {
|
|
||||||
keys := make([]string, 0, len(m))
|
|
||||||
for k := range m {
|
|
||||||
keys = append(keys, k)
|
|
||||||
}
|
|
||||||
sort.Slice(keys, func(i, j int) bool {
|
|
||||||
if m[keys[i]] != m[keys[j]] {
|
|
||||||
return m[keys[i]] > m[keys[j]]
|
|
||||||
}
|
|
||||||
return keys[i] < keys[j]
|
|
||||||
})
|
|
||||||
if len(keys) > n {
|
|
||||||
keys = keys[:n]
|
|
||||||
}
|
|
||||||
return keys
|
|
||||||
}
|
|
||||||
|
|
||||||
// fetchByCitationUnits loads one representative point (the first chunk) per
|
|
||||||
// citation_unit from the given collection.
|
|
||||||
func (c *LegalRAGClient) fetchByCitationUnits(ctx context.Context, collection string, units []string) (map[string]qdrantScrollPoint, error) {
|
|
||||||
should := make([]map[string]interface{}, 0, len(units))
|
|
||||||
for _, cu := range units {
|
|
||||||
should = append(should, map[string]interface{}{"key": "citation_unit", "match": map[string]interface{}{"value": cu}})
|
|
||||||
}
|
|
||||||
reqBody := map[string]interface{}{
|
|
||||||
"limit": len(units) * 4,
|
|
||||||
"with_payload": true,
|
|
||||||
"with_vectors": false,
|
|
||||||
"filter": map[string]interface{}{"should": should},
|
|
||||||
}
|
|
||||||
jsonBody, err := json.Marshal(reqBody)
|
|
||||||
if err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
url := fmt.Sprintf("%s/collections/%s/points/scroll", c.qdrantURL, collection)
|
|
||||||
req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(jsonBody))
|
|
||||||
if err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
req.Header.Set("Content-Type", "application/json")
|
|
||||||
if c.qdrantAPIKey != "" {
|
|
||||||
req.Header.Set("api-key", c.qdrantAPIKey)
|
|
||||||
}
|
|
||||||
resp, err := c.httpClient.Do(req)
|
|
||||||
if err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
defer func() { _ = resp.Body.Close() }()
|
|
||||||
if resp.StatusCode != http.StatusOK {
|
|
||||||
body, _ := io.ReadAll(resp.Body)
|
|
||||||
return nil, fmt.Errorf("qdrant scroll returned %d: %s", resp.StatusCode, string(body))
|
|
||||||
}
|
|
||||||
var scrollResp qdrantScrollResponse
|
|
||||||
if err := json.NewDecoder(resp.Body).Decode(&scrollResp); err != nil {
|
|
||||||
return nil, err
|
|
||||||
}
|
|
||||||
out := make(map[string]qdrantScrollPoint, len(units))
|
|
||||||
for _, pt := range scrollResp.Result.Points {
|
|
||||||
cu := getString(pt.Payload, "citation_unit")
|
|
||||||
if cu != "" {
|
|
||||||
if _, seen := out[cu]; !seen {
|
|
||||||
out[cu] = pt
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
return out, nil
|
|
||||||
}
|
|
||||||
|
|
||||||
// getStringSlice extracts a []string from a Qdrant payload list field
|
|
||||||
// (references_out / references_in are stored as JSON arrays of strings).
|
|
||||||
func getStringSlice(m map[string]interface{}, key string) []string {
|
|
||||||
v, ok := m[key]
|
|
||||||
if !ok {
|
|
||||||
return nil
|
|
||||||
}
|
|
||||||
arr, ok := v.([]interface{})
|
|
||||||
if !ok {
|
|
||||||
return nil
|
|
||||||
}
|
|
||||||
out := make([]string, 0, len(arr))
|
|
||||||
for _, item := range arr {
|
|
||||||
if s, ok := item.(string); ok {
|
|
||||||
out = append(out, s)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
return out
|
|
||||||
}
|
|
||||||
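The deleted `expandViaGraph` works in six steps:

- It seeds only from the first `graphSeedCount` hits.
- It follows forward `references_out` edges.
- It skips citation units already in the pool.
- It keeps the best seeding score per target.
- It caps the targets with `topByScore`.
- It scores each fetched neighbour at that seed score minus `graphHopPenalty`.

Below is a minimal in-memory sketch of the same steps. It assumes a simplified `node` type and uses a map `index` in place of the Qdrant scroll call in `fetchByCitationUnits`:

```go
package main

import (
	"fmt"
	"sort"
)

const (
	seedCount  = 5    // mirrors graphSeedCount
	maxExpand  = 15   // mirrors graphMaxExpand
	hopPenalty = 0.05 // mirrors graphHopPenalty
)

// node is a simplified hit (assumption): citation unit, score, outgoing citations.
type node struct {
	Unit  string
	Score float64
	Refs  []string // references_out
}

// expand pulls in the norms cited by the top seeds. index stands in for the
// Qdrant scroll lookup (assumption: one representative node per unit).
func expand(hits []node, index map[string]node) []node {
	present := map[string]bool{}
	for _, h := range hits {
		present[h.Unit] = true
	}
	seeds := hits
	if len(seeds) > seedCount {
		seeds = seeds[:seedCount]
	}
	want := map[string]float64{} // target unit -> best seeding score
	for _, h := range seeds {
		for _, cu := range h.Refs {
			if cu == "" || present[cu] {
				continue
			}
			if s, ok := want[cu]; !ok || h.Score > s {
				want[cu] = h.Score
			}
		}
	}
	units := make([]string, 0, len(want))
	for cu := range want {
		units = append(units, cu)
	}
	// Highest seeding score first; ties broken by unit name so the cap is stable.
	sort.Slice(units, func(i, j int) bool {
		if want[units[i]] != want[units[j]] {
			return want[units[i]] > want[units[j]]
		}
		return units[i] < units[j]
	})
	if len(units) > maxExpand {
		units = units[:maxExpand]
	}
	out := append([]node(nil), hits...)
	for _, cu := range units {
		if n, ok := index[cu]; ok {
			n.Score = want[cu] - hopPenalty // one hop ranks just below its seed
			out = append(out, n)
		}
	}
	return out
}

func main() {
	index := map[string]node{"CRA Anhang I": {Unit: "CRA Anhang I"}}
	hits := []node{{Unit: "Art. 13 CRA", Score: 0.70, Refs: []string{"CRA Anhang I"}}}
	for _, n := range expand(hits, index) {
		fmt.Printf("%s %.2f\n", n.Unit, n.Score)
	}
}
```

For the case in `TestExpandViaGraph_PullsConnectedNorm`, the seed `Art. 13 CRA` (0.70) pulls in `CRA Anhang I` at 0.65. The original appends neighbours in map iteration order and then de-duplicates by point ID via `mergeDedupHits`. The sketch instead appends in the capped, sorted order, so its output is deterministic. Either way, the final order is left to the authority re-rank and topK slice.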

@@ -1,89 +0,0 @@
-package ucca
-
-import (
-	"context"
-	"encoding/json"
-	"net/http"
-	"net/http/httptest"
-	"testing"
-)
-
-func TestGetStringSlice(t *testing.T) {
-	m := map[string]interface{}{
-		"refs": []interface{}{"a", "b", 3, "c"}, // non-strings are skipped
-		"str":  "not-a-list",
-	}
-	got := getStringSlice(m, "refs")
-	if len(got) != 3 || got[0] != "a" || got[2] != "c" {
-		t.Errorf("refs: %v", got)
-	}
-	if getStringSlice(m, "missing") != nil {
-		t.Error("missing key should be nil")
-	}
-	if getStringSlice(m, "str") != nil {
-		t.Error("non-list should be nil")
-	}
-}
-
-func TestTopByScore_DeterministicCap(t *testing.T) {
-	m := map[string]float64{"x": 0.5, "y": 0.9, "z": 0.5, "w": 0.7}
-	got := topByScore(m, 2)
-	if len(got) != 2 || got[0] != "y" || got[1] != "w" {
-		t.Errorf("want [y w], got %v", got)
-	}
-	all := topByScore(m, 10)
-	if all[2] != "x" || all[3] != "z" { // tie 0.5 broken by key string
-		t.Errorf("tie-break not deterministic: %v", all)
-	}
-}
-
-func TestExpandViaGraph_NoSeedsOrRefs(t *testing.T) {
-	c := &LegalRAGClient{} // nil httpClient → must not be called on these paths
-	if out := c.expandViaGraph(context.Background(), "x", nil); out != nil {
-		t.Error("empty hits should return nil")
-	}
-	hits := []qdrantSearchHit{{ID: 1, Score: 0.8, Payload: map[string]interface{}{"citation_unit": "Art. 1 CRA"}}}
-	if out := c.expandViaGraph(context.Background(), "x", hits); len(out) != 1 {
-		t.Errorf("no references → unchanged, got %d", len(out))
-	}
-}
-
-func TestExpandViaGraph_PullsConnectedNorm(t *testing.T) {
-	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
-		_ = json.NewEncoder(w).Encode(map[string]interface{}{
-			"result": map[string]interface{}{
-				"points": []map[string]interface{}{
-					{"id": 99, "payload": map[string]interface{}{
-						"citation_unit": "CRA Anhang I", "chunk_text": "Sicherheitsanforderungen",
-						"source_class": "binding_law", "authority_weight": 100, "regulation_short": "CRA",
-					}},
-				},
-				"next_page_offset": nil,
-			},
-		})
-	}))
-	defer srv.Close()
-
-	c := &LegalRAGClient{qdrantURL: srv.URL, httpClient: srv.Client()}
-	hits := []qdrantSearchHit{
-		{ID: 1, Score: 0.70, Payload: map[string]interface{}{
-			"citation_unit": "Art. 13 CRA", "references_out": []interface{}{"CRA Anhang I"},
-		}},
-	}
-	out := c.expandViaGraph(context.Background(), "bp_compliance_ce", hits)
-	if len(out) != 2 {
-		t.Fatalf("want 2 hits (seed + connected annex), got %d", len(out))
-	}
-	var found *qdrantSearchHit
-	for i := range out {
-		if getString(out[i].Payload, "citation_unit") == "CRA Anhang I" {
-			found = &out[i]
-		}
-	}
-	if found == nil {
-		t.Fatal("connected norm CRA Anhang I was not pulled into the pool")
-	}
-	if found.Score < 0.64 || found.Score > 0.66 { // 0.70 seed − 0.05 hop penalty
-		t.Errorf("connected score = %v, want ~0.65", found.Score)
-	}
-}
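The mock server in `TestExpandViaGraph_PullsConnectedNorm` answers the scroll request that `fetchByCitationUnits` posts to `/collections/{collection}/points/scroll`. The request body can be checked in isolation. `scrollBody` below is a hypothetical helper name; the map layout is copied from the deleted function. In a Qdrant filter, `should` matches points that satisfy at least one condition, so the list of `citation_unit` matches acts as an OR:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scrollBody rebuilds the request body of the deleted fetchByCitationUnits
// (hypothetical helper name; the map layout is copied from the diff).
func scrollBody(units []string) ([]byte, error) {
	should := make([]map[string]interface{}, 0, len(units))
	for _, cu := range units {
		// One match condition per unit; inside "should" any one may match (OR).
		should = append(should, map[string]interface{}{
			"key":   "citation_unit",
			"match": map[string]interface{}{"value": cu},
		})
	}
	return json.Marshal(map[string]interface{}{
		"limit":        len(units) * 4, // several chunks per unit; the caller keeps the first
		"with_payload": true,
		"with_vectors": false,
		"filter":       map[string]interface{}{"should": should},
	})
}

func main() {
	b, err := scrollBody([]string{"CRA Anhang I"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Because `encoding/json` writes map keys in sorted order, this prints `{"filter":{"should":[{"key":"citation_unit","match":{"value":"CRA Anhang I"}}]},"limit":4,"with_payload":true,"with_vectors":false}`.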

@@ -185,55 +185,6 @@ func (c *LegalRAGClient) searchDense(ctx context.Context, collection string, emb
 		searchReq.Filter = &qdrantFilter{Should: conditions}
 	}
 
-	return c.doPointsSearch(ctx, collection, searchReq)
-}
-
-// searchBinding fetches the top binding_law hits (authority-stratified pool) so the
-// obligation source is always a candidate even when guidance dominates semantically.
-// It AUGMENTS the semantic pool — guidance is preserved as interpretation context.
-func (c *LegalRAGClient) searchBinding(ctx context.Context, collection string, embedding []float64, topK int) ([]qdrantSearchHit, error) {
-	searchReq := qdrantSearchRequest{
-		Vector:      embedding,
-		Limit:       topK,
-		WithPayload: true,
-		Filter: &qdrantFilter{Must: []qdrantCondition{
-			{Key: "source_class", Match: qdrantMatch{Value: "binding_law"}},
-		}},
-	}
-
-	return c.doPointsSearch(ctx, collection, searchReq)
-}
-
-// controlPoolDepth is how deep the dense control pull reaches. Measured: for an EU-cyber
-// control query the relevant control sources sit at dense rank ~8-9 (NIST, CRA Annex), far
-// below the client's small top-K — so a fixed dense depth of 60 reliably surfaces them.
-const controlPoolDepth = 60
-
-// searchControls fetches a DEEP dense pool and keeps only the control-pool roles, so control
-// sources that the small top-K (hybrid) search misses become candidates on an implementation
-// question. Role is derived in code (no source_role tag needed). AUGMENTS the pool — the
-// caller gates it on control-intent.
-func (c *LegalRAGClient) searchControls(ctx context.Context, collection string, embedding []float64) ([]qdrantSearchHit, error) {
-	searchReq := qdrantSearchRequest{
-		Vector:      embedding,
-		Limit:       controlPoolDepth,
-		WithPayload: true,
-	}
-	hits, err := c.doPointsSearch(ctx, collection, searchReq)
-	if err != nil {
-		return nil, err
-	}
-	kept := make([]qdrantSearchHit, 0, len(hits))
-	for _, h := range hits {
-		if isControlPoolRole(controlRoleOf(h.Payload)) {
-			kept = append(kept, h)
-		}
-	}
-	return kept, nil
-}
-
-// doPointsSearch issues a POST /points/search and decodes the hits.
-func (c *LegalRAGClient) doPointsSearch(ctx context.Context, collection string, searchReq qdrantSearchRequest) ([]qdrantSearchHit, error) {
 	jsonBody, err := json.Marshal(searchReq)
 	if err != nil {
 		return nil, fmt.Errorf("failed to marshal search request: %w", err)

@@ -1,135 +0,0 @@
-package ucca
-
-import "testing"
-
-func intentRes(reg, sourceClass string, sem float64, weight int) LegalSearchResult {
-	return LegalSearchResult{
-		RegulationShort: reg, SourceClass: sourceClass, Score: sem,
-		AuthorityWeight: weight, Jurisdiction: "EU",
-	}
-}
-
-func TestQueryWantsGuidance(t *testing.T) {
-	wants := []string{
-		"Was empfiehlt der EDPB zum DSB?",
-		"Was sagt die ENISA zu Security Updates?",
-		"laut DSK ...",
-		"Orientierungshilfe zur DSFA",
-		"Welche BSI-Empfehlung gilt?",
-		"Auslegung der Aufsichtsbehörde",
-	}
-	plain := []string{
-		"Ab wann braucht man einen Datenschutzbeauftragten?",
-		"Welche Anforderungen bestehen an Security Updates?",
-	}
-	for _, q := range wants {
-		if !queryWantsGuidance(q) {
-			t.Errorf("should detect interpretation intent: %q", q)
-		}
-	}
-	for _, q := range plain {
-		if queryWantsGuidance(q) {
-			t.Errorf("should NOT detect intent (norm question): %q", q)
-		}
-	}
-}
-
-func TestRerank_NormQuestion_BindingStaysTop(t *testing.T) {
-	// No intent signal → binding wins even though guidance is semantically higher.
-	results := []LegalSearchResult{
-		intentRes("EDPB DPO", "supervisory_guidance", 0.64, 70),
-		intentRes("DSGVO", "binding_law", 0.58, 100),
-	}
-	out := rerankByAuthority("Ab wann braucht man einen Datenschutzbeauftragten?", results)
-	if out[0].SourceClass != "binding_law" {
-		t.Errorf("norm question: binding must stay Top-1, got %s", out[0].SourceClass)
-	}
-}
-
-func TestRerank_InterpretationQuestion_GuidanceMayWin(t *testing.T) {
-	// Explicit intent + guidance semantically competitive → guidance wins.
-	results := []LegalSearchResult{
-		intentRes("EDPB DPO", "supervisory_guidance", 0.64, 70),
-		intentRes("DSGVO", "binding_law", 0.58, 100),
-	}
-	out := rerankByAuthority("Was empfiehlt der EDPB zum Datenschutzbeauftragten?", results)
-	if out[0].SourceClass != "supervisory_guidance" {
-		t.Errorf("interpretation question: guidance should win Top-1, got %s", out[0].SourceClass)
-	}
-}
-
-func TestRerank_OffTopicGuidance_BlockedByGuard(t *testing.T) {
-	// Intent present, but guidance semantic is far below the best binding hit →
-	// the margin guard keeps binding on top (no off-topic guideline override).
-	results := []LegalSearchResult{
-		intentRes("EDPB DPO", "supervisory_guidance", 0.40, 70),
-		intentRes("DSGVO", "binding_law", 0.58, 100),
-	}
-	out := rerankByAuthority("Was empfiehlt der EDPB zum Datenschutzbeauftragten?", results)
-	if out[0].SourceClass != "binding_law" {
-		t.Errorf("off-topic guidance must not win even with intent, got %s", out[0].SourceClass)
-	}
-}
-
-func TestQueryWantsControls(t *testing.T) {
-	wants := []string{
-		"Welche Controls passen zu Security Updates?",
-		"Welche Maßnahmen sollten wir umsetzen?",
-		"Wie härten wir den Server ab?",
-		"Gibt es NIST-Controls dafür?",
-		"OWASP Best Practice für Logging?",
-		"BSI Grundschutz Bausteine",
-	}
-	plain := []string{
-		"Welche Anforderungen bestehen an Security Updates?",
-		"Ab wann braucht man einen Datenschutzbeauftragten?",
-	}
-	for _, q := range wants {
-		if !queryWantsControls(q) {
-			t.Errorf("should detect control/implementation intent: %q", q)
-		}
-	}
-	for _, q := range plain {
-		if queryWantsControls(q) {
-			t.Errorf("should NOT detect control intent (norm question): %q", q)
-		}
-	}
-}
-
-func TestRerank_ControlQuestion_OperationalReqTop(t *testing.T) {
-	// User priority for implementation questions: operational_requirement (binding concrete,
-	// CRA Anhang I) > control_standard (NIST). Both are in the control-pool; op_req wins.
-	results := []LegalSearchResult{
-		{RegulationShort: "NIST SP 800-82r3", ArticleLabel: "AU-8", SourceClass: "technical_standard", AuthorityWeight: 80, Jurisdiction: "EU", Score: 0.60},
-		{RegulationShort: "CRA", ArticleLabel: "CRA Anhang I", Category: "regulation", Score: 0.58},
-	}
-	out := rerankByAuthority("Welche Controls und Massnahmen passen zu Security Updates?", results)
-	if out[0].RegulationShort != "CRA" {
-		t.Errorf("operational_requirement (CRA Anhang I) should be Top-1 over control_standard, got %q", out[0].RegulationShort)
-	}
-}
-
-func TestRerank_NormQuestion_BindingOverStandard(t *testing.T) {
-	// "Anforderungen" → no control intent → binding obligation stays Top-1 over the standard.
-	results := []LegalSearchResult{
-		intentRes("NIST SP 800-82", "technical_standard", 0.62, 80),
-		intentRes("CRA", "binding_law", 0.58, 100),
-	}
-	out := rerankByAuthority("Welche Anforderungen bestehen an Security Updates?", results)
-	if out[0].SourceClass != "binding_law" {
-		t.Errorf("norm question: binding must stay Top-1 over standard, got %s", out[0].SourceClass)
-	}
-}
-
-func TestRerank_ControlQuestion_PoolBeatsBareObligation(t *testing.T) {
-	// A control-pool source (NIST control_standard) outranks an abstract obligation with no
-	// domain/topic advantage, because the implementation intent boosts the control-pool.
-	results := []LegalSearchResult{
-		{RegulationShort: "NIST SP 800-82r3", ArticleLabel: "AU-8", SourceClass: "technical_standard", AuthorityWeight: 80, Jurisdiction: "EU", Score: 0.55},
-		{RegulationShort: "XYZ", ArticleLabel: "Art. 5 XYZ", Category: "regulation", Score: 0.58},
-	}
-	out := rerankByAuthority("Welche Controls und Massnahmen passen zu Security Updates?", results)
-	if out[0].RegulationShort != "NIST SP 800-82r3" {
-		t.Errorf("control_standard should beat a bare abstract obligation on a control question, got %q", out[0].RegulationShort)
-	}
-}
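Together, the three guidance tests pin down a decision rule: binding law stays Top-1 unless the query shows interpretation intent and the best guidance hit is semantically close to the best binding hit. `rerankByAuthority` itself is not in this diff. The following is therefore only a hypothetical model of that rule, with an assumed margin of 0.05 and a naive keyword list built from the test queries:

```go
package main

import (
	"fmt"
	"strings"
)

// guidanceMargin is an assumed threshold: guidance may only overtake binding law
// if its semantic score is at most this far below the best binding hit.
const guidanceMargin = 0.05

// guidanceCues is a hypothetical keyword list built from the test queries; the
// real queryWantsGuidance is not part of this diff.
var guidanceCues = []string{"empfiehlt", "empfehlung", "edpb", "enisa", "dsk", "orientierungshilfe", "auslegung"}

type res struct {
	Name        string
	SourceClass string
	Semantic    float64
}

// wantsGuidance does naive substring matching on the lower-cased query.
func wantsGuidance(q string) bool {
	q = strings.ToLower(q)
	for _, cue := range guidanceCues {
		if strings.Contains(q, cue) {
			return true
		}
	}
	return false
}

// top returns the Top-1 result: the best binding hit by default, the best
// guidance hit only with interpretation intent and within guidanceMargin.
func top(query string, results []res) res {
	var bind, guide *res
	for i := range results {
		r := &results[i]
		switch r.SourceClass {
		case "binding_law":
			if bind == nil || r.Semantic > bind.Semantic {
				bind = r
			}
		case "supervisory_guidance":
			if guide == nil || r.Semantic > guide.Semantic {
				guide = r
			}
		}
	}
	switch {
	case guide != nil && (bind == nil || (wantsGuidance(query) && guide.Semantic >= bind.Semantic-guidanceMargin)):
		return *guide
	case bind != nil:
		return *bind
	default:
		return res{}
	}
}

func main() {
	pool := func(guidance float64) []res {
		return []res{{"EDPB DPO", "supervisory_guidance", guidance}, {"DSGVO", "binding_law", 0.58}}
	}
	fmt.Println(top("Ab wann braucht man einen Datenschutzbeauftragten?", pool(0.64)).Name)
	fmt.Println(top("Was empfiehlt der EDPB zum Datenschutzbeauftragten?", pool(0.64)).Name)
	fmt.Println(top("Was empfiehlt der EDPB zum Datenschutzbeauftragten?", pool(0.40)).Name)
}
```

The three calls print `DSGVO`, `EDPB DPO` and `DSGVO`. These match the expectations of `TestRerank_NormQuestion_BindingStaysTop`, `TestRerank_InterpretationQuestion_GuidanceMayWin` and `TestRerank_OffTopicGuidance_BlockedByGuard`. Substring matching is deliberately crude here: a short cue such as `dsk` can also fire inside unrelated words.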

@@ -225,18 +225,6 @@ func getIntSlice(m map[string]interface{}, key string) []int {
 	return result
 }
 
-func getInt(m map[string]interface{}, key string) int {
-	if v, ok := m[key]; ok {
-		switch n := v.(type) {
-		case float64:
-			return int(n)
-		case int:
-			return n
-		}
-	}
-	return 0
-}
-
 func contains(slice []string, item string) bool {
 	for _, s := range slice {
 		if s == item {

@@ -1,30 +0,0 @@
-package ucca
-
-import "testing"
-
-// A superseded alt-source must rank below the same result when it is NOT
-// superseded (the eu-v1 norm), but only demoted — the penalty is finite, so it
-// stays in the pool and remains findable for history/transition questions.
-func TestAuthorityScore_SupersededIsDemotedNotRemoved(t *testing.T) {
-	fresh := LegalSearchResult{
-		Score: 0.65, SourceClass: "binding_law", AuthorityWeight: 100,
-		Jurisdiction: "EU", RegulationShort: "CRA", Article: "13",
-	}
-	old := fresh
-	old.Superseded = true
-
-	sFresh := authorityScore("CRA Sicherheitsupdates Hersteller", fresh, "", false)
-	sOld := authorityScore("CRA Sicherheitsupdates Hersteller", old, "", false)
-
-	if sOld >= sFresh {
-		t.Errorf("superseded must score lower: fresh=%.3f superseded=%.3f", sFresh, sOld)
-	}
-	gap := sFresh - sOld
-	if gap < supersededPenalty-0.001 || gap > supersededPenalty+0.001 {
-		t.Errorf("demotion should equal supersededPenalty (%.2f), got %.3f", supersededPenalty, gap)
-	}
-	// Still a positive, finite score → present in the pool, not hidden.
-	if sOld <= -1 {
-		t.Errorf("superseded score collapsed (%.3f) — must remain findable", sOld)
-	}
-}
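`TestAuthorityScore_SupersededIsDemotedNotRemoved` expects the superseded copy to score exactly `supersededPenalty` lower than the current one. That is an additive demotion, not a filter. Neither `authorityScore` nor the value of `supersededPenalty` is in this diff. The sketch assumes a penalty of 0.15 and a `Base` field standing in for everything else `authorityScore` computes:

```go
package main

import (
	"fmt"
	"sort"
)

// supersededPenalty is an assumed value; the diff only shows that the test
// expects the score gap to equal this constant.
const supersededPenalty = 0.15

type doc struct {
	Name       string
	Base       float64 // stands in for the rest of authorityScore (assumption)
	Superseded bool
}

// score demotes superseded sources by a fixed amount instead of dropping them.
func score(d doc) float64 {
	s := d.Base
	if d.Superseded {
		s -= supersededPenalty
	}
	return s
}

// rank sorts a copy by score, highest first; superseded docs stay in the list.
func rank(docs []doc) []doc {
	out := append([]doc(nil), docs...)
	sort.SliceStable(out, func(i, j int) bool { return score(out[i]) > score(out[j]) })
	return out
}

func main() {
	ranked := rank([]doc{
		{Name: "CRA Art. 13 (superseded)", Base: 0.65, Superseded: true},
		{Name: "CRA Art. 13", Base: 0.65},
		{Name: "ENISA guidance", Base: 0.55},
	})
	for _, d := range ranked {
		fmt.Printf("%-26s %.2f\n", d.Name, score(d))
	}
}
```

The resulting order is `CRA Art. 13`, `ENISA guidance`, then the superseded copy. The superseded copy falls below a lower-authority source but stays in the list, so history or transition questions can still reach it.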

@@ -399,9 +399,8 @@ func TestHybridSearch_UsesQueryAPI(t *testing.T) {
 			return
 		}
 
-		// /points/search is now the stratified binding-law augmentation query (it AUGMENTS
-		// the hybrid pool, it is not a dense fallback). Return empty so the hybrid hit
-		// remains the sole result for this test.
+		// Fallback: should not reach dense search
+		t.Error("Unexpected dense search call when hybrid succeeded")
 		json.NewEncoder(w).Encode(qdrantSearchResponse{Result: []qdrantSearchHit{}})
 	}))
 	defer qdrantMock.Close()

@@ -447,59 +446,6 @@ func TestHybridSearch_UsesQueryAPI(t *testing.T) {
 	}
 }
 
-// TestSearch_StratifiedBindingRerank verifies that the binding-law pool augments the
-// semantic pool and that authority re-ranking lifts binding law above higher-semantic guidance.
-func TestSearch_StratifiedBindingRerank(t *testing.T) {
-	ollamaMock := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		json.NewEncoder(w).Encode(ollamaEmbeddingResponse{Embedding: make([]float64, 1024)})
-	}))
-	defer ollamaMock.Close()
-
-	qdrantMock := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		if strings.Contains(r.URL.Path, "/index") {
-			w.WriteHeader(http.StatusOK)
-			w.Write([]byte(`{"result":{"status":"completed"}}`))
-			return
-		}
-		if strings.Contains(r.URL.Path, "/points/query") {
-			json.NewEncoder(w).Encode(qdrantQueryResponse{Result: []qdrantSearchHit{
-				{ID: "g1", Score: 0.72, Payload: map[string]interface{}{
-					"chunk_text": "ENISA guidance", "regulation_short": "ENISA",
-					"article_label": "ENISA CRA Mapping", "source_class": "supervisory_guidance",
-					"authority_weight": float64(70), "jurisdiction": "EU",
-				}},
-			}})
-			return
-		}
-		// /points/search = stratified binding-law pool (source_class=binding_law)
-		json.NewEncoder(w).Encode(qdrantSearchResponse{Result: []qdrantSearchHit{
-			{ID: "b1", Score: 0.66, Payload: map[string]interface{}{
-				"chunk_text": "CRA Anhang I requirement", "regulation_short": "CRA",
-				"article_label": "CRA Anhang I", "source_class": "binding_law",
-				"authority_weight": float64(100), "jurisdiction": "EU",
-			}},
-		}})
-	}))
-	defer qdrantMock.Close()
-
-	client := &LegalRAGClient{
-		qdrantURL: qdrantMock.URL, ollamaURL: ollamaMock.URL, embeddingModel: "bge-m3",
-		collection: "bp_compliance_ce", textIndexEnsured: make(map[string]bool),
-		hybridEnabled: true, httpClient: http.DefaultClient,
-	}
-
-	results, err := client.Search(context.Background(), "Was gilt hier?", nil, 5)
-	if err != nil {
-		t.Fatalf("search failed: %v", err)
-	}
-	if len(results) != 2 {
-		t.Fatalf("expected 2 merged results (guidance + binding), got %d", len(results))
-	}
-	if results[0].RegulationShort != "CRA" {
-		t.Errorf("binding CRA must rank first over higher-semantic guidance, got %q", results[0].RegulationShort)
-	}
-}
-
 func TestHybridSearch_FallbackToDense(t *testing.T) {
 	var requestedPaths []string
 

@@ -20,38 +20,6 @@ type LegalSearchResult struct {
 	Pages []int `json:"pages,omitempty"`
 	SourceURL string `json:"source_url"`
 	Score float64 `json:"score"`
-
-	// Interne Felder fuer das Authority-Re-Ranking (Phase 1) — NICHT serialisiert
-	// (json:"-"), daher kein Contract-Change. Aus dem Qdrant-Payload befuellt und nur
-	// fuer die Sortierung in rerankByAuthority verwendet.
-	AuthorityWeight int    `json:"-"`
-	SourceClass     string `json:"-"`
-	Jurisdiction    string `json:"-"`
-
-	// Zitations-Graph (Phase 2) — intern, speist nur die Assessment-Berechnung
-	// (verbundene Normen, Begruendung). Pro-Result-Schema bleibt eingefroren.
-	CitationUnit  string   `json:"-"`
-	ReferencesOut []string `json:"-"`
-	ReferencesIn  []string `json:"-"`
-
-	// Supersede-Status (status="superseded", use_for_primary=false) — Alt-Quelle,
-	// die fuer Default-Fragen demoted wird (nicht versteckt; fuer Historie auffindbar).
-	Superseded bool `json:"-"`
-}
-
-// LegalAssessment is the auditable explanation layer over a ranked result set:
-// which norm is primary, which norms connect to it via the citation graph,
-// whether the answer crosses regulatory regimes, and whether a human should
-// review. Computed from the already-ranked results — it EXPLAINS retrieval, it
-// does not change it (graph edges for reasoning/completeness, not pool-expansion).
-type LegalAssessment struct {
-	PrimaryNorm       string   `json:"primary_norm"`
-	PrimaryRegulation string   `json:"primary_regulation"`
-	ConnectedNorms    []string `json:"connected_norms"`
-	CrossRegime       bool     `json:"cross_regime"`
-	HumanReviewFlag   bool     `json:"human_review_flag"`
-	WinnerMargin      float64  `json:"winner_margin"`
-	ScoreReasoning    string   `json:"score_reasoning"`
 }
 
 // LegalContext represents aggregated legal context for an assessment.

@@ -45,11 +45,6 @@ class LLMChecker:
         text = doc.text or ""
         if len(text) < 50:
             return CheckResult(present=None, source="llm")
-        # decision_method=LLM mit judge='haiku': Sufficiency-Pfad (validiert
-        # P0.89/R0.91). Der Qwen-first-Cascade ist als Sufficiency-Judge
-        # widerlegt -> hier Haiku direkt, kriteriengeführte Subsumtion.
-        if (ctrl.extra or {}).get("judge") == "haiku":
-            return await self._haiku(ctrl, text)
         secs = _sections(text)
         if ctrl.topic_regex:
             rel = [s for s in secs if re.search(ctrl.topic_regex, s, re.I)][:6] or secs[:6]

@@ -76,31 +71,3 @@ class LLMChecker:
         except Exception as e:
             logger.info("llm checker fail %s: %s", ctrl.control_id, str(e)[:80])
             return CheckResult(present=None, source="error")
-
-    async def _haiku(self, ctrl: ControlSpec, text: str) -> CheckResult:
-        """Sufficiency via Haiku direkt (validierter Judge). Kriteriengeführt:
-        die Rechts-Elemente stehen in ctrl.paraphrases; wiederverwendet den
-        validierten deep_check-Sufficiency-Prompt."""
-        try:
-            from compliance.services.llm_cascade import _call_anthropic
-            from compliance.services.specialist_agents.dse.deep_check import (
-                _JUDGE_SYS, _build_user, _parse as _parse_judge,
-            )
-            crit = ctrl.paraphrases or [ctrl.label or ctrl.control_id]
-            user = _build_user(text, ctrl.label or ctrl.control_id, crit)
-            obj = None
-            for _ in range(2):
-                obj = _parse_judge(await _call_anthropic(_JUDGE_SYS, user, max_tokens=400))
-                if obj:
-                    break
-            if not obj:
-                return CheckResult(present=None, source="haiku")
-            return CheckResult(
-                present=bool(obj.get("erfuellt")),
-                evidence=(obj.get("begruendung") or "")[:120],
-                confidence=float(obj.get("confidence") or 0.0),
-                source="haiku",
-            )
-        except Exception as e:
-            logger.info("llm haiku checker fail %s: %s", ctrl.control_id, str(e)[:80])
-            return CheckResult(present=None, source="error")
@@ -1,68 +0,0 @@
"""Checker router: method-agnostic dispatch.

control → sensor_classification (verification_method + decision_method) → checker.
A new module only supplies ControlSpecs; the router picks the checker. This turns
the "embedding finds, Claude decides" path into ONE shared CONTENT/LLM checker
instead of cookie-specific logic. Checkers not yet built (PLAYWRIGHT/AUDIT/SCANNER/
REGEX-FIELD) → present=None (fail-safe: the caller keeps its deterministic result).
"""
from __future__ import annotations

from typing import Any, Optional

from .base import CheckResult, ControlSpec, DecisionMethod, DocContext
from .embedding_checker import EmbeddingChecker
from .llm_checker import LLMChecker
from .reference_checker import ReferenceChecker

_LLM = LLMChecker()
_EMB = EmbeddingChecker()
_REF = ReferenceChecker()

# decision_method → checker. Missing mechanisms deliberately resolve to None (not built yet).
_BY_DECISION: dict[str, Any] = {
    DecisionMethod.LLM: _LLM,
    DecisionMethod.EMBEDDING: _EMB,
    DecisionMethod.LINK_RESOLVER: _REF,
}


async def route_and_check(ctrl: ControlSpec, doc: DocContext) -> CheckResult:
    checker = _BY_DECISION.get((ctrl.decision_method or "").upper())
    if checker is None:
        return CheckResult(present=None,
                           source=f"no_checker:{ctrl.decision_method}")
    return await checker.check(ctrl, doc)


def build_spec(
    control_id: str,
    sensor_classification: Optional[dict[str, Any]],
    *,
    label: str = "",
    criteria: Optional[list] = None,
    question: str = "",
    patterns: Optional[list[str]] = None,
    embed_threshold: Optional[float] = None,
) -> ControlSpec:
    """Builds a ControlSpec from the STORED sensor_classification
    (canonical_controls.generation_metadata.sensor_classification) plus the
    control criteria. CONTENT/LLM → judge='haiku' (the validated sufficiency
    judge; the default for sufficiency per the decision of 2026-06-22)."""
    sc = sensor_classification or {}
    vm = (sc.get("verification_method") or "").upper()
    dm = (sc.get("decision_method") or "").upper()
    extra: dict[str, Any] = {}
    if vm == "CONTENT" and dm == "LLM":
        extra["judge"] = "haiku"
    return ControlSpec(
        control_id=control_id,
        verification_method=vm,
        decision_method=dm,
        label=label,
        paraphrases=[str(c) for c in (criteria or []) if c],
        question=question,
        patterns=patterns or [],
        embed_threshold=embed_threshold,
        extra=extra,
    )
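The router is a plain dictionary lookup from `decision_method` to a checker instance, where a missing entry means "not built yet" and yields an undetermined result instead of an error. A self-contained sketch of that dispatch under simplified, assumed types: `Spec`, `Result` and the stub `KeywordChecker` are illustrative stand-ins for `ControlSpec`, `CheckResult` and the real checkers, and plain string keys replace the `DecisionMethod` constants.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Spec:
    control_id: str
    decision_method: str
    patterns: list[str] = field(default_factory=list)


@dataclass
class Result:
    present: Optional[bool]
    source: str


class Checker(Protocol):
    async def check(self, spec: Spec, text: str) -> Result: ...


class KeywordChecker:
    """Stub checker: present if any pattern occurs in the text (case-insensitive)."""

    async def check(self, spec: Spec, text: str) -> Result:
        low = text.lower()
        return Result(any(p.lower() in low for p in spec.patterns), "keyword")


_BY_DECISION: dict[str, Checker] = {"KEYWORD": KeywordChecker()}


async def route_and_check(spec: Spec, text: str) -> Result:
    checker = _BY_DECISION.get((spec.decision_method or "").upper())
    if checker is None:
        # Unknown or unbuilt method: fail-safe, the caller keeps its own result.
        return Result(None, f"no_checker:{spec.decision_method}")
    return await checker.check(spec, text)
```

Adding a new mechanism then means registering one more instance in the table; callers never branch on the method themselves.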
@@ -142,26 +142,19 @@ async def _call_ovh(system: str, user: str, max_tokens: int = 6000) -> str:
     headers = {"Content-Type": "application/json"}
     if key:
         headers["Authorization"] = f"Bearer {key}"
-    # gpt-oss-120b is a REASONING model: it spends output tokens on
-    # chain-of-thought before emitting the answer. A low cap (e.g. deep_check's
-    # max_tokens=400) makes it hit the length limit mid-reasoning and return
-    # content=null — the whole tier then silently yields nothing. Floor the
-    # budget so the reasoning AND the JSON answer fit.
     payload = {
-        "model": model, "temperature": 0.05, "max_tokens": max(max_tokens, 2000),
+        "model": model, "temperature": 0.05, "max_tokens": max_tokens,
         "messages": [{"role": "system", "content": system},
                      {"role": "user", "content": user}],
         "response_format": {"type": "json_object"},
     }
     try:
-        async with httpx.AsyncClient(timeout=90.0) as c:
+        async with httpx.AsyncClient(timeout=45.0) as c:
             r = await c.post(f"{base.rstrip('/')}/v1/chat/completions",
                              json=payload, headers=headers)
             r.raise_for_status()
-            msg = (r.json().get("choices") or [{}])[0].get("message") or {}
-            # Answer is normally in content; if the model was length-capped the
-            # JSON can land in reasoning_content instead — fall back to it.
-            return (msg.get("content") or "") or (msg.get("reasoning_content") or "")
+            choice = (r.json().get("choices") or [{}])[0]
+            return (choice.get("message") or {}).get("content", "") or ""
     except Exception as e:
         logger.warning("ovh cascade tier 2 failed: %s", e)
         return ""
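The lines this hunk removes encode two rules for a reasoning model that can exhaust `max_tokens` on chain-of-thought: floor the token budget at 2000, and fall back to `reasoning_content` when `content` comes back null. A pure-function sketch of both rules, assuming an OpenAI-style chat-completions response dict; `token_budget` and `extract_answer` are hypothetical helper names.

```python
from typing import Any


def token_budget(requested: int, floor: int = 2000) -> int:
    """Never send fewer than `floor` output tokens to a reasoning model."""
    return max(requested, floor)


def extract_answer(resp: dict[str, Any]) -> str:
    """Prefer message.content; fall back to reasoning_content if content is empty/null."""
    msg = ((resp.get("choices") or [{}])[0].get("message")) or {}
    return (msg.get("content") or "") or (msg.get("reasoning_content") or "")
```

With `deep_check`'s `max_tokens=400`, `token_budget(400)` returns 2000, which leaves room for both the reasoning and the JSON answer.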
@@ -1,179 +0,0 @@
"""Obligation Aggregation Engine: execution of the Legal Obligation Layer v1.

Aggregates assessments at CRITERION level (per control) into results at
OBLIGATION level. This is the first execution of the model

    Regulation → Legal Obligation → Control → Criterion

i.e. the finding is raised on the OBLIGATION, not per control. This collapses
the redundancy measured in the catalog (portability 11×, recipients 14×): N controls
that check the same duty yield ONE obligation finding instead of N control findings.

Regulation-agnostic: it only knows obligation_id, tier, met, legal_basis,
conditional. DSGVO (GDPR)/CRA/NIS2/DORA/MaschVO/AI Act feed the same function.

Fail-safe (docs-src/development/legal_obligation_layer_v1.md, §Aggregation):
  LEGAL_MINIMUM obligation:
    applicable=false                    → NA (no finding)
    no LM requirement met               → FAILED (duty gap)
    all LM requirements met             → MET
    only some met                       → PARTIAL
    LM not assessable (checker down)    → UNDETERMINED (caller keeps legacy)
  BEST_PRACTICE/OPTIONAL obligation (no LM):
    at least one criterion met          → MET (covered)
    none                                → OPEN (recommendation only, NEVER FAILED)

Redundancy collapse: the LM criteria of ONE obligation are grouped into "requirements"
by `legal_basis`; a requirement counts as met as soon as ANY control confirms it (OR).
9× recipients_disclosed (all Art 13(1)(e)) = one requirement.
PARTIAL only arises with several DISTINCT LM requirements (different
legal_basis) within one obligation.
"""
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Callable, Optional

LM, BP, OPT = "LEGAL_MINIMUM", "BEST_PRACTICE", "OPTIONAL"
MET, PARTIAL, FAILED = "MET", "PARTIAL", "FAILED"
NA, UNDETERMINED, OPEN = "NA", "UNDETERMINED", "OPEN"
PFLICHT, EMPFEHLUNG, NICHT_ANWENDBAR = "PFLICHT", "EMPFEHLUNG", "NICHT_ANWENDBAR"

# Predicate hook: (conditional, doc_text) → True (applicable) / False (→ NA) / None (unknown → applicable)
ApplicableFn = Callable[[str, str], Optional[bool]]


@dataclass(frozen=True)
class CriterionEval:
    """One criterion assessment of a control, assigned to an obligation."""
    obligation_id: str
    tier: str                           # LEGAL_MINIMUM / BEST_PRACTICE / OPTIONAL
    met: Optional[bool]                 # True met · False missing · None undetermined
    control_id: str
    legal_basis: str = ""
    criterion: str = ""
    conditional: Optional[str] = None   # applicability predicate of the obligation


@dataclass
class ObligationResult:
    obligation_id: str
    status: str             # MET / PARTIAL / FAILED / NA / UNDETERMINED / OPEN
    bucket: str             # PFLICHT / EMPFEHLUNG / NICHT_ANWENDBAR
    tier: str               # governing tier of the obligation
    applicable: bool
    evidence: list[str]     # contributing control_ids
    lm_met: int             # LM requirements met
    lm_total: int           # distinct LM requirements (assessable)
    recommendations: list[dict] = field(default_factory=list)


def _governing_tier(evals: list[CriterionEval]) -> str:
    tiers = {e.tier for e in evals}
    if LM in tiers:
        return LM
    return BP if BP in tiers else OPT


def _requirement_state(evals: list[CriterionEval]) -> Optional[bool]:
    """State of ONE LM requirement across all checking controls (OR/redundancy):
    True (someone confirms) · None (all undetermined) · False (assessed, missing)."""
    if any(e.met is True for e in evals):
        return True
    if all(e.met is None for e in evals):
        return None
    return False


def _recommendations(evals: list[CriterionEval]) -> list[dict]:
    """Unmet BEST_PRACTICE/OPTIONAL criteria → recommendations."""
    return [{"criterion": e.criterion, "tier": e.tier, "legal_basis": e.legal_basis,
             "control_id": e.control_id}
            for e in evals if e.tier in (BP, OPT) and e.met is False]


def aggregate_obligation(obligation_id: str, evals: list[CriterionEval], *,
                         applicable_fn: Optional[ApplicableFn] = None,
                         doc_text: str = "") -> ObligationResult:
    evidence = sorted({e.control_id for e in evals if e.control_id})
    conditional = next((e.conditional for e in evals if e.conditional), None)
    tier = _governing_tier(evals)
    recs = _recommendations(evals)

    applicable = True
    if applicable_fn is not None and conditional:
        verdict = applicable_fn(conditional, doc_text)
        applicable = True if verdict is None else bool(verdict)
    if not applicable:
        return ObligationResult(obligation_id, NA, NICHT_ANWENDBAR, tier, False,
                                evidence, 0, 0, recs)

    lm_evals = [e for e in evals if e.tier == LM]
    if lm_evals:
        reqs: dict[str, list[CriterionEval]] = defaultdict(list)
        for e in lm_evals:
            reqs[e.legal_basis or obligation_id].append(e)
        states = [_requirement_state(v) for v in reqs.values()]
        determinable = [s for s in states if s is not None]
        if not determinable:
            return ObligationResult(obligation_id, UNDETERMINED, PFLICHT, LM, True,
                                    evidence, 0, len(states), recs)
        met = sum(1 for s in determinable if s)
        total = len(determinable)
        status = MET if met == total else (FAILED if met == 0 else PARTIAL)
        return ObligationResult(obligation_id, status, PFLICHT, LM, True,
                                evidence, met, total, recs)

    # Pure BEST_PRACTICE/OPTIONAL obligation: never mandatory, never FAILED.
    covered = any(e.met is True for e in evals)
    return ObligationResult(obligation_id, MET if covered else OPEN, EMPFEHLUNG,
                            tier, True, evidence, 0, 0, recs)


def aggregate_obligations(evals: list[CriterionEval], *,
                          applicable_fn: Optional[ApplicableFn] = None,
                          doc_text: str = "") -> list[ObligationResult]:
    """Flat criterion list → one ObligationResult per obligation_id."""
    groups: dict[str, list[CriterionEval]] = defaultdict(list)
    for e in evals:
        if e.obligation_id:
            groups[e.obligation_id].append(e)
    return [aggregate_obligation(oid, g, applicable_fn=applicable_fn, doc_text=doc_text)
            for oid, g in groups.items()]


def evals_from_tiered(control_id: str, tiered_criteria: list[dict],
                      detail: list[dict], conditional: Optional[str] = None
                      ) -> list[CriterionEval]:
    """Adapter: tiered_criteria (obligation_id/tier/legal_basis) + the
    evaluate_tiered `detail` (met per index, same order) → CriterionEvals.
    `conditional` comes from the control's `applicability` (it applies to the obligation)."""
    out: list[CriterionEval] = []
    for i, c in enumerate(tiered_criteria or []):
        oid = c.get("obligation_id")
        if not oid:
            continue
        d = detail[i] if i < len(detail) else {}
        out.append(CriterionEval(
            obligation_id=oid,
            tier=(c.get("compliance_tier") or "").upper(),
            met=d.get("met"),
            control_id=control_id,
            legal_basis=c.get("legal_basis") or "",
            criterion=c.get("criterion") or "",
            conditional=conditional,
        ))
    return out


def summarize(results: list[ObligationResult]) -> dict:
    """Phase C metrics: obligation count + distribution by bucket/status."""
    return {
        "obligations": len(results),
        "buckets": dict(Counter(r.bucket for r in results)),
        "statuses": dict(Counter(r.status for r in results)),
        "pflicht_failed": sum(1 for r in results if r.bucket == PFLICHT and r.status == FAILED),
        "pflicht_partial": sum(1 for r in results if r.bucket == PFLICHT and r.status == PARTIAL),
        "recommendations": sum(len(r.recommendations) for r in results),
    }
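To make the redundancy collapse and the LM status rules concrete, here is a compact, self-contained restatement of the LEGAL_MINIMUM branch (one requirement per `legal_basis`, OR across controls). It assumes plain `(legal_basis, met)` tuples instead of `CriterionEval` and mirrors the logic above rather than importing it; the applicability/NA step is left out.

```python
from collections import defaultdict
from typing import Optional


def lm_status(evals: list[tuple[str, Optional[bool]]]) -> tuple[str, int, int]:
    """evals: (legal_basis, met) pairs for ONE obligation's LEGAL_MINIMUM criteria.
    Returns (status, met_requirements, total_requirements)."""
    # One requirement per distinct legal_basis (this is the redundancy collapse).
    reqs: dict[str, list[Optional[bool]]] = defaultdict(list)
    for basis, met in evals:
        reqs[basis].append(met)

    def state(vals: list[Optional[bool]]) -> Optional[bool]:
        if any(v is True for v in vals):
            return True   # one confirming control is enough (OR)
        if all(v is None for v in vals):
            return None   # nothing assessable for this requirement
        return False

    determinable = [s for s in map(state, reqs.values()) if s is not None]
    if not determinable:
        return "UNDETERMINED", 0, len(reqs)
    met_count = sum(1 for s in determinable if s)
    total = len(determinable)
    if met_count == total:
        return "MET", met_count, total
    return ("FAILED" if met_count == 0 else "PARTIAL"), met_count, total
```

Nine controls that all cite Art 13(1)(e) form a single requirement, so one confirming control makes the obligation MET; PARTIAL needs at least two distinct legal bases.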
@@ -1,76 +0,0 @@
"""Applicability predicates (minimal) for the Obligation Aggregation Engine.

Each predicate decides from the document text whether a CONDITIONAL obligation
applies:
    True  → applicable (assess normally)
    False → NOT applicable (→ NA instead of MISSING)
    None  → predicate unknown → caller keeps default=applicable (fail-safe,
            NO silent NA)

Deliberately kept SMALL: only the conditions already modeled
    has_third_country_transfer · uses_legitimate_interest · direct_marketing
(+ legitimate_interest_or_public_task, because objection_general_art21_1 uses the
same legal basis as its trigger). profiling/employment/telecom/health/
data_act follow in the next batch; until then → None → applicable.
"""
from __future__ import annotations

from typing import Optional

_THIRD_COUNTRY = (
    "drittland", "drittstaat", "drittländ", "third countr", "außerhalb der eu",
    "ausserhalb der eu", "außerhalb des ewr", "ausserhalb des ewr",
    "angemessenheitsbeschluss", "standardvertragsklausel", "standarddatenschutzklausel",
    "binding corporate rules", "verbindliche interne datenschutzvorschriften",
    "data privacy framework", "privacy shield", "in die usa", "in den usa",
    "vereinigte staaten", "international transfer", "internationale übermittlung",
    "art. 44", "art. 46",
)
_LEGIT = (
    "berechtigtes interesse", "berechtigten interesse", "berechtigte interesse",
    "legitimate interest", "art. 6 abs. 1 lit. f", "art. 6 abs. 1 f",
    "art. 6 (1) (f)", "abs. 1 buchstabe f", "interessenabwägung",
)
_PUBLIC_TASK = (
    "öffentliche aufgabe", "öffentlichen aufgabe", "im öffentlichen interesse",
    "art. 6 abs. 1 lit. e", "ausübung öffentlicher gewalt", "official authority",
)
_DIRECT_MKT = (
    "direktwerbung", "direktmarketing", "direkt-werbung", "werbe-e-mail", "werbe-mail",
    "newsletter", "werbliche", "marketingzweck", "marketing-zweck", "zwecke der werbung",
    "zu werbezwecken", "e-mail-marketing", "postwerbung", "telefonwerbung",
)


def _has(text: str, kws: tuple[str, ...]) -> bool:
    return any(k in text for k in kws)


def has_third_country_transfer(text: str) -> bool:
    return _has(text, _THIRD_COUNTRY)


def uses_legitimate_interest(text: str) -> bool:
    return _has(text, _LEGIT)


def direct_marketing(text: str) -> bool:
    return _has(text, _DIRECT_MKT)


_PREDICATES = {
    "has_third_country_transfer": has_third_country_transfer,
    "uses_legitimate_interest": uses_legitimate_interest,
    "legitimate_interest_or_public_task":
        lambda t: _has(t, _LEGIT) or _has(t, _PUBLIC_TASK),
    "direct_marketing": direct_marketing,
}


def applicable(conditional: str, doc_text: str) -> Optional[bool]:
    """applicable_fn hook for `aggregate_obligations`. Unknown predicate → None
    (caller keeps default=applicable; NEVER a silent NA)."""
    fn = _PREDICATES.get(conditional)
    if fn is None:
        return None
    return fn((doc_text or "").lower())
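The predicate hook is a keyword lookup over the lower-cased document text, and an unknown predicate name deliberately returns `None`, which the aggregator reads as "applicable". A trimmed sketch of that contract; the keyword tuples are abbreviated examples taken from the lists above, and `is_applicable` is a hypothetical helper showing how `aggregate_obligation` interprets the verdict.

```python
from typing import Callable, Optional

# Abbreviated keyword sets (the module above carries longer lists).
_THIRD_COUNTRY = ("drittland", "third countr", "standardvertragsklausel")
_DIRECT_MKT = ("direktwerbung", "newsletter")

_PREDICATES: dict[str, Callable[[str], bool]] = {
    "has_third_country_transfer": lambda t: any(k in t for k in _THIRD_COUNTRY),
    "direct_marketing": lambda t: any(k in t for k in _DIRECT_MKT),
}


def applicable(conditional: str, doc_text: str) -> Optional[bool]:
    fn = _PREDICATES.get(conditional)
    if fn is None:
        return None  # unknown predicate: never a silent NA
    return fn((doc_text or "").lower())


def is_applicable(conditional: str, doc_text: str) -> bool:
    """How the aggregator reads the hook: None counts as applicable."""
    verdict = applicable(conditional, doc_text)
    return True if verdict is None else verdict
```

Because `None` maps to "applicable", adding predicates later can only turn obligations into NA where a keyword rule explicitly says so.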
@@ -1,26 +0,0 @@
"""Obligation taxonomy registry: a versioned artifact until the DB owner table exists
(Legal Obligation Layer v1, docs-src/development/legal_obligation_layer_v1.md).

Holds OBLIGATION-level metadata that does not (yet) have its own DB table.

`decision_method_required`: obligations that keyword/embedding detection
DEMONSTRABLY cannot recognize reliably (compact, synonym-rich disclosure) and that
need CONTENT/LLM. Shown empirically by the TeamViewer recall defect: 0/22
recipients + international_transfer controls hit although the duty was met
("…außerhalb EU/EWR … Standardvertragsklauseln/Schutzmaßnahmen", i.e. "outside
EU/EEA … standard contractual clauses/safeguards"); embedding cosine
0.49–0.57 < 0.62, partly the wrong chunk → not a threshold fix but the LLM class.

Effect: the shadow does NOT count a FAILED on such obligations as a "real gap"
but as RECALL_LIMITED (the checker cannot verify it with the current method).
"""
OBLIGATION_META: dict[str, dict] = {
    "recipients_disclosed": {"decision_method_required": "LLM"},
    "third_country_transfer_disclosed": {"decision_method_required": "LLM"},
    "safeguards_disclosed": {"decision_method_required": "LLM"},
    "safeguards_accessible": {"decision_method_required": "LLM"},
}


def requires_llm(obligation_id: str) -> bool:
    """True if this obligation needs CONTENT/LLM (keyword/embedding recall shown to be insufficient)."""
    return OBLIGATION_META.get(obligation_id, {}).get("decision_method_required") == "LLM"
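`requires_llm` is what lets the shadow split FAILED/PARTIAL obligations into real gaps and recall-limited ones. A small sketch of that split over `(obligation_id, status)` pairs, reusing the registry shape above with a single entry; `split_findings` and the obligation ID `storage_period_disclosed` are hypothetical.

```python
OBLIGATION_META: dict[str, dict] = {
    "recipients_disclosed": {"decision_method_required": "LLM"},
}


def requires_llm(obligation_id: str) -> bool:
    return OBLIGATION_META.get(obligation_id, {}).get("decision_method_required") == "LLM"


def split_findings(results: list[tuple[str, str]]) -> dict[str, int]:
    """Count each FAILED/PARTIAL obligation as a real gap or as RECALL_LIMITED."""
    counts = {"failed_by_current_checker": 0, "recall_limited": 0}
    for oid, status in results:
        if status not in ("FAILED", "PARTIAL"):
            continue
        key = "recall_limited" if requires_llm(oid) else "failed_by_current_checker"
        counts[key] += 1
    return counts
```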
-78
@@ -1,78 +0,0 @@
"""Applicability gate for the cookie policy scan.

Excludes controls from the cookie findings scan that, according to
`compliance.control_classification`, do NOT run against a cookie policy
('COOKIE_POLICY' not in applicable_artifacts). They belong to a different
artifact/checker (banner: BEHAVIOR/Playwright; security/TOM/audit:
PROCESS) and otherwise produce nonsense findings (e.g. 'TOMs not documented'
against a cookie policy). They are NOT deleted but returned as a
routing list.

Unlike the DSE (privacy policy) gate, there is NO needs_review exception: the
artifact signal is decisive here and backed by the inventory (2026-06-21); the 11
mis-scoped controls have been reviewed. Fail-safe: table missing / DB unreachable
-> empty dict -> NO filtering (no silent recall loss).
"""

from __future__ import annotations

import logging
import os
from typing import Any

logger = logging.getLogger(__name__)


async def load_cookie_gate(db_url: str = "") -> dict[str, dict[str, Any]]:
    """Returns {control_id: meta} for controls to be excluded from the cookie
    findings scan (no COOKIE_POLICY artifact). Empty dict = no filter."""
    dsn = (db_url or os.getenv("DATABASE_URL")
           or os.getenv("COMPLIANCE_DATABASE_URL") or "")
    if not dsn:
        return {}
    try:
        import asyncpg
        conn = await asyncpg.connect(dsn)
        try:
            rows = await conn.fetch(
                """SELECT control_id, obligation_type, check_intent,
                          applicable_artifacts
                   FROM compliance.control_classification
                   WHERE is_active
                     AND NOT ('COOKIE_POLICY' = ANY(applicable_artifacts))""")
        finally:
            await conn.close()
    except Exception as e:  # table missing / DB down -> no filter
        logger.info("cookie classification gate inaktiv: %s", str(e)[:90])
        return {}
    return {
        r["control_id"]: {
            "obligation_type": r["obligation_type"],
            "check_intent": r["check_intent"],
            "applicable_artifacts": list(r["applicable_artifacts"] or []),
        }
        for r in rows if r["control_id"]
    }


def apply_gate(
    controls: list[dict[str, Any]],
    gate: dict[str, dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Splits the loaded controls into (kept, routed_out).

    kept:       run through the cookie scan as usual.
    routed_out: removed from the scan (control_id + title + classification
                metadata for routing to banner/security/audit).
    """
    kept: list[dict[str, Any]] = []
    routed_out: list[dict[str, Any]] = []
    for c in controls:
        cid = c.get("control_id")
        meta = gate.get(cid) if cid else None
        if meta:
            routed_out.append({"control_id": cid, "title": c.get("title"), **meta})
        else:
            kept.append(c)
    return kept, routed_out
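The gate is an SQL filter followed by a pure partition, and an empty gate (the fail-safe result) keeps every control. A sketch of both steps in memory, assuming classification rows as plain dicts; `build_gate` is a hypothetical stand-in for the query. One subtlety it mirrors: in PostgreSQL, `'COOKIE_POLICY' = ANY(NULL)` yields NULL, so rows with a NULL `applicable_artifacts` array are not gated, while an empty array is.

```python
from typing import Any


def build_gate(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """In-memory equivalent of the WHERE clause: active rows whose (non-NULL)
    applicable_artifacts array does NOT contain 'COOKIE_POLICY'."""
    return {
        r["control_id"]: {
            "obligation_type": r.get("obligation_type"),
            "check_intent": r.get("check_intent"),
            "applicable_artifacts": list(r["applicable_artifacts"]),
        }
        for r in rows
        if r.get("control_id") and r.get("is_active")
        and r.get("applicable_artifacts") is not None  # NULL array -> predicate NULL
        and "COOKIE_POLICY" not in r["applicable_artifacts"]
    }


def apply_gate(controls: list[dict[str, Any]],
               gate: dict[str, dict[str, Any]]
               ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    kept: list[dict[str, Any]] = []
    routed_out: list[dict[str, Any]] = []
    for c in controls:
        cid = c.get("control_id")
        meta = gate.get(cid) if cid else None
        if meta:
            routed_out.append({"control_id": cid, "title": c.get("title"), **meta})
        else:
            kept.append(c)
    return kept, routed_out
```

The sample control IDs used to exercise it (`SEC-1`, `CK-1`) are made up.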
-63
@@ -1,63 +0,0 @@
"""Layer-3 sufficiency judge for the cookie policy.

The embedding/boost auto-rescue (layers 0/2) is DELIBERATELY optimistic: it finds
the topic but does not prove compliance. Measurement (2026-06-22): 159 FN
(over-rescue) against the Opus GT, because 'topic mentioned' was waved through
as 'met'. This layer checks EXACTLY the rescued controls with the validated
Haiku judge (cohort cookie_sufficiency_v1: P0.89/R0.91), NOT the Qwen-first
cascade (local models have been ruled out as sufficiency judges), and withdraws
'passed' when the concrete duty is not met. 'Embedding finds, Claude decides.'

Only for the NON-skip_llm path (full check); the fast/interactive path
keeps the deterministic rescue.
"""

from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger(__name__)

_RESCUE_MARKERS = ("+embedding", "+regex_boost")


def _is_rescued(r: dict[str, Any]) -> bool:
    src = r.get("source") or ""
    return r.get("passed") and any(m in src for m in _RESCUE_MARKERS)


async def judge_rescued(text: str, results: list[dict[str, Any]]) -> int:
    """Checks all rescued (embedding/boost) passed controls with Haiku.
    Withdraws passed when the judge considers the duty NOT met.
    Returns the number of withdrawn (corrected) rescues.
    """
    # Via the shared checker router (no cookie special case any more):
    # CONTENT/LLM → build_spec sets judge='haiku' → LLMChecker (validated
    # sufficiency judge). This makes cookie the first real router consumer.
    from compliance.services.checkers.base import DocContext
    from compliance.services.checkers.router import build_spec, route_and_check

    candidates = [r for r in results if _is_rescued(r)]
    if not candidates:
        return 0
    doc = DocContext(text=text)
    sc = {"verification_method": "CONTENT", "decision_method": "LLM"}
    corrected = 0
    for r in candidates:
        crit = r.get("_pass_criteria") or [r.get("label") or r.get("hint") or ""]
        if not isinstance(crit, list):
            crit = [str(crit)]
        label = r.get("label") or r.get("hint") or r.get("control_id") or ""
        spec = build_spec(r.get("control_id") or "", sc, label=label, criteria=crit)
        res = await route_and_check(spec, doc)
        if res.present is False:
            r["passed"] = False
            r["source"] = (r.get("source") or "") + "+llm_failed"
            r["matched_text"] = "[layer-3 sufficiency-judge: nicht erfuellt]"
            r["_judge_reason"] = (res.evidence or "")[:200]
            corrected += 1
    if corrected:
        logger.info("cookie layer-3 sufficiency-judge: %d/%d rescues zurueckgenommen",
                    corrected, len(candidates))
    return corrected
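The judge only touches results that are both `passed` and carry a rescue marker in `source`, and it flips them only on an explicit `False`; an undetermined `None` verdict leaves the rescue in place. A synchronous sketch of that selection and correction, with a hypothetical `judge` callable standing in for `build_spec` plus `route_and_check`:

```python
from typing import Any, Callable, Optional

_RESCUE_MARKERS = ("+embedding", "+regex_boost")


def is_rescued(r: dict[str, Any]) -> bool:
    src = r.get("source") or ""
    return bool(r.get("passed")) and any(m in src for m in _RESCUE_MARKERS)


def judge_rescued(results: list[dict[str, Any]],
                  judge: Callable[[dict[str, Any]], Optional[bool]]) -> int:
    corrected = 0
    for r in results:
        if not is_rescued(r):
            continue  # deterministic passes and failures are never re-judged
        if judge(r) is False:  # None (undetermined) keeps the rescue
            r["passed"] = False
            r["source"] = (r.get("source") or "") + "+llm_failed"
            corrected += 1
    return corrected
```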
@@ -96,22 +96,6 @@ class CookiePolicyAgent(BaseSpecialistAgent):
                 "Branchen-MCs entfernt"
             )
 
-        # Layer 3: sufficiency judge (Haiku) on the embedding/boost-rescued
-        # controls: embedding finds the topic, Claude decides whether the duty
-        # is concretely met. Only in the full check (not skip_llm).
-        skip_llm = bool((agent_input.context or {}).get("skip_llm"))
-        if not skip_llm:
-            try:
-                from ._sufficiency_judge import judge_rescued
-                corrected = await judge_rescued(text, results)
-                if corrected:
-                    notes_parts.append(
-                        f"layer-3 sufficiency-judge: {corrected} Rescues "
-                        "zurückgenommen"
-                    )
-            except Exception as e:
-                logger.warning("cookie layer-3 judge skipped: %s", e)
-
         seen: set[str] = set()
         for r in results:
             mc_id = r.get("control_id") or ""
@@ -45,15 +45,6 @@ async def run_v3_pipeline(
         controls = []
     _normalize_criteria(controls)
     controls, sector_dropped = _filter_sector(controls, business_scope)
-    # Artifact gate: controls without a COOKIE_POLICY artifact (security/TOM/audit,
-    # banner) are dropped; they belong to a different checker/artifact and would
-    # otherwise produce nonsense findings. See _classification_gate.
-    routed_out: list[dict[str, Any]] = []
-    try:
-        from ._classification_gate import apply_gate, load_cookie_gate
-        controls, routed_out = apply_gate(controls, await load_cookie_gate(db_url))
-    except Exception as e:
-        logger.warning("cookie classification gate skipped: %s", e)
     results: list[dict[str, Any]] = []
     if controls:
         try:
@@ -120,7 +111,6 @@ async def run_v3_pipeline(
         "layer_0_boost_overrides": boost_overrides,
         "total_mcs": len(results),
         "sector_dropped": sector_dropped,
-        "artifact_gated": len(routed_out),
     }
     return results, telemetry
 
@@ -1,130 +0,0 @@
|
|||||||
"""DSE Shadow-Verdrahtung der Obligation Aggregation Engine.
|
|
||||||
|
|
||||||
Erzeugt aus den v3-`results` zusätzlich Obligation-Ergebnisse — AUSSCHLIESSLICH
|
|
||||||
für die Telemetrie (Shadow Mode). Ändert KEINE nutzer-sichtbaren Findings.
|
|
||||||
|
|
||||||
Mapping control-level über generation_metadata.legal_obligations +
|
|
||||||
applicability.conditional; das `met`-Signal ist das Legacy-`passed` des Controls
|
|
||||||
(kein zusätzlicher Prüfer-Call, kein Key). Liefert die Vergleichszahlen, mit denen
|
|
||||||
sich der Umschalt-Entscheid später absichern lässt:
|
|
||||||
legacy_control_findings · obligation_shadow_results · collapse_factor ·
|
|
||||||
na_count · met_failed_delta · top_collapsed_obligations
|
|
||||||
"""
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import logging
|
|
||||||
import os
|
|
||||||
from typing import Any, Optional
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
|
|
||||||
async def fetch_obligation_markers(cids: list[str], db_url: str = "") -> dict[str, dict]:
|
|
||||||
"""legal_obligations + applicability.conditional der Controls laden.
|
|
||||||
Leeres Dict bei Fehler/keiner DB (Shadow fällt still aus)."""
|
|
||||||
cids = [c for c in cids if c]
|
|
||||||
if not cids:
|
|
||||||
return {}
|
|
||||||
import json
|
|
||||||
dsn = db_url or os.getenv("DATABASE_URL") or os.getenv("COMPLIANCE_DATABASE_URL")
|
|
||||||
if not dsn:
|
|
||||||
return {}
|
|
||||||
try:
|
|
||||||
import asyncpg
|
|
||||||
conn = await asyncpg.connect(dsn)
|
|
||||||
rows = await conn.fetch(
|
|
||||||
"select control_id, generation_metadata->'legal_obligations' obl, "
|
|
||||||
"generation_metadata->'applicability'->>'conditional' cond "
|
|
||||||
"from compliance.canonical_controls "
|
|
||||||
"where control_id = any($1::text[]) "
|
|
||||||
"and generation_metadata ? 'legal_obligations'", cids)
|
|
||||||
await conn.close()
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning("fetch_obligation_markers failed: %s", e)
|
|
||||||
return {}
|
|
||||||
out: dict[str, dict] = {}
|
|
||||||
for r in rows:
|
|
||||||
obl = r["obl"]
|
|
||||||
obl = json.loads(obl) if isinstance(obl, str) else obl
|
|
||||||
if obl:
|
|
||||||
out[r["control_id"]] = {"obl": obl, "cond": r["cond"]}
|
|
||||||
return out
|
|
||||||
|
|
||||||
|
|
||||||
def compute_obligation_shadow(results: list[dict], text: str,
|
|
-                              markers: dict[str, dict]) -> dict[str, Any]:
-    """Reiner Shadow-Vergleich (keine DB, keine Seiteneffekte). `markers`:
-    {control_id: {obl:[...], cond:str|None}}. `met` = Legacy-`passed`."""
-    from compliance.services.obligation_aggregation import (
-        FAILED, LM, MET, NA, PARTIAL, CriterionEval, aggregate_obligations,
-    )
-    from compliance.services.obligation_applicability import applicable
-    from compliance.services.obligation_taxonomy import requires_llm
-
-    legacy = 0
-    evals: list[Any] = []
-    contrib: dict[str, list] = {}
-    for r in results:
-        cid = r.get("control_id")
-        m = markers.get(cid)
-        if not m:
-            continue
-        passed = bool(r.get("passed"))
-        if not passed:
-            legacy += 1
-        for ob in m["obl"]:
-            evals.append(CriterionEval(ob, LM, passed, cid, "", "", m.get("cond")))
-            contrib.setdefault(ob, []).append((cid, passed))
-    if not evals:
-        return {"status": "no obligation markers on result controls"}
-
-    obls = aggregate_obligations(evals, applicable_fn=applicable, doc_text=text)
-    # FAILED/PARTIAL ehrlich trennen: echte Lücke (failed_by_current_checker) vs
-    # RECALL_LIMITED (Obligation braucht LLM, aktueller Prüfer kann sie nicht verifizieren).
-    findings = failed_current = recall_limited = na = 0
-    for o in obls:
-        if o.status == NA:
-            na += 1
-        elif o.status in (FAILED, PARTIAL):
-            findings += 1
-            if requires_llm(o.obligation_id):
-                recall_limited += 1
-            else:
-                failed_current += 1
-    top = []
-    for o in obls:
-        cs = contrib.get(o.obligation_id, [])
-        fehlt = sum(1 for _, p in cs if not p)
-        if fehlt >= 2:
-            top.append({"obligation": o.obligation_id, "fehlt": fehlt,
-                        "total": len(cs), "status": o.status,
-                        "recall_limited": bool(requires_llm(o.obligation_id)
-                                               and o.status in (FAILED, PARTIAL))})
-    top.sort(key=lambda x: -x["fehlt"])
-    met_count = sum(1 for o in obls if o.status == MET)
-    recall_limited_obls = sorted({o.obligation_id for o in obls
-                                  if o.status in (FAILED, PARTIAL)
-                                  and requires_llm(o.obligation_id)})
-    return {
-        "legacy_control_findings": legacy,
-        "obligation_shadow_results": len(obls),
-        "obligation_findings": findings,
-        "failed_by_current_checker": failed_current,
-        "recall_limited": recall_limited,
-        "met_count": met_count,
-        "collapse_factor": round(legacy / findings, 2) if findings else None,
-        "na_count": na,
-        "met_failed_delta": legacy - findings,
-        "top_collapsed_obligations": top[:10],
-        "recall_limited_obligations": recall_limited_obls,
-    }
-
-
-async def build_obligation_shadow(results: list[dict], text: str,
-                                  db_url: str = "") -> dict[str, Any]:
-    """Async-Wrapper: Marker laden, dann Shadow rechnen. NIE in `results` schreiben."""
-    cids = [r.get("control_id") for r in results if r.get("control_id")]
-    markers = await fetch_obligation_markers(cids, db_url)
-    if not markers:
-        return {"status": "no markers"}
-    return compute_obligation_shadow(results, text, markers)
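The shadow comparison collapses per-control results into per-obligation findings and splits failures into real gaps (`failed_by_current_checker`) and obligations the current checker cannot verify without an LLM (`recall_limited`). Below is a minimal, self-contained sketch of that arithmetic. It assumes a simplified rule where an obligation counts as met as soon as one contributing control passed and as failed otherwise; `Control`, `shadow_stats` and the `llm_only` set are hypothetical stand-ins for `CriterionEval`, `aggregate_obligations` and `requires_llm`, and PARTIAL/NA handling is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    control_id: str
    passed: bool
    # Hypothetical: obligation IDs this control contributes to (the "markers").
    obligations: list[str] = field(default_factory=list)


def shadow_stats(controls: list[Control], llm_only: set[str]) -> dict:
    """Simplified shadow comparison: an obligation is MET if any contributing
    control passed, otherwise FAILED (no PARTIAL/NA handling)."""
    relevant = [c for c in controls if c.obligations]  # controls without markers are skipped
    legacy = sum(1 for c in relevant if not c.passed)  # legacy findings = failed controls
    contrib: dict[str, list[bool]] = {}
    for c in relevant:
        for ob in c.obligations:
            contrib.setdefault(ob, []).append(c.passed)

    failed = sorted(ob for ob, passes in contrib.items() if not any(passes))
    recall_limited = [ob for ob in failed if ob in llm_only]  # needs an LLM to verify
    findings = len(failed)
    return {
        "legacy_control_findings": legacy,
        "obligation_findings": findings,
        "failed_by_current_checker": findings - len(recall_limited),
        "recall_limited": len(recall_limited),
        "met_count": len(contrib) - findings,
        "collapse_factor": round(legacy / findings, 2) if findings else None,
        "met_failed_delta": legacy - findings,
        "recall_limited_obligations": recall_limited,
    }


if __name__ == "__main__":
    demo = [Control(f"DATA-{i}", False, ["recipients_disclosed"]) for i in range(9)]
    demo.append(Control("T-1", False, ["transfer_safeguards"]))
    print(shadow_stats(demo, llm_only={"transfer_safeguards"}))
```

With nine failing `recipients_disclosed` controls and one failing LLM-only control, the sketch reports 10 legacy findings but only 2 obligation findings, i.e. a `collapse_factor` of 5.0, of which one is recall-limited.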
@@ -1,183 +0,0 @@
-"""Getierte 3-Status-Auswertung für DSE-Controls mit `tiered_criteria`.
-
-Pro Kriterium wird nach `decision_method` bewertet:
-- EMBEDDING (Präsenz): deterministisch (festes Modell), Doc EINMAL pro Scan
-  eingebettet → reproduzierbar, kein LLM. Trägt den GROSSTEIL.
-- LLM (Sufficiency): Haiku-Judge, GECACHT pro (doc_hash, control_id#idx,
-  PROMPT_VERSION, criterion) → gleicher Scan = gleiches Ergebnis. Löst die
-  empirisch gemessene Judge-Varianz (ein Live-Call ist NICHT reproduzierbar).
-
-Status NUR aus LEGAL_MINIMUM:
-  ERFÜLLT (alle LM erfüllt ODER kein LM) · FEHLT (kein LM erfüllt) ·
-  TEILWEISE (Teil der LM erfüllt) · UNBESTIMMT (LM nicht bewertbar, z. B.
-  Embedding-Service down → Aufrufer behält sein Legacy-Ergebnis).
-BEST_PRACTICE/OPTIONAL fließen NIE in den Status, nur in `recommendations`.
-Siehe docs-src/development/criterion_meta_model.md.
-"""
-from __future__ import annotations
-
-import asyncio
-import hashlib
-import logging
-import os
-import sqlite3
-from typing import Any, Optional
-
-logger = logging.getLogger(__name__)
-
-PROMPT_VERSION = "dse-tier-v1"
-_CACHE_DB = os.getenv("TIERED_JUDGE_CACHE", "/data/tiered_judge_cache.db")
-_EMBED_THR = float(os.getenv("DSE_CRITERION_EMBED_THRESHOLD", "0.62"))
-LM = "LEGAL_MINIMUM"
-
-
-def _doc_hash(text: str) -> str:
-    return hashlib.sha256(text.encode("utf-8", "ignore")).hexdigest()[:20]
-
-
-def _ckey(dh: str, cid: str, idx: int, crit: str) -> str:
-    ch = hashlib.sha256(crit.encode("utf-8", "ignore")).hexdigest()[:12]
-    return f"{dh}|{cid}#{idx}|{PROMPT_VERSION}|{ch}"
-
-
-def _cache_get(key: str) -> Optional[bool]:
-    try:
-        with sqlite3.connect(_CACHE_DB) as c:
-            c.execute("create table if not exists judge(k text primary key, met int)")
-            row = c.execute("select met from judge where k=?", (key,)).fetchone()
-            return None if row is None else bool(row[0])
-    except Exception:
-        return None
-
-
-def _cache_put(key: str, met: bool) -> None:
-    try:
-        with sqlite3.connect(_CACHE_DB) as c:
-            c.execute("create table if not exists judge(k text primary key, met int)")
-            c.execute("insert or replace into judge values(?,?)", (key, int(met)))
-    except Exception as e:
-        logger.warning("tiered judge cache put: %s", e)
-
-
-async def prepare_doc(text: str) -> dict[str, Any]:
-    """Doc EINMAL pro Scan einbetten. Liefert {hash, chunk_vecs}. Bei Embedding-
-    Ausfall: chunk_vecs=None → EMBEDDING-Kriterien werden UNBESTIMMT (Fallback)."""
-    ctx: dict[str, Any] = {"hash": _doc_hash(text or ""), "chunk_vecs": None}
-    if not text or len(text) < 100:
-        return ctx
-    try:
-        from compliance.services.mc_embedding_matcher import DIM, _chunk_text, _embed_texts
-        vecs = await asyncio.wait_for(_embed_texts(_chunk_text(text)), timeout=90.0)
-        ctx["chunk_vecs"] = [v for v in vecs if v and len(v) == DIM]
-    except (Exception, asyncio.TimeoutError) as e:
-        logger.warning("tiered prepare_doc embedding inaktiv: %s", e)
-    return ctx
-
-
-async def _embed_present(crits: list[str], ctx: dict, thr: float) -> dict[str, Optional[bool]]:
-    cvecs = ctx.get("chunk_vecs")
-    if not cvecs:
-        return {c: None for c in crits}
-    try:
-        from compliance.services.mc_embedding_matcher import DIM, _cosine, _embed_texts
-        pv = await _embed_texts(crits)
-        out: dict[str, Optional[bool]] = {}
-        for crit, v in zip(crits, pv):
-            if not v or len(v) != DIM:
-                out[crit] = None
-            else:
-                out[crit] = max((_cosine(v, cv) for cv in cvecs), default=0.0) >= thr
-        return out
-    except Exception as e:
-        logger.warning("tiered embed present: %s", e)
-        return {c: None for c in crits}
-
-
-async def _llm_met(cid: str, idx: int, crit: str, doc, dh: str) -> Optional[bool]:
-    key = _ckey(dh, cid, idx, crit)
-    cached = _cache_get(key)
-    if cached is not None:
-        return cached
-    from compliance.services.checkers.router import build_spec, route_and_check
-    spec = build_spec(cid, {"verification_method": "CONTENT", "decision_method": "LLM"},
-                      label=crit, criteria=[crit])
-    res = await route_and_check(spec, doc)
-    if res.present is None:
-        return None
-    _cache_put(key, bool(res.present))
-    return bool(res.present)
-
-
-def _status(lm_vals: list[Optional[bool]]) -> str:
-    if not lm_vals:
-        return "ERFÜLLT"  # kein gesetzliches Minimum → nie rot
-    if any(m is None for m in lm_vals):
-        return "UNBESTIMMT"  # Aufrufer behält Legacy
-    n = sum(1 for m in lm_vals if m)
-    if n == len(lm_vals):
-        return "ERFÜLLT"
-    return "FEHLT" if n == 0 else "TEILWEISE"
-
-
-async def evaluate_tiered(control_id: str, tiered_criteria: list[dict],
-                          ctx: dict, doc) -> dict[str, Any]:
-    dh = ctx.get("hash") or _doc_hash(getattr(doc, "text", "") or "")
-    emb_texts = [c["criterion"] for c in (tiered_criteria or [])
-                 if c.get("criterion")
-                 and (c.get("decision_method") or "EMBEDDING").upper() != "LLM"]
-    emb_res = await _embed_present(emb_texts, ctx, _EMBED_THR) if emb_texts else {}
-
-    lm_vals: list[Optional[bool]] = []
-    recs: list[dict] = []
-    detail: list[dict] = []
-    for idx, c in enumerate(tiered_criteria or []):
-        crit = c.get("criterion") or ""
-        if not crit:
-            continue
-        tier = (c.get("compliance_tier") or "").upper()
-        if (c.get("decision_method") or "EMBEDDING").upper() == "LLM":
-            met = await _llm_met(control_id, idx, crit, doc, dh)
-            src = "haiku-cache"
-        else:
-            met = emb_res.get(crit)
-            src = "embedding"
-        detail.append({"criterion": crit, "tier": tier, "met": met, "source": src})
-        if tier == LM:
-            lm_vals.append(met)
-        elif met is False:
-            recs.append({"criterion": crit, "tier": tier or "OPTIONAL",
-                         "legal_basis": c.get("legal_basis")})
-
-    return {"status": _status(lm_vals), "lm_met": sum(1 for m in lm_vals if m),
-            "lm_total": len(lm_vals), "recommendations": recs, "detail": detail}
-
-
-async def fetch_tiered_criteria(cids: list[str], db_url: str = "") -> dict[str, list]:
-    """tiered_criteria der angegebenen Controls aus canonical_controls laden.
-    Leeres Dict bei Fehler/keiner DB (Fallback: kein Tiering, Legacy trägt)."""
-    cids = [c for c in cids if c]
-    if not cids:
-        return {}
-    import json
-    dsn = db_url or os.getenv("DATABASE_URL") or os.getenv("COMPLIANCE_DATABASE_URL")
-    if not dsn:
-        return {}
-    try:
-        import asyncpg
-        conn = await asyncpg.connect(dsn)
-        rows = await conn.fetch(
-            "select control_id, generation_metadata->'tiered_criteria' tc "
-            "from compliance.canonical_controls "
-            "where control_id = any($1::text[]) "
-            "and generation_metadata ? 'tiered_criteria'", cids)
-        await conn.close()
-    except Exception as e:
-        logger.warning("fetch_tiered_criteria failed: %s", e)
-        return {}
-    out: dict[str, list] = {}
-    for r in rows:
-        tc = r["tc"]
-        tc = json.loads(tc) if isinstance(tc, str) else tc
-        if tc:
-            out[r["control_id"]] = tc
-    return out
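The module docstring fixes the status rule: only LEGAL_MINIMUM criteria decide between ERFÜLLT (met), TEILWEISE (partially met), FEHLT (missing) and UNBESTIMMT (undetermined), while unmet BEST_PRACTICE/OPTIONAL criteria only become recommendations. The sketch below restates the `_status` decision table and the recommendation split, and adds an EMBEDDING presence check (maximum cosine similarity over the document chunks against the 0.62 default threshold). The plain-Python `cosine`, `present` and `evaluate` helpers are illustrative assumptions; the removed code uses `_cosine`/`_embed_texts` from `mc_embedding_matcher`, whose internals are not part of this diff.

```python
import math
from typing import Optional

LEGAL_MINIMUM = "LEGAL_MINIMUM"
EMBED_THRESHOLD = 0.62  # default of DSE_CRITERION_EMBED_THRESHOLD in the removed module


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def present(crit_vec: Optional[list[float]], chunk_vecs: Optional[list[list[float]]],
            thr: float = EMBED_THRESHOLD) -> Optional[bool]:
    """Criterion is present if its best-matching chunk reaches the threshold.
    Missing vectors yield None (undetermined), never False."""
    if not crit_vec or not chunk_vecs:
        return None
    return max(cosine(crit_vec, cv) for cv in chunk_vecs) >= thr


def tier_status(lm_vals: list[Optional[bool]]) -> str:
    """Same decision table as `_status`: only LEGAL_MINIMUM values count."""
    if not lm_vals:
        return "ERFÜLLT"  # no legal minimum -> never red
    if any(v is None for v in lm_vals):
        return "UNBESTIMMT"  # caller keeps its legacy result
    n = sum(1 for v in lm_vals if v)
    if n == len(lm_vals):
        return "ERFÜLLT"
    return "FEHLT" if n == 0 else "TEILWEISE"


def evaluate(criteria: list[dict]) -> dict:
    """Criteria carry a precomputed `met`; non-LM misses become recommendations."""
    lm_vals: list[Optional[bool]] = []
    recs: list[str] = []
    for c in criteria:
        tier = (c.get("compliance_tier") or "").upper()
        if tier == LEGAL_MINIMUM:
            lm_vals.append(c.get("met"))
        elif c.get("met") is False:
            recs.append(c["criterion"])
    return {"status": tier_status(lm_vals), "recommendations": recs}
```

Returning `None` instead of `False` when no vectors are available is what lets the caller keep its legacy result rather than reporting a spurious FEHLT.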
@@ -129,58 +129,16 @@ async def run_v3_pipeline(
             r["source"] = (r.get("source") or "") + "+embedding"
             embedding_passes += 1
 
-    # Layer 3: getierte 3-Status-Auswertung (nur Controls mit tiered_criteria).
-    # Reproduzierbar: EMBEDDING-Präsenz (deterministisch) + GECACHTER Haiku-Judge
-    # nur für Sufficiency. UNBESTIMMT → Legacy-Pass bleibt. Gated + fail-safe.
-    tiered_evaluated = 0
-    try:
-        from compliance.services.checkers.base import DocContext
-        from ._tiered_eval import (
-            evaluate_tiered, fetch_tiered_criteria, prepare_doc,
-        )
-        result_cids = [r.get("control_id") for r in results if r.get("control_id")]
-        tiered_map = await fetch_tiered_criteria(result_cids, db_url)
-        if tiered_map:
-            ctx = await prepare_doc(text)
-            doc_ctx = DocContext(text=text)
-            for r in results:
-                tc = tiered_map.get(r.get("control_id"))
-                if not tc:
-                    continue
-                ev = await evaluate_tiered(r["control_id"], tc, ctx, doc_ctx)
-                if ev["status"] == "UNBESTIMMT":
-                    continue
-                r["compliance_status"] = ev["status"]
-                r["recommendations"] = ev["recommendations"]
-                r["tier_lm"] = f"{ev['lm_met']}/{ev['lm_total']}"
-                r["passed"] = ev["status"] == "ERFÜLLT"
-                tiered_evaluated += 1
-    except Exception as e:
-        logger.warning("dse tiered eval skipped: %s", e)
-
-    # Layer 4 (SHADOW): Obligation-Aggregation NUR in die Telemetrie. Greift NICHT
-    # in `results` ein — nutzer-sichtbare Findings bleiben unverändert. Liefert die
-    # Vergleichszahlen für den späteren Umschalt-Entscheid (collapse_factor etc.).
-    obligation_shadow: dict[str, Any] = {}
-    try:
-        from ._obligation_shadow import build_obligation_shadow
-        obligation_shadow = await build_obligation_shadow(results, text, db_url)
-    except Exception as e:
-        logger.warning("dse obligation shadow skipped: %s", e)
-        obligation_shadow = {"error": str(e)}
-
     telemetry = {
         "layer_0_field_hits": len(boost_field_ids),
         "layer_0_field_ids": boost_field_ids,
         "layer_1_pass": layer_1_pass,
         "embedding_passes": embedding_passes,
-        "tiered_evaluated": tiered_evaluated,
         "total_mcs": len(results),
         "sector_dropped": drop_stats.get("sector_dropped", 0),
         "offtopic_dropped": drop_stats.get("offtopic_dropped", 0),
         "gate_excluded": len(organizational),
         "organizational_checklist": organizational,
-        "obligation_shadow": obligation_shadow,
     }
     logger.info("dse v3 telemetry: %s", telemetry)
     return results, telemetry
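Layers 3 and 4 in `run_v3_pipeline` follow the same fail-safe shape: everything is wrapped in `try/except`, and the shadow layer only writes into telemetry. A minimal sketch of that pattern for a telemetry-only layer, assuming an async shadow function; `run_shadow` and `demo` are hypothetical helpers, and the defensive `copy.deepcopy` is an extra safeguard that the removed code did not use (it relied on `build_obligation_shadow` never writing into `results`).

```python
import asyncio
import copy
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

ShadowFn = Callable[[list[dict]], Awaitable[dict[str, Any]]]


async def run_shadow(results: list[dict], shadow_fn: ShadowFn) -> dict[str, Any]:
    """Run a telemetry-only layer. Failures are logged and reported as
    {"error": ...}; user-visible `results` are never touched."""
    snapshot = copy.deepcopy(results)  # extra safeguard: shadow code only sees a copy
    try:
        return await shadow_fn(snapshot)
    except Exception as e:  # fail-safe: telemetry must not break the scan
        logger.warning("shadow layer skipped: %s", e)
        return {"error": str(e)}


async def demo() -> tuple[list[dict], dict[str, Any]]:
    results = [{"control_id": "A", "passed": False}]

    async def count_findings(rs: list[dict]) -> dict[str, Any]:
        return {"legacy_control_findings": sum(1 for r in rs if not r["passed"])}

    async def misbehaving(rs: list[dict]) -> dict[str, Any]:
        rs[0]["passed"] = True  # mutates only the snapshot
        raise RuntimeError("db down")

    telemetry = {
        "ok": await run_shadow(results, count_findings),
        "broken": await run_shadow(results, misbehaving),
    }
    return results, telemetry


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The deep copy costs a little time per scan but turns "never write into `results`" from a convention into a guarantee.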
@@ -1,51 +0,0 @@
-"""Prüfer-Router: build_spec aus sensor_classification + method-agnostischer
-Dispatch. CONTENT/LLM -> Haiku-Sufficiency-Tier (validiert), unbekannte
-decision_methods -> fail-safe present=None."""
-import pytest
-from unittest.mock import AsyncMock, patch
-
-from compliance.services.checkers.base import DocContext
-from compliance.services.checkers.router import build_spec, route_and_check
-
-_ANTHROPIC = "compliance.services.llm_cascade._call_anthropic"
-
-
-def test_build_spec_content_llm_uses_haiku():
-    s = build_spec("X", {"verification_method": "CONTENT", "decision_method": "LLM"},
-                   label="L", criteria=["a", "b"])
-    assert s.verification_method == "CONTENT" and s.decision_method == "LLM"
-    assert s.extra.get("judge") == "haiku"
-    assert s.paraphrases == ["a", "b"]
-
-
-def test_build_spec_embedding_no_haiku():
-    s = build_spec("X", {"verification_method": "CONTENT", "decision_method": "EMBEDDING"})
-    assert s.extra.get("judge") is None
-
-
-@pytest.mark.asyncio
-async def test_route_unknown_decision_is_failsafe():
-    s = build_spec("X", {"verification_method": "BEHAVIOR", "decision_method": "PLAYWRIGHT"})
-    r = await route_and_check(s, DocContext(text="x" * 200))
-    assert r.present is None and "no_checker" in r.source
-
-
-@pytest.mark.asyncio
-async def test_route_content_llm_haiku_fehlt():
-    s = build_spec("X", {"verification_method": "CONTENT", "decision_method": "LLM"},
-                   label="Speicherdauer", criteria=["Höchstdauer pro Kategorie"])
-    fake = AsyncMock(return_value='{"erfuellt": false, "confidence": 0.9, "begruendung": "fehlt"}')
-    with patch(_ANTHROPIC, new=fake):
-        r = await route_and_check(s, DocContext(text="Wir nutzen Cookies. " * 30))
-    assert r.present is False and r.source == "haiku"
-    assert fake.call_count >= 1
-
-
-@pytest.mark.asyncio
-async def test_route_content_llm_haiku_erfuellt():
-    s = build_spec("X", {"verification_method": "CONTENT", "decision_method": "LLM"},
-                   label="L", criteria=["x"])
-    fake = AsyncMock(return_value='{"erfuellt": true, "confidence": 0.8}')
-    with patch(_ANTHROPIC, new=fake):
-        r = await route_and_check(s, DocContext(text="text " * 40))
-    assert r.present is True
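The router tests pin two properties: `build_spec` attaches the Haiku judge only for CONTENT/LLM, and an unknown decision method yields `present=None` with a `no_checker` source instead of a guessed verdict. A minimal registry-based sketch of that dispatch follows; `Spec`, `Result`, `make_spec`, `CHECKERS` and the keyword checker are hypothetical and only approximate the real `build_spec`/`route_and_check` signatures. Only a keyword checker is registered here; an LLM judge would be added to the registry the same way.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class Spec:
    control_id: str
    verification_method: str
    decision_method: str
    paraphrases: list[str] = field(default_factory=list)
    extra: dict = field(default_factory=dict)


@dataclass
class Result:
    present: Optional[bool]  # None = not determinable, never a guessed verdict
    source: str


def make_spec(cid: str, classification: dict, criteria: Optional[list[str]] = None) -> Spec:
    spec = Spec(cid,
                (classification.get("verification_method") or "").upper(),
                (classification.get("decision_method") or "").upper(),
                paraphrases=list(criteria or []))
    if (spec.verification_method, spec.decision_method) == ("CONTENT", "LLM"):
        spec.extra["judge"] = "haiku"  # only CONTENT/LLM gets the LLM judge
    return spec


Checker = Callable[[Spec, str], Awaitable[Result]]


async def keyword_checker(spec: Spec, text: str) -> Result:
    lowered = text.lower()
    return Result(any(p.lower() in lowered for p in spec.paraphrases), "keyword")


# Hypothetical registry keyed by (verification_method, decision_method).
CHECKERS: dict[tuple[str, str], Checker] = {("CONTENT", "KEYWORD"): keyword_checker}


async def route(spec: Spec, text: str) -> Result:
    checker = CHECKERS.get((spec.verification_method, spec.decision_method))
    if checker is None:
        # Fail-safe: unknown method -> undetermined, the caller keeps its legacy result.
        return Result(None, f"no_checker:{spec.verification_method}/{spec.decision_method}")
    return await checker(spec, text)
```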
@@ -1,42 +0,0 @@
-"""Tests for the cookie-policy applicability gate: controls without a
-COOKIE_POLICY artifact are routed out of the findings scan (not deleted),
-and the gate is fail-safe (no DSN -> no filter)."""
-import pytest
-
-from compliance.services.specialist_agents.cookie_policy._classification_gate import (
-    apply_gate, load_cookie_gate,
-)
-
-
-def test_apply_gate_splits_kept_and_routed():
-    controls = [
-        {"control_id": "COOK-1", "title": "Kategorien"},
-        {"control_id": "TOM-1", "title": "Verschlüsselung"},
-        {"control_id": "BAN-1", "title": "Consent vor Setzen"},
-    ]
-    gate = {
-        "TOM-1": {"obligation_type": "TECHNICAL", "check_intent": "DIRECT_TECHNICAL",
-                  "applicable_artifacts": ["TOM", "AUDIT"]},
-        "BAN-1": {"obligation_type": "TECHNICAL", "check_intent": "DIRECT_TECHNICAL",
-                  "applicable_artifacts": ["COOKIE_BANNER", "SYSTEMSCAN"]},
-    }
-    kept, routed = apply_gate(controls, gate)
-    assert [c["control_id"] for c in kept] == ["COOK-1"]
-    assert {c["control_id"] for c in routed} == {"TOM-1", "BAN-1"}
-    # routed entries carry title + classification metadata for downstream routing
-    tom = next(c for c in routed if c["control_id"] == "TOM-1")
-    assert tom["title"] == "Verschlüsselung"
-    assert tom["applicable_artifacts"] == ["TOM", "AUDIT"]
-
-
-def test_apply_gate_empty_gate_keeps_all():
-    controls = [{"control_id": "A"}, {"control_id": "B"}]
-    kept, routed = apply_gate(controls, {})
-    assert len(kept) == 2 and routed == []
-
-
-@pytest.mark.asyncio
-async def test_load_cookie_gate_no_dsn_is_failsafe(monkeypatch):
-    monkeypatch.delenv("DATABASE_URL", raising=False)
-    monkeypatch.delenv("COMPLIANCE_DATABASE_URL", raising=False)
-    assert await load_cookie_gate("") == {}
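Per its docstring, the cookie-policy gate routes controls whose classification lacks the COOKIE_POLICY artifact out of the findings scan, keeps unclassified controls, and passes classification metadata along with the routed entries. A minimal sketch that satisfies the two synchronous gate tests, assuming the gate is a plain `{control_id: classification}` dict; this is a reconstruction from the tests, not the removed `_classification_gate` implementation, and the `artifact` parameter is a hypothetical generalisation.

```python
def apply_gate(controls: list[dict], gate: dict[str, dict],
               artifact: str = "COOKIE_POLICY") -> tuple[list[dict], list[dict]]:
    """Split controls into (kept, routed). Hypothetical reconstruction from the tests."""
    kept: list[dict] = []
    routed: list[dict] = []
    for control in controls:
        meta = gate.get(control.get("control_id"))
        if meta is None or artifact in (meta.get("applicable_artifacts") or []):
            kept.append(control)  # unclassified or applicable: stays in the findings scan
        else:
            # Routed out, not deleted: keep the title and attach classification metadata.
            routed.append({**control, **meta})
    return kept, routed
```

Treating unclassified controls as "keep" mirrors the fail-safe intent: a missing or empty gate never removes anything from the scan.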
@@ -1,68 +0,0 @@
-"""Layer-3 cookie sufficiency-judge: only embedding/boost-RESCUED passes are
-re-judged by Haiku; keyword passes are untouched; a FEHLT verdict un-passes."""
-import pytest
-from unittest.mock import AsyncMock, patch
-
-from compliance.services.specialist_agents.cookie_policy._sufficiency_judge import (
-    judge_rescued,
-)
-
-_ANTHROPIC = "compliance.services.llm_cascade._call_anthropic"
-_DOC = "Volltext der Cookie-Richtlinie mit ausreichend Inhalt. " * 4
-
-
-def _r(cid, source, passed=True):
-    return {"control_id": cid, "source": source, "passed": passed,
-            "label": cid, "_pass_criteria": ["konkrete Angabe nötig"]}
-
-
-@pytest.mark.asyncio
-async def test_rescued_unpassed_when_judge_fehlt():
-    results = [_r("A", "keyword+embedding")]
-    fake = AsyncMock(return_value='{"erfuellt": false, "confidence": 0.9, "begruendung": "fehlt"}')
-    with patch(_ANTHROPIC, new=fake):
-        n = await judge_rescued(_DOC, results)
-    assert n == 1
-    assert results[0]["passed"] is False
-    assert "+llm_failed" in results[0]["source"]
-
-
-@pytest.mark.asyncio
-async def test_rescued_kept_when_judge_erfuellt():
-    results = [_r("A", "keyword+embedding")]
-    fake = AsyncMock(return_value='{"erfuellt": true, "confidence": 0.9}')
-    with patch(_ANTHROPIC, new=fake):
-        n = await judge_rescued(_DOC, results)
-    assert n == 0
-    assert results[0]["passed"] is True
-
-
-@pytest.mark.asyncio
-async def test_keyword_pass_not_judged():
-    """Deterministisch (keyword) bestandene Controls werden NICHT befragt."""
-    results = [_r("A", "keyword")]
-    fake = AsyncMock(return_value='{"erfuellt": false}')
-    with patch(_ANTHROPIC, new=fake):
-        n = await judge_rescued(_DOC, results)
-    assert n == 0
-    assert results[0]["passed"] is True
-    assert fake.call_count == 0
-
-
-@pytest.mark.asyncio
-async def test_boost_rescue_is_judged():
-    results = [_r("A", "keyword+regex_boost")]
-    fake = AsyncMock(return_value='{"erfuellt": false}')
-    with patch(_ANTHROPIC, new=fake):
-        n = await judge_rescued(_DOC, results)
-    assert n == 1 and results[0]["passed"] is False
-
-
-@pytest.mark.asyncio
-async def test_failed_controls_ignored():
-    """Nicht-bestandene (failed) Controls sind nicht Sache dieser Schicht."""
-    results = [_r("A", "keyword+embedding", passed=False)]
-    fake = AsyncMock(return_value='{"erfuellt": false}')
-    with patch(_ANTHROPIC, new=fake):
-        n = await judge_rescued(_DOC, results)
-    assert n == 0 and fake.call_count == 0
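Layer 3 of the cookie agent re-judges only passes that were rescued by embedding or regex boost, leaves keyword passes and failed controls alone, and flips a pass to failed when the judge answers `erfuellt: false` ("erfüllt" is German for "fulfilled"). A minimal sketch of that selection logic with an injected async judge; `judge_rescued_sketch`, `RESCUE_MARKERS` and `Judge` are hypothetical names, and keeping the pass when the verdict cannot be parsed is an assumed fail-safe choice that the tests do not pin down.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Source suffixes that mark a pass rescued by a non-keyword layer (assumed from the tests).
RESCUE_MARKERS = ("+embedding", "+regex_boost")

Judge = Callable[[str, dict], Awaitable[str]]


async def judge_rescued_sketch(doc: str, results: list[dict], judge: Judge) -> int:
    """Re-judge rescued passes; return how many were flipped to failed."""
    unpassed = 0
    for r in results:
        source = r.get("source") or ""
        if not r.get("passed") or not any(m in source for m in RESCUE_MARKERS):
            continue  # failed controls and pure keyword passes are not judged
        try:
            verdict = json.loads(await judge(doc, r))
        except (ValueError, TypeError):
            continue  # assumed fail-safe: an unparsable verdict keeps the pass
        if isinstance(verdict, dict) and verdict.get("erfuellt") is False:
            r["passed"] = False
            r["source"] = source + "+llm_failed"
            unpassed += 1
    return unpassed
```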
@@ -1,77 +0,0 @@
-"""Regression tests for the OVH (gpt-oss-120b) tier of the LLM cascade.
-
-gpt-oss-120b is a reasoning model: it spends output tokens on chain-of-thought
-before the answer. Two bugs this pins:
-1. A small max_tokens (deep_check passed 400) length-caps it mid-reasoning →
-   content=null → the tier silently returns nothing. _call_ovh must floor the
-   budget so reasoning + the JSON answer fit.
-2. When length-capped, the JSON can land in reasoning_content, not content →
-   _call_ovh must fall back to reasoning_content.
-"""
-import pytest
-from unittest.mock import AsyncMock, MagicMock, patch
-
-from compliance.services import llm_cascade
-
-
-def _resp(data):
-    r = MagicMock()
-    r.raise_for_status = MagicMock()
-    r.json = MagicMock(return_value=data)
-    return r
-
-
-def _client(resp):
-    inst = AsyncMock()
-    inst.post.return_value = resp
-    inst.__aenter__ = AsyncMock(return_value=inst)
-    inst.__aexit__ = AsyncMock(return_value=False)
-    return inst
-
-
-class TestCallOvhReasoning:
-    @pytest.mark.asyncio
-    async def test_reasoning_content_used_when_content_null(self, monkeypatch):
-        monkeypatch.setenv("OVH_LLM_URL", "https://llm.example.com")
-        monkeypatch.setenv("OVH_LLM_MODEL", "gpt-oss-120b")
-        monkeypatch.setenv("OVH_LLM_KEY", "k")
-        resp = _resp({"choices": [{"message": {
-            "content": None,
-            "reasoning_content": '{"erfuellt": true, "confidence": 0.9}'}}]})
-        with patch("httpx.AsyncClient", return_value=_client(resp)):
-            out = await llm_cascade._call_ovh("sys", "user", max_tokens=400)
-        assert '"erfuellt": true' in out
-
-    @pytest.mark.asyncio
-    async def test_small_budget_is_floored(self, monkeypatch):
-        monkeypatch.setenv("OVH_LLM_URL", "https://llm.example.com")
-        monkeypatch.setenv("OVH_LLM_MODEL", "gpt-oss-120b")
-        inst = _client(_resp({"choices": [{"message": {"content": "{}"}}]}))
-        with patch("httpx.AsyncClient", return_value=inst):
-            await llm_cascade._call_ovh("sys", "user", max_tokens=400)
-        assert inst.post.call_args.kwargs["json"]["max_tokens"] >= 2000
-
-    @pytest.mark.asyncio
-    async def test_large_budget_is_preserved(self, monkeypatch):
-        monkeypatch.setenv("OVH_LLM_URL", "https://llm.example.com")
-        monkeypatch.setenv("OVH_LLM_MODEL", "gpt-oss-120b")
-        inst = _client(_resp({"choices": [{"message": {"content": "{}"}}]}))
-        with patch("httpx.AsyncClient", return_value=inst):
-            await llm_cascade._call_ovh("sys", "user", max_tokens=6000)
-        assert inst.post.call_args.kwargs["json"]["max_tokens"] == 6000
-
-    @pytest.mark.asyncio
-    async def test_content_preferred_when_present(self, monkeypatch):
-        monkeypatch.setenv("OVH_LLM_URL", "https://llm.example.com")
-        monkeypatch.setenv("OVH_LLM_MODEL", "gpt-oss-120b")
-        resp = _resp({"choices": [{"message": {
-            "content": '{"erfuellt": false}', "reasoning_content": "noise"}}]})
-        with patch("httpx.AsyncClient", return_value=_client(resp)):
-            out = await llm_cascade._call_ovh("sys", "user")
-        assert out == '{"erfuellt": false}'
-
-    @pytest.mark.asyncio
-    async def test_unconfigured_returns_empty(self, monkeypatch):
-        monkeypatch.delenv("OVH_LLM_URL", raising=False)
-        monkeypatch.delenv("OVH_LLM_MODEL", raising=False)
-        assert await llm_cascade._call_ovh("sys", "user") == ""
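The OVH regression tests describe two fixes for a reasoning model: floor the output budget so chain-of-thought plus the JSON answer fit, and fall back to `reasoning_content` when `content` is null. A minimal sketch of both, assuming an OpenAI-style chat-completions payload (which is what the mocked responses look like); the 2000-token floor is an assumption taken from the test's `>= 2000` assertion, and `effective_max_tokens`, `build_payload` and `extract_answer` are hypothetical helpers rather than `_call_ovh` itself.

```python
from typing import Optional

MIN_REASONING_BUDGET = 2000  # assumed floor; the tests only require >= 2000


def effective_max_tokens(requested: Optional[int], floor: int = MIN_REASONING_BUDGET) -> int:
    """Never send less than `floor`, but keep larger budgets unchanged."""
    return max(requested or 0, floor)


def build_payload(model: str, system: str, user: str,
                  max_tokens: Optional[int] = None) -> dict:
    # OpenAI-style chat-completions body (assumed shape for the OVH endpoint).
    return {
        "model": model,
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
        "max_tokens": effective_max_tokens(max_tokens),
    }


def extract_answer(payload: dict) -> str:
    """Prefer `content`; fall back to `reasoning_content` when content is null."""
    try:
        message = payload["choices"][0]["message"]
    except (KeyError, IndexError, TypeError):
        return ""
    return message.get("content") or message.get("reasoning_content") or ""
```

Flooring with `max` rather than overriding keeps caller-chosen large budgets intact, which is exactly what `test_large_budget_is_preserved` checks.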
@@ -1,153 +0,0 @@
-"""Unit-Tests Obligation Aggregation Engine (Legal Obligation Layer v1).
-
-Deckt die fail-safe Regeln + den Redundanz-Kollaps ab (echte DSE-Szenarien:
-recipients 9×, objection LM+BP, portability OPTIONAL-Format)."""
-from compliance.services.obligation_aggregation import (
-    BP, LM, OPT, CriterionEval, aggregate_obligation, aggregate_obligations,
-    evals_from_tiered, summarize,
-)
-
-
-def _ce(oid, tier, met, cid, basis="", crit="", cond=None):
-    return CriterionEval(oid, tier, met, cid, basis, crit, cond)
-
-
-class TestRedundancyCollapse:
-    def test_nine_controls_one_confirms_collapses_to_one_met(self):
-        # recipients_disclosed: 9 Controls, gleiche Anforderung (Art 13(1)(e))
-        evals = [_ce("recipients_disclosed", LM, i == 4, f"DATA-{i}", "Art. 13(1)(e)")
-                 for i in range(9)]
-        res = aggregate_obligation("recipients_disclosed", evals)
-        assert res.status == "MET"
-        assert res.lm_met == 1 and res.lm_total == 1  # 9 → 1 Anforderung
-        assert len(res.evidence) == 9
-
-    def test_all_nine_absent_fails_once(self):
-        evals = [_ce("recipients_disclosed", LM, False, f"DATA-{i}", "Art. 13(1)(e)")
-                 for i in range(9)]
-        res = aggregate_obligation("recipients_disclosed", evals)
-        assert res.status == "FAILED"
-        assert res.bucket == "PFLICHT"
-
-
-class TestPartialMultiFacet:
-    def test_two_distinct_lm_requirements_one_met_is_partial(self):
-        evals = [
-            _ce("transfer", LM, True, "C1", "Art. 13(1)(f)"),  # erfüllt
-            _ce("transfer", LM, False, "C2", "Art. 46"),  # fehlt → distinkt
-        ]
-        res = aggregate_obligation("transfer", evals)
-        assert res.status == "PARTIAL"
-        assert res.lm_met == 1 and res.lm_total == 2
-
-    def test_both_distinct_requirements_met(self):
-        evals = [
-            _ce("transfer", LM, True, "C1", "Art. 13(1)(f)"),
-            _ce("transfer", LM, True, "C2", "Art. 46"),
-        ]
-        assert aggregate_obligation("transfer", evals).status == "MET"
-
-
-class TestApplicability:
-    def test_conditional_false_is_na(self):
-        evals = [_ce("transfer", LM, False, "C1", "Art. 44", cond="has_third_country_transfer")]
-        res = aggregate_obligation("transfer", evals, applicable_fn=lambda c, t: False)
-        assert res.status == "NA"
-        assert res.bucket == "NICHT_ANWENDBAR"
-        assert res.applicable is False
-
-    def test_conditional_true_evaluates_normally(self):
-        evals = [_ce("transfer", LM, False, "C1", "Art. 44", cond="has_third_country_transfer")]
-        res = aggregate_obligation("transfer", evals, applicable_fn=lambda c, t: True)
-        assert res.status == "FAILED"
-
-    def test_conditional_unknown_defaults_applicable(self):
-        evals = [_ce("transfer", LM, True, "C1", "Art. 44", cond="x")]
-        res = aggregate_obligation("transfer", evals, applicable_fn=lambda c, t: None)
-        assert res.applicable is True and res.status == "MET"
-
-    def test_no_predicate_means_applicable(self):
-        evals = [_ce("transfer", LM, True, "C1", cond="x")]
-        assert aggregate_obligation("transfer", evals).applicable is True
-
-
-class TestUndetermined:
-    def test_all_lm_none_is_undetermined(self):
-        evals = [_ce("ob", LM, None, "C1", "b"), _ce("ob", LM, None, "C2", "b")]
-        res = aggregate_obligation("ob", evals)
-        assert res.status == "UNDETERMINED"
-        assert res.bucket == "PFLICHT"
-
-    def test_one_determinable_requirement_decides(self):
-        # eine Anforderung unbestimmt, die andere klar erfüllt → MET über die bewertbare
-        evals = [_ce("ob", LM, None, "C1", "b1"), _ce("ob", LM, True, "C2", "b2")]
-        res = aggregate_obligation("ob", evals)
-        assert res.status == "MET"
-        assert res.lm_total == 1  # nur die bewertbare Anforderung zählt
-
-
-class TestBestPracticeOnly:
-    def test_pure_bp_covered_is_met_recommendation_bucket(self):
-        evals = [_ce("art20_format", OPT, True, "C1")]
-        res = aggregate_obligation("art20_format", evals)
-        assert res.status == "MET"
-        assert res.bucket == "EMPFEHLUNG"
-
-    def test_pure_bp_not_covered_is_open_never_failed(self):
-        evals = [_ce("art20_format", OPT, False, "C1", crit="JSON/CSV")]
-        res = aggregate_obligation("art20_format", evals)
-        assert res.status == "OPEN"
-        assert res.bucket == "EMPFEHLUNG"
-        assert len(res.recommendations) == 1
-
-
-class TestRecommendationsWithinLm:
-    def test_unmet_bp_in_lm_obligation_becomes_recommendation(self):
-        # objection_direct_marketing: LM erfüllt + 3 BP teils offen
-        evals = [
-            _ce("obj_dm", LM, True, "SEC-8410", "Art. 21(2)", "Recht"),
-            _ce("obj_dm", BP, False, "SEC-8410", "", "Kontaktweg"),
-            _ce("obj_dm", BP, True, "SEC-8410", "", "kostenlos"),
-        ]
-        res = aggregate_obligation("obj_dm", evals)
-        assert res.status == "MET" and res.bucket == "PFLICHT"
-        assert len(res.recommendations) == 1
-        assert res.recommendations[0]["criterion"] == "Kontaktweg"
-
-
-class TestAdapterAndSummary:
-    def test_evals_from_tiered_zips_and_skips_no_obligation(self):
-        tc = [
-            {"criterion": "Recht", "compliance_tier": "LEGAL_MINIMUM",
-             "legal_basis": "Art. 21(1)", "obligation_id": "obj_gen"},
-            {"criterion": "Weg", "compliance_tier": "BEST_PRACTICE",
-             "legal_basis": "", "obligation_id": "obj_gen"},
-            {"criterion": "ohne", "compliance_tier": "OPTIONAL"},  # kein obligation_id → skip
-        ]
-        detail = [{"met": True}, {"met": False}, {"met": True}]
-        evals = evals_from_tiered("AUTH-2051", tc, detail, conditional="x")
-        assert len(evals) == 2
-        assert evals[0].met is True and evals[0].conditional == "x"
-        assert evals[1].tier == BP and evals[1].met is False
-
-    def test_aggregate_obligations_groups_by_id(self):
-        evals = [
-            _ce("a", LM, True, "C1", "b"),
-            _ce("a", LM, True, "C2", "b"),
-            _ce("b", LM, False, "C3", "b"),
-        ]
-        results = {r.obligation_id: r for r in aggregate_obligations(evals)}
-        assert set(results) == {"a", "b"}
-        assert results["a"].status == "MET"
-        assert results["b"].status == "FAILED"
-
-    def test_summarize_counts_buckets_and_failures(self):
-        evals = [
-            _ce("a", LM, False, "C1", "b"),  # FAILED Pflicht
-            _ce("c", OPT, False, "C3", crit="x"),  # OPEN Empfehlung
-        ]
-        s = summarize(aggregate_obligations(evals))
-        assert s["obligations"] == 2
-        assert s["pflicht_failed"] == 1
-        assert s["buckets"]["PFLICHT"] == 1
-        assert s["buckets"]["EMPFEHLUNG"] == 1
@@ -1,57 +0,0 @@
"""Unit tests for the minimal applicability predicates."""
from compliance.services.obligation_applicability import (
    applicable, direct_marketing, has_third_country_transfer,
    uses_legitimate_interest,
)


class TestThirdCountry:
    def test_drittland_present(self):
        assert has_third_country_transfer("übermittlung in ein drittland erfolgt") is True

    def test_scc_present(self):
        assert has_third_country_transfer("auf basis der standardvertragsklauseln") is True

    def test_absent(self):
        assert has_third_country_transfer("verarbeitung nur innerhalb deutschlands") is False


class TestLegitimateInterest:
    def test_present(self):
        assert uses_legitimate_interest("auf grundlage unseres berechtigten interesses") is True

    def test_absent(self):
        assert uses_legitimate_interest("nur auf grundlage ihrer einwilligung") is False


class TestDirectMarketing:
    def test_newsletter(self):
        assert direct_marketing("anmeldung zum newsletter möglich") is True

    def test_direktwerbung(self):
        assert direct_marketing("daten für direktwerbung genutzt") is True

    def test_absent(self):
        # 'werbliche' still matches inside a negated sentence (keyword-level limitation)
        assert direct_marketing("wir versenden keine werblichen inhalte ohne basis") is True

    def test_truly_absent(self):
        assert direct_marketing("reine vertragsabwicklung") is False


class TestApplicableHook:
    def test_known_predicate_true(self):
        assert applicable("has_third_country_transfer", "Transfer in die USA") is True

    def test_known_predicate_false_triggers_na(self):
        assert applicable("has_third_country_transfer", "nur in der EU") is False

    def test_public_task_alias(self):
        assert applicable("legitimate_interest_or_public_task",
                          "zur ausübung öffentlicher gewalt") is True

    def test_unknown_predicate_returns_none(self):
        # profiling is not modelled yet → None → the caller keeps the obligation applicable
        assert applicable("profiling", "irgendein text") is None

    def test_case_insensitive(self):
        assert applicable("uses_legitimate_interest", "BERECHTIGTES INTERESSE") is True
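
For reference, a minimal sketch that satisfies the predicate tests above. The marker tuples, the word-start regex matching and the `PREDICATES` table are assumptions inferred from the test inputs, not the actual `obligation_applicability` module. Negation is deliberately not handled, which is exactly the limitation `TestDirectMarketing.test_absent` documents.

```python
import re
from typing import Callable, Dict, Optional, Tuple


def _mentions(text: str, markers: Tuple[str, ...]) -> bool:
    """Case-insensitive check whether any marker occurs at the start of a word."""
    t = text.lower()
    return any(re.search(r"\b" + re.escape(m), t) for m in markers)


def has_third_country_transfer(text: str) -> bool:
    return _mentions(text, ("drittland", "standardvertragsklauseln", "usa"))


def uses_legitimate_interest(text: str) -> bool:
    return _mentions(text, ("berechtigte interesse", "berechtigten interesse",
                            "berechtigtes interesse"))


def direct_marketing(text: str) -> bool:
    # Prefix match: "werbliche" also hits "werblichen", even in a negated sentence.
    return _mentions(text, ("newsletter", "direktwerbung", "werbliche"))


def _public_task(text: str) -> bool:
    return _mentions(text, ("öffentlicher gewalt",))  # hypothetical marker


# Hypothetical predicate registry; unknown names fall through to None.
PREDICATES: Dict[str, Callable[[str], bool]] = {
    "has_third_country_transfer": has_third_country_transfer,
    "uses_legitimate_interest": uses_legitimate_interest,
    "direct_marketing": direct_marketing,
    "legitimate_interest_or_public_task":
        lambda t: uses_legitimate_interest(t) or _public_task(t),
}


def applicable(predicate: str, text: str) -> Optional[bool]:
    """True/False for modelled predicates; None means 'not modelled, keep applicable'."""
    fn = PREDICATES.get(predicate)
    return None if fn is None else fn(text)
```

Returning `None` rather than `True` for unknown predicates lets the caller distinguish "applicable" from "not yet modelled", as the `profiling` test expects.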
@@ -1,92 +0,0 @@
"""Unit tests for the pure helpers of the Obligation Discovery Pipeline (scripts/obligation_discovery/_core.py)."""
import pathlib
import sys

sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[2] / "scripts" / "obligation_discovery"))

from _core import (  # noqa: E402
    centroid, cosine, greedy_cluster, merge_edges, parse_req, validate_registry,
)


class TestParseReq:
    def test_list_passthrough(self):
        assert parse_req(["a", "b"]) == ["a", "b"]

    def test_python_repr_string(self):
        assert parse_req("['x', 'y']") == ["x", "y"]

    def test_json_string(self):
        assert parse_req('["x", "y"]') == ["x", "y"]

    def test_plain_string(self):
        assert parse_req("just text") == ["just text"]


class TestCosine:
    def test_identical(self):
        assert cosine([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) > 0.999

    def test_orthogonal(self):
        assert abs(cosine([1.0, 0.0], [0.0, 1.0])) < 1e-9

    def test_empty(self):
        assert cosine([], [1.0]) == 0.0


class TestGreedyCluster:
    def test_near_vectors_cluster_far_separate(self):
        vecs = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
        clusters = greedy_cluster(vecs, 0.9)
        assert len(clusters) == 2
        assert clusters[0]["members"] == [0, 1]
        assert clusters[1]["members"] == [2]

    def test_deterministic(self):
        vecs = [[1.0, 0.0], [0.5, 0.5], [0.99, 0.0]]
        assert greedy_cluster(vecs, 0.8) == greedy_cluster(vecs, 0.8)

    def test_none_vector_isolated(self):
        clusters = greedy_cluster([[1.0, 0.0], None], 0.5)
        assert clusters[1]["members"] == [1] and clusters[1]["seed"] is None


class TestCentroid:
    def test_mean(self):
        assert centroid([0, 1], [[0.0, 2.0], [2.0, 4.0]]) == [1.0, 3.0]


class TestValidateRegistry:
    def _reg(self, obls, rels=None):
        return {"obligations": obls, "relationships": rels or []}

    def test_lm_without_legal_basis_fails(self):
        r = self._reg([{"id": "x", "tier": "LEGAL_MINIMUM", "legal_basis": [], "member_controls": ["C1"]}])
        v = validate_registry(r)
        assert v["lm_without_legal_basis"] == ["x"] and v["passed"] is False

    def test_clean_passes(self):
        r = self._reg([{"id": "x", "tier": "LEGAL_MINIMUM", "legal_basis": [{"source": "CRA"}],
                        "member_controls": ["C1"], "provenance": {"source_meta_cluster": "M0"}}])
        assert validate_registry(r)["passed"] is True

    def test_over8_per_review_unit_flagged(self):
        obls = [{"id": f"o{i}", "tier": "BEST_PRACTICE", "member_controls": ["C"],
                 "provenance": {"source_meta_cluster": "M0"}} for i in range(9)]
        v = validate_registry(self._reg(obls))
        assert v["over8_per_review_unit"] == {"M0": 9} and v["passed"] is False

    def test_empty_member_controls_flagged(self):
        v = validate_registry(self._reg([{"id": "x", "tier": "BEST_PRACTICE", "member_controls": []}]))
        assert v["empty_member_controls"] == ["x"] and v["passed"] is False


class TestMergeEdges:
    def test_dedup_and_semantic_only(self):
        existing = [{"type": "supports", "from": "a", "to": "b"}]
        proposed = [{"type": "supports", "from": "a", "to": "b"},  # duplicate
                    {"type": "depends_on", "from": "c", "to": "d"},  # new
                    {"type": "out_of_scope", "clusters": [1]}]  # not semantic
        merged, added = merge_edges(existing, proposed)
        assert added == 1
        assert {"type": "depends_on", "from": "c", "to": "d"} in merged
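
A hypothetical sketch of the `_core` helpers that passes the tests above; it is not the actual `scripts/obligation_discovery/_core.py`. `greedy_cluster` is written as leader-style clustering (an assumption): each vector joins the first cluster whose seed is at least `threshold` similar, otherwise it seeds a new cluster, and `None` vectors always form their own cluster with `seed=None`.

```python
import ast
import json
import math
from typing import Any, Dict, List, Optional, Sequence


def parse_req(value: Any) -> List[str]:
    """Accept a list, a JSON list string, a Python-repr list string or plain text."""
    if isinstance(value, list):
        return value
    for loader in (json.loads, ast.literal_eval):
        try:
            parsed = loader(value)
        except (ValueError, SyntaxError, TypeError):
            continue  # not parseable by this loader; try the next one
        if isinstance(parsed, list):
            return [str(x) for x in parsed]
    return [value]  # plain text becomes a one-element list


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(vecs: Sequence[Optional[List[float]]],
                   threshold: float) -> List[Dict[str, Any]]:
    """Single pass in input order, so the result is deterministic."""
    clusters: List[Dict[str, Any]] = []
    for i, v in enumerate(vecs):
        if v is None:
            clusters.append({"seed": None, "members": [i]})  # no vector: isolated
            continue
        for c in clusters:
            if c["seed"] is not None and cosine(v, c["seed"]) >= threshold:
                c["members"].append(i)
                break
        else:
            clusters.append({"seed": v, "members": [i]})  # v seeds a new cluster
    return clusters


def centroid(members: Sequence[int], vecs: Sequence[List[float]]) -> List[float]:
    rows = [vecs[i] for i in members]
    return [sum(col) / len(rows) for col in zip(*rows)]
```

Because the seed is never recomputed, cluster membership depends on input order; that is what makes the single pass cheap and reproducible.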
@@ -1,74 +0,0 @@
"""Unit tests for the DSE shadow wiring (compute_obligation_shadow, pure)."""
from compliance.services.specialist_agents.dse._obligation_shadow import (
    compute_obligation_shadow,
)

NON_LLM = "art20_right_exists_core"  # not in the LLM_REQUIRED registry
LLM_REQ = "third_country_transfer_disclosed"  # in the LLM_REQUIRED registry


def _markers(n, ob, cond=None):
    return {f"C{i}": {"obl": [ob], "cond": cond} for i in range(n)}


class TestComputeShadow:
    def test_collapse_and_delta(self):
        results = [{"control_id": f"C{i}", "passed": False} for i in range(5)]
        s = compute_obligation_shadow(results, "x", _markers(5, NON_LLM))
        assert s["legacy_control_findings"] == 5
        assert s["obligation_findings"] == 1  # 5 → 1
        assert s["failed_by_current_checker"] == 1
        assert s["recall_limited"] == 0
        assert s["collapse_factor"] == 5.0
        assert s["met_failed_delta"] == 4
        assert s["met_count"] == 0
        top = s["top_collapsed_obligations"][0]
        assert top["obligation"] == NON_LLM and top["fehlt"] == 5
        assert top["recall_limited"] is False

    def test_fp_correction_one_passed_collapses_to_met(self):
        results = [{"control_id": f"C{i}", "passed": i == 0} for i in range(5)]
        s = compute_obligation_shadow(results, "x", _markers(5, NON_LLM))
        assert s["legacy_control_findings"] == 4
        assert s["obligation_findings"] == 0  # MET (satisfied by another control)
        assert s["met_failed_delta"] == 4

    def test_na_when_predicate_false(self):
        results = [{"control_id": "C0", "passed": False}]
        m = {"C0": {"obl": [LLM_REQ], "cond": "has_third_country_transfer"}}
        s = compute_obligation_shadow(results, "nur innerhalb der eu", m)
        assert s["na_count"] == 1
        assert s["obligation_findings"] == 0  # NA instead of FEHLT

    def test_no_markers_returns_status(self):
        s = compute_obligation_shadow([{"control_id": "C0", "passed": False}], "x", {})
        assert "no obligation" in s["status"]

    def test_does_not_mutate_results(self):
        results = [{"control_id": "C0", "passed": False}]
        compute_obligation_shadow(results, "x", _markers(1, NON_LLM))
        assert set(results[0].keys()) == {"control_id", "passed"}


class TestRecallSegregation:
    def test_llm_required_failed_is_recall_limited_not_real_gap(self):
        # 5 failed third_country controls, transfer text present → FAILED,
        # but LLM_REQUIRED → RECALL_LIMITED, NOT failed_by_current_checker.
        results = [{"control_id": f"C{i}", "passed": False} for i in range(5)]
        m = {f"C{i}": {"obl": [LLM_REQ], "cond": "has_third_country_transfer"}
             for i in range(5)}
        s = compute_obligation_shadow(results, "übermittlung in ein drittland", m)
        assert s["obligation_findings"] == 1
        assert s["recall_limited"] == 1
        assert s["failed_by_current_checker"] == 0
        assert s["recall_limited_obligations"] == [LLM_REQ]
        assert s["top_collapsed_obligations"][0]["recall_limited"] is True

    def test_mixed_real_gap_and_recall_limited(self):
        results = [{"control_id": "A", "passed": False}, {"control_id": "B", "passed": False}]
        m = {"A": {"obl": [NON_LLM], "cond": None},
             "B": {"obl": [LLM_REQ], "cond": "has_third_country_transfer"}}
        s = compute_obligation_shadow(results, "übermittlung in ein drittland", m)
        assert s["obligation_findings"] == 2
        assert s["failed_by_current_checker"] == 1
        assert s["recall_limited"] == 1
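
The counting relations these shadow tests assert can be written down directly. The following is a hypothetical sketch, not `compute_obligation_shadow`: it assumes an obligation is MET as soon as any mapped control passed, reports failed obligations listed in `llm_required` as recall-limited rather than as real gaps, and leaves out NA handling via the `cond` predicate as well as the `collapse_factor` value for zero findings (returned as `None` here by assumption).

```python
from typing import Any, Dict, List, Set


def shadow_metrics(results: List[Dict[str, Any]],
                   markers: Dict[str, Dict[str, Any]],
                   llm_required: Set[str]) -> Dict[str, Any]:
    """Collapse per-control results into obligation findings and compare the counts."""
    legacy = sum(1 for r in results if not r["passed"])
    met: Dict[str, bool] = {}
    for r in results:
        for ob in markers.get(r["control_id"], {}).get("obl", []):
            # An obligation is MET as soon as any of its mapped controls passed.
            met[ob] = met.get(ob, False) or r["passed"]
    failed = sorted(ob for ob, ok in met.items() if not ok)
    recall_limited = [ob for ob in failed if ob in llm_required]
    return {
        "legacy_control_findings": legacy,
        "obligation_findings": len(failed),
        "recall_limited": len(recall_limited),
        "failed_by_current_checker": len(failed) - len(recall_limited),
        "recall_limited_obligations": recall_limited,
        "met_failed_delta": legacy - len(failed),
        "collapse_factor": legacy / len(failed) if failed else None,
    }
```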
@@ -1,20 +0,0 @@
"""Unit tests for the obligation taxonomy registry (decision_method_required)."""
from compliance.services.obligation_taxonomy import OBLIGATION_META, requires_llm


class TestRequiresLlm:
    def test_marked_obligations_require_llm(self):
        for ob in ("recipients_disclosed", "third_country_transfer_disclosed",
                   "safeguards_disclosed", "safeguards_accessible"):
            assert requires_llm(ob) is True

    def test_unmarked_obligation_does_not(self):
        assert requires_llm("art20_right_exists_core") is False
        assert requires_llm("objection_general_art21_1") is False

    def test_unknown_obligation_is_false(self):
        assert requires_llm("does_not_exist") is False

    def test_registry_values_are_llm(self):
        assert all(v.get("decision_method_required") == "LLM"
                   for v in OBLIGATION_META.values())
@@ -1,102 +0,0 @@
"""Unit tests for the tiered 3-status evaluation (_tiered_eval).

Covers: status logic (incl. no LM → ERFÜLLT, UNBESTIMMT when not assessable),
recommendation collection, EMBEDDING/LLM routing (mocked) and the reproducibility
cache. Embedding and LLM are mocked; no network access."""
import asyncio

from compliance.services.specialist_agents.dse import _tiered_eval as te


# ---- pure status logic ---------------------------------------------------
def test_status_no_lm_is_erfuellt():
    assert te._status([]) == "ERFÜLLT"


def test_status_all_met_erfuellt():
    assert te._status([True, True]) == "ERFÜLLT"


def test_status_none_met_fehlt():
    assert te._status([False, False]) == "FEHLT"


def test_status_partial_teilweise():
    assert te._status([True, False]) == "TEILWEISE"


def test_status_any_none_unbestimmt():
    assert te._status([True, None]) == "UNBESTIMMT"


# ---- evaluate_tiered (embedding/LLM mocked) ------------------------------
def _crit(text, tier, dm="EMBEDDING"):
    return {"criterion": text, "compliance_tier": tier,
            "decision_method": dm, "legal_basis": "x"}


class _Doc:
    def __init__(self, text):
        self.text = text


def test_evaluate_partial_with_recommendation(monkeypatch):
    crits = [_crit("Zwecke genannt", "LEGAL_MINIMUM"),
             _crit("Speicherdauer genannt", "LEGAL_MINIMUM"),
             _crit("tabellarisch ausgewiesen", "BEST_PRACTICE")]

    async def fake_embed(texts, ctx, thr):
        return {"Zwecke genannt": True, "Speicherdauer genannt": False,
                "tabellarisch ausgewiesen": False}

    monkeypatch.setattr(te, "_embed_present", fake_embed)
    out = asyncio.run(te.evaluate_tiered("C1", crits, {"hash": "h"}, _Doc("x" * 200)))
    assert out["status"] == "TEILWEISE"
    assert out["lm_met"] == 1 and out["lm_total"] == 2
    assert len(out["recommendations"]) == 1
    assert out["recommendations"][0]["tier"] == "BEST_PRACTICE"


def test_evaluate_no_lm_is_erfuellt_with_recs(monkeypatch):
    crits = [_crit("Bildsymbole", "OPTIONAL"), _crit("Legende", "OPTIONAL")]

    async def fake_embed(texts, ctx, thr):
        return {t: False for t in texts}

    monkeypatch.setattr(te, "_embed_present", fake_embed)
    out = asyncio.run(te.evaluate_tiered("C2", crits, {"hash": "h"}, _Doc("x" * 200)))
    assert out["status"] == "ERFÜLLT"
    assert out["lm_total"] == 0
    assert len(out["recommendations"]) == 2


def test_evaluate_llm_criterion_routed(monkeypatch):
    crits = [_crit("Speicherdauer hinreichend nachvollziehbar", "LEGAL_MINIMUM", dm="LLM")]

    async def fake_llm(cid, idx, crit, doc, dh):
        return True

    monkeypatch.setattr(te, "_llm_met", fake_llm)
    out = asyncio.run(te.evaluate_tiered("C3", crits, {"hash": "h"}, _Doc("x" * 200)))
    assert out["status"] == "ERFÜLLT" and out["lm_total"] == 1


def test_evaluate_unbestimmt_when_embed_unavailable(monkeypatch):
    crits = [_crit("Zwecke genannt", "LEGAL_MINIMUM")]

    async def fake_embed(texts, ctx, thr):
        return {t: None for t in texts}  # embedding service down

    monkeypatch.setattr(te, "_embed_present", fake_embed)
    out = asyncio.run(te.evaluate_tiered("C4", crits, {"hash": "h"}, _Doc("x" * 200)))
    assert out["status"] == "UNBESTIMMT"


# ---- reproducibility cache -----------------------------------------------
def test_cache_roundtrip(monkeypatch, tmp_path):
    monkeypatch.setattr(te, "_CACHE_DB", str(tmp_path / "cache.db"))
    assert te._cache_get("k1") is None
    te._cache_put("k1", True)
    te._cache_put("k2", False)
    assert te._cache_get("k1") is True
    assert te._cache_get("k2") is False
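
A minimal sketch of the status rules these tests pin down, assuming an unassessable criterion (`None`) takes precedence over every other outcome and that recommendations are exactly the unmet non-LEGAL_MINIMUM criteria. `status` and `evaluate` are hypothetical names, not the `_tiered_eval` API.

```python
from typing import Dict, List, Optional, Sequence, Tuple

LM = "LEGAL_MINIMUM"


def status(lm_results: Sequence[Optional[bool]]) -> str:
    """Verdict over the LEGAL_MINIMUM criteria only (None = could not be assessed)."""
    if any(r is None for r in lm_results):
        return "UNBESTIMMT"
    if all(lm_results):
        return "ERFÜLLT"  # also the empty case: no LM criterion, nothing is missing
    if any(lm_results):
        return "TEILWEISE"
    return "FEHLT"


def evaluate(criteria: List[Dict[str, str]],
             met: Dict[str, Optional[bool]]) -> Tuple[str, List[Dict[str, str]]]:
    lm = [met[c["criterion"]] for c in criteria if c["compliance_tier"] == LM]
    # Unmet BEST_PRACTICE/OPTIONAL criteria never fail the check; they become recommendations.
    recs = [{"criterion": c["criterion"], "tier": c["compliance_tier"]}
            for c in criteria
            if c["compliance_tier"] != LM and met[c["criterion"]] is False]
    return status(lm), recs
```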
@@ -1,41 +0,0 @@
# 01: Retrieval Pipeline

**Purpose:** Build a candidate pool that contains the *right* sources (the mandatory source **and** the controls), even when pure semantics would miss them. Re-ranking (02) can only order what is already in the pool, so pool construction is the first line of defence against recall gaps.

## Mechanics

`searchInternal()` (`legal_rag_client.go`) builds the pool in a fixed order. Each stage **augments** the pool (it never replaces it), and errors degrade silently:

1. **Embedding:** `bge-m3` (1024-dim) via Ollama; the query is capped at 2000 characters.
2. **Hybrid (RRF):** `searchHybrid()` runs dense + full-text search via the Qdrant Query API with RRF fusion. On error it falls back to `searchDense()` (pure vector search).
3. **Binding augmentation:** `searchBinding()` adds the top `source_class=binding_law` hits, **so the mandatory source is always a candidate**, even when guidance dominates semantically.
4. **Control augmentation:** `searchControls()` runs only on control intent (see [05](05-control-intent.md)): a deep dense pull, filtered to control-pool roles.
5. **Graph augmentation:** `expandViaGraph()` is **opt-in** and pulls in connected norms via citation edges.
6. **Merge:** `mergeDedupHits()` concatenates the stages, keeps the first occurrence per point ID and preserves order.

Afterwards: map to `LegalSearchResult` → authority rerank (02) → control diversity (05) → truncate to `topK`.
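
A minimal Python sketch of the merge step (stage 6), for illustration only: the real `mergeDedupHits()` is Go code in `legal_rag_client.go`, and the `Hit` type with its `point_id`/`score` fields is a hypothetical stand-in for a Qdrant hit.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Hit:
    # Hypothetical stand-in for a Qdrant hit: only the fields the merge needs.
    point_id: str
    score: float


def merge_dedup_hits(*stages: List[Hit]) -> List[Hit]:
    """Concatenate stage results, keep the first hit per point ID, preserve order."""
    seen: Set[str] = set()
    merged: List[Hit] = []
    for stage in stages:
        for hit in stage:
            if hit.point_id in seen:
                continue  # a later duplicate is dropped; the earlier stage wins
            seen.add(hit.point_id)
            merged.append(hit)
    return merged


hybrid = [Hit("a", 0.9), Hit("b", 0.8)]
binding = [Hit("b", 0.7), Hit("c", 0.6)]
print([h.point_id for h in merge_dedup_hits(hybrid, binding)])  # ['a', 'b', 'c']
```

Because the first occurrence wins, a point found by the hybrid stage keeps its hybrid score even if a later augmentation stage returns it again.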

## Constants + rationale

| Constant | Value | Rationale |
|----------|-------|-----------|
| `prefetchLimit` (hybrid) | `20`, or `topK*4` when topK > 20 | Fusion window: enough dense context for RRF without diluting the full-text share |
| `controlPoolDepth` | `60` | **Measured:** for EU cyber control queries, the relevant control sources (NIST, CRA annex) sit at dense rank ~8–9, well below the small top-K. On the larger (95k) synced corpus, a fixed deep pull of 60 is enough to make them candidates |
| `graphSeedCount` | `5` | only the top hits seed the graph (bounds the expansion) |
| `graphMaxExpand` | `15` | upper bound on the norms pulled in via edges |
| `graphHopPenalty` | `0.05` | slight distance penalty per edge (pool expansion, not a ranking lever) |
| `RAG_GRAPH_EXPANSION` | env, default **off** | **Opt-in:** no measured rank benefit over binding augmentation, +1 Qdrant call per search, and a flooding risk via reverse edges. Kept as a recall safety net |

> Forward edges (`references_out`) drive the graph expansion; reverse edges (`references_in`) are kept **only as metadata** (otherwise a popular annex floods the pool).
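
The graph limits in the table can be illustrated with a hypothetical Python sketch of `expandViaGraph()`. It assumes the forward edges are available as a dict from point ID to cited IDs and that an expanded norm inherits its seed's score minus `graphHopPenalty`; neither detail is confirmed by this page.

```python
from typing import Dict, List, Tuple

GRAPH_SEED_COUNT = 5      # graphSeedCount
GRAPH_MAX_EXPAND = 15     # graphMaxExpand
GRAPH_HOP_PENALTY = 0.05  # graphHopPenalty


def expand_via_graph(
    hits: List[Tuple[str, float]],
    references_out: Dict[str, List[str]],
) -> List[Tuple[str, float]]:
    """Return new (point_id, score) candidates reached over forward citation edges.

    Only the top GRAPH_SEED_COUNT hits act as seeds, reverse edges are never
    followed, and at most GRAPH_MAX_EXPAND new norms are returned.
    """
    present = {pid for pid, _ in hits}
    expanded: List[Tuple[str, float]] = []
    for seed_id, seed_score in hits[:GRAPH_SEED_COUNT]:
        for target in references_out.get(seed_id, []):
            if target in present:
                continue  # already a candidate: expansion never duplicates
            present.add(target)
            # One hop away: inherit the seed score minus a small distance penalty.
            expanded.append((target, seed_score - GRAPH_HOP_PENALTY))
            if len(expanded) >= GRAPH_MAX_EXPAND:
                return expanded
    return expanded
```

Following only `references_out` keeps the expansion bounded by what the seeds cite; a heavily cited annex cannot pull in all of its citers.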

## Code

- `legal_rag_client.go` → `searchInternal()`, `mergeDedupHits()`
- `legal_rag_http.go` → `searchHybrid()`, `searchDense()`, `searchBinding()`, `searchControls()`
- `legal_rag_graph.go` → `expandViaGraph()`

## Addressed failure classes

- **"Mandatory source not in the pool"** → binding augmentation (stage 3) guarantees the `binding_law` source as a candidate.
- **"Control source below top-K"** → control augmentation + `controlPoolDepth` (stage 4) fetches deep NIST/CRA annex hits.
- **"Recall gap for synonyms"** → hybrid (RRF) covers lexical hits that pure semantics misses.
@@ -1,51 +0,0 @@
# 02: Authority Re-Ranking

**Purpose:** Move binding law of the matching jurisdiction/domain up and guidance, foreign law and off-domain sources down. **This only reorders; nothing is deleted.** After the rerank, `Score` carries the authority score, so that downstream multi-collection merges (Advisor) preserve the order.

## Mechanics

`authorityScore()` (`authority_rerank.go`) computes a normative relevance score per hit from the raw semantic score, the weighted authority and context bonuses/penalties (each one applies only when the condition in parentheses holds):

```
score = rawSemantic
      + authorityCoef · weight/100   (authority, see 03)
      + jurisdictionGain             (DE/EU match)
      − foreignPenalty               (CH on a DE/EU question)
      − unknownPenalty               (unknown class)
      + domainMatchGain              (chunk domain == query domain)
      − offDomainPenalty             (binding, but off-domain)
      − scopePenalty                 (BDSG Part 3 on a general data-protection question)
      + topicGain                    (preferred canonical norm)
      − supersededPenalty            (status="superseded")
```

`rerankByAuthority()` sorts stably by this score and writes it back. On **interpretation intent**, `liftAboveBinding()` lifts a semantically competitive guidance just above the binding law, with a margin guard so that off-topic guidance cannot overtake the statute.
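
A Python sketch of the formula with the constants from the table below, assuming each context term is a boolean flag on the hit. The `Signals` class and its field names are hypothetical; how the Go code detects each condition is not shown here.

```python
from dataclasses import dataclass

AUTHORITY_COEF = 0.40
JURISDICTION_GAIN = 0.05
FOREIGN_PENALTY = 0.60
UNKNOWN_PENALTY = 0.08
DOMAIN_MATCH_GAIN = 0.15
OFF_DOMAIN_PENALTY = 0.10
SCOPE_PENALTY = 0.25
TOPIC_GAIN = 0.18
SUPERSEDED_PENALTY = 0.50


@dataclass
class Signals:
    # Hypothetical per-hit inputs; detecting each flag is out of scope here.
    raw_semantic: float
    weight: float                      # authority weight of the source_class (see 03)
    jurisdiction_match: bool = False   # DE/EU source on a DE/EU question
    foreign: bool = False              # CH source on a DE/EU question
    unknown_class: bool = False
    domain_match: bool = False         # chunk domain == query domain
    off_domain_binding: bool = False   # binding, but off-domain
    out_of_scope: bool = False         # e.g. BDSG Part 3 on a general question
    preferred_topic: bool = False      # preferred canonical norm
    superseded: bool = False           # status == "superseded"


def authority_score(s: Signals) -> float:
    score = s.raw_semantic + AUTHORITY_COEF * s.weight / 100
    score += JURISDICTION_GAIN if s.jurisdiction_match else 0.0
    score -= FOREIGN_PENALTY if s.foreign else 0.0
    score -= UNKNOWN_PENALTY if s.unknown_class else 0.0
    score += DOMAIN_MATCH_GAIN if s.domain_match else 0.0
    score -= OFF_DOMAIN_PENALTY if s.off_domain_binding else 0.0
    score -= SCOPE_PENALTY if s.out_of_scope else 0.0
    score += TOPIC_GAIN if s.preferred_topic else 0.0
    score -= SUPERSEDED_PENALTY if s.superseded else 0.0
    return score


binding = authority_score(Signals(raw_semantic=0.50, weight=100))  # ≈ 0.90
guidance = authority_score(Signals(raw_semantic=0.60, weight=70))  # ≈ 0.88
assert binding > guidance
```

With these constants, a binding norm at semantic 0.50 still edges out a guidance document at semantic 0.60, which is the intended effect of `authorityCoef`; a larger semantic gap lets the guidance win.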

## Constants + rationale

| Constant | Value | Rationale |
|----------|-------|-----------|
| `authorityCoef` | `0.40` | Weight→score multiplier. Conservatively calibrated against the offline golden harness (phase A): high enough that binding law wins, low enough not to crush strong semantics |
| `jurisdictionGain` | `0.05` | slight preference for DE/EU sources on a DE/EU question |
| `foreignPenalty` | `0.60` | clearly demote foreign law (CH) on a DE/EU question, but **do not** remove it (comparison cases stay findable) |
| `unknownPenalty` | `0.08` | slightly demote unclassified sources |
| `domainMatchGain` | `0.15` | reward domain matches (data_protection / cyber / ai / product_safety) |
| `offDomainPenalty` | `0.10` | demote binding but off-topic norms (e.g. DSGVO on a pure cyber question) |
| `scopePenalty` | `0.25` | demote BDSG §§ 45–84 (judiciary/law enforcement) on a general data-protection question, a frequent scope confusion |
| `topicGain` | `0.18` | booster for preferred canonical norms (e.g. Art. 37 DSGVO on DPO questions) |
| `supersededPenalty` | `0.50` | demote a superseded legacy source "so that default questions see the eu-v1 norm while history stays findable" |
| `intentLiftGain` | `0.10` | epsilon lift of a guidance above the best binding law on interpretation intent |
| `intentLiftMargin` | `0.05` | guard: lift only if the semantic score is within 0.05 of the best binding hit |

**Interpretation-intent signals** (`guidanceIntentSignals`): `edpb`, `dsk`, `enisa`, `bsi`, `leitlinie`, `guideline`, `orientierungshilfe`, `auslegung`, `empfiehlt`, `empfehlung`, `sagt`, `laut`, …
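
How the intent gate and the margin guard might interact, as a hypothetical Python sketch. Assumptions: signals are matched as lowercase substrings, only the excerpt above is used, and a lifted guidance receives the best binding score plus `intentLiftGain` (the page does not state the exact lift target).

```python
from dataclasses import dataclass, replace
from typing import List

# Excerpt of guidanceIntentSignals from the text above; the real list is longer.
GUIDANCE_INTENT_SIGNALS = ("edpb", "dsk", "enisa", "bsi", "leitlinie", "guideline",
                           "orientierungshilfe", "auslegung", "empfiehlt",
                           "empfehlung", "sagt", "laut")
INTENT_LIFT_GAIN = 0.10
INTENT_LIFT_MARGIN = 0.05


@dataclass(frozen=True)
class Ranked:
    name: str
    source_class: str
    semantic: float  # raw semantic score
    score: float     # authority score after rerankByAuthority


def wants_interpretation(query: str) -> bool:
    q = query.lower()
    return any(sig in q for sig in GUIDANCE_INTENT_SIGNALS)


def lift_above_binding(query: str, hits: List[Ranked]) -> List[Ranked]:
    """Lift competitive guidance above the best binding hit on interpretation intent."""
    binding = [h for h in hits if h.source_class == "binding_law"]
    if not wants_interpretation(query) or not binding:
        return hits  # keep the order from the authority rerank
    best_semantic = max(h.semantic for h in binding)
    best_score = max(h.score for h in binding)
    lifted = []
    for h in hits:
        # Margin guard: only guidance close to the best binding semantic score is lifted.
        competitive = h.semantic >= best_semantic - INTENT_LIFT_MARGIN
        if h.source_class == "supervisory_guidance" and competitive:
            h = replace(h, score=max(h.score, best_score + INTENT_LIFT_GAIN))
        lifted.append(h)
    return sorted(lifted, key=lambda h: h.score, reverse=True)
```

Off-topic guidance (semantic score more than 0.05 below the best binding hit) keeps its original score and therefore stays below the statute.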

## Code

- `authority_rerank.go` → `authorityScore()`, `rerankByAuthority()`, `bestBindingSemantic()`, `liftAboveBinding()`

## Addressed failure classes

- **"Guidance displaces the statute"** → `authorityCoef`·weight lifts binding law; `liftAboveBinding` only with the margin guard.
- **"Foreign law at top-1"** → `foreignPenalty`.
- **"Off-domain statute dominates"** → `domainMatchGain` / `offDomainPenalty` / `scopePenalty`.
- **"Outdated norm wins"** → `supersededPenalty` (see [08](08-explainability.md)).
@@ -1,49 +0,0 @@
# 03: `source_class` (legal nature / authority)

**Purpose:** The authority axis that determines **rank** (see [02](02-authority.md)). It is derived deterministically, so the not-yet-re-ingested (untagged) corpus is classified as well, without re-tagging the existing data.

## Mechanics

`classifyAuthority()` (`authority.go`) decides in this order:

1. **Standard-name override:** a recognized standard name (NIST/OWASP/ISO 27001/CIS/CSA CCM/Grundschutz) forces `technical_standard` (weight 80), **even if the payload says `supervisory_guidance`**. Reason: the corpus tags many standards with a generic guidance `source_class`, and the name is more authoritative. `binding_law` is never touched.
2. **Explicit payload values:** a set `source_class` / `authority_weight` wins.
3. **Marker fallback:** foreign → standard → guidance → regulation → unknown.

`inferJurisdiction()`: foreign marker → `CH`; contains `§` or a DE marker → `DE`; otherwise → `EU`.
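
The decision order can be sketched in Python. The marker tuples are lower-cased excerpts of the marker lists further down this page, `REGULATION_MARKERS` is purely hypothetical because the page does not list it, and case-insensitive substring matching is an assumption. Following the 2026-06-25 fix, the override only replaces a missing or guidance/unknown class.

```python
from typing import Optional, Tuple

STANDARD_MARKERS = ("nist", "owasp", "grundschutz", "iso 27001", "iso/iec 27001",
                    "csa ccm", "cloud controls matrix", "cis benchmark", "cis control")
GUIDANCE_MARKERS = ("dsk", "edpb", "bfdi", "enisa", "bsi", "eucc", "orientierungshilfe",
                    "handreichung", "leitlinie", "empfehlung")
FOREIGN_MARKERS = ("revdsg", "fedlex", "(ch)")
DE_MARKERS = ("bdsg", "dsk", "bfdi", "baylfd", "bsi")
REGULATION_MARKERS = ("verordnung", "regulation", "gesetz")  # hypothetical

OVERRIDABLE = (None, "supervisory_guidance", "unknown")


def _has(text: str, markers: Tuple[str, ...]) -> bool:
    t = text.lower()
    return any(m in t for m in markers)


def classify_authority(title: str, payload_class: Optional[str] = None) -> str:
    # 1. Standard-name override; binding_law (and other explicit classes) stay as they are.
    if payload_class in OVERRIDABLE and _has(title, STANDARD_MARKERS):
        return "technical_standard"
    # 2. An explicit payload value wins.
    if payload_class is not None:
        return payload_class
    # 3. Marker fallback (standard was already handled in step 1).
    if _has(title, FOREIGN_MARKERS):
        return "foreign_law"
    if _has(title, GUIDANCE_MARKERS):
        return "supervisory_guidance"
    if _has(title, REGULATION_MARKERS):
        return "binding_law"
    return "unknown"


def infer_jurisdiction(text: str) -> str:
    if _has(text, FOREIGN_MARKERS):
        return "CH"
    if "§" in text or _has(text, DE_MARKERS):
        return "DE"
    return "EU"
```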

## Constants + rationale

**Weights per class** (`sourceClassFromWeight()`):

| `source_class` | Weight | Threshold | Meaning |
|----------------|--------|-----------|---------|
| `binding_law` | `100` | w ≥ 100 | binding law (statute/regulation) |
| `technical_standard` | `80` | 80 ≤ w < 100 | best-practice control catalogue (NIST/OWASP/ISO) |
| `supervisory_guidance` | `70` | 70 ≤ w < 80 | supervisory/interpretive guidance (ENISA/BSI/EDPB) |
| `unknown` | `50` | default | unclassified |
| `foreign_law` | `0` | w ≤ 0 | foreign law (CH) |
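
The thresholds translate into a simple cascade. A Python sketch of `sourceClassFromWeight()`; the function and dict names are illustrative, not the Go identifiers.

```python
CLASS_WEIGHT = {
    "binding_law": 100,
    "technical_standard": 80,
    "supervisory_guidance": 70,
    "unknown": 50,
    "foreign_law": 0,
}


def source_class_from_weight(w: float) -> str:
    # Checked from the highest threshold down; anything between 0 and 70 is "unknown".
    if w >= 100:
        return "binding_law"
    if w >= 80:
        return "technical_standard"
    if w >= 70:
        return "supervisory_guidance"
    if w <= 0:
        return "foreign_law"
    return "unknown"
```

Each class's nominal weight falls inside its own band, so mapping a class to its weight and back is lossless.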

**Marker lists** (substring match):

| List | Entries (excerpt) | Effect |
|------|-------------------|--------|
| `standardMarkers` *(checked before guidance)* | NIST, OWASP, Grundschutz, ISO 27001, ISO/IEC 27001, CSA CCM, Cloud Controls Matrix, CIS Benchmark, CIS Control | → `technical_standard` (80) |
| `guidanceMarkers` | DSK, EDPB, BfDI, ENISA, BSI, EUCC, Standards Mapping, Orientierungshilfe, Handreichung, Leitlinie, Empfehlung, OECD, CISA, Blue Guide, … | → `supervisory_guidance` (70) |
| `foreignMarkers` | RevDSG, fedlex, (CH) | → `foreign_law` (0) |
| `deMarkers` | BDSG, DSK, BfDI, BayLfD, BSI | signals **DE** jurisdiction |

## The standard-name override (fix 2026-06-25)

**Problem:** The CE corpus tags e.g. `NIST SP 800-82r3` as `source_class=supervisory_guidance` (weight 70), **not** as technical_standard. `classifyAuthority` trusted the payload tag, so NIST was treated as guidance, **no `control_standard`** reached the pool, and the diversity rule ([05](05-control-intent.md)) had nothing to inject.

**Fix:** A recognized standard name overrides a mis-tagged guidance/unknown `source_class` with `technical_standard`. This is a code fix; **no re-ingest** is needed. Binding law is left untouched (sanity-checked: a legal question still returns a binding source at top-1).

## Code

- `authority.go` → `classifyAuthority()`, `sourceClassFromWeight()`, `inferJurisdiction()`

## Addressed failure classes

- **"Standard mis-tagged as guidance → no control_standard"** → standard-name override.
- **"Foreign law misclassified"** → `foreignMarkers` + `foreign_law` weight 0.
@@ -1,60 +0,0 @@
# 04: `source_role` (functional role)

**Purpose:** The axis **orthogonal** to `source_class`: *what does the source do in the document?* It determines **control-pool membership** for implementation questions, independent of legal nature. It is derived deterministically from markers, with no re-tagging of the existing data.

## The 7 roles

| Constant | Value | Definition |
|----------|-------|------------|
| `roleObligation` | `obligation` | the abstract duty (the WHAT) |
| `roleOperationalReq` | `operational_requirement` | concrete binding requirement (e.g. CRA Annex I) |
| `roleProceduralReq` | `procedural_requirement` | process: notification/registration/DPIA (DSFA)/incident |
| `roleControlStandard` | `control_standard` | best-practice catalogue (NIST/OWASP/ISO/CIS) |
| `roleImplGuidance` | `implementation_guidance` | implementation how-to (ENISA Good Practices, BSI) |
| `roleInterpretation` | `interpretation` | interprets the *meaning* of the norm (EDPB guideline) |
| `roleDefinition` | `definition` | definitions / scope / recitals |

**Control pool** = `{operational_requirement, procedural_requirement, control_standard, implementation_guidance}` (the four "how to implement" roles, `isControlPoolRole()`).

## Mechanics

`classifyRole()` (`control_role.go`), in decision order:

1. `IsRecital` → `definition`
2. `source_class == technical_standard` → `control_standard`
3. `source_class == supervisory_guidance`:
   - contains an `implMarker` → `implementation_guidance`
   - otherwise → `interpretation`
4. `source_class == binding_law`:
   - `definitionMarker` → `definition`
   - `proceduralMarker` → `procedural_requirement`
   - `annexMarker` **or** `operationalMarker` → `operational_requirement`
   - otherwise → `obligation`
5. default → `obligation`

`controlRoleOf(payload)` classifies the raw Qdrant payload **before** mapping, so `searchControls` ([01](01-retrieval.md)) can filter its deep dense pull without fully materializing every hit.

## Marker lists

| List | Entries (excerpt) | → Role |
|------|-------------------|--------|
| `proceduralMarkers` | Meldung, Meldepflicht, Notification, Registrierung, Konformitätserklärung, Incident, Reporting, Folgenabschätzung, DSFA, DPIA, Anzeigepflicht | `procedural_requirement` |
| `annexMarkers` | Anhang, Annex, Appendix, Anlage | `operational_requirement` |
| `operationalMarkers` | Anforderung, Requirement, essential, wesentliche | `operational_requirement` |
| `implMarkers` | Good Practice, Best Practice, Standards Mapping, Umsetzung, Implementation, Handreichung, Maßnahmenkatalog, ICS, SCADA, Technical Guideline, TIG | `implementation_guidance` |
| `definitionMarkers` | Begriffsbestimmung, Definition | `definition` |
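
Putting the decision order and the marker lists together gives a hypothetical Python sketch of `classifyRole()`. Assumptions not stated on this page: matching is case-insensitive, and short markers (four characters or fewer, such as `ICS`, `TIG`, `DSFA`) only match as whole words, so that a word like "physics" does not count as `ICS`.

```python
import re
from typing import Tuple

PROCEDURAL_MARKERS = ("meldung", "meldepflicht", "notification", "registrierung",
                      "konformitätserklärung", "incident", "reporting",
                      "folgenabschätzung", "dsfa", "dpia", "anzeigepflicht")
ANNEX_MARKERS = ("anhang", "annex", "appendix", "anlage")
OPERATIONAL_MARKERS = ("anforderung", "requirement", "essential", "wesentliche")
IMPL_MARKERS = ("good practice", "best practice", "standards mapping", "umsetzung",
                "implementation", "handreichung", "maßnahmenkatalog", "ics", "scada",
                "technical guideline", "tig")
DEFINITION_MARKERS = ("begriffsbestimmung", "definition")
CONTROL_POOL = frozenset({"operational_requirement", "procedural_requirement",
                          "control_standard", "implementation_guidance"})


def _has(text: str, markers: Tuple[str, ...]) -> bool:
    t = text.lower()
    for m in markers:
        if len(m) <= 4:  # short acronym: whole-word match only (assumption)
            if re.search(rf"\b{re.escape(m)}\b", t):
                return True
        elif m in t:
            return True
    return False


def classify_role(text: str, source_class: str, is_recital: bool = False) -> str:
    if is_recital:
        return "definition"
    if source_class == "technical_standard":
        return "control_standard"
    if source_class == "supervisory_guidance":
        return "implementation_guidance" if _has(text, IMPL_MARKERS) else "interpretation"
    if source_class == "binding_law":
        if _has(text, DEFINITION_MARKERS):
            return "definition"
        if _has(text, PROCEDURAL_MARKERS):
            return "procedural_requirement"
        if _has(text, ANNEX_MARKERS) or _has(text, OPERATIONAL_MARKERS):
            return "operational_requirement"
    return "obligation"  # binding law without markers, and the default case


def is_control_pool_role(role: str) -> bool:
    return role in CONTROL_POOL
```

Note that the same ENISA document lands in the control pool or not purely based on its markers, while its `source_class` stays `supervisory_guidance`.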
|
|
||||||
|
|
||||||
## Warum orthogonal zu `source_class`
|
|
||||||
|
|
||||||
`source_class` (Rechtsnatur) und `source_role` (Funktion) sind **zwei Achsen**, nicht eine. ENISA bleibt `supervisory_guidance` (Rechtsnatur) **und** `implementation_guidance` (Funktion) — sie wird **nicht** umgetaggt (fachlich falsch), darf aber bei Umsetzungsfragen in den Control-Pool. So muss der Bestand nicht angefasst werden: `source_role` ist wie `source_class` aus Markern ableitbar.
|
|
||||||
|
|
||||||
`source_role` ist die **Wirbelsäule der Langzeit-Architektur** Regulation → Obligation → Operational Requirement → Control → Evidence ([09](09-framework-layer.md), Prio 4).
|
|
||||||
|
|
||||||
## Code

- `control_role.go` → `classifyRole()`, `controlRoleOf()`, `isControlPoolRole()`

## Addressed error classes

- **"Controls = technical_standard only"** → four control-pool roles instead of one.
- **"Abstract obligation dominates an implementation question"** → `obligation` is *not* in the control pool (see [05](05-control-intent.md)).
@@ -1,51 +0,0 @@
# 05 — Control Intent + Diversity

**Purpose:** For an **implementation question** ("Which controls/measures fit?"), lift the control pool ([04](04-source-role.md)) above the abstract obligation. At the same time, make sure the result list shows **different kinds of sources** instead of being flooded by a single role. For a **legal question**, everything stays with the authority rerank ([02](02-authority.md)).

## Intent detection

`queryWantsControls()` (`authority_rerank.go`) performs a keyword match against `controlIntentSignals`:

> control, controls, maßnahme, schutzmaßnahme, best practice, umsetzen, implementier, absicher, härt, hardening, nist, owasp, grundschutz, ccm, iso 27001, isms

`applyControlRoles()` and `ensureControlDiversity()` fire only when this gate returns `true`.

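Below is a minimal Go sketch of the gate. It assumes that `queryWantsControls()` lower-cases the query and checks each signal as a substring. The stems `implementier` and `härt` suggest substring matching, but the exact rule is an assumption. `queryWantsControlsSketch` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// controlIntentSignals copies the keyword list above. The signals are
// lower-case stems, so the query is lower-cased before substring matching.
var controlIntentSignals = []string{
	"control", "controls", "maßnahme", "schutzmaßnahme", "best practice",
	"umsetzen", "implementier", "absicher", "härt", "hardening", "nist",
	"owasp", "grundschutz", "ccm", "iso 27001", "isms",
}

// queryWantsControlsSketch reports whether a query looks like an
// implementation question (hypothetical re-implementation of the gate).
func queryWantsControlsSketch(query string) bool {
	q := strings.ToLower(query)
	for _, s := range controlIntentSignals {
		if strings.Contains(q, s) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(queryWantsControlsSketch("Welche Maßnahmen zur Härtung von SCADA-Systemen?")) // true
	fmt.Println(queryWantsControlsSketch("Gilt der CRA für Open-Source-Software?"))          // false
}
```

Plain substring matching has one side effect: `control` also matches `controller`, so a GDPR question about the controller would open the gate. Whether the real function guards against this is not covered here.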
## Role boost (`applyControlRoles`)

Every control-pool hit gets `controlPoolGain + controlRoleBonus[role]` added to its score:

| Quantity | Value | Why |
|----------|-------|-----|
| `controlPoolGain` | `0.15` | lifts **every** control-pool role above the non-control roles (obligation/interpretation/definition); otherwise the binding abstract `obligation` wins on authority alone |
| `controlRoleBonus[operational_requirement]` | `0.100` | soft intra-pool precedence (user decision, 2026-06-24): op_req first |
| `controlRoleBonus[procedural_requirement]` | `0.075` | … then process obligations |
| `controlRoleBonus[control_standard]` | `0.050` | … then standard catalogs |
| `controlRoleBonus[implementation_guidance]` | `0.000` | guidance as the baseline, no bonus |

> **Deliberately soft, no hard hierarchy:** A semantically dominant `implementation_guidance` hit (e.g. ENISA for an EU cyber implementation question) **may stay top-1**, which is correct on the merits. The boost only demotes the abstract obligation; it does not enforce an order.

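The boost arithmetic can be shown as a Go sketch using the constants from the table. The hit type and the example scores are made up. The re-sort after boosting stands in for whatever ordering step follows in the real rerank. The example shows both effects described above:

- The abstract obligation drops below the control-pool hits.
- The semantically stronger ENISA hit stays ahead of NIST, even though NIST gets the larger role bonus.

```go
package main

import (
	"fmt"
	"sort"
)

const controlPoolGain = 0.15

// controlRoleBonus doubles as the control-pool membership test: exactly the
// four control-pool roles have an entry.
var controlRoleBonus = map[string]float64{
	"operational_requirement": 0.100,
	"procedural_requirement":  0.075,
	"control_standard":        0.050,
	"implementation_guidance": 0.000,
}

// hit is a hypothetical stand-in for a reranked search result.
type hit struct {
	Source string
	Role   string
	Score  float64
}

// applyControlRolesSketch adds controlPoolGain + the role bonus to every
// control-pool hit, then re-sorts best-first (the re-sort is an assumption).
func applyControlRolesSketch(hits []hit) {
	for i := range hits {
		if bonus, inPool := controlRoleBonus[hits[i].Role]; inPool {
			hits[i].Score += controlPoolGain + bonus
		}
	}
	sort.SliceStable(hits, func(a, b int) bool { return hits[a].Score > hits[b].Score })
}

func main() {
	hits := []hit{
		{"GDPR Art. 32", "obligation", 0.80},
		{"ENISA Good Practices", "implementation_guidance", 0.74},
		{"NIST SP 800-82r3", "control_standard", 0.66},
	}
	applyControlRolesSketch(hits)
	for i, h := range hits {
		fmt.Printf("%d. %s (%s) %.2f\n", i+1, h.Source, h.Role, h.Score)
	}
	// 1. ENISA Good Practices (implementation_guidance) 0.89
	// 2. NIST SP 800-82r3 (control_standard) 0.86
	// 3. GDPR Art. 32 (obligation) 0.80
}
```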
## Control diversity rule (`ensureControlDiversity`)

**Problem:** Even with the boost, a dense cloud of hits with the same role (many ENISA chunks) can push `operational_requirement` and `control_standard` out of the top-K, making those source types invisible.

**Solution (instead of a blunt `+0.30` role boost):** If the top-K contains only `implementation_guidance`, **inject** the best `operational_requirement` and the best `control_standard` from the pool. Each injected hit displaces the lowest-ranked redundant guidance slot. Algorithm:

1. Determine the role of every hit (`roleAt`).
2. Check which roles are represented in the top-K.
3. For each missing target role (`operational_requirement`, `control_standard`), find the best hit with that role below the top-K and overwrite the lowest-ranked `implementation_guidance` slot in the top-K with it.
4. Truncate to `topK` (the original copy of the injected hit drops off in the tail).

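The four steps can be sketched in Go. This illustrates the steps under stated assumptions and is not the code in `control_role.go`:

- Roles are plain strings on a hypothetical `hit` type.
- The pure-guidance trigger is taken from the text above.
- Slot 0 is never displaced, so the top-1 hit survives even for a very small `topK`. This is consistent with "ENISA keeps top-1" in the live result, but it is still an assumption.

Copying the injected hit into a guidance slot leaves its original below the top-K, where the final truncation drops it, as in step 4.

```go
package main

import "fmt"

const guidance = "implementation_guidance"

// hit is a hypothetical result carrying only what the rule needs.
type hit struct {
	Source string
	Role   string
}

// lowestGuidanceSlot returns the lowest-ranked implementation_guidance slot in
// top, never slot 0 (assumption: top-1 is never displaced), or -1 if none.
func lowestGuidanceSlot(top []hit) int {
	for i := len(top) - 1; i >= 1; i-- {
		if top[i].Role == guidance {
			return i
		}
	}
	return -1
}

// ensureControlDiversitySketch expects hits sorted best-first.
func ensureControlDiversitySketch(hits []hit, topK int) []hit {
	if topK < 0 {
		topK = 0
	}
	if topK > len(hits) {
		topK = len(hits)
	}
	out := append([]hit(nil), hits...) // work on a copy
	// Steps 1 + 2: only act if the top-K is pure implementation_guidance.
	for _, h := range out[:topK] {
		if h.Role != guidance {
			return out[:topK]
		}
	}
	// Step 3: inject the best hit of each wanted role from below the top-K.
	for _, want := range []string{"operational_requirement", "control_standard"} {
		for j := topK; j < len(out); j++ {
			if out[j].Role != want {
				continue
			}
			if slot := lowestGuidanceSlot(out[:topK]); slot >= 0 {
				out[slot] = out[j] // the copy at j stays behind in the tail
			}
			break // only the best-ranked hit of this role
		}
	}
	// Step 4: truncate; the duplicate left in the tail falls away.
	return out[:topK]
}

// exampleHits builds made-up data: six guidance chunks ahead of one
// control_standard and one operational_requirement hit.
func exampleHits() []hit {
	var hs []hit
	for i := 1; i <= 6; i++ {
		hs = append(hs, hit{fmt.Sprintf("ENISA chunk %d", i), guidance})
	}
	return append(hs,
		hit{"NIST SP 800-82r3", "control_standard"},
		hit{"MaschinenVO Anhang III", "operational_requirement"},
	)
}

func main() {
	for i, h := range ensureControlDiversitySketch(exampleHits(), 6) {
		fmt.Printf("%d. %s (%s)\n", i+1, h.Source, h.Role)
	}
}
```

With `topK = 6`, the sketch produces the same shape as the live result: four ENISA chunks, then `control_standard`, then `operational_requirement`.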
**Live result:** implementation question → `1.–4. ENISA · 5. NIST SP 800-82r3 (control_standard) · 6. MaschinenVO Anhang-III (op_req)`. ENISA keeps top-1, and the other source types become visible.

> **Principle:** Don't guess, don't force; make the relevant source types visible.

## Code

- `authority_rerank.go` → `queryWantsControls()`
- `control_role.go` → `applyControlRoles()`, `ensureControlDiversity()`

## Addressed error classes

- **"Abstract obligation dominates an implementation question"** → `controlPoolGain`.
- **"One role floods the top-K, source types become invisible"** → `ensureControlDiversity`.
- **"Hard tier ordering overfits to one question"** → soft boost instead of a blunt instrument.
@@ -1,45 +0,0 @@
# 06 — Assessment

**Purpose:** An **auditable justification layer** on top of the ranked results. It turns a hit list into a verifiable statement: *Which norm is primary, which norms depend on it, how unambiguous is that, and does a human need to look at it?*

## Mechanics

`Assess()` (`legal_rag_assess.go`) takes the already ranked `results []LegalSearchResult` and builds a `LegalAssessment`:

| Field | Content |
|-------|---------|
| `PrimaryNorm` | `CitationUnit` or `ArticleLabel` of the top hit |
| `PrimaryRegulation` | `RegulationShort` of the top hit |
| `ConnectedNorms` | connected norms (`references_out` + `references_in`), capped and deduplicated |
| `CrossRegime` | whether several regulations appear in the top N |
| `WinnerMargin` | score gap top-1 ↔ top-2 (proxy for unambiguity) |
| `HumanReviewFlag` | true when unambiguity is low |
| `ScoreReasoning` | short justification, in German |

## Constants and rationale

| Constant | Value | Why |
|----------|-------|-----|
| `assessConnectedCap` | `12` | upper bound on the connected norms shown in the assessment; keeps a heavily cross-referenced article from flooding the justification |
| `assessCrossRegimeTopN` | `5` | window over which "cross-regime" (several regulations) is judged |
| `assessReviewMargin` | `0.05` | narrow winner margin → human review flag (see [07](07-confidence.md)) |

## Human review logic

`HumanReviewFlag` becomes `true` if **any** of these conditions holds:

- `WinnerMargin < 0.05`: top-1 and top-2 are too close together (ambiguous),
- `CrossRegime == true`: several regimes are affected (e.g. GDPR + CRA),
- the primary hit is **not** `binding_law`: a legal statement without a binding primary source.

> This is the deterministic escalation threshold. The system says on its own "a human should look at this" instead of feigning certainty.

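Below is a condensed Go sketch of `Assess()` that covers the computed fields and the three review conditions. Field names follow the table above, but the struct shapes are trimmed. Several choices are assumptions:

- `ConnectedNorms` is taken from the top hit only.
- A single-result list is not flagged on margin, because there is no runner-up.
- The example edges and scores are made up.

```go
package main

import "fmt"

// Trimmed, hypothetical shapes; the real LegalSearchResult/LegalAssessment have more fields.
type result struct {
	CitationUnit    string
	RegulationShort string
	SourceClass     string
	Score           float64 // authority score after the rerank
	ReferencesOut   []string
	ReferencesIn    []string
}

type assessment struct {
	PrimaryNorm       string
	PrimaryRegulation string
	ConnectedNorms    []string
	CrossRegime       bool
	WinnerMargin      float64
	HumanReviewFlag   bool
}

const (
	assessConnectedCap    = 12
	assessCrossRegimeTopN = 5
	assessReviewMargin    = 0.05
)

func assessSketch(results []result) assessment {
	var a assessment
	if len(results) == 0 {
		return a
	}
	top := results[0]
	a.PrimaryNorm, a.PrimaryRegulation = top.CitationUnit, top.RegulationShort

	// Connected norms: forward + reverse edges of the top hit, deduplicated and capped.
	seen := map[string]bool{}
	for _, ref := range append(append([]string{}, top.ReferencesOut...), top.ReferencesIn...) {
		if len(a.ConnectedNorms) == assessConnectedCap {
			break
		}
		if !seen[ref] {
			seen[ref] = true
			a.ConnectedNorms = append(a.ConnectedNorms, ref)
		}
	}

	// Cross-regime: more than one regulation among the first N hits.
	regs := map[string]bool{}
	for i := 0; i < len(results) && i < assessCrossRegimeTopN; i++ {
		regs[results[i].RegulationShort] = true
	}
	a.CrossRegime = len(regs) > 1

	narrow := false
	if len(results) > 1 {
		a.WinnerMargin = top.Score - results[1].Score
		narrow = a.WinnerMargin < assessReviewMargin
	}
	a.HumanReviewFlag = narrow || a.CrossRegime || top.SourceClass != "binding_law"
	return a
}

// exampleResults uses made-up scores and edges for illustration.
func exampleResults() []result {
	return []result{
		{CitationUnit: "CRA Art. 13", RegulationShort: "CRA", SourceClass: "binding_law", Score: 0.91,
			ReferencesOut: []string{"CRA Annex I", "CRA Art. 14"}, ReferencesIn: []string{"CRA Art. 14", "CRA Art. 31"}},
		{CitationUnit: "CRA Annex I", RegulationShort: "CRA", SourceClass: "binding_law", Score: 0.88},
		{CitationUnit: "CRA Art. 14", RegulationShort: "CRA", SourceClass: "binding_law", Score: 0.70},
	}
}

func main() {
	a := assessSketch(exampleResults())
	fmt.Printf("%s (%s) margin=%.2f cross=%v review=%v connected=%v\n",
		a.PrimaryNorm, a.PrimaryRegulation, a.WinnerMargin, a.CrossRegime, a.HumanReviewFlag, a.ConnectedNorms)
	// CRA Art. 13 (CRA) margin=0.03 cross=false review=true connected=[CRA Annex I CRA Art. 14 CRA Art. 31]
}
```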
## Code

- `legal_rag_assess.go` → `Assess()`, `primaryLabel()`

## Addressed error classes

- **"An ambiguous answer is presented as certain"** → `WinnerMargin` + `HumanReviewFlag`.
- **"Cross-regime overlooked"** → `CrossRegime` over `assessCrossRegimeTopN`.
- **"Legal statement without a binding source"** → flag when the primary hit is non-binding.
@@ -1,38 +0,0 @@
# 07 — Confidence

**Purpose:** An honest statement about how reliable a result is, without an invented "Confidence: 87 %" value that suggests false certainty.

## Deliberate decision: no separate confidence field

The engine has **no** explicit `confidence` field. Instead, reliability is derived from two quantities that are actually computed and can be checked:

| Quantity | Source | Meaning |
|----------|--------|---------|
| `WinnerMargin` | `LegalAssessment` ([06](06-assessment.md)) | score gap top-1 ↔ top-2: how clearly does the primary norm "win"? |
| `HumanReviewFlag` | `LegalAssessment` | deterministic escalation: is the answer ambiguous or borderline? |

**Why this approach?** A calibrated probability would suggest a precision that a rule-based retriever does not have. The **gap** between top-1 and top-2, by contrast, is a *measured*, explainable quantity. A large margin means one unambiguous norm. A small margin means several plausible sources, so a human should decide.

## Threshold

| Constant | Value | Effect |
|----------|-------|--------|
| `assessReviewMargin` | `0.05` | `WinnerMargin < 0.05` ⇒ `HumanReviewFlag = true` |

`HumanReviewFlag` also fires for cross-regime results and for a non-binding primary source ([06](06-assessment.md)).

## Relationship to the authority layer

The `Score` the margin is based on is **not** the raw semantic score. It is the authority score after the rerank ([02](02-authority.md)). The margin therefore measures *normative* unambiguity, which takes legal nature and domain into account, not just semantic similarity.

## Code

- `legal_rag_types.go` → `LegalSearchResult.Score`, `LegalAssessment.WinnerMargin`, `LegalAssessment.HumanReviewFlag`
- `legal_rag_assess.go` → computed in `Assess()`

## Addressed error classes

- **"False certainty"** → no invented percentage; margin + flag instead of a pseudo-confidence.
- **"A close call is waved through automatically"** → `assessReviewMargin` escalation.

> **Next stage:** Real citation-gating confidence (a finding only with source ∧ scope ∧ as-of date) belongs in the authority/freshness layer and in Control → Evidence ([09](09-framework-layer.md)), not in a model score.

@@ -1,42 +0,0 @@
# 08 — Explainability, Citations + Supersede

**Purpose:** Every result must be **verifiable**: where it comes from, what it is connected to, and whether it still applies. This is the basis for citability and for the later citation-gating logic.

## Citations + graph edges

Loaded from the Qdrant payload (phase-2 graph metadata):

| Field | Content | Use |
|-------|---------|-----|
| `CitationUnit` | canonical article/annex identifier | dedup, primary-norm label |
| `article_label` | human-readable reference (e.g. "Art. 13 CRA") | display, justification |
| `citation_style` | citation format marker | display |
| `references_out` | norms this chunk **cites** (forward edges) | graph expansion ([01](01-retrieval.md)) + `ConnectedNorms` |
| `references_in` | norms that **cite this** chunk (reverse edges) | metadata **only**, not expanded (flooding protection) |

`Assess()` ([06](06-assessment.md)) condenses the edges into `ConnectedNorms`. This makes visible, for example, that Art. 13 CRA refers to Annex I, which is the actual source of the obligation.

## Supersede handling

Law changes. An outdated version must not beat the current one, but transition and history questions still need to find it.

| Mechanism | Value / field | Behavior |
|-----------|---------------|----------|
| **Detection** | payload `status == "superseded"` → `Superseded` flag | marks the replaced old source |
| **Demotion** | `supersededPenalty = 0.50` (`authorityScore`, [02](02-authority.md)) | consistent demotion |
| **Philosophy** | — | "Old source demoted, not hidden: default questions see the eu-v1 norm, history stays findable" |

> **Not removed, only penalized:** A superseded norm can still rank high for an explicit history question. It is demoted consistently, not hidden. This follows the same "reorder, delete nothing" line as the authority rerank.

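The table gives `supersededPenalty = 0.50` inside `authorityScore` but does not say how the penalty is combined with the score. The Go sketch below assumes a multiplicative penalty followed by a re-sort. It illustrates the "demote, never drop" rule, not the real formula. The scores are made up. The labels reflect a real succession: Regulation (EU) 2023/1230 replaces the Machinery Directive 2006/42/EC.

```go
package main

import (
	"fmt"
	"sort"
)

const supersededPenalty = 0.50

// chunk is a hypothetical result with the payload status and the authority score.
type chunk struct {
	Label  string
	Status string // payload "status"; "superseded" marks the replaced source
	Score  float64
}

// demoteSuperseded multiplies the score of superseded chunks by the penalty
// (assumption) and re-sorts best-first. Nothing is removed.
func demoteSuperseded(chunks []chunk) []chunk {
	out := append([]chunk(nil), chunks...)
	for i := range out {
		if out[i].Status == "superseded" {
			out[i].Score *= supersededPenalty
		}
	}
	sort.SliceStable(out, func(a, b int) bool { return out[a].Score > out[b].Score })
	return out
}

func main() {
	ranked := demoteSuperseded([]chunk{
		{"Maschinenrichtlinie 2006/42/EG", "superseded", 0.90},
		{"Maschinenverordnung (EU) 2023/1230", "", 0.80},
	})
	for _, c := range ranked {
		fmt.Printf("%.2f %s\n", c.Score, c.Label)
	}
	// 0.80 Maschinenverordnung (EU) 2023/1230
	// 0.45 Maschinenrichtlinie 2006/42/EG
}
```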
|
|
||||||
## Code
|
|
||||||
|
|
||||||
- `legal_rag_client.go` → Payload-Mapping (`references_out/in`, `status`)
|
|
||||||
- `legal_rag_graph.go` → Forward-Kanten-Expansion, Reverse-Kanten als Metadaten
|
|
||||||
- `legal_rag_assess.go` → `ConnectedNorms`
|
|
||||||
- `authority_rerank.go` → `supersededPenalty`
|
|
||||||
|
|
||||||
## Addressed error classes

- **"Statement without a citation"** → `CitationUnit` / `article_label`.
- **"Obligation source hidden behind a cross-reference"** → forward-edge expansion (Art. 13 → Annex I).
- **"Outdated legal status wins"** → `supersededPenalty`, while the old status stays findable.
@@ -1,51 +0,0 @@
# 09 — `framework_*` Layer (Control-Mapping Bridge)

**Purpose:** Make a **concrete control addressable** (e.g. `V14.2.4`). This moves the system from "which document fits?" to "which concrete control satisfies CRA Annex I?". It is the bridge to the next stage, **Control → Evidence**, and the actual moat.

> **Honest status:** This layer lives **in the Qdrant payload today**, not in the retrieval code. The `ucca` engine does not (yet) read or route `framework_*`; the layer is the **data foundation** that Priority 4 builds on. `framework_control` currently travels along in the `article` field, so it is already visible in the answers.

## Schema (per chunk)

| Field | Example (OWASP) | Meaning |
|-------|-----------------|---------|
| `framework` | `OWASP ASVS` | framework name |
| `framework_version` | `5.0` | version (can be historized with the `superseded` mechanism, [08](08-explainability.md)) |
| `framework_section` | `V6` | chapter/section |
| `framework_control` | `V6.2.4` | concrete requirement ID: the addressable control |
| `framework_section_name` | `Password Security` | human-readable context |
| `asvs_level` | `L1`/`L2`/`L3` | level (OWASP-specific) |

The same pattern is planned for NIST: `framework="NIST SP 800-53"`, `framework_family="SI"`, `framework_control="SI-2"`, `framework_revision="5"`.

## OWASP ASVS 5.0: the first reference (parser-4 pattern)

- **Source:** `OWASP/ASVS` on GitHub, `5.0/docs_en/...flat.json` (345 requirements). License **CC-BY-SA-4.0** (permitted; only CC-BY-NC is blocked), attribution `OWASP`.
- **Ingestion = per-requirement direct upsert.** The RAG chunker is not used, because it would cut `framework_control` apart. Each requirement becomes one Qdrant point with `id = uuid5("owasp_asvs_5.0_"+req_id)` (idempotent), `source_class=technical_standard` / `authority_weight=80`, and a bge-m3 vector.
- **Status:** 345 points on macmini-qdrant **and** qdrant-dev, verified live (`"OWASP … Authentifizierung"` → top OWASP hits with `V` codes).
- **Lesson:** For future standards (NIST re-tag, BSI Grundschutz), **always** set `source_class=technical_standard` + `framework_*` directly. The old NIST script left `source_class` empty, which caused the guidance mistag ([03](03-source-class.md)).

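The upsert is idempotent because the point ID is derived deterministically from the requirement ID. Re-running the ingestion overwrites the same 345 points instead of adding duplicates. The ingestion script is Python (`uuid5`). The Go sketch below reproduces RFC 4122 version-5 UUIDs with `crypto/sha1`, so the same IDs could be computed on the Go side. The namespace the script passes to `uuid5` is not stated here, so the sketch assumes the standard URL namespace. `pointID` and the example requirement ID are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// nameSpaceURL is the RFC 4122 URL namespace 6ba7b811-9dad-11d1-80b4-00c04fd430c8
// (assumption: the ingestion script may use a different namespace).
var nameSpaceURL = [16]byte{0x6b, 0xa7, 0xb8, 0x11, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8}

// uuid5 hashes namespace bytes + UTF-8 name with SHA-1, like Python's uuid.uuid5.
func uuid5(ns [16]byte, name string) string {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)
	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// pointID derives the Qdrant point ID for one ASVS requirement (hypothetical helper).
func pointID(reqID string) string {
	return uuid5(nameSpaceURL, "owasp_asvs_5.0_"+reqID)
}

func main() {
	fmt.Println(pointID("V6.2.4"))
	fmt.Println(pointID("V6.2.4") == pointID("V6.2.4")) // true: same requirement, same point
}
```

If the script used a different namespace, only the `nameSpaceURL` value would change.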
## Bridge to Priority 4: Control → Evidence

```
Regulation
↓ (legal obligation layer)
Obligation
↓ (source_role: operational_requirement)
Operational Requirement ── CRA Annex I
↓ (control mapping via framework_control)
Control ── OWASP V6.x · NIST SI-2 · BSI OPS.1.1
↓
Evidence ── the proof an auditor wants to see
```

The next step wires `framework_control` into a **control-mapping table** (which concrete control satisfies which obligation), with the **evidence layer** underneath it. NIST and BSI will follow the same `framework_*` pattern.

## Code / data

- Data: Qdrant `bp_compliance_ce` (payload fields above), ingestion scripts (`ingest_owasp.py` and others)
- Retrieval wiring: **open** (Priority 4)

## Addressed error classes

- **"Only document hits, no addressable control"** → `framework_control` per chunk.
- **"Control catalog without a version status"** → `framework_version` + supersede.
@@ -1,57 +0,0 @@
# RAG Retrieval Engine — Architecture

This section documents the **deterministic, rule-based retrieval engine** of the compliance SDK (`ai-compliance-sdk/internal/ucca/`). For every user question it answers: *Which norm or source is relevant, and why?*

> **Why this documentation exists:** The engine makes many deliberate `+0.05 / +0.10` decisions. Every constant encodes a **measured** decision (golden harness, error class), not an arbitrary tuning knob. Without the *why*, nobody will be able to follow these decisions in six months. This documentation is the reference for maintenance, onboarding, and audit and investor evidence.

## Guiding principle

> **Don't guess, don't force; make the relevant source types visible.**

The LLM does **not** decide what the law is. It only decides how an already versioned, cited norm maps onto a set of facts. Wherever possible the engine is deterministic (markers, weights, thresholds) rather than model-based. Nothing is *deleted*: re-ranking only changes the order, and everything stays findable.

## Two orthogonal axes

The core of the model is two independent axes that are usually conflated in the literature.

| Axis | Question | Effect | Docs |
|------|----------|--------|------|
| **`source_class`** (legal nature) | How binding is the source? | determines the **rank** | [03](03-source-class.md) |
| **`source_role`** (function) | What does the source do in the document? | determines **control-pool membership** | [04](04-source-role.md) |

Example: NIST is `technical_standard` (source_class) **and** `control_standard` (source_role). ENISA good practices are `supervisory_guidance` **and** `implementation_guidance`. They stay guidance, but may enter the control pool for implementation questions.

## Pipeline (overview)

```
Query
│ bge-m3 embedding
▼
Retrieval pool ── hybrid (RRF) + binding augmentation + control augmentation + (graph) → 01
▼
Authority rerank ── source_class → rank (binding law of the matching jurisdiction on top) → 02, 03
▼
Control intent ── source_role → control pool + diversity (make source types visible) → 04, 05
▼
Assessment ── PrimaryNorm · ConnectedNorms · WinnerMargin · CrossRegime → 06
▼
Confidence/explainability ── HumanReviewFlag · citations · graph edges · supersede → 07, 08
```

`framework_*` ([09](09-framework-layer.md)) is the **data bridge** to the next stage (Control → Evidence). It lives in the Qdrant payload today and is not yet wired into the retrieval code.

## Documents

| # | Document | Content |
|---|----------|---------|
| 01 | [Retrieval pipeline](01-retrieval.md) | pool construction: hybrid + binding + control + graph |
| 02 | [Authority re-ranking](02-authority.md) | source_class → rank, bonus/penalty system |
| 03 | [source_class](03-source-class.md) | legal nature, weights, markers, standard-name override |
| 04 | [source_role](04-source-role.md) | 7 roles, control pool, classification |
| 05 | [Control intent + diversity](05-control-intent.md) | intent detection, role bonus, diversity rule |
| 06 | [Assessment](06-assessment.md) | auditable justification layer |
| 07 | [Confidence](07-confidence.md) | WinnerMargin, HumanReviewFlag |
| 08 | [Explainability + supersede](08-explainability.md) | citations, graph edges, supersede |
| 09 | [framework_* layer](09-framework-layer.md) | control-mapping bridge (CRA Annex → OWASP V6.x) |

> **Error-class thesis:** Model and corpus are interchangeable; the *error classes + levers* are the IP. Every constant in these documents addresses a named error class (e.g. "guidance displaces law", "standard mistagged as guidance"). Calibration effort grows sublinearly: few classes, many modules.
@@ -1,155 +0,0 @@
# Criteria Meta-Model & Compliance Tier Architecture

> **Status: FROZEN 2026-06-22.** Changes to this model are
> architecture decisions and require explicit approval (DB owner /
> product ownership). Related: [`platform_checker_matrix.md`](platform_checker_matrix.md),
> [`verification_method.md`](verification_method.md), [`platform_validation_v1.md`](platform_validation_v1.md).

## 1. Motivation

Calibrating the four website compliance modules revealed four **different**
dominant causes of error:

| Module | Dominant lever |
|--------|----------------|
| Cookie policy | Sufficiency (judge) |
| Imprint (Impressum) | Scope / routing |
| Terms and conditions (AGB) | Decision method / routing |
| Privacy policy (DSE) | **Overloaded controls + mixing of "legal minimum" and "best practice"** |

The DSE investigation (adjudication of 13 judge↔ground-truth (GT) disagreements) found that **85 % of the
remaining errors are catalog defects and 15 % are checker errors.** The biggest single defect: a control
bundles several requirements of **different bindingness**, and it is only rated
ERFÜLLT (fulfilled) when *all* of them are met. As a result, legally compliant documents
are reported as FEHLT (missing) because a best-practice recommendation is absent.

This model fixes that **in the catalog**, without changing the checker and without physically
splitting controls.

## 2. Data model

A control stays **stable** (UUID, citations, GT history, calibration,
statistics). Its `pass_criteria` change from a list of strings into **atomic,
typed criterion objects**:

```
Control (stable control_uuid, do NOT split)
└─ criteria: Criterion[]

Criterion
├─ criterion (text of the individual requirement)
├─ legal_basis (e.g. "Art. 13(1)(c) DSGVO")
├─ verification_method (axis 1: WHAT is checked)
├─ decision_method (axis 2: HOW it is decided)
├─ compliance_tier (axis 3: HOW BINDING it is)
└─ weight (reserved for maturity, see §6; NOT gating today)
```

**Storage location:** `canonical_controls.generation_metadata->'tiered_criteria'` (jsonb).
**No schema change.** No physical control split. Variant A (split) was rejected because
new UUIDs would lose benchmarks, calibration, citations and GT, which would turn it into a migration project.

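Here is a Go sketch of the criterion object and of reading it back from `generation_metadata->'tiered_criteria'`. It assumes the jsonb value is a JSON array whose keys match the field names above. The example criteria are made up; their weights follow the guide values in §6. Rejecting unknown tiers is a design choice of the sketch, not a documented rule.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ComplianceTier string

const (
	LegalMinimum ComplianceTier = "LEGAL_MINIMUM"
	BestPractice ComplianceTier = "BEST_PRACTICE"
	Optional     ComplianceTier = "OPTIONAL"
)

// Criterion mirrors the field list above (JSON keys assumed to equal the field names).
type Criterion struct {
	Criterion          string         `json:"criterion"`
	LegalBasis         string         `json:"legal_basis"`
	VerificationMethod string         `json:"verification_method"` // axis 1
	DecisionMethod     string         `json:"decision_method"`     // axis 2
	ComplianceTier     ComplianceTier `json:"compliance_tier"`     // axis 3
	Weight             float64        `json:"weight"`              // stored, not used for gating
}

// exampleJSON is a made-up tiered_criteria value for one control.
const exampleJSON = `[
  {"criterion": "Recipients or categories of recipients are named",
   "legal_basis": "Art. 13(1)(e) DSGVO", "verification_method": "CONTENT",
   "decision_method": "LLM", "compliance_tier": "LEGAL_MINIMUM", "weight": 1.0},
  {"criterion": "Recipients are listed by name",
   "legal_basis": "Art. 13(1)(e) DSGVO", "verification_method": "CONTENT",
   "decision_method": "LLM", "compliance_tier": "BEST_PRACTICE", "weight": 0.3}
]`

// parseTieredCriteria decodes the jsonb array and rejects unknown tiers.
func parseTieredCriteria(raw []byte) ([]Criterion, error) {
	var cs []Criterion
	if err := json.Unmarshal(raw, &cs); err != nil {
		return nil, err
	}
	for i, c := range cs {
		switch c.ComplianceTier {
		case LegalMinimum, BestPractice, Optional:
		default:
			return nil, fmt.Errorf("criterion %d: unknown compliance_tier %q", i, c.ComplianceTier)
		}
	}
	return cs, nil
}

func main() {
	cs, err := parseTieredCriteria([]byte(exampleJSON))
	if err != nil {
		panic(err)
	}
	for _, c := range cs {
		fmt.Printf("%s  %s (%s)\n", c.ComplianceTier, c.Criterion, c.LegalBasis)
	}
}
```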
## 3. The three axes

Each criterion carries three **independent** classifications:

1. **`verification_method`**, artifact-dependent: CONTENT · FIELD · REFERENCE ·
   BEHAVIOR · PRESENTATION · PROCESS · TECHNICAL · CONTRACTUAL. See
   [`verification_method.md`](verification_method.md).
2. **`decision_method`**, i.e. which checker: REGEX · EMBEDDING · LLM · LINK_RESOLVER ·
   PLAYWRIGHT · AUDIT · SCANNER. See [`platform_checker_matrix.md`](platform_checker_matrix.md).
3. **`compliance_tier`** *(new, this document)*, i.e. bindingness:
   - **`LEGAL_MINIMUM`**: legally required. Affects the compliance status.
   - **`BEST_PRACTICE`**: recommended, not legally required. Appears as a
     recommendation. **Never** affects the status.
   - **`OPTIONAL`**: convenience or extra detail. Recommendation. **Never** affects the status.

Axes 1 + 2 apply primarily **per criterion** (atomic); a single control can mix criteria
with different methods.

## 4. Status computation (3 states): gating ONLY on LEGAL_MINIMUM

Let `LM` be the set of `LEGAL_MINIMUM` criteria of a control, and `met(LM)` the number of
those that are met:

```
ERFÜLLT   := |LM| > 0 and met(LM) == |LM|   (fulfilled: all mandatory criteria met)
TEILWEISE := 0 < met(LM) < |LM|             (partial: at least one met, at least one missing)
FEHLT     := |LM| > 0 and met(LM) == 0      (missing: no mandatory criterion met)
```

`BEST_PRACTICE`/`OPTIONAL` criteria do **not** enter this computation. They
are reported separately as recommendations (§5, level 2).

> **Invariant:** A fulfilled legal minimum must NEVER be pulled down to FEHLT/red by missing
> best-practice or optional criteria.

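The three-state rule maps directly onto code. The Go sketch below counts only `LEGAL_MINIMUM` criteria for the status. Unmet `BEST_PRACTICE`/`OPTIONAL` criteria become a recommendation count instead, so the invariant holds by construction. The model does not define the case `|LM| = 0`; returning an empty status there is an assumption. All type names are hypothetical.

```go
package main

import "fmt"

type Tier string

const (
	LegalMinimum Tier = "LEGAL_MINIMUM"
	BestPractice Tier = "BEST_PRACTICE"
	Optional     Tier = "OPTIONAL"
)

type Status string

const (
	Erfuellt  Status = "ERFÜLLT"   // fulfilled: all LM criteria met
	Teilweise Status = "TEILWEISE" // partial: some, but not all, LM criteria met
	Fehlt     Status = "FEHLT"     // missing: no LM criterion met
	NoStatus  Status = ""          // |LM| == 0 is not defined by the model (assumption)
)

// evaluated is a hypothetical checker result for one criterion.
type evaluated struct {
	Tier Tier
	Met  bool
}

// controlStatus returns the legal status and the number of recommendations.
func controlStatus(criteria []evaluated) (Status, int) {
	total, met, recommendations := 0, 0, 0
	for _, c := range criteria {
		switch {
		case c.Tier == LegalMinimum:
			total++
			if c.Met {
				met++
			}
		case !c.Met:
			recommendations++ // unmet BEST_PRACTICE/OPTIONAL: recommendation only
		}
	}
	switch {
	case total == 0:
		return NoStatus, recommendations
	case met == total:
		return Erfuellt, recommendations
	case met == 0:
		return Fehlt, recommendations
	default:
		return Teilweise, recommendations
	}
}

func main() {
	// Both legal minimums met, one best practice missing.
	status, recs := controlStatus([]evaluated{
		{LegalMinimum, true},
		{LegalMinimum, true},
		{BestPractice, false},
	})
	fmt.Println(status, recs) // ERFÜLLT 1
}
```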
## 5. Reporting: three levels

| Level | Content | Source |
|-------|---------|--------|
| **1: Compliance status (legal)** | ERFÜLLT / TEILWEISE / FEHLT | ONLY `LEGAL_MINIMUM` |
| **2: Optimization potential** | "Recommendations: N · best-practice coverage X %" | `BEST_PRACTICE` + `OPTIONAL` |
| **3: Risk maturity** *(optional, later)* | "Maturity Y %" for CRA/NIS2/ISO 27001/TOM | weighted, see §6 |

**Anti-pattern (forbidden):** no "compliance score = 72 %" when all legal
requirements are met. Such a score prompts "which 28 % are missing?", the answer is "none of them
are actually mandatory", and the score becomes worthless.

### Color semantics (meaning, not judgment)

- **Green** = legal requirements met (obligation fulfilled)
- **Blue** = recommended improvements available (optimization possible)
- **Red** = legal requirements missing (obligation violated)

`TEILWEISE` gets its own visual state (e.g. yellow/amber): obligation partially
fulfilled. This fits the BreakPilot tone (no panic red) and the
3-tier obligation model (Pflicht/Empfehlung/Kann: mandatory/recommended/optional).

## 6. `weight`

Today `weight` is **stored, but not used for gating**. This is a deliberate
decision: weights immediately provoke "why 0.3 and not 0.4?" debates. The field
is reserved for **level 3 (maturity)**: later, a weighted
best-practice/maturity percentage can be computed from it. Guide values: LEGAL_MINIMUM 1.0 ·
BEST_PRACTICE ~0.3 · OPTIONAL ~0.1.

## 7. compliance_tier is a PLATFORM axis

It is not just a DSE fix. The same pattern appears everywhere: DSE (minimum vs. best practice),
cookies (disclosure vs. transparency), imprint (mandatory vs. convenience fields), AGB
(required vs. recommended), and in the future CRA/NIS2/Machinery Regulation.
Every individual criterion carries `compliance_tier`, and the platform evaluates
**compliance / recommendations / maturity** independently of the regulation.

## 8. Validation evidence (pilot, 2026-06-22)

Written on macmini (`generation_metadata.tiered_criteria`, prod-guarded) and measured
against the Opus GT (ikea/ob/teamviewer):

- **5 pilot controls** (SEC-7285-A03, SEC-3257-A01, portability cluster
  DATA-1613/DATA-2552/COMP-2087): all **6 disagreement cases** (previously a false FEHLT)
  move to **ERFÜLLT + recommendations**. Real gaps correctly stay FEHLT, with no
  change to the checker.
- **TEILWEISE validation** (DATA-1445-A02, SEC-4752-A02): the third status occurs in practice
  (1 ERFÜLLT / 5 TEILWEISE). The deciding criterion is consistently "storage period per purpose"
  (Art. 13(2)(a)).
- Lesson: even pilot criteria can mix minimum and best practice
  ("storage period *per purpose*"). Where the LM/BP line falls is a **product policy decision
  (made by a human)**, not an NLP problem. The model is correct; sharpening the criteria is
  curation work.

## 9. Invariants (do not violate)

1. The control UUID stays stable: **no** physical split.
2. The status (green/yellow/red) depends **exclusively** on `LEGAL_MINIMUM`.
3. `BEST_PRACTICE`/`OPTIONAL` produce recommendations, **never** a FEHLT status.
4. No percentage compliance score when all legal requirements are met.
5. Storage in `generation_metadata` (jsonb): no schema migration.

## 10. Rollout (after this freeze)

1. Tier **10–15** of the worst overloaded DSE controls (not all 49 at once).
2. Wire the 3-status logic into the live DSE engine (today it exists only in the measurement harness).
3. Re-run the benchmark: FP / FN / precision / recall + status distribution.
4. Only once the effect is stable, roll out to all 49 overloaded controls.
@@ -1,89 +0,0 @@
# Obligation Aggregation — Validated Shadow Results (2026-06-24)

Status: **proven in shadow mode on macmini**, NOT deployed, NOT switched live.
The code is on branch `feat/obligation-aggregation`. The LLM tiering of the recipients/transfer
controls exists only as a DB marker on macmini.

This state validates the execution of the [Legal Obligation Layer v1](legal_obligation_layer_v1.md)
across four interlocking layers.

## The four layers

1. **Obligation aggregation**: `compliance/services/obligation_aggregation.py`.
   Aggregates criterion/control assessments into findings at the OBLIGATION level
   (Regulation → Obligation → Control → Criterion). Redundancy collapses via OR per
   `legal_basis` requirement. Status is fail-safe (MET/PARTIAL/FAILED/NA/UNDETERMINED/OPEN).
2. **Applicability**: `compliance/services/obligation_applicability.py`.
   Predicates (`has_third_country_transfer`, `uses_legitimate_interest`, `direct_marketing`,
   `legitimate_interest_or_public_task`) decide conditional obligations → True/False/None.
   Unknown means applicable; there is never a silent NA.
3. **Recall-limited segregation**: `compliance/services/obligation_taxonomy.py` +
   `specialist_agents/dse/_obligation_shadow.py`.
   `decision_method_required=LLM` honestly splits FAILED into `failed_by_current_checker`
   (a real gap) and `recall_limited` (the checker cannot verify with its current method).
4. **Targeted LLM fix**: recipients/transfer controls with `tiered_criteria`
   (decision_method=LLM) → layer 3 uses the **Haiku sufficiency judge** instead of keyword/embedding.

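Layers 1 to 3 can be sketched together in Go; the real services are Python. The sketch makes these assumptions:

- The `controlResult` type is hypothetical, and a requirement is keyed by its `legal_basis` string.
- Only MET/PARTIAL/FAILED/NA are really modeled. UNDETERMINED appears only as a placeholder for "nothing evaluated", and OPEN not at all.
- An all-failed obligation counts as `recall_limited` as soon as one control required LLM but was checked by another method.

```go
package main

import "fmt"

// controlResult is a hypothetical, trimmed-down evaluation record.
type controlResult struct {
	LegalBasis             string // requirement key; redundant controls share it
	Met                    bool
	DecisionMethodRequired string // e.g. "LLM"
	CurrentChecker         string // method that actually ran, e.g. "KEYWORD", "EMBEDDING", "LLM"
}

// applicable implements the tri-state rule: nil (unknown) counts as applicable.
func applicable(predicate *bool) bool {
	return predicate == nil || *predicate
}

// collapseByLegalBasis ORs redundant controls: a requirement is met if any control for it is met.
func collapseByLegalBasis(results []controlResult) map[string]bool {
	met := map[string]bool{}
	for _, r := range results {
		met[r.LegalBasis] = met[r.LegalBasis] || r.Met
	}
	return met
}

func obligationStatus(predicate *bool, results []controlResult) string {
	if !applicable(predicate) {
		return "NA"
	}
	reqs := collapseByLegalBasis(results)
	if len(reqs) == 0 {
		return "UNDETERMINED" // placeholder assumption: nothing was evaluated
	}
	met := 0
	for _, ok := range reqs {
		if ok {
			met++
		}
	}
	switch {
	case met == len(reqs):
		return "MET"
	case met > 0:
		return "PARTIAL"
	}
	// Everything failed: separate checker recall limits from real gaps.
	for _, r := range results {
		if r.DecisionMethodRequired == "LLM" && r.CurrentChecker != "LLM" {
			return "FAILED/recall_limited"
		}
	}
	return "FAILED/failed_by_current_checker"
}

func main() {
	transfer := []controlResult{
		{"Art. 13(1)(f) DSGVO", false, "LLM", "KEYWORD"},
		{"Art. 13(1)(f) DSGVO", false, "LLM", "EMBEDDING"},
	}
	noTransfer := false
	fmt.Println(obligationStatus(nil, transfer))         // FAILED/recall_limited
	fmt.Println(obligationStatus(&noTransfer, transfer)) // NA
}
```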
## Shadow numbers (7 companies, live engine, keyword/embedding)

| Metric | Value |
|--------|-------|
| legacy control findings | 136 |
| obligation findings | 29 |
| **Collapse** | **4.7×** |
| of which real gaps | 23 |
| of which recall_limited | 6 (only 2/7 companies, only third-country/safeguards) |
| MET (FP correction) | 46 |
| N/A (applicability) | 2 |

`recall_limited` is small and concentrated. It consists exclusively of `third_country_transfer_disclosed` /
`safeguards_disclosed` / `safeguards_accessible`, each at 2/7 companies. `recipients_disclosed`
never showed up as recall_limited, because keyword/embedding matching handles it adequately.

|
|
||||||
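The segregation can be pictured as a small post-processing step over FAILED results. This sketch assumes that each result records two things: the decision method the taxonomy requires, and the method the checker actually ran. `ObligationResult` and `checker_method` are hypothetical names, not the fields used in `_obligation_shadow.py`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ObligationResult:
    company: str
    obligation_id: str
    status: str                              # "MET", "FAILED", "NA", ...
    decision_method_required: Optional[str]  # e.g. "LLM" from the taxonomy
    checker_method: str                      # method that actually ran: "KEYWORD", "EMBEDDING", "LLM"


def segregate(result: ObligationResult) -> Optional[str]:
    """Split FAILED into a real gap vs. a checker that cannot see the evidence."""
    if result.status != "FAILED":
        return None
    if result.decision_method_required == "LLM" and result.checker_method != "LLM":
        return "recall_limited"
    return "failed_by_current_checker"


def tally(results: Iterable[ObligationResult]) -> Counter:
    """Count FAILED results per bucket, ignoring non-FAILED statuses."""
    return Counter(bucket for bucket in map(segregate, results) if bucket is not None)
```

Once the checker itself runs the LLM judge, the same FAILED result lands in `failed_by_current_checker`. This is why the segregation becomes obsolete after LLM tiering becomes the default.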
## Targeted LLM fix: validation (teamviewer + safetykon)

Recall defect diagnosis (teamviewer): the third-country/safeguards disclosure sits compactly in a
single paragraph ("…outside the EU/EEA … standard contractual clauses/safeguards"), yet
**0/22 controls** matched. Keyword matching (vocabulary mismatch) and embedding (cos 0.49–0.57,
sometimes the wrong chunk) both fail. Tuning thresholds cannot fix this → CONTENT/LLM class.

After LLM tiering (Haiku judge):

| | before (kw+emb) | after (LLM) |
|---|---|---|
| teamviewer findings | 5 | **0** |
| teamviewer recall_limited | 3 | **0** |
| safetykon findings | 7 | **4** |
| safetykon recall_limited | 3 | **0** |

- **teamviewer → 0 findings:** the privacy policy (DSE) is genuinely compliant on these obligations; the 5 old
  findings were false positives of the keyword/embedding checker.
- **safetykon → 4 (no over-correction):** third country/safeguards → MET, but
  `art20_right_exists_core` + `art20_machine_readable_format` remain **FAILED** (a real
  portability gap), and `legitimate_interest_disclosed` → **NA** (applicability).

## Model used

The tiered/sufficiency path is **hard-wired to Claude Haiku 4.5**
(`checkers/router.py:build_spec` sets `extra.judge="haiku"` for CONTENT/LLM →
`llm_checker._haiku` → `_call_anthropic`; validated judge: precision 0.89, recall 0.91; decision
2026-06-22). It does **not** use the OVH cascade (35b/120b), and **not** Opus. Consequence: the fix
reproduces identically everywhere, but it needs a valid Anthropic key for the
Haiku judge, on dev as well.

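The routing rule can be illustrated as follows. `build_spec` and `extra.judge="haiku"` come from the text. The `Spec` dataclass, the string values for check class and decision method, and the `require_anthropic_key` guard are hypothetical. The guard shows one way to make a missing key on dev fail loudly instead of going unnoticed.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping


@dataclass
class Spec:
    check_class: str      # e.g. "CONTENT"
    decision_method: str  # e.g. "LLM", "EMBEDDING", "REGEX"
    extra: Dict[str, str] = field(default_factory=dict)


def build_spec(check_class: str, decision_method: str) -> Spec:
    spec = Spec(check_class, decision_method)
    if (check_class, decision_method) == ("CONTENT", "LLM"):
        # hard-wired: always the Haiku sufficiency judge, never the OVH cascade or Opus
        spec.extra["judge"] = "haiku"
    return spec


def require_anthropic_key(env: Mapping[str, str] = os.environ) -> str:
    """Raise early if the Haiku judge has no key, instead of failing mid-run."""
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set: the Haiku judge cannot run")
    return key
```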
## Next operational block (gated, NOT executed)

```
Deploy window free (other session finished)
        ↓
replicate the dev DB tiering (the 22 recipients/transfer controls)
        ↓
confirm the Haiku judge on dev (valid Anthropic key, NOT the OVH path)
        ↓
keep shadow mode active (telemetry), product behavior unchanged
        ↓
only then plan the switchover
```

Follow-up cleanup: once LLM tiering is the default, the `recall_limited` segregation for
these 4 obligations becomes obsolete (FAILED then means a real gap, not a recall limitation).

@@ -1,77 +0,0 @@
# Obligation Discovery Pipeline v1

A **generic procedure for deriving a regulatory ontology** (Legal Obligation
Registry) from large compliance corpora. Validated across three domains (SBOM, Vulnerability
Handling, Authentication). It produces the citable middle layer described in
[obligation_registry_v1.md](obligation_registry_v1.md).

## Architecture rule (non-negotiable)

```
RUNTIME stays deterministic   (Document → Embedding → LLM judge → Finding)
DISCOVERY may be LLM-assisted (Controls → … → LLM synthesis → Obligation Registry)
```

Discovery runs **once/offline** with the strongest model; the runtime checking engine
is not affected by it. Two separate problems, one shared language (the obligation).

## Stages (`scripts/obligation_discovery/`)

| Stage | Script | Task | Key |
|---|---|---|---|
| 1 | `precluster.py` | Controls (scope) → embedding (cached) → **micro-clusters** | – |
| 2 | `meta_cluster.py` | Micro-clusters → **review units** (scaling fix for large domains) | – |
| 3 | `synthesize_obligations.py` | Review units → Opus → **obligation candidates** | ENV |
| 4 | `validate_registry.py` | Robustness checks | – |
| 5 | `merge_review_diff.py` | Merge proposed relationship edges with deduplication | – |

Pure, unit-tested helpers live in `_core.py`. Run the scripts inside the `bp-compliance-backend` container
(`PYTHONPATH=/app`); the key comes from `ANTHROPIC_API_KEY` (never hard-coded).

## Two-stage clustering = the scaling fix

A flat single-threshold pre-cluster plus ONE LLM synthesis call does NOT scale to large
domains. The solution is one additional hierarchy level. **Review unit ≠ meta-cluster**: the review unit
is what the LLM sees (decoupled from the clustering, so it can be merged or split later).

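A minimal sketch of two-stage clustering, assuming greedy leader clustering by cosine similarity at both levels. Stage 1 groups controls into micro-clusters with a strict threshold. Stage 2 groups the micro-cluster centroids into review units with a looser threshold. The actual algorithms in `precluster.py` and `meta_cluster.py` may differ. Apart from `--meta-thr 0.62` (used in the end-to-end example), the thresholds here are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: Sequence[Vector]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def leader_cluster(vectors: Sequence[Vector], threshold: float) -> List[List[int]]:
    """Greedy single-threshold clustering: join the first cluster whose centroid is similar enough."""
    clusters: List[List[int]] = []
    centroids: List[List[float]] = []
    for i, vec in enumerate(vectors):
        for c, cen in enumerate(centroids):
            if cosine(vec, cen) >= threshold:
                clusters[c].append(i)
                centroids[c] = centroid([vectors[j] for j in clusters[c]])
                break
        else:  # no existing cluster was close enough: open a new one
            clusters.append([i])
            centroids.append(list(vec))
    return clusters


def two_stage(vectors: Sequence[Vector], micro_thr: float, meta_thr: float) -> Tuple[List[List[int]], List[List[int]]]:
    # Stage 1: controls -> micro-clusters (strict threshold)
    micro = leader_cluster(vectors, micro_thr)
    # Stage 2: cluster the micro-cluster centroids (looser threshold) -> meta-clusters
    micro_centroids = [centroid([vectors[j] for j in m]) for m in micro]
    meta = leader_cluster(micro_centroids, meta_thr)
    # Review units: flattened control indices per meta-cluster, kept as an independent list
    review_units = [[i for m in group for i in micro[m]] for group in meta]
    return micro, review_units
```

The review units come back as plain lists of control indices, separate from the meta-cluster structure. A reviewer can therefore merge or split them without re-running the clustering.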
## Documented milestones

| Domain | Controls | → Clusters/review units | → Obligations | vs ground truth |
|---|---|---|---|---|
| **SBOM** | 258 | 86 micro | 12 (→ 11 final) | manual ~10: **reproduced + refined** |
| **Vulnerability** | 531 | 200 micro | 8 | manual ~7: **reproduced** |
| **Authentication** | 4408 | 2134 micro → **170 review units** | 54 → curation **29** | scaling: **generalized** |

## The hard tier rule generalizes

`LEGAL_MINIMUM` only with a primary-law anchor (`legal_basis`), otherwise `BEST_PRACTICE` /
`IMPLEMENTATION_GUIDANCE` / `EVIDENCE`. Authentication shows why this matters: there are only **6** hard
obligations (the CRA requires "appropriate authentication"), while MFA/password/session/crypto are
`guidance_basis`. This lets the advisor say: *"The law requires protection against unauthorized
access; MFA is a recognized way to implement it, but not an obligation stated in the CRA's wording."*

## Curation (large domains)

The synthesis is allowed to over-split; a **rule-based curation pass that needs no API key** then condenses the result:
crypto micro-mechanisms → `guidance_basis`; audit/evidence topics → `evidence` facet;
mechanism families stay; out-of-domain items (eID/PSD2) → `out_of_scope`; LEGAL_MINIMUM
stays untouched.

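The curation rules can be sketched as an ordered list of deterministic checks. The marker vocabularies below are hypothetical placeholders. Only the ordering principle follows the text: LEGAL_MINIMUM is checked first and left untouched, then out-of-scope items, crypto mechanisms and evidence topics. Matching is done on whole tokens so that short markers like `eid` do not fire inside longer words.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# hypothetical marker vocabularies; real rules would be curated per domain
OUT_OF_SCOPE_MARKERS = {"eid", "psd2"}
CRYPTO_MARKERS = {"cipher", "hash", "hashing", "tls", "aes"}
EVIDENCE_MARKERS = {"audit", "evidence", "pentest"}


@dataclass
class Candidate:
    obligation_id: str
    name: str
    tier: str


def _tokens(candidate: Candidate) -> set:
    return set(re.findall(r"[a-z0-9]+", f"{candidate.obligation_id} {candidate.name}".lower()))


def curate(candidate: Candidate) -> str:
    """Return the curation decision for one synthesized obligation candidate."""
    if candidate.tier == "LEGAL_MINIMUM":
        return "keep"  # hard obligations are never touched by curation
    tokens = _tokens(candidate)
    if tokens & OUT_OF_SCOPE_MARKERS:
        return "out_of_scope"
    if tokens & CRYPTO_MARKERS:
        return "guidance_basis"
    if tokens & EVIDENCE_MARKERS:
        return "evidence_facet"
    return "keep"  # mechanism families stay as obligations


def curate_all(candidates: List[Candidate]) -> List[Tuple[str, str]]:
    return [(c.obligation_id, curate(c)) for c in candidates]
```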
## Lessons

- Large Opus calls need **streaming** (`messages.stream`); the SDK blocks non-streaming requests
  with `max_tokens` > ~8k, failing with "Streaming is required for operations that may take longer than 10 minutes".
- Provenance per obligation (`source_meta_cluster`, `discovery_confidence`, `llm_model`,
  `synthesis_version`) supports later evolution (CRA update, model change).
- `>8 obligations / review unit` → automatic review warning (over-split indicator).
- Embedding cache (pickle) → THR2 sweeps without re-embedding.

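Three of these lessons translate into small helpers, sketched below. The over-split limit of 8 comes from the text. The cache path, the `Embedder` callable and the default `max_tokens=16000` are hypothetical. The streaming function uses the Anthropic Python SDK's `messages.stream` helper and deliberately leaves the model ID to the caller.

```python
import hashlib
import os
import pickle
import tempfile
from typing import Callable, Dict, List, Sequence

OVER_SPLIT_LIMIT = 8  # ">8 obligations / review unit" triggers a review warning
DEFAULT_CACHE = os.path.join(tempfile.gettempdir(), "obligation_discovery_embeddings.pkl")  # hypothetical path

Embedder = Callable[[Sequence[str]], List[List[float]]]


def over_split_warnings(obligations_per_unit: Dict[str, int]) -> List[str]:
    """Review units whose synthesis produced suspiciously many obligations."""
    return sorted(unit for unit, n in obligations_per_unit.items() if n > OVER_SPLIT_LIMIT)


def _key(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_or_embed(texts: Sequence[str], embed: Embedder, cache_path: str = DEFAULT_CACHE) -> List[List[float]]:
    """Pickle cache keyed by text hash, so threshold sweeps never re-embed known texts."""
    cache: Dict[str, List[float]] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            cache = pickle.load(fh)
    missing = [t for t in dict.fromkeys(texts) if _key(t) not in cache]  # dedupe, keep order
    if missing:
        for text, vector in zip(missing, embed(missing)):
            cache[_key(text)] = vector
        with open(cache_path, "wb") as fh:
            pickle.dump(cache, fh)
    return [cache[_key(t)] for t in texts]


def synthesize_streaming(prompt: str, model: str, max_tokens: int = 16000) -> str:
    """Large synthesis calls go through messages.stream to avoid the non-streaming guard."""
    import anthropic  # lazy import: only the synthesis stage needs the SDK

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        return "".join(stream.text_stream)
```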
## End-to-end example

```bash
# inside the bp-compliance-backend container, PYTHONPATH=/app, cwd = scripts/obligation_discovery
python3 precluster.py --scope auth
python3 meta_cluster.py --scope auth --meta-thr 0.62   # → /tmp/auth_review_units.json (inspect it!)
ANTHROPIC_API_KEY=… python3 synthesize_obligations.py \
  --units /tmp/auth_review_units.json --regulation CRA --theme "Authentisierung" --out /tmp/auth_registry.json
python3 validate_registry.py /tmp/auth_registry.json
```

@@ -1,130 +0,0 @@
# Obligation Registry v1: schema, citability, two-graph architecture

Status: **spec frozen (2026-06-24)**. Builds on
[legal_obligation_layer_v1.md](legal_obligation_layer_v1.md) +
[obligation_aggregation_validation.md](obligation_aggregation_validation.md).
The Obligation Discovery Pipeline v1 is validated against ground truth
(SBOM 12 vs 10, Vuln 8 vs 7, out_of_scope + conditional applicability correct).

## Guiding principle

**The legal obligation is the platform's core knowledge object**, not the master
control. Controls are checking strategies / detection patterns / evidence collectors FOR an obligation.
Without citability the registry cannot be relied on professionally, because the first customer question is
always "**Where does it say that?**".

## Two assets, two graphs, ONE join (connect, don't merge)

- **Asset 1, Compliance Knowledge** (already built): 313k atomic controls, 33k master
  controls, ~14k mapped to use cases, dedup, obligation layer, applicability, tiering, G/C/E (governance/capability/evidence).
- **Asset 2, citable knowledge base** (being built in another session): document → chunk →
  paragraph → span → citation.

The two are **NOT merged** (that would be like exporting a normalized DB to CSV and
re-importing it). They are **coupled via the obligation**:

```
GRAPH 1: Legal Knowledge Graph (chat/advisor)     GRAPH 2: Compliance Execution Graph (engine)
Regulation → Annex/Article → Paragraph → Span     Obligation → Control → Criterion → Evidence → Finding
                              \                   /
                               \____ LEGAL OBLIGATION ____/   ← shared language (the join)
```

Chat: "this statement comes from paragraph X." · Engine: "this obligation is not met." →
both refer to THE SAME `obligation_id`.

## Registry schema v1

```yaml
id:                  # snake_case, regulation-agnostic (e.g. sbom_complete)
name:                # short
description:         # 1 sentence
tier:                # LEGAL_MINIMUM | BEST_PRACTICE | IMPLEMENTATION_GUIDANCE | EVIDENCE
family:              # organizational aid (e.g. sbom, vulnerability_handling)
applicability:       # universal | conditional:<pred> | domain:<x>
facets:              # which evidence facets the obligation covers
  governance: bool
  capability: bool
  evidence: bool
legal_basis:         # PRIMARY LAW: mandatory (at least 1 anchor for LEGAL_MINIMUM)
  - source: CRA
    regulation_code: eu_2024_2847
    article: ""          # if applicable
    annex: "Annex I, Part II"
    section: ""
    paragraph: ""
    span_id: ""          # hard anchor into the citable knowledge base (Asset 2)
    document_version: ""
    citation: ""         # human-readable
guidance_basis:      # SECONDARY: implementation/best practice, NOT an obligation
  - source: NIST SSDF
    anchor: ""
    role: best_practice  # implementation_guidance | best_practice
member_controls:     # control_uuids (checking logic from Asset 1)
citation_anchor_ids: # span/paragraph anchors (Asset 2), on the OBLIGATION, NOT on controls
relationships:       # see relationship graph
decision_method:     # CONTENT/LLM | CONTENT/EMBEDDING | FIELD/REGEX | BEHAVIOR/PLAYWRIGHT ...
out_of_scope: []     # excluded clusters + rationale
```

## Citability hangs on the OBLIGATION (not on controls)

258 SBOM controls → 11 obligations: only the **obligation** stores
`CRA / Annex I / Paragraph X / chunk_id / span_id / document_version`. The 258 controls only point
to the `obligation_id`. Consequence: **a regulatory change (CRA v1→v2) means swapping the `citation_anchor`;
the controls stay identical.** This saves a lot of maintenance and keeps versions stable.

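The effect of keeping citations on the obligation can be shown with three small dataclasses. The class and function names are hypothetical; the anchor fields mirror the list in the text. Controls carry only an `obligation_id`, so a regulation update replaces the anchors in one place.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class CitationAnchor:
    regulation: str
    annex: str
    paragraph: str
    chunk_id: str
    span_id: str
    document_version: str


@dataclass
class Obligation:
    obligation_id: str
    citation_anchors: List[CitationAnchor] = field(default_factory=list)


@dataclass(frozen=True)
class Control:
    control_uuid: str
    obligation_id: str  # the only link a control carries; no citation data here


def swap_regulation_version(ob: Obligation, new_anchors: List[CitationAnchor]) -> Obligation:
    """CRA v1 -> v2: return the obligation with new anchors; controls are not touched."""
    return replace(ob, citation_anchors=list(new_anchors))


def cite(control: Control, registry: Dict[str, Obligation]) -> List[CitationAnchor]:
    """Answer "where does it say that?" for a control by joining through obligation_id."""
    return registry[control.obligation_id].citation_anchors
```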
## `legal_basis` vs `guidance_basis` + `source_role`

When CRA + NIST + OWASP are merged into one obligation, the distinction between obligation / best practice /
evidence / implementation must NOT get lost. To preserve it, the discovery pipeline classifies every
member/cluster with a **`source_role`**:

```
LEGAL_BASIS     → primary law (establishes the obligation)
GUIDANCE        → NIST/OWASP/ENISA/BSI/ISO (implementation/best practice)
EVIDENCE        → proof/report/audit
IMPLEMENTATION  → technical implementation instruction
OUT_OF_SCOPE    → does not belong to the obligation (other regulation/domain)
```

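For illustration, here is a deterministic classifier for the five `source_role` values. In the pipeline the classification happens during discovery; this sketch only shows the intended split. The `kind` hint and the `PRIMARY_LAW` set (limited here to CRA) are assumptions. So is returning `None` for unknown sources, which hands the decision to a human.

```python
from enum import Enum
from typing import Optional


class SourceRole(str, Enum):
    LEGAL_BASIS = "LEGAL_BASIS"
    GUIDANCE = "GUIDANCE"
    EVIDENCE = "EVIDENCE"
    IMPLEMENTATION = "IMPLEMENTATION"
    OUT_OF_SCOPE = "OUT_OF_SCOPE"


PRIMARY_LAW = {"CRA"}  # assumption: extend with further primary-law sources per regulation
GUIDANCE_BODIES = {"NIST", "OWASP", "ENISA", "BSI", "ISO"}


def classify_source_role(source: str, in_scope: bool = True, kind: Optional[str] = None) -> Optional[SourceRole]:
    """Return the role of one member/cluster source, or None if a human must decide."""
    if not in_scope:
        return SourceRole.OUT_OF_SCOPE
    if kind == "evidence":  # proof, report, audit
        return SourceRole.EVIDENCE
    if kind == "implementation":  # technical implementation instruction
        return SourceRole.IMPLEMENTATION
    words = source.split()
    head = words[0].upper() if words else ""
    if head in PRIMARY_LAW:
        return SourceRole.LEGAL_BASIS
    if head in GUIDANCE_BODIES:
        return SourceRole.GUIDANCE
    return None  # unknown source: never silently promoted to law
```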
## HARD tier rule

An obligation becomes **`LEGAL_MINIMUM` only with at least one primary-law anchor**
(`legal_basis` not empty). Without a primary-law anchor it can be
`BEST_PRACTICE | IMPLEMENTATION_GUIDANCE | EVIDENCE`, **but never a legal obligation.**

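The rule is simple enough to enforce mechanically on every registry entry, for example in a validation step such as `validate_registry.py`. The sketch below uses the schema v1 field names `id`, `tier` and `legal_basis`. The function names and the `BEST_PRACTICE` fallback are assumptions.

```python
from typing import Dict, List

TIERS = ("LEGAL_MINIMUM", "BEST_PRACTICE", "IMPLEMENTATION_GUIDANCE", "EVIDENCE")


def tier_violations(obligation: Dict) -> List[str]:
    """Check the hard tier rule on one registry entry."""
    errors: List[str] = []
    tier = obligation.get("tier")
    if tier not in TIERS:
        errors.append(f"{obligation.get('id')}: unknown tier {tier!r}")
    if tier == "LEGAL_MINIMUM" and not obligation.get("legal_basis"):
        errors.append(f"{obligation.get('id')}: LEGAL_MINIMUM without a primary-law anchor")
    return errors


def downgrade_if_unanchored(obligation: Dict, fallback: str = "BEST_PRACTICE") -> Dict:
    """Never keep an unanchored entry as law; returns a copy with the fallback tier."""
    if obligation.get("tier") == "LEGAL_MINIMUM" and not obligation.get("legal_basis"):
        return {**obligation, "tier": fallback}
    return obligation
```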
## Relationship graph (ontology)

**Structural** (already in the pipeline): `same_obligation`, `sub_obligation`,
`applicability_variant`, `evidence_for`, `governance_for`, `out_of_scope`.

**Semantic (NEW, P2 addition):** `requires`, `implements`, `supports`,
`produces_evidence_for`, `depends_on`, `derived_from`. Examples:

```
sbom_established --supports--> vulnerability_handling --supports--> incident_reporting
authentication --requires--> credential_management
```

→ extremely valuable for the compliance advisor, because it can explain chains of obligations.

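Here is one way the advisor could explain an obligation chain: a breadth-first search over typed edges that returns the hops as readable text. The edges are the two examples in this section. `explain_chain` is a hypothetical sketch, not part of the pipeline.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source obligation, relation, target obligation)

EDGES: List[Edge] = [
    ("sbom_established", "supports", "vulnerability_handling"),
    ("vulnerability_handling", "supports", "incident_reporting"),
    ("authentication", "requires", "credential_management"),
]


def explain_chain(edges: List[Edge], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest relation path; None if no chain exists."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, rel, dst in edges:
        graph[src].append((rel, dst))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"{node} --{rel}--> {nxt}"]))
    return None
```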
## Citation-anchor pipeline (Document → Obligation, NOT Document → Control)

In addition to chunks/embeddings, the new ingest produces `paragraph_uuid`, `span_uuid`,
`document_version`, `legal_citation`, `referenced_articles` and `referenced_regulations`.
**Only after that** does Obligation Discovery run, so every newly discovered obligation immediately gets its
primary source:

```
New documents → chunking → span IDs → LLM ("which obligation(s)?") → confidence
    → review → obligation.citation_anchor_ids[]
```

The existing controls are reused; the pipeline additionally produces obligation→evidence
and obligation→citation-anchor links. **No re-ingest to rebuild controls.**

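The sketch below covers the confidence and review steps at the end of this pipeline. The `SpanProposal` type, the 0.5 confidence floor and the ordering of the review queue by confidence are assumptions. The text only fixes the order LLM → confidence → review → `citation_anchor_ids`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SpanProposal:
    span_uuid: str
    obligation_id: str
    confidence: float  # LLM confidence for "which obligation(s)?"


def enqueue_for_review(proposals: List[SpanProposal], min_confidence: float = 0.5) -> List[SpanProposal]:
    """Drop implausible proposals and show reviewers the strongest ones first."""
    kept = [p for p in proposals if p.confidence >= min_confidence]
    return sorted(kept, key=lambda p: p.confidence, reverse=True)


def approve(proposal: SpanProposal, citation_anchor_ids: Dict[str, List[str]]) -> None:
    """A reviewer's approval attaches the span to the obligation (idempotent)."""
    ids = citation_anchor_ids.setdefault(proposal.obligation_id, [])
    if proposal.span_uuid not in ids:
        ids.append(proposal.span_uuid)
```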
## Sequence (changed: registry before further cuts)

```
SBOM ✓ → Vuln ✓ → Registry v1 (THIS spec) → extend ontology/relationship graph
       → Authentication → Remote Access → Logging → Updates
```

Rationale: the schema is cheap to change now; at 300–1000 obligations every schema change becomes
expensive. Progress is measured by whether each new obligation makes the registry better,
not by the number of new controls.

-11
@@ -56,17 +56,6 @@ markdown_extensions:
```diff
 nav:
   - Start: index.md
-  - Architektur RAG:
-    - Übersicht: architecture/index.md
-    - 01 Retrieval-Pipeline: architecture/01-retrieval.md
-    - 02 Authority-Re-Ranking: architecture/02-authority.md
-    - 03 source_class: architecture/03-source-class.md
-    - 04 source_role: architecture/04-source-role.md
-    - 05 Control-Intent + Diversity: architecture/05-control-intent.md
-    - 06 Assessment: architecture/06-assessment.md
-    - 07 Confidence: architecture/07-confidence.md
-    - 08 Explainability + Supersede: architecture/08-explainability.md
-    - 09 framework_*-Layer: architecture/09-framework-layer.md
   - Services:
     - AI Compliance SDK:
       - Uebersicht: services/ai-compliance-sdk/index.md
```

@@ -1,71 +0,0 @@
{
  "schema_version": "controls_for_obligation_mapping_v1",
  "purpose": "Accepted CRA->OWASP controls (Compliance Execution Graph) for the Obligation Registry to propose the SEMANTIC control->obligation_id, replacing the coarse citation_unit interim join. Fill proposed_obligation_id per control, then we adopt it into control_mapping.obligation_id.",
  "source": "ai-compliance-sdk control_mappings, mapping_status=accepted, reviewed_by=benjamin 2026-06-25",
  "count": 7,
  "controls": [
    {
      "framework": "OWASP ASVS",
      "control": "V6.3.1",
      "source_norm": "CRA Annex I Part I (2)(c) — Schutz vor unbefugtem Zugriff",
      "citation_unit": "Annex I (2)(c)",
      "family": "auth",
      "mapping_type": "supports",
      "proposed_obligation_id": ""
    },
    {
      "framework": "OWASP ASVS",
      "control": "V6.1.1",
      "source_norm": "CRA Annex I Part I (2)(c) — Schutz vor unbefugtem Zugriff",
      "citation_unit": "Annex I (2)(c)",
      "family": "auth",
      "mapping_type": "supports",
      "proposed_obligation_id": ""
    },
    {
      "framework": "OWASP ASVS",
      "control": "V11.2.1",
      "source_norm": "CRA Annex I Part I (2)(d) — Vertraulichkeit / Verschluesselung",
      "citation_unit": "Annex I (2)(d)",
      "family": "crypto",
      "mapping_type": "supports",
      "proposed_obligation_id": ""
    },
    {
      "framework": "OWASP ASVS",
      "control": "V11.7.1",
      "source_norm": "CRA Annex I Part I (2)(d) — Vertraulichkeit / Verschluesselung",
      "citation_unit": "Annex I (2)(d)",
      "family": "crypto",
      "mapping_type": "supports",
      "proposed_obligation_id": ""
    },
    {
      "framework": "OWASP ASVS",
      "control": "V16.3.3",
      "source_norm": "CRA Annex I Part I (2)(k) — Sicherheitsrelevante Ereignisse / Logging",
      "citation_unit": "Annex I (2)(k)",
      "family": "logging",
      "mapping_type": "supports",
      "proposed_obligation_id": ""
    },
    {
      "framework": "OWASP ASVS",
      "control": "V16.3.4",
      "source_norm": "CRA Annex I Part I (2)(k) — Sicherheitsrelevante Ereignisse / Logging",
      "citation_unit": "Annex I (2)(k)",
      "family": "logging",
      "mapping_type": "supports",
      "proposed_obligation_id": ""
    },
    {
      "framework": "OWASP ASVS",
      "control": "V16.1.1",
      "source_norm": "CRA Annex I Part I (2)(k) — Sicherheitsrelevante Ereignisse / Logging",
      "citation_unit": "Annex I (2)(k)",
      "family": "logging",
      "mapping_type": "supports",
      "proposed_obligation_id": ""
    }
  ]
}
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,227 +0,0 @@
{
  "schema_version": "obligation_procedures_v1",
  "regulation": "CRA",
  "layer": "Regulation -> Legal Obligation -> Procedure -> Control -> Evidence",
  "note": "Procedure ist KEINE neue Compliance-Pflicht. LEGAL_MINIMUM liegt an der Obligation; die Procedure beschreibt, WIE sie umgesetzt wird; Evidence belegt die Umsetzung. source_role=procedural_requirement (Konvergenz mit der Legal-Knowledge-Engine der anderen Session).",
  "citation_status": "pending_span_anchor",
  "scope": "worked examples: SBOM + Vulnerability Handling",
  "procedures": [
    {
      "procedure_id": "sbom_generation_process",
      "name": "SBOM-Erstellungsprozess",
      "description": "Erzeugen einer vollstaendigen, maschinenlesbaren Software Bill of Materials fuer ein Produkt mit digitalen Elementen.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["sbom_creation", "sbom_dependency_coverage", "sbom_format_standard", "sbom_tooling_automation"],
      "steps": [
        "Komponenten und (direkte + transitive) Abhaengigkeiten inventarisieren",
        "SBOM automatisiert in der Build-/Toolchain generieren",
        "Komponenten, Versionen, Lizenzen und Lieferanten erfassen",
        "in anerkanntem maschinenlesbarem Format (CycloneDX/SPDX) ausgeben",
        "Format- und Schemavalidierung durchfuehren"
      ],
      "controls": [
        "SBOM-Datei vorhanden",
        "Format ist maschinenlesbar und standardkonform (CycloneDX/SPDX)",
        "direkte und transitive Abhaengigkeiten enthalten"
      ],
      "evidence": ["sbom.cyclonedx.json", "Format-Validierungs-Log", "Build-/Toolchain-Konfiguration"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "sbom_update_process",
      "name": "SBOM-Aktualisierungsprozess",
      "description": "Halten der SBOM aktuell ueber den Produktlebenszyklus bei Komponenten-, Versions- und Patch-Aenderungen.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["sbom_maintenance_update"],
      "steps": [
        "Komponentenaenderung erkennen (Dependency-/Patch-/Versionsaenderung)",
        "SBOM neu generieren",
        "Lieferanten-SBOMs aktualisieren",
        "neue SBOM-Version speichern",
        "SBOM in Release-Artefakte uebernehmen"
      ],
      "controls": [
        "CI prueft SBOM vorhanden",
        "SBOM-Version passt zum Release",
        "Supplier-Komponenten enthalten"
      ],
      "evidence": ["sbom.json", "CI-Log", "Release-Artefakt", "Supplier-SBOM"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "sbom_supplier_integration_process",
      "name": "Lieferanten-SBOM-Integration",
      "description": "Beschaffen und Einarbeiten von Lieferanten-/Drittkomponenten-SBOMs in die Produkt-SBOM.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["sbom_supply_chain_contracts", "sbom_dependency_coverage"],
      "steps": [
        "SBOM-Anforderung in Lieferantenvertraege aufnehmen",
        "Lieferanten-SBOMs einsammeln",
        "in die Produkt-SBOM mergen",
        "Drittkomponenten und deren Abhaengigkeiten nachverfolgen"
      ],
      "controls": [
        "vertragliche SBOM-Klausel vorhanden",
        "Lieferanten-SBOMs eingegangen",
        "Drittkomponenten in der SBOM gelistet"
      ],
      "evidence": ["Lieferantenvertrag-Klausel", "eingegangene Supplier-SBOMs", "gemergte SBOM"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "sbom_provision_process",
      "name": "SBOM-Bereitstellungsprozess",
      "description": "Zugaenglichmachen der SBOM fuer berechtigte Parteien (Nutzer, Behoerde) unter Wahrung der Vertraulichkeit.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["sbom_access_provision", "sbom_authority_provision", "sbom_confidentiality"],
      "steps": [
        "Zugangskanal definieren (Portal/API/dokumentierter Pfad)",
        "Nutzer ueber den Zugangsweg informieren",
        "auf begruendetes Verlangen der Marktueberwachungsbehoerde vertraulich bereitstellen",
        "Zugriffskontrolle und Vertraulichkeitsmassnahmen anwenden"
      ],
      "controls": [
        "Zugangspfad dokumentiert",
        "Zugriffskontrolle/Vertraulichkeit umgesetzt",
        "Behoerden-Bereitstellungsprozess definiert"
      ],
      "evidence": ["Zugangskanal-Dokumentation", "Behoerden-Anfrage-Log", "Zugriffskontroll-Konfiguration"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "sbom_conformity_documentation_process",
      "name": "SBOM in technischer Dokumentation/Konformitaet",
      "description": "Aufnehmen der SBOM in die technische Dokumentation und Verifizieren der Vollstaendigkeit fuer die Konformitaetsbewertung.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["sbom_technical_documentation", "sbom_completeness_verification"],
      "steps": [
        "SBOM in die technische Dokumentation aufnehmen",
        "Vollstaendigkeit gegen die real eingesetzte Softwarekomposition pruefen",
        "der Konformitaetsbewertung beilegen (ggf. EUCC)"
      ],
      "controls": [
        "SBOM Teil der technischen Dokumentation",
        "Vollstaendigkeit verifiziert",
        "Konformitaetsnachweis vorhanden"
      ],
      "evidence": ["technische Dokumentation", "Vollstaendigkeits-Pruefbericht", "Konformitaetsnachweis"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },

    {
      "procedure_id": "vuln_handling_process_setup",
      "name": "Schwachstellenbehandlungsprozess einrichten",
      "description": "Dokumentierten Prozess und Meldekanal (CVD) fuer die Schwachstellenbehandlung etablieren.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["vuln_handling_process"],
      "steps": [
        "dokumentierten Schwachstellenbehandlungsprozess definieren",
        "Coordinated-Vulnerability-Disclosure-Richtlinie und Meldekanal veroeffentlichen",
        "eingehende Meldungen triagieren"
      ],
      "controls": [
        "Behandlungsprozess dokumentiert",
        "Meldekanal/Kontaktstelle auffindbar (z.B. security.txt)",
        "Triage-Verfahren vorhanden"
      ],
      "evidence": ["Prozessdokument", "security.txt / Kontaktstelle", "Triage-Log"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "vuln_identification_process",
      "name": "Schwachstellen-Identifikation",
      "description": "Bekannte Schwachstellen in eingesetzten Komponenten erkennen und inventarisieren.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["vuln_identification_inventory"],
      "steps": [
        "Advisories/CVE-Feeds beobachten",
        "gegen die SBOM-Komponenten abgleichen",
        "Schwachstellen-Inventar pflegen"
      ],
      "controls": [
        "Advisory-/CVE-Monitoring aktiv",
        "SBOM-zu-CVE-Abgleich durchgefuehrt",
        "Schwachstellen-Inventar gepflegt"
      ],
      "evidence": ["CVE-Abgleich-Report", "Schwachstellen-Register"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "vuln_assessment_process",
      "name": "Schwachstellen-Bewertung/Priorisierung",
      "description": "Identifizierte Schwachstellen nach Schweregrad, Ausnutzbarkeit und Exposition bewerten und priorisieren.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["vuln_assessment_prioritization"],
      "steps": [
        "Schweregrad bewerten (z.B. CVSS)",
        "Ausnutzbarkeit/Exposition einschaetzen",
        "risikobasiert priorisieren"
      ],
      "controls": [
        "Schweregrad standardisiert bewertet",
        "risikobasierte Priorisierung vorhanden"
      ],
      "evidence": ["Bewertungsdatensatz (CVSS)", "Prioritaetenliste"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "vuln_remediation_process",
      "name": "Schwachstellen-Behebung",
      "description": "Bekannte Schwachstellen fristgerecht durch Patches/Gegenmassnahmen beheben und Sicherheitsupdates bereitstellen.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["vuln_remediation_patching"],
      "steps": [
        "Fix/Gegenmassnahme entwickeln",
        "testen",
        "Sicherheitsupdate kostenfrei und zeitnah bereitstellen",
        "bis zum Abschluss nachverfolgen"
      ],
      "controls": [
        "zeitnahe Behebung",
        "Sicherheitsupdate bereitgestellt",
        "Follow-up bis Closure"
      ],
      "evidence": ["Patch/Release", "Behebungs-Zeitleiste", "Follow-up-Log"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "vuln_disclosure_process",
      "name": "Offenlegung + Nutzerinformation",
      "description": "Koordinierte Offenlegung behobener Schwachstellen und Information der Nutzer ueber Schutzmassnahmen.",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["coordinated_vulnerability_disclosure", "vuln_info_dissemination_users"],
      "steps": [
        "Offenlegungszeitpunkt koordinieren",
        "Security Advisory / CVE-Eintrag veroeffentlichen",
        "Nutzer ueber behobene Schwachstelle und Schutzmassnahmen informieren"
      ],
      "controls": [
        "Advisory veroeffentlicht",
        "Nutzer informiert"
      ],
      "evidence": ["Security Advisory", "CVE-Eintrag", "Nutzer-Benachrichtigung"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    },
    {
      "procedure_id": "vuln_authority_reporting_process",
      "name": "Behoerdenmeldung aktiv ausgenutzter Schwachstellen",
      "description": "Aktiv ausgenutzte Schwachstellen fristgerecht an CSIRT/ENISA melden (CRA Art. 14-Kaskade).",
      "source_role": "procedural_requirement",
      "fulfills_obligations": ["exploited_vuln_reporting_authorities"],
      "applicability_note": "bedingt: nur bei aktiv ausgenutzter Schwachstelle",
      "steps": [
        "aktive Ausnutzung erkennen",
        "Fruehwarnung an CSIRT/ENISA (24h)",
        "vollstaendige Meldung (72h)",
        "Abschlussbericht (14 Tage)"
      ],
      "controls": [
        "24h-Fruehwarnung erfolgt",
        "72h-Meldung erfolgt",
        "14d-Abschlussbericht erfolgt"
      ],
      "evidence": ["CSIRT/ENISA-Meldungsbelege", "Zeitstempel der Kaskade"],
      "citation_spans": [], "citation_status": "pending_span_anchor"
    }
  ]
}

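Both JSON files use `obligation_id` as the join key, so checking procedures against registered obligations is straightforward. This is a hypothetical sketch over the JSON shapes shown in this diff (`procedures[].fulfills_obligations`, `obligation_ids[].obligation_id`). The file names are not visible here, so loading the files is left to the caller.

```python
from typing import Dict, List, Set


def obligation_ids_from(join_keys_doc: Dict) -> List[str]:
    """Registered obligation_ids from an obligation_join_keys_v1 document."""
    return [entry["obligation_id"] for entry in join_keys_doc.get("obligation_ids", [])]


def coverage(procedures_doc: Dict, obligation_ids: List[str]) -> Dict[str, List[str]]:
    """Map each obligation_id to the procedures that claim to fulfil it."""
    index: Dict[str, List[str]] = {oid: [] for oid in obligation_ids}
    for proc in procedures_doc.get("procedures", []):
        for oid in proc.get("fulfills_obligations", []):
            index.setdefault(oid, []).append(proc["procedure_id"])
    return index


def uncovered(index: Dict[str, List[str]]) -> List[str]:
    """Registered obligations without any procedure."""
    return sorted(oid for oid, procs in index.items() if not procs)


def unknown_references(procedures_doc: Dict, obligation_ids: List[str]) -> Set[str]:
    """fulfills_obligations entries that are not registered obligation_ids (broken joins)."""
    known = set(obligation_ids)
    return {
        oid
        for proc in procedures_doc.get("procedures", [])
        for oid in proc.get("fulfills_obligations", [])
        if oid not in known
    }
```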
@@ -1,423 +0,0 @@
{
  "schema_version": "obligation_join_keys_v1",
  "contract": "obligation_id ist der stabile Join-Key. Legal Knowledge Graph haengt citation_spans an obligation_id; Compliance Execution Graph mappt control_mapping.source_norm -> obligation_id. Interim-Bruecke = citation_units. obligation_id NIE neu vergeben (re-link).",
  "count": 47,
  "obligation_ids": [
    {
      "obligation_id": "sbom_creation",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I Part II (1)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "sbom_dependency_coverage",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Art. 3(36) i.V.m. Annex I Part II (1)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "sbom_format_standard",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I Part II (1)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "sbom_maintenance_update",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I Part II (1)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "sbom_completeness_verification",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "sbom_tooling_automation",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "IMPLEMENTATION"
    },
    {
      "obligation_id": "sbom_access_provision",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "sbom_authority_provision",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Art. 31 / Annex I Part II (1)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "sbom_confidentiality",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Art. 31(4)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "sbom_supply_chain_contracts",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "sbom_technical_documentation",
      "regulation": "CRA",
      "family": "sbom",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Art. 31 i.V.m. Annex VII"
      ],
      "source_role": "EVIDENCE"
    },
    {
      "obligation_id": "vuln_identification_inventory",
      "regulation": "CRA",
      "family": "vuln",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I Part II (1)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "vuln_assessment_prioritization",
      "regulation": "CRA",
      "family": "vuln",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I Part II (1)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "vuln_remediation_patching",
      "regulation": "CRA",
      "family": "vuln",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I Part II (2) & (8)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "vuln_handling_process",
      "regulation": "CRA",
      "family": "vuln",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Article 13(8) & Annex VII"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "coordinated_vulnerability_disclosure",
      "regulation": "CRA",
      "family": "vuln",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I Part II (5)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "exploited_vuln_reporting_authorities",
      "regulation": "CRA",
      "family": "vuln",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Article 14 & Article 16"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "vuln_info_dissemination_users",
      "regulation": "CRA",
      "family": "vuln",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I Part II (4) & (6)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "user_authentication_required",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I (2)(d)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "authentication_policy_documented",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "auth_exceptions_documented",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "mfa_required",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "step_up_authentication",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "privileged_op_reauth",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "strong_crypto_authentication",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I (2)(e)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "credential_lifecycle_management",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "credential_confidentiality_protection",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I (2)(e)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "password_policy",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "no_default_credentials",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I (2)(a)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "account_lockout_failed_attempts",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "server_side_validation",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "session_binding_management",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "reauth_after_inactivity",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "token_validation_lifecycle",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "mutual_authentication",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "revocation_check",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "encrypted_auth_channel",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I (2)(e)"
      ],
      "source_role": "LEGAL_BASIS"
    },
    {
      "obligation_id": "tls_certificate_auth",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "service_to_service_auth",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "auth_key_management",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "biometric_authentication",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "federated_auth_assertions",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "separate_authn_authz",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "remote_access_authentication",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "supplier_access_auth",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "personal_admin_accounts",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "BEST_PRACTICE",
      "citation_units": [],
      "source_role": "GUIDANCE"
    },
    {
      "obligation_id": "firmware_software_authentication",
      "regulation": "CRA",
      "family": "authentication",
      "tier": "LEGAL_MINIMUM",
      "citation_units": [
        "Annex I (2)(c)"
      ],
      "source_role": "LEGAL_BASIS"
    }
  ]
}
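The `contract` string makes `obligation_id` the stable join key. It uses `citation_units` as an interim bridge from a control's `control_mapping.source_norm`. The sketch below shows that bridge on a trimmed excerpt of the file (three entries, fewer fields). `check_keys`, `bridge_index` and `resolve` are hypothetical names. Exact string matching between `source_norm` and `citation_units` is an assumption. The rule that every LEGAL_MINIMUM row carries a citation unit is also an assumed invariant, although every LEGAL_MINIMUM row in this file does satisfy it.

```python
import json
from collections import defaultdict

# Trimmed excerpt of obligation_join_keys_v1 (3 of 47 entries, fields reduced).
JOIN_KEYS = json.loads("""
{
  "schema_version": "obligation_join_keys_v1",
  "count": 3,
  "obligation_ids": [
    {"obligation_id": "sbom_creation", "tier": "LEGAL_MINIMUM",
     "citation_units": ["Annex I Part II (1)"]},
    {"obligation_id": "vuln_identification_inventory", "tier": "LEGAL_MINIMUM",
     "citation_units": ["Annex I Part II (1)"]},
    {"obligation_id": "mfa_required", "tier": "BEST_PRACTICE",
     "citation_units": []}
  ]
}
""")


def check_keys(doc: dict) -> list[str]:
    """Invariant checks: declared count, unique obligation_id and (assumed rule)
    at least one citation unit per LEGAL_MINIMUM row."""
    rows = doc["obligation_ids"]
    errors = []
    if doc.get("count") != len(rows):
        errors.append(f"count={doc.get('count')} but {len(rows)} rows")
    seen = set()
    for r in rows:
        oid = r["obligation_id"]
        if oid in seen:
            errors.append(f"duplicate obligation_id: {oid}")
        seen.add(oid)
        if r["tier"] == "LEGAL_MINIMUM" and not r["citation_units"]:
            errors.append(f"{oid}: LEGAL_MINIMUM without citation_units")
    return errors


def bridge_index(doc: dict) -> dict[str, list[str]]:
    """Interim bridge: citation unit -> sorted obligation_ids (one-to-many)."""
    idx = defaultdict(list)
    for r in doc["obligation_ids"]:
        for unit in r["citation_units"]:
            idx[unit].append(r["obligation_id"])
    return {unit: sorted(oids) for unit, oids in idx.items()}


def resolve(source_norm: str, idx: dict[str, list[str]]) -> list[str]:
    """Map a control's source_norm to candidate obligation_ids (exact match only)."""
    return idx.get(source_norm, [])
```

The index is one-to-many. In the full file, `Annex I Part II (1)` alone maps to five obligations, which is why the contract uses `obligation_id`, not the citation, as the join key. Composite units such as `Art. 31 / Annex I Part II (1)` would also need splitting before exact matching can find them.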
@@ -1,114 +0,0 @@
"""Pure helpers for the Obligation Discovery Pipeline (no heavy imports → unit-testable).

The pipeline derives a regulatory ontology from large compliance corpora:
Controls → micro-clusters → meta-clusters/review units → LLM synthesis → Obligation Registry.
Architecture rule: RUNTIME stays deterministic; DISCOVERY (this tooling) may be LLM-assisted
and runs ONCE/offline. See docs-src/development/obligation_discovery_pipeline_v1.md.
"""
from __future__ import annotations

import ast
import json
import math
from typing import Optional

SEMANTIC_EDGE_TYPES = ("depends_on", "supports", "produces_evidence_for",
                       "implements", "derived_from")


def parse_req(req) -> list:
    """Robustly convert the requirements column (JSON, Python repr or plain string) to a list."""
    if isinstance(req, list):
        return req
    if isinstance(req, str):
        for fn in (json.loads, ast.literal_eval):
            try:
                v = fn(req)
                return v if isinstance(v, list) else [str(v)]
            except Exception:
                pass
        return [req]
    return []


def cosine(a, b) -> float:
    """Cosine similarity; 0.0 for empty or zero-norm vectors."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_cluster(vecs: list, thr: float) -> list[dict]:
    """Single-pass greedy clustering: each vector joins the existing cluster whose seed is
    most similar (cosine ≥ thr; on ties the later cluster wins), otherwise it starts a new
    cluster. Vectors without an embedding become singleton clusters. Deterministic
    (stable input order)."""
    clusters: list[dict] = []
    for i, v in enumerate(vecs):
        if not v:
            clusters.append({"seed": None, "members": [i]})
            continue
        best, best_sim = None, thr
        for c in clusters:
            if c["seed"] is None:
                continue
            s = cosine(v, c["seed"])
            if s >= best_sim:
                best_sim, best = s, c
        if best is not None:
            best["members"].append(i)
        else:
            clusters.append({"seed": v, "members": [i]})
    return clusters


def centroid(idxs: list[int], vecs: list) -> Optional[list]:
    """Component-wise mean of the members that have an embedding (None if none do)."""
    vs = [vecs[i] for i in idxs if vecs[i]]
    if not vs:
        return None
    n = len(vs)
    return [sum(col) / n for col in zip(*vs)]


def validate_registry(reg: dict) -> dict:
    """Reliability checks (user rules): LEGAL_MINIMUM requires legal_basis,
    member_controls must be non-empty, out_of_scope is counted separately, and review
    units with more than 8 obligations are flagged."""
    obls = reg.get("obligations", [])
    lm = [o for o in obls if o.get("tier") == "LEGAL_MINIMUM"]
    lm_without_basis = [o["id"] for o in lm if not o.get("legal_basis")]
    empty_members = [o["id"] for o in obls if not o.get("member_controls")]
    per_unit: dict[str, int] = {}
    for o in obls:
        ru = (o.get("provenance") or {}).get("source_meta_cluster")
        if ru:
            per_unit[ru] = per_unit.get(ru, 0) + 1
    over8 = {ru: n for ru, n in per_unit.items() if n > 8}
    rels = reg.get("relationships", [])
    return {
        "obligations": len(obls),
        "legal_minimum": len(lm),
        "lm_without_legal_basis": lm_without_basis,
        "empty_member_controls": empty_members,
        "over8_per_review_unit": over8,
        "out_of_scope": sum(1 for r in rels if r.get("type") == "out_of_scope"),
        "semantic_edges": sum(1 for r in rels if r.get("type") in SEMANTIC_EDGE_TYPES),
        "passed": not lm_without_basis and not empty_members and not over8,
    }


def merge_edges(relationships: list[dict], proposed: list[dict]) -> tuple[list[dict], int]:
    """Merge proposed semantic edges into relationships without duplicates. Returns (merged, added)."""
    existing = {(r.get("type"), r.get("from"), r.get("to"))
                for r in relationships if r.get("from")}
    added = 0
    out = list(relationships)
    for e in proposed:
        if e.get("type") not in SEMANTIC_EDGE_TYPES:
            continue
        key = (e["type"], e.get("from"), e.get("to"))
        if key in existing or not e.get("from") or not e.get("to"):
            continue
        out.append(e)
        existing.add(key)
        added += 1
    return out, added
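The module docstring describes a staged reduction: controls to micro-clusters to meta-clusters. `greedy_cluster` and `centroid` are its building blocks. The self-contained sketch below chains them. It first clusters with a tight threshold, then applies the same greedy rule to the micro-cluster centroids with a looser threshold. It condenses the helpers above and does not handle vectors without embeddings. The `two_stage` function and both threshold values are illustrative assumptions, not values from the pipeline.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_cluster(vecs: list[list[float]], thr: float) -> list[list[int]]:
    """Condensed greedy rule: join the most similar seed with cosine >= thr,
    otherwise open a new cluster. Returns member index lists."""
    seeds: list[list[float]] = []
    members: list[list[int]] = []
    for i, v in enumerate(vecs):
        best, best_sim = None, thr
        for k, seed in enumerate(seeds):
            s = cosine(v, seed)
            if s >= best_sim:
                best, best_sim = k, s
        if best is None:
            seeds.append(v)  # the first member becomes the seed
            members.append([i])
        else:
            members[best].append(i)
    return members


def centroid(idxs: list[int], vecs: list[list[float]]) -> list[float]:
    """Component-wise mean of the selected vectors."""
    n = len(idxs)
    return [sum(col) / n for col in zip(*(vecs[i] for i in idxs))]


def two_stage(vecs: list[list[float]], micro_thr: float = 0.95,
              meta_thr: float = 0.8) -> list[list[int]]:
    """Controls -> micro-clusters -> meta-clusters, returned as sorted control indices."""
    micro = greedy_cluster(vecs, micro_thr)
    cents = [centroid(m, vecs) for m in micro]
    meta = greedy_cluster(cents, meta_thr)  # cluster the micro-cluster centroids
    return [sorted(i for k in group for i in micro[k]) for group in meta]
```

The greedy rule compares each vector only with cluster seeds, never with running centroids, so the result depends on input order. The pipeline's determinism therefore relies on a stable ordering of the controls.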
@@ -1,90 +0,0 @@
r"""P3: Compliance advisor proof. Renders an obligation-based answer as a complete
CHAIN OF REASONING from the registry (NOT RAG text, NO LLM):
legal basis -> obligation -> procedure -> controls -> evidence -> answer.
Deterministic and citable. The difference from RAG: RAG answers; BreakPilot
justifies AND operationalizes.

  python3 scripts/obligation_discovery/advisor_proof.py --registry obligations/cra.json \
    --procedures obligations/cra_procedures.json --topic sbom --has-digital-elements
"""
from __future__ import annotations

import argparse
import json


def applies(obl: dict, has_digital: bool) -> tuple[bool, str]:
    """Return (applies, note) for the obligation's applicability string."""
    a = obl.get("applicability", "universal")
    if a == "universal":
        return True, ""
    if a.startswith("domain:products_with_digital_elements"):
        return has_digital, "nur fuer Produkte mit digitalen Elementen (CRA Art. 3)"
    if a.startswith("domain:"):
        return True, a.split(":", 1)[1]
    if a.startswith("conditional:"):
        return True, f"bedingt: {a.split(':', 1)[1]}"
    return True, ""


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--registry", required=True)
    ap.add_argument("--procedures", required=True)
    ap.add_argument("--topic", default="sbom")
    ap.add_argument("--has-digital-elements", action="store_true")
    ap.add_argument("--question", default="Muss ich als Maschinenbauer eine SBOM bereitstellen?")
    a = ap.parse_args()
    with open(a.registry, encoding="utf-8") as f:
        reg = json.load(f)
    with open(a.procedures, encoding="utf-8") as f:
        procs = json.load(f)["procedures"]

    # Obligations for the topic, and procedures indexed by the obligations they fulfil.
    obls = [o for o in reg["obligations"]
            if a.topic in o.get("family", "") or a.topic in o["id"]]
    ids = {o["id"] for o in obls}
    by_obl: dict[str, list] = {}
    for p in procs:
        for oid in p.get("fulfills_obligations", []):
            by_obl.setdefault(oid, []).append(p)

    # Binding obligations that apply to this product vs. best practice.
    pflicht = [o for o in obls if o["tier"] == "LEGAL_MINIMUM" and applies(o, a.has_digital_elements)[0]]
    best = [o for o in obls if o["tier"] != "LEGAL_MINIMUM"]

    print(f"FRAGE: {a.question}")
    print(f"\nANTWORT: {'JA' if pflicht and a.has_digital_elements else 'NUR WENN CRA-anwendbar'} — "
          f"sofern das Produkt unter den CRA faellt (product with digital elements, Art. 3).")
    print("\n══ BEGRUENDUNGSKETTE (Recht → Obligation → Procedure → Controls → Evidence) ══")

    req_evidence: list[str] = []
    for o in pflicht:
        lb = "; ".join(f"{b.get('source','')} {b.get('anchor','')}".strip() for b in o.get("legal_basis", []))
        print(f"\n● PFLICHT: {o['id']} — {o.get('description','')[:80]}")
        print(f" Rechtsgrundlage: {lb or '—'}")
        ps = by_obl.get(o["id"], [])
        for p in ps:
            print(f" Procedure (wie umgesetzt): {p['procedure_id']} — Schritte: {len(p.get('steps',[]))}")
            print(f" Controls (Pruefung): {' · '.join(p.get('controls', []))[:96]}")
            print(f" Nachweis: {' · '.join(p.get('evidence', []))}")
            req_evidence += p.get("evidence", [])
        if not ps:
            print(" Procedure: (noch keine modelliert)")

    print("\n── REQUIRED EVIDENCE (aggregiert, womit wird es nachgewiesen) ──")
    print((" " + " · ".join(dict.fromkeys(req_evidence))) if req_evidence else " —")

    print("\n── BEST PRACTICE (anerkannte Umsetzung, KEINE CRA-Wortlautpflicht) ──")
    for o in best:
        gb = "; ".join(b.get("source", "") for b in o.get("guidance_basis", []))
        print(f" • {o['id']} — {o.get('description','')[:64]} | Guidance: {gb or '—'}")

    print("\n── BEZIEHUNG (warum es zaehlt) ──")
    for r in reg.get("relationships", []):
        if r.get("from") in ids and r.get("to") not in ids:
            print(f" • {r['from']} --{r['type']}--> {r['to']}: {r.get('note','')[:64]}")

    pend = sum(1 for o in pflicht if o.get("citation_status") == "pending_span_anchor")
    print(f"\n── CITATION ──\n {pend}/{len(pflicht)} Pflichten: pending_span_anchor "
          f"(Textstellen-Anker folgen mit dem zitierfaehigen Re-Ingest)")
    print("\n(RAG beantwortet — BreakPilot begruendet UND operationalisiert.)")


if __name__ == "__main__":
    main()
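The script prints the chain instead of returning it, which makes the chain hard to unit-test. The sketch below returns the same chain (legal basis, obligation, procedures, controls, evidence) as data. It assumes the registry and procedure field names the script reads: `id`, `family`, `tier`, `applicability`, `legal_basis`, `procedure_id`, `fulfills_obligations`, `controls` and `evidence`. `build_chain` and any sample records are hypothetical. The applicability filter mirrors `applies()`: only `domain:products_with_digital_elements` depends on the flag, and every other value applies.

```python
DIGITAL_DOMAIN = "domain:products_with_digital_elements"


def build_chain(registry: dict, procedures: list[dict],
                topic: str, has_digital: bool) -> list[dict]:
    """Return one chain entry per applicable LEGAL_MINIMUM obligation for the topic."""
    # Index procedures by the obligations they fulfil (same idea as by_obl in main()).
    by_obl: dict[str, list[dict]] = {}
    for p in procedures:
        for oid in p.get("fulfills_obligations", []):
            by_obl.setdefault(oid, []).append(p)

    chain = []
    for o in registry["obligations"]:
        if topic not in o.get("family", "") and topic not in o["id"]:
            continue  # not part of the requested topic
        if o["tier"] != "LEGAL_MINIMUM":
            continue  # best practice is reported separately
        if o.get("applicability", "universal").startswith(DIGITAL_DOMAIN) and not has_digital:
            continue  # CRA scope condition not met
        procs = by_obl.get(o["id"], [])
        chain.append({
            "legal_basis": [f"{b.get('source', '')} {b.get('anchor', '')}".strip()
                            for b in o.get("legal_basis", [])],
            "obligation": o["id"],
            "procedures": [p["procedure_id"] for p in procs],
            "controls": [c for p in procs for c in p.get("controls", [])],
            # de-duplicate evidence while keeping first-seen order
            "evidence": list(dict.fromkeys(e for p in procs for e in p.get("evidence", []))),
        })
    return chain
```

With this split, the printing in `main()` would become a thin formatting layer over a return value that tests can compare directly.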
Some files were not shown because too many files have changed in this diff.