Files

Benjamin Admin ae5c5c24eb feat(control-pipeline): add applicability demo test package with evaluator

6 priority demo cases with golden outputs, evaluator.py and run_demo.py:
- CASE-001: Webshop+Stripe (anti-PSD2 false positive)
- CASE-002: Bank+TAN-Generator (scope override for batteries)
- CASE-004: FinTech Wallet (true positive PSD2/AML)
- CASE-006: SaaS+SMS Gateway (anti-TKG false positive)
- CASE-008: Software→IoT Hardware (multi-regime scope)
- CASE-011: Embedded Finance (escalation case)

Self-test passes 6/6 against golden outputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-23 19:08:31 +02:00

README.md

Applicability Engine Demo Package

Contents

demo_cases.yaml - 6 prioritized demo and regression test cases
expected_outputs/CASE-*.json - golden outputs for the 6 cases
evaluator.py - compares actual engine outputs against the assertions
run_demo.py - simple runner
reports/ - output directory for JSON and Markdown reports

Quick start

python run_demo.py

By default this runs a self-test, using the golden outputs in expected_outputs/ as the actual outputs.

Running against real SDK outputs

For each case, put a file named CASE-XYZ.json with the following schema into one directory:

{
  "case_id": "CASE-001",
  "assigned_controls": [],
  "excluded_controls": [],
  "escalations": [],
  "inferred_industries": [],
  "confidence": {
    "overall": 0.0,
    "industry_assignment": 0.0,
    "control_assignment": 0.0
  },
  "explanation": "",
  "uncertainty_flags": []
}

Then:

python run_demo.py --actual-dir /path/to/your/outputs
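A minimal sketch of producing that directory, assuming your SDK returns each result as a Python dict: the hypothetical helper `write_actual_output` writes one CASE-XYZ.json per case in the schema above. The helper and its `result` argument are illustrations, not part of this package.

```python
import copy
import json
from pathlib import Path

# Template with every key from the schema above, filled with its empty/zero value.
EMPTY_OUTPUT = {
    "case_id": "",
    "assigned_controls": [],
    "excluded_controls": [],
    "escalations": [],
    "inferred_industries": [],
    "confidence": {
        "overall": 0.0,
        "industry_assignment": 0.0,
        "control_assignment": 0.0,
    },
    "explanation": "",
    "uncertainty_flags": [],
}


def write_actual_output(out_dir: str, case_id: str, result: dict) -> Path:
    """Hypothetical helper: write one engine result as <out_dir>/<case_id>.json.

    `result` stands in for whatever your SDK returns (assumed to be a dict).
    Schema keys missing from `result` keep their empty default; keys outside
    the schema are dropped.
    """
    record = copy.deepcopy(EMPTY_OUTPUT)
    for key in record:
        if key in ("case_id", "confidence") or key not in result:
            continue
        record[key] = result[key]
    # Merge partial confidence values instead of replacing the whole object.
    record["confidence"].update(result.get("confidence", {}))
    record["case_id"] = case_id
    path = Path(out_dir) / f"{case_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

For example, `write_actual_output("outputs", "CASE-001", result)` creates outputs/CASE-001.json; because missing keys fall back to the schema defaults, every file carries the full structure.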

Test logic

The evaluator checks:

must_assign
must_not_assign
escalate_for_legal_review
inferred_industries.must_include
inferred_industries.must_not_include
reasoning_must_contain
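The six checks can be sketched roughly as follows. This is not the package's evaluator.py; it assumes the per-case assertions are a dict keyed by the names above (with inferred_industries holding must_include and must_not_include), that controls and industries are plain string IDs, that escalate_for_legal_review is compared against whether escalations is non-empty, and that reasoning_must_contain matches substrings of explanation case-insensitively. `check_case` and `CaseResult` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    case_id: str
    failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def check_case(expected: dict, actual: dict) -> CaseResult:
    """Apply the six assertion types to one engine output (assumed layout, see lead-in)."""
    result = CaseResult(actual.get("case_id", "?"))
    assigned = set(actual.get("assigned_controls", []))
    industries = set(actual.get("inferred_industries", []))
    explanation = actual.get("explanation", "").lower()

    # must_assign / must_not_assign: membership in assigned_controls.
    for control in expected.get("must_assign", []):
        if control not in assigned:
            result.failures.append(f"must_assign: {control} missing")
    for control in expected.get("must_not_assign", []):
        if control in assigned:
            result.failures.append(f"must_not_assign: {control} assigned")

    # escalate_for_legal_review: compared with "escalations is non-empty".
    if "escalate_for_legal_review" in expected:
        want = bool(expected["escalate_for_legal_review"])
        got = bool(actual.get("escalations"))
        if want != got:
            result.failures.append(f"escalate_for_legal_review: expected {want}, got {got}")

    # inferred_industries.must_include / must_not_include.
    rules = expected.get("inferred_industries", {})
    for industry in rules.get("must_include", []):
        if industry not in industries:
            result.failures.append(f"inferred_industries.must_include: {industry} missing")
    for industry in rules.get("must_not_include", []):
        if industry in industries:
            result.failures.append(f"inferred_industries.must_not_include: {industry} present")

    # reasoning_must_contain: case-insensitive substring match on explanation.
    for phrase in expected.get("reasoning_must_contain", []):
        if phrase.lower() not in explanation:
            result.failures.append(f"reasoning_must_contain: '{phrase}' not in explanation")
    return result
```

Collecting every failure instead of stopping at the first one lets a single report list all violated assertions for a case.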

In addition, the evaluator emits warnings when a borderline case is escalated but either no uncertainty_flags are set or the confidence is implausibly high.
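The warning rule can be sketched in the same hedged way. The sketch treats any case with a non-empty escalations list as an escalated borderline case, looks only at confidence.overall, and uses a hypothetical cutoff of 0.9 for "implausibly high"; the README names no concrete threshold, and `plausibility_warnings` is an illustrative name.

```python
# Hypothetical cutoff; the README only says "implausibly high" without a number.
MAX_PLAUSIBLE_CONFIDENCE = 0.9


def plausibility_warnings(actual: dict, threshold: float = MAX_PLAUSIBLE_CONFIDENCE) -> list[str]:
    """Return soft warnings for an escalated case; never marks the case as failed."""
    warnings: list[str] = []
    if not actual.get("escalations"):
        return warnings  # only escalated (borderline) cases are checked
    if not actual.get("uncertainty_flags"):
        warnings.append("escalated but no uncertainty_flags set")
    # Assumption: only the overall confidence is compared against the cutoff.
    overall = actual.get("confidence", {}).get("overall", 0.0)
    if overall > threshold:
        warnings.append(f"escalated but confidence.overall={overall:.2f} > {threshold:.2f}")
    return warnings
```

In this sketch, warnings are returned separately from assertion failures, so they can appear in a report without failing the case.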