feat(control-pipeline): add applicability demo test package with evaluator
6 priority demo cases with golden outputs, evaluator.py and run_demo.py:
- CASE-001: Webshop+Stripe (anti-PSD2 false positive)
- CASE-002: Bank+TAN-Generator (scope override for batteries)
- CASE-004: FinTech Wallet (true positive PSD2/AML)
- CASE-006: SaaS+SMS Gateway (anti-TKG false positive)
- CASE-008: Software→IoT Hardware (multi-regime scope)
- CASE-011: Embedded Finance (escalation case)

Self-test passes 6/6 against golden outputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
control-pipeline/tests/applicability_demo/README.md
# Applicability Engine Demo Package

## Contents
- `demo_cases.yaml`: 6 prioritized demo and regression test cases
- `expected_outputs/CASE-*.json`: golden outputs for the 6 cases
- `evaluator.py`: compares actual engine outputs against the assertions
- `run_demo.py`: a simple runner
- `reports/`: target directory for JSON and Markdown reports

## Quick start
```bash
python run_demo.py
```

Without further arguments, this runs a self-test that uses the golden outputs in `expected_outputs/` as the actual outputs.

## Running against real SDK outputs
For each case, place a file named after its case ID (`CASE-XYZ.json`, e.g. `CASE-001.json`) with the following schema in one directory:

```json
{
  "case_id": "CASE-001",
  "assigned_controls": [],
  "excluded_controls": [],
  "escalations": [],
  "inferred_industries": [],
  "confidence": {
    "overall": 0.0,
    "industry_assignment": 0.0,
    "control_assignment": 0.0
  },
  "explanation": "",
  "uncertainty_flags": []
}
```
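
A consumer of this schema might check each file before evaluating it. The following is a minimal sketch, not the package's evaluator. The names `schema_problems` and `load_case_output` are hypothetical, and two rules go beyond this README: confidence values are assumed to lie in [0, 1], and `case_id` is assumed to match the file name.

```python
import json
from pathlib import Path

# Top-level keys and their JSON types, taken from the schema above.
REQUIRED_FIELDS = {
    "case_id": str,
    "assigned_controls": list,
    "excluded_controls": list,
    "escalations": list,
    "inferred_industries": list,
    "confidence": dict,
    "explanation": str,
    "uncertainty_flags": list,
}
CONFIDENCE_FIELDS = ("overall", "industry_assignment", "control_assignment")


def schema_problems(data: dict) -> list[str]:
    """Return human-readable schema problems; an empty list means the shape is OK."""
    problems = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
    confidence = data.get("confidence")
    if isinstance(confidence, dict):
        for key in CONFIDENCE_FIELDS:
            value = confidence.get(key)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"confidence.{key}: expected a number")
            elif not 0.0 <= value <= 1.0:  # assumed range, not stated in the README
                problems.append(f"confidence.{key}: outside [0, 1]")
    return problems


def load_case_output(path: Path) -> dict:
    """Load one CASE-XYZ.json file; raise ValueError if it does not fit the schema."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path.name}: top level must be a JSON object")
    problems = schema_problems(data)
    # Assumed convention: CASE-001.json must contain "case_id": "CASE-001".
    if data.get("case_id") != path.stem:
        problems.append(f"case_id {data.get('case_id')!r} does not match {path.name}")
    if problems:
        raise ValueError(f"{path.name}: " + "; ".join(problems))
    return data
```

Collecting all problems before raising, instead of stopping at the first one, gives one complete error message per file.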

Then run:

```bash
python run_demo.py --actual-dir /path/to/your/outputs
```
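
The runner's choice of input directory can be sketched as follows. Only the `--actual-dir` flag and the `expected_outputs/` self-test default come from this README. `parse_args`, `resolve_actual_dir` and `find_case_files` are hypothetical helpers, and the real `run_demo.py` may behave differently.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the applicability demo cases.")
    parser.add_argument(
        "--actual-dir",
        type=Path,
        default=None,
        help="directory with one CASE-XYZ.json per case (default: self-test)",
    )
    return parser.parse_args(argv)


def resolve_actual_dir(args: argparse.Namespace, package_dir: Path) -> Path:
    """Self-test by default: the golden outputs double as the 'actual' outputs."""
    if args.actual_dir is None:
        return package_dir / "expected_outputs"
    return args.actual_dir


def find_case_files(actual_dir: Path, case_ids: list[str]) -> dict[str, Path]:
    """Map each case ID to its CASE-XYZ.json; fail loudly if any file is missing."""
    found, missing = {}, []
    for case_id in case_ids:
        path = actual_dir / f"{case_id}.json"
        if path.is_file():
            found[case_id] = path
        else:
            missing.append(case_id)
    if missing:
        raise FileNotFoundError(f"no output file for: {', '.join(missing)}")
    return found
```

Raising on a missing file, rather than silently skipping that case, keeps an incomplete output directory from looking like a passing run.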

## Test logic
The evaluator checks the following assertions:
- `must_assign`
- `must_not_assign`
- `escalate_for_legal_review`
- `inferred_industries.must_include`
- `inferred_industries.must_not_include`
- `reasoning_must_contain`
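
A minimal sketch of these checks for a single case follows. It assumes a layout for the assertions (described in the docstring) that `demo_cases.yaml` may not actually use. It also assumes that `assigned_controls` and `inferred_industries` are lists of plain string IDs, and it matches `reasoning_must_contain` phrases case-insensitively against `explanation`. `evaluate_case` and `CaseResult` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    case_id: str
    failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def evaluate_case(expected: dict, actual: dict) -> CaseResult:
    """Check one engine output against the assertions of one demo case.

    Assumed (hypothetical) layout of `expected`:
      must_assign / must_not_assign: lists of control IDs
      escalate_for_legal_review: bool
      inferred_industries: {"must_include": [...], "must_not_include": [...]}
      reasoning_must_contain: phrases expected in the output's `explanation`
    """
    result = CaseResult(actual.get("case_id", "?"))
    assigned = set(actual.get("assigned_controls", []))
    industries = set(actual.get("inferred_industries", []))
    explanation = actual.get("explanation", "").lower()

    for control in expected.get("must_assign", []):
        if control not in assigned:
            result.failures.append(f"must_assign: {control} not assigned")
    for control in expected.get("must_not_assign", []):
        if control in assigned:
            result.failures.append(f"must_not_assign: {control} was assigned")

    # Only the positive direction is checked: escalation required but missing.
    if expected.get("escalate_for_legal_review") and not actual.get("escalations"):
        result.failures.append("escalate_for_legal_review: no escalation raised")

    industry_rules = expected.get("inferred_industries", {})
    for industry in industry_rules.get("must_include", []):
        if industry not in industries:
            result.failures.append(f"inferred_industries.must_include: {industry} missing")
    for industry in industry_rules.get("must_not_include", []):
        if industry in industries:
            result.failures.append(f"inferred_industries.must_not_include: {industry} present")

    # Case-insensitive substring match (an assumption about the real evaluator).
    for phrase in expected.get("reasoning_must_contain", []):
        if phrase.lower() not in explanation:
            result.failures.append(f"reasoning_must_contain: {phrase!r} not found")
    return result
```

Each failed assertion produces its own message, so a single report can list every deviation of a case instead of only the first one.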

In addition, the evaluator emits warnings when borderline cases are escalated but no `uncertainty_flags` are set or the confidence is implausibly high.
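
The warning rule can be sketched the same way. This sketch reads the condition as "escalated and (no flags or high confidence)". It treats any output with non-empty `escalations` as an escalated borderline case and uses a hypothetical threshold of 0.9 for "implausibly high". Warnings are returned separately and do not fail a case, which is an assumption about how the real evaluator reports them.

```python
# Hypothetical threshold: the README only says "implausibly high".
HIGH_CONFIDENCE = 0.9


def borderline_warnings(actual: dict, high_confidence: float = HIGH_CONFIDENCE) -> list[str]:
    """Return warnings for an escalated output that does not look uncertain."""
    if not actual.get("escalations"):
        return []  # not escalated, so no borderline warning applies
    warnings = []
    if not actual.get("uncertainty_flags"):
        warnings.append("escalated without uncertainty_flags")
    overall = actual.get("confidence", {}).get("overall", 0.0)
    if overall > high_confidence:
        warnings.append(f"escalated with implausibly high confidence ({overall:.2f})")
    return warnings
```

Keeping warnings apart from failures lets a run still pass 6/6 while flagging outputs whose escalation and confidence signals contradict each other.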