feat(pipeline): v3 — scoped control applicability + source_type classification

Phase 4: source_type (law/guideline/standard/restricted) on source_citation
- NIST/OWASP/ENISA correctly shown as "Standard" instead of "Gesetzliche Grundlage"
- Dynamic frontend labels based on source_type
- Backfill endpoint POST /v1/canonical/generate/backfill-source-type

Phase v3: Scoped Control Applicability
- 3 new fields: applicable_industries, applicable_company_size, scope_conditions
- LLM prompt extended with 39 industries, 5 company sizes, 10 scope signals
- All 5 generation paths (Rule 1/2/3, batch structure, batch reform) updated
- _build_control_from_json: parsing + validation (string→list, size validation)
- _store_control: writes 3 new JSONB columns
- API: response models, create/update requests, SELECT queries extended
- Migration 063: 3 new JSONB columns with GIN indexes
- 110 generator tests + 28 route tests = 138 total, all passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 16:28:05 +01:00
parent 3bb9fffab6
commit f2819b99af
9 changed files with 685 additions and 139 deletions
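
The commit message mentions string→list coercion and size validation for the three new scope fields. A minimal sketch of what such normalization could look like follows. The helper names `as_list` and `normalize_scope_fields` and the concrete values in `ALLOWED_COMPANY_SIZES` are assumptions for illustration; the commit only states that there are five company sizes and does not show the body of `_build_control_from_json`.

```python
from typing import Any

# Hypothetical size vocabulary: the commit says "5 company sizes" but does not list them.
ALLOWED_COMPANY_SIZES = {"micro", "small", "medium", "large", "enterprise"}


def as_list(value: Any) -> list[str]:
    """Coerce a bare string or a list from LLM output into a clean list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        value = [value]  # string -> single-element list
    if not isinstance(value, list):
        return []  # unexpected type: treat as "no restriction given"
    # Keep only non-empty strings, trimmed of surrounding whitespace.
    return [item.strip() for item in value if isinstance(item, str) and item.strip()]


def normalize_scope_fields(raw: dict[str, Any]) -> dict[str, list[str]]:
    """Return the three scope fields as lists, dropping unknown company sizes."""
    sizes = [size.lower() for size in as_list(raw.get("applicable_company_size"))]
    return {
        "applicable_industries": as_list(raw.get("applicable_industries")),
        "applicable_company_size": [s for s in sizes if s in ALLOWED_COMPANY_SIZES],
        "scope_conditions": as_list(raw.get("scope_conditions")),
    }
```

Returning plain lists for all three fields keeps the values directly serializable for JSONB columns, whichever shape the LLM produced.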
@@ -214,13 +214,13 @@ If you add, for example, a new `GetUserStats()` function to the Go service:

 ## Module-specific tests

-### Canonical Control Generator (81+ Tests)
+### Canonical Control Generator (98+ Tests)

 The Control Library has an extensive test suite spread across 6 files.
 See [Canonical Control Library — Tests](../services/sdk-modules/canonical-control-library.md#tests) and [Control Generator Pipeline](../services/sdk-modules/control-generator-pipeline.md) for details.

 ```bash
-# All generator tests (81 tests in 12 classes)
+# All generator tests (98 tests in 13 classes)
 cd backend-compliance && pytest -v tests/test_control_generator.py

 # Similarity Detector Tests
@@ -242,7 +242,7 @@ cd backend-compliance && pytest -v tests/test_validate_controls.py

 | Class | Tests | Checks |
 |--------|-------|--------|
-| `TestLicenseMapping` | 12 | License classification (Rule 1/2/3), case insensitivity |
+| `TestLicenseMapping` | 13 | License classification (Rule 1/2/3), case insensitivity, source_type |
 | `TestDomainDetection` | 5 | Keyword-based domain detection (AUTH, CRYP, NET, DATA) |
 | `TestJsonParsing` | 4 | JSON parser for LLM responses (Markdown fencing, preamble) |
 | `TestGeneratedControlRules` | 3 | Rule-specific fields (original_text, citation, source_info) |
@@ -254,3 +254,4 @@ cd backend-compliance && pytest -v tests/test_validate_controls.py
 | `TestRegulationFilter` | 5 | regulation_filter prefix matching, empty regulation_codes |
 | `TestPipelineVersion` | 5 | pipeline_version=2 in DB writes, null handling in Structure/Reform |
 | `TestRecitalDetection` | 10 | Recital detection in source texts (regex, phrases, combined) |
+| `TestSourceTypeClassification` | 16 | law/guideline/standard/restricted classification of all source types |
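
As a rough illustration of what `TestSourceTypeClassification` exercises, the sketch below classifies a source by publisher prefix and maps the result to a display label. It is not the project's implementation. The function names `classify_source_type` and `label_for`, the prefix lookup, and the fallback to `law` are assumptions. Only the four source types and the two labels "Standard" and "Gesetzliche Grundlage" come from the commit message.

```python
# The four source types named in the commit message.
SOURCE_TYPES = ("law", "guideline", "standard", "restricted")

# Hypothetical lookup: the commit names NIST, OWASP and ENISA as standards,
# but does not show how the real classifier recognizes them.
STANDARD_PUBLISHERS = ("NIST", "OWASP", "ENISA")

# Labels taken from the commit message; labels for the other types are not given there.
LABELS = {"law": "Gesetzliche Grundlage", "standard": "Standard"}


def classify_source_type(source_name: str, default: str = "law") -> str:
    """Return 'standard' for known standards publishers, else the default (assumed 'law')."""
    if source_name.strip().upper().startswith(STANDARD_PUBLISHERS):
        return "standard"
    return default


def label_for(source_type: str) -> str:
    """Map a source_type to its display label, falling back to the raw type name."""
    return LABELS.get(source_type, source_type)
```

A test in this style would assert, for instance, that a NIST source yields `standard` and is labelled "Standard" rather than "Gesetzliche Grundlage".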