feat(pipeline): v3 — scoped control applicability + source_type classification
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 36s
CI/CD / test-python-backend-compliance (push) Successful in 36s
CI/CD / test-python-document-crawler (push) Successful in 27s
CI/CD / test-python-dsms-gateway (push) Successful in 18s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Has been skipped
Phase 4: source_type (law/guideline/standard/restricted) on source_citation
- NIST/OWASP/ENISA correctly shown as "Standard" instead of "Gesetzliche Grundlage"
- Dynamic frontend labels based on source_type
- Backfill endpoint POST /v1/canonical/generate/backfill-source-type

Phase v3: Scoped Control Applicability
- 3 new fields: applicable_industries, applicable_company_size, scope_conditions
- LLM prompt extended with 39 industries, 5 company sizes, 10 scope signals
- All 5 generation paths (Rule 1/2/3, batch structure, batch reform) updated
- _build_control_from_json: parsing + validation (string→list, size validation)
- _store_control: writes 3 new JSONB columns
- API: response models, create/update requests, SELECT queries extended
- Migration 063: 3 new JSONB columns with GIN indexes
- 110 generator tests + 28 route tests = 138 total, all passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
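The "string→list, size validation" step named for `_build_control_from_json` could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this commit: the helper names, the five size labels, and the treatment of `scope_conditions` as an optional object are all hypothetical, since the commit message names the three fields but not their allowed values.

```python
from __future__ import annotations

from typing import Any

# Hypothetical labels: the commit mentions 5 company sizes but does not name them.
ALLOWED_COMPANY_SIZES = {"micro", "small", "medium", "large", "enterprise"}


def _as_str_list(value: Any) -> list[str]:
    """Coerce a single string or a list into a list of non-empty strings."""
    if isinstance(value, str):
        value = [value]  # string -> list
    if not isinstance(value, (list, tuple)):
        return []  # None or an unexpected type yields an empty list
    return [str(item).strip() for item in value if str(item).strip()]


def normalize_scope_fields(raw: dict[str, Any]) -> dict[str, Any]:
    """Normalize the three scope fields of an LLM-produced control dict."""
    industries = _as_str_list(raw.get("applicable_industries"))
    sizes = [s.lower() for s in _as_str_list(raw.get("applicable_company_size"))]
    sizes = [s for s in sizes if s in ALLOWED_COMPANY_SIZES]  # size validation
    scope = raw.get("scope_conditions")
    return {
        "applicable_industries": industries,
        "applicable_company_size": sizes,
        # Assumption: scope_conditions is a JSON object or absent.
        "scope_conditions": scope if isinstance(scope, dict) else None,
    }
```

Dropping unknown sizes rather than raising keeps one malformed LLM answer from failing a whole batch. Whether the real pipeline drops or rejects such values is not stated in the commit.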
@@ -98,6 +98,7 @@ erDiagram
varchar generation_strategy
smallint pipeline_version
integer license_rule
jsonb source_citation
jsonb open_anchors
}
canonical_control_mappings {
@@ -316,7 +317,7 @@ The validator (`scripts/validate-controls.py`) checks on every commit:

 - Objective, rationale, scope
 - Requirements, test procedures, evidence
-- **Gesetzliche Grundlage** (legal basis, blue box): `source_citation` with article, paragraph, license, link
+- **Quellennachweis** (source reference, dynamic color): based on `source_type`; blue for laws, indigo for guidelines, teal for standards
 - **Open-Source-Referenzen** (open-source references, green box): linked open anchors
 - Generation details: processing_path, similarity_status
 - Tags, Risk Score, Implementation Effort
@@ -613,15 +614,19 @@ The following are assigned automatically during generation:

 ### Architecture decision: legal references

-Controls derive from two sources:
+Controls derive from four source types (field `source_citation.source_type`):

-1. **Direct legal obligations (Rule 1):** e.g. DSGVO (GDPR) Art. 32 mandates "technical and organizational measures". These controls carry a `source_citation` with the exact legal reference and the original text.
+| source_type | Description | Examples | Frontend rendering |
+|-------------|-------------|-----------|---------------------|
+| `law` | Binding EU/DE/AT law | DSGVO, AI Act, BDSG, NIS2 | Blue box "Gesetzliche Grundlage" + badge "Direkte gesetzliche Pflicht" |
+| `guideline` | Regulatory guidelines (soft law) | EDPB, WP29, Blue Guide | Indigo box "Behoerdliche Leitlinie" + badge "Aufsichtsbehoerdliche Empfehlung" |
+| `standard` | Voluntary standards/frameworks | NIST, OWASP, ENISA, CISA, OECD | Teal box "Standard / Best Practice" + badge "Freiwilliger Standard" |
+| `restricted` | Protected standards (Rule 3) | BSI, ISO, ETSI | Amber box "Abgeleitet aus regulatorischen Anforderungen" (no original text) |

-2. **Implicit implementation via best practices (Rule 2/3):** e.g. OWASP ASVS V2.7 requires MFA. That is not a legal obligation, but a best practice for meeting NIS2 Art. 21 or DSGVO Art. 32. These controls carry open-source references (anchors).
-
-**In the frontend:**
-- Rule 1/2 controls show a blue "Gesetzliche Grundlage" box with law, article, and link
-- Rule 3 controls show a note that they implicitly implement laws, with a pointer to the references
+!!! warning "source_type vs license_rule"
+    `source_type` classifies how **legally binding** a source is (is it a law?).
+    `license_rule` classifies its **copyright** status (may the text be quoted?).
+    Example: NIST is Rule 1 (public domain = free use) but `source_type = "standard"` (not an EU law).

 ### API

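The split between `source_type` and `license_rule` in the hunk above can be sketched as follows. This is only an illustration: the issuer examples and box labels come from the table, while the prefix lookup, the function name `classify_source_type`, and the `source` key are hypothetical stand-ins for whatever the pipeline really does.

```python
from __future__ import annotations

# Frontend box color and title per source_type, copied from the table above.
FRONTEND_BOX = {
    "law": ("blue", "Gesetzliche Grundlage"),
    "guideline": ("indigo", "Behoerdliche Leitlinie"),
    "standard": ("teal", "Standard / Best Practice"),
    "restricted": ("amber", "Abgeleitet aus regulatorischen Anforderungen"),
}

# Example issuers from the table. The prefix lookup is a hypothetical
# stand-in for the real classification logic.
SOURCE_TYPE_BY_ISSUER = {
    "DSGVO": "law", "AI Act": "law", "BDSG": "law", "NIS2": "law",
    "EDPB": "guideline", "WP29": "guideline", "Blue Guide": "guideline",
    "NIST": "standard", "OWASP": "standard", "ENISA": "standard",
    "CISA": "standard", "OECD": "standard",
    "BSI": "restricted", "ISO": "restricted", "ETSI": "restricted",
}


def classify_source_type(source_name: str) -> str | None:
    """Return the source_type for a known issuer prefix, else None."""
    normalized = source_name.strip().lower()
    for issuer, source_type in SOURCE_TYPE_BY_ISSUER.items():
        if normalized.startswith(issuer.lower()):
            return source_type
    return None  # unknown issuers are left unclassified rather than guessed


# license_rule and source_type are independent: NIST text may be quoted
# freely (Rule 1), yet it is a voluntary standard, not a law.
control = {
    "license_rule": 1,
    "source_citation": {
        "source": "NIST SP 800-53",  # hypothetical key name
        "source_type": classify_source_type("NIST SP 800-53"),
    },
}
color, title = FRONTEND_BOX[control["source_citation"]["source_type"]]
print(color, title)
```

Keeping the two values in separate fields means a Rule 1 source can still be rendered as a voluntary standard, which is the case the warning box calls out.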
@@ -816,8 +821,8 @@ curl -X POST https://api-dev.breakpilot.ai/api/compliance/v1/canonical/controls
 | `backend-compliance/tests/test_canonical_control_routes.py` | Python | 14 tests | REST API endpoints |
 | `backend-compliance/tests/test_license_gate.py` | Python | 12 tests | License classification |
 | `backend-compliance/tests/test_validate_controls.py` | Python | 14 tests | CI/CD validator |
-| `backend-compliance/tests/test_control_generator.py` | Python | 81 tests | Pipeline, batch, license rules, QA, recital |
-| **Total** | | **149+ tests** |
+| `backend-compliance/tests/test_control_generator.py` | Python | 98 tests | Pipeline, batch, license rules, QA, recital, source type |
+| **Total** | | **166+ tests** |

 ### Control Generator Tests (test_control_generator.py)

@@ -825,7 +830,7 @@ The generator tests cover the following areas:

 | Class | Tests | Checks |
 |--------|-------|--------|
-| `TestLicenseMapping` | 12 | License classification (Rule 1/2/3), case insensitivity |
+| `TestLicenseMapping` | 13 | License classification (Rule 1/2/3), case insensitivity, source_type |
 | `TestDomainDetection` | 5 | Keyword-based domain detection (AUTH, CRYP, NET, DATA) |
 | `TestJsonParsing` | 4 | JSON parser for LLM responses (Markdown fencing, preamble) |
 | `TestGeneratedControlRules` | 3 | Rule-specific fields (original_text, citation, source_info) |
@@ -837,6 +842,7 @@ The generator tests cover the following areas:
 | `TestRegulationFilter` | 5 | regulation_filter prefix matching, empty regulation_codes |
 | `TestPipelineVersion` | 5 | pipeline_version=2 in DB writes, null handling in structure/reform |
 | `TestRecitalDetection` | 10 | Recital detection in source texts (regex, phrases, combined) |
+| `TestSourceTypeClassification` | 16 | law/guideline/standard/restricted classification of all source types |

 ---
