feat(pipeline): pipeline_version v2, migration 062, docs + 71 tests

- Add PIPELINE_VERSION=2 constant and pipeline_version column to canonical_controls and canonical_processed_chunks (migration 062)
- Anthropic API decides chunk relevance via null-returns (skip_prefilter)
- Annex/appendix chunks explicitly protected in prompts
- Fix 6 failing tests (CRYP domain, _process_batch tuple return)
- Add TestPipelineVersion + TestRegulationFilter test classes (10 new tests)
- Add MkDocs page: control-generator-pipeline.md (541 lines)
- Update canonical-control-library.md with v2 pipeline diagram
- Update testing.md with 71-test breakdown table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

## Module-Specific Tests

### Canonical Control Generator (71+ Tests)

The Control Library has an extensive test suite spread across 6 files.

See [Canonical Control Library — Tests](../services/sdk-modules/canonical-control-library.md#tests) and [Control Generator Pipeline](../services/sdk-modules/control-generator-pipeline.md) for details.

```bash
# All generator tests (71 tests in 11 classes)
cd backend-compliance && pytest -v tests/test_control_generator.py

# Similarity Detector Tests
```

**Important:** The generator tests use mocks for the Anthropic API and Qdrant, so they run without any external dependencies.

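A minimal sketch of how such a test can replace both external services with `unittest.mock.MagicMock`. The function `generate_controls` and the methods `search` and `generate` are hypothetical stand-ins, not the actual API of the generator or of `test_control_generator.py`; returning `None` for an irrelevant chunk mirrors the null-return convention described in the commit message.

```python
from unittest.mock import MagicMock


# Hypothetical stand-in for the generator under test: it asks a vector store
# (Qdrant in the real service) for chunks and an LLM client (Anthropic in the
# real service) to turn each chunk into a control.
def generate_controls(query, vector_store, llm_client):
    chunks = vector_store.search(query)
    controls = []
    for chunk in chunks:
        result = llm_client.generate(chunk["text"])
        # Assumption: None means the LLM judged the chunk irrelevant.
        if result is not None:
            controls.append(result)
    return controls


def test_generate_controls_without_network():
    # Both dependencies are MagicMock objects, so no network call is made.
    vector_store = MagicMock()
    vector_store.search.return_value = [
        {"text": "Passwords must be hashed."},
        {"text": "Table of contents"},
    ]
    llm_client = MagicMock()
    # One value per call: a control for the first chunk, None for the second.
    llm_client.generate.side_effect = [{"title": "Hash passwords"}, None]

    controls = generate_controls("authentication", vector_store, llm_client)

    assert controls == [{"title": "Hash passwords"}]
    vector_store.search.assert_called_once_with("authentication")
    assert llm_client.generate.call_count == 2


if __name__ == "__main__":
    test_generate_controls_without_network()
    print("ok")
```

Because the mocks record every call, the test can check both the result and how the dependencies were used, without any API key or running Qdrant instance.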
In particular, the `TestPipelineMocked` class verifies:

- Correct license classification (Rule 1/2/3 behavior)
- Rule 3 does **not** expose any source names in `generation_metadata`
- SHA-256 hash deduplication for chunks
- Config defaults (`batch_size: 5`, `skip_processed: true`)
- The Rule 1 citation is generated correctly, including the legal reference

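
The hash deduplication and config defaults can be sketched as follows. `GeneratorConfig`, `chunk_hash`, `select_new_chunks` and `batched` are hypothetical helpers; only the default values `batch_size: 5` and `skip_processed: true` come from the list above. Dropping repeated chunks within one input, in addition to skipping already-processed hashes, is an assumption of this sketch.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class GeneratorConfig:
    # Hypothetical config class; the default values match the documented ones.
    batch_size: int = 5
    skip_processed: bool = True


def chunk_hash(text: str) -> str:
    """Return the hex SHA-256 digest of a chunk's UTF-8 text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_new_chunks(chunks, processed_hashes, config):
    """Keep chunks whose hash is new.

    Hashes in processed_hashes are skipped only if config.skip_processed is
    set; repeated chunks within the same input are always dropped.
    """
    seen = set(processed_hashes) if config.skip_processed else set()
    selected = []
    for text in chunks:
        digest = chunk_hash(text)
        if digest in seen:
            continue
        seen.add(digest)
        selected.append(text)
    return selected


def batched(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    config = GeneratorConfig()
    new_chunks = select_new_chunks(["a", "b", "a"], {chunk_hash("b")}, config)
    print(new_chunks)  # ['a']
    print(batched(new_chunks, config.batch_size))  # [['a']]
```

Hashing the chunk text rather than comparing it directly keeps the set of processed chunks small enough to store alongside each chunk record.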

**Test classes in `test_control_generator.py`:**

| Class | Tests | Covers |
|-------|-------|--------|
| `TestLicenseMapping` | 12 | License classification (Rule 1/2/3), case insensitivity |
| `TestDomainDetection` | 5 | Keyword-based domain detection (AUTH, CRYP, NET, DATA) |
| `TestJsonParsing` | 4 | JSON parser for LLM responses (Markdown fencing, preamble) |
| `TestGeneratedControlRules` | 3 | Rule-specific fields (`original_text`, `citation`, `source_info`) |
| `TestAnchorFinder` | 2 | RAG search + web framework detection |
| `TestPipelineMocked` | 5 | End-to-end pipeline with mocks (license, hash dedup, config) |
| `TestParseJsonArray` | 15 | JSON array parser (wrapper objects, bracket extraction, fallbacks) |
| `TestBatchSizeConfig` | 5 | Batch size configuration + defaults |
| `TestBatchProcessingLoop` | 10 | Batch processing (rule split, mixed rules, too-close, null handling) |
| `TestRegulationFilter` | 5 | `regulation_filter` prefix matching, empty `regulation_codes` |
| `TestPipelineVersion` | 5 | `pipeline_version=2` in DB writes, null handling in Structure/Reform |
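
`TestRegulationFilter` covers prefix matching on `regulation_filter`. A minimal sketch under stated assumptions: the function `matches_regulation_filter` and the example codes are hypothetical, and the rule that an empty or missing filter lets every chunk through is an assumption; the real handling of empty `regulation_codes` may differ.

```python
from typing import Iterable, Optional


def matches_regulation_filter(regulation_code: Optional[str],
                              regulation_filter: Optional[Iterable[str]]) -> bool:
    """Return True if regulation_code starts with one of the filter prefixes.

    Assumption (hypothetical semantics): a missing or empty filter means
    "no restriction", so every code matches.
    """
    # Drop empty prefixes, since "" would match every code.
    prefixes = [p for p in (regulation_filter or []) if p]
    if not prefixes:
        return True
    code = regulation_code or ""
    return any(code.startswith(p) for p in prefixes)


def filter_chunks(chunks, regulation_filter):
    """Keep chunk dicts whose hypothetical 'regulation_code' key matches."""
    return [c for c in chunks
            if matches_regulation_filter(c.get("regulation_code"), regulation_filter)]


if __name__ == "__main__":
    chunks = [{"regulation_code": "eu_2016_679"}, {"regulation_code": "bsi_c5"}]
    print(filter_chunks(chunks, ["eu_"]))  # [{'regulation_code': 'eu_2016_679'}]
    print(len(filter_chunks(chunks, [])))  # 2
```

Prefix matching lets one filter entry select a whole family of codes without listing each one.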