feat(qa): recital detection, review split, duplicate comparison
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 42s
CI/CD / test-python-backend-compliance (push) Successful in 34s
CI/CD / test-python-document-crawler (push) Successful in 21s
CI/CD / test-python-dsms-gateway (push) Successful in 20s
CI/CD / validate-canonical-controls (push) Successful in 12s
CI/CD / Deploy (push) Has been skipped
Add _detect_recital() to the QA pipeline. It flags controls where source_original_text contains Erwägungsgrund (recital) markers instead of article text (28% of controls with source text are affected).

- Recital detection via regex + phrase matching in QA validation
- 10 new tests (TestRecitalDetection), 81 total
- ReviewCompare component for side-by-side duplicate comparison
- Review mode split: Duplikat-Verdacht vs Rule-3-ohne-Anchor tabs
- MkDocs: recital detection documentation
- Detection script for bulk analysis (scripts/find_recital_controls.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -500,6 +500,39 @@ The QA metrics are stored in `generation_metadata`:
}
```

### Recital detection (Erwägungsgrund detection)

The QA stage additionally checks whether a control's `source_original_text` actually comes from an article of the law or from a recital (Erwägungsgrund). Recitals contain no normative obligations, so controls derived from them end up incorrectly mapped.

**Detection methods:**

| Method | Pattern | Example |
|--------|---------|---------|
| **Regex** | `\((\d{1,3})\)\s*\n` (recital numbers) | `(126)\nUm den Verwaltungsaufwand...` |
| **Phrases** | Typical recital wording (≥2 matches) | "daher sollte", "in Erwägung nachstehender Gründe" |

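The two methods in the table above could be combined roughly as follows. This is a minimal sketch, not the actual `_detect_recital()` implementation: the function name `detect_recital_sketch`, the phrase list (only the two examples from the table are known), and the rule for how regex and phrase hits are combined into `detection_method` are all assumptions.

```python
import re
from typing import Any, Dict, Optional

# Pattern taken from the table: a 1-3 digit number in parentheses
# followed by a line break, e.g. "(126)\n".
RECITAL_NUMBER_RE = re.compile(r"\((\d{1,3})\)\s*\n")

# Assumption: the real phrase list is longer; only these two examples
# appear in the documentation. Stored lowercase for case-insensitive matching.
RECITAL_PHRASES = ("daher sollte", "in erwägung nachstehender gründe")

# The table requires at least two phrase matches.
MIN_PHRASE_HITS = 2


def detect_recital_sketch(text: str) -> Optional[Dict[str, Any]]:
    """Return detection details if the text looks like a recital, else None."""
    numbers = RECITAL_NUMBER_RE.findall(text)
    lowered = text.lower()
    phrases = [p for p in RECITAL_PHRASES if p in lowered]

    # Assumption: either signal alone is enough to raise a suspicion.
    methods = []
    if numbers:
        methods.append("regex")
    if len(phrases) >= MIN_PHRASE_HITS:
        methods.append("phrases")
    if not methods:
        return None

    return {
        "recital_suspect": True,
        "recital_numbers": numbers,
        "recital_phrases": phrases,
        "detection_method": "+".join(methods),
    }
```

Returning `None` for unsuspicious text keeps the caller's check to a single truthiness test; the real function may use a different return convention.
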
**Result when a recital is suspected:**

- `release_state` is set to `needs_review`
- `generation_metadata.recital_suspect = true`
- `generation_metadata.recital_detection` contains the details:

```json
{
  "recital_suspect": true,
  "recital_detection": {
    "recital_suspect": true,
    "recital_numbers": ["126", "127"],
    "recital_phrases": ["daher sollte"],
    "detection_method": "regex+phrases"
  }
}
```

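How a detection result might be written back onto a control can be sketched as below. The helper name `apply_recital_result`, the dict-based control representation, and the initial `release_state` value `"draft"` are hypothetical; only the three resulting fields are taken from the documentation above.

```python
from typing import Any, Dict, Optional


def apply_recital_result(
    control: Dict[str, Any], detection: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Mark a control for review when a recital is suspected (hypothetical helper)."""
    if not detection or not detection.get("recital_suspect"):
        return control  # nothing suspicious: leave the control untouched

    metadata = control.setdefault("generation_metadata", {})
    metadata["recital_suspect"] = True
    metadata["recital_detection"] = detection
    control["release_state"] = "needs_review"
    return control


# Usage with the detection payload from the JSON example.
control = {"release_state": "draft", "generation_metadata": {}}
detection = {
    "recital_suspect": True,
    "recital_numbers": ["126", "127"],
    "recital_phrases": ["daher sollte"],
    "detection_method": "regex+phrases",
}
apply_recital_result(control, detection)
```
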
**Function:** `_detect_recital(text)` in `control_generator.py`

**Background:** An analysis of ~5,500 controls with source text flagged 1,555 (28%) as suspected recitals. The Document Crawler did not distinguish between article text and recitals, which led to incorrect `article`/`paragraph` assignments.

### QA reclassification of existing controls

```bash
@@ -530,7 +563,7 @@ curl -X POST https://api-dev.breakpilot.ai/api/compliance/v1/canonical/generate/
| `backend-compliance/migrations/046_control_generator.sql` | Job-tracking and chunk-tracking tables |
| `backend-compliance/migrations/048_processing_path_expand.sql` | Extended processing-path values |
| `backend-compliance/migrations/062_pipeline_version.sql` | `pipeline_version` column |
| `backend-compliance/tests/test_control_generator.py` | 15 tests (license, domain, batch, pipeline) |
| `backend-compliance/tests/test_control_generator.py` | 81+ tests (license, domain, batch, pipeline, recital) |

---
