feat: Obligation-Deduplizierung — 34.617 Duplikate als 'duplicate' markiert
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 33s
CI/CD / test-python-backend-compliance (push) Successful in 35s
CI/CD / test-python-document-crawler (push) Successful in 30s
CI/CD / test-python-dsms-gateway (push) Successful in 20s
CI/CD / validate-canonical-controls (push) Successful in 13s
CI/CD / Deploy (push) Successful in 3s
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 33s
CI/CD / test-python-backend-compliance (push) Successful in 35s
CI/CD / test-python-document-crawler (push) Successful in 30s
CI/CD / test-python-dsms-gateway (push) Successful in 20s
CI/CD / validate-canonical-controls (push) Successful in 13s
CI/CD / Deploy (push) Successful in 3s
Neue Endpunkte POST /obligations/dedup und GET /obligations/dedup-stats. Pro candidate_id wird der aelteste Eintrag behalten, alle weiteren erhalten release_state='duplicate' mit merged_into_id + quality_flags fuer Traceability. Detail-View filtert Duplikate aus. MKDocs aktualisiert. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -152,6 +152,8 @@ erDiagram
|
||||
| `POST` | `/v1/canonical/generate/backfill-domain` | Domain/Category/Target-Audience nachpflegen (Anthropic) |
|
||||
| `GET` | `/v1/canonical/blocked-sources` | Gesperrte Quellen (Rule 3) |
|
||||
| `POST` | `/v1/canonical/blocked-sources/cleanup` | Cleanup-Workflow starten |
|
||||
| `POST` | `/v1/canonical/obligations/dedup` | Obligation-Duplikate markieren (dry_run, batch_size, offset) |
|
||||
| `GET` | `/v1/canonical/obligations/dedup-stats` | Dedup-Statistik (total, by_state, pending) |
|
||||
|
||||
### Beispiel: Control abrufen
|
||||
|
||||
@@ -984,6 +986,37 @@ vom Parent-Obligation uebernommen.
|
||||
**Datei:** `compliance/services/decomposition_pass.py`
|
||||
**Test-Script:** `scripts/qa/test_pass0a.py` (standalone, speichert JSON)
|
||||
|
||||
#### Obligation Deduplizierung
|
||||
|
||||
Die Decomposition-Pipeline erzeugt pro Rich Control mehrere Obligation Candidates.
|
||||
Durch Wiederholungen in der Pipeline koennen identische `candidate_id`-Eintraege
|
||||
mehrfach existieren (z.B. 5x `OC-AUTH-839-01` mit leicht unterschiedlichem Text).
|
||||
|
||||
**Dedup-Strategie:** Pro `candidate_id` wird der aelteste Eintrag (`MIN(created_at)`)
|
||||
behalten. Alle anderen erhalten:
|
||||
|
||||
- `release_state = 'duplicate'`
|
||||
- `merged_into_id` → UUID des behaltenen Eintrags
|
||||
- `quality_flags.dedup_reason` → z.B. `"duplicate of OC-AUTH-839-01"`
|
||||
|
||||
**Endpunkte:**
|
||||
|
||||
```bash
|
||||
# Dry Run — zaehlt betroffene Duplikat-Gruppen
|
||||
curl -X POST "https://macmini:8002/api/compliance/v1/canonical/obligations/dedup?dry_run=true"
|
||||
|
||||
# Ausfuehren — markiert alle Duplikate
|
||||
curl -X POST "https://macmini:8002/api/compliance/v1/canonical/obligations/dedup?dry_run=false"
|
||||
|
||||
# Statistiken
|
||||
curl "https://macmini:8002/api/compliance/v1/canonical/obligations/dedup-stats"
|
||||
```
|
||||
|
||||
**Stand (2026-03-26):** 76.046 Obligations gesamt, davon 34.617 als `duplicate` markiert.
|
||||
41.043 aktive Obligations verbleiben (composed + validated).
|
||||
|
||||
**Migration:** `081_obligation_dedup_state.sql` — Fuegt `'duplicate'` zum `release_state` Constraint hinzu.
|
||||
|
||||
---
|
||||
|
||||
### Migration Passes (1-5)
|
||||
@@ -1033,6 +1066,9 @@ Die Crosswalk-Matrix bildet diese N:M-Beziehung ab.
|
||||
|---------|-------------|
|
||||
| `obligation_candidates` | Extrahierte atomare Pflichten aus Rich Controls |
|
||||
| `obligation_candidates.obligation_type` | `pflicht` / `empfehlung` / `kann` (3-Tier-Klassifizierung) |
|
||||
| `obligation_candidates.release_state` | `extracted` / `validated` / `rejected` / `composed` / `merged` / `duplicate` |
|
||||
| `obligation_candidates.merged_into_id` | UUID des behaltenen Eintrags (bei `duplicate`/`merged`) |
|
||||
| `obligation_candidates.quality_flags` | JSONB mit Metadaten (u.a. `dedup_reason`, `dedup_kept_id`) |
|
||||
| `canonical_controls.parent_control_uuid` | Self-Referenz zum Rich Control (neues Feld) |
|
||||
| `canonical_controls.decomposition_method` | Zerlegungsmethode (neues Feld) |
|
||||
| `canonical_controls.obligation_type` | Uebernommen von Obligation: pflicht/empfehlung/kann |
|
||||
|
||||
Reference in New Issue
Block a user