feat(audit-pipeline): P72-v2 heuristic sharpened + P80 mini replay endpoint

P72-v2  MC scope classifier heuristic v2: v1 left 79% of entries in the
        'other' bucket (patterns too strict). v2 covers considerably more:
          - DSE: Art. 13/14 + data subject rights (Art. 15-22) + DPO +
            supervisory authority + retention period + special categories
          - TOM: Art. 32 + encryption/backup/pseudonymization +
            access control + ISO 27001 + BSI-Grundschutz + audit log
          - cookie_richtlinie: tracking pixels + web storage + GA/Matomo/
            Hotjar/Pixel/GTM
          - process: VVT (Art. 30) + DSFA (Art. 35) + data breaches
            (Art. 33/34) + HinSchG + trainings + deletion concept
        The script `backfill_mc_scope_v2.py` reclassifies ONLY the
        'other' bucket (specific v1 buckets remain untouched).
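
A minimal sketch of the idea behind P72-v2, not the actual implementation: a keyword classifier per bucket, plus a backfill pass that only touches records still in 'other'. The regex patterns, the record shape (`scope`, `text`) and the function names are assumptions for illustration. The real patterns live in the classifier and in `backfill_mc_scope_v2.py`, which are not shown here.

```python
import re

# Hypothetical keyword patterns per bucket (illustrative only, not the v2 patterns).
SCOPE_PATTERNS = {
    "dse": re.compile(r"art\.?\s*1[34]\b|betroffenenrechte|aufsichtsbeh|speicherdauer", re.I),
    "tom": re.compile(r"art\.?\s*32\b|verschl|backup|pseudonymisierung|iso\s*27001", re.I),
    "cookie_richtlinie": re.compile(r"tracking|webstorage|matomo|hotjar|pixel|gtm", re.I),
    "process": re.compile(r"art\.?\s*3[0345]\b|dsfa|vvt|datenpanne|hinschg|schulung", re.I),
}


def classify_scope(text: str) -> str:
    """Return the first bucket whose pattern matches, else 'other'."""
    for bucket, pattern in SCOPE_PATTERNS.items():
        if pattern.search(text):
            return bucket
    return "other"


def backfill_other(records: list[dict]) -> int:
    """Reclassify only records in the 'other' bucket; return how many changed."""
    changed = 0
    for rec in records:
        if rec["scope"] != "other":
            continue  # specific v1 buckets stay untouched
        new_scope = classify_scope(rec["text"])
        if new_scope != "other":
            rec["scope"] = new_scope
            changed += 1
    return changed
```

Restricting the backfill to 'other' keeps the pass conservative: a v1 decision is never overwritten, so the worst case for a too-broad v2 pattern is a wrong label on a record that previously had no useful label at all.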

P80    Mini replay endpoint (v1):
          POST /compliance-check/snapshots/{id}/replay
          ?recipient=foo@bar.com&dry_run=false
        Loads the snapshot and renders the mail with the CURRENT render code
        (P63-P67, P59b/P61/P62). Sends a [REPLAY]-prefixed mail or, with
        dry_run, only returns HTML stats.
        Effect: 7 min re-scan -> 2-5 s for mail layout iterations.
        v2 (later): MC scorecard with the current scope_doc_type filter
        applied to the snapshot; requires refactoring _run_compliance_check.
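
A small sketch of how a client might build the replay request URL. The path and the `recipient` / `dry_run` query parameters come from this commit. The helper name `build_replay_url` and the base URL are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_replay_url(base_url, snapshot_id, recipient=None, dry_run=True):
    """Build the POST URL for the P80 replay endpoint (hypothetical helper)."""
    # dry_run defaults to true on the server; send it explicitly to be unambiguous.
    query = {"dry_run": "true" if dry_run else "false"}
    if recipient:
        query["recipient"] = recipient
    path = f"/compliance-check/snapshots/{quote(snapshot_id, safe='')}/replay"
    return f"{base_url.rstrip('/')}{path}?{urlencode(query)}"


# Assumed local dev server; adjust to the real deployment.
print(build_replay_url("http://localhost:8000", "abc123", "foo@bar.com", dry_run=False))
```

Note that `urlencode` percent-encodes the `@` in the recipient address, so the literal form shown in the commit message is the human-readable version of the query string.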

Plus bugfix: GET /snapshots/{id} now raises HTTPException instead of
returning a tuple (FastAPI serialized the tuple as a JSON array).
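
To illustrate the bug: FastAPI does not interpret a returned `(body, status)` tuple the way Flask does. The tuple is serialized as-is and sent with the route's default status code. The stdlib-only sketch below shows roughly what the response body looked like. `old_return_value` is a stand-in name for the handler's old return value.

```python
import json

# What the old handler returned when the snapshot was missing.
old_return_value = ({"error": "snapshot not found"}, 404)

# JSON has no tuple type, so the tuple is encoded as an array;
# the 404 ends up inside the body instead of in the HTTP status line.
body = json.dumps(old_return_value)
print(body)  # [{"error": "snapshot not found"}, 404]
```

Raising `HTTPException(status_code=404, ...)` makes the status part of the HTTP response itself, which is what clients checking the status code expect.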

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-21 10:21:56 +02:00
parent cde670617e
commit 4946571863
3 changed files with 431 additions and 1 deletions
@@ -190,18 +190,44 @@ async def list_snapshots(domain: str = "", limit: int = 20):
 @router.get("/snapshots/{snapshot_id}")
 async def get_snapshot(snapshot_id: str):
     """P80: load full snapshot raw data."""
+    from fastapi import HTTPException
     from database import SessionLocal
     from compliance.services.check_snapshot import load_snapshot
     db = SessionLocal()
     try:
         snap = load_snapshot(db, snapshot_id)
         if not snap:
-            return {"error": "snapshot not found"}, 404
+            raise HTTPException(status_code=404, detail="snapshot not found")
         return snap
     finally:
         db.close()


+@router.post("/snapshots/{snapshot_id}/replay")
+async def replay_snapshot(
+    snapshot_id: str,
+    recipient: str = "",
+    dry_run: bool = True,
+):
+    """P80: replay audit mail render from snapshot. 7min->2sec test cycle.
+
+    Default dry_run=true just returns rendered HTML size + section breakdown.
+    Pass recipient + dry_run=false to actually send a [REPLAY] mail.
+    """
+    from database import SessionLocal
+    from compliance.services.check_replay import replay_from_snapshot
+    db = SessionLocal()
+    try:
+        return replay_from_snapshot(
+            db,
+            snapshot_id=snapshot_id,
+            recipient=(recipient if recipient else None),
+            dry_run=dry_run,
+        )
+    finally:
+        db.close()
+
+
 async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
     """Background task: check all documents with business-profile context."""
     try: