feat(audit-pipeline): P72 MC-Scope-Classifier + P80 Snapshot/Replay-Foundation [migration-approved]
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P72 MC-Scope-Classifier: assign each MC the document it ACTUALLY addresses
(cookie_richtlinie/dse/banner_implementation/cmp_audit/tom/avv/jc/
impressum/agb/widerruf/process/accounting/other).
- Migration 145: scope_doc_type column + index on canonical_controls
- Backfill script with regex heuristic (12 rules, sorted by priority)
- First 11k-sample distribution: 76% other (heuristic v1 is too strict;
  v2 needs looser patterns for DSE/TOM)
- Goal: before the MC scorecard filters, every MC knows which document
  it addresses. Until now, eHealth/HGB MCs ended up in the cookie audit.
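
A minimal sketch of how a priority-sorted regex heuristic of this kind could work. The patterns, the priorities, and the names `RULES`, `classify_scope`, and `scope_distribution` are illustrative assumptions, not the 12 rules from the backfill script; only the doc-type labels are taken from the list above.

```python
import re
from collections import Counter

# Hypothetical rules: (priority, doc_type, pattern). Lower priority is tried first.
# These patterns are examples only; the real backfill script is not shown here.
RULES = [
    (10, "cookie_richtlinie", re.compile(r"\bcookie[- ]?(richtlinie|policy)\b", re.I)),
    (20, "banner_implementation", re.compile(r"\b(consent|cookie)[- ]?banner\b", re.I)),
    (30, "avv", re.compile(r"\bauftragsverarbeitung|\bavv\b", re.I)),
    (40, "tom", re.compile(r"\btechnisch\w*[- ]und[- ]organisatorisch\w*|\btom\b", re.I)),
    (50, "dse", re.compile(r"\bdatenschutzerkl(ae|ä)rung\b", re.I)),
    (60, "impressum", re.compile(r"\bimpressum\b", re.I)),
]
RULES.sort(key=lambda rule: rule[0])


def classify_scope(text: str) -> str:
    """Return the doc type of the first matching rule, or 'other' if none match."""
    for _priority, doc_type, pattern in RULES:
        if pattern.search(text):
            return doc_type
    return "other"


def scope_distribution(texts) -> dict:
    """Count how many texts land in each doc type (useful for spotting a large 'other' share)."""
    return dict(Counter(classify_scope(t) for t in texts))
```

Because the first matching rule wins, a text that mentions both a cookie banner and a cookie policy is assigned to whichever rule has the lower priority number. A large `other` share in `scope_distribution` suggests the patterns are too strict, which is the situation the 76% figure above describes.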
P80 Snapshot + Replay-Foundation: persist raw data so the audit
pipeline can be rebuilt without a fresh crawl.
- Migration 146: compliance_check_snapshots table (one JSONB column each
  for doc_entries/banner_result/profile/cmp_vendors/scan_context)
- services.check_snapshot.save_snapshot/load_snapshot/list
- Endpoints GET /snapshots, GET /snapshots/{id}
- Hook in _run_compliance_check: automatic snapshot save after the
  mail is sent, via a separate SessionLocal (background-task safe)
- Replay endpoint follows in the next PR (needs _run_compliance_check
  refactored into crawl_phase + interpret_phase)
- Effect: test cycle 7min -> 5sec for pure logic changes
  (P73/P79/P81+ benefit directly). Snapshots also serve as a
  regression test corpus (P81 Golden-Truth-Library).
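
To illustrate the save/load round trip described above, here is a rough stand-in that uses SQLite TEXT columns holding JSON instead of Postgres JSONB. The table layout, `init_db`, and the simplified signatures are assumptions for this sketch; they are not the contents of migration 146 or of `services.check_snapshot`.

```python
import json
import sqlite3
import uuid

# Raw-data fields named in the commit message; each is stored as one JSON document.
JSON_FIELDS = ("doc_entries", "banner_result", "profile", "cmp_vendors", "scan_context")


def init_db(conn: sqlite3.Connection) -> None:
    """Create a simplified snapshots table (assumed layout, SQLite stand-in)."""
    cols = ", ".join(f"{name} TEXT" for name in JSON_FIELDS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS snapshots (id TEXT PRIMARY KEY, check_id TEXT, {cols})"
    )


def save_snapshot(conn: sqlite3.Connection, check_id: str, **raw) -> str:
    """Serialize each raw field to JSON and store it; missing fields become JSON null."""
    snap_id = str(uuid.uuid4())
    values = [json.dumps(raw.get(name)) for name in JSON_FIELDS]
    placeholders = ", ".join("?" for _ in range(2 + len(JSON_FIELDS)))
    conn.execute(f"INSERT INTO snapshots VALUES ({placeholders})", [snap_id, check_id, *values])
    conn.commit()
    return snap_id


def load_snapshot(conn: sqlite3.Connection, snap_id: str):
    """Return the snapshot as a dict with decoded JSON fields, or None if unknown."""
    row = conn.execute("SELECT * FROM snapshots WHERE id = ?", (snap_id,)).fetchone()
    if row is None:
        return None
    snap = {"id": row[0], "check_id": row[1]}
    for name, value in zip(JSON_FIELDS, row[2:]):
        snap[name] = json.loads(value)
    return snap
```

With an in-memory database (`sqlite3.connect(":memory:")` followed by `init_db(conn)`), a snapshot written by `save_snapshot` comes back from `load_snapshot` with the same nested structures, which is what makes it usable as replay input or as a regression fixture.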
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -155,6 +155,53 @@ async def get_compliance_check_status(check_id: str):
    )


# ── P80: Snapshot + Replay ───────────────────────────────────────────

@router.get("/snapshots")
async def list_snapshots(domain: str = "", limit: int = 20):
    """P80: list recent snapshots, optionally filtered by site_domain."""
    from database import SessionLocal
    from compliance.services.check_snapshot import list_snapshots_for_domain
    db = SessionLocal()
    try:
        if domain:
            return {"snapshots": list_snapshots_for_domain(db, domain, limit)}
        from sqlalchemy import text
        rows = db.execute(
            text("""
                SELECT id, check_id, site_domain, site_label, created_at,
                       replay_count, notes
                FROM compliance.compliance_check_snapshots
                ORDER BY created_at DESC
                LIMIT :lim
            """),
            {"lim": limit},
        ).fetchall()
        return {"snapshots": [
            {"id": str(r[0]), "check_id": r[1], "site_domain": r[2],
             "site_label": r[3], "created_at": str(r[4]),
             "replay_count": r[5], "notes": r[6]}
            for r in rows
        ]}
    finally:
        db.close()

@router.get("/snapshots/{snapshot_id}")
async def get_snapshot(snapshot_id: str):
    """P80: load full snapshot raw data."""
    from database import SessionLocal
    from compliance.services.check_snapshot import load_snapshot
    db = SessionLocal()
    try:
        snap = load_snapshot(db, snapshot_id)
        if not snap:
            # Assuming this is a FastAPI router: a returned (body, 404) tuple
            # would be serialized as a 200 response, so raise instead.
            from fastapi import HTTPException
            raise HTTPException(status_code=404, detail="snapshot not found")
        return snap
    finally:
        db.close()


async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
    """Background task: check all documents with business-profile context."""
    try:
||||
@@ -1028,6 +1075,29 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
        _compliance_check_jobs[check_id]["progress"] = "Fertig"
        _compliance_check_jobs[check_id]["progress_pct"] = 100

        # P80: persist raw scan data so we can replay audit pipeline
        # without re-crawling (7min -> 5sec test cycle).
        try:
            from database import SessionLocal
            from compliance.services.check_snapshot import save_snapshot
            snap_db = SessionLocal()
            try:
                save_snapshot(
                    snap_db,
                    check_id=check_id,
                    doc_entries=doc_entries,
                    banner_result=banner_result,
                    profile=profile,
                    cmp_vendors=cmp_vendors,
                    scan_context=None,  # P79 will fill this
                    site_label=site_name,
                    notes=f"recipient={req.recipient}",
                )
            finally:
                snap_db.close()
        except Exception as snap_err:
            logger.warning("P80 snapshot save skipped: %s", snap_err)

        # Persist to sidecar SQLite audit log: enables /audit endpoints
        # (A5 admin tab) and trend view (A6). Best-effort; failures here
        # do not affect the user-facing response.
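
The P80 hook stores raw scan data so the audit logic can later be re-run without crawling, and the commit message defers the replay endpoint until `_run_compliance_check` is split into `crawl_phase` and `interpret_phase`. The following is only a rough sketch of what such a split might look like; the `RawScan` type, the function signatures, and the toy rules are hypothetical and do not reflect the actual pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class RawScan:
    """Everything the slow crawl produces; this is what a snapshot would store."""
    doc_entries: list = field(default_factory=list)
    banner_result: dict = field(default_factory=dict)


def crawl_phase(url: str) -> RawScan:
    # Placeholder for the slow, network-bound part of the check.
    return RawScan(
        doc_entries=[{"url": f"{url}/datenschutz", "text": "Wir nutzen Cookies."}],
        banner_result={"has_reject_button": False},
    )


def interpret_phase(raw: RawScan) -> list:
    # Pure function of the raw data: no network access, so it can run on a snapshot.
    findings = []
    if not raw.banner_result.get("has_reject_button", False):
        findings.append("banner: no reject button")
    if not any("cookie" in doc["text"].lower() for doc in raw.doc_entries):
        findings.append("docs: cookies not mentioned")
    return findings


def run_check(url: str):
    """Full run: crawl once, then interpret. Returns the raw data so it can be stored."""
    raw = crawl_phase(url)
    return raw, interpret_phase(raw)


def replay(raw: RawScan) -> list:
    """Re-run only the interpretation on previously captured raw data."""
    return interpret_phase(raw)
```

Keeping `interpret_phase` free of I/O is the design choice that matters here: a logic change can then be checked against stored raw data in seconds, and the same stored data can double as a regression corpus.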