cde670617e
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P72 MC-Scope-Classifier: assign each MC the document it ACTUALLY addresses
(cookie_richtlinie/dse/banner_implementation/cmp_audit/tom/avv/jc/
impressum/agb/widerruf/process/accounting/other).
- Migration 145: scope_doc_type column + index on canonical_controls
- Backfill script with a regex heuristic (12 rules, sorted by priority)
- First distribution on an 11k sample: 76% other (heuristic v1 is too
  strict; v2 needs looser patterns for DSE/TOM)
- Goal: before the MC scorecard filters, every MC knows which document
  it addresses. Until now, eHealth/HGB MCs ended up in the cookie audit.
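The priority-sorted regex backfill described above can be sketched as follows. This is a minimal illustration that assumes first-match-wins semantics. The commit only states "12 rules, sorted by priority". The six rules below, their patterns and priority numbers, and the helpers `classify_scope` and `scope_distribution` are hypothetical placeholders, not the real backfill script.

```python
import re
from collections import Counter

# Hypothetical rules: (priority, scope_doc_type, pattern). The real backfill
# script has 12 rules; these six are illustrative placeholders only.
# A lower priority number is checked first and the first match wins.
RULES: list[tuple[int, str, re.Pattern[str]]] = sorted(
    [
        (10, "banner_implementation", re.compile(r"cookie[- ]?banner|consent[- ]?banner", re.I)),
        (20, "cookie_richtlinie", re.compile(r"cookie", re.I)),
        (30, "avv", re.compile(r"auftragsverarbeit", re.I)),
        (40, "tom", re.compile(r"technische und organisatorische", re.I)),
        (50, "dse", re.compile(r"datenschutzerkl|privacy policy", re.I)),
        (60, "impressum", re.compile(r"impressum", re.I)),
    ],
    key=lambda rule: rule[0],
)


def classify_scope(control_text: str) -> str:
    """Return the scope_doc_type of one MC, falling back to 'other'."""
    for _priority, doc_type, pattern in RULES:
        if pattern.search(control_text):
            return doc_type
    return "other"


def scope_distribution(control_texts: list[str]) -> dict[str, float]:
    """Share of each scope_doc_type across a sample of MC texts."""
    counts = Counter(classify_scope(text) for text in control_texts)
    total = sum(counts.values())
    return {doc_type: count / total for doc_type, count in counts.items()}


print(classify_scope("Der Cookie-Banner muss eine Ablehnen-Option bieten"))  # banner_implementation
print(classify_scope("Aufbewahrungspflicht nach HGB § 257"))  # other
```

The ordering lets a specific rule (banner) shadow a broader one (any mention of cookies). MCs that match no pattern land in `other`, which is how overly strict patterns inflate that bucket.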
P80 Snapshot + replay foundation: persist the raw data so the audit
pipeline can be rebuilt without a new crawl.
- Migration 146: compliance_check_snapshots table (one JSONB column each
  for doc_entries/banner_result/profile/cmp_vendors/scan_context)
- services.check_snapshot.save_snapshot/load_snapshot/list
- Endpoints GET /snapshots, GET /snapshots/{id}
- Hook in _run_compliance_check: after the mail is sent, the snapshot is
  saved automatically via a separate SessionLocal (background-task safe)
- Replay endpoint follows in the next PR (requires refactoring
  _run_compliance_check into crawl_phase + interpret_phase)
- Effect: test cycle 7min -> 5sec for pure logic changes
  (P73/P79/P81+ benefit directly). Snapshots also serve as a
  regression-test corpus (P81 golden-truth library).
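The planned split of `_run_compliance_check` into `crawl_phase` and `interpret_phase` could look roughly like the sketch below. Only those two phase names come from the commit. The `Snapshot` dataclass, the dict-based store, `run_check`, `replay`, and the placeholder interpretation logic are hypothetical. The sketch assumes the interpretation step reads nothing but the persisted raw data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical in-memory mirror of one compliance_check_snapshots row."""
    check_id: str
    site_domain: str
    doc_entries: list[dict[str, Any]]
    banner_result: Optional[dict[str, Any]] = None
    profile: Optional[dict[str, Any]] = None


def interpret_phase(snapshot: Snapshot) -> dict[str, Any]:
    """Pure step: reads only persisted raw data, never the network.

    Placeholder logic (counting documents per doc_type) stands in for the
    real MC scorecard and mail rendering.
    """
    per_type: dict[str, int] = {}
    for entry in snapshot.doc_entries:
        per_type[entry["doc_type"]] = per_type.get(entry["doc_type"], 0) + 1
    return {"check_id": snapshot.check_id, "docs_per_type": per_type}


def run_check(
    crawl_phase: Callable[[str], Snapshot],
    domain: str,
    store: dict[str, Snapshot],
) -> dict[str, Any]:
    """Full run: slow crawl, persist the raw data, then interpret it."""
    snapshot = crawl_phase(domain)
    store[snapshot.check_id] = snapshot
    return interpret_phase(snapshot)


def replay(check_id: str, store: dict[str, Snapshot]) -> dict[str, Any]:
    """Replay: skip the crawl and re-run only interpret_phase."""
    return interpret_phase(store[check_id])
```

Under this split, a replay pays only for `interpret_phase`. This is where the saving for pure logic changes would come from, as long as no interpretation code reaches back into the crawler.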
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
41 lines
1.9 KiB
SQL
-- P80: Compliance-check snapshots for replay mode
--
-- Persists the raw data of a scan (DSE text, banner HTML, cookies,
-- CMP vendors, profile) so that the audit pipeline can re-run only the
-- interpretation logic (MC scorecard, mail render) without a new crawl.
-- Test cycle 7min -> 5-10sec for pure logic changes.

DO $$
BEGIN
    CREATE TABLE IF NOT EXISTS compliance.compliance_check_snapshots (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        check_id VARCHAR(36) NOT NULL,
        site_domain VARCHAR(255) NOT NULL,
        site_label VARCHAR(255),

        -- Raw data as JSONB (everything that does NOT change per run)
        doc_entries JSONB NOT NULL,   -- [{doc_type, url, full_text, cmp_payloads, ...}]
        banner_result JSONB,          -- {phases, cookies_detailed, cmp_vendors, ...}
        profile JSONB,                -- {business_type, industry, no_direct_sales, ...}
        scan_context JSONB,           -- P79: user pre-scan fields
        cmp_vendors JSONB,            -- vendor list (post phase G)

        -- Meta
        created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
        replay_count INTEGER NOT NULL DEFAULT 0,
        last_replay_at TIMESTAMP WITH TIME ZONE,
        notes TEXT
    );

    CREATE INDEX IF NOT EXISTS idx_snapshots_check_id
        ON compliance.compliance_check_snapshots(check_id);
    CREATE INDEX IF NOT EXISTS idx_snapshots_domain
        ON compliance.compliance_check_snapshots(site_domain);
    CREATE INDEX IF NOT EXISTS idx_snapshots_created
        ON compliance.compliance_check_snapshots(created_at DESC);

    COMMENT ON TABLE compliance.compliance_check_snapshots IS
        'P80 replay mode: persisted raw data of a scan. Allows the '
        'audit pipeline to be re-run without a new browser crawl.';
END $$;
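Below is a minimal sketch of a save/load round trip against this table. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL: JSONB is stored as JSON text, `gen_random_uuid()` is replaced by `uuid.uuid4()`, and only a subset of columns is modeled. The function signatures are hypothetical and are not the real `services.check_snapshot` API. Bumping `replay_count` and `last_replay_at` on every load is an assumption about how those meta columns are meant to be used.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

# SQLite stand-in for a subset of compliance.compliance_check_snapshots:
# JSONB columns become TEXT holding JSON, UUIDs are generated in Python.
SCHEMA = """
CREATE TABLE IF NOT EXISTS compliance_check_snapshots (
    id TEXT PRIMARY KEY,
    check_id TEXT NOT NULL,
    site_domain TEXT NOT NULL,
    doc_entries TEXT NOT NULL,
    banner_result TEXT,
    created_at TEXT NOT NULL,
    replay_count INTEGER NOT NULL DEFAULT 0,
    last_replay_at TEXT
)
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def save_snapshot(
    conn: sqlite3.Connection,
    check_id: str,
    site_domain: str,
    doc_entries: list[dict[str, Any]],
    banner_result: Optional[dict[str, Any]] = None,
) -> str:
    """Hypothetical signature: persist raw scan data, return the new snapshot id."""
    snapshot_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO compliance_check_snapshots "
        "(id, check_id, site_domain, doc_entries, banner_result, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            snapshot_id,
            check_id,
            site_domain,
            json.dumps(doc_entries),
            # Keep SQL NULL for a missing banner result instead of the JSON text 'null'.
            None if banner_result is None else json.dumps(banner_result),
            _now(),
        ),
    )
    conn.commit()
    return snapshot_id


def load_for_replay(conn: sqlite3.Connection, snapshot_id: str) -> Optional[dict[str, Any]]:
    """Hypothetical helper: load raw data and record the replay in the meta columns."""
    row = conn.execute(
        "SELECT check_id, site_domain, doc_entries, banner_result "
        "FROM compliance_check_snapshots WHERE id = ?",
        (snapshot_id,),
    ).fetchone()
    if row is None:
        return None
    conn.execute(
        "UPDATE compliance_check_snapshots "
        "SET replay_count = replay_count + 1, last_replay_at = ? WHERE id = ?",
        (_now(), snapshot_id),
    )
    conn.commit()
    check_id, site_domain, doc_entries, banner_result = row
    return {
        "check_id": check_id,
        "site_domain": site_domain,
        "doc_entries": json.loads(doc_entries),
        "banner_result": None if banner_result is None else json.loads(banner_result),
    }
```

In PostgreSQL, the lookups behind such helpers would be served by the primary key on `id` and by the `check_id` and `site_domain` indexes defined in the migration.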