feat(audit): overlapping evidence slices for a gapless chain of evidence

Instead of ONE full-page screenshot, the full-page capture is cut with
PIL into viewport-sized slices, each overlapping the previous one by
overlap_px pixels. Every cookie appears in at least one slice, and at
slice boundaries even in two; dedup by name removes the duplicates.
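
The slice geometry described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper name `slice_bounds` is hypothetical, and the stride of `viewport_h - overlap_px` is an assumption derived from "each slice overlaps the previous one by overlap_px". The defaults mirror `EvidenceSlicesRequest`.

```python
def slice_bounds(total_height: int, viewport_h: int = 1024,
                 overlap_px: int = 200, max_slices: int = 40) -> list[tuple[int, int]]:
    """Return (top_y, bot_y) pairs that cover the page with overlapping slices.

    Hypothetical helper: the commit's real implementation may differ.
    """
    if viewport_h <= 0 or not 0 <= overlap_px < viewport_h:
        raise ValueError("need viewport_h > 0 and 0 <= overlap_px < viewport_h")
    stride = viewport_h - overlap_px  # assumed step between slice tops
    bounds: list[tuple[int, int]] = []
    top = 0
    while top < total_height and len(bounds) < max_slices:
        bot = min(top + viewport_h, total_height)  # last slice may be shorter
        bounds.append((top, bot))
        if bot >= total_height:
            break
        top += stride
    return bounds


bounds = slice_bounds(2000)
print(bounds)  # [(0, 1024), (824, 1848), (1648, 2000)]
```

With Pillow, each pair could then be passed to `Image.crop((0, top_y, width, bot_y))` on the full-page PNG. Under this stride, a cookie row no taller than `overlap_px` always fits completely inside at least one slice.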

Why not do scroll-based slicing directly in Playwright? VW's cookie
page uses scroll-snap / fixed positioning, so all viewport shots came
back identical (header overlay). Cutting the full-page PNG with PIL
bypasses the problem entirely.

VW smoke test (32 slices):
  per-slice: [0, 0, 2, 5, 5, 3, 4, 7, 4, 3, 4, 5, ...]
  103 raw cookies → 79 unique after dedup
  14 vendor records (Google 9, Adobe family 17, etc.)
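
The step from raw to unique cookies could look roughly like this. It is a sketch only: the function name `dedup_by_name`, the record shape (a dict with a `name` key), and the sample cookie names are assumptions for illustration, not data from the smoke test.

```python
from typing import Iterable


def dedup_by_name(per_slice: Iterable[list[dict]]) -> list[dict]:
    """Keep the first record seen for each cookie name, in slice order."""
    unique: dict[str, dict] = {}
    for slice_idx, cookies in enumerate(per_slice):
        for cookie in cookies:
            # setdefault keeps the earliest sighting and ignores later repeats.
            unique.setdefault(cookie["name"], {**cookie, "first_slice": slice_idx})
    return list(unique.values())


# Illustrative input: "_gid" sits on a slice boundary and is seen twice.
per_slice = [
    [],
    [{"name": "_ga"}, {"name": "_gid"}],
    [{"name": "_gid"}, {"name": "s_cc"}],
]
raw = sum(len(cookies) for cookies in per_slice)
unique = dedup_by_name(per_slice)
print(raw, len(unique))  # 4 3
```

Recording `first_slice` keeps a pointer from each unique cookie back to the screenshot that proves it, which may be useful when the slices are attached as evidence.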

Each slice has its own timestamp + SHA256 → ZIP attachment for the
legal chain of evidence.
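
One way the per-slice timestamp, hash, and ZIP attachment might fit together is sketched below. The field names follow `EvidenceSliceItem`; the helper names `make_slice_item` and `build_evidence_zip`, the UTC ISO timestamp format, the file naming, and the `manifest.json` layout are all assumptions, not taken from this commit.

```python
import base64
import hashlib
import io
import json
import zipfile
from datetime import datetime, timezone


def make_slice_item(idx: int, png: bytes, top_y: int, bot_y: int) -> dict:
    """Build one slice record with its own timestamp and SHA256 (hypothetical helper)."""
    return {
        "idx": idx,
        "ts": datetime.now(timezone.utc).isoformat(),  # assumed timestamp format
        "top_y": top_y,
        "bot_y": bot_y,
        "sha256": hashlib.sha256(png).hexdigest(),
        "png_b64": base64.b64encode(png).decode("ascii"),
        "png_size": len(png),
    }


def build_evidence_zip(items: list[dict]) -> bytes:
    """Pack slices plus a manifest into an in-memory ZIP, verifying each hash first."""
    buf = io.BytesIO()
    manifest = []
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for item in items:
            png = base64.b64decode(item["png_b64"])
            if hashlib.sha256(png).hexdigest() != item["sha256"]:
                raise ValueError(f"hash mismatch for slice {item['idx']}")
            name = f"slice_{item['idx']:03d}.png"  # assumed naming scheme
            zf.writestr(name, png)
            meta = {k: v for k, v in item.items() if k != "png_b64"}
            manifest.append({**meta, "file": name})
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buf.getvalue()


# Placeholder bytes stand in for a real PNG.
item = make_slice_item(0, b"\x89PNG placeholder", 0, 1024)
archive = build_evidence_zip([item])
print(zipfile.ZipFile(io.BytesIO(archive)).namelist())
```

Re-checking the hash before writing means a corrupted or altered slice is rejected rather than silently archived.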

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-22 23:38:13 +02:00
parent 1784b43d72
commit efeef73f90
3 changed files with 300 additions and 1 deletions
+57 -1
@@ -16,7 +16,10 @@ from services.consent_scanner import run_consent_test, ConsentTestResult
from services.authenticated_scanner import run_authenticated_test, AuthTestResult
from services.playwright_scanner import scan_website_playwright
from services.dsi_discovery import discover_dsi_documents, DSIDiscoveryResult
from services.page_screenshot import capture_page_evidence
from services.page_screenshot import (
    capture_page_evidence,
    capture_page_overlapping_slices,
)
from checks.banner_runner import map_scan_to_checks
logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(message)s")
@@ -407,6 +410,59 @@ async def capture_evidence(req: EvidenceRequest):
)
# ── Evidence slices (overlapping scrolling screenshots) ─────────────


class EvidenceSlicesRequest(BaseModel):
    url: str
    check_id: str = ""
    viewport_h: int = 1024
    overlap_px: int = 200
    max_slices: int = 40


class EvidenceSliceItem(BaseModel):
    idx: int
    ts: str
    top_y: int
    bot_y: int
    sha256: str
    png_b64: str
    png_size: int


class EvidenceSlicesResponse(BaseModel):
    url: str
    total_height_px: int
    width_px: int
    accepted_banner: bool
    expanded: int
    slices: list[EvidenceSliceItem]
@app.post("/capture-evidence-slices", response_model=EvidenceSlicesResponse)
async def capture_evidence_slices(req: EvidenceSlicesRequest):
    """Overlapping viewport screenshots for a gapless chain of evidence.

    Each slice overlaps the previous one by overlap_px pixels, so every cookie
    appears in at least one image, and at slice boundaries even in two. Dedup
    by cookie name removes the duplicates from the final result.
    """
    logger.info("Capturing overlapping evidence slices for %s", req.url)
    data = await capture_page_overlapping_slices(
        req.url, check_id=req.check_id,
        viewport_h=req.viewport_h, overlap_px=req.overlap_px,
        max_slices=req.max_slices,
    )
    return EvidenceSlicesResponse(
        url=data["url"],
        total_height_px=data["total_height_px"],
        width_px=data["width_px"],
        accepted_banner=data["accepted_banner"],
        expanded=data["expanded"],
        slices=[EvidenceSliceItem(**s) for s in data["slices"]],
    )
# ── Admin: CMP discoveries (Phase E) ────────────────────────────────
@app.get("/cmp-discoveries")