feat: 4 remaining tasks — EU institutions, banner integration, JS-sites, Caritas fixes
Build + Deploy / build-ai-sdk (push) Failing after 36s
Build + Deploy / build-developer-portal (push) Successful in 8s
Build + Deploy / build-tts (push) Successful in 7s
Build + Deploy / build-document-crawler (push) Successful in 7s
Build + Deploy / build-admin-compliance (push) Successful in 8s
Build + Deploy / build-backend-compliance (push) Successful in 8s
CI / nodejs-build (push) Successful in 3m14s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 46s
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Successful in 29s
CI / test-python-dsms-gateway (push) Successful in 30s
CI / validate-canonical-controls (push) Successful in 16s
Build + Deploy / build-dsms-gateway (push) Successful in 8s
Build + Deploy / build-dsms-node (push) Successful in 8s
CI / branch-name (push) Has been skipped
Build + Deploy / trigger-orca (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
1. EU Institution Checks (Regulation (EU) 2018/1725):
   - New doc_type "eu_institution" with 9 L1 + 15 L2 checks
   - Both German and English patterns (EU institutions are multilingual)
   - Auto-detection via the keywords "2018/1725", "EDSB", "EDPS"
   - Correct article references (Art. 15 instead of GDPR Art. 13, Art. 5 instead of GDPR Art. 6)

2. Banner Check Integration:
   - banner_runner.py maps scan results to 36 L1/L2 structured checks
   - BannerCheckTab shows a hierarchical ChecklistView with hints
   - 3-phase summary (cookies/scripts before consent, after reject, after accept)
   - /scan endpoint now includes structured_checks, completeness_pct and correctness_pct in the response

3. JS-heavy Website Fixes (dm, Zalando, HWK):
   - dsi_helpers.py: goto_resilient (networkidle → domcontentloaded fallback)
   - Call try_dismiss_consent_banner before text extraction
   - PDF redirect detection (dm.de redirects to a PDF hosted on GCS)

4. Caritas False Positive Fixes:
   - Phone regex allows parentheses, so +49 (0)761 now matches
   - "Recht auf Widerspruch" (right to object; a three-word phrase) and §23 KDG now match Art. 21
   - Church authorities: "Katholisches Datenschutzzentrum" is now recognized

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
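Item 1's keyword-based auto-detection and article correction can be sketched as follows. Only the three keywords and the two article corrections come from the commit message; `detect_doc_type`, `article_for` and the `"privacy_policy"` fallback label are hypothetical names, not the crawler's actual API.

```python
import re

# Keywords from the commit message: the regulation number plus the German
# (EDSB) and English (EDPS) acronyms of the European Data Protection Supervisor.
EU_INSTITUTION_PATTERNS = [
    re.compile(r"2018/1725"),
    re.compile(r"\bEDSB\b"),  # word boundaries avoid hits inside longer tokens
    re.compile(r"\bEDPS\b"),
]

# GDPR article -> corresponding article of Regulation (EU) 2018/1725,
# limited to the two corrections named in the commit message.
GDPR_TO_EUDPR = {
    13: 15,  # information to be provided when data are collected from the data subject
    6: 5,    # lawfulness of processing
}


def detect_doc_type(text: str, default: str = "privacy_policy") -> str:
    """Return "eu_institution" when any EU-institution keyword occurs (hypothetical helper)."""
    if any(pattern.search(text) for pattern in EU_INSTITUTION_PATTERNS):
        return "eu_institution"
    return default


def article_for(doc_type: str, gdpr_article: int) -> int:
    """Translate a GDPR article number when checking an EU institution; otherwise keep it."""
    if doc_type == "eu_institution":
        return GDPR_TO_EUDPR.get(gdpr_article, gdpr_article)
    return gdpr_article
```

For example, `detect_doc_type("Contact the EDPS")` returns `"eu_institution"`, and `article_for("eu_institution", 13)` returns `15`.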
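The `goto_resilient` fallback from item 3 might look roughly like this. `Page.goto(url, wait_until=..., timeout=...)` and `playwright.async_api.TimeoutError` are real Playwright for Python APIs; the function body, the 30 s default and the `FakePage` stand-in are assumptions for illustration, not the code in dsi_helpers.py.

```python
import asyncio

try:
    from playwright.async_api import TimeoutError as PlaywrightTimeoutError
except ImportError:  # stand-in so the sketch also runs without Playwright installed
    class PlaywrightTimeoutError(Exception):
        """Local substitute for playwright.async_api.TimeoutError."""


async def goto_resilient(page, url: str, timeout_ms: int = 30_000):
    """Open `url`, preferring "networkidle" but falling back to "domcontentloaded".

    Pages that keep connections open (analytics beacons, long polling) may
    never become network-idle, so the first attempt can time out even though
    the DOM is already usable for text extraction.
    """
    try:
        return await page.goto(url, wait_until="networkidle", timeout=timeout_ms)
    except PlaywrightTimeoutError:
        return await page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)


class FakePage:
    """Minimal stand-in for a Playwright Page that never reaches network idle."""

    def __init__(self) -> None:
        self.attempts = []

    async def goto(self, url: str, wait_until: str = "load", timeout: int = 30_000):
        self.attempts.append(wait_until)
        if wait_until == "networkidle":
            raise PlaywrightTimeoutError(f"Timeout {timeout}ms exceeded")
        return f"loaded {url} ({wait_until})"


if __name__ == "__main__":
    page = FakePage()
    print(asyncio.run(goto_resilient(page, "https://example.com")))
    print(page.attempts)  # ['networkidle', 'domcontentloaded']
```

Re-issuing `goto` keeps the sketch simple. An alternative is to navigate once with `wait_until="domcontentloaded"` and then try `page.wait_for_load_state("networkidle")` inside its own `try`, which avoids loading the page twice.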
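PDF redirect detection (item 3) could rest on a heuristic like the one below. `is_pdf_response` and `redirected_to_pdf` are hypothetical helpers. In Playwright, their inputs could come from the `Response` returned by `page.goto()` (its `url`, and `headers`, whose names are lower-cased) or from `page.url` after navigation.

```python
from typing import Optional
from urllib.parse import urlparse


def is_pdf_response(final_url: str, content_type: Optional[str]) -> bool:
    """True if the Content-Type header or the URL path indicates a PDF."""
    if content_type:
        # Drop parameters such as "; charset=..." before comparing the MIME type.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime == "application/pdf":
            return True
    return urlparse(final_url).path.lower().endswith(".pdf")


def redirected_to_pdf(requested_url: str, final_url: str, content_type: Optional[str]) -> bool:
    """True if navigation ended on a different URL that serves a PDF."""
    return final_url != requested_url and is_pdf_response(final_url, content_type)
```

Checking the Content-Type first covers PDFs served from extension-less URLs; the path check covers servers that send a generic type.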
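Item 4's pattern fixes are sketched below. The real regexes are not part of this commit, so every pattern here is an assumed reconstruction. It assumes that German phone numbers start with a `+` country code or a trunk `0`, and that inflected forms of the authority name ("Katholische", "Katholischen") should also match. The helper names are hypothetical.

```python
import re
from typing import List, Optional

# Phone numbers such as "+49 (0)761 200-0" or "(0761) 200-0".
PHONE_RE = re.compile(
    r"(?<![\w/])"                            # not glued to a preceding word, digit or slash
    r"(?:\+\d{1,3}[ ./-]?(?:\(0\)[ ./-]?)?"  # international prefix, e.g. "+49 " or "+49 (0)"
    r"|\(?0)"                                # or national trunk "0", optionally "(0..."
    r"\d{1,5}\)?"                            # remaining area-code digits, optional ")"
    r"[ ./-]?"                               # separator
    r"\d[\d ./-]{2,}\d"                      # subscriber number incl. extension
)

# Both phrasings named in the commit message map to GDPR Art. 21.
OBJECTION_RE = re.compile(r"Recht\s+auf\s+Widerspruch|§\s*23\s+KDG", re.IGNORECASE)

# Church data protection authority named in the commit message.
CHURCH_AUTHORITY_RE = re.compile(r"Katholische[sn]?\s+Datenschutzzentrum")


def find_phone_numbers(text: str) -> List[str]:
    return [m.group(0) for m in PHONE_RE.finditer(text)]


def objection_article(text: str) -> Optional[str]:
    """Return the GDPR article a right-to-object mention maps to, if any."""
    return "Art. 21" if OBJECTION_RE.search(text) else None


def names_church_authority(text: str) -> bool:
    return bool(CHURCH_AUTHORITY_RE.search(text))
```

Requiring a `+` or leading `0` keeps numeric references such as "2018/1725" from being misread as phone numbers.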
+45
-25
@@ -16,6 +16,7 @@ from services.consent_scanner import run_consent_test, ConsentTestResult
 from services.authenticated_scanner import run_authenticated_test, AuthTestResult
 from services.playwright_scanner import scan_website_playwright
 from services.dsi_discovery import discover_dsi_documents, DSIDiscoveryResult
+from checks.banner_runner import map_scan_to_checks
 
 logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s: %(message)s")
 logger = logging.getLogger(__name__)
@@ -44,6 +45,9 @@ class ScanResponse(BaseModel):
     scanned_at: str
     category_tests: list = []
     banner_checks: dict = {}
+    structured_checks: list = []
+    completeness_pct: int = 0
+    correctness_pct: int = 0
 
 
 @app.get("/health")
@@ -57,30 +61,47 @@ async def scan_consent(req: ScanRequest):
     logger.info("Starting consent test for %s", req.url)
     result = await run_consent_test(req.url, req.timeout_per_phase)
 
+    # Build raw response dict for structured check mapping
+    phases = {
+        "before_consent": {
+            "scripts": result.before_scripts,
+            "cookies": result.before_cookies,
+            "tracking_services": result.before_tracking,
+            "violations": [v.__dict__ for v in result.before_violations],
+        },
+        "after_reject": {
+            "scripts": result.reject_scripts,
+            "cookies": result.reject_cookies,
+            "new_tracking": result.reject_new_tracking,
+            "violations": [v.__dict__ for v in result.reject_violations],
+        },
+        "after_accept": {
+            "scripts": result.accept_scripts,
+            "cookies": result.accept_cookies,
+            "new_tracking": result.accept_new_tracking,
+            "undocumented": result.accept_undocumented,
+        },
+    }
+    banner_checks_data = {
+        "has_impressum_link": result.banner_has_impressum_link,
+        "has_dse_link": result.banner_has_dse_link,
+        "violations": [v.__dict__ for v in result.banner_text_violations],
+    }
+
+    # Map to L1/L2 hierarchy
+    raw_for_mapping = {
+        "banner_detected": result.banner_detected,
+        "banner_provider": result.banner_provider,
+        "phases": phases,
+        "banner_checks": banner_checks_data,
+    }
+    check_result = map_scan_to_checks(raw_for_mapping)
+
     return ScanResponse(
         url=req.url,
         banner_detected=result.banner_detected,
         banner_provider=result.banner_provider,
-        phases={
-            "before_consent": {
-                "scripts": result.before_scripts,
-                "cookies": result.before_cookies,
-                "tracking_services": result.before_tracking,
-                "violations": [v.__dict__ for v in result.before_violations],
-            },
-            "after_reject": {
-                "scripts": result.reject_scripts,
-                "cookies": result.reject_cookies,
-                "new_tracking": result.reject_new_tracking,
-                "violations": [v.__dict__ for v in result.reject_violations],
-            },
-            "after_accept": {
-                "scripts": result.accept_scripts,
-                "cookies": result.accept_cookies,
-                "new_tracking": result.accept_new_tracking,
-                "undocumented": result.accept_undocumented,
-            },
-        },
+        phases=phases,
         summary={
             "critical": sum(1 for v in result.reject_violations if v.severity == "CRITICAL"),
             "high": len(result.before_violations) + sum(1 for v in result.banner_text_violations if v.severity == "HIGH"),
@@ -90,11 +111,10 @@ async def scan_consent(req: ScanRequest):
             "categories_tested": len(result.category_tests),
             "banner_text_issues": len(result.banner_text_violations),
         },
-        banner_checks={
-            "has_impressum_link": result.banner_has_impressum_link,
-            "has_dse_link": result.banner_has_dse_link,
-            "violations": [v.__dict__ for v in result.banner_text_violations],
-        },
+        banner_checks=banner_checks_data,
+        structured_checks=check_result["structured_checks"],
+        completeness_pct=check_result["completeness_pct"],
+        correctness_pct=check_result["correctness_pct"],
         scanned_at=datetime.now(timezone.utc).isoformat(),
         category_tests=[{
             "category": ct.category,
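The `phases` dict that `scan_consent` now builds once has fixed keys (`before_consent`, `after_reject`, `after_accept`), each holding `scripts` and `cookies` lists. A hypothetical helper could derive the cookies/scripts counts for the 3-phase summary from that dict. What the real summary in BannerCheckTab displays beyond these counts is not part of this diff.

```python
from typing import Any, Dict, List

# Phase keys as built in scan_consent(); order follows the scan sequence.
PHASES = ("before_consent", "after_reject", "after_accept")


def summarize_phases(phases: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count cookies and scripts per consent phase (hypothetical helper)."""
    summary: Dict[str, Dict[str, int]] = {}
    for phase in PHASES:
        data = phases.get(phase, {})
        # Treat missing or None entries as empty lists.
        cookies: List[Any] = data.get("cookies") or []
        scripts: List[Any] = data.get("scripts") or []
        summary[phase] = {"cookies": len(cookies), "scripts": len(scripts)}
    return summary


if __name__ == "__main__":
    example = {
        "before_consent": {"scripts": ["app.js"], "cookies": []},
        "after_reject": {"scripts": ["app.js"], "cookies": ["session"]},
        "after_accept": {"scripts": ["app.js", "analytics.js"], "cookies": ["session", "_ga"]},
    }
    print(summarize_phases(example))
```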
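`scan_consent` reads exactly three keys from `map_scan_to_checks`: `structured_checks`, `completeness_pct` and `correctness_pct`. Because banner_runner.py itself is not in this diff, the following is only a hypothetical stand-in that honours that contract. The check IDs and titles, the L1 roll-up rule and both percentage formulas are assumptions; the real module maps to 36 checks.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Check:
    check_id: str
    title: str
    passed: Optional[bool] = None  # None: this scan cannot evaluate the check
    children: List["Check"] = field(default_factory=list)


def _present(data: Dict[str, Any], key: str) -> Optional[bool]:
    """True/False for a flag that should be set; None if the key is missing."""
    value = data.get(key)
    return None if value is None else bool(value)


def _absent(data: Dict[str, Any], key: str) -> Optional[bool]:
    """True if a findings list is empty, False if not, None if the key is missing."""
    return None if key not in data else not data[key]


def _rollup(check: Check) -> Optional[bool]:
    """Assumed rule: an L1 check fails if any L2 child fails, passes only if all pass."""
    if check.children:
        results = [_rollup(child) for child in check.children]
        if any(r is False for r in results):
            check.passed = False
        elif all(r is True for r in results):
            check.passed = True
        else:
            check.passed = None
    return check.passed


def _leaves(checks: List[Check]) -> Iterator[Check]:
    for check in checks:
        if check.children:
            yield from _leaves(check.children)
        else:
            yield check


def map_scan_to_checks(raw: Dict[str, Any]) -> Dict[str, Any]:
    phases = raw.get("phases", {})
    before = phases.get("before_consent", {})
    reject = phases.get("after_reject", {})
    banner = raw.get("banner_checks", {})
    checks = [
        Check("B1", "Consent banner detected", _present(raw, "banner_detected")),
        Check("B2", "Nothing tracks before consent", children=[
            Check("B2.1", "No tracking services before consent", _absent(before, "tracking_services")),
            Check("B2.2", "No violations before consent", _absent(before, "violations")),
        ]),
        Check("B3", "Reject is respected", children=[
            Check("B3.1", "No new tracking after reject", _absent(reject, "new_tracking")),
        ]),
        Check("B4", "Banner links", children=[
            Check("B4.1", "Banner links to the Impressum", _present(banner, "has_impressum_link")),
            Check("B4.2", "Banner links to the privacy policy", _present(banner, "has_dse_link")),
        ]),
    ]
    for check in checks:
        _rollup(check)
    leaves = list(_leaves(checks))
    evaluated = [c for c in leaves if c.passed is not None]
    passed = sum(1 for c in evaluated if c.passed)
    return {
        "structured_checks": [asdict(c) for c in checks],
        # Assumed definitions: share of leaf checks evaluated, share of those passed.
        "completeness_pct": round(100 * len(evaluated) / len(leaves)) if leaves else 0,
        "correctness_pct": round(100 * passed / len(evaluated)) if evaluated else 0,
    }
```

Keeping "not evaluable" (`None`) separate from "failed" (`False`) is what lets completeness and correctness be reported as two different numbers.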