fix: 4 bugs from IHK Konstanz scan validation

1. DSE-Matcher: Google/YouTube false match. The provider-name fallback now
   requires a 2+ word match, so "Google" alone no longer matches the YouTube
   section.
2. AGB/Widerrufsbelehrung (terms and conditions / withdrawal notice):
   only_ecommerce flag. These checks are skipped for non-shop websites
   (shops are detected via payment providers and cart keywords).
3. DSE-internal link following: the scanner now discovers links WITHIN the
   privacy policy (DSE) and scans those too (finds regional DSE sub-pages).
4. Expanded keyword synonyms for DSE mandatory checks:
   - "Zweck und Rechtsgrundlage" now matches "zwecke"
   - "behoerdlichen datenschutzbeauftragt" matches DSB
   - "aufsichtsbehörde" with umlaut matches

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
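
The synonym change in item 4 is not part of the hunk shown here. As a rough illustration only, umlaut-tolerant keyword matching could look like the sketch below. The names `normalize_de`, `SYNONYMS`, and `check_present` are hypothetical and not taken from the scanner. The keyword lists use only the strings quoted in the commit message.

```python
def normalize_de(text: str) -> str:
    """Lowercase and transliterate German umlauts and eszett to ASCII digraphs."""
    table = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
    return text.lower().translate(table)


# Hypothetical synonym table; the real check IDs and keyword lists are assumptions.
SYNONYMS = {
    "purpose_and_legal_basis": ["zweck und rechtsgrundlage", "zwecke"],
    "data_protection_officer": ["behoerdlichen datenschutzbeauftragt"],
    "supervisory_authority": ["aufsichtsbehörde"],
}


def check_present(text: str, check_id: str) -> bool:
    """Return True if any synonym for check_id occurs in text, ignoring umlaut spelling."""
    haystack = normalize_de(text)
    return any(normalize_de(keyword) in haystack for keyword in SYNONYMS[check_id])


# Both spellings of the same word are found after normalization.
print(check_present("Zuständige Aufsichtsbehörde ist ...", "supervisory_authority"))
print(check_present("Zustaendige Aufsichtsbehoerde ist ...", "supervisory_authority"))
```

Normalizing both the keyword and the page text means each synonym is listed once, whichever spelling the website uses.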
@@ -64,8 +64,21 @@ def match_service_to_dse(
    )

    # Step 2: Search for provider name (e.g., "Google" for "Google Analytics")
    # But only if the provider name is specific enough: avoid "Google" matching YouTube
    provider = service_name.split()[0] if " " in service_name else service_name
    if len(provider) < 4 or provider.lower() in ("the", "a", "an"):
        provider = service_name  # Too short/generic, use full name

    section = find_section_by_content(sections, provider)
    # Verify: the section must actually be about THIS service, not just mention the provider
    if section and provider.lower() != service_name.lower():
        # Check if the full service name or a close variant is in the section
        content_lower = section.content.lower()
        service_words = service_name.lower().split()
        # At least 2 words of the service name must match (not just "Google")
        matching_words = sum(1 for w in service_words if w in content_lower)
        if matching_words < 2 and service_name.lower() not in content_lower:
            section = None  # False match: provider name found but wrong context

    if section:
        original = _extract_relevant_paragraph(section.content, provider)

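
To see the effect of the new verification step in isolation, here is a minimal, self-contained sketch. The `Section` dataclass and the `verify_provider_match` helper are hypothetical stand-ins. In the commit the logic is inline in `match_service_to_dse` and works on the scanner's own section objects. The sample texts are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Section:
    # Hypothetical stand-in for the scanner's section object.
    heading: str
    content: str


def verify_provider_match(
    section: Optional[Section], service_name: str, provider: str
) -> Optional[Section]:
    """Keep a provider-name hit only if the section is about the service itself."""
    if section is None or provider.lower() == service_name.lower():
        return section  # Nothing to verify: no hit, or the full name was searched.
    content_lower = section.content.lower()
    service_words = service_name.lower().split()
    # Count how many words of the service name occur in the section text.
    matching_words = sum(1 for w in service_words if w in content_lower)
    if matching_words < 2 and service_name.lower() not in content_lower:
        return None  # Only the provider name matched, so treat it as a false match.
    return section


youtube = Section("YouTube", "We embed YouTube videos, a service of Google Ireland Ltd.")
analytics = Section("Google Analytics", "This site uses Google Analytics to measure traffic.")

# The YouTube section mentions "Google" but not "Analytics", so it is rejected.
print(verify_provider_match(youtube, "Google Analytics", "Google"))
# The Analytics section contains both words, so it is kept.
print(verify_provider_match(analytics, "Google Analytics", "Google") is analytics)
```

Note that the word test uses substring containment (`w in content_lower`), so a short word can match inside a longer one. A stricter variant could compare against tokenized words instead.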