feat(compliance-check): auto-discover missing doc types from homepage

When the user leaves some doc-type rows empty, the tool now actively searches the website for them and only marks them 'not found' as a last resort.

Flow:
1. User submits N URLs (e.g. just DSI)
2. For each canonical doc_type with no submitted URL/text, the route identifies the most-common base (scheme://netloc) from the submitted URLs
3. Calls consent-tester /dsi-discovery on the homepage with max_documents=15 (180s timeout)
4. Classifies every discovered doc into a canonical doc_type via title/URL keyword rules (_DISCOVERY_RULES, covering cookie/widerruf/social_media/agb/nutzungsbedingungen/dsb/impressum/dse)
5. Fills matching empty entries with the discovered text, marks auto_discovered=True and discovery_attempted=True

Padding now differentiates:
- 'Auf der Website nicht gefunden': discovery was attempted, no doc matched. Amber badge, friendly hint to add the URL manually.
- 'Nicht eingereicht — Quelle nicht angegeben': user gave NO URLs at all, nothing to crawl from. Grey badge.

Email + frontend:
- Status labels: NICHT GEFUNDEN (amber) vs NICHT EINGEREICHT (grey)
- 'Gepruefte Quellen' table tags auto-discovered URLs with a small blue 'auto-entdeckt' badge so the GF sees what the tool found vs what the user submitted.

Implementation only runs when ≥1 URL was submitted (no base to crawl from otherwise). Adds 30-90s for unsubmitted types but avoids the 'just say nicht gefunden' anti-pattern.
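
Steps 2 and 4 of the flow (picking the most-common base and classifying discovered documents by keyword) are not part of the diff shown here. A minimal sketch of how they might look, assuming first-match-wins rules; the names `most_common_base`, `classify` and `DISCOVERY_RULES_SKETCH`, as well as the keyword lists, are hypothetical and not the actual `_DISCOVERY_RULES`:

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import urlsplit

# Hypothetical keyword rules, NOT the real _DISCOVERY_RULES from the codebase.
# Order matters: the first matching rule wins, so "dsb" (Datenschutzbeauftragte)
# is listed before the more general "dse" keyword "datenschutz".
DISCOVERY_RULES_SKETCH: list[tuple[str, tuple[str, ...]]] = [
    ("cookie", ("cookie",)),
    ("widerruf", ("widerruf",)),
    ("social_media", ("social-media", "social_media", "social media")),
    ("agb", ("agb", "geschaeftsbedingungen")),
    ("nutzungsbedingungen", ("nutzungsbedingungen",)),
    ("dsb", ("datenschutzbeauftragte",)),
    ("impressum", ("impressum",)),
    ("dse", ("datenschutz", "privacy")),
]


def most_common_base(urls: list[str]) -> str | None:
    """Return the most frequent scheme://netloc among the URLs, or None."""
    bases = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Skip entries that are not absolute URLs (no scheme or host).
        if parts.scheme and parts.netloc:
            bases[f"{parts.scheme}://{parts.netloc}"] += 1
    if not bases:
        return None
    return bases.most_common(1)[0][0]


def classify(title: str, url: str) -> str | None:
    """Map a discovered document to a doc_type via substring keywords."""
    haystack = f"{title} {url}".lower()
    for doc_type, keywords in DISCOVERY_RULES_SKETCH:
        if any(keyword in haystack for keyword in keywords):
            return doc_type
    return None
```

Plain substring matching is deliberately crude (a short keyword such as "agb" can match inside unrelated words), so real rules would likely need word boundaries or path-segment checks.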
@@ -184,9 +184,14 @@ def _render_document(html: list[str], r: DocCheckResult) -> None:
     pct = r.correctness_pct
     bar_color = "green" if pct >= 80 else "yellow" if pct >= 50 else "red"
     status_label = "OK" if pct == 100 else "LUECKENHAFT" if pct >= 50 else "MANGELHAFT"
-    is_missing = bool(r.error) and r.error.startswith("Nicht eingereicht")
+    is_missing = bool(r.error) and (
+        r.error.startswith("Nicht eingereicht")
+        or r.error.startswith("Auf der Website nicht gefunden")
+    )
     if is_missing:
-        status_label = "NICHT EINGEREICHT"
+        status_label = ("NICHT GEFUNDEN"
+                        if r.error.startswith("Auf der Website")
+                        else "NICHT EINGEREICHT")
     elif r.error:
         status_label = "FEHLER"
 
@@ -220,13 +225,19 @@ def _render_document(html: list[str], r: DocCheckResult) -> None:
 
     # Body
     if is_missing:
+        body_msg = (
+            "Wir haben die Hauptseite durchsucht, aber kein Dokument fuer "
+            "diese Pflichtangabe gefunden. Pruefen Sie, ob es auf der "
+            "Website existiert und tragen Sie die URL manuell nach."
+            if r.error.startswith("Auf der Website")
+            else "Keine URL oder Text fuer dieses Dokument angegeben. "
+            "Tragen Sie die Quelle im Compliance-Check Formular nach, "
+            "um diese Pflichtangabe zu pruefen."
+        )
         html.append(
             '<div style="padding:12px 16px;color:#6b7280;font-size:12px;'
             'background:#fafafa;border-top:1px solid #f3f4f6">'
-            'Keine URL oder Text fuer dieses Dokument angegeben. '
-            'Tragen Sie die Quelle im Compliance-Check Formular nach, '
-            'um diese Pflichtangabe zu pruefen.'
-            '</div>'
+            + body_msg + '</div>'
         )
     elif r.error:
         html.append(f'<div style="padding:12px 16px;color:#991b1b">{r.error}</div>')
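
Both hunks branch on the same error-message prefixes. As an illustration only, the label selection could be collected in one place; the helper `status_label_for` and the `MISSING_PREFIXES` table below are hypothetical and do not appear in this commit:

```python
from __future__ import annotations

# Hypothetical lookup table: error-message prefix -> status label.
# The prefixes and labels are the ones used in the diff above.
MISSING_PREFIXES = {
    "Auf der Website nicht gefunden": "NICHT GEFUNDEN",
    "Nicht eingereicht": "NICHT EINGEREICHT",
}


def status_label_for(error: str | None, pct: int) -> str:
    """Mirror the label logic of _render_document for one result."""
    if error:
        for prefix, label in MISSING_PREFIXES.items():
            if error.startswith(prefix):
                return label
        # Any other non-empty error is a genuine failure.
        return "FEHLER"
    if pct == 100:
        return "OK"
    return "LUECKENHAFT" if pct >= 50 else "MANGELHAFT"
```

Keeping the prefixes in a single table would mean a third "missing" state only has to be added once, rather than in every `startswith` check.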