feat(compliance-check): auto-discover missing doc types from homepage
When the user leaves some doc-type rows empty, the tool now actively searches the website for them — only marks 'not found' as last resort. Flow: 1. User submits N URLs (e.g. just DSI) 2. For each canonical doc_type with no submitted URL/text, the route identifies the most-common base (scheme://netloc) from submitted URLs 3. Calls consent-tester /dsi-discovery on the homepage with max_documents=15 (180s timeout) 4. Classifies every discovered doc into a canonical doc_type via title/URL keyword rules (_DISCOVERY_RULES — covers cookie/widerruf/ social_media/agb/nutzungsbedingungen/dsb/impressum/dse) 5. Fills matching empty entries with the discovered text, marks auto_discovered=True and discovery_attempted=True Padding now differentiates: - 'Auf der Website nicht gefunden' — discovery was attempted, no doc matched. Amber badge, friendly hint to add URL manually. - 'Nicht eingereicht — Quelle nicht angegeben' — user gave NO URLs at all, nothing to crawl from. Grey badge. Email + frontend: - Status labels: NICHT GEFUNDEN (amber) vs NICHT EINGEREICHT (grey) - 'Gepruefte Quellen' table tags auto-discovered URLs with a small blue 'auto-entdeckt' badge so GF sees what tool found vs user submitted. Implementation only runs when ≥1 URL was submitted (no base to crawl from otherwise). Adds 30-90s for unsubmitted types but avoids the 'just say nicht gefunden' anti-pattern.
This commit is contained in:
@@ -167,7 +167,11 @@ export function ChecklistView({ results }: { results: DocResult[] }) {
|
||||
</div>
|
||||
</div>
|
||||
<div className="flex items-center gap-3 shrink-0 ml-3">
|
||||
{r.error && r.error.startsWith("Nicht eingereicht") ? (
|
||||
{r.error && r.error.startsWith("Auf der Website nicht gefunden") ? (
|
||||
<span className="text-xs text-amber-700 font-medium px-2 py-0.5 bg-amber-100 rounded-full whitespace-nowrap">
|
||||
Nicht gefunden
|
||||
</span>
|
||||
) : r.error && r.error.startsWith("Nicht eingereicht") ? (
|
||||
<span className="text-xs text-gray-500 font-medium px-2 py-0.5 bg-gray-100 rounded-full whitespace-nowrap">
|
||||
Nicht eingereicht
|
||||
</span>
|
||||
|
||||
Reference in New Issue
Block a user