feat(audit): Text-Paste-Mode pro Row — Crawler optional umgehen
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / nodejs-build (push) Successful in 3m27s
CI / iace-gt-coverage (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Failing after 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / test-python-backend (push) Successful in 47s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped

Hintergrund: VW liefert ueber URL-Crawler nur 6 Vendors statt der 100+
die in der echten Cookie-Tabelle stehen. Wenn der User die Tabelle aber
direkt von der Site kopieren kann (was bei den meisten OEM-Sites moeglich
ist), umgehen wir den Crawler komplett und parsen den Text deterministisch.

Backend:
* doc_type_classifier.py — 7 Pattern-Gruppen (§5 TMG, Art.13 DSGVO,
  AGB-Klauseln, Widerrufs-Frist, Cookie-Tabellen-Header, etc). Wenn der
  User Text ins falsche Doc-Type-Feld kopiert (Impressum->DSE),
  detect_mismatch liefert detected + action ('reclassify' bei sehr hoher
  Konfidenz, 'warn' bei medium).
* cookies_table_parser.py — Tab/Pipe/Komma/Semicolon-Separator-Auto-
  Detection, Spalten-Mapping per Header-Keyword. Aggregiert Cookie-
  Eintraege zu Vendor-Records (mit _guess_vendor-Fallback). Voll
  deterministisch, kein LLM.
* doc_input_warnings.py — Mail-Block ueber dem Audit, der Mismatches +
  Auto-Reclassifies dem User transparent macht.
* Pipeline: text gewinnt ueber url (war schon im Schema vermerkt), neue
  Felder declared_doc_type / input_source / reclassify_hint in doc_entries.
  Pasted-Tabellen-Vendors haben Vorrang vor Library-Fallback + LLM-Cascade
  (sind 100% genau).

Frontend (DocCheckTab):
* Pro Row Mode-Toggle 'URL' / 'Text einfuegen' (lila wenn aktiv).
* Textarea (h-32, monospace) im text-mode mit kontext-spezifischem
  Placeholder (Cookie-Hinweis ggue. anderen Doc-Types) und Live-
  Zeichen-/Wort-Counter.
* Submit-Button accepted entries mit URL ODER text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-05-21 18:58:32 +02:00
parent 7335f64f4f
commit e411c4f0d3
5 changed files with 670 additions and 49 deletions
@@ -10,6 +10,8 @@ interface DocEntry {
type: string
label: string
url: string
text: string // P-Paste: User kopiert Doc-Text direkt rein
mode: 'url' | 'text' // welcher Input wird aktiv genutzt
}
const DOC_TYPES = [
@@ -28,7 +30,8 @@ const DOC_TYPES = [
]
function newEntry(): DocEntry {
return { id: crypto.randomUUID().slice(0, 8), type: 'dse', label: '', url: '' }
return { id: crypto.randomUUID().slice(0, 8), type: 'dse', label: '',
url: '', text: '', mode: 'url' }
}
export function DocCheckTab() {
@@ -81,7 +84,7 @@ export function DocCheckTab() {
}
const handleSubmit = async () => {
const validEntries = entries.filter(e => e.url.trim())
const validEntries = entries.filter(e => e.url.trim() || e.text.trim())
if (validEntries.length === 0) return
setLoading(true)
@@ -96,8 +99,13 @@ export function DocCheckTab() {
body: JSON.stringify({
entries: validEntries.map(e => ({
doc_type: e.type,
label: e.label || e.url.split('/').pop() || 'Dokument',
url: e.url.trim(),
label: e.label
|| (e.url ? e.url.split('/').pop() : '')
|| `${e.type}-paste`,
url: e.mode === 'text' ? '' : e.url.trim(),
// Backend nimmt text > url. Wenn beide gefuellt sind und
// mode='url', schicken wir den text NICHT mit.
text: e.mode === 'text' ? e.text.trim() : '',
})),
check_cookie_banner: checkCookieBanner,
use_agent: useAgent,
@@ -148,41 +156,83 @@ export function DocCheckTab() {
{/* P79 Pre-Scan-Wizard — 8 Pflichtfelder */}
<PreScanWizard value={scanContext} onChange={setScanContext} />
{/* URL Entries */}
<div className="space-y-2">
{/* URL / Text Entries */}
<div className="space-y-3">
{entries.map((entry, i) => (
<div key={entry.id} className="flex items-center gap-2">
<select
value={entry.type}
onChange={e => updateEntry(entry.id, 'type', e.target.value)}
className="w-48 px-3 py-2.5 border border-gray-300 rounded-lg text-sm bg-white shrink-0"
>
{DOC_TYPES.map(t => (
<option key={t.id} value={t.id}>{t.label}</option>
))}
</select>
<input
type="text"
value={entry.label}
onChange={e => updateEntry(entry.id, 'label', e.target.value)}
placeholder={entry.type === 'other' ? 'Dokumentname' : 'Version / Stand (optional)'}
className="w-40 px-3 py-2.5 border border-gray-300 rounded-lg text-sm shrink-0"
/>
<input
type="url"
value={entry.url}
onChange={e => updateEntry(entry.id, 'url', e.target.value)}
onBlur={() => autoLabel(entry)}
placeholder="https://example.com/datenschutz"
className="flex-1 px-3 py-2.5 border border-gray-300 rounded-lg text-sm"
/>
{entries.length > 1 && (
<button onClick={() => removeEntry(entry.id)}
className="p-2 text-gray-400 hover:text-red-500 shrink-0">
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
<div key={entry.id} className="space-y-1.5">
<div className="flex items-center gap-2">
<select
value={entry.type}
onChange={e => updateEntry(entry.id, 'type', e.target.value)}
className="w-48 px-3 py-2.5 border border-gray-300 rounded-lg text-sm bg-white shrink-0"
>
{DOC_TYPES.map(t => (
<option key={t.id} value={t.id}>{t.label}</option>
))}
</select>
<input
type="text"
value={entry.label}
onChange={e => updateEntry(entry.id, 'label', e.target.value)}
placeholder={entry.type === 'other' ? 'Dokumentname' : 'Version / Stand (optional)'}
className="w-40 px-3 py-2.5 border border-gray-300 rounded-lg text-sm shrink-0"
/>
{/* Mode-Toggle URL / Text */}
<div className="inline-flex border border-gray-300 rounded-lg overflow-hidden text-xs shrink-0">
<button type="button"
onClick={() => updateEntry(entry.id, 'mode', 'url')}
className={`px-3 py-2 ${entry.mode === 'url'
? 'bg-purple-600 text-white' : 'bg-white text-gray-600 hover:bg-gray-50'}`}>
URL
</button>
<button type="button"
onClick={() => updateEntry(entry.id, 'mode', 'text')}
className={`px-3 py-2 ${entry.mode === 'text'
? 'bg-purple-600 text-white' : 'bg-white text-gray-600 hover:bg-gray-50'}`}>
Text einfügen
</button>
</div>
{entry.mode === 'url' && (
<input
type="url"
value={entry.url}
onChange={e => updateEntry(entry.id, 'url', e.target.value)}
onBlur={() => autoLabel(entry)}
placeholder="https://example.com/datenschutz"
className="flex-1 px-3 py-2.5 border border-gray-300 rounded-lg text-sm"
/>
)}
{entries.length > 1 && (
<button onClick={() => removeEntry(entry.id)}
className="p-2 text-gray-400 hover:text-red-500 shrink-0">
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
)}
</div>
{entry.mode === 'text' && (
<div className="ml-[400px]">
<textarea
value={entry.text}
onChange={e => updateEntry(entry.id, 'text', e.target.value)}
placeholder={
entry.type === 'cookie'
? 'Kopiere hier die komplette Cookie-Tabelle rein (Tab-getrennt oder mit | als Trenner — wir parsen alle Spalten deterministisch)…'
: 'Kopiere hier den vollständigen Doc-Text rein. Wir erkennen automatisch ob es zu „' + (DOC_TYPES.find(t => t.id === entry.type)?.label ?? entry.type) + '" passt.'
}
className="w-full h-32 px-3 py-2 border border-gray-300 rounded-lg text-xs font-mono resize-y"
/>
<div className="text-[10px] text-gray-500 mt-1">
{entry.text.trim().length > 0
? `${entry.text.trim().length.toLocaleString('de-DE')} Zeichen · ${entry.text.trim().split(/\s+/).length.toLocaleString('de-DE')} Wörter`
: 'Der Crawler wird übersprungen — die Analyse läuft direkt auf dem eingefügten Text.'}
</div>
</div>
)}
</div>
))}
@@ -225,7 +275,9 @@ export function DocCheckTab() {
{/* Submit */}
<button
onClick={handleSubmit}
disabled={loading || entries.every(e => !e.url.trim()) || !contextReady}
disabled={loading
|| entries.every(e => !e.url.trim() && !e.text.trim())
|| !contextReady}
className="w-full px-4 py-3 bg-purple-600 text-white rounded-lg font-medium hover:bg-purple-700 disabled:opacity-50 transition-colors text-sm flex items-center justify-center gap-2"
title={!contextReady ? 'Bitte zuerst die 8 Pflichtfelder ausfüllen' : undefined}
>