last-build/main
509 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
99901bba0a |
fix(cookie): Präfix-Matcher über-matcht kurze generische Basen nicht mehr
CI / detect-changes (push) Successful in 15s
CI / guardrail-integrity (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 10s
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Successful in 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Successful in 59s
CI / iace-gt-coverage (push) Successful in 28s
CI / test-python-backend (push) Successful in 37s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Die Deklaration-vs-Bibliothek-Sicht deckte sofort einen Fehl-Match auf: 'cct_chatSessionToken' (Genesys-Webchat) traf die Library-Basis 'cct' (actual_category Marketing, purpose 'shopping cart') → falsches 'necessary→Marketing'-Finding. Ursache: gekürzte 3-Zeichen-Basis ohne führenden _. _is_distinctive_base: gekürzte Präfix-Basis nur akzeptieren bei ≥4 Zeichen ODER führendem '_' (kanonische Cookies wie '_ga'). GTM-/AdobeOrg-/Hash- Suffix-Stripping bleibt erhalten (Tests grün), generische 'cct'/'sid'/'gtm' über-matchen nicht mehr. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
403e3c66d2 |
feat(cookie): Deklaration-vs-Bibliothek-Diff-Sicht + Funnel-KPI
Für die Library-getroffene Teilmesse (~32%) pro Cookie die Feld- Abweichungen deklariert→Library (Kategorie/Laufzeit/Zweck) als Diff-Karte, plus ehrlicher Funnel (gesamt → geprüft → abweichend) — nicht-getroffene Cookies sind nicht prüfbar (kein Pass/Fail), passend zur Tonalität. - analyze_cookies: 'expected'-Soll-Wert an tracker_as_necessary/ excessive_lifetime/missing_purpose (+ _CAT_LABEL_DE). - neues cookie_declaration_diff.build_declaration_diff: reine Regroup- Aggregation der Findings pro Cookie (single source = analyze_cookies), Hinweis-Typen (third_country/eu_alternative) bewusst ausgeschlossen. - cookie-check exponiert out['declaration_diff']. - CookieDeclarationDiff.tsx oben im Cookie-Tab (vor Panel/ResultView). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
97e39579d5 |
feat(cookie+routing): Storage-Typ-Filter + legal_notice capture-only
#3 Storage-Filter: cookie-check exponiert per-Cookie-Speichertyp (storage_inventory.per_cookie); CookieResultView bekommt Filter-Chips (Cookie/Local Storage/Framework …) + eine Speicher-Spalte, Anbieter ohne passenden Treffer werden ausgeblendet, KPI zeigt gefilterte Zahl. A-Routing: legal_notice ist jetzt ein kanonischer Doc-Type. Eigene Discovery-Regel (legal-disclaimer/rechtlicher-hinweis) VOR impressum → die Disclaimer-Seite wird nicht mehr als Impressum substituiert (Ursache, dass die Cross-Doc-Reconciliation nie zündete). capture-only: als doc_entry für B persistiert, aber nicht einzeln gescort (keine 0%-Noise, da ohne eigene Checkliste). Im Scan-Form als Option auswählbar. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
0f6cdc93fd |
fix(snapshot): Cookie-Dedup + schneller Impressum-Tab + Tabellen-Zahl
- Cookies werden je Vendor nach Name dedupliziert (Consent-Phasen-Dubletten; BMW 2196 → ~772) — in cookie-check + get_snapshot, behebt aufgeblähte Kachel-/Finding-Zahlen. - Impressum-Snapshot-Check überspringt den ~40s-LLM-Schritt (context skip_llm) → Tab lädt sofort statt leer zu bleiben. - Vendor-Tabelle zeigt nur die Cookie-Zahl (kein 'Cookies'-Wort je Zeile). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
ba7d98be36 |
feat(reconcile): B — Cross-Doc-Reconciliation (Pflicht in anderem Doc erfüllt)
Ein 'X fehlt'/'zu prüfen'-Finding wird unterdrückt, wenn die Pflicht in einem ANDEREN Snapshot-Dokument erfüllt ist (z.B. § 36 VSBG / OS-Link stehen bei BMW in AGB/'Rechtlicher Hinweis', nicht im Impressum → war False Positive). Konservative Allowlist (impressum: verbraucher_streitbeilegung, odr_link) gegen False-Reconciliation. Verdrahtet in _run_doc_agent (alle Doc-Checks). Frontend: 'In anderem Dokument abgedeckt'-Sektion. Greift voll nach Scan + Legal-Capture. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
9dfdaae8e4 |
feat(cookie): präfix-bewusster Library-Match (Runtime-Suffixe)
load_big_library matchte nur EXAKT → nur ~27% der BMW-Cookies trafen die Open-Cookie-DB, weil Per-Instanz-Suffixe abweichen (_ga_GTM-XYZ, AMCVS_###@ AdobeOrg, _pk_id.5.7d8). Jetzt: Library einmal laden, Namen entwildcarden, über _candidate_keys (exact + Präfix an Trennzeichen, Mindestlänge 3 gegen Über-Match) matchen. Reuse der bewährten _strip_wildcards-Logik. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
4fb476e4be |
fix(agb): Gewährleistung erkennt 'bei Mängeln' / '§ N Mängel'-Heading
BMW-AGB nutzt '§ 9 Mängel' + 'Rechte und Ansprüche bei Mängeln' statt 'Gewährleistung' — Pattern ergänzt (False Negative auf Realdaten behoben). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
7258744107 |
refactor+feat: Snapshot-Router-Split + generischer ChecklistAgent + AGB-Modul
- Item 2: Snapshot-Doc-Checks (cookie/impressum/dse/agb) in snapshot_check_routes.py (agent_compliance_check_routes.py 464→365 Z.); gleiche Pfade, in main.py registriert. - ChecklistAgent-Basis: DSE-Logik generalisiert (L1/L2, kurze Titel, _severity_ override-Hook). DSEAgent + AGBAgent sind jetzt Thin-Subclasses → künftige Doc-Agenten (widerruf/avv/…) trivial. - Item 4: AGBAgent (§§ 305 ff. BGB, AGB_CHECKLIST) + agb-check + AGB-Tab via AgentModuleTab. Kein Library-Firehose. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
3c6deac1c5 |
fix(dse+linter): Drittland-Applicability, kein na-Detail, kurze Titel, Linter-Wortgrenzen
- Linter: FORBIDDEN_OUTPUT_TERMS per Wortgrenze → 'Schutzgarantien'/'geeignete Garantien' (Art. 46) passieren, 'garantiert'-Claims bleiben geblockt. - DSE: L2-Detail wird übersprungen statt 'na', wenn die L1-Pflichtangabe fehlt (kein irreführendes 'nicht anwendbar' für z.B. Transfermechanismus). - DSE: Drittland → HIGH bei dokumentiertem Drittlandtransfer (scan_context via AgentInput.context) — BMW (Konzern, US-Provider) ist kein weiches MEDIUM. - DSE: Titel/Maßnahme kurz (treibt den Recommendation-Titel); ausführliche Begründung als evidence — behebt 120-Zeichen-abgeschnittene Überschriften. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
76be96556d |
feat(dse): kuratierter DSEAgent + Snapshot-Tab (Art. 13/14, kein Firehose)
DSEAgent wrappt die existierende ART13_CHECKLIST (33 kuratierte Pflichtangaben
L1 + Detailchecks L2) → strukturierter AgentOutput, NICHT der 90k-Library-
Firehose (eCall/Gesundheit/Telekom-Lärm). GET /snapshots/{id}/dse-check spiegelt
impressum-check; doc_input_from_snapshot generalisiert. Frontend: generischer
AgentModuleTab (lazy → AgentResultTab) für Impressum + DSE; DSE-Tab in der
Snapshot-Seite. Plus HRB-Pattern \d→\d+ (volle Registernummer als Beleg).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
be93859645 |
fix(impressum): Pflichtangaben-Beleg = exakter Treffer statt Textpassage
_match_value gibt genau den gematchten Bereich zurück (nur die E-Mail unter Email, nur die USt-IdNr, nur die Telefonnummer) — nicht mehr ein Fenster/den umgebenden Satz. Behebt die Wiederholung desselben Anfangssatzes bei Texten ohne Zeilenumbrüche (BMW = ein Block). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
877d540ce1 |
fix(cookie+impressum): Drittland-FP, Impressum-Beleg, neuer Opt-Out-Finding
- Drittland: unbekannte Herkunft ('N/A') + Self-Hosting feuern nicht mehr —
First-Party-Session-Cookies (PHPSESSID/JSESSIONID) waren False Positives.
- Impressum _line_of: enges Fenster um den Treffer bei Texten ohne Umbrüche
(BMW = ein Block) → jede Pflichtangabe zeigt IHREN Beleg statt denselben Satz.
- Neuer Finding-Typ missing_opt_out: einwilligungspflichtiger Anbieter mit
Cookies ohne Opt-Out-/Widerspruchs-Link (Art. 7 Abs. 3 + Art. 21).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
5b36b3f367 |
feat(impressum): Snapshot-Modul-Tab — ImpressumAgent auf gespeichertem Text
Snapshot-Detailseite wird zu Modul-Tabs (Cookies & Tracking | Impressum).
Backend GET /snapshots/{id}/impressum-check laeuft den v3 ImpressumAgent auf
dem gespeicherten Impressum-Text (kein Re-Crawl); Input-Erzeugung in
impressum_input_from_snapshot() ausgelagert (pure + getestet: Text/Scope/
company_name-Fallback/None-Pfad). Frontend laedt lazy beim Tab-Wechsel und
rendert mit dem bestehenden AgentResultTab (keine zweite Engine).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
39cb6afc23 |
feat(cookie): Findings bearbeitbar — gruppiert nach Typ + Matrix + Hinweise-Split
CookieFindings: Umschalter [Nach Fehlertyp] (je Typ: Maßnahme + betroffene Cookies + Ticket-Text) ↔ [Matrix] (Cookie×Typ, ✗ Handlung / ⚠ Hinweis). Trennung FINDINGS (zu beheben) vs HINWEISE (neutral, gegen DSE zu prüfen). Backend: kind-Klassifikation (third_country/eu_alternative=hinweis); Drittland- Remediation neutral formuliert (pro Verarbeiter prüfen, keine 'in DSE benennen'- Befehle, da DSE-Abdeckung wie BMWs 'in der Regel SCC' oft unzureichend). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
b0115cb10b |
feat(cookie): 2. Sicht Banner-Kategorie + Fehl-Einsortierung
CookieResultView bekommt einen Umschalter [Rechtliche Rolle] ↔ [Banner-Kategorie] (Notwendig/Funktional/Statistik/Marketing). In beiden Sichten zeigt jede Cookie-Zeile '→ sollte: Marketing', wenn die tatsächliche Kategorie laut Library von der deklarierten abweicht (rot bei Tracker als notwendig, § 25 TDDDG). Neue KPI 'Falsch einsortiert'. Backend liefert dazu cookie_categories (name→actual_category) aus big_lib im cookie-check-Output; Seite lädt cookie-check einmal und reicht es an beide Komponenten. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
7fa9968ce1 |
feat(cookie): missing_retention — Vendor ohne Speicherdauer/Löschfrist
Vendor-Ebenen-Finding: greift, wenn ein Vendor eine Verarbeitung deklariert (Kategorie/Zweck), aber KEINE Cookies gelistet sind UND keine persistence angegeben ist (z.B. Nayoki GmbH — 'necessary' Auftragsverarbeiter ohne Löschfrist). Die Pro-Cookie-Schleife sah solche Vendors nie (0 Cookies → 0 Findings). Remediation = Ticket-Text 'bitte Löschfrist festlegen'. Art. 5 Abs. 1 lit. e + Art. 13 Abs. 2 lit. a → Control AUTH-2051-A03. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
05a1795ea8 |
feat(cookie): ② Documentation Drift — Richtlinie vs. Browser-Realität
Cookie-Check-Endpoint liefert jetzt out["drift"] (audit_cookie_compliance): deklariert (Cookie-Richtlinie-Text) vs. tatsaechlich geladen (Browser). Frontend zeigt den Reality-Check-Strip oben im Panel: X dokumentiert · Y geladen · Z undokumentiert. Pinnt den Vertrag mit test_cookie_drift.py (undokumentiert-geladen + beide Drift-Richtungen) + Vitest Drift-Strip. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
289988d23e |
feat(cookie): ① Storage Inventory + storage_transparency-Finding
Trennt echte Cookies von anderem Endgeraete-Speicher (Local/Session Storage,
IndexedDB, Salesforce-Framework-Artefakte) — § 25 TDDDG ist technologieneutral.
- cookie_storage_inventory: detect_storage_type (Name-Muster ComponentDefStorage/
__MUTEX/LSKey + Laufzeit-Text) + build_storage_inventory + storage_transparency-
Summenbefund ('X als Cookie gelistet -> Y echte + Z andere').
- Endpoint cookie-check liefert storage_inventory; Frontend zeigt den Breakdown.
Tests: 4 + Frontend-Vitest gruen. Differenzierungsmerkmal: '740 -> 132 + 608'.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
901de1ca97 |
feat(cookie): A — Findings auditfest an Controls verdrahten
Jeder Cookie-Befund traegt jetzt ein strukturiertes control-Feld (control_id aus doc_check_controls + regulation + article) statt nur hardcodeter Strings: vague_duration->AUTH-2051-A03 (Art.5(1)e+13), tracker_as_necessary->DATA-2851-A05 (§25 TDDDG), third_country-> DATA-1624-A04 (Art.44). Kette Regulation->Article->Control->Finding. Frontend zeigt die Rechtsgrundlage je Befund. (Controls tragen regulation/article noch NULL -> hier mitgeliefert bis gepflegt.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
4c45f11e43 |
feat(cookie): Finding 'vague_duration' — unkonkrete Speicherdauer
Flaggt Laufzeit-Angaben ohne konkrete Dauer/Kriterium ('dauerhaft', 'bis zur
Loeschung', 'bis Nutzer deaktiviert', 'unbegrenzt' …) — Art. 5(1)(e) + Art. 13
DSGVO. Library-unabhaengig, gilt fuer ALLE Cookies (Coverage auf BMWs 780).
'13 Monate'/'Session'/'bis Widerruf, max. X' bleiben ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
d18ef79f18 |
feat(cookie): Pro-Cookie-Library-Abgleich (2287er OCD + 35er rich) + Panel
- analyze_cookies gleicht Cookies gegen BEIDE Libraries ab: compliance.cookie_library
(2287, OCD/CC0 — Kategorie/Retention) + 35er rich-DB (technical_necessity/reid/
schrems/eu_alternative). 5 Befund-Typen: tracker_as_necessary, missing_purpose,
excessive_lifetime (Art.5), third_country (Art.44), eu_alternative (kommerziell).
- Endpoint GET /snapshots/{id}/cookie-check (load_big_library batch + analyze).
- Frontend CookieLibraryPanel im Snapshot-Detail.
- Fix CookieResultView: Zweck nicht mehr auf 60 Zeichen gekuerzt; Rolle 'unknown'
als Strich statt 'Unbekannt'.
Tests: 7 backend + frontend vitest gruen.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
3f23a64d5f |
feat(agent): Impressum-Tab auf Haupt-Engine + Profil/§36-Fixes
Ergebnis-Tab rendert jetzt result.results (Haupt-Doc-Check) statt des abweichenden v3-Agenten — BMW korrekt statt False Positives: - DocResultView: ein Dokument als Pflichtangaben-Tabelle (Label + gefundener Text + 3-Tier-Status), KEINE MC-IDs. ComplianceResultTabs speist Tabs aus result.results; ChecklistView-Bausteine exportiert + wiederverwendet. - profile_extractor: Firmenname/Rechtsform = fruehester Treffer + ausge- schriebene Formen (Aktiengesellschaft) -> BMW AG statt "juris GmbH". - 36 VSBG (MC-010): reines b2c -> POSSIBLY_APPLICABLE (Pruef-Hinweis) statt MEDIUM-FAIL; hart nur bei ecommerce. possibly_hint pro MC. - McCoverage traegt label + found (Snippet); mc_possibly-Aggregat. - AgentFindingCard/Methodik: interne check_id/mc_id nicht mehr angezeigt. Tests: test_four_status (16) + Frontend-Vitest gruen; CI-Suite 206, v3/GT unveraendert. Nur eigene Dateien (geteilter Tree). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
97575cc9c0 |
feat(agent): 4-Status-Modell (NOT_APPLICABLE/INSUFFICIENT_EVIDENCE/POSSIBLY_APPLICABLE) für Impressum
Kanonisches Compliance-Datenmodell, Impressum-Agent als Referenz: - CheckStatus-Enum + Finding.status GETRENNT von severity (Verdikt ≠ Risiko) - Unbestimmte Rechtsform (weder Text noch Wizard) → INSUFFICIENT_EVIDENCE (INFO) statt hartem HIGH-FAIL; legal_form_dependent-Gate + detect_legal_form_present - §18-MStV-Graubereich (Corporate-Blog via has_editorial_content) → POSSIBLY_APPLICABLE (LOW Prüf-Hinweis); 3-stufig via scope_disposition - Recommendations nur aus echten FAILs; mc_insufficient/mc_possibly-Aggregate - Frontend: Verdikt-Pill + Coverage-Vokabular - 19 neue Tests (test_four_status.py, AgentFindingCard); CI-Suite 204 grün, v3 25 / GT 13 unverändert Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
b7a7e70731 |
feat(agent): Impressum Rechtsform-Gates + USt-optional (Phase 3)
Die 8 Audit-Klassifizierungs-Felder (scan_context) treiben jetzt den business_scope der Agenten (vorher gespeichert, aber nicht genutzt). Rechtsform-Gates als opt-out (excludes_scope): Verein -> kein Handelsregister-Finding, e.K. -> kein Vertretungsberechtigte-Finding; unbekannte Rechtsform bleibt anwendbar. USt-IdNr optional -> fehlt = kein Finding. Rechts-Zuordnung vom Domain-Experten bestaetigt. - _classification.py: scan_context_to_scope (8 Felder -> scope-Tokens) - mcs.py: MC.excludes_scope + MC.optional; IMP-MC-004/006 Gate-Tokens; IMP-MC-005 optional; scope_matches respektiert excludes_scope - agent.py: optional -> kein Finding bei Abwesenheit - _agent_outputs.py: scope = scan_context vereinigt LLM-Profil-Fallback - Tests gruen: v3 25, Groundtruth 13, CI-Pfad 14 (+ SSE-Loop-Fix) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
65de90114a |
feat(agent): SSE — progressive Themen-Tabs (Phase 2)
Der Compliance-Check streamt jetzt progressive Events; der Impressum-Tab
erscheint, sobald das Thema fertig ist, statt am Ende alles auf einmal.
Additiv — das Polling fürs finale Ergebnis bleibt.
- backend: _sse.py (Queue/emit/event_generator) + Endpoint
/compliance-check/{id}/stream; _update emittiert progress,
run_agent_outputs emittiert topic (laeuft jetzt frueh nach Phase B),
Orchestrator emittiert complete/error.
- frontend: SSE-Proxy-Route + EventSource in ComplianceCheckTab merged
topic-Events in agent_outputs -> Tab erscheint progressiv.
- Tests: backend 5 passed (SSE + agent_outputs); tsc 0 neue Fehler,
vitest 2 passed, check-loc 0.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
e21984e0ad |
feat(agent): strukturierte Ergebnis-Tabs — Impressum (Phase 1)
Der Compliance-Check legt zusätzlich einen strukturierten v3-AgentOutput pro Thema in result.agent_outputs ab (additiv; B18-HTML + Firehose-Mail bleiben unangetastet). Frontend: standardisiertes Ergebnis-Tab statt Firehose — Impressum-Tab (AgentResultTab) + "Alle Checks (roh)" (ChecklistView). - backend: _agent_outputs.py ruft den registrierten v3-ImpressumAgent, gewired in _orchestrator nach B18, surfaced via _phase_f_persist. - frontend: AgentResultView (aus AgentSlotCard extrahiert, DRY), AgentResultTab, ComplianceResultTabs; ComplianceCheckTab 490->391 Zeilen. - Tests: backend 2 passed, frontend 2 passed; tsc 0 neue Fehler; check-loc 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
bb6139df3e |
MC mapping: defensive route + MinIO overridable + iace migration 151 (#27)
CI / detect-changes (push) Successful in 18s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 8s
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m25s
CI / test-go (push) Failing after 41s
CI / iace-gt-coverage (push) Successful in 26s
CI / test-python-backend (push) Successful in 35s
CI / test-python-document-crawler (push) Successful in 23s
CI / test-python-dsms-gateway (push) Successful in 21s
MC mapping deploy: defensive route + MinIO overridable + Migration 151 + loc-exception [migration-approved] [guardrail-change] |
||
|
|
372e1fe9e9 |
Use-Case-Mapping-Filter für Master Controls + Mapper-Präzisionsfix
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 7s
CI / validate-canonical-controls (push) Successful in 13s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m23s
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 34s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Phase 2: Live-Filter an /sdk/master-controls (Use Case, Quell-Regulierung, Verifikations-Methode, Coverage, Primärzweck-Toggle, category via Member-EXISTS). API mit EXISTS-Filtern + gecachten Meta-Counts in master-controls/route.ts. Phase A: neue UseCase telekommunikation + Fix der Impressum-Fehlrouten im Register (TKG/AT-TKG->telekommunikation, telemedien->dse, GewO->handelsrecht); echte Impressum-Quellen (TMG/Mediengesetz) bleiben impressum. Deterministischer Seed aus source_regulation; Tests grün. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
ef746ea8f0 |
fix(use-cases): Verifikations-Methode aus Primaer-Use-Case ableiten (Fallback)
CI / detect-changes (push) Successful in 6s
CI / branch-name (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / nodejs-build (push) Has been skipped
Member-canonical_controls tragen meist kein evidence_type/verification_method (wie schon source_citation). primary_verification_method() leitet die Methode deterministisch aus dem Primaer-Use-Case ab (impressum->document, code_security->source_code, ...). Populiert mc_verification beim naechsten Seed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
6ca4dcde3e |
feat(use-cases): deterministisches source_regulation-Mapping + Primaerzweck [migration-approved]
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 31s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Use-Case-Zuordnung jetzt DETERMINISTISCH aus der Quell-Regulierung (statt LLM/scope-category): control_parent_links.source_regulation (79% der 13.588 MCs) -> Keyword-Mapper -> ~30 Domaenen-Use-Cases. 117/117 Regulierungen gemappt (dse 44 Leitlinien, code_security 10, network_security 9, ...). - use_case_registry.py: 37 Use Cases (Doku + Security + Produkt/Sektor: cra/ai_act/mica/mdr/maschinen/batterie/ehds/dsa/dma/psd2/aml/lksg/...) + use_case_for_regulation() Keyword-Mapper (117 Regulierungen abgedeckt). - migration 150: is_primary auf mc_use_case_mappings + neue mc_regulations (MC->source_regulation, n:m, is_primary) als feine Filter-Dimension. - classify_mc_use_cases.py: source_regulation-getriebener Seed; Primaerzweck = dominante Regulierung, Mehrfachzwecke = weitere. PYTHONPATH-Bootstrap. - 18 Registry-Tests gruen (Mapper-Abdeckung + Konsistenz-Invariante). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
dca7740d8c |
feat(use-cases): Fundament — Use-Case-Register + n:m-Mapping-Migration + Seed [migration-approved]
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Layer 1+2 (Fundament) des Use-Case-Mapping-Systems (Plan genehmigt): - compliance/data/use_case_registry.py: Single Source of Truth fuer 14 Use Cases x Verifikations-Methoden (Doku/Source-Code/Netzwerk/IT-Prozess). Erweiterbar (neuer UC = 1 Eintrag). code_security/network_security als Uebergabe-Punkte fuers Security-Team (SBOM/SAST/DAST/Pentest). - migrations/149_mc_use_case_mappings.sql: add-only n:m mc_use_case_mappings + mc_verification (1/MC) + sync_state. use_case ohne SQL-CHECK (erweiterbar). - scripts/classify_mc_use_cases.py: Seed-Stufe (deterministisch, kein LLM). LLM-Stufe (Phase 3) folgt. - Tests: test_use_case_registry.py (14 gruen). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
bc78ddd3e5 |
fix(impressum): Findings aus 12 §5-TMG-Pattern-MCs statt verunreinigtem DB-Set
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 5s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Der Agent lieferte "alles gruen": _load_controls gab auf macmini nur 3 von 75 doc_type='impressum'-MCs zurueck (Sidecar mc_classification.db hat nur 4/75 als text-matchbar klassifiziert). Tiefere Ursache: die 75 doc_type='impressum'-MCs sind fehl-klassifiziert (60/75 canonical_scope='other'; Prefixes TRD/SEC/GOV = Geschaeftsbriefe/Marktplatz/Bestellung, NICHT §5 TMG Website-Impressum). Fix: Der Impressum-Agent erzeugt Findings jetzt aus seinen 12 autoritativen §5-TMG/DDG-Pattern-MCs (mcs.py) statt aus dem verunreinigten DB-Set — deterministisch, scope-aware, field_id = semantisches Feld. Semantic-Validator- Demote + Massnahmen + Rollup bleiben. Die 5-Impressum-GT-Tests laufen jetzt echt durch: 0 Falsch-Positive. DB-Master-Controls fuer Impressum deaktiviert bis zum MC-Re-Filtering (separate Aufgabe: die doc_type-Klassifizierung der Vorgaenger-Session muss bereinigt werden). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
08c08fcba2 |
feat(crawl): Vollstaendigkeit — Shadow-DOM/versteckte Links + Interaktions-Fixpunkt + Wayback-CDX-Orphans
CI / test-python-backend (push) Successful in 30s
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Damit die Specialist-Agents auf vollstaendigem Website-Content arbeiten:
A — _find_dsi_links pierct jetzt Shadow-DOM (Web-Components wie Usercentrics/
Mercedes) rekursiv; versteckte (display:none) Links werden erfasst + als
Coverage-Metadatum geflaggt.
B — _expand_to_fixpoint klappt Akkordeons/Tabs/Hover-Menues in einer Schleife
auf, bis das DOM stabil ist (statt 1 Pass); erweiterte Selektoren;
Coverage-Telemetrie (Runden, expandierte Elemente, DOM-Wachstum, Shadow-/
versteckte Links) → Response + Backend-Log.
C — legacy_url_cdx.cdx_enumerate listet via Wayback-CDX-API ALLE je
archivierten URLs der Domain → findet Orphan-/Legacy-Seiten, die nie im
Slug-Raster standen (z.B. nicht mehr verlinktes /datenschutz, per Direkt-
URL noch erreichbar). Fliesst durch das bestehende Legacy-URL-Inventar.
Tests: test_legacy_url_cdx.py (6) + consent-tester/tests/test_dsi_discovery.py
(Pure-Helper + Real-Browser-Integration). Alle gruen, LOC-Gate gruen.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
||
|
|
389e6de0c7 |
fix(agents): Impressum+Cookie delegieren MC-Laden ans Main Tool — Scope-Filter + Maßnahmen
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
Regression: Der v3-Agent-Pfad baute eine parallele MC-Pipeline (_load_impressum_mcs / _load_cookie_mcs, Roh-SELECT) und lief damit an allen Schutzmechanismen der Engine vorbei → GOV/Branchen-MCs als HIGH bei OEM/Zulieferer, fremde MCs (Bestellbestätigung), und action=check_question (Fragen statt Maßnahmen im Frontend). - Agent delegiert MC-Laden an rag_document_checker._load_controls (P72-Scope, check_type='text', fits_doc_type/scope_requires). - Subtraktives Sektor-Gate (SECTOR_PREFIXES) + Themen-Gate am Agent-Rand. - action = konkrete Maßnahme (Imperativ) statt check_question. - rag_document_checker: from __future__ import annotations (3.9-Import). - mcs: Name-Pattern erkennt "Aktiengesellschaft" (OEM-Impressums). - Tote GT-/Semantic-/Routes-Tests wiederbelebt (v3-Mismatch + agent.cascade-Patch-Target). Alle 72 Specialist-Tests grün. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
bd4882e143 |
feat(agents): Sprint 1.12 Phase 2 — Cookie-Policy v3 + ImpressumAgent v3 finetune
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / sbom-scan (push) Has been skipped
ImpressumAgent v3 (Refactor):
- v3_engine: laedt direkt alle 75 doc_check_controls['impressum'] ohne
Sidecar-Filter (Sidecar war zu streng, lieferte nur 3 von 75 MCs).
- Layer 0 Boost prueft pass+fail_criteria gegen meine 12 Patterns mit
erweiterten Initial-Seeds (User-Vorgabe 2026-06-09:
manuelle Initial-Seeds OK, Auto-Learning erweitert zur Laufzeit).
- ETO-Smoke: 75 DB-MCs · 7 Pattern-Boosts · 24 Boost-Overrides
(versus 3 DB-MCs vorher).
CookiePolicyAgent v3 (Refactor):
- cookie_policy/v3_engine.py + cookie_policy/regex_boost.py
- Laedt direkt alle 381 Cookie-MCs aus doc_check_controls
- Layer 0 mit 12 eigenen Patterns als Initial-Seed
- KB-Layer (CMP-Vendor-Cross-Check) bleibt erhalten
- agent_version='3.0'
Tests: 27/27 gruen (12 v3-impressum, 6 cookie-policy, 9 cross-placement).
Alte v2-cookie-tests umgeschrieben auf v3-Pipeline-Mock.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d3ac33d53a |
feat(impressum): v3 — Layer-Architektur auf doc_check_controls (75 DB-MCs)
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 31s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Sprint 1.12 Phase 1 (User-Vorgabe 2026-06-09):
Statt eigener 12 hartgepatchter Patterns nutzt der Impressum-Agent jetzt
die 75 echten Master-Controls aus compliance.doc_check_controls. Pipeline:
Layer 0 — Regex-Boost (meine 12 Patterns aus mcs.py / regex_boost.py)
→ wenn Pattern hits, MC wird zu PASS überschrieben
Layer 1 — Keyword-Match aus pass_criteria der 75 DB-MCs
(rag_document_checker.check_document_with_controls)
Layer 2 — BGE-M3 Embedding-Match (in rag_document_checker integriert)
Layer 3 — Semantic-Validator (LLM) für übriggebliebene HIGH/MEDIUM
+ Auto-Learning-Pattern-Library
Output-Layer bleibt unverändert: Disclaimer-Linter + Rollup-Dedup +
Methodik-First-UI.
Neue Dateien:
- impressum/v3_engine.py — Pipeline-Orchestrator
- impressum/regex_boost.py — meine 12 Patterns + Boost-Mapping
Refactored:
- impressum/agent.py — komplett umgeschrieben, agent_version=3.0
255 LOC (unter 500-Cap)
Tests: test_impressum_v3.py mit 10 neuen Tests, alle gruen. Mockt
run_v3_pipeline für offline-Lauf. Bestaetigt:
- Layer-0 erkennt Tesla-typische Felder
- Boost matched DB-MC nur bei ≥2 Keyword-Treffern in pass_criteria
- 12 Pattern-Boost-Slots + N DB-MCs in coverage
- Notes enthalten Telemetrie (v3-pipeline, Boost-Overrides)
Telemetrie wird in AgentOutput.notes ausgegeben, damit Frontend
sehen kann: 75 DB-MCs geprueft · 5 Pattern-Boosts · 3 Boost-Overrides.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
18e4f98201 |
fix(agents): klarere Naming + korrektes LLM-Default-Modell
CI / detect-changes (push) Successful in 6s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / nodejs-build (push) Successful in 2m20s
CI / test-go (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
User-Korrektur 2026-06-09:
(1) Begriff 'MC' steht im Projekt fuer Master-Control aus
canonical_controls (314k Eintraege, ~1.800 fuer dieses Tool). Mein
neuer Agent-Code hatte 'MC' als Abkuerzung fuer 'Machine-Check'
verwendet — Naming-Konflikt. Frontend-Methodik-Box jetzt:
- 'Pattern-Check' statt 'Machine-Check'
- Explizit: 'Diese Pattern-IDs (IMP-MC-001) sind interne Test-IDs,
NICHT die Master-Control-IDs aus der canonical_controls-DB'
- Roadmap-Hinweis: formale Verknuepfung Pattern→Master-Control folgt
Backend-Variablen mc_id bleiben technisch unveraendert (Refactor
waere gross), aber UI darf sie nicht als 'Master-Control' bezeichnen.
(2) LLM-Modell-Default war 'qwen2.5:7b' — Projekt nutzt aber das
groessere 'qwen3.5:35b-a3b' auf macmini (ENV SELF_HOSTED_LLM_MODEL).
_escalation.py default jetzt: SELF_HOSTED_LLM_MODEL als Fallback,
und Methodik-Erklaerung nennt das richtige Modell.
(3) Methodik-Erklaerung erweitert um Sprint-1.10 Semantic-Validator
und Sprint-1.11 Auto-Learning-Pattern-Library + Cross-Placement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
154e8c293b |
feat(agents): Cross-Placement-Agent (deplatzierter Content)
CI / detect-changes (push) Successful in 6s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Sprint 1.9 (User-Vorgabe 2026-06-09):
Erkennt im Impressum Inhalts-Sektionen die thematisch besser in
einen Footer-Reiter 'Legal' gehoeren:
- Urheberrecht / Copyright -> LOW (Footer 'Legal')
- Bilder & Lizenzen -> LOW (Seite 'Bildquellen')
- Haftungsausschluss / Disclaimer -> LOW (Seite 'Disclaimer')
- Nutzungsbedingungen -> LOW (Seite 'AGB')
- Aenderungsvorbehalt -> LOW
- ElektroG / WEEE-Reg -> MEDIUM (Produktinfo)
- VerpackG / LUCID -> MEDIUM
- BattG -> MEDIUM
Each Finding empfiehlt konkret den 'Legal'-Footer-Reiter
einzufuehren als Best Practice ('Impressum bleibt schlank
und enthaelt ausschliesslich die Pflichtangaben nach § 5
TMG/DDG').
Tests gegen die 5 GT-Impressums:
- Safetykon: 3 Findings (Urheberrecht, Bilder/Lizenzen,
Haftungsausschluss)
- Hectronic: 3 Findings (WEEE-MEDIUM, Copyright, Haftung)
- ETO/BMW/Elli: 0 Findings (sauber)
- 9/9 Tests gruen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ca8c388f37 |
feat(agents): Semantic-Validator + Auto-Learning-Pattern-Library
CI / detect-changes (push) Successful in 5s
CI / branch-name (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
Sprint 1.10 — Semantic-Validator (User-Vorgabe 2026-06-09):
- Statt unendlich Regex-Pattern fuer jede Schreibweise zu pflegen
(Tel/Telefon/Telefonnr/Phone/Fon/Funkanschluss/…), nutzen wir
bei MC-MISS einen LLM-Call: 'Ist die Pflichtangabe semantisch
doch da, nur unter abweichendem Label?'
- Bei LLM-Treffer: HIGH/MEDIUM-Finding wird zu LOW demoted,
Empfehlung wird zu 'Best-Practice Umbenennung: Management ->
Geschaeftsfuehrer' (mit STANDARD_LABELS-Mapping).
- 1 LLM-Call pro Slot statt N: cost-effizient.
Sprint 1.11 — Auto-Learning-Pattern-Library:
- Jedes Label das SVL findet wird in JSON persistiert:
/tmp/breakpilot/agent_learned_patterns.json
- Beim naechsten Run prueft der Agent zuerst gelernte Patterns
BEVOR er das HIGH-Finding emittiert -> kein LLM-Call mehr.
- Asymptotisch 0 LLM-Calls fuer haeufige Edge-Cases.
- Halluzinations-Schutz: prune_low_confidence() loescht Patterns
mit <0.5 Avg-Confidence nach 100 Beobachtungen.
- Idempotent: gleicher (field_id, label, agent) -> Counter +1.
Tests: 40/40 gruen (10 Pattern-Library + 7 SVL + 13 GT + 11 v2).
STANDARD_LABELS-Map deckt Impressum + Cookie-Policy. Spaeter
erweiterbar fuer DSE, AGB, Widerrufs-Agenten.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
882e4f9798 |
test(impressum): GT-Fixtures + Fix 'Telefonnummer' Pattern
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 13s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Ground-Truth-Fixtures fuer 5 echte Impressums (ETO, Safetykon, BMW,
Elli, Hectronic). Pro Impressum:
- text (User-eingegeben)
- expected_clean (Felder die da sind → keine Findings)
- business_scope
- placement_concerns (Texte die deplatziert sind — fuer kommenden
Cross-Placement-Agent)
13 GT-Tests + 11 Specialist-Tests = 24/24 gruen.
Bug-Fix: Elli schreibt 'Telefonnummer:' (kein 'Telefon:'),
mein Pattern matched nur Tel/Telefon. Erweitert:
'Tel(?:efon(?:nummer)?)?|Phone|Fon'
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
593baace7c |
fix(agents): HTML-Entity-Decode vor Agent + Pattern duldet '('
CI / detect-changes (push) Successful in 6s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 28s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
Bug bei BMW: dsi-discovery liefert HTML-Entities ( ) als
Literal-Strings ohne Decode. Beispiel im BMW-Impressum:
'wird gesetzlich durch den Vorstand (Milan Nedeljkovic, …)'
Mein Pattern erwartet ':' / '.' / Whitespace nach Vorstand →
matched nicht das '&' → false-positive HIGH-Finding.
Fix 1 (Hauptfix): Test-Harness ruft html.unescape() vor agent.evaluate()
auf, so dass jeder Agent sauberen Text bekommt — entkoppelt von
dsi-discovery-Eigenarten.
Fix 2 (Belt-and-suspenders): Pattern duldet jetzt auch '(' direkt
nach Vorstand/Geschaeftsfuehrer (falls Decode mal fehlschlaegt).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
361a5e7605 |
feat(agents): Test-Harness nutzt volle Compliance-Pipeline für Fetch
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 10s
CI / loc-budget (push) Successful in 12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / test-python-backend (push) Successful in 28s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Statt der simplen dsi-discovery-Wrapper-Funktion ruft der Test-Harness
jetzt _fetch_text() aus agent_check/_fetch.py — die VOLLE Pipeline
die auch der produktive Compliance-Check verwendet:
- consent-tester dsi-discovery mit 240s Timeout (statt 120s)
- doc_type-aware max_documents (1 für cookie/dse, 3 für impressum)
- CMP-Payload-Capture (ePaaS, OneTrust …)
- HTTP-Fallback mit Browser-User-Agent + DomainRateLimiter
- HTML-Tag-Strip wenn Playwright fail
Damit funktionieren Cloudflare-/Anti-Bot-geschützte Sites wie BMW
und Elli auch im Test-Harness — vorher Timeout nach 90s.
Plus: bei leerem Fetch klare Fehlermeldung im Slot
('Cloudflare-/Anti-Bot-geschützt — Tipp: Text manuell einfügen')
statt silent-fail. cmp_payloads landen jetzt auch im Vault.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
702e7a6333 |
fix(impressum): Pattern fasst Geschäftsführung/Vorstand/Inhaber
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 13s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m21s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Safetykon-Bug: 'Geschäftsführung:' (Sammelbegriff für GF einer GmbH)
matched das alte Pattern 'Geschäftsführer' nicht — False-Positive
IMPRESSUM-AGENT-VERTRETUNGSBERECHTIGTE_LABEL_KORREKT.
Pattern erweitert: Geschäftsführer|Geschäftsführung|Geschäftsführerin
+ Vorstand|Vorstandsvorsitzender + Inhaber|persönlich haftend.
Test test_safetykon_geschaeftsfuehrung_passes ergänzt (11/11 grün).
frontend: SlotCard zeigt jetzt Badge bei 0/0/0-Slots
('Dokument konnte nicht geladen werden') statt silent-fail, +
bei 0 Findings ein 'alle MCs OK'-Badge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
860469d4b1 |
fix(agents): Default-Vault-Pfad nach /tmp damit Container-User schreiben kann
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / loc-budget (push) Successful in 13s
CI / validate-canonical-controls (push) Successful in 11s
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
/app/artifacts gehört root und appuser darf nicht mkdir machen — Endpoint crashte mit PermissionError. Default jetzt /tmp/breakpilot/agent_runs. EVIDENCE_VAULT_ROOT-Env-Var bleibt für persistente Volumes nutzbar. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3ae4e60c9d |
feat(agents): SSE-Endpoint + Agent-Test-Tab (5-URL parallel)
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m24s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Backend:
- specialist_agent_routes.py: GET /agents, POST /test/start (run_id),
GET /test/stream/{run_id} (SSE), GET /run/{run_id}/result,
GET /run/{run_id}/artifacts, GET /run/{run_id}/artifact/{path},
DELETE /run/{run_id}, GET /runs.
- Per-URL async orchestrator: text fetch via consent-tester
dsi-discovery → agent.evaluate() → vault.put_json + stream events.
- Tests: 7/7 grün.
Frontend:
- /api/sdk/v1/specialist-agent proxy mit SSE-passthrough.
- AgentTestTab.tsx: Agent-Wähler + 5 URL-Slots + Live-Events +
Speedometer (OK/N-A/HIGH/MEDIUM/LOW) + Findings + Recommendations +
Eskalations-Log + Artefakt-Link pro Slot.
- Neuer Tab "Agent-Test" in /sdk/agent.
User-Wunsch 2026-06-08: pro Agent isoliert testen, 5 URLs gleichzeitig,
Live-Updates statt Polling-Wartespiel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f4357a2e9b |
feat(agents): Specialist-Agents Phase 2 Foundation + Cookie-Policy-Agent
Sprint 1 — Foundation (User-Vorgabe 2026-06-08): Foundation: - _base.py: BaseSpecialistAgent ABC + Pydantic Contract (AgentInput/AgentOutput/Finding/Recommendation/McCoverage/EscalationLog). - _base.lint_output(): Disclaimer-Linter verbietet "rechtssicher" / "garantiert" / "gesetzeskonform" — scrubbed inline + Log in notes. - _registry.py: AgentRegistry mit MC-Owner-Mapping (verhindert Doppel-Ownership). - _escalation.py: cascade(local → ovh). qwen2.5:7b default, OVH 120b als Stage-2 (deaktiviert wenn OVH_URL leer). - _rollup.py: deterministisches Dedup ähnlicher actions zu Recommendations mit related_finding_ids[]. - _evidence_vault.py: Pro-Run File-Vault für Playwright-Videos, Screenshots, CSV. SHA256 + manifest.json. DSR-tauglich (delete_run). Agenten: - ImpressumAgent v2 (impressum/agent.py + mcs.py) — konsolidiert v1-Pattern-Match + v2-LLM-MVP unter dem neuen Contract. 12 MCs. - CookiePolicyAgent v1 (cookie_policy/agent.py + mcs.py) — 12 MCs zu Cookie-Richtlinie-Vollständigkeit + KB-Layer für CMP-Vendor-Cross-Check. Tests: 25/25 grün (10 Impressum + 9 Vault + 6 Cookie-Policy). Roadmap: SSE-Test-Endpoint + Frontend-Tab → DSE/AGB-Agents → Cookie-Banner-Themen-Agent → Cross-Doc-Konsistenz-Agent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d6b8bf87c2 |
fix: 4 Bugs gemeinsam — B22 PDF + B17 Walk-Fallback + company_name + Plausibility-Fallback
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 10s
CI / loc-budget (push) Successful in 13s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
(1) B22 Cross-Domain (fix #59):
Elli-Test fand AGB auf logpay.de NICHT obwohl URL in doc_entries
korrekt. Vermutete Ursache: Discovery-Phase A drops/überschreibt
Original-URL bei PDF-Fetch-Fail (word_count=0).
Fix: _collect_audit_urls() iteriert über state.doc_entries +
rejected_url + req.documents — Cross-Domain-Hosting ist
unabhängig vom Text-Inhalt. Plus Trace-Logging für künftige
Diagnose. Dedup per (doc_type, host_sld).
(2) B17 Audit-Walk-Fail-Fallback (fix #60):
BMW v5 hatte audit_walk=None ohne Mail-Hinweis. Vermutlich
180s-Timeout bei OneTrust-CMP-Banner-Tour.
Fix: Timeout 180s → 300s. Plus: Bei Fail wird ein Hinweis-
Stub mit error-Grund in state["audit_walk"] + HTML-Block
geschrieben — Reviewer sieht den Fail statt silent-skip.
(3) company_name + origin_domain im Backend (fix #61):
Frontend sendet seit
|
||
|
|
b4ce3528e5 |
feat(impressum-agent): Tesla-Pattern + KBA-Hint + News-Doc-Type
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m20s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 6s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
User-Feedback Tesla-Impressum: 10 FAIL bei 46 Worten — viele False-
Positives. Nach Tuning: 5 juristisch saubere Findings.
Impressum-Agent Patterns:
- name_anbieter zusätzlich label-frei matchen (Firma+Rechtsform+
Anschrift, Tesla schreibt ohne "Anbieter:" Label).
- vertretungsberechtigte akzeptiert jetzt "Management" / "Director"
als alternative (US-Konzern-Habit), aber emittiert separates
Sub-Finding "Label sollte Geschäftsführer für § 5 TMG sein".
- aufsichtsbehoerde-Pattern um KBA / Bundesnetzagentur erweitert.
- NEU: verantwortlicher_redaktion (§ 18 MStV bei Blog/News).
- NEU: verbraucher_streitbeilegung (§ 36 VSBG bei B2C).
- Auto-Detection von Automotive-Branche: explizite Begriffe ODER
bekannte Hersteller-Namen (Tesla/BMW/Mercedes/Audi/VW/Porsche…).
Triggert KBA-Hint im aufsichtsbehoerde-Finding-Action.
Frontend (_document_types.ts):
- Extrahiert aus ComplianceCheckTab.tsx (vorher inline).
- NEU: doc_type "news" für Blog/Newsroom-URL → § 18 MStV-Pflicht-
angaben prüfen. User-Hinweis: tesla.com/de_de/blog ist
relevanter Audit-Input neben DSE/Impressum.
Smoke gegen Tesla-Impressum (46 Worte):
Vorher 10 Findings (5 davon FP).
Jetzt 5 Findings — alle juristisch korrekt:
[MED] Management statt Geschäftsführer
[LOW] KBA als Aufsichtsbehörde fehlt
[MED] § 18 MStV-Verantwortlicher fehlt (Tesla Blog!)
[MED] § 36 VSBG-Hinweis fehlt
[MED] ODR-Plattform-Link fehlt
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d208a2bde2 |
feat: Mail-Restrukturierung + B22 Cross-Domain-Doc-Detector
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 13s
CI / go-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
User-Feedback BMW v5: "740 Cookies verschwunden auf 31, Übersicht
verloren". Drei Anpassungen:
Mail-Restrukturierung (_executive_summary.py + _compose.py):
- render_executive_summary(): Top-of-mail TL;DR mit
Compliance-Score (gross + farbig), Top-3-Findings nach
Severity, Cookie-Statistik (deklariert/Browser/Drittland),
Severity-Verteilungs-Chips.
- collapsible(): wrapt jeden Block in <details>/<summary>.
Mailpit + alle modernen Mail-Clients rendern das nativ.
- _compose.py: alle 18+ B-Blöcke + per_doc + per_theme +
legacy_html in Akkordeons. NUR Critical-Findings + Sofort-
massnahmen sind immer offen — Reviewer sieht ~15 Zeilen
Übersicht und klappt selektiv auf.
- Cookie-Inventar (742) hat jetzt eigene Sektion ganz oben
(Akkordeon "🍪 Cookie-Inventar"), Vendor-Karten parallel.
B22 Cross-Domain-Legal-Doc-Detector (cross_domain_doc_check.py):
Real-Beispiel User-Feedback: Elli's AGB liegt auf docs.logpay.de
statt elli.eco. Detektor erkennt SLD-Mismatch:
- HIGH bei agb / widerruf (vertragsrelevant)
- MEDIUM bei dse / nutzungsbedingungen
- INFO bei cookie / impressum (Best-Practice)
Norm: DSGVO Art. 28 (AVV-Pflicht für Hosting) + Art. 13 Abs. 1
lit. e (Empfänger) + § 312i BGB (Cool-URLs).
9/9 Tests grün inkl. Elli/LogPay Pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5c5d676f01 |
feat: Plan B + A + C — DSE-Versions-MCs + Legacy-URL + Multi-Version
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / loc-budget (push) Failing after 11s
CI / python-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 28s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 10s
CI / go-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
Drei verwandte Mechanismen für DSE-Beweisbarkeit + URL-Hygiene.
Plan B + PDF — Versions-Beweisbarkeit-MCs (dse_checks.py):
- mc-dse_version_date (HIGH) — sichtbares Stand/Versionsdatum
Pflicht. 12 Regex-Pattern: "Stand: April 2024", ISO-Datum,
"Letzte Aktualisierung", "Version 3.2", englische
Varianten ("Last updated", "Effective date as of …").
Norm: Art. 7 Abs. 1 DSGVO (Nachweisbarkeit Einwilligung).
- mc-dse_version_proof (MED) — PDF-Download oder
versionierte Archiv-URL. Reine HTML-DSE ohne Snapshot ist
juristisch fragil. 8 Pattern: .pdf, Download-Hinweis,
web.archive.org, /dse-vNNN.html.
Norm: DSK-Orientierungshilfe 2024.
Plan A — Legacy-URL-Discovery (legacy_url_discovery.py + B20):
Vier komplementäre Quellen:
A.1 /sitemap.xml + Sub-Sitemaps parsen, auf compliance-
relevante Slugs filtern
A.2 archive.org/wayback/available pro Slug — wenn Wayback
zeigt ≥18 Monate alten Snapshot UND Seite heute noch
200 liefert UND nicht im Footer → Legacy-Verdacht
A.3 Slug-Permutations: 6 doc_types × 6 Slug-Varianten ×
5 Lang-Prefixe × 4 Brand-Parameter
A.4 Banner-Modal-Links (über consent-tester Stufe 4 Tour)
Mail-Block "🗂️ Legacy-URL-Inventar" mit Tabelle: URL · HTTP ·
Wayback-Alter · Footer · Empfehlung (301/Offline/Behalten).
Engine entscheidet NICHT was Legacy ist — präsentiert das
Inventar, Kunde wählt.
Real-World-Smoke Elli:
/en/cookies → HTTP 200, Wayback 69 Mo alt, nicht im Footer
→ "Legacy-Verdacht, 301 setzen"
/en/impressum → HTTP 302, redirected → "behalten"
Plan C — Multi-Version-DSE-Analyse (multi_version_dse.py):
Wenn ≥2 DSE-URLs reachable: pro Variante DSB-Name + Datum +
Wortzahl + SHA-256 extrahieren, Inkonsistenzen flaggen
(date_divergent, dsb_divergent, no_date_count).
Mail-Block "📑 Mehrere DSE-Versionen erkannt" mit
Vergleichstabelle + rotem Hinweis "Nur eine Version kann
gültig sein". Beispiel Elli: /de/datenschutz (Mollstr-DSB,
2022) vs /de/datenschutzerklaerung?brand=elli (Proliance,
ohne Datum).
API-Response erweitert um legacy_url_inventory +
html_blocks.legacy_urls + multi_version_dse_html im V2-Layout.
ENV-Override: LEGACY_URL_DISABLED=1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|