9783657da3
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 43s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 37s
BatchDedup: add since parameter (services/batch_dedup_runner.py + API)

- New 'since: datetime' parameter scopes the Phase 1 and Phase 2 SQL to created_at >= since.
- The Phase 2 checkpoint is deleted on a scoped run. This prevents skipping new atomics whose control_id sorts alphabetically below the stale last_id.
- 6-13x faster for late-added documents (19k instead of 172k atomics).
- Docs: control-pipeline/docs/incremental-dedup.md.

New scripts:
- gpre1_object_groups_incremental.py: appends new objects to object_groups via bge-m3 nearest-neighbor matching (threshold defaults to 0.85; 0.78 is recommended for broader synonym matching). Pure INSERT/UPDATE, no DELETE.
- gpre2_master_controls_incremental.py: non-destructive master-controls update. Existing MCs stay untouched (UUIDs and master_control_id are preserved); only new members are appended, and new MCs are created for object groups that now reach min-phases.
- ingest_enisa_cra.py: ingests the 8 CRA-relevant ENISA documents (Standards Mapping, EUCC-Implementation, NIS2 TIG, SRP FAQ, EUCC Eval Methodology, CVD Policies, Threat Landscape 2025). chunk_strategy=legal, requirement_strength=guidance|consultation_draft|evidentiary. Source data: legal-sources/enisa/enisa_cra_single_reporting_platform_faq.html (PDFs are filtered out by .gitignore).

Result of this pipeline iteration:
- 1,296 new CRA controls + 19,652 atomic children
- +362 new master controls, 10,017 existing ones extended
- Total: 13,950 MCs, 620 CRA MCs (previously 566), 1,304 CRA atomics (previously 841)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
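
Illustration (not part of the commit): a minimal sketch of why the Phase 2 checkpoint has to be dropped on a since-scoped run. It is not the code in services/batch_dedup_runner.py. The table name `atomics`, the dict-based checkpoint, and the function names `select_phase2_batch` and `run_phase2` are all hypothetical; only `since`, `created_at`, `control_id` and `last_id` come from the message above.

```python
import sqlite3
from datetime import datetime
from typing import Optional


def select_phase2_batch(conn, since: Optional[datetime], last_id: Optional[str]):
    """Return control_ids still to process, ordered by control_id (hypothetical schema)."""
    clauses, params = [], []
    if since is not None:
        # Scope the query to rows created at or after 'since'.
        clauses.append("created_at >= ?")
        params.append(since.isoformat())
    if last_id is not None:
        # Resume after the checkpointed control_id.
        clauses.append("control_id > ?")
        params.append(last_id)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    rows = conn.execute(
        f"SELECT control_id FROM atomics {where} ORDER BY control_id", params
    ).fetchall()
    return [row[0] for row in rows]


def run_phase2(conn, checkpoint: dict, since: Optional[datetime] = None):
    """On a scoped run, discard the stale checkpoint before selecting rows."""
    if since is not None:
        checkpoint.pop("last_id", None)
    return select_phase2_batch(conn, since, checkpoint.get("last_id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atomics (control_id TEXT PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO atomics VALUES (?, ?)",
    [
        ("ZZZ-001", "2025-01-01T00:00:00"),  # old row, already processed
        ("AAA-900", "2025-06-01T00:00:00"),  # new row, sorts below the old one
    ],
)
since = datetime(2025, 5, 1)

# Keeping the stale checkpoint hides the new row, because "AAA-900" < "ZZZ-001".
skipped = select_phase2_batch(conn, since, last_id="ZZZ-001")

# Clearing the checkpoint on the scoped run picks it up.
checkpoint = {"last_id": "ZZZ-001"}
scoped = run_phase2(conn, checkpoint, since=since)
```

In this sketch `skipped` is empty while `scoped` contains the new row, which is the failure mode the checkpoint reset is meant to avoid. Timestamps are stored as ISO 8601 strings of identical format so that string comparison matches chronological order; a real schema would likely use a native timestamp column.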