1 Commits

Author SHA1 Message Date
Benjamin Admin
35784c35eb feat: Batch Dedup Runner — 85k→~18-25k Master Controls
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 32s
CI/CD / test-python-backend-compliance (push) Successful in 30s
CI/CD / test-python-document-crawler (push) Successful in 20s
CI/CD / test-python-dsms-gateway (push) Successful in 16s
CI/CD / validate-canonical-controls (push) Successful in 9s
CI/CD / Deploy (push) Successful in 1s
Adds batch orchestration for deduplicating ~85k Pass 0b atomic controls
into ~18-25k unique masters with M:N parent linking.

New files:
- migrations/078_batch_dedup.sql: merged_into_uuid column, perf indexes,
  link_type CHECK extended for cross_regulation
- batch_dedup_runner.py: BatchDedupRunner with quality scoring, merge-hint
  grouping, title-identical short-circuit, parent-link transfer, and
  cross-regulation pass
- tests/test_batch_dedup_runner.py: 21 tests (all passing)

Modified:
- control_dedup.py: optional collection param on Qdrant functions
- crosswalk_routes.py: POST/GET batch-dedup endpoints

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 07:06:38 +01:00