fix: adapt batch dedup to NULL pattern_id — group by merge_group_hint
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 31s
CI/CD / test-python-backend-compliance (push) Successful in 31s
CI/CD / test-python-document-crawler (push) Successful in 21s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 10s
CI/CD / Deploy (push) Successful in 2s
All checks were successful
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Successful in 31s
CI/CD / test-python-backend-compliance (push) Successful in 31s
CI/CD / test-python-document-crawler (push) Successful in 21s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 10s
CI/CD / Deploy (push) Successful in 2s
All Pass 0b controls have pattern_id=NULL. Rewritten to: - Phase 1: Group by merge_group_hint (action:object:trigger), 52k groups - Phase 2: Cross-group embedding search for semantically similar masters - Qdrant search uses unfiltered cross-regulation endpoint - API param changed: pattern_id → hint_filter Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -776,12 +776,12 @@ _batch_dedup_runner = None
|
||||
@router.post("/migrate/batch-dedup", response_model=MigrationResponse)
|
||||
async def migrate_batch_dedup(
|
||||
dry_run: bool = Query(False, description="Preview mode — no DB changes"),
|
||||
pattern_id: Optional[str] = Query(None, description="Only process this pattern"),
|
||||
hint_filter: Optional[str] = Query(None, description="Only process hints matching this prefix"),
|
||||
):
|
||||
"""Batch dedup: reduce ~85k Pass 0b controls to ~18-25k masters.
|
||||
|
||||
Groups controls by pattern_id + merge_group_hint, picks the best
|
||||
quality master, and links duplicates via control_parent_links.
|
||||
Phase 1: Groups by merge_group_hint, picks best quality master, links rest.
|
||||
Phase 2: Cross-group embedding search for semantically similar masters.
|
||||
"""
|
||||
global _batch_dedup_runner
|
||||
from compliance.services.batch_dedup_runner import BatchDedupRunner
|
||||
@@ -790,7 +790,7 @@ async def migrate_batch_dedup(
|
||||
try:
|
||||
runner = BatchDedupRunner(db=db)
|
||||
_batch_dedup_runner = runner
|
||||
stats = await runner.run(dry_run=dry_run, pattern_filter=pattern_id)
|
||||
stats = await runner.run(dry_run=dry_run, hint_filter=hint_filter)
|
||||
return MigrationResponse(status="completed", stats=stats)
|
||||
except Exception as e:
|
||||
logger.error("Batch dedup failed: %s", e)
|
||||
|
||||
Reference in New Issue
Block a user