feat(pipeline): Anthropic Batch API, source/regulation filter, cost optimization
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 35s
CI/CD / test-python-backend-compliance (push) Successful in 34s
CI/CD / test-python-document-crawler (push) Successful in 22s
CI/CD / test-python-dsms-gateway (push) Successful in 19s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Has been skipped

- Add Anthropic API support to decomposition Pass 0a/0b (prompt caching, content batching)
- Add Anthropic Batch API (50% cost reduction, async 24h processing)
- Add source_filter (ILIKE on source_citation) for regulation-based filtering
- Add category_filter to Pass 0a for selective decomposition
- Add regulation_filter to control_generator for RAG scan phase filtering
  (prefix match on regulation_code — enables CE + Code Review focus)
- New API endpoints: batch-submit-0a, batch-submit-0b, batch-status, batch-process
- 83 new tests (all passing)

Cost reduction: $2,525 → ~$600-700 with all optimizations combined.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-03-17 13:22:01 +01:00
parent 825e070ed9
commit d22c47c9eb
6 changed files with 1525 additions and 163 deletions

View File

@@ -53,6 +53,7 @@ class GenerateRequest(BaseModel):
batch_size: int = 5
skip_web_search: bool = False
dry_run: bool = False
regulation_filter: Optional[List[str]] = None # Only process these regulation_code prefixes
class GenerateResponse(BaseModel):
@@ -144,6 +145,7 @@ async def start_generation(req: GenerateRequest):
max_chunks=req.max_chunks,
skip_web_search=req.skip_web_search,
dry_run=req.dry_run,
regulation_filter=req.regulation_filter,
)
if req.dry_run: