Benjamin Admin 1f8667c7da feat(control-pipeline): replace similarity-only dedup with LLM-verified dedup in pipeline
Stage 4 (Harmonization) now uses a two-tier approach:
- Score >= 0.92: auto-duplicate (embedding only, fast)
- Score >= 0.85 and < 0.92: LLM verification via local qwen3.5 (think=false, ~3s)
- Score < 0.85: not a duplicate
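
The thresholds above can be sketched roughly as follows. This is an
illustrative sketch, not the code from this commit: `classify_pair`,
`DedupDecision`, and the `llm_verify` callback are hypothetical names, and
treating the borderline band as "not a duplicate" when no verifier is
supplied is an assumption.

```python
from enum import Enum
from typing import Callable, Optional

# Thresholds taken from the commit message.
AUTO_DUPLICATE_THRESHOLD = 0.92
LLM_VERIFY_THRESHOLD = 0.85


class DedupDecision(Enum):
    DUPLICATE = "duplicate"
    NOT_DUPLICATE = "not_duplicate"


def classify_pair(
    score: float,
    llm_verify: Optional[Callable[[], bool]] = None,
) -> DedupDecision:
    """Classify a candidate pair by embedding similarity score.

    llm_verify is a hypothetical zero-argument callback that asks the LLM
    whether the pair is a true duplicate; it is only invoked for scores in
    the borderline band, so the slow call is skipped for clear cases.
    """
    if score >= AUTO_DUPLICATE_THRESHOLD:
        # Tier 1: embedding similarity alone is decisive.
        return DedupDecision.DUPLICATE
    if score < LLM_VERIFY_THRESHOLD:
        return DedupDecision.NOT_DUPLICATE
    if llm_verify is None:
        # Assumption: without a verifier, borderline pairs are kept apart.
        return DedupDecision.NOT_DUPLICATE
    # Tier 2: borderline score, defer to the LLM verdict.
    if llm_verify():
        return DedupDecision.DUPLICATE
    return DedupDecision.NOT_DUPLICATE
```

Passing the verifier as a callback keeps the LLM call lazy: it runs only
for pairs that land in the 0.85 to 0.92 band.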

This eliminates ~44% false positives from pure embedding similarity.
LLM_DEDUP_ENABLED env var controls the feature (default: true).
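
A minimal sketch of how such a flag might be read, assuming the usual
string conventions for boolean environment variables. Only the variable
name and its default of true come from this commit; the helper
`llm_dedup_enabled` and the accepted "false" spellings are hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumption: these spellings disable the feature; anything else enables it.
_FALSE_VALUES = {"0", "false", "no", "off"}


def llm_dedup_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless LLM_DEDUP_ENABLED is set to a false-like value."""
    env = os.environ if environ is None else environ
    # Default is "true", matching the commit message.
    raw = env.get("LLM_DEDUP_ENABLED", "true")
    return raw.strip().lower() not in _FALSE_VALUES
```

Accepting an optional mapping makes the helper easy to exercise in tests
without mutating the real process environment.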

Also adds 10 applicability use case tests (bank+TAN, webshop+Stripe,
SaaS startup, energy provider, health app, automotive, law firm, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:57:37 +02:00
52 MiB
Languages
Python 36.4%
TypeScript 34.1%
Go 24.5%
HTML 3.1%
Shell 0.8%
Other 1%