breakpilot-core

Author	SHA1	Message	Date
Benjamin Admin	043bcb65d8	fix(control-pipeline): harmonization recheck indexes ALL drafts, not just atomics Previous version searched against atomic_controls_dedup collection which only contains Pass 0b atomic controls. Now creates a temporary collection with ALL draft controls as reference, then checks targets against it. Two phases: 1. Index ~53k reference drafts into temp Qdrant collection (batch 32) 2. Search each of 14k target controls, Embedding + LLM for borderline 3. Cleanup temp collection when done Status updates every 50 controls (fixed counter bug). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 15:42:40 +02:00
Benjamin Admin	d31fccbe0e	feat(control-pipeline): add harmonization recheck endpoint POST /generate/harmonization-recheck verifies promoted controls against Qdrant dedup collection via Embedding + LLM. Runs as stable asyncio background task inside the container (no docker exec issues). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 13:25:56 +02:00
Benjamin Admin	3ffa3f5793	feat(control-pipeline): add Document Compliance Engine — scope detection + document requirements New service: document_scope_resolver.py with 28 document rules covering: - Base (impressum, privacy_policy) - Tracking (cookie_banner, cookie_policy) - E-Commerce (AGB, withdrawal, shipping, pricing, payment) - Digital (digital_content_terms, no_withdrawal_notice) - SaaS (ToS, service_description, DPA, SLA) - AI (transparency_notice, automated_decisions) - Hardware (warranty, return, CE, safety) - Environmental (WEEE, battery disposal) - Marketplace (seller terms, ranking transparency) - Subscription (cancellation terms) API: POST /v1/document-compliance/required Input: company flags + jurisdiction → Output: required documents + assessment Includes confidence scoring, escalation detection (e.g. ecommerce without distance_selling flag), and reasoning. 19 tests covering all business model combinations including B2B-only exclusions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 08:39:55 +02:00
Benjamin Admin	f1359d63ba	fix: handle new numeric batch custom_id format in Pass 0a result processing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 07:21:50 +02:00
Benjamin Admin	bbfcd44407	fix: use numeric batch index as custom_id (64 char limit, alphanumeric only) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 00:39:13 +02:00
Benjamin Admin	5da5a5597b	fix: increase Batch API upload timeout to 600s for large payloads Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 00:31:50 +02:00
Benjamin Admin	38684dd903	feat(control-pipeline): add Assessment Layer to Applicability Engine Adds confidence scoring, escalation detection, and reasoning to the deterministic filter. All assessment is deterministic (no LLM). Confidence scoring (0.0-1.0): - +0.25 industry specified - +0.15 company size specified - +0.20-0.30 scope signals provided - +0.15 controls found - +0.15 no contradictions - Capped at 0.75 for escalation cases Escalation triggers: - Contradictory signals (holds_client_funds without operates_payment_service) - Ambiguous signals (provides_embedded_connectivity) - Financial signals without explicit payment service declaration - Incomplete profile (no industry, size, or signals) Reasoning: template-based, includes active signals, control count, scope-condition descriptions, and warnings. Response now includes "assessment" field with confidence, escalation_flag, escalation_reason, inferred_signals, reasoning, and warnings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 20:36:11 +02:00
Benjamin Admin	716bc651c4	fix(control-pipeline): remove fictional demo packages, add real DB integration tests Deleted 3 packages that were copied without validation: - applicability_demo/ (fictional control IDs, wrong API schema) - applicability_demo_sdk/ (wrong endpoint URL, fictional request format) - applicability_demo_ci/ (GitHub Actions instead of Gitea, duplicated code) Replaced with real integration in test_applicability_use_cases.py: - TestApplicabilityIntegration calls real get_applicable_controls() - Checks source_citation->source and control_id domain prefixes - Runs against actual DB when DATABASE_URL is set - 128 structure/acceptance tests pass, 24 integration tests skip without DB Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 19:59:56 +02:00
Benjamin Admin	27f12e4659	feat(control-pipeline): add CI regression suite for applicability tests Makefile + pytest + GitHub Actions workflow for automated regression: - make install / make eval / make test - pytest integration with demo_cases.yaml - Golden outputs for 6 priority cases - Report generation (JSON + Markdown) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 19:12:44 +02:00
Benjamin Admin	a7c6ffe4dd	feat(control-pipeline): add SDK endpoint demo package for applicability tests Request payloads + response contract + api_runner.py for 6 priority cases. Can be run directly against /v1/applicability/evaluate endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 19:11:44 +02:00
Benjamin Admin	ae5c5c24eb	feat(control-pipeline): add applicability demo test package with evaluator 6 priority demo cases with golden outputs, evaluator.py and run_demo.py: - CASE-001: Webshop+Stripe (anti-PSD2 false positive) - CASE-002: Bank+TAN-Generator (scope override for batteries) - CASE-004: FinTech Wallet (true positive PSD2/AML) - CASE-006: SaaS+SMS Gateway (anti-TKG false positive) - CASE-008: Software→IoT Hardware (multi-regime scope) - CASE-011: Embedded Finance (escalation case) Self-test passes 6/6 against golden outputs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 19:08:31 +02:00
Benjamin Admin	e8ec50e0fc	feat(control-pipeline): 24 demo test cases for applicability engine YAML-based test package with 4 categories (6 each): - Standard sector cases (Telko, SaaS, Energie, Automotive, Health, Law) - Scope-beats-sector (Bank+Battery, KI-Recruiting, White-Label, Payments) - False friends (Stripe!=PSD2, Hotline!=TKG, Repo-signals!=regulation) - Escalation (IoT-SIM, FinTech unclear, Treuhand, KI-Diagnose) Enforces 5 acceptance rules: no false certainty, scope>sector, repo signals insufficient, standard first, 40%+ negative tests. Scoring framework: must_include + must_not_include + reasoning + escalation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 17:42:38 +02:00
Benjamin Admin	1f8667c7da	feat(control-pipeline): replace similarity-only dedup with LLM-verified dedup in pipeline Stage 4 (Harmonization) now uses two-tier approach: - Score >= 0.92: auto-duplicate (embedding only, fast) - Score 0.85-0.92: LLM verification via local qwen3.5 (think=false, ~3s) - Score < 0.85: not a duplicate This eliminates ~44% false positives from pure embedding similarity. LLM_DEDUP_ENABLED env var controls the feature (default: true). Also adds 10 applicability use case tests (bank+TAN, webshop+Stripe, SaaS startup, energy provider, health app, automotive, law firm, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:57:37 +02:00
Benjamin Admin	bed41dcbdf	feat(control-pipeline): add applicability backfill endpoint (Phase 5/C3) POST /v1/canonical/generate/backfill-applicability enriches controls with applicable_industries, applicable_company_size, scope_conditions via Anthropic API. Targets ~26k controls from pipeline version < 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:25:50 +02:00
Benjamin Admin	736ddf647d	fix(llm-dedup): use think:false instead of /no_think, restore 30s timeout Ollama API supports "think": false to disable extended thinking mode on qwen3.5. Reduces response time from 95s to ~3s per comparison. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 20:31:13 +02:00
Benjamin Admin	2188d6645e	fix(llm-dedup): increase timeout to 120s, add /no_think, limit output to 200 tokens qwen3.5 uses extended thinking by default which causes 95s+ responses and 30s timeouts. Add /no_think to system prompt and num_predict=200 to keep responses short. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 20:27:58 +02:00
Benjamin Admin	076a6cd567	feat(control-pipeline): add LLM dedup endpoint for borderline review queue POST /v1/canonical/generate/llm-dedup uses local Ollama (qwen3.5:35b-a3b) to verify borderline duplicate matches (score 0.85-0.91). More accurate than embedding similarity for compliance controls with subtle scope differences (e.g. "documented" vs "implemented"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 20:15:46 +02:00
Benjamin Admin	fc855f52f9	fix(batch-dedup): don't crash on FK violation in _write_review Stale UUIDs in the Qdrant dedup collection can reference controls that were deprecated in earlier batches. Log warning and continue instead of raising and killing the entire job. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 13:25:28 +02:00
Benjamin Admin	642a8587b5	feat(control-pipeline): add batch-dedup endpoint + source_citation JSONB migration - Add POST /v1/canonical/generate/batch-dedup endpoint for Pass 0b atomic controls deduplication (Phase 1: intra-group, Phase 2: cross-group) - source_citation column migrated from TEXT to JSONB (5,459 rows converted) - migrate_jsonb.py script added for generation_metadata conversion Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 08:44:31 +02:00
Benjamin Admin	4716023abc	feat(control-pipeline): JSONB migration for generation_metadata - Add migration script (scripts/migrate_jsonb.py) that converts 89,443 Python dict repr rows to valid JSON via ast.literal_eval - Column altered from TEXT to native JSONB - Index created on generation_metadata->>'merge_group_hint' - Remove unnecessary ::jsonb casts in pipeline_adapter.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 07:49:11 +02:00
Benjamin Admin	8b7671d310	feat(control-pipeline): add repair backfill endpoint for missing title/objective/requirements POST /v1/canonical/generate/backfill-repair uses Anthropic API to generate missing fields from available context (source text, other fields). Handles 1,470 controls with incomplete data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 07:06:19 +02:00
Benjamin Admin	fb53c8be90	fix(anchor-finder): use correct Qdrant payload fields (regulation_id, regulation_name_de) Qdrant collections use regulation_id (not regulation_code), regulation_name_de, guideline_name, download_url etc. Also search bp_compliance_datenschutz collection where OWASP/ENISA docs live. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 18:17:36 +02:00
Benjamin Admin	b29dc33708	fix(control-pipeline): anchor finder uses direct Qdrant search instead of Go SDK The Go SDK RAG proxy returns 401 (Qdrant API key mismatch). Switch AnchorFinder to use direct Qdrant vector search + embedding service, same approach as the main pipeline. No dependency on Go SDK anymore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 18:13:12 +02:00
Benjamin Admin	91f4202e88	feat(control-pipeline): add anchor backfill endpoint + normalize target_audience - Add POST /v1/canonical/generate/backfill-anchors endpoint for batch populating open_anchors on controls generated with skip_web_search=true - Uses AnchorFinder Stage A (RAG search) to find OWASP/NIST/ENISA refs - Background job with progress tracking (same pattern as other backfills) - Promotes needs_review controls that gain anchors to draft state - Target audience normalization (enterprise/authority/provider → JSON arrays) already applied via SQL Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 18:04:50 +02:00
Benjamin Admin	c4e993e3f8	fix: filter empty controls (title/objective=None) before store - Batch postprocessing: controls with title/objective = None/null/"" are filtered out and not stored. The title is derived from the objective if only the title is missing. - _store_control: pre-store quality guard rejects empty controls - Prevents "None" controls caused by LLM parsing errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 06:59:47 +02:00
Benjamin Admin	a58d1aa403	fix: CRITICAL: 12 pipeline bugs fixed, recover 36,000 lost controls Root cause: _generate_control_id produced ID collisions (string sort instead of numeric), ON CONFLICT DO NOTHING silently discarded controls, and chunks were marked as "processed" even though the store failed → permanently lost. Fixes: 1. _generate_control_id: numeric MAX instead of string sort, collision guard with UUID-suffix fallback, exception is logged instead of swallowed 2. _store_control: ON CONFLICT DO UPDATE instead of DO NOTHING → ID is always returned 3. Store logic: chunk is NO longer marked as processed on store_failed → retry possible on the next run 4. Counter: controls_generated is incremented only on successful store. New counters: controls_stored + controls_store_failed 5. Anthropic API: HTTP 429/500/502/503/504 are now retried (2 attempts) 6. Monitoring: progress log shows store rate (%), ALARM at <80% 7. Post-job validation: compares generated vs stored vs actual DB state, WARNING if store_failed > 0, CRITICAL if rate < 90% Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 00:39:12 +02:00
Benjamin Admin	756d068b4f	fix: default skip_web_search to True for a 5x faster pipeline Anchor search (DuckDuckGo + RAG via SDK) slows the pipeline from ~50 chunks/min to ~10 chunks/min. Anchors (OWASP/NIST references) can be populated afterwards in a batch job. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:26:01 +02:00
Benjamin Admin	b14be8583d	feat: works council compliance: BAG ingestion script + BV template 1. BAG rulings ingestion script (21 curated rulings on §87 BetrVG) - Microsoft 365, SAP ERP, email, standard software, video, SaaS/cloud - 14 successfully ingested (4,726 chunks in bp_compliance_datenschutz) 2. Works agreement (Betriebsvereinbarung) template (6th document template) - SQL migration with 13 sections (A-M), ~30 placeholders - Conditional blocks for AI systems, video, HR - Python generator with automatic TOM population Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:49:01 +02:00
Benjamin Admin	f89ce46631	fix: pipeline scaling: 6 optimizations for 80k+ controls 1. control_generator: GeneratorResult.status default "completed" → "running" (bug) 2. control_generator: Anthropic API with phase timeouts + retry on disconnect 3. control_generator: regulation_exclude filter + harmonization via Qdrant instead of in-memory 4. decomposition_pass: enrich pass uses batch UPDATEs (400k → ~400 DB calls) 5. decomposition_pass: merge pass uses a single query instead of N+1 6. batch_dedup_runner: cross-group dedup parallelized (asyncio.gather) 7. canonical_control_routes: framework controls API pagination (limit/offset) 8. DB indexes: idx_oc_parent_release, idx_oc_trigger_null, idx_cc_framework Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 14:09:32 +02:00
Benjamin Admin	441d5740bd	feat: Applicability Engine + API filter + DB sync + cleanup - Applicability Engine (deterministic, no LLM): filters controls by industry, company size, and scope signals - API filters on GET /controls, /controls-count, /controls-meta - POST /controls/applicable endpoint for company-profile matching - 35 unit tests for the engine - Fixed port 8098 conflict with Nginx (expose only, no host port) - CLAUDE.md: added control-pipeline documentation - Deleted 6 international laws (ES/FR/HU/NL/SE/CZ; DACH only) - DB backup import script (import_backup.py) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:58:17 +02:00
Benjamin Admin	e3ab428b91	feat: migrate control-pipeline service from the compliance repo Control pipeline (Pass 0a/0b, BatchDedup, Generator) now runs as a standalone service in Core so that the compliance repo can be refactored independently. It still writes to the compliance schema of the shared PostgreSQL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:40:47 +02:00

31 Commits
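
The two-tier harmonization rule described in commit 1f8667c7da (score >= 0.92 is an automatic duplicate, 0.85-0.92 goes to LLM verification, below 0.85 is not a duplicate) can be sketched roughly as follows. This is only an illustration of the thresholds stated in the commit message, not the repository's code: the function name `classify_pair`, the `llm_says_duplicate` callback, and the `needs_review` fallback when the LLM tier is disabled are all assumptions.

```python
from typing import Callable

# Thresholds as stated in the commit message.
AUTO_DUPLICATE_THRESHOLD = 0.92
BORDERLINE_THRESHOLD = 0.85


def classify_pair(
    score: float,
    llm_says_duplicate: Callable[[], bool],  # hypothetical callback wrapping the LLM check
    llm_enabled: bool = True,  # stands in for the LLM_DEDUP_ENABLED env var
) -> str:
    """Classify a candidate pair by embedding similarity score."""
    if score >= AUTO_DUPLICATE_THRESHOLD:
        # Tier 1: embedding similarity alone is decisive, no LLM call.
        return "duplicate"
    if score >= BORDERLINE_THRESHOLD:
        if not llm_enabled:
            # Assumption: without the LLM tier, borderline pairs are queued for review.
            return "needs_review"
        # Tier 2: the LLM is only consulted for borderline scores.
        return "duplicate" if llm_says_duplicate() else "distinct"
    # Below the borderline band the pair is treated as distinct.
    return "distinct"
```

Passing the LLM check as a callback means the comparatively slow call (about 3 s per comparison according to the commit message) is made only for borderline pairs.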