breakpilot-core

Author	SHA1	Message	Date
Benjamin Admin	42ab5ead26	feat(pipeline): implement Control Dependency Engine (Block 9) Core engine (dependency_engine.py): - 5 dependency types: prerequisite, supersedes, compensating_control, conditional_requirement, scope_exclusion - Generic condition evaluator (JSONB rules with AND/OR/NOT/field ops) - Priority-based conflict resolution - Cycle detection (DFS) + topological sort - Full evaluation with MCP-compatible dependency_resolution trace - 39 tests all passing (incl. GHV scenario from user requirements) Automatic generator (dependency_generator.py): - Ontology-based: same normalized_object + phase sequence -> prerequisite - Pattern-based: define->implement, implement->monitor, etc. - Domain packs: YAML rules for GDPR, AI Act, CRA, Security, Labor Contracts - 14 tests all passing API routes (dependency_routes.py): - CRUD for dependencies - POST /evaluate with dependency resolution - POST /generate (auto-generation with dry_run) - POST /validate (cycle detection) - GET /graph (nodes + edges for visualization) Prompt enhancement (decomposition_pass.py): - Added dependency_hints + lifecycle_phase_order to Pass 0b prompt - Stored in generation_metadata for post-processing DB migration: control_dependencies + control_evaluation_results tables 126 tests total, all passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 20:28:10 +02:00
Benjamin Admin	5aaa62dca7	fix(pipeline): improve quality metrics heuristics - Fix truncated title detection: only flag near-200-char titles or mid-word cutoffs - Fix evidence leak detection: check title start patterns, not keyword substring ("nachweisen" verb is valid action, "Nachweis vorliegen" is evidence) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:53:52 +02:00
Benjamin Admin	d583971afd	feat(pipeline): add quality metrics endpoint for Pass 0b controls GET /generate/quality-metrics — reports: - controls_per_obligation ratio - duplicate merge_key rate - evidence leak rate - truncated title rate - MCP field coverage - merge_key coverage Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:51:27 +02:00
Benjamin Admin	d1f3b9ffcd	feat(pipeline): add submit-pass0b endpoint for batch submission Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:42:06 +02:00
Benjamin Admin	d93321275c	feat(pipeline): add batch API status + result processing endpoints - GET /generate/batch-api-status/{batch_id} — check Anthropic batch status - POST /generate/process-batch — process completed batch results (background) - GET /generate/process-batch-status/{job_id} — poll processing progress Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:36:47 +02:00
Benjamin Admin	1a3101066e	fix: paginated indexing to avoid OOM on 53k controls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 16:31:20 +02:00
Benjamin Admin	043bcb65d8	fix(control-pipeline): harmonization recheck indexes ALL drafts, not just atomics Previous version searched against atomic_controls_dedup collection which only contains Pass 0b atomic controls. Now creates a temporary collection with ALL draft controls as reference, then checks targets against it. Two phases: 1. Index ~53k reference drafts into temp Qdrant collection (batch 32) 2. Search each of 14k target controls, Embedding + LLM for borderline 3. Cleanup temp collection when done Status updates every 50 controls (fixed counter bug). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 15:42:40 +02:00
Benjamin Admin	d31fccbe0e	feat(control-pipeline): add harmonization recheck endpoint POST /generate/harmonization-recheck verifies promoted controls against Qdrant dedup collection via Embedding + LLM. Runs as stable asyncio background task inside the container (no docker exec issues). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 13:25:56 +02:00
Benjamin Admin	3ffa3f5793	feat(control-pipeline): add Document Compliance Engine — scope detection + document requirements New service: document_scope_resolver.py with 28 document rules covering: - Base (impressum, privacy_policy) - Tracking (cookie_banner, cookie_policy) - E-Commerce (AGB, withdrawal, shipping, pricing, payment) - Digital (digital_content_terms, no_withdrawal_notice) - SaaS (ToS, service_description, DPA, SLA) - AI (transparency_notice, automated_decisions) - Hardware (warranty, return, CE, safety) - Environmental (WEEE, battery disposal) - Marketplace (seller terms, ranking transparency) - Subscription (cancellation terms) API: POST /v1/document-compliance/required Input: company flags + jurisdiction → Output: required documents + assessment Includes confidence scoring, escalation detection (e.g. ecommerce without distance_selling flag), and reasoning. 19 tests covering all business model combinations including B2B-only exclusions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 08:39:55 +02:00
Benjamin Admin	bed41dcbdf	feat(control-pipeline): add applicability backfill endpoint (Phase 5/C3) POST /v1/canonical/generate/backfill-applicability enriches controls with applicable_industries, applicable_company_size, scope_conditions via Anthropic API. Targets ~26k controls from pipeline version < 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:25:50 +02:00
Benjamin Admin	736ddf647d	fix(llm-dedup): use think:false instead of /no_think, restore 30s timeout Ollama API supports "think": false to disable extended thinking mode on qwen3.5. Reduces response time from 95s to ~3s per comparison. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 20:31:13 +02:00
Benjamin Admin	2188d6645e	fix(llm-dedup): increase timeout to 120s, add /no_think, limit output to 200 tokens qwen3.5 uses extended thinking by default which causes 95s+ responses and 30s timeouts. Add /no_think to system prompt and num_predict=200 to keep responses short. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 20:27:58 +02:00
Benjamin Admin	076a6cd567	feat(control-pipeline): add LLM dedup endpoint for borderline review queue POST /v1/canonical/generate/llm-dedup uses local Ollama (qwen3.5:35b-a3b) to verify borderline duplicate matches (score 0.85-0.91). More accurate than embedding similarity for compliance controls with subtle scope differences (e.g. "documented" vs "implemented"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 20:15:46 +02:00
Benjamin Admin	642a8587b5	feat(control-pipeline): add batch-dedup endpoint + source_citation JSONB migration - Add POST /v1/canonical/generate/batch-dedup endpoint for Pass 0b atomic controls deduplication (Phase 1: intra-group, Phase 2: cross-group) - source_citation column migrated from TEXT to JSONB (5,459 rows converted) - migrate_jsonb.py script added for generation_metadata conversion Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 08:44:31 +02:00
Benjamin Admin	8b7671d310	feat(control-pipeline): add repair backfill endpoint for missing title/objective/requirements POST /v1/canonical/generate/backfill-repair uses Anthropic API to generate missing fields from available context (source text, other fields). Handles 1,470 controls with incomplete data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 07:06:19 +02:00
Benjamin Admin	b29dc33708	fix(control-pipeline): anchor finder uses direct Qdrant search instead of Go SDK The Go SDK RAG proxy returns 401 (Qdrant API key mismatch). Switch AnchorFinder to use direct Qdrant vector search + embedding service, same approach as the main pipeline. No dependency on Go SDK anymore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 18:13:12 +02:00
Benjamin Admin	91f4202e88	feat(control-pipeline): add anchor backfill endpoint + normalize target_audience - Add POST /v1/canonical/generate/backfill-anchors endpoint for batch populating open_anchors on controls generated with skip_web_search=true - Uses AnchorFinder Stage A (RAG search) to find OWASP/NIST/ENISA refs - Background job with progress tracking (same pattern as other backfills) - Promotes needs_review controls that gain anchors to draft state - Target audience normalization (enterprise/authority/provider → JSON arrays) already applied via SQL Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 18:04:50 +02:00
Benjamin Admin	756d068b4f	fix: skip_web_search Default auf True — 5x schnellere Pipeline Anchor-Search (DuckDuckGo + RAG via SDK) verlangsamt Pipeline von ~50 Chunks/min auf ~10 Chunks/min. Anchors (OWASP/NIST-Referenzen) koennen nachtraeglich in einem Batch-Job befuellt werden. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:26:01 +02:00
Benjamin Admin	f89ce46631	fix: Pipeline-Skalierung — 6 Optimierungen für 80k+ Controls 1. control_generator: GeneratorResult.status Default "completed" → "running" (Bug) 2. control_generator: Anthropic API mit Phase-Timeouts + Retry bei Disconnect 3. control_generator: regulation_exclude Filter + Harmonization via Qdrant statt In-Memory 4. decomposition_pass: Enrich Pass Batch-UPDATEs (400k → ~400 DB-Calls) 5. decomposition_pass: Merge Pass single Query statt N+1 6. batch_dedup_runner: Cross-Group Dedup parallelisiert (asyncio.gather) 7. canonical_control_routes: Framework Controls API Pagination (limit/offset) 8. DB-Indizes: idx_oc_parent_release, idx_oc_trigger_null, idx_cc_framework Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 14:09:32 +02:00
Benjamin Admin	441d5740bd	feat: Applicability Engine + API-Filter + DB-Sync + Cleanup CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-consent (push) Successful in 35s Details CI / test-python-voice (push) Successful in 33s Details CI / test-bqas (push) Successful in 37s Details CI / Deploy (push) Failing after 2s Details - Applicability Engine (deterministisch, kein LLM): filtert Controls nach Branche, Unternehmensgroesse, Scope-Signalen - API-Filter auf GET /controls, /controls-count, /controls-meta - POST /controls/applicable Endpoint fuer Company-Profile-Matching - 35 Unit-Tests fuer Engine - Port-8098-Konflikt mit Nginx gefixt (nur expose, kein Host-Port) - CLAUDE.md: control-pipeline Dokumentation ergaenzt - 6 internationale Gesetze geloescht (ES/FR/HU/NL/SE/CZ — nur DACH) - DB-Backup-Import-Script (import_backup.py) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:58:17 +02:00
Benjamin Admin	e3ab428b91	feat: control-pipeline Service aus Compliance-Repo migriert Control-Pipeline (Pass 0a/0b, BatchDedup, Generator) als eigenstaendiger Service in Core, damit Compliance-Repo unabhaengig refakturiert werden kann. Schreibt weiterhin ins compliance-Schema der shared PostgreSQL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:40:47 +02:00

21 Commits