breakpilot-core

Author	SHA1	Message	Date
Benjamin Admin	d660a45bb5	feat(pipeline): implement golden test suite + fix ontology patterns - Add test_golden_controls.py: 37 tests covering all 8 YAML categories (container, framework, evidence, negative, title, split, scope, merge_key) - Fix evidence detection: handle German feminine articles (eine/einer/etc.) - Fix framework detection: use verb stems for conjugated German verbs - Add framework patterns: OWASP API6, CCM without CSA prefix, generic category - Fix negative patterns: use "nicht übertragen/gespeichert/erscheinen" before generic "dürfen nicht" to correctly route prevent vs exclude All 73 tests passing (36 ontology + 37 golden). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:48:12 +02:00
Benjamin Admin	629b9d9ca5	feat(pipeline): store MCP fields (assertion, pass/fail criteria, check_type) in generation_metadata - Add assertion, pass_criteria, fail_criteria, check_type to AtomicControlCandidate dataclass - Parse MCP fields from LLM output in _process_pass0b_control - Store MCP fields in generation_metadata JSON for later use by MCP scanner - Fields default to empty when not present (backward-compatible with old prompts) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:32:56 +02:00
Benjamin Admin	7e3b1108e2	feat: integrate Ontology pre-LLM filter into Pass 0b submit Obligations classified before API call: - evidence → skipped (saves API cost) - composite → skipped (not atomic) - framework_container → skipped (decompose separately) - atomic → sent to LLM Filter stats returned in submit response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:13:32 +02:00
Benjamin Admin	b3fbbbacfe	feat(control-pipeline): Control Ontology v1 — action types, evidence/container/framework detection Block 7.1-7.2 from masterplan: - 26 action_types with German aliases + phase mapping - Negative obligation patterns (exclude, prevent, enforce) - Container detection (11 composite objects that must not become atomic) - Evidence detection (14 indicators + "X dokumentieren" pattern) - Framework reference detection (OWASP, NIST, BSI, CSA, ISO patterns) - classify_obligation() routes to: atomic, composite, evidence, framework_container - build_canonical_key() for deterministic dedup - 36 tests covering all classification functions Also: merge_key bug fix in _process_pass0b_control() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:06:39 +02:00
Benjamin Admin	3a100fa1f1	feat: Pass 0b prompt v3 — compound action ban, evidence-of-action rule, pflicht-vs-prozess merge Fixes from v2 evaluation (7.9/10 avg, 28 controls): 1. COMPOUND BAN: "durchführen UND Maßnahmen ergreifen" → pick primary action only 2. EVIDENCE-OF-ACTION: "Tests dokumentieren" → evidence field, not own control 3. PFLICHT=PROZESS: "Behörden informieren" + "Verfahren etablieren" = 1 control 4. MERGE-KEY BUG: merge_key from LLM output now stored in generation_metadata Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 00:25:38 +02:00
Benjamin Admin	7a53f5bee1	feat: Pass 0b prompt v2 — container detection, merge-key, evidence separation, actionable titles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 00:00:59 +02:00
Benjamin Admin	ea30ceb1f1	feat(control-pipeline): improved Pass 0b prompt for actionable control titles Key changes to system prompt: - Evidence/documentation belongs in evidence field, NOT as separate control - SBOM = 1 control (not "maintain" + "document" separately) - Security lifecycle phases (identify/assess/remediate/monitor) = separate controls - Same object + same action + same actor = 1 control (merge, not split) - Titles must contain the ACTION, not just the subject WRONG: "Vertraulichkeit Mitarbeiter" RIGHT: "Mitarbeiter zur Vertraulichkeit verpflichten" Titles serve as MCP search queries against customer documents/code. Bad titles = bad search results = unusable product. All 52,566 old pass0b controls deprecated (not deleted) for full regeneration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 23:45:37 +02:00
Benjamin Admin	cd33777d75	fix: Pass 0b INSERT ON CONFLICT DO UPDATE + per-result commit/rollback Prevents UniqueViolation from blocking entire batch. Each result is committed individually, errors are rolled back without affecting subsequent results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 22:15:21 +02:00
Benjamin Admin	c73a489075	fix: Pass 0b filter — skip obligations whose parent already has pass0b controls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 21:54:32 +02:00
Benjamin Admin	7ddb572f5d	fix: Pass 0b batch custom_id + result handler for numeric format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 16:08:19 +02:00
Benjamin Admin	3ffa3f5793	feat(control-pipeline): add Document Compliance Engine — scope detection + document requirements New service: document_scope_resolver.py with 28 document rules covering: - Base (impressum, privacy_policy) - Tracking (cookie_banner, cookie_policy) - E-Commerce (AGB, withdrawal, shipping, pricing, payment) - Digital (digital_content_terms, no_withdrawal_notice) - SaaS (ToS, service_description, DPA, SLA) - AI (transparency_notice, automated_decisions) - Hardware (warranty, return, CE, safety) - Environmental (WEEE, battery disposal) - Marketplace (seller terms, ranking transparency) - Subscription (cancellation terms) API: POST /v1/document-compliance/required Input: company flags + jurisdiction → Output: required documents + assessment Includes confidence scoring, escalation detection (e.g. ecommerce without distance_selling flag), and reasoning. 19 tests covering all business model combinations including B2B-only exclusions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 08:39:55 +02:00
Benjamin Admin	f1359d63ba	fix: handle new numeric batch custom_id format in Pass 0a result processing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 07:21:50 +02:00
Benjamin Admin	bbfcd44407	fix: use numeric batch index as custom_id (64 char limit, alphanumeric only) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 00:39:13 +02:00
Benjamin Admin	5da5a5597b	fix: increase Batch API upload timeout to 600s for large payloads Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 00:31:50 +02:00
Benjamin Admin	38684dd903	feat(control-pipeline): add Assessment Layer to Applicability Engine Adds confidence scoring, escalation detection, and reasoning to the deterministic filter. All assessment is deterministic (no LLM). Confidence scoring (0.0-1.0): - +0.25 industry specified - +0.15 company size specified - +0.20-0.30 scope signals provided - +0.15 controls found - +0.15 no contradictions - Capped at 0.75 for escalation cases Escalation triggers: - Contradictory signals (holds_client_funds without operates_payment_service) - Ambiguous signals (provides_embedded_connectivity) - Financial signals without explicit payment service declaration - Incomplete profile (no industry, size, or signals) Reasoning: template-based, includes active signals, control count, scope-condition descriptions, and warnings. Response now includes "assessment" field with confidence, escalation_flag, escalation_reason, inferred_signals, reasoning, and warnings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 20:36:11 +02:00
Benjamin Admin	1f8667c7da	feat(control-pipeline): replace similarity-only dedup with LLM-verified dedup in pipeline Stage 4 (Harmonization) now uses two-tier approach: - Score >= 0.92: auto-duplicate (embedding only, fast) - Score 0.85-0.92: LLM verification via local qwen3.5 (think=false, ~3s) - Score < 0.85: not a duplicate This eliminates ~44% false positives from pure embedding similarity. LLM_DEDUP_ENABLED env var controls the feature (default: true). Also adds 10 applicability use case tests (bank+TAN, webshop+Stripe, SaaS startup, energy provider, health app, automotive, law firm, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 16:57:37 +02:00
Benjamin Admin	fc855f52f9	fix(batch-dedup): don't crash on FK violation in _write_review Stale UUIDs in the Qdrant dedup collection can reference controls that were deprecated in earlier batches. Log warning and continue instead of raising and killing the entire job. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 13:25:28 +02:00
Benjamin Admin	4716023abc	feat(control-pipeline): JSONB migration for generation_metadata - Add migration script (scripts/migrate_jsonb.py) that converts 89,443 Python dict repr rows to valid JSON via ast.literal_eval - Column altered from TEXT to native JSONB - Index created on generation_metadata->>'merge_group_hint' - Remove unnecessary ::jsonb casts in pipeline_adapter.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 07:49:11 +02:00
Benjamin Admin	fb53c8be90	fix(anchor-finder): use correct Qdrant payload fields (regulation_id, regulation_name_de) Qdrant collections use regulation_id (not regulation_code), regulation_name_de, guideline_name, download_url etc. Also search bp_compliance_datenschutz collection where OWASP/ENISA docs live. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 18:17:36 +02:00
Benjamin Admin	b29dc33708	fix(control-pipeline): anchor finder uses direct Qdrant search instead of Go SDK The Go SDK RAG proxy returns 401 (Qdrant API key mismatch). Switch AnchorFinder to use direct Qdrant vector search + embedding service, same approach as the main pipeline. No dependency on Go SDK anymore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 18:13:12 +02:00
Benjamin Admin	91f4202e88	feat(control-pipeline): add anchor backfill endpoint + normalize target_audience - Add POST /v1/canonical/generate/backfill-anchors endpoint for batch populating open_anchors on controls generated with skip_web_search=true - Uses AnchorFinder Stage A (RAG search) to find OWASP/NIST/ENISA refs - Background job with progress tracking (same pattern as other backfills) - Promotes needs_review controls that gain anchors to draft state - Target audience normalization (enterprise/authority/provider → JSON arrays) already applied via SQL Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 18:04:50 +02:00
Benjamin Admin	c4e993e3f8	fix: filter empty controls (title/objective=None) before store - Batch postprocessing: controls with title/objective = None/null/"" are filtered out and not stored. The title is derived from the objective if only the title is missing. - _store_control: pre-store quality guard rejects empty controls - Prevents "None" controls caused by LLM parsing errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 06:59:47 +02:00
Benjamin Admin	a58d1aa403	fix: CRITICAL: 12 pipeline bugs fixed, rescue 36,000 lost controls Root cause: _generate_control_id produced ID collisions (string sort instead of numeric), ON CONFLICT DO NOTHING silently discarded controls, and chunks were marked as "processed" even though the store failed → permanently lost. Fixes: 1. _generate_control_id: numeric MAX instead of string sort, collision guard with UUID-suffix fallback, exception is logged instead of swallowed 2. _store_control: ON CONFLICT DO UPDATE instead of DO NOTHING → ID always returned 3. Store logic: chunk is NO longer marked as processed on store_failed → retry possible on the next run 4. Counter: controls_generated incremented only on successful store. New counters: controls_stored + controls_store_failed 5. Anthropic API: HTTP 429/500/502/503/504 are now retried (2 attempts) 6. Monitoring: progress log shows store rate (%), ALARM at <80% 7. Post-job validation: compares generated vs stored vs DB reality; WARNING if store_failed > 0, CRITICAL if rate < 90% Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 00:39:12 +02:00
Benjamin Admin	756d068b4f	fix: skip_web_search default set to True, 5x faster pipeline. Anchor search (DuckDuckGo + RAG via SDK) slows the pipeline from ~50 chunks/min to ~10 chunks/min. Anchors (OWASP/NIST references) can be populated afterwards in a batch job. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 12:26:01 +02:00
Benjamin Admin	f89ce46631	fix: pipeline scaling, 6 optimizations for 80k+ controls 1. control_generator: GeneratorResult.status default "completed" → "running" (bug) 2. control_generator: Anthropic API with phase timeouts + retry on disconnect 3. control_generator: regulation_exclude filter + harmonization via Qdrant instead of in-memory 4. decomposition_pass: enrich pass batch UPDATEs (400k → ~400 DB calls) 5. decomposition_pass: merge pass uses a single query instead of N+1 6. batch_dedup_runner: cross-group dedup parallelized (asyncio.gather) 7. canonical_control_routes: framework controls API pagination (limit/offset) 8. DB indexes: idx_oc_parent_release, idx_oc_trigger_null, idx_cc_framework Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 14:09:32 +02:00
Benjamin Admin	441d5740bd	feat: Applicability Engine + API filters + DB sync + cleanup - Applicability Engine (deterministic, no LLM): filters controls by industry, company size, and scope signals - API filters on GET /controls, /controls-count, /controls-meta - POST /controls/applicable endpoint for company-profile matching - 35 unit tests for the engine - Port 8098 conflict with Nginx fixed (expose only, no host port) - CLAUDE.md: control-pipeline documentation added - 6 international laws deleted (ES/FR/HU/NL/SE/CZ; DACH only) - DB backup import script (import_backup.py) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:58:17 +02:00
Benjamin Admin	e3ab428b91	feat: control-pipeline service migrated from the compliance repo. Control pipeline (Pass 0a/0b, BatchDedup, Generator) now runs as a standalone service in Core so that the compliance repo can be refactored independently. It still writes to the compliance schema of the shared PostgreSQL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:40:47 +02:00

27 Commits
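
The two-tier dedup rule described in commit 1f8667c7da could be sketched roughly as follows. This is a minimal illustration and not the repository's code: the function name `classify_pair`, the `llm_verify` callback, and the handling of the disabled case (gray-zone pairs kept as distinct) are assumptions. Only the two thresholds, 0.92 and 0.85, come from the commit message.

```python
from typing import Callable

AUTO_DUPLICATE_THRESHOLD = 0.92  # from the commit message
LLM_VERIFY_THRESHOLD = 0.85      # from the commit message


def classify_pair(
    score: float,
    llm_verify: Callable[[], bool],  # hypothetical verifier callback
    llm_enabled: bool = True,        # stands in for LLM_DEDUP_ENABLED
) -> str:
    """Return 'duplicate' or 'distinct' for one candidate pair."""
    if score >= AUTO_DUPLICATE_THRESHOLD:
        # High similarity: the embedding score alone is trusted.
        return "duplicate"
    if score >= LLM_VERIFY_THRESHOLD:
        # Gray zone: ask the verifier. Assumption: when verification
        # is disabled, the pair is kept as distinct.
        if llm_enabled and llm_verify():
            return "duplicate"
        return "distinct"
    # Low similarity: never a duplicate, verifier is not called.
    return "distinct"
```

Passing the verifier as a callback keeps the slow LLM call out of the two cheap branches, so it only runs for scores in the 0.85 to 0.92 band.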