breakpilot-core

Author	SHA1	Message	Date
Benjamin Admin	9437e029d0	feat(pipeline): F1 regulation registry — DB-backed license/source-type lookup Migrates REGULATION_LICENSE_MAP (135 entries) and SOURCE_REGULATION_CLASSIFICATION (58 entries) from hardcoded Python dicts to compliance.regulation_registry table. - SQL migration: 002_regulation_registry.sql (table + indexes + trigger) - Migration script: f1_migrate_regulation_registry.py (162 rows, --dry-run) - RegulationRegistry cache: 5min TTL, prefix fallback, graceful degradation - control_generator._classify_regulation() delegates to DB with dict fallback - source_type_classification.classify_source_regulation() delegates to DB - 34 new tests (lookup, cache, degradation, migration data consistency) - 421 total tests pass, 0 regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 23:14:06 +02:00
Benjamin Admin	4fd2bfefcd	docs: session handover updated for Block F start Next: F1 Regulation Registry (DB + API + Frontend + Auto-Create) Frontend at /sdk/regulation-registry in breakpilot-compliance admin Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 22:51:23 +02:00
Benjamin Admin	fac9280716	feat(pipeline): Block D5+-E complete session — 20k+ new chunks Session 02-03.05.2026 accomplishments: - D5+: NIST/ENISA PDF quality fix (0%→45% section rate) - D5+: 4 lost NIST PDFs restored (11k chunks) - D5+: Text normalization + section detection for NIST/BSI - D6: Citation backfill (3,651 controls updated, old archived) - E2: 8 DE laws ingested (ArbZG, MuSchG, GmbHG, AktG, InsO...) - E3: 5 EU regulations (CSRD, CSDDD, Taxonomy, eIDAS, Pay Trans.) - E4: Standards (GoBD, BAIT, VAIT) - E6: 3 CH + 4 AT laws (OR, DSV, ArG, ArbVG, AngG, AZG, NISG) - E7: 9 court judgments as full text (Schrems II 154 chunks, Meta 101, BVerfG 161, DSK OH 119, Planet49 42, SCHUFA 41, Schadenersatz 29, BAG 48, Google Fonts 14) - Infra: Qdrant snapshot mechanism, upload-before-delete safety Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 22:31:57 +02:00
Benjamin Admin	118be3540d	feat(pipeline): D6 citation backfill + E2/E3 law ingestion scripts - d6_citation_backfill.py: 3-tier matching (hash/prefix/overlap), archives old citations, updated 3,651 controls (93.6% coverage) - ingest_de_laws.py: 8 German laws ingested (ArbZG, MuSchG, NachwG, MiLoG, GmbHG, AktG, InsO, BUrlG; 1,629 chunks) - ingest_eu_regulations.py: EUR-Lex ingestion (needs manual HTML due to AWS WAF). CSRD, CSDDD, EU Taxonomy, eIDAS 2.0, Pay Transparency manually ingested (1,057 chunks) - Updated session handover with current state Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 13:19:27 +02:00
Benjamin Admin	a9671a572b	fix(embedding): single-number ALL-CAPS section detection for ENISA/BSI Add case-sensitive _SINGLE_NUM_ALLCAPS_RE for "1. INTRODUCTION" style headers (ENISA, BSI docs). Cannot use _LEGAL_SECTION_RE for this because it uses re.IGNORECASE which would false-positive on "1. Erstens" etc. Also re-downloaded 2 corrupt PDFs from nist.gov (nistir_8259a, nist_ai_rmf) — originals in MinIO were 263-byte XML error responses, not PDFs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 08:56:02 +02:00
Benjamin Admin	2f4a3f2ea2	fix(embedding): add NIST control IDs to _SECTION_NUMBER_RE _SECTION_NUMBER_RE only had patterns for §/Art/Section/Kapitel/Annex but missed NIST-style identifiers (AC-1, GV.OC-01, 3.1, A01:2021). This caused 0% section rate for all NIST/BSI/ENISA documents even though sections were correctly detected — the section NUMBER wasn't extracted from the header. Also adds: - reupload_legal_strategy.py: re-upload with legal chunking - extract_and_upload_nist.py: local PDF extraction workaround - qdrant-snapshot.sh: backup mechanism for Qdrant collections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 07:42:06 +02:00
Benjamin Admin	0b0eed27b0	feat(embedding): NIST PDF text normalization + safe re-ingest script Fix broken multi-column PDF extraction for NIST/BSI/ENISA documents: - _normalize_pdf_text(): fixes broken section numbers (1 . 1 → 1.1), control IDs (AC - 1 → AC-1), ligatures, soft hyphens - pdfplumber tolerances increased (x=3,y=4) for better column handling - 3 new regex patterns: NIST CSF 2.0, NIST enhancements, OWASP Top 10 - reingest_nist.py: safe upload-before-delete for 4 lost NIST PDFs - reingest_d5.py: safety fix — upload first, verify, then delete old Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 06:42:46 +02:00
Benjamin Admin	97a7f6f264	docs: comprehensive session handover with full roadmap (Blocks A-G) Complete instructions for next session including: - Current quality metrics per document type - Prioritized action items (NIST fix, citation backfill, missing laws) - Full Block E-G roadmap with details - All critical files, DB state, test commands - Known issues (3 lost NIST PDFs, frontend 500s, D5 script safety) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 22:30:50 +02:00
Benjamin Admin	ff21bc258a	docs: session handover — D2-D5 complete, quality report, NIST plan Major session achievements: - Structural metadata end-to-end (D2-D4) - 430 docs re-ingested with new chunking - HTML stripping + charset detection (0% → 97.6%) - 20 EU regulations from EUR-Lex HTML (DSGVO: 0% → 92%) - Quality report script (500 controls: 13% fully correct) - Frontend requirements.map fix Open: NIST/ENISA text normalization, citation backfill, D5 script safety (upload-before-delete), BEG IV ingestion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 22:22:55 +02:00
Benjamin Admin	3009f3d13a	feat(embedding): add NIST/ENISA/standard section numbering to chunker Extends _LEGAL_SECTION_RE to detect: - Numbered sections: 1.1 Title, 2.3.1 Subtitle - Control family IDs: AC-1, AU-2, PO.1, PW.1.1 - Table/Figure/Appendix references Also adds EUR-Lex HTML replacement script. 58 embedding-service tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 19:24:10 +02:00
Benjamin Admin	5a6e588641	docs: update session handover — D2-D5 complete, EU PDF issue documented Session achieved: structural metadata end-to-end (D2-D4), overlap bug fix, HTML stripping with charset detection, 430/436 docs re-ingested. Remaining: ~40 EU Official Journal PDFs need HTML from EUR-Lex (broken multi-column PDF extraction), 3 missing EDPB PDFs, 1 corrupt PDF. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 17:34:34 +02:00
Benjamin Admin	41183ff93d	fix(docker): set PDF_EXTRACTION_BACKEND to auto (was pymupdf) The default was 'pymupdf' which doesn't exist as a backend, causing fallthrough to pypdf every time. With 'auto', the priority is: unstructured > pdfplumber > pypdf. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 17:30:33 +02:00
Benjamin Admin	75dda9ac92	feat(embedding): add pdfplumber backend for multi-column PDF extraction EU Official Journal PDFs (AI Act, CRA, NIS2, DSGVO, etc.) use multi-column layouts that pypdf breaks into fragmented words ("Ar tik el" instead of "Artikel"). pdfplumber handles these correctly. Backend priority: unstructured > pdfplumber > pypdf (auto mode). Also increases D5 re-ingestion timeout to 3600s for large PDFs. 58 embedding-service tests passing. pdfplumber: MIT license. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 15:42:25 +02:00
Benjamin Admin	a459636bc4	fix(rag): HTML charset detection + opening block tag newlines Two bugs fixed: 1. Opening block tags (<h3>, <div>) now also create newlines, not just closing tags. Fixes: gesetze-im-internet.de puts § inside <h3> which followed inline <a> text — § ended up mid-line, not at line start. 2. HTML charset detection from meta tag (charset=iso-8859-1). Files from gesetze-im-internet.de use ISO-8859-1, not UTF-8. The § byte (0xA7) was destroyed by UTF-8 decode. Now: try UTF-8 → check meta charset → fallback ISO-8859-1. 32 rag-service tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 08:35:47 +02:00
Benjamin Admin	ddad58f607	fix(rag): strip HTML tags before chunking + D5 re-ingestion scripts HTML files from gesetze-im-internet.de were decoded as raw UTF-8, keeping <div>/<p> tags intact. The legal chunker regex requires § at line start, which never matched inside HTML tags → 0% section metadata for HTML docs. Fix: detect HTML content and strip tags before sending to embedding service. Block elements become newlines, entities are decoded. § signs now appear at line starts → section detection works. Also adds D5 re-ingestion scripts (reingest_d5.py + config) for batch re-processing of all documents in Qdrant collections. 27 rag-service tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 08:18:25 +02:00
Benjamin Admin	93099b2770	feat(pipeline): structural metadata end-to-end (Blocks D2-D4) D2: RAG service stores section/section_title/paragraph/paragraph_num/page from embedding service chunks_with_metadata into Qdrant payloads. D3: Control generator prefers section > article > section_title from Qdrant, adds page to source_citation and generation_metadata. D4: Validated with real BGB §§ 312-312k text. Found and fixed critical bug where Phase 3 overlap destroyed the [§ ...] section prefix, causing only the first chunk per document to have metadata. All subsequent chunks lost section info. Also fixes pre-existing lint issues (unused imports, ambiguous variable names, duplicate dict key, bare except). 456 tests passing (58 embedding + 387 pipeline + 11 rag-service). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 20:34:00 +02:00
Benjamin Admin	da21339e76	docs: add session handover instructions for next session Covers: completed blocks A-D1, remaining D2-G, critical files, DB state, memory files, test commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 15:33:05 +02:00
Benjamin Admin	6ab10415d8	feat(embedding): add structural metadata to legal chunking (Block D1) chunk_text_legal_structured() returns metadata per chunk: - section: "§ 312k", "Art. 5" - section_title: "Kündigungsbutton" - paragraph: "Abs. 1", "Nr. 3" - paragraph_num: 1, 3 - page: (prepared for PDF integration) - index: sequential position /chunk endpoint now returns chunks_with_metadata alongside plain chunks. Backward compatible — existing consumers use chunks field unchanged. New regex: _PARAGRAPH_RE (Abs/Nr/Satz/lit), _SECTION_NUMBER_RE New functions: _parse_section_metadata(), _extract_paragraph_ref() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 15:25:23 +02:00
Benjamin Admin	d9c16fb914	feat(pipeline): add adversarial tests (30 cases) + regression harness Block C implementation: - adversarial_cases.yaml: 30 tricky cases in 5 categories (wrong legal basis, dark patterns, incomplete docs, similar-but-different, homonyms) - test_adversarial.py: 63 tests validating adversarial cases - test_regression.py: ontology stability, dependency engine, quality metrics - conftest.py: shared fixtures (DB session, sample controls) Total: 371 tests passing (221 existing + 150 new). Real-world benchmarks (C1) need manual ground truth creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 13:02:29 +02:00
Benjamin Admin	6f58fdbaa5	docs: add test strategy instruction for dedicated session (Block C) 3 test levels: Real-World Benchmarks (10 DE websites), Adversarial Tests (30 tricky cases), Regression Harness (CI/CD quality gate). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 12:28:58 +02:00
Benjamin Admin	b8ff4e9290	feat(pipeline): add review-verify endpoint — LLM decides DUPLIKAT/VERSCHIEDEN Sends 67k review candidates to Haiku Batch API in pairs. Each pair gets a DUPLIKAT/VERSCHIEDEN decision with reasoning. Results stored in control_dedup_reviews.review_status. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 09:36:30 +02:00
Benjamin Admin	f2104768a0	fix(docker): re-enable healthcheck after dedup completion Dedup is done (162k controls). Re-enable healthcheck with generous timeouts (10 retries × 30s) and restart: unless-stopped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 08:39:57 +02:00
Benjamin Admin	e8df15c0f8	fix: add proxy_read_timeout 300s to admin-compliance location block Scan endpoint needs up to 3-5 min (multi-page crawl + LLM calls). Without explicit timeout, nginx defaults to 60s → 504 Gateway Timeout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 11:23:02 +02:00
Benjamin Admin	7c5592b50e	feat(pipeline): add checkpoint to dedup Phase 2 — survives container restart Stores last_control_id in canonical_generation_jobs after each page. On restart, resumes from checkpoint instead of starting over. Checkpoint is deleted on completion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 09:12:23 +02:00
Benjamin Admin	e8f018f2c6	fix: increase client_max_body_size to 50M for ports 3007 + 8093 Port 3007 (admin-compliance) had no limit (nginx default 1M) causing 413 on SDK state saves. Port 8093 (SDK) had 10M, now 50M. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 08:54:06 +02:00
Benjamin Admin	b151951448	fix(pipeline): make dedup Phase 2 resilient — paginated, timeout, per-control error handling - Paginated DB queries (100 rows/page) instead of loading all 166k rows - Individual timeout (30s) per embedding + qdrant call - Per-control try/except — one failure doesn't kill the job - Sequential processing (no asyncio.gather) for stability - Progress logging every 500 controls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:31:28 +02:00
Benjamin Admin	2e2e81b3e1	fix(docker): disable healthcheck + auto-restart for control-pipeline during dedup The dedup job blocks the event loop for extended periods, causing health checks to fail repeatedly. Even 10 retries × 30s wasn't enough. Disabled healthcheck and restart policy until dedup is complete. TEMPORARY — re-enable after dedup is finished. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:39:19 +02:00
Benjamin Admin	b873c0e4ae	fix(docker): increase control-pipeline healthcheck tolerance for long-running jobs Dedup Phase 2 blocks the event loop for extended periods, causing health checks to fail. Docker then restarts the container and kills the job. Increased retries from 3 to 10, timeout from 10s to 30s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 12:35:39 +02:00
Benjamin Admin	9dc16674e2	perf(pipeline): skip singleton groups in dedup Phase 1 153k of 160k merge groups have only 1 control — no intra-group dedup possible. Skip them in Phase 1, they become masters automatically. Phase 2 (cross-group) still checks them via Qdrant embeddings. Reduces Phase 1 from ~96h to ~2h. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 00:31:22 +02:00
Benjamin Admin	e6e2688b56	fix(pipeline): add idempotency guard to submit-pass0b endpoint Prevents duplicate batch submissions that caused ~$170 in extra costs. Refuses new submit if a batch was submitted in the last 10 minutes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 18:59:03 +02:00
Benjamin Admin	28aa74b4b0	Merge remote-tracking branch 'gitea/main' # Conflicts: # pitch-deck/components/slides/MilestonesSlide.tsx # pitch-deck/lib/finanzplan/engine.ts	2026-04-27 13:14:54 +02:00
Benjamin Admin	8e37441782	perf(pipeline): switch back to v4 prompt — backfill costs nearly the same v3+backfill=$31.60/10k vs v4=$33/10k — not worth the extra complexity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 00:44:23 +02:00
Benjamin Admin	6a0e7c947f	perf(pipeline): switch to v3 prompt for generation, v4 fields via Haiku backfill Remove applicability/scanner_hint/evidence_type/provides_context from Pass 0b prompt to reduce output tokens (~40% less). These 6 fields are added via cheap Haiku backfill afterwards (~$1.50 per 10k controls). Saves ~$200 over the remaining 160k obligations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 00:14:47 +02:00
Benjamin Admin	3c1a2d9c41	Remove re-export shim from keycloak_auth.py, update consumer imports - rbac_api.py: import get_current_user from auth.dependencies directly - keycloak_auth.py: remove re-export of dependencies module symbols - pdf_service.py, file_processor.py: remove misleading compat comments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 00:13:30 +02:00
Benjamin Admin	92c86ec6ba	[split-required] [guardrail-change] Enforce 500 LOC budget across all services Install LOC guardrails (check-loc.sh, architecture.md, pre-commit hook) and split all 44 files exceeding 500 LOC into domain-focused modules: - consent-service (Go): models, handlers, services, database splits - backend-core (Python): security_api, rbac_api, pdf_service, auth splits - admin-core (TypeScript): 5 page.tsx + sidebar extractions - pitch-deck (TypeScript): 6 slides, 3 UI components, engine.ts splits - voice-service (Python): enhanced_task_orchestrator split Result: 0 violations, 36 exempted (pipeline, tests, pure-data files). Go build verified clean. No behavior changes — pure structural splits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 00:09:30 +02:00
Benjamin Admin	5ef039a6bc	feat(pipeline): Pass 0b prompt v4 + Haiku backfill endpoint Prompt v4 adds 6 new fields to Pass 0b output: - applicability: condition rules (same format as dependency engine) - check_type: expanded to 10 granular types - scanner_hint: search_terms + negative_indicators for MCP - manual_review_required_if: escalation conditions - evidence_type: code/process/hybrid - provides_context: context variables this control creates New endpoint POST /generate/backfill-extended: - Backfills existing 9k controls via Haiku Batch API (~$1.50) - Adds all 6 new fields to generation_metadata - Supports dry_run mode Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 23:14:59 +02:00
Benjamin Admin	96b8f25747	fix(pipeline): use action_type-derived phase order in ontology generator LLM merge_key phases (e.g. "submission") don't always match PHASE_ORDER keys. Derive phase order from action_type via get_phase_order() instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 20:32:58 +02:00
Benjamin Admin	42ab5ead26	feat(pipeline): implement Control Dependency Engine (Block 9) Core engine (dependency_engine.py): - 5 dependency types: prerequisite, supersedes, compensating_control, conditional_requirement, scope_exclusion - Generic condition evaluator (JSONB rules with AND/OR/NOT/field ops) - Priority-based conflict resolution - Cycle detection (DFS) + topological sort - Full evaluation with MCP-compatible dependency_resolution trace - 39 tests all passing (incl. GHV scenario from user requirements) Automatic generator (dependency_generator.py): - Ontology-based: same normalized_object + phase sequence -> prerequisite - Pattern-based: define->implement, implement->monitor, etc. - Domain packs: YAML rules for GDPR, AI Act, CRA, Security, Labor Contracts - 14 tests all passing API routes (dependency_routes.py): - CRUD for dependencies - POST /evaluate with dependency resolution - POST /generate (auto-generation with dry_run) - POST /validate (cycle detection) - GET /graph (nodes + edges for visualization) Prompt enhancement (decomposition_pass.py): - Added dependency_hints + lifecycle_phase_order to Pass 0b prompt - Stored in generation_metadata for post-processing DB migration: control_dependencies + control_evaluation_results tables 126 tests total, all passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 20:28:10 +02:00
Benjamin Admin	5aaa62dca7	fix(pipeline): improve quality metrics heuristics - Fix truncated title detection: only flag near-200-char titles or mid-word cutoffs - Fix evidence leak detection: check title start patterns, not keyword substring ("nachweisen" verb is valid action, "Nachweis vorliegen" is evidence) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:53:52 +02:00
Benjamin Admin	d583971afd	feat(pipeline): add quality metrics endpoint for Pass 0b controls GET /generate/quality-metrics — reports: - controls_per_obligation ratio - duplicate merge_key rate - evidence leak rate - truncated title rate - MCP field coverage - merge_key coverage Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:51:27 +02:00
Benjamin Admin	d660a45bb5	feat(pipeline): implement golden test suite + fix ontology patterns - Add test_golden_controls.py: 37 tests covering all 8 YAML categories (container, framework, evidence, negative, title, split, scope, merge_key) - Fix evidence detection: handle German feminine articles (eine/einer/etc.) - Fix framework detection: use verb stems for conjugated German verbs - Add framework patterns: OWASP API6, CCM without CSA prefix, generic category - Fix negative patterns: use "nicht übertragen/gespeichert/erscheinen" before generic "dürfen nicht" to correctly route prevent vs exclude All 73 tests passing (36 ontology + 37 golden). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:48:12 +02:00
Benjamin Admin	d1f3b9ffcd	feat(pipeline): add submit-pass0b endpoint for batch submission Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:42:06 +02:00
Benjamin Admin	d93321275c	feat(pipeline): add batch API status + result processing endpoints - GET /generate/batch-api-status/{batch_id} — check Anthropic batch status - POST /generate/process-batch — process completed batch results (background) - GET /generate/process-batch-status/{job_id} — poll processing progress Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:36:47 +02:00
Benjamin Admin	629b9d9ca5	feat(pipeline): store MCP fields (assertion, pass/fail criteria, check_type) in generation_metadata - Add assertion, pass_criteria, fail_criteria, check_type to AtomicControlCandidate dataclass - Parse MCP fields from LLM output in _process_pass0b_control - Store MCP fields in generation_metadata JSON for later use by MCP scanner - Fields default to empty when not present (backward-compatible with old prompts) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:32:56 +02:00
Benjamin Admin	7e3b1108e2	feat: integrate Ontology pre-LLM filter into Pass 0b submit Obligations classified before API call: - evidence → skipped (saves API cost) - composite → skipped (not atomic) - framework_container → skipped (decompose separately) - atomic → sent to LLM Filter stats returned in submit response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:13:32 +02:00
Benjamin Admin	b3fbbbacfe	feat(control-pipeline): Control Ontology v1 — action types, evidence/container/framework detection Block 7.1-7.2 from masterplan: - 26 action_types with German aliases + phase mapping - Negative obligation patterns (exclude, prevent, enforce) - Container detection (11 composite objects that must not become atomic) - Evidence detection (14 indicators + "X dokumentieren" pattern) - Framework reference detection (OWASP, NIST, BSI, CSA, ISO patterns) - classify_obligation() routes to: atomic, composite, evidence, framework_container - build_canonical_key() for deterministic dedup - 36 tests covering all classification functions Also: merge_key bug fix in _process_pass0b_control() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 09:06:39 +02:00
Benjamin Admin	3a100fa1f1	feat: Pass 0b prompt v3 — compound action ban, evidence-of-action rule, pflicht-vs-prozess merge Fixes from v2 evaluation (7.9/10 avg, 28 controls): 1. COMPOUND BAN: "durchführen UND Maßnahmen ergreifen" → pick primary action only 2. EVIDENCE-OF-ACTION: "Tests dokumentieren" → evidence field, not own control 3. PFLICHT=PROZESS: "Behörden informieren" + "Verfahren etablieren" = 1 control 4. MERGE-KEY BUG: merge_key from LLM output now stored in generation_metadata Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 00:25:38 +02:00
Benjamin Admin	fbeb93046d	docs: Pass 0b v2 evaluation — 28 controls, 7.9/10 avg, 3 findings for v3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 00:19:06 +02:00
Benjamin Admin	0cce8a2011	feat: add Golden Test Suite v1 (40 regression tests for Pass 0b pipeline) 8 categories: duplicate explosion, compound split, negative obligations, container detection, framework decomposition, evidence leakage, scope dimension, title quality. Includes global quality gates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 00:05:08 +02:00
Benjamin Admin	7a53f5bee1	feat: Pass 0b prompt v2 — container detection, merge-key, evidence separation, actionable titles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 00:00:59 +02:00
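
Note on commit a459636bc4 (HTML charset detection): the message describes a three-step decode order, try UTF-8, then check the meta charset, then fall back to ISO-8859-1. The following is a minimal sketch of that order only, not the repository's code. The function name decode_html, the regex, and the 4096-byte scan window are assumptions made for illustration.

```python
import re

# Hypothetical pattern: finds charset=... in a <meta> tag, with or without quotes.
_META_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)


def decode_html(raw: bytes) -> str:
    """Decode HTML bytes: UTF-8 first, then the declared meta charset, then ISO-8859-1."""
    # Step 1: strict UTF-8. A lone 0xA7 byte (ISO-8859-1 "§") is invalid UTF-8 and fails here.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Step 2: look for a declared charset near the top of the file (assumed 4096-byte window).
    match = _META_CHARSET_RE.search(raw[:4096])
    if match:
        declared = match.group(1).decode("ascii")
        try:
            return raw.decode(declared)
        except (LookupError, UnicodeDecodeError):
            # Unknown or wrong declaration: fall through to the last resort.
            pass

    # Step 3: ISO-8859-1 maps every byte to a code point, so this cannot raise.
    return raw.decode("iso-8859-1")
```

With this order, a page that declares iso-8859-1 keeps its § characters intact, which is what the legal chunker needs in order to see § at the start of a line.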


600 Commits
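
Note on commit 0b0eed27b0 (NIST PDF text normalization): the message lists four repairs, broken section numbers (1 . 1 → 1.1), broken control IDs (AC - 1 → AC-1), ligatures, and soft hyphens. The sketch below shows one way such a pass could look. It is not the repository's _normalize_pdf_text(). The name normalize_pdf_text, the regexes, the ligature table, and the two-letter control-family assumption are all illustrative.

```python
import re

# Hypothetical ligature table: common Latin presentation forms mapped to plain letters.
_LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# A dot surrounded by spaces between two digits, e.g. "1 . 1".
# Lookarounds keep the digits out of the match so chains like "3 . 1 . 2" are fully repaired.
_SPLIT_NUMBER_RE = re.compile(r"(?<=\d)[ \t]+\.[ \t]+(?=\d)")

# A two-letter control family, a spaced hyphen, and a number, e.g. "AC - 1".
_SPLIT_CONTROL_ID_RE = re.compile(r"\b([A-Z]{2})[ \t]+-[ \t]+(\d+)\b")


def normalize_pdf_text(text: str) -> str:
    """Repair common multi-column PDF extraction damage (illustrative sketch)."""
    text = text.replace("\u00ad", "")  # drop soft hyphens
    for ligature, plain in _LIGATURES.items():
        text = text.replace(ligature, plain)
    text = _SPLIT_NUMBER_RE.sub(".", text)  # "1 . 1" -> "1.1"
    text = _SPLIT_CONTROL_ID_RE.sub(r"\1-\2", text)  # "AC - 1" -> "AC-1"
    return text
```

Running a pass like this before section detection matters because the section-number patterns described in the later commits expect identifiers such as 1.1 or AC-1 in their unbroken form.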