Commit Graph

30 Commits

Author SHA1 Message Date
Benjamin Admin d5bcd0bd5b feat(pipeline): G4 Pre-Deployment Enforcement — CI/CD compliance gate
New table: deployment_checks (verdict, blocking/warning controls, risk score)
New API:
  POST /v1/deployment-checks (SDK asks: "can I deploy?")
  GET /v1/deployment-checks/{id} (check result)
  POST /v1/deployment-checks/{id}/override (manual override with justification)
  GET /v1/deployment-checks/stats (approval/block rate)

Check logic: queries G1 decision_traces + G3 open failures per affected control.
Verdict: approved (0 blocking) or blocked (with fix recommendations).
454 tests pass, 0 regressions.

Block G complete: G1-G4 all implemented.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:24:45 +02:00
Benjamin Admin c398e74d5e feat(pipeline): G3 Full Decision Memory — compliance lifecycle event stream
New table: decision_events (assessment→decision→fix→verification→failure cycle)
New API:
  POST /v1/decision-events (record lifecycle event)
  GET /v1/decision-events (list with filters)
  GET /v1/decision-events/timeline/{control_id} (full chronological timeline)
  GET /v1/decision-events/stats (failure rate, cycle times)

Each event captures input_state, output_state, actor, evidence.
454 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:16:25 +02:00
Benjamin Admin e82f99b8cb feat(pipeline): G2 Compliance Commit Ledger — code↔control audit trail
New table: compliance_commits (commit hash, affected controls, risk level)
New API:
  POST /v1/compliance-commits (SDK registers commit + impact)
  GET /v1/compliance-commits (list with filters)
  GET /v1/compliance-commits/by-control/{id} (all commits for a control)
  GET /v1/compliance-commits/stats (dashboard)
  GET /v1/compliance-commits/{id} (detail)

GIN index on affected_control_ids for fast @> containment queries.
454 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 19:17:45 +02:00
Benjamin Admin 66a70ab31c feat(pipeline): G1 Decision Trace — compliance decision tracking
New table: decision_traces (status, reason, evidence, fix plan per control)
New API:
  POST/GET/PUT /v1/decision-traces (CRUD for decisions)
  GET /v1/decision-traces/stats (compliance dashboard)
  GET /v1/controls/{id}/full-trace (Regulation→Obligation→Control→Decision→Evidence)

454 tests pass, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 18:26:21 +02:00
Benjamin Admin ad24835940 feat(pipeline): G-pre1/2/3 — Object Clustering + Master Controls + API
G-pre1: 144k objects clustered into 7,466 groups via Mini-Batch K-Means
  on bge-m3 embeddings. Two-stage: k=5000 base + sub-cluster groups >50.
G-pre2: 5,114 Master Controls from lifecycle phase chains
  (define→implement→test→monitor), linking 172,504 atomic controls.
G-pre3: REST API for Master Controls
  GET /v1/master-controls (list, search, filter)
  GET /v1/master-controls/stats
  GET /v1/master-controls/{mc_id} (detail with phase-controls)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:11:38 +02:00
Benjamin Admin 64f45be63a feat(pipeline): add Pass 0a endpoint to core control-pipeline
Registers /generate/run-pass0a and /generate/pass0a-status/{job_id}
on the core control-pipeline (port 8098). Previously Pass 0a was only
available on the compliance backend which connects to Production DB,
causing a split-brain when controls are generated locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 07:21:58 +02:00
Benjamin Admin b8ff4e9290 feat(pipeline): add review-verify endpoint — LLM decides DUPLIKAT/VERSCHIEDEN
Sends 67k review candidates to Haiku Batch API in pairs.
Each pair gets a DUPLIKAT/VERSCHIEDEN decision with reasoning.
Results stored in control_dedup_reviews.review_status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 09:36:30 +02:00
Benjamin Admin e6e2688b56 fix(pipeline): add idempotency guard to submit-pass0b endpoint
Prevents duplicate batch submissions that caused ~$170 in extra costs.
Refuses new submit if a batch was submitted in the last 10 minutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 18:59:03 +02:00
Benjamin Admin 5ef039a6bc feat(pipeline): Pass 0b prompt v4 + Haiku backfill endpoint
Prompt v4 adds 6 new fields to Pass 0b output:
- applicability: condition rules (same format as dependency engine)
- check_type: expanded to 10 granular types
- scanner_hint: search_terms + negative_indicators for MCP
- manual_review_required_if: escalation conditions
- evidence_type: code/process/hybrid
- provides_context: context variables this control creates

New endpoint POST /generate/backfill-extended:
- Backfills existing 9k controls via Haiku Batch API (~$1.50)
- Adds all 6 new fields to generation_metadata
- Supports dry_run mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 23:14:59 +02:00
Benjamin Admin 42ab5ead26 feat(pipeline): implement Control Dependency Engine (Block 9)
Core engine (dependency_engine.py):
- 5 dependency types: prerequisite, supersedes, compensating_control,
  conditional_requirement, scope_exclusion
- Generic condition evaluator (JSONB rules with AND/OR/NOT/field ops)
- Priority-based conflict resolution
- Cycle detection (DFS) + topological sort
- Full evaluation with MCP-compatible dependency_resolution trace
- 39 tests all passing (incl. GHV scenario from user requirements)

Automatic generator (dependency_generator.py):
- Ontology-based: same normalized_object + phase sequence -> prerequisite
- Pattern-based: define->implement, implement->monitor, etc.
- Domain packs: YAML rules for GDPR, AI Act, CRA, Security, Labor Contracts
- 14 tests all passing

API routes (dependency_routes.py):
- CRUD for dependencies
- POST /evaluate with dependency resolution
- POST /generate (auto-generation with dry_run)
- POST /validate (cycle detection)
- GET /graph (nodes + edges for visualization)

Prompt enhancement (decomposition_pass.py):
- Added dependency_hints + lifecycle_phase_order to Pass 0b prompt
- Stored in generation_metadata for post-processing

DB migration: control_dependencies + control_evaluation_results tables

126 tests total, all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 20:28:10 +02:00
Benjamin Admin 5aaa62dca7 fix(pipeline): improve quality metrics heuristics
- Fix truncated title detection: only flag near-200-char titles or mid-word cutoffs
- Fix evidence leak detection: check title start patterns, not keyword substring
  ("nachweisen" verb is valid action, "Nachweis vorliegen" is evidence)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:53:52 +02:00
Benjamin Admin d583971afd feat(pipeline): add quality metrics endpoint for Pass 0b controls
GET /generate/quality-metrics — reports:
- controls_per_obligation ratio
- duplicate merge_key rate
- evidence leak rate
- truncated title rate
- MCP field coverage
- merge_key coverage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:51:27 +02:00
Benjamin Admin d1f3b9ffcd feat(pipeline): add submit-pass0b endpoint for batch submission
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:42:06 +02:00
Benjamin Admin d93321275c feat(pipeline): add batch API status + result processing endpoints
- GET /generate/batch-api-status/{batch_id} — check Anthropic batch status
- POST /generate/process-batch — process completed batch results (background)
- GET /generate/process-batch-status/{job_id} — poll processing progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:36:47 +02:00
Benjamin Admin 1a3101066e fix: paginated indexing to avoid OOM on 53k controls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 16:31:20 +02:00
Benjamin Admin 043bcb65d8 fix(control-pipeline): harmonization recheck indexes ALL drafts, not just atomics
Previous version searched against atomic_controls_dedup collection which
only contains Pass 0b atomic controls. Now creates a temporary collection
with ALL draft controls as reference, then checks targets against it.

Two phases:
1. Index ~53k reference drafts into temp Qdrant collection (batch 32)
2. Search each of 14k target controls, Embedding + LLM for borderline
3. Cleanup temp collection when done

Status updates every 50 controls (fixed counter bug).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 15:42:40 +02:00
Benjamin Admin d31fccbe0e feat(control-pipeline): add harmonization recheck endpoint
POST /generate/harmonization-recheck verifies promoted controls
against Qdrant dedup collection via Embedding + LLM. Runs as stable
asyncio background task inside the container (no docker exec issues).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 13:25:56 +02:00
Benjamin Admin 3ffa3f5793 feat(control-pipeline): add Document Compliance Engine — scope detection + document requirements
New service: document_scope_resolver.py with 28 document rules covering:
- Base (impressum, privacy_policy)
- Tracking (cookie_banner, cookie_policy)
- E-Commerce (AGB, withdrawal, shipping, pricing, payment)
- Digital (digital_content_terms, no_withdrawal_notice)
- SaaS (ToS, service_description, DPA, SLA)
- AI (transparency_notice, automated_decisions)
- Hardware (warranty, return, CE, safety)
- Environmental (WEEE, battery disposal)
- Marketplace (seller terms, ranking transparency)
- Subscription (cancellation terms)

API: POST /v1/document-compliance/required
Input: company flags + jurisdiction → Output: required documents + assessment

Includes confidence scoring, escalation detection (e.g. ecommerce
without distance_selling flag), and reasoning. 19 tests covering all
business model combinations including B2B-only exclusions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 08:39:55 +02:00
Benjamin Admin bed41dcbdf feat(control-pipeline): add applicability backfill endpoint (Phase 5/C3)
POST /v1/canonical/generate/backfill-applicability enriches controls
with applicable_industries, applicable_company_size, scope_conditions
via Anthropic API. Targets ~26k controls from pipeline version < 3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:25:50 +02:00
Benjamin Admin 736ddf647d fix(llm-dedup): use think:false instead of /no_think, restore 30s timeout
Ollama API supports "think": false to disable extended thinking mode
on qwen3.5. Reduces response time from 95s to ~3s per comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 20:31:13 +02:00
Benjamin Admin 2188d6645e fix(llm-dedup): increase timeout to 120s, add /no_think, limit output to 200 tokens
qwen3.5 uses extended thinking by default which causes 95s+ responses
and 30s timeouts. Add /no_think to system prompt and num_predict=200
to keep responses short.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 20:27:58 +02:00
Benjamin Admin 076a6cd567 feat(control-pipeline): add LLM dedup endpoint for borderline review queue
POST /v1/canonical/generate/llm-dedup uses local Ollama (qwen3.5:35b-a3b)
to verify borderline duplicate matches (score 0.85-0.91). More accurate
than embedding similarity for compliance controls with subtle scope
differences (e.g. "documented" vs "implemented").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 20:15:46 +02:00
Benjamin Admin 642a8587b5 feat(control-pipeline): add batch-dedup endpoint + source_citation JSONB migration
- Add POST /v1/canonical/generate/batch-dedup endpoint for Pass 0b
  atomic controls deduplication (Phase 1: intra-group, Phase 2: cross-group)
- source_citation column migrated from TEXT to JSONB (5,459 rows converted)
- migrate_jsonb.py script added for generation_metadata conversion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 08:44:31 +02:00
Benjamin Admin 8b7671d310 feat(control-pipeline): add repair backfill endpoint for missing title/objective/requirements
POST /v1/canonical/generate/backfill-repair uses Anthropic API to
generate missing fields from available context (source text, other
fields). Handles 1,470 controls with incomplete data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 07:06:19 +02:00
Benjamin Admin b29dc33708 fix(control-pipeline): anchor finder uses direct Qdrant search instead of Go SDK
The Go SDK RAG proxy returns 401 (Qdrant API key mismatch). Switch
AnchorFinder to use direct Qdrant vector search + embedding service,
same approach as the main pipeline. No dependency on Go SDK anymore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 18:13:12 +02:00
Benjamin Admin 91f4202e88 feat(control-pipeline): add anchor backfill endpoint + normalize target_audience
- Add POST /v1/canonical/generate/backfill-anchors endpoint for batch
  populating open_anchors on controls generated with skip_web_search=true
- Uses AnchorFinder Stage A (RAG search) to find OWASP/NIST/ENISA refs
- Background job with progress tracking (same pattern as other backfills)
- Promotes needs_review controls that gain anchors to draft state
- Target audience normalization (enterprise/authority/provider → JSON arrays)
  already applied via SQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 18:04:50 +02:00
Benjamin Admin 756d068b4f fix: skip_web_search Default auf True — 5x schnellere Pipeline
Anchor-Search (DuckDuckGo + RAG via SDK) verlangsamt Pipeline von
~50 Chunks/min auf ~10 Chunks/min. Anchors (OWASP/NIST-Referenzen)
koennen nachtraeglich in einem Batch-Job befuellt werden.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:26:01 +02:00
Benjamin Admin f89ce46631 fix: Pipeline-Skalierung — 6 Optimierungen für 80k+ Controls
1. control_generator: GeneratorResult.status Default "completed" → "running" (Bug)
2. control_generator: Anthropic API mit Phase-Timeouts + Retry bei Disconnect
3. control_generator: regulation_exclude Filter + Harmonization via Qdrant statt In-Memory
4. decomposition_pass: Enrich Pass Batch-UPDATEs (400k → ~400 DB-Calls)
5. decomposition_pass: Merge Pass single Query statt N+1
6. batch_dedup_runner: Cross-Group Dedup parallelisiert (asyncio.gather)
7. canonical_control_routes: Framework Controls API Pagination (limit/offset)
8. DB-Indizes: idx_oc_parent_release, idx_oc_trigger_null, idx_cc_framework

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:09:32 +02:00
Benjamin Admin 441d5740bd feat: Applicability Engine + API-Filter + DB-Sync + Cleanup
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 37s
CI / Deploy (push) Failing after 2s
- Applicability Engine (deterministisch, kein LLM): filtert Controls
  nach Branche, Unternehmensgroesse, Scope-Signalen
- API-Filter auf GET /controls, /controls-count, /controls-meta
- POST /controls/applicable Endpoint fuer Company-Profile-Matching
- 35 Unit-Tests fuer Engine
- Port-8098-Konflikt mit Nginx gefixt (nur expose, kein Host-Port)
- CLAUDE.md: control-pipeline Dokumentation ergaenzt
- 6 internationale Gesetze geloescht (ES/FR/HU/NL/SE/CZ — nur DACH)
- DB-Backup-Import-Script (import_backup.py)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 21:58:17 +02:00
Benjamin Admin e3ab428b91 feat: control-pipeline Service aus Compliance-Repo migriert
Control-Pipeline (Pass 0a/0b, BatchDedup, Generator) als eigenstaendiger
Service in Core, damit Compliance-Repo unabhaengig refakturiert werden kann.
Schreibt weiterhin ins compliance-Schema der shared PostgreSQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:40:47 +02:00