Commit Graph

31 Commits

Author SHA1 Message Date
Benjamin Admin
043bcb65d8 fix(control-pipeline): harmonization recheck indexes ALL drafts, not just atomics
Previous version searched against atomic_controls_dedup collection which
only contains Pass 0b atomic controls. Now creates a temporary collection
with ALL draft controls as reference, then checks targets against it.

Two phases:
1. Index ~53k reference drafts into temp Qdrant collection (batch 32)
2. Search each of 14k target controls, Embedding + LLM for borderline
3. Cleanup temp collection when done

Status updates every 50 controls (fixed counter bug).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 15:42:40 +02:00
Benjamin Admin
d31fccbe0e feat(control-pipeline): add harmonization recheck endpoint
POST /generate/harmonization-recheck verifies promoted controls
against Qdrant dedup collection via Embedding + LLM. Runs as stable
asyncio background task inside the container (no docker exec issues).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 13:25:56 +02:00
Benjamin Admin
3ffa3f5793 feat(control-pipeline): add Document Compliance Engine — scope detection + document requirements
New service: document_scope_resolver.py with 28 document rules covering:
- Base (impressum, privacy_policy)
- Tracking (cookie_banner, cookie_policy)
- E-Commerce (AGB, withdrawal, shipping, pricing, payment)
- Digital (digital_content_terms, no_withdrawal_notice)
- SaaS (ToS, service_description, DPA, SLA)
- AI (transparency_notice, automated_decisions)
- Hardware (warranty, return, CE, safety)
- Environmental (WEEE, battery disposal)
- Marketplace (seller terms, ranking transparency)
- Subscription (cancellation terms)

API: POST /v1/document-compliance/required
Input: company flags + jurisdiction → Output: required documents + assessment

Includes confidence scoring, escalation detection (e.g. ecommerce
without distance_selling flag), and reasoning. 19 tests covering all
business model combinations including B2B-only exclusions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 08:39:55 +02:00
Benjamin Admin
f1359d63ba fix: handle new numeric batch custom_id format in Pass 0a result processing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 07:21:50 +02:00
Benjamin Admin
bbfcd44407 fix: use numeric batch index as custom_id (64 char limit, alphanumeric only)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 00:39:13 +02:00
Benjamin Admin
5da5a5597b fix: increase Batch API upload timeout to 600s for large payloads
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 00:31:50 +02:00
Benjamin Admin
38684dd903 feat(control-pipeline): add Assessment Layer to Applicability Engine
Adds confidence scoring, escalation detection, and reasoning to the
deterministic filter. All assessment is deterministic (no LLM).

Confidence scoring (0.0-1.0):
- +0.25 industry specified
- +0.15 company size specified
- +0.20-0.30 scope signals provided
- +0.15 controls found
- +0.15 no contradictions
- Capped at 0.75 for escalation cases

Escalation triggers:
- Contradictory signals (holds_client_funds without operates_payment_service)
- Ambiguous signals (provides_embedded_connectivity)
- Financial signals without explicit payment service declaration
- Incomplete profile (no industry, size, or signals)

Reasoning: template-based, includes active signals, control count,
scope-condition descriptions, and warnings.

Response now includes "assessment" field with confidence, escalation_flag,
escalation_reason, inferred_signals, reasoning, and warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 20:36:11 +02:00
Benjamin Admin
716bc651c4 fix(control-pipeline): remove fictional demo packages, add real DB integration tests
Deleted 3 packages that were copied without validation:
- applicability_demo/ (fictional control IDs, wrong API schema)
- applicability_demo_sdk/ (wrong endpoint URL, fictional request format)
- applicability_demo_ci/ (GitHub Actions instead of Gitea, duplicated code)

Replaced with real integration in test_applicability_use_cases.py:
- TestApplicabilityIntegration calls real get_applicable_controls()
- Checks source_citation->source and control_id domain prefixes
- Runs against actual DB when DATABASE_URL is set
- 128 structure/acceptance tests pass, 24 integration tests skip without DB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 19:59:56 +02:00
Benjamin Admin
27f12e4659 feat(control-pipeline): add CI regression suite for applicability tests
Makefile + pytest + GitHub Actions workflow for automated regression:
- make install / make eval / make test
- pytest integration with demo_cases.yaml
- Golden outputs for 6 priority cases
- Report generation (JSON + Markdown)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 19:12:44 +02:00
Benjamin Admin
a7c6ffe4dd feat(control-pipeline): add SDK endpoint demo package for applicability tests
Request payloads + response contract + api_runner.py for 6 priority cases.
Can be run directly against /v1/applicability/evaluate endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 19:11:44 +02:00
Benjamin Admin
ae5c5c24eb feat(control-pipeline): add applicability demo test package with evaluator
6 priority demo cases with golden outputs, evaluator.py and run_demo.py:
- CASE-001: Webshop+Stripe (anti-PSD2 false positive)
- CASE-002: Bank+TAN-Generator (scope override for batteries)
- CASE-004: FinTech Wallet (true positive PSD2/AML)
- CASE-006: SaaS+SMS Gateway (anti-TKG false positive)
- CASE-008: Software→IoT Hardware (multi-regime scope)
- CASE-011: Embedded Finance (escalation case)

Self-test passes 6/6 against golden outputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 19:08:31 +02:00
Benjamin Admin
e8ec50e0fc feat(control-pipeline): 24 demo test cases for applicability engine
YAML-based test package with 4 categories (6 each):
- Standard sector cases (Telko, SaaS, Energie, Automotive, Health, Law)
- Scope-beats-sector (Bank+Battery, KI-Recruiting, White-Label, Payments)
- False friends (Stripe!=PSD2, Hotline!=TKG, Repo-signals!=regulation)
- Escalation (IoT-SIM, FinTech unclear, Treuhand, KI-Diagnose)

Enforces 5 acceptance rules: no false certainty, scope>sector,
repo signals insufficient, standard first, 40%+ negative tests.

Scoring framework: must_include + must_not_include + reasoning + escalation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 17:42:38 +02:00
Benjamin Admin
1f8667c7da feat(control-pipeline): replace similarity-only dedup with LLM-verified dedup in pipeline
Stage 4 (Harmonization) now uses two-tier approach:
- Score >= 0.92: auto-duplicate (embedding only, fast)
- Score 0.85-0.92: LLM verification via local qwen3.5 (think=false, ~3s)
- Score < 0.85: not a duplicate

This eliminates ~44% false positives from pure embedding similarity.
LLM_DEDUP_ENABLED env var controls the feature (default: true).

Also adds 10 applicability use case tests (bank+TAN, webshop+Stripe,
SaaS startup, energy provider, health app, automotive, law firm, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:57:37 +02:00
Benjamin Admin
bed41dcbdf feat(control-pipeline): add applicability backfill endpoint (Phase 5/C3)
POST /v1/canonical/generate/backfill-applicability enriches controls
with applicable_industries, applicable_company_size, scope_conditions
via Anthropic API. Targets ~26k controls from pipeline version < 3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:25:50 +02:00
Benjamin Admin
736ddf647d fix(llm-dedup): use think:false instead of /no_think, restore 30s timeout
Ollama API supports "think": false to disable extended thinking mode
on qwen3.5. Reduces response time from 95s to ~3s per comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 20:31:13 +02:00
Benjamin Admin
2188d6645e fix(llm-dedup): increase timeout to 120s, add /no_think, limit output to 200 tokens
qwen3.5 uses extended thinking by default which causes 95s+ responses
and 30s timeouts. Add /no_think to system prompt and num_predict=200
to keep responses short.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 20:27:58 +02:00
Benjamin Admin
076a6cd567 feat(control-pipeline): add LLM dedup endpoint for borderline review queue
POST /v1/canonical/generate/llm-dedup uses local Ollama (qwen3.5:35b-a3b)
to verify borderline duplicate matches (score 0.85-0.91). More accurate
than embedding similarity for compliance controls with subtle scope
differences (e.g. "documented" vs "implemented").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 20:15:46 +02:00
Benjamin Admin
fc855f52f9 fix(batch-dedup): don't crash on FK violation in _write_review
Stale UUIDs in the Qdrant dedup collection can reference controls
that were deprecated in earlier batches. Log warning and continue
instead of raising and killing the entire job.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 13:25:28 +02:00
Benjamin Admin
642a8587b5 feat(control-pipeline): add batch-dedup endpoint + source_citation JSONB migration
- Add POST /v1/canonical/generate/batch-dedup endpoint for Pass 0b
  atomic controls deduplication (Phase 1: intra-group, Phase 2: cross-group)
- source_citation column migrated from TEXT to JSONB (5,459 rows converted)
- migrate_jsonb.py script added for generation_metadata conversion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 08:44:31 +02:00
Benjamin Admin
4716023abc feat(control-pipeline): JSONB migration for generation_metadata
- Add migration script (scripts/migrate_jsonb.py) that converts
  89,443 Python dict repr rows to valid JSON via ast.literal_eval
- Column altered from TEXT to native JSONB
- Index created on generation_metadata->>'merge_group_hint'
- Remove unnecessary ::jsonb casts in pipeline_adapter.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 07:49:11 +02:00
Benjamin Admin
8b7671d310 feat(control-pipeline): add repair backfill endpoint for missing title/objective/requirements
POST /v1/canonical/generate/backfill-repair uses Anthropic API to
generate missing fields from available context (source text, other
fields). Handles 1,470 controls with incomplete data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 07:06:19 +02:00
Benjamin Admin
fb53c8be90 fix(anchor-finder): use correct Qdrant payload fields (regulation_id, regulation_name_de)
Qdrant collections use regulation_id (not regulation_code), regulation_name_de,
guideline_name, download_url etc. Also search bp_compliance_datenschutz
collection where OWASP/ENISA docs live.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 18:17:36 +02:00
Benjamin Admin
b29dc33708 fix(control-pipeline): anchor finder uses direct Qdrant search instead of Go SDK
The Go SDK RAG proxy returns 401 (Qdrant API key mismatch). Switch
AnchorFinder to use direct Qdrant vector search + embedding service,
same approach as the main pipeline. No dependency on Go SDK anymore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 18:13:12 +02:00
Benjamin Admin
91f4202e88 feat(control-pipeline): add anchor backfill endpoint + normalize target_audience
- Add POST /v1/canonical/generate/backfill-anchors endpoint for batch
  populating open_anchors on controls generated with skip_web_search=true
- Uses AnchorFinder Stage A (RAG search) to find OWASP/NIST/ENISA refs
- Background job with progress tracking (same pattern as other backfills)
- Promotes needs_review controls that gain anchors to draft state
- Target audience normalization (enterprise/authority/provider → JSON arrays)
  already applied via SQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 18:04:50 +02:00
Benjamin Admin
c4e993e3f8 fix: Leere Controls (title/objective=None) filtern vor Store
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 44s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 30s
CI / Deploy (push) Failing after 4s
- Batch-Postprocessing: Controls mit title/objective = None/null/"" werden
  gefiltert und nicht gespeichert. Title wird aus Objective abgeleitet falls
  nur Title fehlt.
- _store_control: Pre-store Quality Guard lehnt leere Controls ab
- Verhindert "None"-Controls die durch LLM-Parsing-Fehler entstehen

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 06:59:47 +02:00
Benjamin Admin
a58d1aa403 fix: KRITISCH — 12 Pipeline-Bugs gefixt, 36.000 verlorene Controls retten
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 31s
CI / Deploy (push) Failing after 2s
Root Cause: _generate_control_id erzeugte ID-Kollisionen (String-Sort statt
numeric), ON CONFLICT DO NOTHING verwarf Controls stillschweigend, Chunks
wurden als "processed" markiert obwohl Store fehlschlug → permanent verloren.

Fixes:
1. _generate_control_id: Numeric MAX statt String-Sort, Collision Guard
   mit UUID-Suffix Fallback, Exception wird geloggt statt verschluckt
2. _store_control: ON CONFLICT DO UPDATE statt DO NOTHING → ID immer returned
3. Store-Logik: Chunk wird bei store_failed NICHT mehr als processed markiert
   → Retry beim naechsten Lauf moeglich
4. Counter: controls_generated nur bei erfolgreichem Store inkrementiert
   Neue Counter: controls_stored + controls_store_failed
5. Anthropic API: HTTP 429/500/502/503/504 werden jetzt retried (2 Versuche)
6. Monitoring: Progress-Log zeigt Store-Rate (%), ALARM bei <80%
7. Post-Job Validierung: Vergleicht Generated vs Stored vs DB-Realitaet
   WARNUNG wenn store_failed > 0, KRITISCH wenn Rate < 90%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 00:39:12 +02:00
Benjamin Admin
756d068b4f fix: skip_web_search Default auf True — 5x schnellere Pipeline
Anchor-Search (DuckDuckGo + RAG via SDK) verlangsamt Pipeline von
~50 Chunks/min auf ~10 Chunks/min. Anchors (OWASP/NIST-Referenzen)
koennen nachtraeglich in einem Batch-Job befuellt werden.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:26:01 +02:00
Benjamin Admin
b14be8583d feat: Betriebsrats-Compliance — BAG-Ingestion Script + BV-Template
1. BAG-Urteile Ingestion Script (21 kuratierte Urteile zu §87 BetrVG)
   - Microsoft 365, SAP ERP, E-Mail, Standardsoftware, Video, SaaS/Cloud
   - 14 erfolgreich ingestiert (4.726 Chunks in bp_compliance_datenschutz)
2. Betriebsvereinbarung Template (6. Document Template)
   - SQL-Migration mit 13 Sektionen (A-M), ~30 Placeholders
   - Conditional Blocks fuer KI-Systeme, Video, HR
   - Python-Generator mit automatischer TOM-Befuellung

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:49:01 +02:00
Benjamin Admin
f89ce46631 fix: Pipeline-Skalierung — 6 Optimierungen für 80k+ Controls
1. control_generator: GeneratorResult.status Default "completed" → "running" (Bug)
2. control_generator: Anthropic API mit Phase-Timeouts + Retry bei Disconnect
3. control_generator: regulation_exclude Filter + Harmonization via Qdrant statt In-Memory
4. decomposition_pass: Enrich Pass Batch-UPDATEs (400k → ~400 DB-Calls)
5. decomposition_pass: Merge Pass single Query statt N+1
6. batch_dedup_runner: Cross-Group Dedup parallelisiert (asyncio.gather)
7. canonical_control_routes: Framework Controls API Pagination (limit/offset)
8. DB-Indizes: idx_oc_parent_release, idx_oc_trigger_null, idx_cc_framework

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:09:32 +02:00
Benjamin Admin
441d5740bd feat: Applicability Engine + API-Filter + DB-Sync + Cleanup
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 37s
CI / Deploy (push) Failing after 2s
- Applicability Engine (deterministisch, kein LLM): filtert Controls
  nach Branche, Unternehmensgroesse, Scope-Signalen
- API-Filter auf GET /controls, /controls-count, /controls-meta
- POST /controls/applicable Endpoint fuer Company-Profile-Matching
- 35 Unit-Tests fuer Engine
- Port-8098-Konflikt mit Nginx gefixt (nur expose, kein Host-Port)
- CLAUDE.md: control-pipeline Dokumentation ergaenzt
- 6 internationale Gesetze geloescht (ES/FR/HU/NL/SE/CZ — nur DACH)
- DB-Backup-Import-Script (import_backup.py)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 21:58:17 +02:00
Benjamin Admin
e3ab428b91 feat: control-pipeline Service aus Compliance-Repo migriert
Control-Pipeline (Pass 0a/0b, BatchDedup, Generator) als eigenstaendiger
Service in Core, damit Compliance-Repo unabhaengig refakturiert werden kann.
Schreibt weiterhin ins compliance-Schema der shared PostgreSQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:40:47 +02:00