breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	6ed30dae5b	feat(agent): MC scorecard + audit drill-down + tenant trend (A1-A6) Now that all 1874 MCs run per check (Task #30 cap removal), the report was about to drown in noise. This commit adds the full aggregation / persistence / drill-down stack so each MC is actionable, not just counted. A1 mc_scorecard.py (new): build_scorecard(checks) -> per-regulation PASS/FAIL/SKIP + severity; top_fails(checks, n) -> N most severe failed MCs; full_audit_records(...) -> flat rows ready for sidecar SQLite. A2 Email rendering: agent_doc_check_scorecard.py (new) builds an HTML scorecard table (regulation × passed/failed/HIGH/MEDIUM/score) shown at the top of the email. agent_doc_check_report._render_document now collapses the 500-MC L2 forest into an 'X/Y bestanden (Z Fail)' summary plus a top-10 fails block per doc; the old verbose render is gone. A3 compliance_audit_log.py (new): sidecar SQLite at /data/compliance_audits.db (separate from the compliance Postgres schema to comply with the no-new-migrations rule in CLAUDE.md): check_runs(check_id, ts, tenant_id, site_name, base_domain, doc_count, scorecard json, vvt_summary json); mc_results(check_id, doc_type, mc_id, label, passed, skipped, severity, regulation, matched_text, hint). The route persists every run after the email is sent. docker-compose.yml adds compliance-audit volume + env. A4 backfill_mc_regulation_llm.py (new): Qwen-tagged backfill for the 1636 MCs the regex pass couldn't classify. Batches of 25, format=json, output constrained to the canonical regulation list. Run manually: docker exec bp-compliance-backend python3 /app/scripts/backfill_mc_regulation_llm.py [--dry-run] A5 Admin audit tab: GET /api/compliance/agent/audit/<check_id> proxied via /api/sdk/v1/agent/audit/<id>. New page /sdk/agent/audit/[checkId] renders scorecard + filterable MC table (status / doc_type / regulation, expandable rows with matched_text + hint). ComplianceCheckTab now shows a 'Voll-Audit oeffnen' link. A6 Trend per tenant: GET /api/compliance/agent/audit/tenant/<id> returns recent runs. Email scorecard shows per-regulation delta badges ('(+12%)', '(-3%)') compared with the previous run for the same tenant + base_domain. Lookup is one SQLite query. Plumbing: rag_document_checker.py: SELECT now includes 'article'; MC results carry 'regulation' + 'article' through to CheckItem. agent_doc_check_routes.CheckItem schema gains regulation + article fields (defaults '') so old clients still parse. agent_compliance_check_routes: response gains 'check_id' so the frontend can build the audit link.	2026-05-17 13:45:58 +02:00
Benjamin Admin	873997c13b	feat(vvt): V3 — LLM vendor extraction fallback for unknown CMPs When the cookie text has no captured CMP payload (long-tail sites that don't use ePaaS/OneTrust/Cookiebot/etc.) we now fall back to a Qwen → OVH LLM cascade to extract a structured vendor list from the policy text. New module backend/compliance/services/vendor_llm_extractor.py: - extract_vendors_via_llm(cookie_text): runs Qwen first (local Ollama), then OVH if Qwen returns nothing usable. - System prompt instructs the model to return STRICT JSON only: {vendors: [{name, country, purpose, category, opt_out_url, privacy_policy_url, persistence, cookies: [...]}]} - Lenient JSON parser tolerates code-fences, prose wrappers, dict vs list. - _normalize() caps array sizes (80 vendors, 30 cookies each), validates URLs (must be http(s)), trims fields to reasonable lengths. Route integration (agent_compliance_check_routes.py): - After named-CMP extract: if cmp_vendors is empty AND the cookie text has ≥500 words (otherwise it's likely navigation chrome), invoke the LLM extractor. Progress message 'Vendor-Liste per LLM extrahieren...'. - Vendors then run through the same validate_vendor_urls + score_vendors pipeline → VVT table rendered identically regardless of source. docker-compose.yml: backend-compliance gains OLLAMA_URL, CMP_LLM_MODEL, OVH_LLM_URL/KEY/MODEL env vars (same names as consent-tester so the configuration is unified). This closes the 'every site eventually gets a VVT table' goal: - Known CMP → V1/V2 structured extraction (fast, exact) - Unknown CMP → V3 LLM extraction (slow, best-effort) - No text at all → no vendors, but other compliance checks still run.	2026-05-17 09:55:42 +02:00
Benjamin Admin	2400aa6a9e	feat(consent-tester): Phase C+D - LLM cascade fallback (Qwen → OVH) New module consent-tester/services/cmp_llm_fallback.py: - LLMCookieExtractor: single-endpoint adapter (Ollama OR OpenAI-compat) - LLMCascade: tries Qwen (local Mac Mini Ollama) first; falls through to OVH (managed 120B) when Qwen returns no usable strategy - LLMCascade.from_env(): reads OLLAMA_URL/CMP_LLM_MODEL + OVH_LLM_URL/OVH_LLM_KEY/OVH_LLM_MODEL from environment - LLM returns JSON {strategy: url|selector|text, value: ...} - Valkey-backed cache per netloc (cmp:hint:<netloc>, 7-day TTL); the next run against the same domain skips the LLM entirely dsi_discovery.py: - Wired network_log collector (URL/status/content-type/size of every JSON response on the page), passed to the LLM prompt as observation - After Named CMP (Phase B) + Heuristic (Phase A) both fail AND DOM < 300 words: invoke LLMCascade.analyze(...) - _apply_llm_hint executes the LLM's strategy: refetch URL via Playwright request context, query DOM selector, or use text directly - Cache HIT path: apply cached hint, only fall back to LLM if cache is stale docker-compose.yml: - consent-tester gets env vars + cmp-data volume (for Phase E) - All LLM endpoints configurable via env, sensible defaults consent-tester/requirements.txt: - redis>=5.0 (asyncio client, Valkey-compatible) - httpx>=0.27	2026-05-16 23:06:05 +02:00
Benjamin Admin	f3c0481631	feat: add consent-tester service to docker-compose (port 8094, 2GB mem limit) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 12:33:20 +02:00
Benjamin Admin	295c18c6f7	feat: add DECOMPOSITION_LLM_MODEL env var for runtime model switching Allows switching between Haiku 4.5 and Sonnet 4.6 for Pass 0b without rebuilding the backend container. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 09:20:10 +01:00
Benjamin Admin	4d2f4f2d24	feat(qdrant): Migrate to hosted qdrant-dev.breakpilot.ai with API-Key auth - LegalRAGClient: QDRANT_HOST+PORT → QDRANT_URL + QDRANT_API_KEY - docker-compose: env vars updated for hosted Qdrant - AllowedCollections: added bp_compliance_gdpr, bp_dsfa_templates, bp_dsfa_risks - Migration scripts (bash + python) for data transfer Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 13:55:21 +01:00
Benjamin Admin	ec4ed1f2ad	feat(infra): Migrate compliance DB to external PostgreSQL 46.225.100.82:54321 - docker-compose.yml: all 4 DATABASE_URL entries switched to COMPLIANCE_DATABASE_URL (with fallback) - .env.example: added COMPLIANCE_DATABASE_URL entry - Rollback: without .env, the fallback points to bp-core-postgres Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 12:32:01 +01:00
Benjamin Admin	adc95267bd	chore: LLM qwen3:30b-a3b → qwen3.5:35b-a3b Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:31:55 +01:00
Benjamin Admin	0fc3e7754f	fix: removed docs container port binding; reachable only via Nginx HTTPS :8011 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 12:34:20 +01:00
Benjamin Admin	e6d666b89b	feat: Preparation modules to 100%: persistence, backend services, UCCA frontend Phase A: PostgreSQL state store (sdk_states table, in-memory fallback) Phase B: Modules loaded dynamically from the backend, scope DB persistence, source policy state Phase C: UCCA frontend (3 pages, wizard, RiskScoreGauge), obligations live data Phase D: Document import (PDF/LLM/gap analysis), system screening (SBOM/OSV.dev) Phase E: Company profile CRUD with audit logging Phase F: Tests (Python + TypeScript), flow-data.ts DB tables updated Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 11:04:31 +01:00
Benjamin Admin	3d9bc285ac	Fix backend-compliance DATABASE_URL: use sync psycopg2 instead of asyncpg The DATABASE_URL was using postgresql+asyncpg:// with ?options= for search_path, but database.py uses synchronous SQLAlchemy (create_engine) and asyncpg doesn't support the 'options' keyword argument. The search_path is already set via an event listener in database.py, so the options parameter is unnecessary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 08:19:23 +01:00
Benjamin Admin	94006be778	Revert LLM model to qwen3:30b-a3b (qwen3.5 download incomplete) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:17:47 +01:00
Benjamin Admin	c78a7b687e	Fix Academy page crash: optional chaining for byStatus and categoryInfo fallback - statistics.byStatus.in_progress could crash on empty object → optional chaining - COURSE_CATEGORY_INFO[course.category] could return undefined → fallback to 'custom' - Update LLM model to qwen3.5:35b-a3b in docker-compose.yml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 14:49:02 +01:00
Benjamin Boenisch	899e22a31b	feat(rag): connect bp_compliance_ce vector corpus to SDK - Switch LegalRAGClient from empty bp_legal_corpus to bp_compliance_ce collection (3,734 chunks across 14 regulations) - Replace embedding-service (384-dim MiniLM) with Ollama bge-m3 (1024-dim) - Add standalone RAG search endpoint: POST /sdk/v1/rag/search - Add regulations list endpoint: GET /sdk/v1/rag/regulations - Add QDRANT_HOST/PORT env vars to docker-compose.yml - Update regulation ID mapping to match actual Qdrant payload schema - Update determineRelevantRegulations for CE corpus regulation IDs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 23:44:47 +01:00
Benjamin Boenisch	375914e568	feat(training): add Media Pipeline - TTS Audio, Presentation Video, Bulk Generation Phase A: 8 new IT-Security training modules (SEC-PWD, SEC-DESK, SEC-KIAI, SEC-BYOD, SEC-VIDEO, SEC-USB, SEC-INC, SEC-HOME) with CTM entries. Bulk content and quiz generation endpoints for all 28 modules. Phase B: Piper TTS service (Python/FastAPI) for local German speech synthesis. training_media table, TTSClient in Go backend, audio generation endpoints, AudioPlayer component in frontend. MinIO storage integration. Phase C: FFmpeg presentation video pipeline: LLM generates slide scripts, ImageMagick renders 1920x1080 slides, FFmpeg combines with audio to MP4. VideoPlayer and ScriptPreview components in frontend. New files: 15 created, 9 modified - compliance-tts-service/ (Dockerfile, main.py, tts_engine.py, storage.py, slide_renderer.py, video_generator.py) - migrations 014-016 (training engine, IT-security modules, media table) - training package (models, store, content_generator, media, handlers) - frontend (AudioPlayer, VideoPlayer, ScriptPreview, api, types, page) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 21:45:05 +01:00
Benjamin Boenisch	364d2c69ff	feat: Add Document Crawler & Auto-Onboarding service (Phase 1.4) New standalone Python/FastAPI service for automatic compliance document scanning, LLM-based classification, IPFS archival, and gap analysis. Includes extractors (PDF, DOCX, XLSX, PPTX), keyword fallback classifier, compliance matrix, and full REST API on port 8098. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 20:35:15 +01:00
Benjamin Boenisch	0923c03756	chore: Add development screens, update navigation and docker-compose Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 20:29:27 +01:00
Benjamin Boenisch	c11270f8e0	Add CLAUDE.md, MkDocs docs, docs page in admin, .claude/rules - CLAUDE.md: Comprehensive documentation for Compliance SDK platform - docs-src: AI-Compliance-SDK docs (architecture, developer, auditor, SBOM) - mkdocs.yml: Compliance-specific nav with purple theme - docker-compose: Added docs service (port 8011, profile: docs) - admin-compliance: New /development/docs page with iframe + quick links - navigation.ts: Added development category with docs module - .claude/rules: testing, docs, open-source, compliance-checklist Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 00:49:28 +01:00
Benjamin Boenisch	4435e7ea0a	Initial commit: breakpilot-compliance - Compliance SDK Platform Services: Admin-Compliance, Backend-Compliance, AI-Compliance-SDK, Consent-SDK, Developer-Portal, PCA-Platform, DSMS Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 23:47:28 +01:00

19 Commits
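
Note on commit 873997c13b: the message describes a lenient JSON parser (tolerating code fences, prose wrappers, dict vs list) and a _normalize() step that caps array sizes at 80 vendors and 30 cookies each and accepts only http(s) URLs. A minimal sketch of that idea follows. It is not the code in vendor_llm_extractor.py: the names parse_llm_json and normalize_vendors and the 200-character field limit are assumptions made for illustration, and cookies are assumed to be plain strings.

```python
import json
import re
from urllib.parse import urlparse

MAX_VENDORS = 80      # cap stated in the commit message
MAX_COOKIES = 30      # cap stated in the commit message
MAX_FIELD_LEN = 200   # assumption: the commit only says "reasonable lengths"

_FENCE = "`" * 3      # Markdown code-fence marker
_FENCE_RE = re.compile(
    re.escape(_FENCE) + r"(?:json)?\s*(.*?)" + re.escape(_FENCE),
    re.DOTALL | re.IGNORECASE,
)


def parse_llm_json(raw):
    """Best-effort JSON extraction from an LLM reply (hypothetical helper)."""
    text = raw.strip()
    fenced = _FENCE_RE.search(text)
    if fenced:
        # Reply wrapped its JSON in a code fence: keep only the fenced part.
        text = fenced.group(1).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Prose wrapper: try to decode a JSON value at each '{' or '['.
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue
    return None


def _is_http_url(value):
    """True only for absolute http(s) URLs with a host part."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def normalize_vendors(parsed):
    """Cap sizes, trim fields, and blank out non-http(s) URLs."""
    # Accept both {"vendors": [...]} and a bare list.
    if isinstance(parsed, dict):
        parsed = parsed.get("vendors", [])
    if not isinstance(parsed, list):
        return []
    vendors = []
    for item in parsed:
        if len(vendors) >= MAX_VENDORS:
            break
        if not isinstance(item, dict):
            continue
        name = str(item.get("name") or "").strip()[:MAX_FIELD_LEN]
        if not name:
            continue  # a vendor without a name is not usable
        vendor = {"name": name}
        for key in ("country", "purpose", "category", "persistence"):
            vendor[key] = str(item.get(key) or "").strip()[:MAX_FIELD_LEN]
        for key in ("opt_out_url", "privacy_policy_url"):
            url = item.get(key)
            vendor[key] = url.strip() if _is_http_url(url) else ""
        cookies = item.get("cookies")
        cookies = cookies if isinstance(cookies, list) else []
        vendor["cookies"] = [
            c.strip()[:MAX_FIELD_LEN] for c in cookies if isinstance(c, str)
        ][:MAX_COOKIES]
        vendors.append(vendor)
    return vendors
```

Calling normalize_vendors(parse_llm_json(reply)) on a reply with no parseable JSON yields an empty list, which matches the "no vendors, but other compliance checks still run" behavior the commit describes.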