breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	0c0dd4e3a6	feat: ZeroClaw compliance agent — document analysis + role assignment + email Add autonomous compliance agent that fetches web documents (cookie banners, privacy policies), classifies them via Qwen/Ollama, assesses DSGVO compliance, assigns to the responsible role, and sends notification emails. Components: - ZeroClaw SOP (6-step workflow: fetch, classify, assess, summarize, assign, notify) - Backend: /api/compliance/agent/analyze (combined endpoint) - Backend: /api/compliance/agent/notify (standalone email) - Frontend: /sdk/agent page (Manager UI with URL input + results) - Helper scripts + E2E test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 23:28:21 +02:00
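The six SOP steps (fetch, classify, assess, summarize, assign, notify) can be pictured as a linear pipeline. This is a minimal sketch under assumptions. Each step is a plain injected callable, and the Qwen/Ollama classifier and the mailer are stubbed out. `ROLE_BY_DOC_TYPE`, `run_sop`, `SopResult` and the role names are hypothetical, not the agent's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical mapping from document type to the responsible role.
ROLE_BY_DOC_TYPE: Dict[str, str] = {
    "cookie_banner": "dpo",
    "privacy_policy": "legal",
}


@dataclass
class SopResult:
    url: str
    doc_type: str
    findings: List[str]
    summary: str
    role: str
    notified: bool
    trace: List[str] = field(default_factory=list)


def run_sop(
    url: str,
    fetch: Callable[[str], str],
    classify: Callable[[str], str],
    assess: Callable[[str, str], List[str]],
    notify: Callable[[str, str], bool],
) -> SopResult:
    """Run fetch -> classify -> assess -> summarize -> assign -> notify."""
    trace: List[str] = []
    text = fetch(url)
    trace.append("fetch")
    doc_type = classify(text)  # e.g. an LLM call in the real agent
    trace.append("classify")
    findings = assess(doc_type, text)
    trace.append("assess")
    summary = f"{doc_type}: {len(findings)} finding(s)"
    trace.append("summarize")
    # Unknown document types fall back to a generic (hypothetical) role.
    role = ROLE_BY_DOC_TYPE.get(doc_type, "compliance_manager")
    trace.append("assign")
    notified = notify(role, summary)
    trace.append("notify")
    return SopResult(url, doc_type, findings, summary, role, notified, trace)
```

Injecting the steps keeps the orchestration testable without a running LLM or mail server.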
Sharang Parnerkar	c43d9da6d0	merge: sync with origin/main, take upstream on conflicts # Conflicts: # admin-compliance/lib/sdk/types.ts # admin-compliance/lib/sdk/vendor-compliance/types.ts	2026-04-16 16:26:48 +02:00
Sharang Parnerkar	7344e5806e	refactor(backend/isms): split isms_assessment_service.py to stay under 500 LOC The previous commit (`32e121f`) left isms_assessment_service.py at 639 LOC, exceeding the 500-line hard cap. This follow-up extracts ReadinessCheckService and OverviewService into a new isms_readiness_service.py (400 LOC), leaving isms_assessment_service.py at 257 LOC (Management Reviews, Internal Audits, Audit Trail only). Updated isms_routes.py imports to reference the new service file. File sizes after split: - isms_routes.py: 446 LOC (thin handlers) - isms_governance_service.py: 416 LOC (scope, context, policy, objectives, SoA) - isms_findings_service.py: 276 LOC (findings, CAPA) - isms_assessment_service.py: 257 LOC (mgmt reviews, internal audits, audit trail) - isms_readiness_service.py: 400 LOC (readiness check, ISO 27001 overview) All 58 integration tests + 173 unit/contract tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 20:50:30 +02:00
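The 500-line hard cap, together with the 300-line soft target from the Phase 0 guardrails, can be checked mechanically. This is a minimal Python sketch, not the repo's `scripts/check-loc.sh`. It assumes that "LOC" means physical newline-delimited lines and that a file of exactly 500 lines still passes. `classify_loc` and `check_files` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Iterable

SOFT_LIMIT = 300
HARD_LIMIT = 500


def count_lines(path: Path) -> int:
    """Count physical lines, including a final line without a trailing newline."""
    with path.open("rb") as fh:
        return sum(1 for _ in fh)


def classify_loc(n_lines: int) -> str:
    """Return 'hard' above the hard cap, 'soft' above the soft target, else 'ok'."""
    if n_lines > HARD_LIMIT:
        return "hard"
    if n_lines > SOFT_LIMIT:
        return "soft"
    return "ok"


def check_files(paths: Iterable[Path]) -> Dict[str, str]:
    """Map each file path to its budget status."""
    return {str(p): classify_loc(count_lines(p)) for p in paths}
```

With these thresholds, the 639-line file from `32e121f` is a hard violation, the 400-line readiness service is a soft one, and the 257-line assessment service is within budget.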
Sharang Parnerkar	ae008d7d25	refactor(backend/api): extract DSFA schemas + services (Step 4 — file 14 of 18) - Create compliance/schemas/dsfa.py (161 LOC) — extract DSFACreate, DSFAUpdate, DSFAStatusUpdate, DSFASectionUpdate, DSFAApproveRequest - Create compliance/services/dsfa_service.py (386 LOC) — CRUD + helpers + stats + audit-log + CSV export; uses domain errors - Create compliance/services/dsfa_workflow_service.py (347 LOC) — status update, section update, submit-for-review, approve, export JSON, versions - Rewrite compliance/api/dsfa_routes.py (339 LOC) as thin handlers with Depends + translate_domain_errors(); re-export legacy symbols via __all__ - Add [mypy-compliance.api.dsfa_routes] ignore_errors = False to mypy.ini - Update tests: 422 -> 400 for domain ValidationError (6 assertions) - Regenerate OpenAPI baseline (360 paths / 484 operations — unchanged) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 19:20:48 +02:00
Sharang Parnerkar	d2c94619d8	refactor(backend/api): extract LegalDocumentConsentService (Step 4 — file 12 of 18) Extract consent, audit log, cookie category, and consent stats endpoints from legal_document_routes into LegalDocumentConsentService. The route file is now a thin handler layer delegating to LegalDocumentService and LegalDocumentConsentService with translate_domain_errors(). Legacy helpers (_doc_to_response, _version_to_response, _transition, _log_approval) and schemas are re-exported for existing tests. Two transition tests updated to expect domain errors instead of HTTPException. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 08:47:56 +02:00
Sharang Parnerkar	cc1c61947d	refactor(backend/api): extract Incident services (Step 4 — file 11 of 18) compliance/api/incident_routes.py (916 LOC) -> 280 LOC thin routes + two services + 95-line schemas file. Two-service split for DSGVO Art. 33/34 data breach management (Datenpannen-Management): incident_service.py (460 LOC): - CRUD (create, list, get, update, delete) - Stats, status update, timeline append, close - Module-level helpers: _calculate_risk_level, _is_notification_required, _calculate_72h_deadline, _incident_to_response, _measure_to_response, _parse_jsonb, _append_timeline, DEFAULT_TENANT_ID incident_workflow_service.py (329 LOC): - Risk assessment (likelihood x impact -> risk_level) - Art. 33 authority notification (with 72h deadline tracking) - Art. 34 data subject notification - Corrective measures CRUD Both services use raw SQL via sqlalchemy.text() — no ORM models for incident_incidents / incident_measures tables. Migrated from the Go ai-compliance-sdk; the Python backend is the source of truth. Legacy test compat: tests/test_incident_routes.py imports _calculate_risk_level, _is_notification_required, _calculate_72h_deadline, _incident_to_response, _measure_to_response, _parse_jsonb, DEFAULT_TENANT_ID directly from compliance.api.incident_routes — all re-exported via __all__. Verified: - 223/223 pytest pass (173 core + 50 incident) - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 141 source files - incident_routes.py 916 -> 280 LOC - Hard-cap violations: 8 -> 7 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 08:35:57 +02:00
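Two of the helpers named above are easy to illustrate. This sketch rests on several assumptions. It uses a 1 to 5 scale for likelihood and impact, and the score thresholds are hypothetical. The real `_calculate_risk_level` and `_calculate_72h_deadline` may use different scales, inputs and signatures. The 72-hour window comes from Art. 33(1) DSGVO: notify the supervisory authority within 72 hours of becoming aware of the breach.

```python
from datetime import datetime, timedelta, timezone


def calculate_risk_level(likelihood: int, impact: int) -> str:
    """Map likelihood x impact (each assumed to be 1..5) to a risk level."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be in 1..5")
    score = likelihood * impact
    if score >= 15:  # hypothetical thresholds
        return "critical"
    if score >= 10:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


def calculate_72h_deadline(detected_at: datetime) -> datetime:
    """Art. 33 deadline: 72 hours after the controller became aware."""
    if detected_at.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        detected_at = detected_at.replace(tzinfo=timezone.utc)
    return detected_at + timedelta(hours=72)
```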
Sharang Parnerkar	0c2e03f294	refactor(backend/api): extract Email Template services (Step 4 — file 10 of 18) compliance/api/email_template_routes.py (823 LOC) -> 295 LOC thin routes + 402-line EmailTemplateService + 241-line EmailTemplateVersionService + 61-line schemas file. Two-service split along natural responsibility seam: email_template_service.py (402 LOC): - Template type catalog (TEMPLATE_TYPES constant) - Template CRUD (list, create, get) - Stats, settings, send logs, initialization, default content - Shared _template_to_dict / _version_to_dict / _render_template helpers email_template_version_service.py (241 LOC): - Version CRUD (create, list, get, update) - Workflow transitions (submit, approve, reject, publish) - Preview and test-send TEMPLATE_TYPES, VALID_CATEGORIES, VALID_STATUSES re-exported from the route module for any legacy consumers. State-transition errors use ValidationError (-> HTTPException 400) to preserve the original handler's 400 status for "Only draft/review versions can be ..." checks, since the existing TestClient integration tests (47 tests) assert status_code == 400. Verified: - 47/47 tests/test_email_template_routes.py pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 138 source files - email_template_routes.py 823 -> 295 LOC - Hard-cap violations: 9 -> 8 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 22:39:19 +02:00
Sharang Parnerkar	a638d0e527	refactor(backend/api): extract EvidenceService (Step 4 — file 9 of 18) compliance/api/evidence_routes.py (641 LOC) -> 240 LOC thin routes + 460-line EvidenceService. Manages evidence CRUD, file upload, CI/CD evidence collection (SAST/dependency/SBOM/container scans), and CI status dashboard. Service injection pattern: EvidenceService takes the EvidenceRepository, ControlRepository, and AutoRiskUpdater classes as constructor parameters. The route's get_evidence_service factory reads these class references from its own module namespace so tests that ``patch("compliance.api.evidence_routes.EvidenceRepository", ...)`` still take effect through the factory. The `_store_evidence` and `_update_risks` helpers stay as module-level callables in evidence_service and are re-exported from the route module. The collect_ci_evidence handler remains inline (not delegated to a service method) so tests can patch `compliance.api.evidence_routes._store_evidence` and have the patch take effect at the handler's call site. Legacy re-exports via __all__: SOURCE_CONTROL_MAP, EvidenceRepository, ControlRepository, AutoRiskUpdater, _parse_ci_evidence, _extract_findings_detail, _store_evidence, _update_risks. Verified: - 208/208 pytest (core + 35 evidence tests) pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 135 source files - evidence_routes.py 641 -> 240 LOC - Hard-cap violations: 10 -> 9 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 21:59:03 +02:00
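The factory trick is the important detail here. A name used inside a function body is looked up in the module's globals at call time, so a later patch of the module attribute is visible to the factory. A default argument, by contrast, freezes the original class at import time. The sketch below is self-contained and uses stand-in classes (none of this is the repo's code). `patch.dict(globals(), ...)` plays the role of `patch("compliance.api.evidence_routes.EvidenceRepository", ...)`.

```python
from unittest.mock import patch


class EvidenceRepository:
    def __init__(self, db: object) -> None:
        self.db = db


class EvidenceService:
    def __init__(self, db: object, repo_cls: type) -> None:
        # The repository class is injected rather than imported inside the service.
        self.repo = repo_cls(db)


def get_evidence_service(db: object = None) -> EvidenceService:
    # EvidenceRepository is resolved from module globals on every call,
    # so patching the module attribute affects newly built services.
    return EvidenceService(db, repo_cls=EvidenceRepository)


def get_evidence_service_frozen(
    db: object = None, repo_cls: type = EvidenceRepository
) -> EvidenceService:
    # Anti-pattern: the default is bound once, when the function is defined.
    return EvidenceService(db, repo_cls=repo_cls)


class FakeRepository(EvidenceRepository):
    pass


with patch.dict(globals(), {"EvidenceRepository": FakeRepository}):
    assert isinstance(get_evidence_service().repo, FakeRepository)
    assert not isinstance(get_evidence_service_frozen().repo, FakeRepository)
```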
Sharang Parnerkar	7107a31496	refactor(backend/api): extract SourcePolicyService (Step 4 — file 7 of 18) compliance/api/source_policy_router.py (580 LOC) -> 253 LOC thin routes + 453-line SourcePolicyService + 83-line schemas file. Manages allowed data sources, operations matrix, PII rules, blocked-content log, audit trail, and dashboard stats/report. Single-service split. ORM-based (uses compliance.db.source_policy_models). Date-string parsing extracted to a module-level _parse_iso_optional helper so the audit + blocked-content list endpoints share it instead of duplicating try/except blocks. Legacy test compat: SourceCreate, SourceUpdate, SourceResponse, PIIRuleCreate, PIIRuleUpdate, OperationUpdate, _log_audit re-exported from compliance.api.source_policy_router via __all__. Verified: - 208/208 pytest pass (173 core + 35 source policy) - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 132 source files - source_policy_router.py 580 -> 253 LOC - Hard-cap violations: 12 -> 11 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:58:02 +02:00
Sharang Parnerkar	b850368ec9	refactor(backend/api): extract CanonicalControlService (Step 4 — file 6 of 18) compliance/api/canonical_control_routes.py (514 LOC) -> 192 LOC thin routes + 316-line CanonicalControlService + 105-line schemas file. Canonical Control Library manages OWASP/NIST/ENISA-anchored security control frameworks and controls. Like company_profile_routes, this file uses raw SQL via sqlalchemy.text() because there are no SQLAlchemy models for canonical_control_frameworks or canonical_controls. Single-service split. Session management moved from bespoke `with SessionLocal() as db:` blocks to Depends(get_db) for consistency. Legacy test imports preserved via re-export (FrameworkResponse, ControlResponse, SimilarityCheckRequest, SimilarityCheckResponse, _control_row). Validation extracted to a module-level `_validate_control_input` helper so both create and update share the same checks. ValidationError (from compliance.domain) replaces raw HTTPException(400) raises. Verified: - 187/187 pytest (173 core + 14 canonical) pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 130 source files - canonical_control_routes.py 514 -> 192 LOC - Hard-cap violations: 13 -> 12 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:53:55 +02:00
Sharang Parnerkar	4fa0dd6f6d	refactor(backend/api): extract VVTService (Step 4 — file 5 of 18) compliance/api/vvt_routes.py (550 LOC) -> 225 LOC thin routes + 475-line VVTService. Covers the organization header, processing activities CRUD, audit log, JSON/CSV export, stats, and version lookups for the Art. 30 DSGVO Verzeichnis (record of processing activities). Single-service split: organization + activities + audit + stats all revolve around the same tenant's VVT document, and the existing test suite (tests/test_vvt_routes.py — 768 LOC, tests/test_vvt_tenant_isolation.py — 205 LOC) exercises them together. Module-level helpers (_activity_to_response, _log_audit, _export_csv) stay module-level in compliance.services.vvt_service and are re-exported from compliance.api.vvt_routes so the two test files keep importing from the old path. Pydantic schemas already live in compliance.schemas.vvt from Step 3 — no new schema file needed this round. mypy.ini flips compliance.api.vvt_routes from ignore_errors=True to False. Two SQLAlchemy Column[str] vs str dict-index errors fixed with explicit str() casts on status/business_function in the stats loop. Verified: - 242/242 pytest (173 core + 69 VVT integration) pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 128 source files - vvt_routes.py 550 -> 225 LOC - vvt_service.py 475 LOC (under 500 hard cap) - Hard-cap violations: 14 -> 13 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:50:40 +02:00
Sharang Parnerkar	f39c7ca40c	refactor(backend/api): extract CompanyProfileService (Step 4 — file 4 of 18) compliance/api/company_profile_routes.py (640 LOC) -> 154 LOC thin routes. Unusual for this repo: persistence uses raw SQL via sqlalchemy.text() because the underlying compliance_company_profiles table has ~45 columns with complex jsonb coercion and there is no SQLAlchemy model for it. New files: compliance/schemas/company_profile.py (127) — 4 request/response models compliance/services/company_profile_service.py (340) — Service class + row_to_response + log_audit compliance/services/_company_profile_sql.py (139) — 70-line INSERT/UPDATE statements separated for readability Minor behavioral improvement: the handlers now use Depends(get_db) for session management instead of the bespoke `db = SessionLocal(); try: ... finally: db.close()` pattern. This makes the routes consistent with every other refactored service, fixes breakage under test dependency_overrides, and removes 6 duplicate try/finally blocks. Legacy exports preserved: CompanyProfileRequest, CompanyProfileResponse, AuditEntryResponse, AuditListResponse, row_to_response, and log_audit are re-exported from compliance.api.company_profile_routes so that the two existing test files (tests/test_company_profile_routes.py, tests/test_company_profile_extend.py) keep importing from the same path. Pre-existing broken tests noted: 6 tests in those files feed a 40-tuple row into row_to_response, but _BASE_COLUMNS_LIST has 46 columns (and has had since the Phase 2 Stammdaten (master data) extension). These tests fail on main too (verified via `git stash` round-trip). Not fixed in this commit — they require a rewrite of the test's _make_row helper, which is out of scope for a pure structural refactor. Flagged for follow-up.
Verified: - 173/173 pytest compliance/tests/ tests/contracts/ pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 127 source files - company_profile_routes.py 640 -> 154 LOC - All new files under soft 300 target except service (340, under hard 500) - Hard-cap violations: 15 -> 14 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:47:29 +02:00
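The `Depends(get_db)` pattern replaces each handler's hand-written `try/finally` with a single generator dependency. FastAPI runs the code after `yield` as teardown once the request has been handled. Tests can swap the dependency through `app.dependency_overrides[get_db]`. The sketch below shows only the generator shape. It uses a stand-in session so it runs without FastAPI or a database; `FakeSession` and this `SessionLocal` function are placeholders for the real `sessionmaker`.

```python
from typing import Iterator, List


class FakeSession:
    """Stand-in for a SQLAlchemy Session; records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


opened: List[FakeSession] = []


def SessionLocal() -> FakeSession:  # stand-in for sessionmaker(bind=engine)
    session = FakeSession()
    opened.append(session)
    return session


def get_db() -> Iterator[FakeSession]:
    """One session per request; always closed, even if the handler raises."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
```

Because cleanup lives in one place, each handler only declares `db: Session = Depends(get_db)`, and the six duplicated `try/finally` blocks disappear.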
Sharang Parnerkar	d571412657	refactor(backend/api): extract TOMService (Step 4 — file 3 of 18) compliance/api/tom_routes.py (609 LOC) -> 215 LOC thin routes + 434-line TOMService. Request bodies (TOMStateBody, TOMMeasureCreate, TOMMeasureUpdate, TOMMeasureBulkItem, TOMMeasureBulkBody) moved to compliance/schemas/tom.py (joining the existing response models from the Step 3 split). Single-service split (not two like banner): state, measures CRUD + bulk upsert, stats, export, and version lookups are all tightly coupled around the TOMMeasureDB aggregate, so splitting would create artificial boundaries. TOMService is 434 LOC — comfortably under the 500 hard cap. Domain error mapping: - ConflictError -> 409 (version conflict on state save; duplicate control_id on create) - NotFoundError -> 404 (missing measure on update; missing version) - ValidationError -> 400 (missing tenant_id on DELETE /state) Legacy test compat: the existing tests/test_tom_routes.py imports TOMMeasureBulkItem, _parse_dt, _measure_to_dict, and DEFAULT_TENANT_ID directly from compliance.api.tom_routes. All re-exported via __all__ so the 44-test file runs unchanged. mypy.ini flips compliance.api.tom_routes from ignore_errors=True to False. TOMService carries the scoped Column[T] header. Verified: - 217/217 pytest (173 baseline + 44 TOM) pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 124 source files - tom_routes.py 609 -> 215 LOC - Hard-cap violations: 16 -> 15 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:42:17 +02:00
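The "version conflict on state save" that maps to 409 is a form of optimistic concurrency control. This minimal sketch makes two assumptions. The client sends back the version it last read, and a tenant with no saved state is at version 0. `InMemoryStateStore` is hypothetical; the real TOMService persists through SQLAlchemy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class ConflictError(Exception):
    """Stand-in for compliance.domain.ConflictError (mapped to HTTP 409)."""


@dataclass
class StoredState:
    version: int
    payload: dict


class InMemoryStateStore:
    """Hypothetical per-tenant state store with optimistic concurrency."""

    def __init__(self) -> None:
        self._states: Dict[str, StoredState] = {}

    def save(self, tenant_id: str, payload: dict, expected_version: Optional[int]) -> int:
        current = self._states.get(tenant_id)
        current_version = current.version if current else 0
        # Reject the write if another client saved since this one last read.
        if expected_version is not None and expected_version != current_version:
            raise ConflictError(
                f"version conflict: expected {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._states[tenant_id] = StoredState(new_version, payload)
        return new_version
```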
Sharang Parnerkar	10073f3ef0	refactor(backend/api): extract BannerConsent + BannerAdmin services (Step 4) Phase 1 Step 4, file 2 of 18. Same cookbook as audit_routes (`4a91814` + `883ef70`) applied to banner_routes.py. compliance/api/banner_routes.py (653 LOC) is decomposed into: compliance/api/banner_routes.py (255) — thin handlers compliance/services/banner_consent_service.py (298) — public SDK surface compliance/services/banner_admin_service.py (238) — site/category/vendor CRUD compliance/services/_banner_serializers.py ( 81) — ORM-to-dict helpers shared between the two services compliance/schemas/banner.py ( 85) — Pydantic request models Split rationale: the SDK-facing endpoints (consent CRUD, config retrieval, export, stats) and the admin CRUD endpoints (sites + categories + vendors) have distinct audiences and different auth stories, and combined they would push the service file over the 500 hard cap. Two focused services is cleaner than one ~540-line god class. The shared ORM-to-dict helpers live in a private sibling module (_banner_serializers) rather than a static method on either service, so both services can import without a cycle. Handlers follow the established pattern: - Depends(get_consent_service) or Depends(get_admin_service) - `with translate_domain_errors():` wrapping the service call - Explicit return type annotations - ~3-5 lines per handler Services raise NotFoundError / ConflictError / ValidationError from compliance.domain; no HTTPException in the service layer. mypy.ini flips compliance.api.banner_routes from ignore_errors=True to False, joining audit_routes in the strict scope. The services carry the same scoped `# mypy: disable-error-code="arg-type,assignment"` header used by the audit services for the ORM Column[T] issue. Pydantic schemas moved to compliance.schemas.banner (mirroring the Step 3 schemas split). 
They were previously defined inline in banner_routes.py and not referenced by anything outside it, so no backwards-compat shim is needed. Verified: - 224/224 pytest (173 baseline + 26 audit integration + 25 banner integration) pass - tests/contracts/test_openapi_baseline.py green (360/484 unchanged) - mypy compliance/ -> Success: no issues found in 123 source files - All new files under the 300 soft target (largest: 298) - banner_routes.py drops from 653 -> 255 LOC (below hard cap) Hard-cap violations remaining: 16 (was 17). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:52:31 +02:00
Sharang Parnerkar	883ef702ac	tech-debt: mypy --strict config + integration tests for audit routes Phase 1 Step 4 follow-up addressing the debt flagged in the worked-example commit (`4a91814`). ## mypy --strict policy Adds backend-compliance/mypy.ini declaring the strict-mode scope: Fully strict (enforced today): - compliance/domain/ - compliance/schemas/ - compliance/api/_http_errors.py - compliance/api/audit_routes.py (refactored in Step 4) - compliance/services/audit_session_service.py - compliance/services/audit_signoff_service.py Loose (ignore_errors=True) with a migration path: - compliance/db/* — SQLAlchemy 1.x Column[] vs runtime T; unblocks Phase 1 until a Mapped[T] migration. - compliance/api/<route>.py — each route file flips to strict as its own Step 4 refactor lands. - compliance/services/<legacy util> — 14 utility services (llm_provider, pdf_extractor, seeder, ...) that predate the clean-arch refactor. - compliance/tests/ — excluded (legacy placeholder style). The new TestClient- based integration suite is type-annotated. The two new service files carry a scoped `# mypy: disable-error-code="arg-type,assignment"` header for the ORM Column[T] issue — same underlying SQLAlchemy limitation, narrowly scoped rather than wholesale ignore_errors. Flow: `cd backend-compliance && mypy compliance/` -> clean on 119 files. CI yaml updated to use the config instead of ad-hoc package lists. ## Bugs fixed while enabling strict mypy --strict surfaced two latent bugs in the pre-refactor code. Both were invisible because the old `compliance/tests/test_audit_routes.py` is a placeholder suite that asserts on request-data shape and never calls the handlers: - AuditSessionResponse.updated_at is a required field in the schema, but the original handler didn't pass it. Fixed in AuditSessionService._to_response. - PaginationMeta requires has_next + has_prev. The original audit checklist handler didn't compute them. Fixed in AuditSignOffService.get_checklist. 
Both are behavior-preserving at the HTTP level because the old code would have raised Pydantic ValidationError at response serialization had the endpoint actually been exercised. ## Integration test suite Adds backend-compliance/tests/test_audit_routes_integration.py — 26 real TestClient tests against an in-memory sqlite backend (StaticPool). Replaces the coverage gap left by the placeholder suite. Covers: - Session CRUD + lifecycle transitions (draft -> in_progress -> completed -> archived), including the 409 paths for illegal transitions - Checklist pagination, filtering, search - Sign-off create / update / auto-start-session / count-flipping - Sign-off 400 (invalid result), 404 (missing requirement), 409 (completed session) - Get-signoff 404 / 200 round-trip Uses a module-scoped schema fixture + per-test DELETE-sweep so the suite runs in ~2.3s despite the ~50-table ORM surface. Verified: - 199/199 pytest (173 original + 26 new audit integration) pass - tests/contracts/test_openapi_baseline.py green, OpenAPI 360/484 unchanged - mypy compliance/ -> Success: no issues found in 119 source files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:39:40 +02:00
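Computing the two missing pagination flags is simple arithmetic. This sketch assumes 1-based page numbers. The commit names only `has_next` and `has_prev`, so the other fields and `build_pagination_meta` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PaginationMeta:
    page: int
    page_size: int
    total: int
    total_pages: int
    has_next: bool
    has_prev: bool


def build_pagination_meta(page: int, page_size: int, total: int) -> PaginationMeta:
    """Compute pagination flags for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    total_pages = math.ceil(total / page_size) if total else 0
    return PaginationMeta(
        page=page,
        page_size=page_size,
        total=total,
        total_pages=total_pages,
        has_next=page < total_pages,
        has_prev=page > 1,
    )
```

For example, 45 checklist items at 20 per page give 3 pages: page 1 has a next page but no previous one.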
Sharang Parnerkar	4a91814bfc	refactor(backend/api): extract AuditSession service layer (Step 4 worked example) Phase 1 Step 4 of PHASE1_RUNBOOK.md, first worked example. Demonstrates the router -> service delegation pattern for all 18 oversized route files still above the 500 LOC hard cap. compliance/api/audit_routes.py (637 LOC) is decomposed into: compliance/api/audit_routes.py (198) — thin handlers compliance/services/audit_session_service.py (259) — session lifecycle compliance/services/audit_signoff_service.py (319) — checklist + sign-off compliance/api/_http_errors.py ( 43) — reusable error translator Handlers shrink to 3-6 lines each:
    @router.post("/sessions", response_model=AuditSessionResponse)
    async def create_audit_session(
        request: CreateAuditSessionRequest,
        service: AuditSessionService = Depends(get_audit_session_service),
    ):
        with translate_domain_errors():
            return service.create(request)
Services are HTTP-agnostic: they raise NotFoundError / ConflictError / ValidationError from compliance.domain, and the route layer translates those to HTTPException(404/409/400) via the translate_domain_errors() context manager in compliance.api._http_errors. The error translator is reusable by every future Step 4 refactor. Services take a sqlalchemy Session in the constructor and are wired via Depends factories (get_audit_session_service / get_audit_signoff_service). No globals, no module-level state.
Behavior is byte-identical at the HTTP boundary: - Same paths, methods, status codes, response models - Same error messages (domain error __str__ preserved) - Same auto-start-on-first-signoff, same statistics calculation, same signature hash format, same PDF streaming response Verified: - 173/173 pytest compliance/tests/ tests/contracts/ pass - OpenAPI 360 paths / 484 operations unchanged - audit_routes.py under soft 300 target - Both new service files under soft 300 / hard 500 Note: compliance/tests/test_audit_routes.py contains placeholder tests that do not actually import or call the handler functions — they only assert on request-data shape. Real behavioral coverage relies on the contract test. A follow-up commit should add TestClient-based integration tests for the audit endpoints. Flagged in PHASE1_RUNBOOK. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:16:50 +02:00
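The core of this pattern is the error translator. Below is a self-contained sketch of how such a context manager can work. It uses stand-in exception classes so it runs without FastAPI; in the real code, `HTTPException` would come from `fastapi`. The 404/409/400 mapping and the preserved `__str__` message follow the commit text. Everything else is illustrative.

```python
from contextlib import contextmanager
from typing import Iterator


class DomainError(Exception):
    """Base class for service-layer errors (stand-in for compliance.domain)."""


class NotFoundError(DomainError):
    pass


class ConflictError(DomainError):
    pass


class ValidationError(DomainError):
    pass


class HTTPException(Exception):
    """Minimal stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@contextmanager
def translate_domain_errors() -> Iterator[None]:
    """Re-raise domain errors as HTTP errors, keeping the message (str(exc))."""
    try:
        yield
    except NotFoundError as exc:
        raise HTTPException(404, str(exc)) from exc
    except ConflictError as exc:
        raise HTTPException(409, str(exc)) from exc
    except ValidationError as exc:
        raise HTTPException(400, str(exc)) from exc
```

Errors that are not domain errors pass through untouched, so unexpected failures still surface as 500s rather than being silently mapped to a client error.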
Sharang Parnerkar	3320ef94fc	refactor: phase 0 guardrails + phase 1 step 2 (models.py split) Squash of branch refactor/phase0-guardrails-and-models-split — 4 commits, 81 files, 173/173 pytest green, OpenAPI contract preserved (360 paths / 484 operations). ## Phase 0 — Architecture guardrails Three defense-in-depth layers to keep the architecture rules enforced regardless of who opens Claude Code in this repo: 1. .claude/settings.json PreToolUse hook on Write/Edit blocks any file that would exceed the 500-line hard cap. Auto-loads in every Claude session in this repo. 2. scripts/githooks/pre-commit (install via scripts/install-hooks.sh) enforces the LOC cap locally, freezes migrations/ without [migration-approved], and protects guardrail files without [guardrail-change]. 3. .gitea/workflows/ci.yaml gains loc-budget + guardrail-integrity + sbom-scan (syft+grype) jobs, adds mypy --strict for the new Python packages (compliance/{services,repositories,domain,schemas}), and tsc --noEmit for admin-compliance + developer-portal. Per-language conventions documented in AGENTS.python.md, AGENTS.go.md, AGENTS.typescript.md at the repo root — layering, tooling, and explicit "what you may NOT do" lists. Root CLAUDE.md is prepended with the six non-negotiable rules. Each of the 10 services gets a README.md. scripts/check-loc.sh enforces soft 300 / hard 500 and surfaces the current baseline of 205 hard + 161 soft violations so Phases 1-4 can drain it incrementally. CI gates only CHANGED files in PRs so the legacy baseline does not block unrelated work. ## Deprecation sweep 47 files. Pydantic V1 regex= -> pattern= (2 sites), class Config -> ConfigDict in source_policy_router.py (schemas.py intentionally skipped; it is the Phase 1 Step 3 split target). datetime.utcnow() -> datetime.now(timezone.utc) everywhere including SQLAlchemy default= callables. All DB columns already declare timezone=True, so this is a latent-bug fix at the Python side, not a schema change. 
DeprecationWarning count dropped from 158 to 35. ## Phase 1 Step 1 — Contract test harness tests/contracts/test_openapi_baseline.py diffs the live FastAPI /openapi.json against tests/contracts/openapi.baseline.json on every test run. Fails on removed paths, removed status codes, or new required request body fields. Regenerate only via tests/contracts/regenerate_baseline.py after a consumer-updated contract change. This is the safety harness for all subsequent refactor commits. ## Phase 1 Step 2 — models.py split (1466 -> 85 LOC shim) compliance/db/models.py is decomposed into seven sibling aggregate modules following the existing repo pattern (dsr_models.py, vvt_models.py, ...): regulation_models.py (134) — Regulation, Requirement control_models.py (279) — Control, Mapping, Evidence, Risk ai_system_models.py (141) — AISystem, AuditExport service_module_models.py (176) — ServiceModule, ModuleRegulation, ModuleRisk audit_session_models.py (177) — AuditSession, AuditSignOff isms_governance_models.py (323) — ISMSScope, Context, Policy, Objective, SoA isms_audit_models.py (468) — Finding, CAPA, MgmtReview, InternalAudit, AuditTrail, Readiness models.py becomes an 85-line re-export shim in dependency order so existing imports continue to work unchanged. Schema is byte-identical: __tablename__, column definitions, relationship strings, back_populates, cascade directives all preserved. All new sibling files are under the 500-line hard cap; largest is isms_audit_models.py at 468. No file in compliance/db/ now exceeds the hard cap. ## Phase 1 Step 3 — infrastructure only backend-compliance/compliance/{schemas,domain,repositories}/ packages are created as landing zones with docstrings. compliance/domain/ exports DomainError / NotFoundError / ConflictError / ValidationError / PermissionError — the base classes services will use to raise domain-level errors instead of HTTPException. 
PHASE1_RUNBOOK.md at backend-compliance/PHASE1_RUNBOOK.md documents the nine-step execution plan for Phase 1: snapshot baseline, characterization tests, split models.py (this commit), split schemas.py (next), extract services, extract repositories, mypy --strict, coverage. ## Verification backend-compliance/.venv-phase1: uv python install 3.12 + pip install -r requirements.txt PYTHONPATH=. pytest compliance/tests/ tests/contracts/ -> 173 passed, 0 failed, 35 warnings, OpenAPI 360/484 unchanged Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 13:18:29 +02:00
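The contract harness's three failure rules (removed paths, removed status codes, new required request-body fields) can be expressed as a diff over two OpenAPI documents. This is a simplified sketch, not the repo's `test_openapi_baseline.py`. It inspects only `application/json` request bodies and resolves local `$ref`s. As an extra assumption, it also reports a method that disappeared from a surviving path.

```python
from typing import Any, Dict, List, Set

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def _resolve(schema: Dict[str, Any], spec: Dict[str, Any]) -> Dict[str, Any]:
    """Follow local refs such as '#/components/schemas/Foo'."""
    ref = schema.get("$ref")
    if not ref:
        return schema
    node: Any = spec
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return _resolve(node, spec)


def _required_body_fields(op: Dict[str, Any], spec: Dict[str, Any]) -> Set[str]:
    content = op.get("requestBody", {}).get("content", {})
    schema = content.get("application/json", {}).get("schema", {})
    return set(_resolve(schema, spec).get("required", []))


def contract_breaks(baseline: Dict[str, Any], live: Dict[str, Any]) -> List[str]:
    """List backwards-incompatible differences between baseline and live specs."""
    problems: List[str] = []
    live_paths = live.get("paths", {})
    for path, base_item in baseline.get("paths", {}).items():
        if path not in live_paths:
            problems.append(f"removed path: {path}")
            continue
        for method, base_op in base_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            live_op = live_paths[path].get(method)
            label = f"{method.upper()} {path}"
            if live_op is None:
                problems.append(f"removed operation: {label}")
                continue
            live_codes = live_op.get("responses", {})
            for code in base_op.get("responses", {}):
                if code not in live_codes:
                    problems.append(f"removed status {code}: {label}")
            added = _required_body_fields(live_op, live) - _required_body_fields(base_op, baseline)
            for name in sorted(added):
                problems.append(f"new required field {name!r}: {label}")
    return problems
```

Additive changes (new paths, new optional fields, new status codes) produce no findings, which is what lets refactor commits keep the 360/484 baseline green.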
Benjamin Admin	712fa8cb74	feat: Pass 0b quality — negative actions, container detection, session object classes 4 error class fixes from AUTH-1052 quality review: 1. Prohibitive action types (prevent/exclude/forbid) for "dürfen keine" ("must not"), "verboten" ("forbidden"), etc. 2. Container object detection (Sitzungsverwaltung [session management], Token-Schutz [token protection] → _requires_decomposition) 3. Session-specific object classes (session, cookie, jwt, federated_assertion) 4. Session lifecycle actions (invalidate, issue, rotate, enforce) with templates + severity caps 76 new tests (303 total), all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 17:24:19 +01:00
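Fix 1 amounts to recognizing prohibitive phrasing in German obligation text and assigning a prohibitive action type. This minimal sketch is hypothetical in two respects: the pattern list, and the assignment of each phrase to a particular action type. The commit names only the action types and the example phrases.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical patterns; the real classifier's phrase list is not shown.
_PROHIBITIVE_PATTERNS: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"\bdürfen\s+keine[nrs]?\b", re.IGNORECASE), "prevent"),
    (re.compile(r"\bverboten\b", re.IGNORECASE), "forbid"),
    (re.compile(r"\bausgeschlossen\b", re.IGNORECASE), "exclude"),
]


def detect_prohibitive_action(text: str) -> Optional[str]:
    """Return a prohibitive action type if the obligation text forbids something."""
    for pattern, action_type in _PROHIBITIVE_PATTERNS:
        if pattern.search(text):
            return action_type
    return None
```

Without such a check, a sentence like "Passwörter dürfen keine ..." would be misread as a positive "implement" obligation.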
Benjamin Admin	f8d9919b97	Improve object normalization: shorter keys, synonym expansion, qualifier stripping - Truncate object keys to 40 chars (was 80) at underscore boundary - Strip German qualifying prepositional phrases (bei/für/gemäß/von/zur/...) - Add 65 new synonym mappings for near-duplicate patterns found in analysis - Strip trailing noise tokens (articles/prepositions) - Add _truncate_at_boundary() helper and _QUALIFYING_PHRASE_RE regex - 11 new tests for normalization improvements (227 total pass) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 08:55:48 +01:00
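Truncating "at underscore boundary" means cutting at the last `_` that fits rather than mid-token. Below is a possible shape for a helper like `_truncate_at_boundary()`. It is a sketch, not the repo's implementation. It falls back to a hard cut when there is no separator to break on.

```python
def truncate_at_boundary(key: str, max_len: int = 40, sep: str = "_") -> str:
    """Shorten key to at most max_len chars, cutting at the last separator."""
    if len(key) <= max_len:
        return key
    head = key[:max_len]
    if key[max_len] == sep:
        cut = max_len  # the cut already falls exactly on a boundary
    else:
        cut = head.rfind(sep)
    if cut <= 0:
        return head  # no separator to break on: hard cut
    return key[:cut].rstrip(sep) or head
```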
Benjamin Admin	fb2cf29b34	fix: Pass 0b — Duplicate Guard, severity calibration, title truncation 1. Duplicate Guard: a merge_hint lookup before the INSERT in _write_atomic_control() prevents semantically identical controls under the same parent. 2. Severity calibration: based on action_type instead of being inherited blindly from the parent. define/review/test → max medium, implement/monitor → max high. 3. Title truncation: cut at the end of a word instead of mid-word. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 08:38:33 +01:00
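Fixes 2 and 3 can be sketched as follows. The caps come from the commit message. The sketch also assumes three things: severities are ordered low < medium < high < critical, uncapped action types keep the parent's severity, and titles have a hypothetical default limit of 80 characters.

```python
from typing import Dict, List

SEVERITY_ORDER: List[str] = ["low", "medium", "high", "critical"]

# Caps from the commit message; other action types stay uncapped (assumption).
SEVERITY_CAP: Dict[str, str] = {
    "define": "medium",
    "review": "medium",
    "test": "medium",
    "implement": "high",
    "monitor": "high",
}


def calibrate_severity(parent_severity: str, action_type: str) -> str:
    """Take the parent's severity but never exceed the action type's cap."""
    cap = SEVERITY_CAP.get(action_type)
    if cap is None:
        return parent_severity
    rank = min(SEVERITY_ORDER.index(parent_severity), SEVERITY_ORDER.index(cap))
    return SEVERITY_ORDER[rank]


def truncate_title(title: str, max_len: int = 80) -> str:
    """Cut at the last word boundary instead of mid-word."""
    if len(title) <= max_len:
        return title
    cut = title.rfind(" ", 0, max_len + 1)
    return (title[:cut] if cut > 0 else title[:max_len]).rstrip()
```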
Benjamin Admin	f39e5a71af	feat: Obligation deduplication — 34,617 duplicates marked as 'duplicate' New endpoints POST /obligations/dedup and GET /obligations/dedup-stats. For each candidate_id the oldest entry is kept; all others receive release_state='duplicate' with merged_into_id + quality_flags for traceability. The detail view filters out duplicates. MkDocs updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 20:13:00 +01:00
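The "keep the oldest per candidate_id" rule is illustrated below in memory. The endpoint presumably does this in SQL. The row shape (`id`, `candidate_id`, `created_at`, `release_state`) and the `quality_flags` list format are assumptions, and ties on `created_at` are broken by `id`. Rows that are already marked are skipped, so re-running the job is a no-op.

```python
from typing import Dict, List


def mark_duplicates(rows: List[dict]) -> int:
    """Keep the oldest row per candidate_id; mark the rest as duplicates.

    Returns the number of rows newly marked.
    """
    active = [r for r in rows if r.get("release_state") != "duplicate"]
    keeper: Dict[str, dict] = {}
    for row in sorted(active, key=lambda r: (r["created_at"], r["id"])):
        keeper.setdefault(row["candidate_id"], row)  # first seen = oldest
    marked = 0
    for row in active:
        kept = keeper[row["candidate_id"]]
        if row is kept:
            continue
        row["release_state"] = "duplicate"
        row["merged_into_id"] = kept["id"]  # traceability back to the survivor
        row.setdefault("quality_flags", []).append("duplicate_of_candidate")
        marked += 1
    return marked
```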
Benjamin Admin	52e463a7c8	feat: faceted search — dropdown counts adapt to active filters Backend: controls-meta accepts all filter parameters and computes faceted counts (each dimension is counted with all OTHER filters applied). New facets: severity, verification_method, category, evidence_type, release_state, in addition to domains, sources, type_counts. Frontend: loadMeta reloads on every filter change, so all dropdowns show context-sensitive counts. The proxy forwards the filters to controls-meta. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 15:00:40 +01:00
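The faceted-count rule in 52e463a7c8 applies every active filter except the facet's own. It can be illustrated in memory like this, assuming simple equality filters. The backend likely computes this in SQL, and `faceted_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List


def faceted_counts(rows: List[dict], filters: Dict[str, str],
                   facets: Iterable[str]) -> Dict[str, Counter]:
    """For each facet, count values under every active filter EXCEPT that facet's own."""
    result = {}
    for facet in facets:
        others = {k: v for k, v in filters.items() if k != facet}
        matching = (r for r in rows if all(r.get(k) == v for k, v in others.items()))
        result[facet] = Counter(r.get(facet) for r in matching)
    return result
```

Excluding the facet's own filter keeps the other options of that dropdown visible with meaningful counts. Otherwise, selecting "high" would collapse the severity dropdown to a single entry.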
Benjamin Admin	81c9ce5de3	fix: V1 enrichment — Qdrant collection + parent resolution for regulatory matches The atomic_controls_dedup collection (51k points) contains only atomic controls without a source_citation. The parent control, which carries the legal basis, is now resolved instead. Deduplication by parent UUID prevents multiple entries for the same regulation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 10:52:41 +01:00
Benjamin Admin	db7c207464	feat: V1 control enrichment — "Eigenentwicklung" (in-house) label, regulatory matching & comparison view 863 v1 controls (written manually, without a legal basis) are labeled "Eigenentwicklung" (in-house development) and automatically matched against regulatory controls (DSGVO, NIS2, OWASP, etc.) by embedding similarity. Backend: - Migration 080: v1_control_matches table (cross-reference) - v1_enrichment.py: batch matching via BGE-M3 + Qdrant (threshold 0.75) - 3 new API endpoints: enrich-v1-matches, v1-matches, v1-enrichment-stats - 6 tests (dry-run, execution, matches, pagination, detection) Frontend: - Orange "Eigenentwicklung" badge instead of the gray "v1" badge (when there is no source) - "Regulatorische Abdeckung" (regulatory coverage) section in ControlDetail with match cards - Side-by-side V1CompareView (in-house vs. covered by regulation) - Prev/Next navigation through all matches Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 10:32:08 +01:00
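The threshold step in db7c207464 can be shown locally. The sketch assumes that similarity is cosine similarity over the BGE-M3 vectors. In the real pipeline Qdrant performs the search, and the collection's distance metric is not stated. `match_v1_control` is a hypothetical name, and the vectors here are toy 2-D values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_v1_control(query: Sequence[float], candidates: Dict[str, Sequence[float]],
                     threshold: float = 0.75) -> List[Tuple[str, float]]:
    """Return (control_id, score) pairs at or above the threshold, best first."""
    scored = ((cid, cosine(query, vec)) for cid, vec in candidates.items())
    hits = [(cid, s) for cid, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```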
Benjamin Admin	23dd5116b3	feat: LLM-based rationale backfill for atomic controls POST /controls/backfill-rationale replaces the placeholder "Aus Obligation abgeleitet." ("Derived from obligation.") with LLM-generated rationales (Ollama/qwen3.5). Optimization: groups ~86k controls by ~7k parents, with one LLM call per parent. Pagination via batch_size/offset for controlled execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 23:01:49 +01:00
Benjamin Admin	5e9cab6ab5	feat: evidence_type field (code/process/hybrid) for controls A new field on canonical_controls classifies whether a control is evidenced technically in source code (code), organizationally through documents (process), or both (hybrid). Includes a backfill endpoint, frontend badge/filter and MkDocs documentation. - Migration 079: evidence_type VARCHAR(20) + index - Backend: filter, backfill endpoint with domain heuristic, CRUD - Frontend: EvidenceTypeBadge (sky/amber/violet), "Nachweisart" (evidence type) dropdown - Proxy: evidence_type passthrough for controls + controls-count - Tests: 22 tests for the classification heuristic - Docs: dedicated MkDocs chapter with a Mermaid diagram Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 21:53:40 +01:00
Benjamin Admin	a29bfdd588	fix: normative_strength 'may' instead of 'can' (DB constraint) The DB constraint allows only must/should/may; 'can' does not exist. All references to 'can' have been replaced with 'may'. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 08:35:16 +01:00
Benjamin Admin	230fbeb490	feat: three-tier model of normative bindingness + duplicate filter + auto-deploy - Source-type classification (58 regulations: law/guideline/framework) - Backfill endpoint POST /controls/backfill-normative-strength - exclude_duplicates filter for the Control Library (backend + proxy + UI toggle) - MkDocs chapter: normative bindingness, with Mermaid diagrams - scripts/deploy.sh: auto-push + Mac Mini rebuild + Coolify health monitoring - 26 unit tests for the classification logic Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 08:18:00 +01:00
Benjamin Admin	6d3bdf8e74	feat: control detail provenance + atomic controls page Backend: provenance endpoint (obligations, doc refs, merged duplicates, regulations summary) + atomic-stats aggregation endpoint. Frontend: ControlDetail with provenance sections and clickable navigation; new /sdk/atomic-controls page with a stats bar and a filtered list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 10:38:34 +01:00
Benjamin Admin	770f0b5ab0	fix: adapt batch dedup to NULL pattern_id — group by merge_group_hint All Pass 0b controls have pattern_id=NULL. Rewritten to: - Phase 1: Group by merge_group_hint (action:object:trigger), 52k groups - Phase 2: Cross-group embedding search for semantically similar masters - Qdrant search uses unfiltered cross-regulation endpoint - API param changed: pattern_id → hint_filter Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 07:24:02 +01:00
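Phase 1 of 770f0b5ab0 groups controls by merge_group_hint. It can be sketched as collapsing each group onto a single master. The sketch assumes each control carries a numeric `quality_score`, a hypothetical field; the runner's actual scoring is not described. The Phase 2 embedding search and the title-identical short-circuit from 35784c35eb are not shown, and `merge_by_hint` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def merge_by_hint(controls: List[dict]) -> Dict[str, str]:
    """Map each merged control id to the id of its group's master."""
    groups = defaultdict(list)
    for control in controls:
        groups[control["merge_group_hint"]].append(control)
    merged_into = {}
    for members in groups.values():
        # Highest quality_score becomes master; the id breaks ties deterministically.
        master = max(members, key=lambda c: (c["quality_score"], c["id"]))
        for control in members:
            if control is not master:
                merged_into[control["id"]] = master["id"]
    return merged_into
```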
Benjamin Admin	35784c35eb	feat: Batch Dedup Runner — 85k→~18-25k Master Controls Adds batch orchestration for deduplicating ~85k Pass 0b atomic controls into ~18-25k unique masters with M:N parent linking. New files: - migrations/078_batch_dedup.sql: merged_into_uuid column, perf indexes, link_type CHECK extended for cross_regulation - batch_dedup_runner.py: BatchDedupRunner with quality scoring, merge-hint grouping, title-identical short-circuit, parent-link transfer, and cross-regulation pass - tests/test_batch_dedup_runner.py: 21 tests (all passing) Modified: - control_dedup.py: optional collection param on Qdrant functions - crosswalk_routes.py: POST/GET batch-dedup endpoints Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 07:06:38 +01:00
Benjamin Admin	cce2707c03	fix: update 61 outdated test mocks to match current schemas Tests were failing due to stale mock objects after schema extensions: - DSFA: add _mapping property to _DictRow, use proper mock instead of MagicMock - Company Profile: add 6 missing fields (project_id, offering_urls, etc.) - Legal Templates/Policy: update document type count 52→58 - VVT: add 13 missing attributes to activity mock - Legal Documents: align consent test assertions with production behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-24 06:40:42 +01:00
Benjamin Admin	e6201d5239	feat: Anti-Fake-Evidence System (Phase 1-4b) Implement full evidence integrity pipeline to prevent compliance theater: - Confidence levels (E0-E4), truth status tracking, assertion engine - Four-Eyes approval workflow, audit trail, reject endpoint - Evidence distribution dashboard, LLM audit routes - Traceability matrix (backend endpoint + Compliance Hub UI tab) - Anti-fake badges, control status machine, normative patterns - 2 migrations, 4 test suites, MkDocs documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 17:15:45 +01:00
Benjamin Admin	48ca0a6bef	feat: Framework Decomposition Engine + Composite Detection for Pass 0b Adds a routing layer between Pass 0a and Pass 0b that classifies obligations into atomic/compound/framework_container. Framework-container obligations (e.g. "CCM-Praktiken fuer AIS", i.e. "CCM practices for AIS") are decomposed into concrete sub-obligations via an internal framework registry before Pass 0b composition. - New: framework_decomposition.py with routing, matching, decomposition - New: Framework registry (NIST SP 800-53, OWASP ASVS, CSA CCM) as JSON - New: Composite detection flags on atomic controls (is_composite, atomicity) - New: gen_meta fields: framework_ref, framework_domain, decomposition_source - Integration: _route_and_compose() in run_pass0b() deterministic path - 248 tests (198 decomposition + 50 framework), all passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 12:11:55 +01:00
Benjamin Admin	1a63f5857b	feat: Deterministic Control Composition Engine v2 for Pass 0b Replace LLM-based Pass 0b with rule-based deterministic engine that composes atomic controls from obligation data without any LLM call. Engine features: - 24 action types (implement, configure, define, document, maintain, review, monitor, assess, audit, test, verify, validate, report, notify, train, restrict_access, encrypt, delete, retain, ensure, approve, remediate, perform, obtain) - 19 object classes (policy, procedure, register, record, report, technical_control, access_control, cryptographic_control, configuration, account, system, data, interface, role, training, incident, risk_artifact, process, consent) - Compound action splitting with no-split phrases - Title pattern: "{Object} {state_suffix}" (24 state suffixes) - Statement field: "{condition} {object} ist {trigger} {action}" - Pattern candidates for downstream categorization (26 specific combos + 24 action fallbacks) - Structured timing: deadline_hours + frequency extraction - Confidence scoring (0.3 base + action/object/trigger/template) - Merge group hints for dedup: "{action}:{norm_obj}:{trigger}" - Synonym-based object normalization (50+ German synonyms) - 16 specific (action_type, object_class) template overrides - Output validator with required fields (Pflichtfelder) + negative rules (Negativregeln) + warning rules (Warnregeln) - All new fields serialized into gen_meta JSONB (no migration needed) Tests: 185 passed (33 new tests covering all engine components) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 11:05:48 +01:00
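The merge group hint in 1a63f5857b has the shape "{action}:{norm_obj}:{trigger}", and it relies on synonym-based object normalization. A minimal sketch follows. The four-entry synonym table is a hypothetical excerpt; the commit mentions 50+ German synonyms. The `'none'` placeholder for a missing trigger is an assumption. `normalize_object` and `merge_group_hint` are hypothetical names.

```python
from typing import Optional

# Hypothetical excerpt of a synonym table; the engine's real table has 50+ entries.
OBJECT_SYNONYMS = {
    "richtlinie": "policy",
    "leitlinie": "policy",
    "verfahren": "procedure",
    "verzeichnis": "register",
}


def normalize_object(obj: str) -> str:
    key = obj.strip().lower()
    return OBJECT_SYNONYMS.get(key, key.replace(" ", "_"))


def merge_group_hint(action: str, obj: str, trigger: Optional[str]) -> str:
    """Build the "{action}:{norm_obj}:{trigger}" dedup key."""
    return f"{action}:{normalize_object(obj)}:{trigger or 'none'}"  # 'none' placeholder is an assumption
```

Because synonyms collapse to one object class, "Richtlinie" and "Leitlinie" obligations with the same action and trigger land in the same dedup group.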
Benjamin Admin	ac6134ce6d	feat: control_parent_links population + traceability API + frontend - _write_atomic_control() now uses RETURNING id and inserts into control_parent_links (M:N) with source_regulation, source_article, and obligation_candidate_id parsed from parent's source_citation - New _parse_citation() helper for JSONB source_citation extraction - New GET /controls/{id}/traceability endpoint returning full chain: parent links with obligations, child controls, source_count - Backend: control_type filter (atomic/rich) for controls + count - Frontend: "Rechtsgrundlagen" (legal bases) section in ControlDetail showing all parent links per source regulation with obligation text + strength - Frontend: Atomic/Rich filter dropdown in Control Library list - Frontend: GenerationStrategyBadge recognizes 'pass0b' strategy - Tests: 3 new tests for parent_link creation + citation parsing, existing batch test mock updated for RETURNING clause Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 08:14:29 +01:00
Benjamin Admin	a14e2f3a00	feat(decomposition): add merge pass, enrichment, and Pass 0b refinements Add obligation refinement pipeline between Pass 0a and 0b: - Merge pass: rule-based dedup of implementation-level duplicate obligations within the same parent control (Jaccard similarity on action+object) - Enrich pass: classify trigger_type (event/periodic/continuous) and detect is_implementation_specific from obligation text (regex-based, no LLM) - Pass 0b: skip merged obligations, cap severity for impl-specific, override category to 'testing' for test obligations - Migration 075: merged_into_id, trigger_type, is_implementation_specific - Two new API endpoints: merge-obligations, enrich-obligations - 30+ new tests (122 total, all passing) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-21 22:27:09 +01:00
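The merge pass in a14e2f3a00 uses Jaccard similarity on action+object. It can be sketched as whitespace tokenization followed by a greedy first-match merge within one parent control. Several details are assumptions: the 0.8 threshold, the tokenization, and the greedy strategy. `tokens`, `jaccard` and `merge_obligations` are hypothetical names.

```python
from typing import Dict, List, Set


def tokens(obligation: dict) -> Set[str]:
    return set(f"{obligation['action']} {obligation['object']}".lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_obligations(obligations: List[dict], threshold: float = 0.8) -> Dict[str, str]:
    """Within ONE parent control, map each near-duplicate obligation id to the kept id."""
    kept: List[dict] = []
    merged_into: Dict[str, str] = {}
    for ob in obligations:
        target = next((k for k in kept if jaccard(tokens(ob), tokens(k)) >= threshold), None)
        if target is None:
            kept.append(ob)
        else:
            merged_into[ob["id"]] = target["id"]
    return merged_into
```

With these inputs, "encrypt stored data at rest" shares 4 of 5 tokens with "encrypt data at rest" (J = 0.8), so it is merged.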
Benjamin Admin	643b26618f	feat: Control Library UI, dedup migration, QA tooling, docs - Control Library: parent control display, ObligationTypeBadge, GenerationStrategyBadge variants, evidence string fallback - API: expose parent_control_uuid/id/title in canonical controls - Fix: DSFA SQLAlchemy 2.0 Row._mapping compatibility - Migration 074: control_parent_links + control_dedup_reviews tables - QA scripts: benchmark, gap analysis, OSCAL import, OWASP cleanup, phase5 normalize, phase74 gap fill, sync_db, run_job - Docs: dedup engine, RAG benchmark, lessons learned, pipeline docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-21 11:56:08 +01:00
Benjamin Admin	c52dbdb8f1	feat(rag): optimize RAG pipeline — JSON-Mode, CoT, Hybrid Search, Re-Ranking, Cross-Reg Dedup, chunk 1024 Phase 1 (LLM Quality): - Add format=json to all Ollama payloads (obligation_extractor, control_generator, citation_backfill) - Add Chain-of-Thought analysis steps to Pass 0a/0b system prompts Phase 2 (Retrieval Quality): - Hybrid search via Qdrant Query API with RRF fusion + automatic text index (legal_rag.go) - Fallback to dense-only search if Query API unavailable - Cross-encoder re-ranking with BGE Reranker v2 (RERANK_ENABLED=false by default) - CPU-only PyTorch dependency to keep Docker image small Phase 3 (Data Layer): - Cross-regulation dedup pass (threshold 0.95) links controls across regulations - DedupResult.link_type field distinguishes dedup_merge vs cross_regulation - Chunk size defaults updated 512/50 → 1024/128 for new ingestions only - Existing collections and controls are NOT affected Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-21 11:49:43 +01:00
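In c52dbdb8f1, Qdrant's Query API performs the RRF fusion server-side. The idea behind it is sketched below. `k = 60` is the constant from the original RRF paper; Qdrant's internal constant may differ. The dense and sparse result lists are toy document ids, and `rrf_fuse` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; the id breaks exact ties.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF needs only ranks, not raw scores. Dense cosine scores and sparse/full-text scores live on different scales, so this property is what makes the fusion workable without normalization.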
Benjamin Admin	c3afa628ed	feat(sdk): vendor-compliance cross-module integration — VVT, obligations, TOM, loeschfristen Integrate the vendor-compliance module with four DSGVO modules to eliminate data silos and resolve the VVT processor tab's ephemeral state problem. - Reposition vendor-compliance sidebar from seq 4200 to 2500 (after VVT) - VVT: replace ephemeral ProcessorRecord state with Vendor-API fetch (read-only) - Obligations: add linked_vendor_ids (JSONB) + compliance check #12 MISSING_VENDOR_LINK - TOM: add vendor TOM-controls cross-reference table in overview tab - Loeschfristen: add linked_vendor_ids (JSONB) + vendor picker + document section - Migrations: 069_obligations_vendor_link.sql, 070_loeschfristen_vendor_link.sql - Tests: 12 new backend tests (125 total pass) - Docs: update obligations.md + vendors.md with cross-module integration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 13:59:43 +01:00
Benjamin Admin	4b1eede45b	feat(tom): audit document, compliance checks, 25 controls, canonical control mapping Phase A: TOM document HTML generator (12 sections, inline CSS, A4 print) Phase B: TOMDocumentTab component (org-header form, revisions, print/download) Phase C: 11 compliance checks with severity-weighted scoring Phase D: MkDocs documentation for TOM module Phase E: 25 new controls (63 → 88) in 13 categories Canonical Control Mapping (three-layer architecture): - Migration 068: tom_control_mappings + tom_control_sync_state tables - 6 API endpoints: sync, list, by-tom, stats, manual add, delete - Category mapping: 13 TOM categories → 17 canonical categories - Frontend: sync button + coverage card (Overview), drill-down (Editor), count of supporting controls ("belegende Controls") (Document) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 11:56:53 +01:00
Benjamin Admin	2a70441eaa	feat(sdk): VVT master libraries, process templates, Loeschfristen profiling + document VVT: Master library tables (7 catalogs), 500+ seed entries, process templates with instantiation, library API endpoints + 18 tests. Loeschfristen: Baseline catalog, compliance checks, profiling engine, HTML document generator, MkDocs documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 11:56:25 +01:00
Benjamin Admin	f2819b99af	feat(pipeline): v3 — scoped control applicability + source_type classification Phase 4: source_type (law/guideline/standard/restricted) on source_citation - NIST/OWASP/ENISA correctly shown as "Standard" instead of "Gesetzliche Grundlage" (statutory basis) - Dynamic frontend labels based on source_type - Backfill endpoint POST /v1/canonical/generate/backfill-source-type Phase v3: Scoped Control Applicability - 3 new fields: applicable_industries, applicable_company_size, scope_conditions - LLM prompt extended with 39 industries, 5 company sizes, 10 scope signals - All 5 generation paths (Rule 1/2/3, batch structure, batch reform) updated - _build_control_from_json: parsing + validation (string→list, size validation) - _store_control: writes 3 new JSONB columns - API: response models, create/update requests, SELECT queries extended - Migration 063: 3 new JSONB columns with GIN indexes - 110 generator tests + 28 route tests = 138 total, all passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-18 16:28:05 +01:00
Benjamin Admin	148c7ba3af	feat(qa): recital detection, review split, duplicate comparison Add _detect_recital() to QA pipeline — flags controls where source_original_text contains Erwägungsgrund (recital) markers instead of article text (28% of controls with source text affected). - Recital detection via regex + phrase matching in QA validation - 10 new tests (TestRecitalDetection), 81 total - ReviewCompare component for side-by-side duplicate comparison - Review mode split: "Duplikat-Verdacht" (suspected duplicate) vs. "Rule-3-ohne-Anchor" (Rule 3 without anchor) tabs - MkDocs: recital detection documentation - Detection script for bulk analysis (scripts/find_recital_controls.py) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-18 08:20:02 +01:00
Benjamin Admin	a9e0869205	feat(pipeline): pipeline_version v2, migration 062, docs + 71 tests - Add PIPELINE_VERSION=2 constant and pipeline_version column to canonical_controls and canonical_processed_chunks (migration 062) - Anthropic API decides chunk relevance via null-returns (skip_prefilter) - Annex/appendix chunks explicitly protected in prompts - Fix 6 failing tests (CRYP domain, _process_batch tuple return) - Add TestPipelineVersion + TestRegulationFilter test classes (10 new tests) - Add MkDocs page: control-generator-pipeline.md (541 lines) - Update canonical-control-library.md with v2 pipeline diagram - Update testing.md with 71-test breakdown table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 17:31:11 +01:00
Benjamin Admin	36ef34169a	Fix regulation_filter bypass for chunks without regulation_code Chunks without a regulation_code were silently passing through the filter in _scan_rag(), causing unrelated documents (e.g. Data Act, legal templates) to be included in filtered generation jobs. Now chunks without reg_code are skipped when regulation_filter is active. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 13:38:25 +01:00
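The corrected filter logic from 36ef34169a fits in a single predicate. The sketch combines it with the prefix match on regulation_code described in d22c47c9eb. `passes_regulation_filter` and the regulation codes in the test are hypothetical.

```python
from typing import Optional, Sequence


def passes_regulation_filter(reg_code: Optional[str],
                             regulation_filter: Optional[Sequence[str]]) -> bool:
    if not regulation_filter:
        return True   # no filter active: every chunk passes
    if not reg_code:
        return False  # the fix: chunks without a regulation_code are skipped, not let through
    # Prefix match on regulation_code, as described for regulation_filter in d22c47c9eb.
    return any(reg_code.startswith(prefix) for prefix in regulation_filter)
```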
Benjamin Admin	d22c47c9eb	feat(pipeline): Anthropic Batch API, source/regulation filter, cost optimization - Add Anthropic API support to decomposition Pass 0a/0b (prompt caching, content batching) - Add Anthropic Batch API (50% cost reduction, async 24h processing) - Add source_filter (ILIKE on source_citation) for regulation-based filtering - Add category_filter to Pass 0a for selective decomposition - Add regulation_filter to control_generator for RAG scan phase filtering (prefix match on regulation_code — enables CE + Code Review focus) - New API endpoints: batch-submit-0a, batch-submit-0b, batch-status, batch-process - 83 new tests (all passing) Cost reduction: $2,525 → ~$600-700 with all optimizations combined. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 13:22:01 +01:00
Benjamin Admin	825e070ed9	feat(multi-layer): complete Multi-Layer Control Architecture (Phases 1-8 + Pass 0) Implements the full Multi-Layer Control Architecture for migrating ~25,000 Rich Controls into atomic, deduplicated Master Controls with full traceability. Architecture: Legal Source → Obligation → Control Pattern → Master Control → Customer Instance New services: - ObligationExtractor: 3-tier extraction (exact → embedding → LLM) - PatternMatcher: 2-tier matching (keyword + embedding + domain-bonus) - ControlComposer: Pattern + Obligation → Master Control - PipelineAdapter: Pipeline integration + Migration Passes 1-5 - DecompositionPass: Pass 0a/0b — Rich Control → atomic Controls - CrosswalkRoutes: 15 API endpoints under /v1/canonical/ New DB schema: - Migration 060: obligation_extractions, control_patterns, crosswalk_matrix - Migration 061: obligation_candidates, parent_control_uuid tracking Pattern Library: 50 YAML patterns (30 core + 20 IT-security) Go SDK: Pattern loader with YAML validation and indexing Documentation: MkDocs updated with full architecture overview 500 Python tests passing across all components. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 09:00:37 +01:00
Benjamin Admin	4f6bc8f6f6	feat(training+controls): interactive video pipeline, training blocks, control generator, CE libraries Interactive Training Videos (CP-TRAIN): - DB migration 022: training_checkpoints + checkpoint_progress tables - NarratorScript generation via Anthropic (AI Teacher persona, German) - TTS batch synthesis + interactive video pipeline (slides + checkpoint slides + FFmpeg) - 4 new API endpoints: generate-interactive, interactive-manifest, checkpoint submit, checkpoint progress - InteractiveVideoPlayer component (HTML5 Video, quiz overlay, seek protection, progress tracking) - Learner portal integration with automatic completion on all checkpoints passed - 30 new tests (handler validation + grading logic + manifest/progress + seek protection) Training Blocks: - Block generator, block store, block config CRUD + preview/generate endpoints - Migration 021: training_blocks schema Control Generator + Canonical Library: - Control generator routes + service enhancements - Canonical control library helpers, sidebar entry - Citation backfill service + tests - CE libraries data (hazard, protection, evidence, lifecycle, components) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 21:41:48 +01:00
Benjamin Admin	c8fd9cc780	feat(control-library): document-grouped batching, generation strategy tracking, sort by source - Group chunks by regulation_code before batching for better LLM context - Add generation_strategy column (ungrouped=v1, document_grouped=v2) - Add v1/v2 badge to control cards in frontend - Add sort-by-source option with visual group headers - Add frontend page tests (18 tests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 15:10:52 +01:00
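The "document_grouped" strategy in c8fd9cc780 groups chunks by regulation_code before batching. It can be sketched as a stable sort on regulation_code followed by fixed-size batches that never cross a regulation boundary. Two details are assumptions: chunks without a code fall into one shared "" group, and batches never span two regulations. `document_grouped_batches` is a hypothetical name.

```python
from itertools import groupby
from typing import List


def document_grouped_batches(chunks: List[dict], batch_size: int) -> List[List[dict]]:
    def code(chunk: dict) -> str:
        return chunk.get("regulation_code") or ""

    # sorted() is stable, so chunk order within a regulation is preserved.
    ordered = sorted(chunks, key=code)
    batches: List[List[dict]] = []
    for _, group in groupby(ordered, key=code):
        members = list(group)
        # Split each regulation's chunks into batches of at most batch_size.
        for i in range(0, len(members), batch_size):
            batches.append(members[i:i + batch_size])
    return batches
```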

92 Commits