# Product Integration Specification

**Status:** Design Draft
**Authors:** Sharang, Benjamin
**Date:** 2026-05-11
**Companion docs:** PLATFORM_ARCHITECTURE.md, INFRASTRUCTURE.md
**Contract version:** 1.0

---

## 1. Purpose

This document defines the contract that **every product** must implement to be sold on the platform as a B2B building block. The contract is enforced by the Tenant Registry, Customer Portal, and Orca deployment pipeline. A product that does not implement the contract cannot be activated for a tenant.

The contract is designed to be **first-party today, third-party-ready tomorrow** — the technical surface is identical for our own products and any future external developers, with stricter verification gates for the latter.

---

## 2. Core Principles

```
1. ONE TENANT, ONE TRUTH
   Every request is scoped by org_id from the JWT. Cross-tenant data leakage is
   the single largest commercial risk; the spec treats it as a contract violation.

2. PLATFORM OWNS IDENTITY, BILLING, ROUTING
   Products NEVER implement their own login, never store passwords, never
   invoice customers directly. These are platform concerns.

3. PRODUCTS OWN THEIR DOMAIN AND DATA
   Products own their database, their data model, their backup, their RTO/RPO.
   No cross-product database sharing. Composition is via APIs, not via DB joins.

4. STATELESS APPLICATIONS, STATEFUL DATA STORES
   Application containers are replaceable in seconds. State lives in databases
   that have explicit backup contracts.

5. CONTRACT EVOLVES, PRODUCTS DECLARE COMPATIBILITY
   Products declare which contract version they implement. The platform supports
   N and N-1; deprecation is announced before removal.
```

---

## 3. Required Surfaces

A product is composed of five surfaces. Three are mandatory, one is tier-gated, one is mandatory documentation.

```
┌──────────────────────┬──────────────────────────────────────────┬───────────────┐
│ Surface              │ What                                     │ Requirement   │
├──────────────────────┼──────────────────────────────────────────┼───────────────┤
│ Backend API          │ REST + OpenAPI 3.0 spec                  │ REQUIRED      │
│ Frontend             │ Web component (custom element)           │ REQUIRED      │
│ MCP Server           │ MCP server exposing tenant-scoped tools  │ REQUIRED for  │
│                      │                                          │ Enterprise    │
│                      │                                          │ tier; opt for │
│                      │                                          │ Starter/Pro   │
│ Documentation        │ README, API docs, integration guide,     │ REQUIRED      │
│                      │ runbook, data model, GDPR retention      │               │
│ Observability        │ /health, /metrics, structured logs,      │ REQUIRED      │
│                      │ audit event emission                     │               │
└──────────────────────┴──────────────────────────────────────────┴───────────────┘
```

---

## 4. Backend API Contract

### 4.1 Mandatory Endpoints

Every product backend must implement these endpoints. The Tenant Registry health-checks them on deploy; any missing endpoint blocks the registration.

```
GET /health
    Returns 200 if healthy, 503 if unhealthy.
    Body: {"status": "ok"|"degraded"|"down", "checks": {"db": "ok", "deps": "ok"}}
    Authentication: NONE (Orca probe)

GET /version
    Returns product version and contract version it implements.
    Body: {"product": "certifai", "version": "1.4.2", "contract": "1.0",
           "build": "", "deployed_at": "2026-05-10T..."}
    Authentication: NONE

GET /v1/usage
    Query: ?tenant_id=&from=&to=&project_id=
    Returns billing-relevant usage metrics for a tenant (and optional project).
    Body: {"tenant_id": "...", "project_id": "...", "period": {...},
           "metrics": {"seats_active": 12, "api_calls": 14203, ...}}
    Authentication: SERVICE TOKEN (called by billing job)
    Note: products with high-cardinality usage (LLM tokens, etc.) SHOULD also
    stream per-event metering to /internal/usage/events on Tenant Registry.
    Event shape is Lago-compatible (transaction_id, code,
    external_subscription_id, properties) so we can swap to a Lago instance
    later without changing producers.

POST /v1/tenants/{id}/provision
    Body: {"plan": "...", "config": {...}, "contract_version": "1.0"}
    Initializes tenant-specific resources (schemas, default data, queues).
    Must be idempotent: a second call with same params is a no-op.
    Authentication: SERVICE TOKEN (called by Tenant Registry)

POST /v1/tenants/{id}/suspend
    Soft-suspend: data retained, all customer access blocked.
    Authentication: SERVICE TOKEN

POST /v1/tenants/{id}/reactivate
    Reverse of suspend.
    Authentication: SERVICE TOKEN

POST /v1/tenants/{id}/terminate
    Hard terminate: schedules data for erasure per retention policy.
    Body: {"reason": "...", "scheduled_erasure_at": "..."}
    Authentication: SERVICE TOKEN

POST /v1/tenants/{id}/export
    GDPR Article 20 (data portability) export for a tenant.
    Returns: signed URL to a ZIP containing all tenant data in JSON + binary blobs.
    Authentication: USER JWT (IT_ADMIN or LEGAL role)

DELETE /v1/tenants/{id}/data
    GDPR Article 17 (right to erasure) full deletion.
    Body: {"confirm": ""} ← safety check
    Authentication: SERVICE TOKEN + USER JWT (IT_ADMIN signing off)
```

### 4.2 Authentication Modes

```
USER JWT
    Bearer token issued by Keycloak for a user session.
    Contains: sub, org_id, org_roles, products, plan.
    Validated via Keycloak JWKS endpoint.
    Used for: all customer-facing endpoints.

SERVICE TOKEN
    Short-lived JWT issued by Keycloak via OAuth 2.0 client_credentials flow.
    Each service has a Keycloak client (id: certifai-svc, compliance-svc, etc.)
    with declared scopes.
    Used for: platform-to-product calls (provisioning), product-to-product
    calls (inter-product API).
    TTL: 15 minutes max.

ORCA PROBE
    No authentication. Local network only.
    Used for: /health, /version (Orca polls these).
    Must not leak tenant data.
```

### 4.3 Tenant Scoping Rules

```
1. EVERY non-probe endpoint extracts tenant context from JWT or path.
   USER JWT      → tenant = jwt.org_id
   SERVICE TOKEN → tenant = path parameter, validated against service scopes

2. EVERY query to the product database includes WHERE tenant_id = $1.
   No exceptions. Code review enforces this; tests verify it.

3. EVERY response includes only data for the requested tenant.
   The product asserts this invariant in middleware (defense in depth).

4. EVERY log line and audit event includes tenant_id as a structured field.
```

### 4.4 OpenAPI Spec

Every product publishes `openapi.yaml` at `/openapi.yaml`. The Tenant Registry pulls this on product registration and validates that the mandatory endpoints from §4.1 are present with correct signatures.

```
Product OpenAPI must:
- Be valid OpenAPI 3.0 (3.1 not yet — tooling gap)
- Include all mandatory endpoints from §4.1
- Document all custom endpoints with examples
- Declare authentication mode for each endpoint
- Declare scopes consumed (for SERVICE TOKEN endpoints)
- Include error response schemas (4xx, 5xx)
```

---

## 5. Frontend Contract

A product declares its frontend type in the manifest. The portal renders accordingly. Three types are supported:

```
interactive   Full UI shipped as a web component custom element.
              Customer OPERATES the product through this UI.
              Examples: CERTifAI, breakpilot-compliance, classic SaaS products.

widget        Only a small dashboard tile component; no full product page.
              Customer SEES product output in a tile; deeper management
              happens on a portal-rendered config page.
              Examples: monitoring, status reporting.

headless      No frontend code at all. The portal renders a generic
              management UI from a portal_config block in the manifest.
              Customer CONFIGURES (API keys, webhooks) and their own systems
              consume the product via API/MCP.
              Examples: notetaker bot, document classifier, webhook router.
```

The portal branches its rendering on `manifest.frontend.type`. Backend, MCP, observability, and lifecycle contracts are identical across all three types — only the customer-facing surface changes.
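
As a rough illustration of that branching, the sketch below maps `manifest.frontend.type` to the surface the portal would render. The function name `select_surface` and the three label constants are hypothetical and exist only for this example; the type names themselves are the ones listed above.

```python
from typing import Any

# Hypothetical labels for the three rendering paths described in this section.
FULL_PAGE_ELEMENT = "full-page custom element"
DASHBOARD_TILE = "dashboard tile + portal-rendered config page"
GENERIC_MANAGEMENT_UI = "generic management UI from portal_config"


def select_surface(manifest: dict[str, Any]) -> str:
    """Return which customer-facing surface the portal would render."""
    frontend_type = manifest.get("frontend", {}).get("type")
    if frontend_type == "interactive":
        return FULL_PAGE_ELEMENT
    if frontend_type == "widget":
        return DASHBOARD_TILE
    if frontend_type == "headless":
        return GENERIC_MANAGEMENT_UI
    # Unknown types are rejected rather than silently rendered.
    raise ValueError(f"unsupported frontend type: {frontend_type!r}")
```

Rejecting unknown types up front keeps a typo in the manifest from falling through to a default rendering path.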

### 5.A Interactive (Web Component)

The frontend is a **custom element** registered with a product-specific tag name. The Customer Portal loads the bundle and renders the element with attributes passed in by the portal.

#### 5.A.1 Why web components

Our products span Rust/Dioxus, Next.js, Go, Python. Web components are the only framework-agnostic surface that lets all of these ship a frontend without forcing a stack rewrite:

- CERTifAI compiles Dioxus → WASM → wraps in a custom element
- breakpilot-compliance wraps React components via `@r2wc/react-to-web-component`
- Any future Vue/Svelte/Solid product also works

#### 5.A.2 The Tag Contract

Each product declares ONE primary tag in its manifest. The portal renders it like this:

```html
```

Attributes the portal passes (the product MUST handle these):

```
tenant               tenant slug (acme)
tenant-id            tenant UUID (uuid-acme)
jwt                  short-lived JWT (≤ 5 min), product validates against Keycloak JWKS
locale               en / de / fr / es / pt
theme                light / dark
api-base             backend URL the product should call
audit-callback-url   URL to POST audit events to (portal-relative)
```

#### 5.A.3 Events the Product Must Emit

The component emits events upward via `CustomEvent`. The portal listens for these and integrates them:

```
breakpilot:navigate
    Detail: {path: "/sub/route", title: "Page Title"}
    Portal updates browser URL + breadcrumb without reloading.

breakpilot:error
    Detail: {code: "...", message: "...", recoverable: true|false}
    Portal shows toast / blocking error.

breakpilot:audit
    Detail: {action: "...", target: "...", metadata: {...}}
    Portal forwards to central audit log via audit-callback-url.

breakpilot:loading
    Detail: {state: "start"|"end", description: "Generating DSFA..."}
    Portal shows progress indicator.

breakpilot:request-upgrade
    Detail: {feature: "...", required_plan: "enterprise"}
    Portal opens upgrade-quote flow.
```

#### 5.A.4 Design System Compatibility

The platform publishes `@breakpilot/design-tokens` (CSS variables, fonts, spacing).
Products are encouraged but not required to consume it. The portal injects design tokens into the shadow DOM root so consuming them is a single CSS line:

```css
:host {
  color: var(--bp-text);
  background: var(--bp-surface);
}
```

Products that ship custom styling must respect the `theme` attribute and the prefers-color-scheme media query.

#### 5.A.5 Bundle Loading

```
Product publishes a bundle at:
  https://cdn.breakpilot.com/products/{name}/{version}/element.js

Portal loads it lazily via dynamic import when the user navigates to
/[tenant]/products/{name}.

Portal caches the bundle URL per product version (declared in
tenant_products.config).

Bundle size budget: ≤ 500KB gzipped for first load.
```

### 5.B Widget

A widget product declares ONE custom element that renders only as a dashboard tile. It receives the same attributes as interactive products but emits no `breakpilot:navigate` events — clicking the tile takes the user to a portal-rendered config page (same surface as headless products in §5.C).

```html
```

Constraints:

```
Bundle size budget   ≤ 50KB gzipped (widgets load eagerly on dashboard)
Dimensions           declared in manifest (e.g., 200×120 or 400×240)
Refresh              widget polls own API; portal does not push updates
Allowed events       breakpilot:error, breakpilot:audit, breakpilot:request-upgrade
                     (NOT breakpilot:navigate — click-through is portal-controlled)
```

API keys, webhooks, and full management UI for widget products use the same portal-rendered config page as headless products (§5.C).

### 5.C Headless

The product ships NO frontend code. The portal renders a generic management UI from a `portal_config` block in the manifest. This page is served at `/[tenant]/products/{name}` and contains the same elements regardless of which product it is — populated entirely from manifest data.

#### 5.C.1 What the Portal Renders

```
┌──────────────────────────────────────────────────────────┐
│ Notetaker                                   [Status: OK] │
│ ──────────────────────────────────────────────────────── │
│                                                          │
│ USAGE (last 30 days)                                     │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 142 sessions processed        ▁▃▆█▆▄▂▃▅▆▄▂▁▂▃▄▅▆▇█ │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│ API KEYS                                    [+ Generate] │
│ ────────────────────────────────────────────────────     │
│ • prod-key       k_xxx...4f12    scopes: r,w   2026-01-04│
│ • staging-key    k_xxx...9a83    scopes: r     2026-04-22│
│                                                          │
│ WEBHOOKS                                         [+ Add] │
│ ────────────────────────────────────────────────────     │
│ • https://acme.example.com/notetaker/cb                  │
│   events: session.completed, session.failed              │
│   last 24h: 142 delivered, 0 failed               [Test] │
│                                                          │
│ CODE SAMPLES                                             │
│ ────────────────────────────────────────────────────     │
│ [curl] [JS] [Python]                                     │
│ curl -X POST https://notetaker-api.breakpilot.com/v1     │
│   -H "Authorization: ApiKey k_xxx"                       │
│   -H "X-Tenant: acme"                                    │
│   -d '{...}'                                             │
│                                                          │
│ DOCS ► developers.breakpilot.com/products/notetaker      │
└──────────────────────────────────────────────────────────┘
```

#### 5.C.2 Manifest Requirement

A headless product must include a `portal_config` block declaring:

- **sections**: which UI sections to render (subset of: `api_keys`, `webhooks`, `usage`, `docs`, `code_samples`, `custom_actions`)
- **webhook_events**: the catalog of events the product can emit
- **api_key_scopes**: the catalog of scopes that can be granted on a key
- **code_samples**: at least one language with a working request example
- **status_endpoint**: optional URL for the portal to poll for the status badge

See §10.1 for the full schema.

#### 5.C.3 API Keys

API keys are a portal concern, not a product concern. Tenant Registry generates and stores key hashes; the product validates incoming keys against `POST /internal/api-keys/verify` on Tenant Registry.
This means:

- Key rotation is portal-controlled
- Scope enforcement is consistent across all headless products
- Revocation is instant (registry updates a single row)

#### 5.C.4 Webhooks

The portal owns webhook configuration UI and delivery logging. Products POST event payloads to a portal endpoint (`/internal/webhooks/dispatch`); the portal handles signing, delivery, retry, dead-letter, and the customer-visible delivery log.

This keeps webhook UX consistent across all headless products and means a product cannot accidentally leak events from one tenant to another's webhook URL.

#### 5.C.5 No Impersonation

Backstage shows no "Impersonate" button for headless products — there is no UI to enter. Debugging is via API call logs, audit events, webhook delivery history, and admin actions declared in the manifest (e.g., "Flush Queue", "Rotate Keys", "Reset State").

---

## 6. MCP Server (Required for Enterprise)

### 6.1 What it is

An MCP (Model Context Protocol) server exposes the product's capabilities as tools that customer-side AI agents can call. The customer's IT Admin configures the MCP endpoint in their AI agent platform (Claude Desktop, Cursor, internal agents, etc.).

### 6.2 Required Behavior

```
1. ONE MCP server per product
   Endpoint: https://mcp.{product}.breakpilot.com
   (or unified mcp.breakpilot.com/{product})

2. Authentication via SCOPED API KEY
   Customer IT Admin generates API key in /[tenant]/settings/api-keys.
   Key carries tenant_id binding and scopes (read/write per product domain).
   No user JWT for MCP — agents authenticate as the org, not as a user.

3. Tools are tenant-scoped
   Every tool call uses the API key's tenant_id binding.
   Cross-tenant calls are impossible by construction.

4. Tool catalog declared in manifest
   Each tool: name, description, parameters (JSONSchema), required_scopes.

5. Audit every tool call
   Emit breakpilot:audit-equivalent server-side:
   actor=api_key_id, action=tool_name, metadata=parameters.
```

### 6.3 Example Tool Catalogs

```
CERTifAI MCP tools:
  list_ai_agents           → returns agents configured for this tenant
  get_llm_usage            → returns LiteLLM usage for date range
  run_news_search          → SearXNG search
  list_chat_sessions       → user's chat history

Compliance MCP tools:
  create_dsfa              → starts a DSFA workflow
  check_tom_status         → returns TOM compliance status
  list_dsr_requests        → returns open Data Subject Requests
  approve_dsfa             → marks DSFA as approved
  list_ai_act_assessments  → returns AI Act assessments
```

### 6.4 Activation

Enterprise customers automatically get MCP enabled. Starter/Pro customers see "Available on Enterprise" in the API Keys page. Tenant Registry checks `tenant.plan` before issuing an MCP API key.

---

## 7. Documentation Contract

A product ships five required documents. They are published at `developers.breakpilot.com/products/{name}/`.

```
1. README
   What does it do? Value prop in 200 words.
   Who is the typical user? What are the workflows?

2. API Reference
   Auto-generated from openapi.yaml.
   Hosted via Redoc or Stoplight Elements.

3. Integration Guide
   For customer IT teams. How to:
   - Enable the product on their tenant
   - Configure SSO and roles
   - Wire into their workflows
   - Use the MCP server (if applicable)
   - Generate and manage API keys

4. Operational Runbook
   For us. How to:
   - Deploy a new version
   - Roll back
   - Debug a stuck tenant
   - Reset tenant state
   - Investigate slow queries

5. Data Model + GDPR
   What data is stored, in which table/collection, personal data category
   (Art. 9 special category?), retention period, GDPR lawful basis.
   Used by customer DPOs for their own Verzeichnis.
```

---

## 8. Observability Contract

### 8.1 Health Check

```
GET /health returns:
  200 {"status": "ok", "checks": {...}}                  — all good
  200 {"status": "degraded", "checks": {...}, "reason"}  — degraded but serving
  503 {"status": "down", "checks": {...}, "reason"}      — restart me
```

Orca polls every 30s. Three consecutive 503s trigger an automatic restart.
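
A minimal sketch of the two rules in §8.1: the status-to-HTTP mapping a product would return, and the consecutive-503 counting that leads to a restart. `RestartTracker`, `health_http_code`, and the constant names are hypothetical; this models the described polling behavior and is not Orca's actual implementation.

```python
from dataclasses import dataclass

# Status-to-HTTP mapping as listed in the health check contract.
STATUS_TO_HTTP = {"ok": 200, "degraded": 200, "down": 503}

# Three consecutive 503 responses lead to a restart.
RESTART_THRESHOLD = 3


def health_http_code(status: str) -> int:
    """Translate a health status string into the HTTP code to return."""
    return STATUS_TO_HTTP[status]


@dataclass
class RestartTracker:
    """Counts consecutive failed probes for one product instance."""

    consecutive_failures: int = 0

    def observe(self, http_code: int) -> bool:
        """Record one probe result; return True when a restart is due."""
        if http_code == 503:
            self.consecutive_failures += 1
        else:
            # Any non-503 response breaks the streak.
            self.consecutive_failures = 0
        if self.consecutive_failures >= RESTART_THRESHOLD:
            self.consecutive_failures = 0
            return True
        return False
```

Note that a `degraded` response still returns 200, so it resets the failure streak and never contributes to a restart.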

### 8.2 Metrics

```
GET /metrics returns Prometheus exposition format.

Required metrics:
  bp_http_requests_total{method, route, status, tenant_id}
  bp_http_request_duration_seconds{method, route, tenant_id}
  bp_active_tenants_gauge
  bp_db_query_duration_seconds{operation}
  bp_external_api_calls_total{provider, status}   (LLM calls, etc.)
```

### 8.3 Structured Logging

```
All logs are JSON. All log lines include:

  ts           ISO-8601 timestamp
  level        debug|info|warn|error
  service      product name (certifai)
  tenant_id    tenant UUID (or "system" for non-tenant ops)
  user_sub     user UUID if applicable
  request_id   trace ID
  msg          human-readable message
  ...          additional structured fields

No PII in logs (use the PII redaction middleware from breakpilot-core).
```

### 8.4 Audit Events

Audit events go to the central audit log in Tenant Registry. Products emit them via POST to the audit-callback-url passed by the portal (frontend) or directly to Tenant Registry API (backend).

```
Event format (Retraced-shape — transformable 1:1 if we swap to BoxyHQ Retraced later):

{
  "tenant_id": "uuid-acme",              # → Retraced "group.id"
  "project_id": "uuid-prod" | null,      # optional sub-tenancy scope
  "product": "certifai",                 # which product emitted
  "actor": {
    "id": "user-uuid" | "svc:certifai" | "api_key:keyid",
    "type": "user" | "service" | "api_key",
    "name": "alice@acme.com"
  },
  "action": "dsfa.approve",              # dotted: .
  "crud": "u",                           # c|r|u|d
  "target": {
    "id": "",
    "type": "dsfa" | "llm_config" | ...,
    "name": ""
  },
  "source_ip": "1.2.3.4",
  "description": "Alice approved DSFA #42 for Customer Data Processing",
  "fields": {...},                       # additional structured metadata
  "created_at": "2026-05-11T14:23:01Z"
}

Mandatory event categories per product:
config changes       everything in product settings
data exports         anyone exporting tenant data
data deletions       erasures and bulk deletes
permission changes   role grants/revocations within product
approvals            business-significant approvals (DSFA, etc.)
cross-product calls  service-token calls into other products (auto-emitted
                     by both caller and callee, with on_behalf_of in fields)
```

The portal /audit page renders these events filtered by tenant + product + actor + action + time range. The schema is intentionally Retraced-compatible so the storage layer can be swapped without changing producers.

---

## 9. Plane-by-Plane Integration Requirements

### 9.1 Identity Plane

```
[REGISTRATION]
- Register an OIDC client in Keycloak (id: {product}-client)
  Confidential, client_credentials grant for service tokens,
  authorization_code grant if product has its own UI flows.
- Declare role mappings in product manifest:
    role_mappings:
      IT_ADMIN: Admin
      LEGAL: Auditor
      FINANCE: ReadOnly
      USER: Member
- Declare an entitlement key (e.g. "certifai") that goes into JWT products claim.

[RUNTIME]
- Validate JWT via Keycloak JWKS endpoint (cache JWKS for 5 min).
- Reject if products claim does not include this product's entitlement key.
- Reject if iss is not the platform Keycloak.
- Reject if exp expired or nbf future.

[NEVER]
- Never validate JWT against a static secret. JWKS only.
- Never issue tokens. Never accept passwords. Never store credentials.
```

### 9.2 Control Plane

```
[REGISTRATION]
- On first deploy, product POSTs to Tenant Registry:
    POST /catalog/products
    body: manifest (see §10)
  Tenant Registry verifies the manifest, pulls openapi.yaml, validates
  mandatory endpoints, registers the product.
- Product appears in Backstage product picker when creating sales orders.

[LIFECYCLE]
- On tenant.activate: Tenant Registry calls product /v1/tenants/{id}/provision
- On tenant.suspend:  calls /suspend
- On tenant.churn:    calls /terminate
- On contract.renew:  no call (idempotent: just stays active)

[USAGE METERING]
- Tenant Registry runs a daily job hitting product /v1/usage for billing.
- Product is responsible for accurate metering and idempotent reporting.

[BACKSTAGE ACTIONS]
- Product declares custom admin actions in manifest:
    admin_actions:
      - name: "Rebuild RAG Index"
        endpoint: POST /v1/tenants/{id}/admin/rebuild-rag
        confirm: required
        plane: data
- Backstage renders these as buttons on /backstage/tenants/{id}/products/{name}
- Calls are SERVICE TOKEN authenticated and audit-logged.

[AUDIT EVENTS]
- Product POSTs all audit events to Tenant Registry /audit endpoint.
- Tenant Registry stores them in audit_log table for cross-product unified view.
```

### 9.3 Data Plane

```
[DATA OWNERSHIP]
- Product owns its database. No other service queries it directly.
- Cross-product composition is via the inter-product service-token API (§11),
  never via shared DB connections.

[ISOLATION]
- Every table/collection has a tenant_id (or org_id) column.
- Every query filters by it.
- Database user permissions cannot bypass it.

[PROJECT SCOPING — OPTIONAL]
- Products MAY support sub-tenancy via projects (mirrors GCP Project / AWS
  Account pattern). Allows customers to separate dev / staging / prod or
  per-team data within a single tenant.
- Declared in manifest:
    data:
      supports_projects: true
- Implementation:
  - All tenant-scoped tables/collections add project_id column.
  - Compound unique constraints become (tenant_id, project_id, key).
  - All endpoints accept optional ?project_id=; absence means the
    tenant's default implicit project.
  - JWT may carry an active project_id claim; products SHOULD respect it
    if present.
- Reference implementation: breakpilot-compliance already uses this pattern
  (sdk_states UNIQUE on (tenant_id, project_id) since March 2026).
- Products that do NOT support projects must still gracefully ignore
  project_id parameters (return tenant-wide data).

[TENANT LIFECYCLE CONTRACT]
Products MUST honor the tenant.status passed in the JWT (`tenant_status` custom
claim) and behave per the table below. See PLATFORM_ARCHITECTURE.md P15 + P16
for the full state machine.

┌──────────┬───────────────────────────────────────────────────────────┐
│ status   │ Product behavior                                          │
├──────────┼───────────────────────────────────────────────────────────┤
│ demo     │ Accept all calls. Apply NO billing meter. Honor           │
│          │ /v1/tenants/demo/reset (idempotent). Seed from            │
│          │ catalog.demo.seed_data_url. Audit emitted but tagged      │
│          │ {"demo": true} so portal can hide from real audit.        │
├──────────┼───────────────────────────────────────────────────────────┤
│ trial    │ Accept all calls up to catalog.trial_quota; over quota    │
│          │ return 429 with header X-Trial-Limit-Reset. Show "Trial"  │
│          │ context in any product UI banner area provided by host.   │
├──────────┼───────────────────────────────────────────────────────────┤
│ active   │ Normal operation.                                         │
├──────────┼───────────────────────────────────────────────────────────┤
│ frozen   │ Per data.frozen_behavior in manifest (typically reads     │
│          │ allowed, writes return 402, background jobs paused).      │
│          │ /export MUST work; webhook deliveries MUST stop.          │
├──────────┼───────────────────────────────────────────────────────────┤
│ archived │ All API calls return 410 Gone. Data already deleted by    │
│          │ the offboarding step; this state is for audit only.       │
└──────────┴───────────────────────────────────────────────────────────┘

Products MUST implement:
- GET /v1/tenants/{id}/export
  Returns one ZIP per tenant containing every format declared in
  data.offboarding_export_formats. Synchronous OK if <60s; async with
  signed URL otherwise.
- DELETE /v1/tenants/{id}/data
  Removes all tenant data within 30 days. Audit log retained separately
  (see §8.4). Idempotent.
- POST /v1/tenants/demo/reset
  Restores seed data. Only callable from the portal service token.

[BACKUP CONTRACT]
- Product declares in manifest:
    backup:
      data_stores: [postgres, qdrant, minio]
      rpo: 6h
      rto: 30min
      retention_days: 30
- Infra Plane executes backups per declaration (pg_dump, etc.).
- Product publishes restore procedure in operational runbook.

[GDPR ENDPOINTS]
- /v1/tenants/{id}/export returns ALL data for the tenant (JSON + blobs in ZIP).
- DELETE /v1/tenants/{id}/data deletes everything within 30 days of call.
- Both endpoints emit audit events.

[DATA RESIDENCY]
- All data stays in EU (database, object storage, cache).
- Product declares any external data flows (e.g., LLM calls to OpenAI EU
  endpoint) in the data model documentation.
```

### 9.4 Infra Plane

```
[IaC]
- Orca manifest at: /orca/manifests/{vm}/{product}.toml
- Manifest declares: image, resource limits, health check, secret refs,
  network rules, replicas, restart policy.
- Changes go through Gitea PR → Gitea Actions → Orca apply.

[SECRETS]
- All secrets via Infisical machine identity.
- Secret path namespacing: /prod/{product}/{KEY}
- Manifest references paths, never values:
    secrets:
      DB_URL: /prod/certifai/MONGODB_URI
      LLM_KEY: /prod/certifai/LITELLM_MASTER_KEY
- Bootstrap secrets (DB URIs for Keycloak only) are the lone exception.

[NETWORKING]
- Product services bind only to the private network.
- Public-facing routes pass through Orca-Proxy.
- Inter-product calls use internal DNS names (e.g., certifai.internal:8080).

[BUILD + DEPLOY]
- Dockerfile in product repo root.
- Gitea Actions pipeline: fmt → lint → test → build → push → orca apply → e2e
- Image tagged with git SHA + semver.

[COLD START]
- Product declares startup dependencies in manifest:
    depends_on: [keycloak, postgres-app, infisical]
- Orca enforces ordering on full restart (see INFRASTRUCTURE.md §10 Scenario F).
```

---

## 10. Product Manifest

The canonical declaration of a product, used by Tenant Registry, Orca, and Backstage. One file, committed to product repo, applied via deployment pipeline.

```yaml
# product.manifest.yaml
schema_version: "1.0"

product:
  id: certifai
  name: "CERTifAI"
  description: "Self-hosted GDPR-compliant AI infrastructure dashboard"
  vendor: breakpilot              # we; future third-parties will use their slug
  contract_version: "1.0"
  product_version: "1.4.2"
  repo: git.breakpilot.com/sharang/certifai

catalog:                          # Renders in /[tenant]/catalog and /backstage/products
  category: "AI Infrastructure"   # AI Infrastructure | Compliance | Productivity | Security | Data
  tagline: "GDPR-compliant LLMs without leaving the EU"
  hero_image: https://cdn.breakpilot.com/products/certifai/hero.png
  screenshots:
    - https://cdn.breakpilot.com/products/certifai/dashboard.png
    - https://cdn.breakpilot.com/products/certifai/agents.png
  pricing_summary: "From €X/seat/month — included on Professional and Enterprise plans"
  available_on_plans: [trial, professional, enterprise]   # 'trial' opt-in for self-serve
  trial_days: 14
  trial_quota:                    # caps applied while tenant.status == trial
    llm_tokens_per_day: 100_000
    api_calls_per_day: 10_000
  works_well_with: [compliance]   # cross-product affinity; surfaced in catalog
  depends_on_products: []         # hard dependencies (rare; for compositions)
  demo:
    supported: true               # MUST be true unless explicitly waived
    seed_data_url: https://cdn.breakpilot.com/products/certifai/demo/seed-v3.tar.gz
    reset_endpoint: /v1/tenants/demo/reset   # called nightly by portal cron
    persona_hints:                # for sales rep talk track
      - "GDPR officer at a 200-person SaaS"
      - "CTO replacing OpenAI calls with EU-hosted LLMs"

identity:
  oidc_client_id: certifai-client
  entitlement_key: certifai
  role_mappings:
    IT_ADMIN: Admin
    CXO: Member
    FINANCE: Viewer
    LEGAL: Viewer
    USER: Member
  required_scopes:
    - read:agents
    - write:agents
    - read:usage

frontend:
  type: interactive               # interactive | widget | headless
  tag: certifai-dashboard
  bundle_url: https://cdn.breakpilot.com/products/certifai/{version}/element.js
  bundle_size_kb: 380
  routes:
    - path: /
      label: "Dashboard"
    - path: /agents
      label: "AI Agents"
      required_role: Member
    - path: /providers
      label: "Providers"
      required_role: Admin

backend:
  openapi_url: /openapi.yaml
  base_url: https://certifai-api.internal/v1
  health_url: /health
  service_token_audience: certifai-svc

mcp:
  enabled: true
  required_plan: enterprise
  endpoint: https://mcp.breakpilot.com/certifai
  tools:
    - name: list_ai_agents
      description: "Returns AI agents configured for the tenant"
      required_scope: read:agents
    - name: get_llm_usage
      description: "Returns LLM usage metrics"
      required_scope: read:usage
    # ... more tools

data:
  data_stores:
    - type: mongodb
      vm: vm-certifai
    - type: external_api
      provider: litellm
  pii_class: low
  tenant_scoping:
    field: org_id
    enforcement: middleware
  supports_projects: false        # see §9.3 PROJECT SCOPING
  retention_default_days: 365
  gdpr_export: /v1/tenants/{id}/export
  gdpr_erasure: /v1/tenants/{id}/data
  offboarding_export_formats: [json, csv]   # produced by P16 final-export step
  frozen_behavior:
    reads: allow                  # customer can still pull data / download exports
    writes: deny_402              # POST/PUT/DELETE return 402 Payment Required
    background_jobs: pause        # scheduled work suspended, queue preserved

backup:
  rpo: 24h
  rto: 30min
  retention_days: 30

infra:
  image: registry.breakpilot.com/certifai-dashboard
  vm: vm-certifai
  replicas: 1
  resource_limits:
    cpu: "2000m"
    memory: "4Gi"
  health_check:
    path: /health
    interval: 30s
    timeout: 5s
    threshold: 3
  secrets:
    - MONGODB_URI: /prod/certifai/MONGODB_URI
    - KEYCLOAK_CLIENT_SECRET: /prod/certifai/KEYCLOAK_CLIENT_SECRET
    - LITELLM_MASTER_KEY: /prod/certifai/LITELLM_MASTER_KEY
  depends_on:
    - keycloak
    - mongodb
    - infisical

admin_actions:
  - name: "Reset LiteLLM API Key"
    description: "Rotates the per-tenant LiteLLM key"
    endpoint: POST /v1/tenants/{id}/admin/rotate-litellm-key
    confirm: required
    audit_required: true

observability:
  metrics: /metrics
  logs:
    format: json
    pii_redaction: true
  audit_endpoint: tenant-registry.internal/audit
```

### 10.1 Manifest Variants by Frontend Type

The example above shows an `interactive` product.
Headless and widget products differ only in the `frontend` block.

#### Widget variant

```yaml
frontend:
  type: widget
  tag: status-monitor-widget
  bundle_url: https://cdn.breakpilot.com/products/status/{version}/widget.js
  bundle_size_kb: 38
  dimensions:
    width: 400
    height: 240
  poll_interval_s: 60
  portal_config:          # same shape as headless (§ below) — used for click-through management page
    sections: [api_keys, webhooks, usage, docs]
    api_key_scopes: [...]
    webhook_events: [...]
```

#### Headless variant (no frontend bundle)

```yaml
frontend:
  type: headless
  # NO tag, NO bundle_url — the portal renders 100% of the customer UI
  portal_config:
    sections:
      - api_keys
      - webhooks
      - usage
      - code_samples
      - docs
    status_endpoint: /v1/status   # optional; portal polls for status badge
    api_key_scopes:
      - id: read
        description: "Read sessions and results"
      - id: write
        description: "Create new sessions"
      - id: admin
        description: "Manage settings (rare; consider before granting)"
    webhook_events:
      - name: session.completed
        description: "Fires when a notetaker session is fully processed"
        payload_schema_url: /schemas/session.completed.json
      - name: session.failed
        description: "Fires when a session cannot be processed"
        payload_schema_url: /schemas/session.failed.json
    code_samples:
      - language: curl
        title: "Create a session"
        snippet: |
          curl -X POST https://notetaker-api.breakpilot.com/v1/sessions \
            -H "Authorization: ApiKey k_xxx" \
            -H "X-Tenant: acme" \
            -d '{"audio_url": "...", "language": "en"}'
      - language: python
        title: "Create a session"
        snippet: |
          import requests
          requests.post(
              "https://notetaker-api.breakpilot.com/v1/sessions",
              headers={"Authorization": "ApiKey k_xxx", "X-Tenant": "acme"},
              json={"audio_url": "...", "language": "en"},
          )
```

The Tenant Registry validates the `frontend` block against the type:

- `interactive` requires `tag` and `bundle_url`; `portal_config` is optional
- `widget` requires `tag`, `bundle_url`, `dimensions`, AND `portal_config`
- `headless` MUST NOT declare `tag` or
`bundle_url`; `portal_config` is required

---

## 11. Service Token Model (Inter-Product Communication)

Products can call each other directly, authenticating with short-lived service tokens issued through Keycloak's `client_credentials` flow.

### 11.1 Flow

```
1. Compliance product needs to list AI agents for an AI Act assessment.

2. Compliance backend requests a service token:
   POST https://auth.breakpilot.com/realms/breakpilot-prod/protocol/openid-connect/token
   Body: grant_type=client_credentials
         client_id=compliance-svc
         client_secret=
         scope=read:certifai-agents
   Response: JWT (15 min TTL)

3. Compliance calls CERTifAI:
   GET https://certifai-api.internal/v1/tenants/{tenant_id}/agents
   Authorization: Bearer
   X-On-Behalf-Of-User:        ← original user, for audit
   X-Service-Reason: ai-act-assessment

4. CERTifAI validates token:
   - Issued by platform Keycloak: ok
   - Audience includes "certifai-svc": ok
   - Scopes include "read:certifai-agents": ok
   - tenant_id in path matches caller's intent: ok (no cross-tenant)

5. CERTifAI returns data.

6. Both sides emit audit events:
   {actor: "svc:compliance", action: "certifai.list_agents",
    on_behalf_of: "user_sub", tenant_id: "...",
    reason: "ai-act-assessment"}
```

### 11.2 Scope Catalog

Each service declares the scopes it offers (which other services can request) and the scopes it consumes (which it needs from other services).

```
certifai offers:
  read:certifai-agents
  read:certifai-usage
  write:certifai-settings     (rare; consider before granting)

compliance offers:
  read:compliance-status
  read:compliance-dsfa
  write:compliance-events     (for cross-product event emission)

billing-service consumes:
  read:certifai-usage
  read:compliance-status

compliance consumes:
  read:certifai-agents        (for AI Act assessments)
```

Scopes are granted in Keycloak per service client. Grants are reviewed quarterly.
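
The claim checks in step 4 of §11.1 could look roughly like the following. This is a minimal sketch, not the platform's implementation: `check_service_token_claims`, `TokenRejected`, and `EXPECTED_ISSUER` are hypothetical names, the issuer value assumes Keycloak's conventional `/realms/{realm}` issuer URL, and the function expects claims from a token whose signature has already been verified against the Keycloak JWKS. The tenant check from step 4 is left out because the flow does not specify how the caller's intent is conveyed.

```python
import time
from typing import Any, Mapping, Optional

# Assumption: Keycloak's conventional issuer URL for the realm used in §11.1.
EXPECTED_ISSUER = "https://auth.breakpilot.com/realms/breakpilot-prod"


class TokenRejected(Exception):
    """Raised when a service token fails one of the claim checks."""


def check_service_token_claims(
    claims: Mapping[str, Any],
    *,
    audience: str,
    required_scope: str,
    now: Optional[float] = None,
) -> None:
    """Check issuer, expiry, audience, and scope on already-verified claims.

    Signature verification must happen before this function is called;
    it is deliberately out of scope for the sketch.
    """
    now = time.time() if now is None else now

    if claims.get("iss") != EXPECTED_ISSUER:
        raise TokenRejected("unexpected issuer")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise TokenRejected("token expired or missing exp")

    # "aud" may be a single string or a list of strings (RFC 7519).
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if audience not in audiences:
        raise TokenRejected("audience mismatch")

    # OAuth 2.0 scopes travel as one space-delimited string.
    granted = set(str(claims.get("scope", "")).split())
    if required_scope not in granted:
        raise TokenRejected("missing scope")
```

In the flow above, CERTifAI would call this with `audience="certifai-svc"` and `required_scope="read:certifai-agents"`, and return 401 or 403 when `TokenRejected` is raised.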

### 11.3 Third-Party Readiness

When we open the platform to third parties:

```
- Same OIDC client_credentials flow
- Manifests are SIGNED by third-party developer keys (signature verified by Tenant Registry)
- Third-party scopes are read-only by default; write scopes require manual approval
- Network isolation: third-party services run in a separate Orca subnet
- Resource limits enforced (CPU, memory, network egress)
- Per-tenant install requires explicit IT Admin consent (OAuth consent screen)
```

The contract surface stays the same as today's; third parties only face additional verification gates.

---

## 12. Versioning and Contract Evolution

### 12.1 Versions in play

```
contract_version     This document. Updated when the platform changes what
                     products must implement. Currently 1.0. Bumped on
                     breaking changes.

product_version      The product's own version (semver). Tracked by Tenant
                     Registry. Independent of contract version.

api_version          The version in URL paths (/v1/, /v2/). Within a contract
                     version, a product may have multiple API versions live.
```

### 12.2 Platform supports N and N-1

The platform always supports the current contract version and the previous one. Deprecation is announced in this doc before any breaking change.

### 12.3 Breaking Change Process

```
1. Announce in this doc (one section per breaking change with motivation).
2. Update contract_version, e.g. 1.0 → 2.0.
3. New products are required to ship 2.0 from day one.
4. Existing products get 12 months to migrate.
5. After 12 months, 1.0 is retired; tenants on 1.0 products are migrated or churned.
```

---

## 13. Onboarding Checklist for a New Product

A product is "ready to ship to a customer" when all boxes are ticked.

```
☐ Backend API
  ☐ openapi.yaml committed and validated
  ☐ Mandatory endpoints implemented (§4.1)
  ☐ JWT validation via Keycloak JWKS
  ☐ Service token validation
  ☐ Tenant scoping enforced in middleware + tested
  ☐ /v1/tenants/{id}/provision idempotency test passes
  ☐ /v1/tenants/{id}/export produces valid GDPR-compliant ZIP
  ☐ DELETE /v1/tenants/{id}/data is irreversible and audited

☐ Frontend (manifest declares one of: interactive | widget | headless)
  For frontend.type = interactive:
    ☐ Custom element registered with declared tag
    ☐ Bundle published to CDN (≤ 500KB gzipped)
    ☐ Handles all required attributes (§5.A.2)
    ☐ Emits all event types (§5.A.3)
    ☐ Light + dark theme support (§5.A.4)
    ☐ At least one locale beyond English
  For frontend.type = widget:
    ☐ Widget custom element registered with declared tag
    ☐ Bundle published to CDN (≤ 50KB gzipped)
    ☐ Tile dimensions declared in manifest
    ☐ Allowed events only (no breakpilot:navigate)
    ☐ portal_config block complete (for click-through page)
  For frontend.type = headless:
    ☐ NO tag and NO bundle_url declared
    ☐ portal_config.sections declared
    ☐ portal_config.api_key_scopes catalog complete
    ☐ portal_config.webhook_events catalog with payload schemas
    ☐ portal_config.code_samples in at least one language
    ☐ Webhook payloads include HMAC signature for verification
    ☐ Status endpoint returns valid format (if declared)
    ☐ POST /internal/api-keys/verify integration tested with Tenant Registry
    ☐ POST /internal/webhooks/dispatch integration tested with portal

☐ MCP (if Enterprise plan or applicable)
  ☐ MCP server deployed
  ☐ Tool catalog declared in manifest
  ☐ API key authentication implemented
  ☐ All tools tenant-scoped and audited

☐ Documentation
  ☐ README published at developers.breakpilot.com/products/{name}
  ☐ API reference auto-generated and live
  ☐ Integration guide for customer IT
  ☐ Operational runbook for us
  ☐ Data model + GDPR retention table

☐ Observability
  ☐ /health implemented and returns valid format
  ☐ /metrics in Prometheus format
  ☐ JSON structured logging
  ☐ Audit events emitted for all listed categories
  ☐ No PII in logs (PII redaction tested)

☐ Identity integration
  ☐ Keycloak OIDC client registered
  ☐ Role mappings declared and tested
  ☐ Entitlement key included in tenant JWTs (verified end-to-end)

☐ Control integration
  ☐ product.manifest.yaml committed
  ☐ Registered with Tenant Registry catalog
  ☐ Lifecycle endpoints tested via Backstage "Create Test Tenant"
  ☐ Usage endpoint returns valid format
  ☐ Backstage admin actions render correctly

☐ Data integration
  ☐ All tables/collections have tenant_id
  ☐ Cross-tenant query test (negative test) passes
  ☐ Backup contract declared and Infra Plane is executing it
  ☐ GDPR export tested with real data
  ☐ Data residency confirmed (no exfiltration outside EU)

☐ Infra integration
  ☐ Orca manifest committed and applies cleanly
  ☐ Dockerfile builds reproducibly
  ☐ All secrets in Infisical (zero hardcoded)
  ☐ Gitea Actions pipeline green
  ☐ Resource limits set and tested under load
  ☐ Cold start dependency order declared
```

---

## 14. Gap Analysis: Existing Products

### CERTifAI vs. Contract 1.0

```
✓ OIDC via Keycloak: already implemented
✓ Role data model (Admin/Member/Viewer): exists
✗ Mandatory endpoints: NONE of §4.1 implemented yet
✗ Frontend as web component: currently a full Dioxus fullstack app
✗ MCP server: not implemented
✗ Tenant scoping in queries: only chat is user-scoped, no org_id scoping
✗ Service token validation: not implemented
✗ GDPR export/erasure: not implemented
✗ /health, /metrics, structured audit emission: not implemented
✓ Orca + Infisical compatible: already deployed this way

Effort estimate: 4-6 weeks of focused work
```

### breakpilot-compliance vs. Contract 1.0

```
✓ Multi-tenant via X-Tenant-ID: exists (needs JWT validation upgrade)
✓ Modular Next.js frontend: close to web-component-wrappable
✗ Mandatory endpoints: partially implemented (usage endpoint missing)
✗ JWT validation at proxy: currently raw header trust
✗ Frontend as web component: needs wrapping with @r2wc/react-to-web-component
✗ MCP server: not implemented
✓ Backup contract: declared informally, needs to be in manifest
✗ GDPR export/erasure: partial (DSR module exists, doesn't cover whole tenant)
✓ Observability: partial (structured logs, no /metrics)

Effort estimate: 3-5 weeks of focused work
```

---

## 15. Open Items

```
- Design tokens package (@breakpilot/design-tokens): needs to exist before
  web components ship
- CDN for product bundles: pick provider (Hetzner Object Storage + Cloudflare?)
- MCP gateway: single mcp.breakpilot.com vs. per-product subdomains
- Third-party manifest signing: defer until first real third-party conversation
- Inter-product event bus: explicitly deferred; service tokens cover the
  use cases for now
- Contract testing: automate manifest + openapi validation in Gitea Actions
- Customer-facing catalog UI: defined at /[tenant]/catalog (see
  PLATFORM_ARCHITECTURE.md §5a operating principles); Backstage product
  picker reuses same catalog metadata.

OSS swap-in points (designed-for, not adopted yet):
- Audit log storage: BoxyHQ Retraced. Our event schema is Retraced-shape
  (§8.4); swap when audit query patterns outgrow PostgreSQL or when a
  customer asks for exportable SOC2-grade audit retention.
- Usage metering: Lago. Our /v1/usage endpoint plus optional per-event
  stream (§4.1) is Lago-compatible. Swap when LiteLLM token billing requires
  real-time metering or per-customer pricing tiers we cannot model in Stripe.
- Customer IdP federation (SCIM): BoxyHQ Jackson or Keycloak's SCIM module.
  Adopt when first enterprise customer asks for automated user provisioning.
- Feature flags / per-tenant feature gating: OpenFeature (vendor-neutral).
  Adopt when product features need finer-than-plan-tier gating per tenant.
```

---

*End of document. Contract version 1.0. Next review: after first product (CERTifAI or compliance) achieves full compliance with §13 checklist.*