Product Integration Specification
Status: Design Draft
Authors: Sharang, Benjamin
Date: 2026-05-11
Companion docs: PLATFORM_ARCHITECTURE.md, INFRASTRUCTURE.md
Contract version: 1.0
1. Purpose
This document defines the contract that every product must implement to be sold on the platform as a B2B building block. The contract is enforced by the Tenant Registry, Customer Portal, and Orca deployment pipeline. A product that does not implement the contract cannot be activated for a tenant.
The contract is designed to be first-party today, third-party-ready tomorrow — the technical surface is identical for our own products and any future external developers, with stricter verification gates for the latter.
2. Core Principles
1. ONE TENANT, ONE TRUTH
Every request is scoped by org_id from the JWT. Cross-tenant data leakage is the
single largest commercial risk; the spec treats it as a contract violation.
2. PLATFORM OWNS IDENTITY, BILLING, ROUTING
Products NEVER implement their own login, never store passwords, never invoice
customers directly. These are platform concerns.
3. PRODUCTS OWN THEIR DOMAIN AND DATA
Products own their database, their data model, their backup, their RTO/RPO.
No cross-product database sharing. Composition is via APIs, not via DB joins.
4. STATELESS APPLICATIONS, STATEFUL DATA STORES
Application containers are replaceable in seconds. State lives in databases
that have explicit backup contracts.
5. CONTRACT EVOLVES, PRODUCTS DECLARE COMPATIBILITY
Products declare which contract version they implement. The platform supports
N and N-1; deprecation is announced before removal.
3. Required Surfaces
A product is composed of five surfaces. Three are mandatory, one is tier-gated, one is mandatory documentation.
┌───────────────┬─────────────────────────────────────────┬──────────────────┐
│ Surface       │ What                                    │ Requirement      │
├───────────────┼─────────────────────────────────────────┼──────────────────┤
│ Backend API   │ REST + OpenAPI 3.0 spec                 │ REQUIRED         │
│ Frontend      │ Web component (custom element)          │ REQUIRED         │
│ MCP Server    │ MCP server exposing tenant-scoped tools │ REQUIRED for     │
│               │                                         │ Enterprise tier; │
│               │                                         │ optional for     │
│               │                                         │ Starter/Pro      │
│ Documentation │ README, API docs, integration guide,    │ REQUIRED         │
│               │ runbook, data model, GDPR retention     │                  │
│ Observability │ /health, /metrics, structured logs,     │ REQUIRED         │
│               │ audit event emission                    │                  │
└───────────────┴─────────────────────────────────────────┴──────────────────┘
4. Backend API Contract
4.1 Mandatory Endpoints
Every product backend must implement these endpoints. The Tenant Registry health-checks them on deploy; any missing endpoint blocks the registration.
GET /health
Returns 200 if healthy, 503 if unhealthy.
Body: {"status": "ok"|"degraded"|"down", "checks": {"db": "ok", "deps": "ok"}}
Authentication: NONE (Orca probe)
GET /version
Returns product version and contract version it implements.
Body: {"product": "certifai", "version": "1.4.2", "contract": "1.0",
"build": "<git sha>", "deployed_at": "2026-05-10T..."}
Authentication: NONE
GET /v1/usage
Query: ?tenant_id=<uuid>&from=<iso>&to=<iso>&project_id=<uuid>
Returns billing-relevant usage metrics for a tenant (and optional project).
Body: {"tenant_id": "...", "project_id": "...", "period": {...},
"metrics": {"seats_active": 12, "api_calls": 14203, ...}}
Authentication: SERVICE TOKEN (called by billing job)
Note: products with high-cardinality usage (LLM tokens, etc.) SHOULD also
stream per-event metering to /internal/usage/events on Tenant Registry.
Event shape is Lago-compatible (transaction_id, code, external_subscription_id,
properties) so we can swap to a Lago instance later without changing producers.
POST /v1/tenants/{id}/provision
Body: {"plan": "...", "config": {...}, "contract_version": "1.0"}
Initializes tenant-specific resources (schemas, default data, queues).
Must be idempotent: a second call with same params is a no-op.
Authentication: SERVICE TOKEN (called by Tenant Registry)
POST /v1/tenants/{id}/suspend
Soft-suspend: data retained, all customer access blocked.
Authentication: SERVICE TOKEN
POST /v1/tenants/{id}/reactivate
Reverse of suspend.
Authentication: SERVICE TOKEN
POST /v1/tenants/{id}/terminate
Hard terminate: schedules data for erasure per retention policy.
Body: {"reason": "...", "scheduled_erasure_at": "..."}
Authentication: SERVICE TOKEN
POST /v1/tenants/{id}/export
GDPR Article 20 (data portability) export for a tenant.
Returns: signed URL to a ZIP containing all tenant data in JSON + binary blobs.
Authentication: USER JWT (IT_ADMIN or LEGAL role)
DELETE /v1/tenants/{id}/data
GDPR Article 17 (right to erasure) full deletion.
Body: {"confirm": "<tenant_slug>"} ← safety check
Authentication: SERVICE TOKEN + USER JWT (IT_ADMIN signing off)
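The idempotency requirement on /provision can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the platform's implementation: `ProvisionStore`, its return values, and the decision to report a conflict when a tenant is re-provisioned with different parameters are all hypothetical. The contract itself only says that a repeat call with the same parameters is a no-op.

```python
from dataclasses import dataclass, field


@dataclass
class ProvisionStore:
    """In-memory stand-in for the product's tenant table (hypothetical)."""
    tenants: dict = field(default_factory=dict)
    side_effects: int = 0  # counts how often resources were actually created

    def provision(self, tenant_id, plan, config, contract_version="1.0"):
        requested = {"plan": plan, "config": config,
                     "contract_version": contract_version}
        existing = self.tenants.get(tenant_id)
        if existing == requested:
            return "noop"        # same parameters again: nothing to do
        if existing is not None:
            return "conflict"    # assumption: differing parameters are refused
        self.tenants[tenant_id] = requested
        self.side_effects += 1   # schemas, default data, queues would go here
        return "created"
```

A real handler would persist the requested parameters in the product database so that the comparison survives container restarts.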
4.2 Authentication Modes
USER JWT Bearer token issued by Keycloak for a user session.
Contains: sub, org_id, org_roles, products, plan.
Validated via Keycloak JWKS endpoint.
Used for: all customer-facing endpoints.
SERVICE TOKEN Short-lived JWT issued by Keycloak via OAuth 2.0
client_credentials flow.
Each service has a Keycloak client (id: certifai-svc,
compliance-svc, etc.) with declared scopes.
Used for: platform-to-product calls (provisioning),
product-to-product calls (inter-product API).
TTL: 15 minutes max.
ORCA PROBE No authentication. Local network only.
Used for: /health, /version (Orca polls these).
Must not leak tenant data.
4.3 Tenant Scoping Rules
1. EVERY non-probe endpoint extracts tenant context from JWT or path.
USER JWT → tenant = jwt.org_id
SERVICE TOKEN → tenant = path parameter, validated against service scopes
2. EVERY query to the product database includes WHERE tenant_id = $1.
No exceptions. Code review enforces this; tests verify it.
3. EVERY response includes only data for the requested tenant.
The product asserts this invariant in middleware (defense in depth).
4. EVERY log line and audit event includes tenant_id as a structured field.
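Rules 1 and 3 can be illustrated with a small sketch. It assumes the token has already been signature-verified and decoded into a `claims` dict, that service tokens carry a space-separated `scope` claim, and that a path tenant disagreeing with `jwt.org_id` is rejected. The names `resolve_tenant`, `assert_single_tenant`, and `TenantScopeError` are hypothetical.

```python
class TenantScopeError(Exception):
    """Raised when a request cannot be bound to exactly one tenant."""


def resolve_tenant(auth_mode, claims, path_tenant_id=None, required_scope=None):
    """Return the tenant a request is scoped to, or raise TenantScopeError."""
    if auth_mode == "user_jwt":
        tenant = claims.get("org_id")
        if not tenant:
            raise TenantScopeError("user JWT carries no org_id")
        # Assumption: a path tenant that disagrees with the JWT is rejected.
        if path_tenant_id is not None and path_tenant_id != tenant:
            raise TenantScopeError("path tenant does not match jwt.org_id")
        return tenant
    if auth_mode == "service_token":
        granted = claims.get("scope", "").split()
        if required_scope not in granted:
            raise TenantScopeError("service token lacks required scope")
        if not path_tenant_id:
            raise TenantScopeError("service call without tenant in path")
        return path_tenant_id
    raise TenantScopeError(f"unsupported auth mode: {auth_mode}")


def assert_single_tenant(rows, tenant_id):
    """Defense in depth: refuse to return rows owned by another tenant."""
    leaked = [r for r in rows if r.get("tenant_id") != tenant_id]
    if leaked:
        raise TenantScopeError(f"{len(leaked)} row(s) belong to another tenant")
    return rows
```

The resolved tenant is then the only value bound to `$1` in every query, and `assert_single_tenant` runs in middleware just before the response is serialized.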
4.4 OpenAPI Spec
Every product publishes openapi.yaml at /openapi.yaml. The Tenant Registry pulls this on product registration and validates that the mandatory endpoints from §4.1 are present with correct signatures.
Product OpenAPI must:
- Be valid OpenAPI 3.0 (3.1 not yet — tooling gap)
- Include all mandatory endpoints from §4.1
- Document all custom endpoints with examples
- Declare authentication mode for each endpoint
- Declare scopes consumed (for SERVICE TOKEN endpoints)
- Include error response schemas (4xx, 5xx)
5. Frontend Contract
A product declares its frontend type in the manifest. The portal renders accordingly. Three types are supported:
interactive Full UI shipped as a web component custom element.
Customer OPERATES the product through this UI.
Examples: CERTifAI, breakpilot-compliance, classic SaaS products.
widget Only a small dashboard tile component; no full product page.
Customer SEES product output in a tile; deeper management
happens on a portal-rendered config page.
Examples: monitoring, status reporting.
headless No frontend code at all. The portal renders a generic
management UI from a portal_config block in the manifest.
Customer CONFIGURES (API keys, webhooks) and their own
systems consume the product via API/MCP.
Examples: notetaker bot, document classifier, webhook router.
The portal branches its rendering on manifest.frontend.type. Backend, MCP, observability, and lifecycle contracts are identical across all three types — only the customer-facing surface changes.
5.A Interactive (Web Component)
The frontend is a custom element registered with a product-specific tag name. The Customer Portal loads the bundle and renders the element with attributes passed in by the portal.
5.A.1 Why web components
Our products span Rust/Dioxus, Next.js, Go, Python. Web components are the only framework-agnostic surface that lets all of these ship a frontend without forcing a stack rewrite:
- CERTifAI compiles Dioxus → WASM → wraps in a custom element
- breakpilot-compliance wraps React components via @r2wc/react-to-web-component
- Any future Vue/Svelte/Solid product also works
5.A.2 The Tag Contract
Each product declares ONE primary tag in its manifest. The portal renders it like this:
<certifai-dashboard
tenant="acme"
tenant-id="uuid-acme"
jwt="<short-lived portal-issued JWT>"
locale="en"
theme="light"
api-base="https://certifai-api.internal/v1"
audit-callback-url="/api/audit"
></certifai-dashboard>
Attributes the portal passes (the product MUST handle these):
tenant tenant slug (acme)
tenant-id tenant UUID (uuid-acme)
jwt short-lived JWT (≤ 5 min), product validates against Keycloak JWKS
locale en / de / fr / es / pt
theme light / dark
api-base backend URL the product should call
audit-callback-url URL to POST audit events to (portal-relative)
5.A.3 Events the Product Must Emit
The component emits events upward via CustomEvent. The portal listens for these and integrates them:
breakpilot:navigate
Detail: {path: "/sub/route", title: "Page Title"}
Portal updates browser URL + breadcrumb without reloading.
breakpilot:error
Detail: {code: "...", message: "...", recoverable: true|false}
Portal shows toast / blocking error.
breakpilot:audit
Detail: {action: "...", target: "...", metadata: {...}}
Portal forwards to central audit log via audit-callback-url.
breakpilot:loading
Detail: {state: "start"|"end", description: "Generating DSFA..."}
Portal shows progress indicator.
breakpilot:request-upgrade
Detail: {feature: "...", required_plan: "enterprise"}
Portal opens upgrade-quote flow.
5.A.4 Design System Compatibility
The platform publishes @breakpilot/design-tokens (CSS variables, fonts, spacing). Products are encouraged but not required to consume it. The portal injects design tokens into the shadow DOM root so consuming them is a single CSS line:
:host { color: var(--bp-text); background: var(--bp-surface); }
Products that ship custom styling must respect the theme attribute and the prefers-color-scheme media query.
5.A.5 Bundle Loading
Product publishes a bundle at:
https://cdn.breakpilot.com/products/{name}/{version}/element.js
Portal loads it lazily via dynamic import when the user navigates to /[tenant]/products/{name}.
Portal caches the bundle URL per product version (declared in tenant_products.config).
Bundle size budget: ≤ 500KB gzipped for first load.
5.B Widget
A widget product declares ONE custom element that renders only as a dashboard tile. It receives the same attributes as interactive products but emits no breakpilot:navigate events — clicking the tile takes the user to a portal-rendered config page (same surface as headless products in §5.C).
<status-monitor-widget
tenant="acme"
tenant-id="uuid-acme"
jwt="<short-lived JWT>"
locale="en"
theme="light"
api-base="https://monitor-api.internal/v1"
></status-monitor-widget>
Constraints:
Bundle size budget: ≤ 50KB gzipped (widgets load eagerly on dashboard)
Dimensions: declared in manifest (e.g., 200×120 or 400×240)
Refresh: widget polls its own API; portal does not push updates
Allowed events: breakpilot:error, breakpilot:audit, breakpilot:request-upgrade
(NOT breakpilot:navigate; click-through is portal-controlled)
API keys, webhooks, and full management UI for widget products use the same portal-rendered config page as headless products (§5.C).
5.C Headless
The product ships NO frontend code. The portal renders a generic management UI from a portal_config block in the manifest. This page is served at /[tenant]/products/{name} and contains the same elements regardless of which product it is — populated entirely from manifest data.
5.C.1 What the Portal Renders
┌──────────────────────────────────────────────────────────┐
│  Notetaker                                 [Status: OK]  │
│  ──────────────────────────────────────────────────────  │
│                                                          │
│  USAGE (last 30 days)                                    │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 142 sessions processed   ▁▃▆█▆▄▂▃▅▆▄▂▁▂▃▄▅▆▇█      │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  API KEYS                                  [+ Generate]  │
│  ──────────────────────────────────────────────────────  │
│  • prod-key     k_xxx...4f12   scopes: r,w   2026-01-04  │
│  • staging-key  k_xxx...9a83   scopes: r     2026-04-22  │
│                                                          │
│  WEBHOOKS                                       [+ Add]  │
│  ──────────────────────────────────────────────────────  │
│  • https://acme.example.com/notetaker/cb                 │
│    events: session.completed, session.failed             │
│    last 24h: 142 delivered, 0 failed             [Test]  │
│                                                          │
│  CODE SAMPLES                                            │
│  ──────────────────────────────────────────────────────  │
│  [curl] [JS] [Python]                                    │
│  curl -X POST https://notetaker-api.breakpilot.com/v1    │
│    -H "Authorization: ApiKey k_xxx"                      │
│    -H "X-Tenant: acme"                                   │
│    -d '{...}'                                            │
│                                                          │
│  DOCS ► developers.breakpilot.com/products/notetaker     │
└──────────────────────────────────────────────────────────┘
5.C.2 Manifest Requirement
A headless product must include a portal_config block declaring:
- sections: which UI sections to render (subset of: api_keys, webhooks, usage, docs, code_samples, custom_actions)
- webhook_events: the catalog of events the product can emit
- api_key_scopes: the catalog of scopes that can be granted on a key
- code_samples: at least one language with a working request example
- status_endpoint: optional URL for the portal to poll for the status badge
See §10.1 for the full schema.
5.C.3 API Keys
API keys are a portal concern, not a product concern. Tenant Registry generates and stores key hashes; the product validates incoming keys against POST /internal/api-keys/verify on Tenant Registry. This means:
- Key rotation is portal-controlled
- Scope enforcement is consistent across all headless products
- Revocation is instant (registry updates a single row)
5.C.4 Webhooks
The portal owns webhook configuration UI and delivery logging. Products POST event payloads to a portal endpoint (/internal/webhooks/dispatch); the portal handles signing, delivery, retry, dead-letter, and the customer-visible delivery log.
This keeps webhook UX consistent across all headless products and means a product cannot accidentally leak events from one tenant to another's webhook URL.
5.C.5 No Impersonation
Backstage shows no "Impersonate" button for headless products — there is no UI to enter. Debugging is via API call logs, audit events, webhook delivery history, and admin actions declared in the manifest (e.g., "Flush Queue", "Rotate Keys", "Reset State").
6. MCP Server (Required for Enterprise)
6.1 What it is
An MCP (Model Context Protocol) server exposes the product's capabilities as tools that customer-side AI agents can call. The customer's IT Admin configures the MCP endpoint in their AI agent platform (Claude Desktop, Cursor, internal agents, etc.).
6.2 Required Behavior
1. ONE MCP server per product
Endpoint: https://mcp.{product}.breakpilot.com (or unified mcp.breakpilot.com/{product})
2. Authentication via SCOPED API KEY
Customer IT Admin generates API key in /[tenant]/settings/api-keys.
Key carries tenant_id binding and scopes (read/write per product domain).
No user JWT for MCP — agents authenticate as the org, not as a user.
3. Tools are tenant-scoped
Every tool call uses the API key's tenant_id binding.
Cross-tenant calls are impossible by construction.
4. Tool catalog declared in manifest
Each tool: name, description, parameters (JSONSchema), required_scopes.
5. Audit every tool call
Emit breakpilot:audit-equivalent server-side: actor=api_key_id,
action=tool_name, metadata=parameters.
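The required behavior above can be sketched as a single guard around every tool call. This is an illustration under assumptions: `ApiKey`, `TOOL_SCOPES`, `call_tool`, and the shape of the audit record are hypothetical, and the key is taken to be already verified. The point it shows is that the tenant comes only from the key binding, the scope check precedes execution, and the audit record uses actor=api_key_id, action=tool_name, metadata=parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiKey:
    key_id: str
    tenant_id: str
    scopes: frozenset


# Hypothetical catalog entries, shaped like the manifest's tool declarations.
TOOL_SCOPES = {
    "list_ai_agents": {"read:agents"},
    "get_llm_usage": {"read:usage"},
}


def call_tool(key, tool_name, params, handlers, audit_log):
    """Authorize, audit, and run one MCP tool call for the key's tenant."""
    if tool_name not in TOOL_SCOPES:
        raise KeyError(f"unknown tool: {tool_name}")
    if "tenant_id" in params:
        # The tenant always comes from the key binding, never from the caller.
        raise ValueError("tenant_id must not be supplied by the agent")
    missing = TOOL_SCOPES[tool_name] - key.scopes
    if missing:
        raise PermissionError(f"missing scopes: {sorted(missing)}")
    audit_log.append({
        "tenant_id": key.tenant_id,
        "actor": key.key_id,
        "action": tool_name,
        "metadata": dict(params),
    })
    return handlers[tool_name](tenant_id=key.tenant_id, **params)
```

Whether denied calls should also be audited is not stated in the contract; this sketch records only authorized calls.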
6.3 Example Tool Catalogs
CERTifAI MCP tools:
list_ai_agents → returns agents configured for this tenant
get_llm_usage → returns LiteLLM usage for date range
run_news_search → SearXNG search
list_chat_sessions → user's chat history
Compliance MCP tools:
create_dsfa → starts a DSFA workflow
check_tom_status → returns TOM compliance status
list_dsr_requests → returns open Data Subject Requests
approve_dsfa → marks DSFA as approved
list_ai_act_assessments → returns AI Act assessments
6.4 Activation
Enterprise customers automatically get MCP enabled. Starter/Pro customers see "Available on Enterprise" in the API Keys page. Tenant Registry checks tenant.plan before issuing an MCP API key.
7. Documentation Contract
A product ships five required documents. They are published at developers.breakpilot.com/products/{name}/.
1. README What does it do? Value prop in 200 words.
Who is the typical user? What are the workflows?
2. API Reference Auto-generated from openapi.yaml.
Hosted via Redoc or Stoplight Elements.
3. Integration Guide For customer IT teams. How to:
- Enable the product on their tenant
- Configure SSO and roles
- Wire into their workflows
- Use the MCP server (if applicable)
- Generate and manage API keys
4. Operational Runbook For us. How to:
- Deploy a new version
- Roll back
- Debug a stuck tenant
- Reset tenant state
- Investigate slow queries
5. Data Model + GDPR What data is stored, in which table/collection,
personal data category (Art. 9 special category?),
retention period, GDPR lawful basis.
Used by customer DPOs for their own record of processing activities (GDPR Art. 30; German: Verzeichnis von Verarbeitungstätigkeiten).
8. Observability Contract
8.1 Health Check
GET /health returns:
200 {"status": "ok", "checks": {...}} (all good)
200 {"status": "degraded", "checks": {...}, "reason": "..."} (degraded but serving)
503 {"status": "down", "checks": {...}, "reason": "..."} (restart me)
Orca polls every 30s. Three consecutive 503s trigger an automatic restart.
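One way to derive the response from individual dependency checks is sketched below. It assumes each check reports "ok", "degraded", or "down" (the contract only shows "ok" for check values) and that the worst check determines the overall status. `health_response` and the wording of `reason` are hypothetical.

```python
def health_response(checks):
    """Map per-dependency check results to (http_status, body)."""
    results = set(checks.values())
    if "down" in results:
        status = "down"
    elif "degraded" in results:
        status = "degraded"
    else:
        status = "ok"
    body = {"status": status, "checks": dict(checks)}
    if status != "ok":
        failing = sorted(name for name, state in checks.items() if state != "ok")
        body["reason"] = "failing checks: " + ", ".join(failing)
    # Only "down" maps to 503, so a degraded product is not restarted by Orca.
    return (503 if status == "down" else 200), body
```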
8.2 Metrics
GET /metrics returns Prometheus exposition format.
Required metrics:
bp_http_requests_total{method, route, status, tenant_id}
bp_http_request_duration_seconds{method, route, tenant_id}
bp_active_tenants_gauge
bp_db_query_duration_seconds{operation}
bp_external_api_calls_total{provider, status} (LLM calls, etc.)
8.3 Structured Logging
All logs are JSON. All log lines include:
ts ISO-8601 timestamp
level debug|info|warn|error
service product name (certifai)
tenant_id tenant UUID (or "system" for non-tenant ops)
user_sub user UUID if applicable
request_id trace ID
msg human-readable message
... additional structured fields
No PII in logs (use the PII redaction middleware from breakpilot-core).
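A minimal sketch of a formatter that emits the mandatory fields, using only the Python standard library. `JsonFormatter` and the `LEVELS` mapping (including folding CRITICAL into "error") are assumptions for illustration; PII redaction is left to the breakpilot-core middleware and is not shown.

```python
import json
import logging
from datetime import datetime, timezone

LEVELS = {"DEBUG": "debug", "INFO": "info", "WARNING": "warn",
          "ERROR": "error", "CRITICAL": "error"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the mandatory fields."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "ts": ts.isoformat(),
            "level": LEVELS.get(record.levelname, "info"),
            "service": self.service,
            # Callers pass these via logging's `extra=` mechanism.
            "tenant_id": getattr(record, "tenant_id", "system"),
            "user_sub": getattr(record, "user_sub", None),
            "request_id": getattr(record, "request_id", None),
            "msg": record.getMessage(),
        }
        return json.dumps(entry)
```

A handler would be configured with `handler.setFormatter(JsonFormatter("certifai"))`, and call sites would log with `extra={"tenant_id": ..., "request_id": ...}` so the tenant fields land on the record.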
8.4 Audit Events
Audit events go to the central audit log in Tenant Registry. Products emit them via POST to the audit-callback-url passed by the portal (frontend) or directly to Tenant Registry API (backend).
Event format (Retraced-shape — transformable 1:1 if we swap to BoxyHQ Retraced later):
{
"tenant_id": "uuid-acme", # → Retraced "group.id"
"project_id": "uuid-prod" | null, # optional sub-tenancy scope
"product": "certifai", # which product emitted
"actor": {
"id": "user-uuid" | "svc:certifai" | "api_key:keyid",
"type": "user" | "service" | "api_key",
"name": "alice@acme.com"
},
"action": "dsfa.approve", # dotted: <domain>.<verb>
"crud": "u", # c|r|u|d
"target": {
"id": "<entity-id>",
"type": "dsfa" | "llm_config" | ...,
"name": "<human label>"
},
"source_ip": "1.2.3.4",
"description": "Alice approved DSFA #42 for Customer Data Processing",
"fields": {...}, # additional structured metadata
"created_at": "2026-05-11T14:23:01Z"
}
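A producer-side helper for this shape might look like the sketch below. `build_audit_event` and its validation rules are hypothetical; they only enforce what the format above states (actor type, crud code, dotted action) and assume `created_at`, when given, is a timezone-aware datetime.

```python
from datetime import datetime, timezone

ACTOR_TYPES = {"user", "service", "api_key"}
CRUD_CODES = {"c", "r", "u", "d"}


def build_audit_event(*, tenant_id, product, actor, action, crud, target,
                      description, source_ip=None, project_id=None,
                      fields=None, created_at=None):
    """Assemble an audit event; `actor` and `target` are id/type/name dicts."""
    if actor.get("type") not in ACTOR_TYPES:
        raise ValueError(f"unknown actor type: {actor.get('type')!r}")
    if crud not in CRUD_CODES:
        raise ValueError(f"crud must be one of {sorted(CRUD_CODES)}")
    domain, _, verb = action.partition(".")
    if not domain or not verb:
        raise ValueError("action must be dotted: <domain>.<verb>")
    when = created_at or datetime.now(timezone.utc)  # expects an aware datetime
    return {
        "tenant_id": tenant_id,
        "project_id": project_id,
        "product": product,
        "actor": dict(actor),
        "action": action,
        "crud": crud,
        "target": dict(target),
        "source_ip": source_ip,
        "description": description,
        "fields": dict(fields or {}),
        "created_at": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```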
Mandatory event categories per product:
config changes everything in product settings
data exports anyone exporting tenant data
data deletions erasures and bulk deletes
permission changes role grants/revocations within product
approvals business-significant approvals (DSFA, etc.)
cross-product calls service-token calls into other products (auto-emitted
by both caller and callee, with on_behalf_of in fields)
The portal /audit page renders these events filtered by tenant + product + actor + action + time range. The schema is intentionally Retraced-compatible so the storage layer can be swapped without changing producers.
9. Plane-by-Plane Integration Requirements
9.1 Identity Plane
[REGISTRATION]
- Register an OIDC client in Keycloak (id: {product}-client)
Confidential, client_credentials grant for service tokens,
authorization_code grant if product has its own UI flows.
- Declare role mappings in product manifest:
    role_mappings:
      IT_ADMIN: Admin
      LEGAL: Auditor
      FINANCE: ReadOnly
      USER: Member
- Declare an entitlement key (e.g. "certifai") that goes into JWT products claim.
[RUNTIME]
- Validate JWT via Keycloak JWKS endpoint (cache JWKS for 5 min).
- Reject if products claim does not include this product's entitlement key.
- Reject if iss is not the platform Keycloak.
- Reject if exp expired or nbf future.
[NEVER]
- Never validate JWT against a static secret. JWKS only.
- Never issue tokens. Never accept passwords. Never store credentials.
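The runtime checks can be sketched without committing to a JWT library. The example assumes signature verification against the cached JWKS has already happened and that the `products` claim is a list of entitlement keys. `JwksCache`, `check_claims`, `TokenRejected`, and the injected `fetch` callable are hypothetical names.

```python
import time


class TokenRejected(Exception):
    """Raised when a token fails one of the runtime checks."""


class JwksCache:
    """Cache the key set for a fixed TTL (5 minutes per the runtime rules)."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable returning the JWKS
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys = None
        self._fetched_at = 0.0

    def get(self):
        now = self._clock()
        if self._keys is None or now - self._fetched_at >= self._ttl:
            self._keys = self._fetch()
            self._fetched_at = now
        return self._keys


def check_claims(claims, *, issuer, entitlement_key, now=None):
    """Apply the claim checks to an already signature-verified payload."""
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise TokenRejected("iss is not the platform Keycloak")
    if "exp" not in claims or now >= claims["exp"]:
        raise TokenRejected("token expired")
    if "nbf" in claims and now < claims["nbf"]:
        raise TokenRejected("token not yet valid")
    if entitlement_key not in claims.get("products", []):
        raise TokenRejected("products claim lacks this product")
    return claims
```

Injecting the clock and the fetch function keeps the cache testable and avoids any network call in the sketch itself.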
9.2 Control Plane
[REGISTRATION]
- On first deploy, product POSTs to Tenant Registry:
POST /catalog/products
body: manifest (see §10)
Tenant Registry verifies the manifest, pulls openapi.yaml, validates
mandatory endpoints, registers the product.
- Product appears in Backstage product picker when creating sales orders.
[LIFECYCLE]
- On tenant.activate: Tenant Registry calls product /v1/tenants/{id}/provision
- On tenant.suspend: calls /suspend
- On tenant.churn: calls /terminate
- On contract.renew: no call (the tenant simply stays active)
[USAGE METERING]
- Tenant Registry runs a daily job hitting product /v1/usage for billing.
- Product is responsible for accurate metering and idempotent reporting.
[BACKSTAGE ACTIONS]
- Product declares custom admin actions in manifest:
    admin_actions:
      - name: "Rebuild RAG Index"
        endpoint: POST /v1/tenants/{id}/admin/rebuild-rag
        confirm: required
        plane: data
- Backstage renders these as buttons on /backstage/tenants/{id}/products/{name}
- Calls are SERVICE TOKEN authenticated and audit-logged.
[AUDIT EVENTS]
- Product POSTs all audit events to Tenant Registry /audit endpoint.
- Tenant Registry stores them in audit_log table for cross-product unified view.
9.3 Data Plane
[DATA OWNERSHIP]
- Product owns its database. No other service queries it directly.
- Cross-product composition is via the inter-product service-token API (§11),
never via shared DB connections.
[ISOLATION]
- Every table/collection has a tenant_id (or org_id) column.
- Every query filters by it.
- Database user permissions cannot bypass it.
[PROJECT SCOPING — OPTIONAL]
- Products MAY support sub-tenancy via projects (mirrors GCP Project /
AWS Account pattern). Allows customers to separate dev / staging / prod
or per-team data within a single tenant.
- Declared in manifest:
    data:
      supports_projects: true
- Implementation:
- All tenant-scoped tables/collections add project_id column.
- Compound unique constraints become (tenant_id, project_id, key).
- All endpoints accept optional ?project_id=<uuid>; absence means
the tenant's default implicit project.
- JWT may carry an active project_id claim; products SHOULD respect it
if present.
- Reference implementation: breakpilot-compliance already uses this pattern
(sdk_states UNIQUE on (tenant_id, project_id) since March 2026).
- Products that do NOT support projects must still gracefully ignore
project_id parameters (return tenant-wide data).
[TENANT LIFECYCLE CONTRACT]
Products MUST honor the tenant.status passed in the JWT (`tenant_status`
custom claim) and behave per the table below. See PLATFORM_ARCHITECTURE.md
P15 + P16 for the full state machine.
┌──────────┬───────────────────────────────────────────────────────────┐
│ status   │ Product behavior                                          │
├──────────┼───────────────────────────────────────────────────────────┤
│ demo     │ Accept all calls. Apply NO billing meter. Honor           │
│          │ /v1/tenants/demo/reset (idempotent). Seed from            │
│          │ catalog.demo.seed_data_url. Audit emitted but tagged      │
│          │ {"demo": true} so portal can hide from real audit.        │
├──────────┼───────────────────────────────────────────────────────────┤
│ trial    │ Accept all calls up to catalog.trial_quota; over quota    │
│          │ return 429 with header X-Trial-Limit-Reset. Show "Trial"  │
│          │ context in any product UI banner area provided by host.   │
├──────────┼───────────────────────────────────────────────────────────┤
│ active   │ Normal operation.                                         │
├──────────┼───────────────────────────────────────────────────────────┤
│ frozen   │ Per data.frozen_behavior in manifest (typically reads     │
│          │ allowed, writes return 402, background jobs paused).      │
│          │ /export MUST work; webhook deliveries MUST stop.          │
├──────────┼───────────────────────────────────────────────────────────┤
│ archived │ All API calls return 410 Gone. Data already deleted by    │
│          │ the offboarding step; this state is for audit only.       │
└──────────┴───────────────────────────────────────────────────────────┘
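The status table can be read as a small request gate. The sketch below is one possible interpretation, not a reference implementation: `status_gate` is a hypothetical helper, PATCH is treated as a write by assumption, quota accounting is reduced to a boolean, and the X-Trial-Limit-Reset header, demo tagging, and webhook suppression are not modeled.

```python
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
KNOWN_STATUSES = {"demo", "trial", "active", "frozen", "archived"}


def status_gate(tenant_status, method, path, *, over_trial_quota=False,
                frozen_behavior=None):
    """Return an HTTP status to short-circuit with, or None to proceed."""
    if tenant_status not in KNOWN_STATUSES:
        raise ValueError(f"unknown tenant status: {tenant_status!r}")
    if tenant_status == "archived":
        return 410
    if tenant_status == "trial" and over_trial_quota:
        return 429  # the X-Trial-Limit-Reset header is not modeled here
    if tenant_status == "frozen":
        if path.endswith("/export"):
            return None  # export must keep working while frozen
        behavior = frozen_behavior or {"writes": "deny_402"}
        if method.upper() in WRITE_METHODS and behavior.get("writes") == "deny_402":
            return 402
    return None
```

Checking the export path before the write rule matters: otherwise a frozen tenant whose export is triggered by a write method would be locked out of its own data.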
Products MUST implement:
- POST /v1/tenants/{id}/export
Returns one ZIP per tenant containing every format declared in
data.offboarding_export_formats. Synchronous OK if <60s; async
with signed URL otherwise.
- DELETE /v1/tenants/{id}/data
Removes all tenant data within 30 days. Audit log retained
separately (see §8.4). Idempotent.
- POST /v1/tenants/demo/reset
Restores seed data. Only callable from the portal service token.
[BACKUP CONTRACT]
- Product declares in manifest:
    backup:
      data_stores: [postgres, qdrant, minio]
      rpo: 6h
      rto: 30min
      retention_days: 30
- Infra Plane executes backups per declaration (pg_dump, etc.).
- Product publishes restore procedure in operational runbook.
[GDPR ENDPOINTS]
- /v1/tenants/{id}/export returns ALL data for the tenant (JSON + blobs in ZIP).
- DELETE /v1/tenants/{id}/data deletes everything within 30 days of call.
- Both endpoints emit audit events.
[DATA RESIDENCY]
- All data stays in EU (database, object storage, cache).
- Product declares any external data flows (e.g., LLM calls to OpenAI EU endpoint)
in the data model documentation.
9.4 Infra Plane
[IaC]
- Orca manifest at: /orca/manifests/{vm}/{product}.toml
- Manifest declares: image, resource limits, health check, secret refs,
network rules, replicas, restart policy.
- Changes go through Gitea PR → Gitea Actions → Orca apply.
[SECRETS]
- All secrets via Infisical machine identity.
- Secret path namespacing: /prod/{product}/{KEY}
- Manifest references paths, never values:
    secrets:
      DB_URL: /prod/certifai/MONGODB_URI
      LLM_KEY: /prod/certifai/LITELLM_MASTER_KEY
- Bootstrap secrets (DB URIs for Keycloak only) are the lone exception.
[NETWORKING]
- Product services bind only to the private network.
- Public-facing routes pass through Orca-Proxy.
- Inter-product calls use internal DNS names (e.g., certifai.internal:8080).
[BUILD + DEPLOY]
- Dockerfile in product repo root.
- Gitea Actions pipeline:
fmt → lint → test → build → push → orca apply → e2e
- Image tagged with git SHA + semver.
[COLD START]
- Product declares startup dependencies in manifest:
depends_on: [keycloak, postgres-app, infisical]
- Orca enforces ordering on full restart (see INFRASTRUCTURE.md §10 Scenario F).
10. Product Manifest
The canonical declaration of a product, used by Tenant Registry, Orca, and Backstage. One file, committed to product repo, applied via deployment pipeline.
# product.manifest.yaml
schema_version: "1.0"
product:
  id: certifai
  name: "CERTifAI"
  description: "Self-hosted GDPR-compliant AI infrastructure dashboard"
  vendor: breakpilot # we; future third-parties will use their slug
  contract_version: "1.0"
  product_version: "1.4.2"
  repo: git.breakpilot.com/sharang/certifai
catalog:
  # Renders in /[tenant]/catalog and /backstage/products
  category: "AI Infrastructure" # AI Infrastructure | Compliance | Productivity | Security | Data
  tagline: "GDPR-compliant LLMs without leaving the EU"
  hero_image: https://cdn.breakpilot.com/products/certifai/hero.png
  screenshots:
    - https://cdn.breakpilot.com/products/certifai/dashboard.png
    - https://cdn.breakpilot.com/products/certifai/agents.png
  pricing_summary: "From €X/seat/month — included on Professional and Enterprise plans"
  available_on_plans: [trial, professional, enterprise] # 'trial' opt-in for self-serve
  trial_days: 14
  trial_quota: # caps applied while tenant.status == trial
    llm_tokens_per_day: 100_000
    api_calls_per_day: 10_000
  works_well_with: [compliance] # cross-product affinity; surfaced in catalog
  depends_on_products: [] # hard dependencies (rare; for compositions)
  demo:
    supported: true # MUST be true unless explicitly waived
    seed_data_url: https://cdn.breakpilot.com/products/certifai/demo/seed-v3.tar.gz
    reset_endpoint: /v1/tenants/demo/reset # called nightly by portal cron
    persona_hints: # for sales rep talk track
      - "GDPR officer at a 200-person SaaS"
      - "CTO replacing OpenAI calls with EU-hosted LLMs"
identity:
  oidc_client_id: certifai-client
  entitlement_key: certifai
  role_mappings:
    IT_ADMIN: Admin
    CXO: Member
    FINANCE: Viewer
    LEGAL: Viewer
    USER: Member
  required_scopes:
    - read:agents
    - write:agents
    - read:usage
frontend:
  type: interactive # interactive | widget | headless
  tag: certifai-dashboard
  bundle_url: https://cdn.breakpilot.com/products/certifai/{version}/element.js
  bundle_size_kb: 380
  routes:
    - path: /
      label: "Dashboard"
    - path: /agents
      label: "AI Agents"
      required_role: Member
    - path: /providers
      label: "Providers"
      required_role: Admin
backend:
  openapi_url: /openapi.yaml
  base_url: https://certifai-api.internal/v1
  health_url: /health
  service_token_audience: certifai-svc
mcp:
  enabled: true
  required_plan: enterprise
  endpoint: https://mcp.breakpilot.com/certifai
  tools:
    - name: list_ai_agents
      description: "Returns AI agents configured for the tenant"
      required_scope: read:agents
    - name: get_llm_usage
      description: "Returns LLM usage metrics"
      required_scope: read:usage
    # ... more tools
data:
  data_stores:
    - type: mongodb
      vm: vm-certifai
    - type: external_api
      provider: litellm
  pii_class: low
  tenant_scoping:
    field: org_id
    enforcement: middleware
  supports_projects: false # see §9.3 PROJECT SCOPING
  retention_default_days: 365
  gdpr_export: /v1/tenants/{id}/export
  gdpr_erasure: /v1/tenants/{id}/data
  offboarding_export_formats: [json, csv] # produced by P16 final-export step
  frozen_behavior:
    reads: allow # customer can still pull data / download exports
    writes: deny_402 # POST/PUT/DELETE return 402 Payment Required
    background_jobs: pause # scheduled work suspended, queue preserved
backup:
  rpo: 24h
  rto: 30min
  retention_days: 30
infra:
  image: registry.breakpilot.com/certifai-dashboard
  vm: vm-certifai
  replicas: 1
  resource_limits:
    cpu: "2000m"
    memory: "4Gi"
  health_check:
    path: /health
    interval: 30s
    timeout: 5s
    threshold: 3
  secrets:
    - MONGODB_URI: /prod/certifai/MONGODB_URI
    - KEYCLOAK_CLIENT_SECRET: /prod/certifai/KEYCLOAK_CLIENT_SECRET
    - LITELLM_MASTER_KEY: /prod/certifai/LITELLM_MASTER_KEY
  depends_on:
    - keycloak
    - mongodb
    - infisical
admin_actions:
  - name: "Reset LiteLLM API Key"
    description: "Rotates the per-tenant LiteLLM key"
    endpoint: POST /v1/tenants/{id}/admin/rotate-litellm-key
    confirm: required
    audit_required: true
observability:
  metrics: /metrics
  logs:
    format: json
    pii_redaction: true
  audit_endpoint: tenant-registry.internal/audit
10.1 Manifest Variants by Frontend Type
The example above shows an interactive product. Headless and widget products differ only in the frontend block.
Widget variant
frontend:
  type: widget
  tag: status-monitor-widget
  bundle_url: https://cdn.breakpilot.com/products/status/{version}/widget.js
  bundle_size_kb: 38
  dimensions:
    width: 400
    height: 240
  poll_interval_s: 60
  portal_config:
    # same shape as headless (§ below); used for click-through management page
    sections: [api_keys, webhooks, usage, docs]
    api_key_scopes: [...]
    webhook_events: [...]
Headless variant (no frontend bundle)
frontend:
  type: headless
  # NO tag, NO bundle_url: the portal renders 100% of the customer UI
  portal_config:
    sections:
      - api_keys
      - webhooks
      - usage
      - code_samples
      - docs
    status_endpoint: /v1/status # optional; portal polls for status badge
    api_key_scopes:
      - id: read
        description: "Read sessions and results"
      - id: write
        description: "Create new sessions"
      - id: admin
        description: "Manage settings (rare; consider before granting)"
    webhook_events:
      - name: session.completed
        description: "Fires when a notetaker session is fully processed"
        payload_schema_url: /schemas/session.completed.json
      - name: session.failed
        description: "Fires when a session cannot be processed"
        payload_schema_url: /schemas/session.failed.json
    code_samples:
      - language: curl
        title: "Create a session"
        snippet: |
          curl -X POST https://notetaker-api.breakpilot.com/v1/sessions \
            -H "Authorization: ApiKey k_xxx" \
            -H "X-Tenant: acme" \
            -d '{"audio_url": "...", "language": "en"}'
      - language: python
        title: "Create a session"
        snippet: |
          import requests
          requests.post(
              "https://notetaker-api.breakpilot.com/v1/sessions",
              headers={"Authorization": "ApiKey k_xxx", "X-Tenant": "acme"},
              json={"audio_url": "...", "language": "en"},
          )
The Tenant Registry validates the frontend block against the type:
- interactive requires tag and bundle_url; portal_config is optional
- widget requires tag, bundle_url, dimensions, AND portal_config
- headless MUST NOT declare tag or bundle_url; portal_config is required
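The three rules above can be sketched as a small table-driven check. This is an illustrative sketch only, not the Tenant Registry's actual validator: the function name `validate_frontend`, the `RULES` table, and the error wording are all hypothetical.

```python
from typing import Any

# Required and forbidden keys per frontend type, transcribed from the rules above.
# The table layout itself is an assumption made for this sketch.
RULES = {
    "interactive": {"required": {"tag", "bundle_url"}, "forbidden": set()},
    "widget": {
        "required": {"tag", "bundle_url", "dimensions", "portal_config"},
        "forbidden": set(),
    },
    "headless": {"required": {"portal_config"}, "forbidden": {"tag", "bundle_url"}},
}


def validate_frontend(frontend: dict[str, Any]) -> list[str]:
    """Return a list of violations for a frontend block; an empty list means valid."""
    ftype = frontend.get("type")
    rule = RULES.get(ftype)
    if rule is None:
        return [f"unknown frontend.type: {ftype!r}"]
    errors = []
    # Sorted iteration keeps the error order deterministic.
    for key in sorted(rule["required"]):
        if key not in frontend:
            errors.append(f"{ftype} requires {key}")
    for key in sorted(rule["forbidden"]):
        if key in frontend:
            errors.append(f"{ftype} must not declare {key}")
    return errors
```

Returning every violation at once, rather than stopping at the first, lets a product team fix a manifest in a single pass.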
11. Service Token Model (Inter-Product Communication)
Products can call each other directly. Auth is via short-lived service tokens issued by Keycloak's client_credentials flow.
11.1 Flow
1. Compliance product needs to list AI agents for an AI Act assessment.
2. Compliance backend requests a service token:
   POST https://auth.breakpilot.com/realms/breakpilot-prod/protocol/openid-connect/token
   Body: grant_type=client_credentials
         client_id=compliance-svc
         client_secret=<from Infisical>
         scope=read:certifai-agents
   Response: JWT (15 min TTL)
3. Compliance calls CERTifAI:
   GET https://certifai-api.internal/v1/tenants/{tenant_id}/agents
   Authorization: Bearer <service token>
   X-On-Behalf-Of-User: <user_sub> ← original user, for audit
   X-Service-Reason: ai-act-assessment
4. CERTifAI validates token:
   - Issued by platform Keycloak: ok
   - Audience includes "certifai-svc": ok
   - Scopes include "read:certifai-agents": ok
   - tenant_id in path matches caller's intent: ok (no cross-tenant)
5. CERTifAI returns data.
6. Both sides emit audit events:
   {actor: "svc:compliance", action: "certifai.list_agents",
    on_behalf_of: "user_sub", tenant_id: "...", reason: "ai-act-assessment"}
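Steps 4 and 6 of the flow can be sketched as follows. The sketch assumes the token's signature has already been verified against the Keycloak JWKS and that the decoded claims are available as a dict. The issuer URL is inferred from the token endpoint, the use of the `azp` claim for the calling client and the stripping of the `-svc` suffix are assumptions, and both function names are hypothetical.

```python
import time

# Assumed issuer, inferred from the token endpoint in step 2.
EXPECTED_ISSUER = "https://auth.breakpilot.com/realms/breakpilot-prod"


def validate_service_claims(claims, required_scope, audience="certifai-svc", now=None):
    """Return None if the claims pass the step-4 checks, else a short reason string."""
    now = time.time() if now is None else now
    if claims.get("iss") != EXPECTED_ISSUER:
        return "wrong issuer"
    aud = claims.get("aud", [])
    aud = [aud] if isinstance(aud, str) else aud  # "aud" may be a string or a list
    if audience not in aud:
        return "wrong audience"
    # OAuth 2.0 scopes are carried as a space-delimited string.
    if required_scope not in claims.get("scope", "").split():
        return "missing scope"
    if claims.get("exp", 0) <= now:
        return "expired"
    return None


def audit_event(claims, action, tenant_id, on_behalf_of, reason):
    """Build the step-6 audit event; actor naming from the client id is an assumption."""
    client = claims.get("azp", "unknown")
    return {
        "actor": "svc:" + client.removesuffix("-svc"),
        "action": action,
        "on_behalf_of": on_behalf_of,
        "tenant_id": tenant_id,
        "reason": reason,
    }
```

The tenant check in step 4 is left out here because the flow does not say where the caller's tenant intent is carried; a real implementation would compare the path's tenant_id against that source.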
11.2 Scope Catalog
Each service declares the scopes it offers (which other services can request) and the scopes it consumes (which it needs from other services).
certifai offers:
  read:certifai-agents
  read:certifai-usage
  write:certifai-settings (rare; consider before granting)
compliance offers:
  read:compliance-status
  read:compliance-dsfa
  write:compliance-events (for cross-product event emission)
billing-service consumes:
  read:certifai-usage
  read:compliance-status
compliance consumes:
  read:certifai-agents (for AI Act assessments)
Scopes are granted in Keycloak per service client. Grants are reviewed quarterly.
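A catalog like the one above lends itself to an automated consistency check, for example during the quarterly review: every consumed scope should be offered by some service. The sketch below transcribes the catalog into two dicts; the data layout and the function name `unoffered_scopes` are hypothetical.

```python
# Scope catalog from this section, transcribed as plain dicts (layout is an assumption).
OFFERS = {
    "certifai": {"read:certifai-agents", "read:certifai-usage", "write:certifai-settings"},
    "compliance": {"read:compliance-status", "read:compliance-dsfa", "write:compliance-events"},
}
CONSUMES = {
    "billing-service": {"read:certifai-usage", "read:compliance-status"},
    "compliance": {"read:certifai-agents"},
}


def unoffered_scopes(offers, consumes):
    """Map each consumer to the scopes it requests that no service offers."""
    offered = set().union(*offers.values())
    return {
        service: sorted(scopes - offered)
        for service, scopes in consumes.items()
        if scopes - offered
    }
```

An empty result means the catalog is internally consistent; any entry points at a typo or at a scope that was removed while still in use.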
11.3 Third-Party Readiness
When we open the platform to third parties:
- Same OIDC client_credentials flow
- Manifests are SIGNED by third-party developer keys (signature verified by Tenant Registry)
- Third-party scopes are read-only by default; write scopes require manual approval
- Network isolation: third-party services run in a separate Orca subnet
- Resource limits enforced (CPU, memory, network egress)
- Per-tenant install requires explicit IT Admin consent (OAuth consent screen)
The contract surface today is the same — we just add verification gates.
12. Versioning and Contract Evolution
12.1 Versions in play
contract_version   This document. Updated when the platform changes what products
                   must implement. Currently 1.0. Bumped on breaking changes.
product_version    The product's own version (semver). Tracked by Tenant Registry.
                   Independent of contract version.
api_version        The version in URL paths (/v1/, /v2/). Within a contract version,
                   a product may have multiple API versions live.
12.2 Platform supports N and N-1
The platform always supports the current contract version and the previous one. Deprecation is announced in this doc before any breaking change.
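The N and N-1 rule can be sketched as a one-line comparison. The sketch assumes that N is counted by the major component of contract_version, which matches the 1.0 → 2.0 example in this document but is not stated outright; the function name `is_supported` is hypothetical.

```python
def is_supported(product_contract: str, current_contract: str) -> bool:
    """True if the product's contract version is the current major (N) or the previous one (N-1)."""
    product_major = int(product_contract.split(".")[0])
    current_major = int(current_contract.split(".")[0])
    return current_major - 1 <= product_major <= current_major
```

With the platform on 2.0, a product declaring 1.0 is still accepted, while one declaring 3.0 is rejected as unknown to the platform.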
12.3 Breaking Change Process
1. Announce in this doc (one section per breaking change with motivation).
2. Update contract_version, e.g. 1.0 → 2.0.
3. New products required to ship 2.0 from day one.
4. Existing products get 12 months to migrate.
5. After 12 months, 1.0 retired; tenants on 1.0 products are migrated or churned.
13. Onboarding Checklist for a New Product
A product is "ready to ship to a customer" when all boxes are ticked.
☐ Backend API
  ☐ openapi.yaml committed and validated
  ☐ Mandatory endpoints implemented (§4.1)
  ☐ JWT validation via Keycloak JWKS
  ☐ Service token validation
  ☐ Tenant scoping enforced in middleware + tested
  ☐ /v1/tenants/{id}/provision idempotency test passes
  ☐ /v1/tenants/{id}/export produces valid GDPR-compliant ZIP
  ☐ DELETE /v1/tenants/{id}/data is irreversible and audited
☐ Frontend (manifest declares one of: interactive | widget | headless)
  For frontend.type = interactive:
    ☐ Custom element registered with declared tag
    ☐ Bundle published to CDN (≤ 500KB gzipped)
    ☐ Handles all required attributes (§5.A.2)
    ☐ Emits all event types (§5.A.3)
    ☐ Light + dark theme support (§5.A.4)
    ☐ At least one locale beyond English
  For frontend.type = widget:
    ☐ Widget custom element registered with declared tag
    ☐ Bundle published to CDN (≤ 50KB gzipped)
    ☐ Tile dimensions declared in manifest
    ☐ Allowed events only (no breakpilot:navigate)
    ☐ portal_config block complete (for click-through page)
  For frontend.type = headless:
    ☐ NO tag and NO bundle_url declared
    ☐ portal_config.sections declared
    ☐ portal_config.api_key_scopes catalog complete
    ☐ portal_config.webhook_events catalog with payload schemas
    ☐ portal_config.code_samples in at least one language
    ☐ Webhook payloads include HMAC signature for verification
    ☐ Status endpoint returns valid format (if declared)
    ☐ POST /internal/api-keys/verify integration tested with Tenant Registry
    ☐ POST /internal/webhooks/dispatch integration tested with portal
☐ MCP (if Enterprise plan or applicable)
  ☐ MCP server deployed
  ☐ Tool catalog declared in manifest
  ☐ API key authentication implemented
  ☐ All tools tenant-scoped and audited
☐ Documentation
  ☐ README published at developers.breakpilot.com/products/{name}
  ☐ API reference auto-generated and live
  ☐ Integration guide for customer IT
  ☐ Operational runbook for us
  ☐ Data model + GDPR retention table
☐ Observability
  ☐ /health implemented and returns valid format
  ☐ /metrics in Prometheus format
  ☐ JSON structured logging
  ☐ Audit events emitted for all listed categories
  ☐ No PII in logs (PII redaction tested)
☐ Identity integration
  ☐ Keycloak OIDC client registered
  ☐ Role mappings declared and tested
  ☐ Entitlement key included in tenant JWTs (verified end-to-end)
☐ Control integration
  ☐ product.manifest.yaml committed
  ☐ Registered with Tenant Registry catalog
  ☐ Lifecycle endpoints tested via Backstage "Create Test Tenant"
  ☐ Usage endpoint returns valid format
  ☐ Backstage admin actions render correctly
☐ Data integration
  ☐ All tables/collections have tenant_id
  ☐ Cross-tenant query test (negative test) passes
  ☐ Backup contract declared and Infra Plane is executing it
  ☐ GDPR export tested with real data
  ☐ Data residency confirmed (no exfiltration outside EU)
☐ Infra integration
  ☐ Orca manifest committed and applies cleanly
  ☐ Dockerfile builds reproducibly
  ☐ All secrets in Infisical (zero hardcoded)
  ☐ Gitea Actions pipeline green
  ☐ Resource limits set and tested under load
  ☐ Cold start dependency order declared
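The checklist item on HMAC-signed webhook payloads can be illustrated with a minimal sketch. The checklist does not fix the algorithm, the signature encoding, or how the signature is transported, so HMAC-SHA256 with a hex digest is an assumption here, and both function names are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw webhook body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)
```

Signing the raw bytes, not a re-serialized JSON object, avoids mismatches caused by key ordering or whitespace differences between sender and receiver.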
14. Gap Analysis — Existing Products
CERTifAI vs. Contract 1.0
✓ OIDC via Keycloak — already implemented
✓ Role data model (Admin/Member/Viewer) — exists
✗ Mandatory endpoints — NONE of §4.1 implemented yet
✗ Frontend as web component — currently a full Dioxus fullstack app
✗ MCP server — not implemented
✗ Tenant scoping in queries — only chat is user-scoped, no org_id scoping
✗ Service token validation — not implemented
✗ GDPR export/erasure — not implemented
✗ /health, /metrics, structured audit emission — not implemented
✓ Orca + Infisical compatible — already deployed this way
Effort estimate: 4-6 weeks of focused work
breakpilot-compliance vs. Contract 1.0
✓ Multi-tenant via X-Tenant-ID — exists (needs JWT validation upgrade)
✓ Modular Next.js frontend — close to web-component-wrappable
✗ Mandatory endpoints — partially implemented (usage endpoint missing)
✗ JWT validation at proxy — currently raw header trust
✗ Frontend as web component — needs wrapping with @r2wc/react-to-web-component
✗ MCP server — not implemented
✓ Backup contract — declared informally, needs to be in manifest
✗ GDPR export/erasure — partial (DSR module exists, doesn't cover whole tenant)
✓ Observability — partial (structured logs, no /metrics)
Effort estimate: 3-5 weeks of focused work
15. Open Items
- Design tokens package (@breakpilot/design-tokens) — needs to exist before web components ship
- CDN for product bundles — pick provider (Hetzner Object Storage + Cloudflare?)
- MCP gateway — single mcp.breakpilot.com vs. per-product subdomains
- Third-party manifest signing — defer until first real third-party conversation
- Inter-product event bus — explicitly deferred; service tokens cover the use cases for now
- Contract testing — automate manifest + openapi validation in Gitea Actions
- Customer-facing catalog UI — defined at /[tenant]/catalog (see PLATFORM_ARCHITECTURE.md
§5a operating principles); Backstage product picker reuses same catalog metadata.
OSS swap-in points (designed-for, not adopted yet):
- Audit log storage: BoxyHQ Retraced — our event schema is Retraced-shape (§8.4),
swap when audit query patterns outgrow PostgreSQL or when a customer asks for
exportable SOC2-grade audit retention.
- Usage metering: Lago — our /v1/usage endpoint plus optional per-event stream
(§4.1) is Lago-compatible. Swap when LiteLLM token billing requires real-time
metering or per-customer pricing tiers we cannot model in Stripe.
- Customer IdP federation (SCIM): BoxyHQ Jackson or Keycloak's SCIM module.
Adopt when first enterprise customer asks for automated user provisioning.
- Feature flags / per-tenant feature gating: OpenFeature (vendor-neutral).
Adopt when product features need finer-than-plan-tier gating per tenant.
End of document. Contract version 1.0. Next review: after first product (CERTifAI or compliance) achieves full compliance with §13 checklist.