# Platform Architecture — B2B Customer Portal

**Status:** Design Draft
**Authors:** Sharang, Benjamin
**Date:** 2026-05-11

---

## 1. Vision

We sell CERTifAI and breakpilot-compliance as modular B2B building blocks. Customers buy one or both and operate them inside a unified customer portal — without needing to understand that they are separate products under the hood.

Each customer is a **tenant**: fully isolated data, their own user base, their own identity configuration. We manage all tenants from a single operator backstage.

ERPNext runs our company: CRM, sales orders, invoicing, HR. Frappe Helpdesk runs customer support. Gitea runs engineering. Everything else — Keycloak, all product services, all databases — runs on our own infrastructure managed by Orca.

---

## 2. Products in Scope

| Product | What it is |
|---|---|
| **CERTifAI** | Self-hosted GDPR-compliant AI admin dashboard. Manages LLMs, AI agents, MCP servers, usage analytics. Built with Rust/Dioxus. |
| **breakpilot-compliance** | GDPR and AI-Act compliance automation. Covers DSFA, VVT, TOM, DSR, AI Act, risk, vendor, incidents. Built with Python/FastAPI + Go AI SDK + Next.js. |

Out of scope: breakpilot-dataroom, breakpilot-lehrer, breakpilot-pitch-deck.

---

## 3. The Four Planes

```
╔══════════════════════════════════════════════════════════════════╗
║  PLANE 1 — IDENTITY (logical root, all auth flows through here)  ║
╚══════════════════════════════════════════════════════════════════╝
                              ↓ JWT
╔══════════════════════════════════════════════════════════════════╗
║  PLANE 2 — CONTROL (portal + ERPNext + tenant registry)          ║
╚══════════════════════════════════════════════════════════════════╝
                              ↓ tenant-scoped API calls
╔══════════════════════════════════════════════════════════════════╗
║  PLANE 3 — DATA (CERTifAI + breakpilot-compliance)               ║
╚══════════════════════════════════════════════════════════════════╝
                              ↓ everything runs on
╔══════════════════════════════════════════════════════════════════╗
║  PLANE 4 — INFRA (Orca + VMs + Gitea + Infisical + LiteLLM)      ║
╚══════════════════════════════════════════════════════════════════╝
```

---

## 4. Plane 1 — Identity

**Technology:** Keycloak 26, single realm (`breakpilot-prod`)

Keycloak is the single source of truth for identity. Every other service validates JWTs issued here — nothing else handles auth logic.

### Structure

```
Realm: breakpilot-prod
│
├── Organizations (one per B2B customer)
│   ├── Acme Corp  → org_id: uuid-acme
│   ├── BayernAG   → org_id: uuid-bayernag
│   └── ...
│
├── Organization Roles (what a user can do within their company)
│   ├── IT_ADMIN  — full portal access, user management, IdP config
│   ├── CXO       — dashboard, billing, audit (read)
│   ├── FINANCE   — billing, invoices
│   ├── LEGAL     — audit log, compliance read
│   └── USER      — product access only
│
├── Realm Roles (what we, the operators, can do)
│   ├── BREAKPILOT_ADMIN — full backstage, impersonation, demo tenant edit
│   ├── SUPPORT_ENGINEER — read backstage, limited impersonation
│   └── SALES_REP        — demo tenant login, CRM read, NO real-tenant access
│
└── Identity Provider Brokering (per org, optional)
    ├── OIDC (Okta, Google Workspace, any OIDC provider)
    └── SAML (Azure AD, ADFS, any SAML 2.0 provider)
```

### JWT Structure

Every service receives a JWT containing:

```
sub          — user UUID
email        — user email
org_id       — customer tenant UUID (= Keycloak org ID)
org_name     — human-readable company name
org_roles    — [IT_ADMIN, USER, ...] roles within their org
realm_roles  — [customer] | [BREAKPILOT_ADMIN] | [SUPPORT_ENGINEER] | [SALES_REP]
products     — [certifai, compliance] entitlements (injected by protocol mapper)
plan         — starter | professional | enterprise
iss          — https://auth.breakpilot.com/realms/breakpilot-prod
```

The `products` and `plan` claims are added by a Keycloak **protocol mapper** that reads live entitlements from the Tenant Registry at token issuance. Products do not need to call back to the registry on every request. An entitlement change therefore takes effect at the next token issuance or refresh.

---

## 5. Plane 2 — Control

Three distinct services. Clear separation of responsibility.

### 5a. Customer Portal

**Technology:** Next.js 15 (new service)
**Deployed at:** `*.breakpilot.com` via Orca-Proxy wildcard routing

The front door for all customers and for us. Owns no business logic — it is a routing, auth, and UI layer.
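Because the portal is served on the `*.breakpilot.com` wildcard, its first job on every request is to turn the `Host` header into a tenant slug. The following is a minimal sketch of that step in Python, chosen for brevity (the portal itself is planned as a Next.js app). The function name `tenant_slug_from_host` and the `RESERVED_SUBDOMAINS` set are hypothetical; the reserved names are taken from the Orca-Proxy routing table in section 7.

```python
from typing import Optional

BASE_DOMAIN = "breakpilot.com"

# Assumption: subdomains that Orca-Proxy routes to other services
# (see the routing table in section 7) must never resolve to a tenant.
RESERVED_SUBDOMAINS = {"auth", "erp", "git", "secrets"}


def tenant_slug_from_host(host: str) -> Optional[str]:
    """Return the tenant slug for a Host header value, or None if there is none."""
    # Drop an optional port, normalise case, and strip a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    slug = hostname[: -len(suffix)]
    # Reject empty labels, nested subdomains ("a.b"), and reserved names.
    if not slug or "." in slug or slug in RESERVED_SUBDOMAINS:
        return None
    return slug
```

A slug returned here is only a candidate: it would still have to be looked up in the Tenant Registry before any tenant context is trusted.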
**Subdomain routing:**
- DNS wildcard `*.breakpilot.com` → Orca-Proxy
- Orca-Proxy reads `Host` header → routes all traffic to the portal container
- Portal reads `Host` → extracts tenant slug → looks up Tenant Registry

**Customer area** (requires valid JWT for their org):

```
/[slug]/dashboard        product tiles, usage summary, activity
/[slug]/catalog          browse ALL products, subscribed and not (upgrade/upsell flow)
/[slug]/products/
    /certifai            CERTifAI product area (subscribed only)
    /compliance          breakpilot-compliance area (subscribed only)
/[slug]/projects         optional sub-tenancy: dev/staging/prod separation  [IT_ADMIN]
/[slug]/settings/
    /identity            IdP configuration                                  [IT_ADMIN]
    /users               invite, roles, deactivate                          [IT_ADMIN]
    /api-keys            API keys for integrations                          [IT_ADMIN]
    /integrations        webhooks, process hooks
/[slug]/billing/         plan, usage, invoices                              [FINANCE, CXO, IT_ADMIN]
/[slug]/audit/           platform + product audit, filterable by product    [LEGAL, IT_ADMIN]
/[slug]/support/         Frappe HD customer portal                          [all roles]
```

**Operating principles (borrowed from AWS/Azure/GCP consoles):**

```
1. Role-based UI hiding
   The portal NEVER shows a button, link, or section the user cannot use.
   Disabled-with-tooltip is also wrong — hide it. The customer's mental
   model should be "the portal shows me what I can do," not "the portal
   teases me."

2. Browse before buy
   /catalog shows every product available on the platform with description,
   pricing tier, and a one-click "Request" CTA — even for products the
   customer is not subscribed to. Drives organic upsell instead of requiring
   sales touchpoints.

3. Hierarchy: Tenant → Project (optional) → Resources
   A tenant can have multiple projects (e.g., "Production", "Staging").
   Products that support project scoping isolate data per project.
   Customers without sophistication operate as single-project (default).
   Mirrors GCP Project / AWS Account / Azure Resource Group pattern.

4. Cross-product activity log
   /audit shows portal events AND every product's audit events filtered
   by tenant. Filterable by product, actor, action, time range. One log
   to satisfy DPO inquiries instead of hunting per-product.

5. Cost and usage as first-class
   Billing page is not just "your invoice." Shows live usage per product,
   trend over time, and projected next invoice. Removes "bill shock."
```

**Backstage** (access by realm role):
- `BREAKPILOT_ADMIN` — everything below
- `SUPPORT_ENGINEER` — read all + impersonation, no create/delete
- `SALES_REP` — `/backstage/leads`, `/backstage/demo`, own CRM activity only; CANNOT load any other `/backstage/tenants/[id]` route

```
/backstage/dashboard     MRR, active tenants, system health
/backstage/tenants/
    /new                 create customer
    /[id]/overview       health, logins, API volume
    /[id]/products       enable/configure products
    /[id]/users          view members, impersonate
    /[id]/billing        Stripe + ERPNext view
    /[id]/support        tickets for this customer
    /[id]/audit          full audit trail
/backstage/system/
    /health              all service health
    /incidents           incident log
    /releases            deployment history
```

### 5b. ERPNext

**Technology:** Frappe + ERPNext (self-hosted via Orca)
**Access:** `erp.breakpilot.com` — us only (IP-restricted at Orca-Proxy)
**Auth:** Keycloak OIDC — we log in with our existing accounts, no separate password

ERPNext is our **business operations backbone**. We do not build CRM, invoicing, or HR — we configure ERPNext for these.

| ERPNext Module | Used for |
|---|---|
| CRM | Leads, opportunities, deal pipeline |
| Sales | Quotations, Sales Orders (= contracts) |
| Accounts | Sales Invoices, payment tracking, DATEV export |
| Buying | Our own SaaS costs, infrastructure invoices |
| HR | Sharang + Benjamin as employees, expense claims |
| Support (Frappe HD) | Customer tickets, SLA, escalation to Gitea |

**Integration with the platform:**
- ERPNext Customer record has a custom field `tenant_id` linking to the Tenant Registry
- When a Sales Order is submitted in ERPNext → webhook → Tenant Registry `/tenants/{id}/activate`
- Portal billing page reads invoices from ERPNext REST API server-side — customers never log into ERPNext directly
- We (founders) create quotations, orders, and invoices inside ERPNext

### 5c. Tenant Registry

**Technology:** Go service (new), PostgreSQL schema `tenant_registry`

The glue between Keycloak, ERPNext, and the products. The technical source of truth for "what is this tenant, what do they have access to, how are they configured."

**Key data it holds:**

```
tenants            id, slug, name, erp_customer_id, stripe_cust_id, status, plan,
                   trial/contract dates, sales_owner, kind (real | demo).
                   status ∈ {demo, trial, active, frozen, archived}.
                     demo     — shared demo tenant; reset nightly; no billing
                     trial    — real customer in their N-day evaluation window
                     active   — paid, contract or self-serve plan
                     frozen   — read-only after cancel / non-payment (30d grace)
                     archived — data export window closed; only audit log retained

tenant_projects    OPTIONAL sub-tenancy. id, tenant_id, name, slug, status.
                   Customers without need operate as a single implicit
                   "default" project. Products opt in via manifest
                   (supports_projects: true) and accept an optional project_id
                   parameter on tenant-scoped APIs.
                   Mirrors GCP Project / AWS Account pattern.

tenant_products    tenant ↔ product, enabled, config (litellm_url, max_seats,
                   modules_enabled), expires_at

tenant_idp_config  type (oidc/saml), metadata, verified

audit_log          every portal AND product action: who, what, when, from where,
                   including impersonations. Indexed for cross-product search
                   (filter by tenant + product + actor + action + time).
                   Schema is Retraced-compatible so we can swap implementation
                   without changing producers (see PRODUCT_INTEGRATION_SPEC.md §8.4).

api_keys           portal-owned. tenant_id, product, scopes, name, hash,
                   created_by, last_used_at, revoked_at.
                   Headless products call /internal/api-keys/verify to validate
                   inbound keys. Single source of truth across all products.
```

**Links:**
- `tenant.id` = `Keycloak org_id` (one-to-one)
- `tenant.erp_customer_id` = `ERPNext Customer.name` (one-to-one)
- `tenant.stripe_cust_id` = Stripe Customer ID (self-serve billing only)

### 5d. Demo Tenant (Shared)

**Slug:** `demo` — reachable at `demo.breakpilot.com`
**Status:** `demo` (never transitions; never billed)
**Owner:** us (`BREAKPILOT_ADMIN` curates content; `SALES_REP` reads + logs in)

A single, shared tenant pre-seeded with realistic-but-fake data covering CERTifAI + breakpilot-compliance. Sales reps use it to walk prospects through the product live. Prospects do NOT log in directly — the sales rep drives the screen.
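
The demo tenant's fixed status is one edge case of the `status` lifecycle defined in 5c. A minimal sketch of that lifecycle as a transition table follows, written in Python for brevity (the Tenant Registry itself is planned as a Go service). The allowed transitions are read off the status descriptions in 5c and the flows P15 and P16; the names `TenantStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical.

```python
from enum import Enum


class TenantStatus(str, Enum):
    DEMO = "demo"
    TRIAL = "trial"
    ACTIVE = "active"
    FROZEN = "frozen"
    ARCHIVED = "archived"


# Assumption: this table is inferred from the document's flows, not from an
# existing schema. demo never transitions; archived is terminal.
ALLOWED_TRANSITIONS = {
    TenantStatus.DEMO: set(),
    TenantStatus.TRIAL: {TenantStatus.ACTIVE, TenantStatus.FROZEN},
    TenantStatus.ACTIVE: {TenantStatus.FROZEN},
    TenantStatus.FROZEN: {TenantStatus.ACTIVE, TenantStatus.ARCHIVED},
    TenantStatus.ARCHIVED: set(),
}


def transition(current: TenantStatus, target: TenantStatus) -> TenantStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table in one place means the trial, cancel, and reactivate flows would all go through the same check instead of each re-implementing it.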

**How it differs from a real tenant:**

```
DEMO TENANT                               REAL TENANT
─────────────────────────────────────     ───────────────────────────────────
status = demo                             status = trial | active
billing disabled                          billing active
audit emitted but not exported            audit emitted and exportable
nightly reset job restores fixtures       data is permanent
seed data loaded on reset:                customer-owned data
  product.manifest.seed_data_url
all real-tenant flows work otherwise      same flows, same code paths
```

**Why shared and not per-prospect:**
- Cheap (one tenant, no Orca provisioning per prospect)
- Predictable (sales reps know exactly what's in there)
- The known-quantity model — works in practice, matches what we have experience with
- Tradeoff accepted: concurrent edits during the same day are visible across demo sessions; nightly reset hides this within 24h

**Nightly reset:**
- Cron job (3:00 Europe/Berlin) calls each product's `/v1/tenants/demo/reset` endpoint
- Product fetches its fixtures from `catalog.seed_data_url` and restores
- Reset is itself an audit event; failures page the on-call

### 5e. Frappe Helpdesk + Gitea Issues

**Technology:** Frappe HD (installed on same Frappe bench as ERPNext), Gitea Issues

**Support flow:**
- Customer submits ticket via `/[slug]/support/` (Frappe HD customer portal, embedded or linked)
- Agent (us) triages in Frappe HD agent UI at `erp.breakpilot.com`
- If technical: agent clicks "Escalate to Engineering" → Frappe server script creates a Gitea issue in the relevant repo via Gitea REST API → issue URL stored on ticket
- When Gitea issue is closed → Gitea webhook → Frappe HD → ticket marked "Resolved"

---

## 6. Plane 3 — Data

### CERTifAI

Self-hosted GDPR-compliant AI dashboard. After updates, it is fully tenant-aware.
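
One way to make tenant-awareness hard to bypass is to build every query filter through a helper that pins `org_id` from the validated token. The sketch below is illustrative Python, not CERTifAI's Rust code. The `Claims` class and `scoped_filter` helper are hypothetical, the claim names follow the JWT structure in section 4, and the token is assumed to have been verified against Keycloak's JWKS before this point.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Claims:
    """Subset of the JWT claims from section 4 that this sketch needs."""
    sub: str
    org_id: str
    org_roles: List[str]
    products: List[str]


def scoped_filter(claims: Claims, user_filter: Dict[str, Any]) -> Dict[str, Any]:
    """Return a MongoDB-style filter that is always pinned to the caller's tenant."""
    if not claims.org_id:
        raise PermissionError("token carries no org_id")
    # Copy so the caller's dict is left untouched. org_id is written last,
    # so a client-supplied org_id can never redirect the query to another tenant.
    query = dict(user_filter)
    query["org_id"] = claims.org_id
    return query
```

For example, a request that tries to pass `{"org_id": "uuid-bayernag"}` while holding a token for `uuid-acme` would still be filtered to `uuid-acme`.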

**Multi-tenancy:** All MongoDB queries scoped by `org_id` from JWT
**Auth:** Validates Keycloak JWT (JWKS endpoint), maps `org_roles` to product roles
**LiteLLM:** Shared managed instance (Starter/Professional, API key per tenant) or customer-hosted (Enterprise, URL stored in `tenant_products.config`)

**Role mapping:**

| Portal role | CERTifAI role |
|---|---|
| IT_ADMIN | Admin |
| CXO, USER | Member |
| FINANCE, LEGAL | Viewer |

### breakpilot-compliance

GDPR and AI-Act compliance automation platform. After updates, tenant identity comes from validated JWT — not raw client headers.

**Multi-tenancy:** All PostgreSQL queries scoped by `tenant_id` (= `org_id` from JWT)
**Auth:** Next.js proxy validates JWT → extracts `org_id` → sets `X-Tenant-ID`
**Role mapping:** `LEGAL` can approve DSFA; `IT_ADMIN` is compliance admin; `USER` contributes to DSR/VVT workflows

---

## 7. Plane 4 — Infra

**Orchestration:** Orca manages all containers on Hetzner VMs
**Secrets:** Infisical — every service has a machine identity, pulls its own secrets at startup
**CI/CD:** Gitea Actions → Docker build → push to private registry → Orca redeploy webhook
**Routing:** Orca-Proxy handles all TLS termination and subdomain routing

```
Orca-Proxy routing table:
  auth.breakpilot.com     → Keycloak
  erp.breakpilot.com      → ERPNext + Frappe HD (IP-restricted)
  git.breakpilot.com      → Gitea
  secrets.breakpilot.com  → Infisical (IP-restricted)
  *.breakpilot.com        → customer-portal (wildcard, Host → tenant)
```

**Services managed by Orca:**

```
Identity & Auth
  └── Keycloak 26

Business Operations
  ├── ERPNext (Frappe)
  └── Frappe Helpdesk

Developer Tooling
  ├── Gitea
  └── Gitea Runner

Secrets
  └── Infisical

AI Inference
  └── LiteLLM (shared, API key per tenant; or customer-hosted for Enterprise)

Customer Portal
  ├── customer-portal (new)
  └── tenant-registry (new)

Products
  ├── certifai-dashboard
  └── breakpilot-compliance stack
        ├── backend-compliance (Python/FastAPI)
        ├── ai-compliance-sdk (Go)
        └── admin-compliance (Next.js)

Data 
Stores ├── PostgreSQL 17 [schemas: tenant_registry, compliance] ├── MongoDB [CERTifAI] ├── Qdrant [compliance RAG] └── MinIO [compliance documents] ``` **Infisical secret namespacing:** ``` /prod/ /keycloak/ DB_PASS, ADMIN_PASS, REALM_KEYS /erpnext/ DB_PASS, SMTP_PASS, OIDC_CLIENT_SECRET /customer-portal/ KEYCLOAK_CLIENT_SECRET, ERP_API_KEY, REGISTRY_DB_URI /tenant-registry/ POSTGRES_URI, KEYCLOAK_ADMIN_SECRET, ERP_API_KEY, STRIPE_SECRET /certifai/ MONGODB_URI, KEYCLOAK_CLIENT_SECRET, LITELLM_MASTER_KEY /compliance/ POSTGRES_URI, QDRANT_API_KEY, MINIO_KEYS, ANTHROPIC_API_KEY /litellm/ OPENAI_API_KEY, ANTHROPIC_API_KEY, MASTER_KEY /gitea-runner/ DOCKER_REGISTRY_PASS, ORCA_WEBHOOK_TOKEN ``` --- ## 8. Process Sketches ### P1 — New Customer Onboarding (Sales-Led) ``` US (ERPNext) TENANT REGISTRY KEYCLOAK │ │ │ │ Lead → Opportunity │ │ │ → Quotation (PDF sent) │ │ │ → Sales Order submitted │ │ │─────── webhook ─────────────►│ │ │ │ create org ─────────►│ │ │◄──── org_id ──────────│ │ │ write tenant row │ │ │ write tenant_products│ │ │ send welcome email │ │ │ │ │ │ │ ▼ │ │ IT ADMIN receives email │ │ clicks setup link │ │ │ ┌───────┤ │ │ │set pw │ │ │ │ 2FA │ │ │ └───┬───┘ │ │ │ │ lands on /acme/dashboard │ │ │ │ ``` ### P2 — User Login (Customer's Own IdP) ``` USER ORCA-PROXY PORTAL KEYCLOAK CUSTOMER IdP │ │ │ │ │ │ acme.breakpilot.com │ │ │ │ │───────────────────────►│ │ │ │ │ │ Host=acme.* │ │ │ │ │───────────────►│ │ │ │ │ │ slug=acme │ │ │ │ │ lookup tenant │ │ │ │ │ → idp=acme-okta│ │ │ │ │─── redirect ──►│ │ │ │ │ kc_idp_hint │ │ │ │ │ │─── redirect ──►│ │ │ │ │ │ │ │◄─────────────────────── auth ──┤ │ │ │ │ │ issue JWT │ │ │ │◄── JWT ────────│ │ │◄── /acme/dashboard ────┤ │ │ │ ``` ### P3 — User Login (Our IdP — email + password) ``` USER PORTAL KEYCLOAK │ │ │ │ acme.yourplatform│ │ │──────────────────►│ │ │ │ redirect + PKCE │ │ │─────────────────►│ │◄── Keycloak login page ─────────────┤ │ enter email + password (+ TOTP) │ 
│─────────────────────────────────────►│ │ │◄── JWT ──────────│ │◄── /acme/dashboard┤ │ ``` ### P4 — IT Admin Configures External IdP ``` IT ADMIN PORTAL TENANT REGISTRY KEYCLOAK │ │ │ │ │ /settings/ │ │ │ │ identity │ │ │ │───────────────►│ │ │ │ fill OIDC/ │ │ │ │ SAML details │ │ │ │───────────────►│ │ │ │ │── PATCH idp_config►│ │ │ │ │── create IdP ────►│ │ │ │ for org │ │ │ │◄── ok ────────────│ │ │ │ verified=true │ │◄── "Test" btn ─┤ │ │ │ auth popup ───────────────────────────────────────────►│ │◄── success ────────────────────────────────────────────┤ │◄── "IdP configured" ┤ │ │ ``` ### P5 — IT Admin Invites a Team Member ``` IT ADMIN PORTAL KEYCLOAK NEW USER │ │ │ │ │ /settings/ │ │ │ │ users → invite│ │ │ │ email + role │ │ │ │───────────────►│ │ │ │ │ create user in org│ │ │ │──────────────────►│ │ │ │ │ send invite │ │ │ │ email ────────►│ │ │ │ │ click link │ │ │◄── set pw ─────│ │ │ │ (+ TOTP) │ │ │ │ issue JWT │ │ │◄─── JWT ──────────│ │ │ │ │ ┌──────┘ │ │ │ lands on│ │ │ │ /acme/dashboard │ │ │ (role-filtered view) ``` ### P6 — Customer Accesses a Product ``` USER PORTAL KEYCLOAK PRODUCT (e.g. CERTifAI) │ │ │ │ │ /acme/products/ │ │ │ │ certifai │ │ │ │─────────────────►│ │ │ │ │ check JWT: │ │ │ │ products claim │ │ │ │ includes │ │ │ │ "certifai" ? 
│ │ │ │ │ │ │ [YES] ────────┤ │ │ │ │ pass JWT ──────────────────────── │ │ │ │ validate JWKS │ │ │ │ extract org_id │ │ │ │ scope all data │ │◄── product UI ───┤ │ │ │ │ │ │ │ [NO] ─────────┤ │ │ │◄── "Not in your plan" + upgrade CTA │ ``` ### P7 — Finance User Views Billing ``` FINANCE USER PORTAL ERPNEXT API STRIPE API │ │ │ │ │ /acme/billing│ │ │ │──────────────►│ │ │ │ │ role check: │ │ │ │ FINANCE → ok │ │ │ │ │ │ │ │── fetch invoices►│ │ │ │◄── invoice list ─│ │ │ │ │ │ │ │── fetch usage ────────────────────► │ │ │◄── usage data ────────────────────── │ │ │ │ │ │◄── billing page renders │ │ │ plan · usage · invoices │ │ │ [Download PDF] ──────────────►│ │ │◄── PDF streamed ─────────────────│ │ │ │ │ │ [Upgrade Plan] │ │ │──────────────►│ │ │ │ │ create CRM task in ERPNext │ │ │─────────────────►│ │ │ │ notify us (email/ERPNext task) │ │◄── "We'll be in touch" ──┤ │ │ ``` ### P8 — Legal User Exports Audit Report ``` LEGAL USER PORTAL TENANT REGISTRY COMPLIANCE PRODUCT │ │ │ │ │ /acme/audit │ │ │ │──────────────►│ │ │ │ │ role check: │ │ │ │ LEGAL → ok │ │ │ │ │ │ │ │── platform audit ──────────────────► │ │ (who logged in, role changes, │ │ │ IdP changes, impersonations) │ │ │◄── audit_log rows ─────────────────┤ │ │ │ │ │ │── compliance audit ────────────────► │ │ (DSFA approvals, DSR processing, │ │ │ TOM completions) │ │ │◄── compliance audit rows ──────────┤ │ │ │ │ │ [Export] │ │ │ │──────────────►│ │ │ │◄── ZIP: │ │ │ │ platform-audit.csv │ │ │ compliance-audit.pdf │ │ ``` ### P9 — Support Ticket Escalated to Engineering ``` CUSTOMER FRAPPE HD US (AGENT) GITEA │ │ │ │ │ submit ticket │ │ │ │ via /support/ │ │ │ │─────────────────►│ │ │ │ │── notify agent ─►│ │ │ │ │ triage ticket │ │ │ │ → technical bug │ │ │ │ │ │ │ │ [Escalate] │ │ │◄─────────────────│ │ │ │ server script: │ │ │ │ POST /issues ────────────────────► │ │ │ │ {title, body, │ │ │ │ labels:[bug]} │ │ │◄──── issue URL ───────────────────┤ │ │ store on ticket │ │ │◄── "Escalated to 
engineering, we'll update you" ─────┤ │ │ │ │ │ │ │ dev fixes it │ │ │ │ closes issue ─►│ │ │◄──────── webhook ──────────────────│ │ │ ticket → Resolved│ │ │◄── notification ─│ │ │ ``` ### P10 — We Create a New Customer (Startup Flow) ``` US (BACKSTAGE) TENANT REGISTRY KEYCLOAK ERPNEXT IT ADMIN │ │ │ │ │ │ /backstage/ │ │ │ │ │ tenants/new │ │ │ │ │ fill: name, │ │ │ │ │ contact, plan, │ │ │ │ │ products │ │ │ │ │──── [Create] ────►│ │ │ │ │ │── create org ───►│ │ │ │ │◄── org_id ───────│ │ │ │ │── create Customer ──────────────►│ │ │ │◄── erp_customer_id ──────────────│ │ │ │ write tenant rows │ │ │ │ send welcome email ─────────────────────────── │ │◄── tenant created ┤ │ │ │ │ "Awaiting setup"│ │ │ │ │ │ │ │ click link│ │ │ │◄── set pw ────────────────── │ │ │ │ + 2FA │ │ │ │ │ JWT issued │ │ │ │◄─────────────────────────────────────── /acme/ ─┤ ``` ### P11 — We Debug a Customer Issue (Impersonation) ``` US (BACKSTAGE) TENANT REGISTRY KEYCLOAK PORTAL (AS CUSTOMER) │ │ │ │ │ /backstage/ │ │ │ │ tenants/acme/ │ │ │ │ users → │ │ │ │ Impersonate Alice│ │ │ │──────────────────►│ │ │ │ │ write audit_log │ │ │ │ {action: │ │ │ │ impersonate, │ │ │ │ actor: sharang, │ │ │ │ target: alice} │ │ │ │── request token ►│ │ │ │◄── imp. 
token ───│ │ │◄── token ─────────┤ (30min, signed, │ │ │ │ impersonated_by │ │ │ │ claim) │ │ │ │ │ │ new tab: acme.breakpilot.com │ │ │──────────────────────────────────────────────────────────►│ │ │ [orange banner] │ │ │ Impersonating │ │ │ alice@acme.com │ │ │ 29:47 remaining │ │ reproduce issue, identify root cause │ │ │──────────────────────────────────────────────────────────►│ │ │ [Exit impersonation] ``` ### P12 — ERPNext Sales Order Activates a Tenant ``` US (ERPNEXT) ERPNEXT TENANT REGISTRY KEYCLOAK IT ADMIN │ │ │ │ │ │ Sales Order │ │ │ │ │ Submit ──── ►│ │ │ │ │ │── webhook ───────►│ │ │ │ │ {order_id, │ │ │ │ │ tenant_id, │ │ │ │ │ products, │ │ │ │ │ plan, │ │ │ │ │ contract_start │ │ │ │ │ contract_end} │ │ │ │ │ │ tenant.status │ │ │ │ │ = active │ │ │ │ │ tenant_products │ │ │ │ │ enabled=true │ │ │ │ │── update claims ─►│ │ │ │ │ (protocol mapper│ │ │ │ │ picks up new │ │ │ │ │ entitlements) │ │ │ │ │── send email ─────────────────►│ │ │ │ "Subscription │ │ │ │ │ now active" │ │ ``` ### P13 — Customer Browses Catalog and Requests a New Product ``` USER (any role) PORTAL TENANT REGISTRY ERPNEXT (CRM) │ │ │ │ │ /acme/catalog │ │ │ │──────────────►│ │ │ │ │── GET /catalog ──────►│ │ │ │◄── product manifests +│ │ │ │ subscribed status │ │ │◄── catalog page │ │ │ • CERTifAI [✓ Subscribed] │ │ │ • Compliance [✓ Subscribed] │ │ │ • Notetaker [+ Request] │ │ │ • Classifier [+ Request] │ │ │ │ │ │ click [Request] on Notetaker │ │ │ Modal: "Why do you want this?" 
│ │ │ + estimated seats / volume │ │ │──────────────►│ │ │ │ │── POST /catalog/ │ │ │ │ request ─────────►│ │ │ │ {tenant, product, │ │ │ │ requested_by, note}│ │ │ │ │── create CRM Lead ──►│ │ │ │ linked to Customer │ │ │ │◄── lead_id ──────────│ │ │ │ notify sales_owner │ │ │ │ (email + ERPNext │ │ │ │ activity) │ │◄── "We'll be in touch within 1 day" ──│ │ ``` ### P14 — Sales Rep Demos to a Prospect (Shared Demo Tenant) ``` SALES REP KEYCLOAK PORTAL DEMO TENANT │ │ │ │ │ open Zoom with prospect, share screen │ │ │ │ demo.breakpilot.com │ │────────────────────────────────►│ │ │ │ │ Host: demo │ │ │ │ → slug = demo │ │ │ │ → tenant.kind=demo │ │ │ │ tenant.status=demo │ │ │ │ │ │ │ OIDC redirect │ │ │◄──────────────│─────────────────│ │ │ login sales@breakpilot │ │ realm_role=SALES_REP │ │──────────────►│ │ │ │ │ verify SALES_REP allowed on demo only │ │ │ issue JWT: │ │ │ org_id=demo, org_roles=[IT_ADMIN], │ │ │ realm_roles=[SALES_REP], │ │ │ products=[certifai, compliance] │ │◄──────────────│ │ │ │ │ │ │ /demo/dashboard ───────────────►│ │ │ /demo/products/certifai ─►│ load custom elt ►│ │ /demo/products/compliance ─►│ load custom elt ►│ │◄── show prospect every flow ────│ │ │ │ │ │ if prospect interested: │ │ click [Request Trial] in /demo/catalog │ │ modal: prospect email, company, est. 
seats │ │ → POST /catalog/trial-request │ │ creates CRM Lead in ERPNext, NOT a tenant │ │ sales_owner = the logged-in SALES_REP │ │ │ │ 03:00 nightly: │ │ cron → product /v1/tenants/demo/reset │ │ fixtures from catalog.seed_data_url restored │ │ demo is clean for next day │ ``` **Guardrails:** - Keycloak policy: `SALES_REP` realm role MUST NOT be issued a token with `org_id ≠ demo` - Backstage policy: `SALES_REP` CANNOT see real-tenant data, CAN see CRM (their leads) - Real customer support is NEVER done from a SALES_REP login ### P15 — Self-Serve Trial → Convert or Expire ``` PROSPECT PORTAL TENANT REGISTRY ERPNEXT KEYCLOAK │ │ │ │ │ │ breakpilot.com/start │ │ │ │──────────────►│ │ │ │ │ form: email, company, password │ │ │ │──────────────►│ │ │ │ │ │── POST /trials ──────►│ │ │ │ │ {email, company, │ │ │ │ │ requested_products} │ │ │ │ │ │ │ │ │ │ │ slugify(company) │ │ │ │ │ create tenant │ │ │ │ │ status=trial │ │ │ │ │ trial_ends_at = │ │ │ │ │ now + 14d │ │ │ │ │ create Customer ►│ │ │ │ │ tier=Trial │ │ │ │ │ sales_owner= │ │ │ │ │ unassigned │ │ │ │ │── create org ───────────────►│ │ │ │ + IT_ADMIN user │ │ │ │ + verify email │ │ │ │ │ │ │◄── magic link │ │ │ │ │ click link, set password │ │ │ │ land on /acme-trial/dashboard │ │ │ │ banner: "Trial: 14 days left — Add billing to keep your data" │ │ │ │ │ │ ── customer uses platform normally ── │ │ │ │ │ │ │ DAY 7 cron: trial_ends_at - 7d │ │ │ │ → email IT_ADMIN + CXO │ │ │ │ → CRM Activity: "Day-7 nudge" ►│ │ │ │ │ │ │ │ DAY 12: same, urgent tone │ │ │ │ DAY 14: trial_ends_at reached │ │ │ │ │ │ │ │ IF customer added payment: │ │ │ │ status: trial → active │ │ │ │ Stripe subscription created │ │ │ │ OR Sales Order in ERPNext signed │ │ │ │ banner removed │ │ │ │ │ │ ELSE: │ │ status: trial → frozen │ │ │ │ 30-day grace: portal read-only, products return 402 │ │ │ daily reminder email until day 44 │ │ │ │ │ DAY 44: frozen → archived │ │ │ │ GDPR export ZIP emailed to IT_ADMIN │ │ │ │ each product called: 
DELETE /v1/tenants/{id}/data │ │ │ 30 days later: tenant row deleted (audit_log retained 7y) │ ``` **Trial scoping:** - All paid products are available in trial mode by default unless `catalog.available_on_plans` excludes `trial` - Usage-billed products (e.g., LiteLLM calls) get a hard cap during trial (manifest: `trial_quota`) - Customer can upgrade plan mid-trial; trial timer just stops, no proration ### P16 — Customer Cancels and Offboards ``` IT ADMIN PORTAL TENANT REGISTRY PRODUCTS ERPNEXT │ │ │ │ │ │ /acme/settings/billing │ │ │ │──────────────►│ │ │ │ │ [Cancel Subscription] │ │ │ │ Modal: │ │ │ │ • reason (dropdown) │ │ │ │ • confirm typing "acme" │ │ │ │ • shows: data retained 30d, then deleted │ │ │──────────────►│ │ │ │ │ │── POST /tenants/ │ │ │ │ │ acme/cancel ─────►│ │ │ │ │ │ status: active │ │ │ │ │ → frozen │ │ │ │ │ frozen_at = now │ │ │ │ │ delete_at = │ │ │ │ │ now + 30d │ │ │ │ │── Stripe cancel │ │ │ │ │ at_period_end │ │ │ │ │── opportunity ──────────────────►│ │ │ │ stage=Lost │ │ │ │ │ reason=... │ │ │ │ │── notify sales_owner │ │ │ │ (could reach out for save) │ │◄── confirmation page │ │ │ │ "Frozen until . Download your data anytime." │ │ │ │ │ │ │ frozen state: │ │ │ │ portal works READ-ONLY │ │ │ │ /export available (all products) │ │ │ │ product APIs return 402 on writes │ │ │ │ │ │ │ │ if customer changes mind within 30d: │ │ │ [Reactivate] → status: frozen → active │ │ │ no data loss │ │ │ │ │ │ │ DAY 30 cron: │ │ │ │ tenant.delete_at reached │ │ │ │ build final export ZIP per product ►│ /v1/tenants/{id}/export │ │ email ZIP link to IT_ADMIN + CXO │ │ │ │ (signed URL, 7-day TTL) │ │ │ │ for each product: │ │ │ DELETE /v1/tenants/{id}/data ────►│ │ │ │ Keycloak: org archived, users disabled │ │ │ ERPNext Customer: status=Inactive │ │ │ tenant.status = archived │ │ │ │ │ │ audit_log retained 7y per GDPR / accounting │ │ ``` **Self-serve vs. 
enterprise:** - Stripe-billed customers cancel in-portal; flow above - ERPNext-billed (enterprise) customers send written notice; sales rep updates Sales Order; flow runs from `/backstage/tenants/[id]/lifecycle` with the same downstream effects ### Headless Product Flows P1–P13 cover **interactive** products that ship a UI. Products declared as `frontend.type = headless` (see PRODUCT_INTEGRATION_SPEC.md §5) ship no frontend code — customers configure them through a portal-rendered UI and consume them via API/MCP from their own systems. Examples: a notetaker bot, a document classifier, a webhook router, a compliance reporter. The portal still hosts these products end-to-end: the customer area, billing, audit, and Backstage all work the same. Only the "use the product" surface changes from a UI to API keys + webhooks. ### H1 — Customer Enables a Headless Product ``` IT ADMIN PORTAL TENANT REGISTRY HEADLESS PRODUCT │ │ │ │ │ /acme/products│ │ │ │ /notetaker │ │ │ │──────────────►│ │ │ │ │ load manifest │ │ │ │ frontend.type = │ │ │ │ "headless" │ │ │ │ │ │ │ │ render portal-owned │ │ │ │ config page from │ │ │ │ manifest sections: │ │ │ │ • API Keys │ │ │ │ • Webhooks │ │ │ │ • Usage chart │ │ │ │ • Docs link │ │ │ │ • Code samples │ │ │◄── page ──────│ │ │ ``` ### H2 — Generate API Key for a Headless Product ``` IT ADMIN PORTAL TENANT REGISTRY HEADLESS PRODUCT │ │ │ │ │ [Generate Key]│ │ │ │ name: "prod" │ │ │ │ scopes:[r,w] │ │ │ │──────────────►│ │ │ │ │── POST /api-keys ────►│ │ │ │ {tenant, scopes, │ │ │ │ product, name} │ │ │ │ │ generate raw key │ │ │ │ store HASH only │ │ │ │ bind: tenant + │ │ │ │ product + │ │ │ │ scopes │ │ │◄── raw key (once) ────│ │ │◄── show once ─│ │ │ │ "Copy now — │ │ │ │ won't show │ │ │ │ again" │ │ │ ``` ### H3 — Customer's System Calls the Headless Product ``` CUSTOMER SYSTEM HEADLESS PRODUCT TENANT REGISTRY │ │ │ │ POST /v1/sessions │ │ │ Auth: ApiKey k_xxx │ │ │ X-Tenant: acme │ │ │──────────────────────►│ │ │ │ validate key 
───────►│ │ │ → tenant_id, │ │ │ scopes │ │ │◄─────────────────────│ │ │ enforce scope │ │ │ tenant_id in EVERY │ │ │ DB query │ │ │ process request │ │ │ emit usage ─────────►│ │ │ emit audit ─────────►│ │◄── 200 response ──────│ │ ``` ### H4 — Async Result Delivered via Webhook ``` HEADLESS PRODUCT CUSTOMER WEBHOOK URL PORTAL (delivery log) │ │ │ │ async job finishes │ │ │ load webhook config │ │ │ for this tenant + │ │ │ this event type │ │ │ │ │ │ POST customer URL ─────►│ │ │ Body: {event, result, │ │ │ tenant, signature} │ │ │◄── 200 ─────────────────│ │ │ log delivery ────────────────────────────────────► │ (success/fail, ts, │ │ │ response code) │ │ │ │ │ │ if delivery fails: │ │ │ retry with backoff │ │ │ 3 attempts, then │ │ │ dead-letter │ │ │ visible in portal at │ │ │ /webhooks/deliveries │ │ ``` ### H5 — Headless Product Tile on Customer Dashboard ``` USER PORTAL TENANT REGISTRY HEADLESS PRODUCT │ │ │ │ │ /acme/dashboard │ │ │ │─────────────────►│ │ │ │ │ for each entitled │ │ │ │ product in JWT: │ │ │ │ │ │ │ │ type=interactive → │ │ │ │ render "Open" tile │ │ │ │ │ │ │ │ type=widget → │ │ │ │ load widget bundle │ │ │ │ render custom elt │ │ │ │ │ │ │ │ type=headless → │ │ │ │ GET /v1/usage ─────────────────────────────►│ │ │◄────────────────────── usage summary ────────│ │ │ render stat tile: │ │ │ │ "Notetaker │ │ │ │ 142 sessions │ │ │ │ last 30d" │ │ │ │ click → goes to │ │ │ │ /products/notetaker │ │ │◄── dashboard ────│ │ │ ``` ### H6 — Backstage Operates a Headless Product ``` US (BACKSTAGE) PORTAL TENANT REGISTRY HEADLESS PRODUCT │ │ │ │ │ /backstage/ │ │ │ │ tenants/acme/ │ │ │ │ products/ │ │ │ │ notetaker │ │ │ │──────────────►│ │ │ │ │ NO "Impersonate" btn │ │ │ │ (no UI to enter) │ │ │ │ │ │ │ │ shows: │ │ │ │ • Health │ │ │ │ • Usage 30/90d │ │ │ │ • API call errors │ │ │ │ • Webhook deliveries │ │ │ │ • Failed deliveries │ │ │ │ • Admin actions from │ │ │ │ manifest: │ │ │ │ [Flush queue] │ │ │ │ [Rotate keys] │ │ │ │ [Reset state] │ │ 
     │ [Flush queue] │                       │                      │
     │──────────────►│                       │                      │
     │               │── service token ─────►│                      │
     │               │  POST /admin/flush ─────────────────────────►│
     │               │                       │          audit event │
     │               │◄─────────────────────────────────── ok ──────│
     │◄── done ──────│                       │                      │
```

---

## 9. Technology Decisions (Locked)

| Decision | Choice | Rationale |
|---|---|---|
| Identity | Keycloak, single realm | Already in CERTifAI; Organizations + IdP brokering built-in |
| Tenant model | Keycloak Organization per customer | Native isolation, JWT claims, no custom multi-tenant auth code |
| Subdomain routing | Orca-Proxy, wildcard cert | Consistent with existing infra; tenant from `Host` header |
| Secret management | Infisical, machine identity per service | Uniform across all services; path-namespaced per service |
| Business operations | ERPNext (Frappe) | CRM + sales + invoicing + HR in one; avoids building our own |
| Customer support | Frappe Helpdesk | Same Frappe bench as ERPNext; native customer-ticket-account link |
| Engineering issues | Gitea Issues | Already running Gitea; Frappe HD → Gitea via REST API (server script) |
| Data isolation | Logical (tenant_id / org_id columns) | Sufficient for Starter/Professional; physical isolation offered for Enterprise |
| Billing — self-serve | Stripe (Starter, Professional) | Standard; portal billing page reads Stripe |
| Billing — enterprise | ERPNext Sales Invoices | Manual invoicing, DATEV export for accountant |
| Customer portal | New Next.js 15 app | Clean slate; existing admin apps have product-specific chrome |
| Tenant Registry | New Go service | Thin glue layer; owns entitlements, IdP config, audit log |
| Products scope | CERTifAI + breakpilot-compliance only | Dataroom, lehrer, and pitch-deck out of scope |

---

## 10. Open Items / Phasing

### Phase 0 — Foundation (pilot-ready, one real customer)

- Orca-Proxy: wildcard TLS, subdomain routing table
- Infisical: machine identities + secrets for all existing services
- Keycloak: Organizations enabled, realm roles (incl. `SALES_REP`), one test org
- Tenant Registry: core schema + API (`/tenants` CRUD + `/activate`), `status` enum
- Backstage minimal: create tenant form, tenant list, impersonation
- Portal login: subdomain detection → Keycloak OIDC → tenant context
- CERTifAI: MongoDB-backed sessions, `org_id` query scoping, role enforcement
- breakpilot-compliance: JWT → `X-Tenant-ID` validated at Next.js proxy
- **Demo tenant `demo` seeded**; sales rep can log in and walk a screen-shared prospect

### Phase 1 — Customer-Facing Portal

- Full customer dashboard, product tiles, usage summary
- User management and invite flow
- IdP configuration wizard (OIDC + SAML)
- Billing page (ERPNext invoices + Stripe usage)
- Audit log page and CSV/PDF export
- Frappe HD embedded in `/[slug]/support/`

### Phase 2 — Business Operations

- ERPNext configured: CRM, Sales Orders, Invoicing, HR
- ERPNext → Tenant Registry webhook (Sales Order submit → tenant activate)
- Frappe HD → Gitea escalation (server script)
- Backstage health dashboard (service health, incidents)
- Keycloak protocol mapper (products + plan injected into JWT)
- **Self-serve trial flow P15**: `/start` form, 14-day timer, day-7/12/14 emails, trial → frozen → archived state machine
- **Cancel + offboard flow P16**: cancel modal, 30-day frozen window, automated final-export ZIP, GDPR erasure call to every product
- **Demo nightly reset**: cron at 03:00 Europe/Berlin calls each product's `/v1/tenants/demo/reset`

### Phase 3 — Product API Surface

- CERTifAI: OpenAPI spec, `/api/v1/health` + `/api/v1/usage`
- breakpilot-compliance: OpenAPI spec, `/api/v1/usage`
- Customer-facing API keys (IT Admin generates, scoped to their org)
- LiteLLM per-tenant API key metering → usage data in portal

### Phase 4 — Enterprise Tier

- Physical data isolation option (dedicated PostgreSQL schema per tenant)
- Customer-hosted LiteLLM (URL stored in `tenant_products.config`)
- Custom domain support (`compliance.acme.com` → Orca-Proxy → portal)
- MCP servers per product (CERTifAI MCP, compliance MCP)
- SLA enforcement in Frappe HD per plan tier

---

*End of document. Updated after design review 2026-05-11.*