docs/PLATFORM_ARCHITECTURE.md
chore(domain): yourplatform.com → breakpilot.com
Apply platform-domain decision (2026-05-18). No services touched; docs/config only.

Refs: M1.1
2026-05-18 20:28:41 +00:00


Platform Architecture — B2B Customer Portal

Status: Design Draft
Authors: Sharang, Benjamin
Date: 2026-05-11


1. Vision

We sell CERTifAI and breakpilot-compliance as modular B2B building blocks. Customers buy one or both and operate them inside a unified customer portal — without needing to understand that they are separate products under the hood.

Each customer is a tenant: fully isolated data, their own user base, their own identity configuration. We manage all tenants from a single operator backstage.

ERPNext runs our company: CRM, sales orders, invoicing, HR. Frappe Helpdesk runs customer support. Gitea runs engineering. Everything else — Keycloak, all product services, all databases — runs on our own infrastructure managed by Orca.


2. Products in Scope

Product                  What it is
─────────────────────    ──────────────────────────────────────────────
CERTifAI                 Self-hosted GDPR-compliant AI admin dashboard. Manages LLMs, AI agents, MCP servers, usage analytics. Built with Rust/Dioxus.
breakpilot-compliance    GDPR and AI-Act compliance automation. Covers DSFA, VVT, TOM, DSR, AI Act, risk, vendor, incidents. Built with Python/FastAPI + Go AI SDK + Next.js.

Out of scope: breakpilot-dataroom, breakpilot-lehrer, breakpilot-pitch-deck.


3. The Four Planes

╔══════════════════════════════════════════════════════════════════╗
║  PLANE 1 — IDENTITY  (logical root, all auth flows through here) ║
╚══════════════════════════════════════════════════════════════════╝
                              ↓  JWT
╔══════════════════════════════════════════════════════════════════╗
║  PLANE 2 — CONTROL  (portal + ERPNext + tenant registry)        ║
╚══════════════════════════════════════════════════════════════════╝
                              ↓  tenant-scoped API calls
╔══════════════════════════════════════════════════════════════════╗
║  PLANE 3 — DATA  (CERTifAI + breakpilot-compliance)             ║
╚══════════════════════════════════════════════════════════════════╝
                              ↓  everything runs on
╔══════════════════════════════════════════════════════════════════╗
║  PLANE 4 — INFRA  (Orca + VMs + Gitea + Infisical + LiteLLM)   ║
╚══════════════════════════════════════════════════════════════════╝

4. Plane 1 — Identity

Technology: Keycloak 26, single realm (breakpilot-prod)

Keycloak is the single source of truth for identity. Every other service validates JWTs issued here and implements no authentication logic of its own.

Structure

Realm: breakpilot-prod
│
├── Organizations  (one per B2B customer)
│   ├── Acme Corp       → org_id: uuid-acme
│   ├── BayernAG        → org_id: uuid-bayernag
│   └── ...
│
├── Organization Roles  (what a user can do within their company)
│   ├── IT_ADMIN   — full portal access, user management, IdP config
│   ├── CXO        — dashboard, billing, audit (read)
│   ├── FINANCE    — billing, invoices
│   ├── LEGAL      — audit log, compliance read
│   └── USER       — product access only
│
├── Realm Roles  (what we, the operators, can do)
│   ├── BREAKPILOT_ADMIN    — full backstage, impersonation, demo tenant edit
│   ├── SUPPORT_ENGINEER    — read backstage, limited impersonation
│   └── SALES_REP           — demo tenant login, CRM read, NO real-tenant access
│
└── Identity Provider Brokering  (per org, optional)
    ├── OIDC  (Okta, Google Workspace, any OIDC provider)
    └── SAML  (Azure AD, ADFS, any SAML 2.0 provider)

JWT Structure

Every service receives a JWT containing:

sub          — user UUID
email        — user email
org_id       — customer tenant UUID (= Keycloak org ID)
org_name     — human-readable company name
org_roles    — [IT_ADMIN, USER, ...]  roles within their org
realm_roles  — [customer] | [BREAKPILOT_ADMIN] | [SUPPORT_ENGINEER] | [SALES_REP]
products     — [certifai, compliance]  entitlements (injected by protocol mapper)
plan         — starter | professional | enterprise
iss          — https://auth.breakpilot.com/realms/breakpilot-prod

The products and plan claims are added by a Keycloak protocol mapper that reads live entitlements from the Tenant Registry at token issuance. Products do not need to call back to the registry on every request.
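As an illustration of the claim set above, here is a minimal sketch of how a service might turn a token payload into a typed object. It assumes the signature has already been verified against Keycloak's JWKS elsewhere; `PortalClaims` and `parse_claims` are hypothetical names, not part of any existing service.

```python
from dataclasses import dataclass, field

EXPECTED_ISSUER = "https://auth.breakpilot.com/realms/breakpilot-prod"


@dataclass(frozen=True)
class PortalClaims:
    sub: str
    email: str
    org_id: str
    org_name: str
    plan: str
    org_roles: frozenset = field(default_factory=frozenset)
    realm_roles: frozenset = field(default_factory=frozenset)
    products: frozenset = field(default_factory=frozenset)


def parse_claims(payload: dict) -> PortalClaims:
    """Build PortalClaims from an already signature-verified JWT payload."""
    if payload.get("iss") != EXPECTED_ISSUER:
        raise ValueError("unexpected issuer")
    return PortalClaims(
        sub=payload["sub"],
        email=payload["email"],
        org_id=payload["org_id"],
        org_name=payload["org_name"],
        plan=payload["plan"],
        org_roles=frozenset(payload.get("org_roles", [])),
        realm_roles=frozenset(payload.get("realm_roles", [])),
        products=frozenset(payload.get("products", [])),
    )
```

Because `products` and `plan` travel inside the token, a check such as `"certifai" in claims.products` needs no registry round trip. The trade-off is that an entitlement change only becomes visible once a new token is issued.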


5. Plane 2 — Control

Three distinct services. Clear separation of responsibility.

5a. Customer Portal

Technology: Next.js 15 (new service)
Deployed at: *.breakpilot.com via Orca-Proxy wildcard routing

The front door for all customers and for us. Owns no business logic — it is a routing, auth, and UI layer.

Subdomain routing:

  • DNS wildcard *.breakpilot.com → Orca-Proxy
  • Orca-Proxy reads Host header → routes all traffic to the portal container
  • Portal reads Host → extracts tenant slug → looks up Tenant Registry
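The slug extraction step could look roughly like the sketch below. The reserved labels come from the Orca-Proxy routing table in section 7; the function name and the rule that nested subdomains are rejected are assumptions made for illustration.

```python
from typing import Optional

BASE_DOMAIN = "breakpilot.com"
# Subdomains that Orca-Proxy routes to other services, never to a tenant.
RESERVED = {"auth", "erp", "git", "secrets"}


def tenant_slug_from_host(host: str) -> Optional[str]:
    """Return the tenant slug for '<slug>.breakpilot.com', else None."""
    # Strip an optional port, normalise case, drop a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    slug = hostname[: -len(suffix)]
    # Reject empty, nested (a.b.breakpilot.com), and reserved labels.
    if not slug or "." in slug or slug in RESERVED:
        return None
    return slug
```

The returned slug is only a lookup key. The portal still has to confirm in the Tenant Registry that a tenant with that slug exists before redirecting to login.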

Customer area (requires valid JWT for their org):

/[slug]/dashboard         product tiles, usage summary, activity
/[slug]/catalog           browse ALL products, subscribed and not (upgrade/upsell flow)
/[slug]/products/
  /certifai               CERTifAI product area (subscribed only)
  /compliance             breakpilot-compliance area (subscribed only)
/[slug]/projects          optional sub-tenancy: dev/staging/prod separation [IT_ADMIN]
/[slug]/settings/
  /identity               IdP configuration        [IT_ADMIN]
  /users                  invite, roles, deactivate [IT_ADMIN]
  /api-keys               API keys for integrations [IT_ADMIN]
  /integrations           webhooks, process hooks
/[slug]/billing/          plan, usage, invoices     [FINANCE, CXO, IT_ADMIN]
/[slug]/audit/            platform + product audit, filterable by product [LEGAL, IT_ADMIN]
/[slug]/support/          Frappe HD customer portal [all roles]

Operating principles (borrowed from AWS/Azure/GCP consoles):

1. Role-based UI hiding
   The portal NEVER shows a button, link, or section the user cannot use.
   Disabled-with-tooltip is also wrong — hide it. The customer's mental model
   should be "the portal shows me what I can do," not "the portal teases me."

2. Browse before buy
   /catalog shows every product available on the platform with description,
   pricing tier, and a one-click "Request" CTA — even for products the
   customer is not subscribed to. Drives organic upsell instead of
   requiring sales touchpoints.

3. Hierarchy: Tenant → Project (optional) → Resources
   A tenant can have multiple projects (e.g., "Production", "Staging").
   Products that support project scoping isolate data per project.
   Customers who do not need this operate as a single default project.
   Mirrors GCP Project / AWS Account / Azure Resource Group pattern.

4. Cross-product activity log
   /audit shows portal events AND every product's audit events filtered
   by tenant. Filterable by product, actor, action, time range. One log
   to satisfy DPO inquiries instead of hunting per-product.

5. Cost and usage as first-class
   Billing page is not just "your invoice." Shows live usage per product,
   trend over time, and projected next invoice. Removes "bill shock."
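Principle 1 (role-based UI hiding) can be sketched as a filter over navigation entries. The role lists below are copied from the route annotations in the customer area listing; treating dashboard, catalog, and support as visible to every role is an assumption, and `NAV_ITEMS` and `visible_nav` are hypothetical names.

```python
ALL_ROLES = frozenset({"IT_ADMIN", "CXO", "FINANCE", "LEGAL", "USER"})

# (path under /[slug]/, roles allowed to see it)
NAV_ITEMS = [
    ("dashboard", ALL_ROLES),
    ("catalog", ALL_ROLES),
    ("projects", frozenset({"IT_ADMIN"})),
    ("settings/identity", frozenset({"IT_ADMIN"})),
    ("settings/users", frozenset({"IT_ADMIN"})),
    ("settings/api-keys", frozenset({"IT_ADMIN"})),
    ("billing", frozenset({"FINANCE", "CXO", "IT_ADMIN"})),
    ("audit", frozenset({"LEGAL", "IT_ADMIN"})),
    ("support", ALL_ROLES),
]


def visible_nav(slug, org_roles):
    """Return only the links this user may use; everything else is omitted."""
    roles = set(org_roles)
    return [f"/{slug}/{path}" for path, allowed in NAV_ITEMS if roles & allowed]
```

Hiding a link is a presentation concern only. Each route would still need the same role check on the server so that a hand-typed URL is rejected.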

Backstage (access by realm role):

  • BREAKPILOT_ADMIN — everything below
  • SUPPORT_ENGINEER — read all + impersonation, no create/delete
  • SALES_REP — /backstage/leads, /backstage/demo, own CRM activity only; CANNOT load any other /backstage/tenants/[id] route

/backstage/dashboard      MRR, active tenants, system health
/backstage/tenants/
  /new                    create customer
  /[id]/overview          health, logins, API volume
  /[id]/products          enable/configure products
  /[id]/users             view members, impersonate
  /[id]/billing           Stripe + ERPNext view
  /[id]/support           tickets for this customer
  /[id]/audit             full audit trail
/backstage/system/
  /health                 all service health
  /incidents              incident log
  /releases               deployment history
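A route-level sketch of the backstage rules listed above, under simplifying assumptions: only page loads are modelled, the single create route is `/backstage/tenants/new`, and `can_load_backstage` is a hypothetical helper. Impersonation limits and per-record CRM ownership are not covered here.

```python
SALES_REP_PREFIXES = ("/backstage/leads", "/backstage/demo")
CREATE_ROUTES = {"/backstage/tenants/new"}


def _under(path, prefix):
    """True if path equals prefix or is nested below it."""
    return path == prefix or path.startswith(prefix + "/")


def can_load_backstage(realm_roles, path):
    roles = set(realm_roles)
    if not _under(path, "/backstage"):
        return False
    if "BREAKPILOT_ADMIN" in roles:
        return True
    if "SUPPORT_ENGINEER" in roles:
        # Read everything, but no create pages.
        return path not in CREATE_ROUTES
    if "SALES_REP" in roles:
        return any(_under(path, p) for p in SALES_REP_PREFIXES)
    return False
```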

5b. ERPNext

Technology: Frappe + ERPNext (self-hosted via Orca)
Access: erp.breakpilot.com — us only (IP-restricted at Orca-Proxy)
Auth: Keycloak OIDC — we log in with our existing accounts, no separate password

ERPNext is our business operations backbone. We do not build CRM, invoicing, or HR — we configure ERPNext for these.

ERPNext Module         Used for
───────────────────    ───────────────────────────────────────────────
CRM                    Leads, opportunities, deal pipeline
Sales                  Quotations, Sales Orders (= contracts)
Accounts               Sales Invoices, payment tracking, DATEV export
Buying                 Our own SaaS costs, infrastructure invoices
HR                     Sharang + Benjamin as employees, expense claims
Support (Frappe HD)    Customer tickets, SLA, escalation to Gitea

Integration with the platform:

  • ERPNext Customer record has a custom field tenant_id linking to the Tenant Registry
  • When a Sales Order is submitted in ERPNext → webhook → Tenant Registry /tenants/{id}/activate
  • Portal billing page reads invoices from ERPNext REST API server-side — customers never log into ERPNext directly
  • We (founders) create quotations, orders, and invoices inside ERPNext

5c. Tenant Registry

Technology: Go service (new), PostgreSQL schema tenant_registry

The glue between Keycloak, ERPNext, and the products. The technical source of truth for "what is this tenant, what do they have access to, how are they configured."

Key data it holds:

tenants              id, slug, name, erp_customer_id, stripe_cust_id,
                     status, plan, trial/contract dates, sales_owner,
                     kind (real | demo).
                     status ∈ {demo, trial, active, frozen, archived}.
                       demo     — shared demo tenant; reset nightly; no billing
                       trial    — real customer in their N-day evaluation window
                       active   — paid, contract or self-serve plan
                       frozen   — read-only after cancel / non-payment (30d grace)
                       archived — data export window closed; only audit log retained

tenant_projects      OPTIONAL sub-tenancy. id, tenant_id, name, slug,
                     status. Customers without need operate as a single
                     implicit "default" project. Products opt in via
                     manifest (supports_projects: true) and accept an
                     optional project_id parameter on tenant-scoped APIs.
                     Mirrors GCP Project / AWS Account pattern.

tenant_products      tenant ↔ product, enabled, config (litellm_url,
                     max_seats, modules_enabled), expires_at

tenant_idp_config    type (oidc/saml), metadata, verified

audit_log            every portal AND product action: who, what, when,
                     from where, including impersonations. Indexed for
                     cross-product search (filter by tenant + product +
                     actor + action + time). Schema is Retraced-compatible
                     so we can swap implementation without changing
                     producers (see PRODUCT_INTEGRATION_SPEC.md §8.4).

api_keys             portal-owned. tenant_id, product, scopes, name,
                     hash, created_by, last_used_at, revoked_at.
                     Headless products call /internal/api-keys/verify
                     to validate inbound keys. Single source of truth
                     across all products.
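The check behind /internal/api-keys/verify might look like the following sketch. The source only says that a hash is stored; the use of SHA-256, the in-memory row list, and the names `ApiKeyRow`, `hash_key`, and `verify_key` are assumptions for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ApiKeyRow:
    tenant_id: str
    product: str
    scopes: frozenset
    key_hash: str  # hex SHA-256 of the raw key (hashing scheme is an assumption)
    revoked_at: Optional[datetime] = None


def hash_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def verify_key(rows, raw_key: str, product: str, scope: str) -> Optional[str]:
    """Return the owning tenant_id if the key is valid for product and scope."""
    candidate = hash_key(raw_key)
    for row in rows:
        # Constant-time comparison of the stored and presented hashes.
        if not hmac.compare_digest(row.key_hash, candidate):
            continue
        if row.revoked_at is not None or row.product != product:
            return None
        return row.tenant_id if scope in row.scopes else None
    return None
```

Returning the tenant_id rather than a boolean lets a headless product scope the request to the right tenant without a second lookup.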

Links:

  • tenant.id = Keycloak org_id (one-to-one)
  • tenant.erp_customer_id = ERPNext Customer.name (one-to-one)
  • tenant.stripe_cust_id = Stripe Customer ID (self-serve billing only)
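The status values can be modelled as a small state machine. The source fixes only the set of statuses and the rule that demo never transitions; the transition table below (for example frozen back to active after payment) is an assumption, as are the names `TenantStatus`, `ALLOWED`, and `transition`.

```python
from enum import Enum


class TenantStatus(str, Enum):
    DEMO = "demo"
    TRIAL = "trial"
    ACTIVE = "active"
    FROZEN = "frozen"
    ARCHIVED = "archived"


# Assumed transitions; only "demo never transitions" is stated in the design.
ALLOWED = {
    TenantStatus.DEMO: set(),
    TenantStatus.TRIAL: {TenantStatus.ACTIVE, TenantStatus.FROZEN},
    TenantStatus.ACTIVE: {TenantStatus.FROZEN},
    TenantStatus.FROZEN: {TenantStatus.ACTIVE, TenantStatus.ARCHIVED},
    TenantStatus.ARCHIVED: set(),
}


def transition(current: TenantStatus, target: TenantStatus) -> TenantStatus:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def is_writable(status: TenantStatus) -> bool:
    """Frozen tenants are read-only; archived tenants retain only the audit log."""
    return status in {TenantStatus.DEMO, TenantStatus.TRIAL, TenantStatus.ACTIVE}
```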

5d. Demo Tenant (Shared)

Slug: demo (reachable at demo.breakpilot.com)
Status: demo (never transitions; never billed)
Owner: us (BREAKPILOT_ADMIN curates content; SALES_REP reads + logs in)

A single, shared tenant pre-seeded with realistic-but-fake data covering CERTifAI + breakpilot-compliance. Sales reps use it to walk prospects through the product live. Prospects do NOT log in directly — the sales rep drives the screen.

How it differs from a real tenant:

DEMO TENANT                              REAL TENANT
─────────────────────────────────────    ───────────────────────────────────
status = demo                            status = trial | active
billing disabled                         billing active
audit emitted but not exported           audit emitted and exportable
nightly reset job restores fixtures      data is permanent
seed data loaded on reset:               customer-owned data
  product.manifest.seed_data_url
all real-tenant flows work otherwise     same flows, same code paths

Why shared and not per-prospect:

  • Cheap (one tenant, no Orca provisioning per prospect)
  • Predictable (sales reps know exactly what's in there)
  • Familiar: a shared demo tenant is the model we already have hands-on experience with, and it works in practice
  • Tradeoff accepted: edits made in one demo session are visible in other sessions on the same day; the nightly reset clears them within 24h

Nightly reset:

  • Cron job (3:00 Europe/Berlin) calls each product's /v1/tenants/demo/reset endpoint
  • Product fetches its fixtures from catalog.seed_data_url and restores
  • Reset is itself an audit event; failures page the on-call
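The reset job could be structured as in the sketch below. The HTTP call is injected as `post` (a callable that returns a status code) so the loop can be exercised without a network; the base URLs in the usage note and the function name are hypothetical.

```python
def reset_demo_tenant(product_base_urls, post):
    """Call each product's demo reset endpoint; return the products that failed."""
    failed = []
    for name, base_url in product_base_urls.items():
        url = base_url.rstrip("/") + "/v1/tenants/demo/reset"
        try:
            status = post(url)
        except Exception:
            # A network error counts as a failed reset for this product.
            status = None
        if status is None or not 200 <= status < 300:
            failed.append(name)
    return failed
```

A caller might pass `{"certifai": "http://certifai.internal", "compliance": "http://compliance.internal"}` and page the on-call whenever the returned list is non-empty. One product failing does not stop the others from being reset.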

5e. Frappe Helpdesk + Gitea Issues

Technology: Frappe HD (installed on same Frappe bench as ERPNext), Gitea Issues

Support flow:

  • Customer submits ticket via /[slug]/support/ (Frappe HD customer portal, embedded or linked)
  • Agent (us) triages in Frappe HD agent UI at erp.breakpilot.com
  • If technical: agent clicks "Escalate to Engineering" → Frappe server script creates a Gitea issue in the relevant repo via Gitea REST API → issue URL stored on ticket
  • When Gitea issue is closed → Gitea webhook → Frappe HD → ticket marked "Resolved"

6. Plane 3 — Data

CERTifAI

Self-hosted GDPR-compliant AI dashboard. Once the planned updates are in place, it is fully tenant-aware.

Multi-tenancy: All MongoDB queries scoped by org_id from JWT
Auth: Validates Keycloak JWT (JWKS endpoint), maps org_roles to product roles
LiteLLM: Shared managed instance (Starter/Professional, API key per tenant) or customer-hosted (Enterprise, URL stored in tenant_products.config)
Role mapping:

Portal role       CERTifAI role
──────────────    ─────────────
IT_ADMIN          Admin
CXO, USER         Member
FINANCE, LEGAL    Viewer
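The role mapping can be expressed as a lookup plus a tie-break. The mapping itself is taken from the table; the rule that a user holding several portal roles gets the most privileged product role is an assumption, and `certifai_role` is a hypothetical helper.

```python
ROLE_MAP = {
    "IT_ADMIN": "Admin",
    "CXO": "Member",
    "USER": "Member",
    "FINANCE": "Viewer",
    "LEGAL": "Viewer",
}
# Most privileged first (assumed ordering).
PRECEDENCE = ["Admin", "Member", "Viewer"]


def certifai_role(org_roles):
    """Map the org_roles claim to one CERTifAI role, or None if nothing maps."""
    mapped = {ROLE_MAP[r] for r in org_roles if r in ROLE_MAP}
    for role in PRECEDENCE:
        if role in mapped:
            return role
    return None
```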

breakpilot-compliance

GDPR and AI-Act compliance automation platform. Once the planned updates are in place, tenant identity comes from the validated JWT, not from raw client headers.

Multi-tenancy: All PostgreSQL queries scoped by tenant_id (= org_id from JWT)
Auth: Next.js proxy validates JWT → extracts org_id → sets X-Tenant-ID
Role mapping: LEGAL can approve DSFA; IT_ADMIN is compliance admin; USER contributes to DSR/VVT workflows
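The key property of the proxy step is that a client can never choose its own tenant. A minimal sketch, assuming the JWT has already been validated and its payload is available as a dict; `upstream_headers` is a hypothetical helper:

```python
def upstream_headers(client_headers: dict, verified_claims: dict) -> dict:
    """Build headers for the backend call, trusting only the validated JWT."""
    # Drop any client-supplied tenant header, whatever its casing.
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != "x-tenant-id"
    }
    # Tenant identity comes from the token's org_id claim.
    headers["X-Tenant-ID"] = verified_claims["org_id"]
    return headers
```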


7. Plane 4 — Infra

Orchestration: Orca manages all containers on Hetzner VMs
Secrets: Infisical — every service has a machine identity, pulls its own secrets at startup
CI/CD: Gitea Actions → Docker build → push to private registry → Orca redeploy webhook
Routing: Orca-Proxy handles all TLS termination and subdomain routing

Orca-Proxy routing table:
  auth.breakpilot.com     → Keycloak
  erp.breakpilot.com      → ERPNext + Frappe HD    (IP-restricted)
  git.breakpilot.com      → Gitea
  secrets.breakpilot.com  → Infisical               (IP-restricted)
  *.breakpilot.com        → customer-portal         (wildcard, Host → tenant)

Services managed by Orca:

Identity & Auth
  └── Keycloak 26

Business Operations
  ├── ERPNext (Frappe)
  └── Frappe Helpdesk

Developer Tooling
  ├── Gitea
  └── Gitea Runner

Secrets
  └── Infisical

AI Inference
  └── LiteLLM  (shared, API key per tenant; or customer-hosted for Enterprise)

Customer Portal
  ├── customer-portal       (new)
  └── tenant-registry       (new)

Products
  ├── certifai-dashboard
  └── breakpilot-compliance stack
       ├── backend-compliance    (Python/FastAPI)
       ├── ai-compliance-sdk     (Go)
       └── admin-compliance      (Next.js)

Data Stores
  ├── PostgreSQL 17   [schemas: tenant_registry, compliance]
  ├── MongoDB         [CERTifAI]
  ├── Qdrant          [compliance RAG]
  └── MinIO           [compliance documents]

Infisical secret namespacing:

/prod/
  /keycloak/           DB_PASS, ADMIN_PASS, REALM_KEYS
  /erpnext/            DB_PASS, SMTP_PASS, OIDC_CLIENT_SECRET
  /customer-portal/    KEYCLOAK_CLIENT_SECRET, ERP_API_KEY, REGISTRY_DB_URI
  /tenant-registry/    POSTGRES_URI, KEYCLOAK_ADMIN_SECRET, ERP_API_KEY, STRIPE_SECRET
  /certifai/           MONGODB_URI, KEYCLOAK_CLIENT_SECRET, LITELLM_MASTER_KEY
  /compliance/         POSTGRES_URI, QDRANT_API_KEY, MINIO_KEYS, ANTHROPIC_API_KEY
  /litellm/            OPENAI_API_KEY, ANTHROPIC_API_KEY, MASTER_KEY
  /gitea-runner/       DOCKER_REGISTRY_PASS, ORCA_WEBHOOK_TOKEN

8. Process Sketches

P1 — New Customer Onboarding (Sales-Led)

  US (ERPNext)                  TENANT REGISTRY           KEYCLOAK
       │                              │                       │
       │  Lead → Opportunity          │                       │
       │  → Quotation (PDF sent)      │                       │
       │  → Sales Order submitted     │                       │
       │─────── webhook ─────────────►│                       │
       │                              │  create org ─────────►│
       │                              │◄──── org_id ──────────│
       │                              │  write tenant row     │
       │                              │  write tenant_products│
       │                              │  send welcome email   │
       │                              │       │               │
       │                              │       ▼               │
       │                         IT ADMIN receives email      │
       │                         clicks setup link            │
       │                              │               ┌───────┤
       │                              │               │set pw │
       │                              │               │  2FA  │
       │                              │               └───┬───┘
       │                              │                   │
       │                         lands on /acme/dashboard │
       │                              │                   │

P2 — User Login (Customer's Own IdP)

  USER                  ORCA-PROXY         PORTAL          KEYCLOAK       CUSTOMER IdP
   │                        │                │                │                │
   │  acme.breakpilot.com   │                │                │                │
   │───────────────────────►│                │                │                │
   │                        │  Host=acme.*   │                │                │
   │                        │───────────────►│                │                │
   │                        │                │ slug=acme      │                │
   │                        │                │ lookup tenant  │                │
   │                        │                │ → idp=acme-okta│                │
   │                        │                │─── redirect ──►│                │
   │                        │                │   kc_idp_hint  │                │
   │                        │                │                │─── redirect ──►│
   │                        │                │                │                │
   │                        │◄─────────────────────── auth ──┤                │
   │                        │                │                │ issue JWT      │
   │                        │                │◄── JWT ────────│                │
   │◄── /acme/dashboard ────┤                │                │                │

P3 — User Login (Our IdP — email + password)

  USER               PORTAL            KEYCLOAK
   │                   │                  │
   │  acme.breakpilot  │                  │
   │──────────────────►│                  │
   │                   │ redirect + PKCE  │
   │                   │─────────────────►│
   │◄── Keycloak login page ─────────────┤
   │  enter email + password (+ TOTP)     │
   │─────────────────────────────────────►│
   │                   │◄── JWT ──────────│
   │◄── /acme/dashboard┤                  │

P4 — IT Admin Configures External IdP

  IT ADMIN           PORTAL          TENANT REGISTRY       KEYCLOAK
      │                │                   │                   │
      │  /settings/    │                   │                   │
      │  identity      │                   │                   │
      │───────────────►│                   │                   │
      │  fill OIDC/    │                   │                   │
      │  SAML details  │                   │                   │
      │───────────────►│                   │                   │
      │                │── PATCH idp_config►│                   │
      │                │                   │── create IdP ────►│
      │                │                   │   for org         │
      │                │                   │◄── ok ────────────│
      │                │                   │  verified=true    │
      │◄── "Test" btn ─┤                   │                   │
      │  auth popup ───────────────────────────────────────────►│
      │◄── success ────────────────────────────────────────────┤
      │◄── "IdP configured" ┤              │                   │

P5 — IT Admin Invites a Team Member

  IT ADMIN           PORTAL            KEYCLOAK          NEW USER
      │                │                   │                │
      │  /settings/    │                   │                │
      │  users → invite│                   │                │
      │  email + role  │                   │                │
      │───────────────►│                   │                │
      │                │ create user in org│                │
      │                │──────────────────►│                │
      │                │                   │ send invite    │
      │                │                   │ email ────────►│
      │                │                   │                │ click link
      │                │                   │◄── set pw ─────│
      │                │                   │    (+ TOTP)    │
      │                │                   │ issue JWT      │
      │                │◄─── JWT ──────────│                │
      │                │                   │         ┌──────┘
      │                │                   │  lands on│
      │                │                   │  /acme/dashboard
      │                │                   │  (role-filtered view)

P6 — Customer Accesses a Product

  USER              PORTAL         KEYCLOAK        PRODUCT (e.g. CERTifAI)
   │                  │                │                  │
   │  /acme/products/ │                │                  │
   │  certifai        │                │                  │
   │─────────────────►│                │                  │
   │                  │ check JWT:     │                  │
   │                  │ products claim │                  │
   │                  │ includes       │                  │
   │                  │ "certifai" ?   │                  │
   │                  │                │                  │
   │    [YES] ────────┤                │                  │
   │                  │ pass JWT ───────────────────────► │
   │                  │                │  validate JWKS   │
   │                  │                │  extract org_id  │
   │                  │                │  scope all data  │
   │◄── product UI ───┤                │                  │
   │                  │                │                  │
   │    [NO] ─────────┤                │                  │
   │◄── "Not in your plan" + upgrade CTA                  │
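The portal-side decision in P6 reduces to a three-way outcome. In this sketch the extra "deny" branch, for a token issued to a different org than the one in the URL, is an assumption derived from the rule that the customer area requires a valid JWT for that org; `product_gate` is a hypothetical name.

```python
def product_gate(claims: dict, tenant_org_id: str, product: str) -> str:
    """Return 'forward', 'upsell', or 'deny' for a product page request."""
    if claims.get("org_id") != tenant_org_id:
        # Token belongs to another tenant than the one being addressed.
        return "deny"
    if product in claims.get("products", []):
        # [YES] branch: pass the JWT on to the product.
        return "forward"
    # [NO] branch: show "Not in your plan" with the upgrade CTA.
    return "upsell"
```

The product repeats its own JWKS validation and org_id scoping after a forward, so the portal check is a convenience for the user interface rather than the security boundary.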

P7 — Finance User Views Billing

  FINANCE USER       PORTAL          ERPNEXT API        STRIPE API
       │               │                  │                  │
       │  /acme/billing│                  │                  │
       │──────────────►│                  │                  │
       │               │ role check:      │                  │
       │               │ FINANCE → ok     │                  │
       │               │                  │                  │
       │               │── fetch invoices►│                  │
       │               │◄── invoice list ─│                  │
       │               │                  │                  │
       │               │── fetch usage ────────────────────► │
       │               │◄── usage data ────────────────────── │
       │               │                  │                  │
       │◄── billing page renders          │                  │
       │    plan · usage · invoices       │                  │
       │    [Download PDF] ──────────────►│                  │
       │◄── PDF streamed ─────────────────│                  │
       │                                  │                  │
       │  [Upgrade Plan]                  │                  │
       │──────────────►│                  │                  │
       │               │ create CRM task in ERPNext          │
       │               │─────────────────►│                  │
       │               │ notify us (email/ERPNext task)      │
       │◄── "We'll be in touch" ──┤       │                  │

P8 — Legal User Views Audit Log

  LEGAL USER         PORTAL       TENANT REGISTRY    COMPLIANCE PRODUCT
       │               │                │                   │
       │  /acme/audit  │                │                   │
       │──────────────►│                │                   │
       │               │ role check:    │                   │
       │               │ LEGAL → ok     │                   │
       │               │                │                   │
       │               │── platform audit ──────────────────►
       │               │  (who logged in, role changes,      │
       │               │   IdP changes, impersonations)      │
       │               │◄── audit_log rows ─────────────────┤
       │               │                │                   │
       │               │── compliance audit ────────────────►
       │               │  (DSFA approvals, DSR processing,  │
       │               │   TOM completions)                  │
       │               │◄── compliance audit rows ──────────┤
       │               │                │                   │
       │  [Export]     │                │                   │
       │──────────────►│                │                   │
       │◄── ZIP:       │                │                   │
       │   platform-audit.csv          │                   │
       │   compliance-audit.pdf        │                   │
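Combining the platform and product audit streams into one filterable view could be sketched as follows. The row fields (`at`, `product`, `actor`, `action`) are assumed names chosen to match the filters listed in the design, not the actual audit_log schema.

```python
from datetime import datetime


def merge_audit(streams, *, product=None, actor=None, action=None,
                since=None, until=None):
    """Merge audit rows from several sources, filter them, sort by time."""
    def keep(row):
        return (
            (product is None or row["product"] == product)
            and (actor is None or row["actor"] == actor)
            and (action is None or row["action"] == action)
            and (since is None or row["at"] >= since)
            and (until is None or row["at"] < until)
        )

    rows = [row for stream in streams for row in stream]
    return sorted((r for r in rows if keep(r)), key=lambda r: r["at"])
```

The time range is half-open (`since` inclusive, `until` exclusive), so consecutive export windows neither overlap nor leave gaps.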

P9 — Support Ticket Escalated to Engineering

  CUSTOMER           FRAPPE HD          US (AGENT)          GITEA
      │                  │                  │                 │
      │  submit ticket   │                  │                 │
      │  via /support/   │                  │                 │
      │─────────────────►│                  │                 │
      │                  │── notify agent ─►│                 │
      │                  │                  │ triage ticket   │
      │                  │                  │ → technical bug │
      │                  │                  │                 │
      │                  │                  │ [Escalate]      │
      │                  │◄─────────────────│                 │
      │                  │ server script:   │                 │
      │                  │ POST /issues ────────────────────► │
      │                  │                  │  {title, body,  │
      │                  │                  │   labels:[bug]} │
      │                  │◄──── issue URL ───────────────────┤
      │                  │ store on ticket  │                 │
      │◄── "Escalated to engineering, we'll update you" ─────┤
      │                  │                  │                 │
      │                  │                  │  dev fixes it   │
      │                  │                  │  closes issue ─►│
      │                  │◄──────── webhook ──────────────────│
      │                  │ ticket → Resolved│                 │
      │◄── notification ─│                  │                 │

P10 — We Create a New Customer (Startup Flow)

  US (BACKSTAGE)     TENANT REGISTRY      KEYCLOAK       ERPNEXT       IT ADMIN
       │                   │                  │               │              │
       │  /backstage/      │                  │               │              │
       │  tenants/new      │                  │               │              │
       │  fill: name,      │                  │               │              │
       │  contact, plan,   │                  │               │              │
       │  products         │                  │               │              │
       │──── [Create] ────►│                  │               │              │
       │                   │── create org ───►│               │              │
       │                   │◄── org_id ───────│               │              │
       │                   │── create Customer ──────────────►│              │
       │                   │◄── erp_customer_id ──────────────│              │
       │                   │  write tenant rows                │              │
       │                   │  send welcome email ──────────────────────────► │
       │◄── tenant created ┤                  │               │              │
       │    "Awaiting setup"│                 │               │              │
       │                   │                  │               │    click link│
       │                   │                  │◄── set pw ────────────────── │
       │                   │                  │    + 2FA      │              │
       │                   │                  │ JWT issued    │              │
       │                   │◄─────────────────────────────────────── /acme/ ─┤

P11 — We Debug a Customer Issue (Impersonation)

  US (BACKSTAGE)      TENANT REGISTRY      KEYCLOAK         PORTAL (AS CUSTOMER)
        │                   │                  │                    │
        │  /backstage/      │                  │                    │
        │  tenants/acme/    │                  │                    │
        │  users →          │                  │                    │
        │  Impersonate Alice│                  │                    │
        │──────────────────►│                  │                    │
        │                   │ write audit_log  │                    │
        │                   │ {action:         │                    │
        │                   │  impersonate,    │                    │
        │                   │  actor: sharang, │                    │
        │                   │  target: alice}  │                    │
        │                   │── request token ►│                    │
        │                   │◄── imp. token ───│                    │
        │◄── token ─────────┤  (30min, signed, │                    │
        │                   │  impersonated_by │                    │
        │                   │  claim)          │                    │
        │                                      │                    │
        │ new tab: acme.breakpilot.com        │                    │
        │──────────────────────────────────────────────────────────►│
        │                                      │  [orange banner]   │
        │                                      │  Impersonating     │
        │                                      │  alice@acme.com    │
        │                                      │  29:47 remaining   │
        │  reproduce issue, identify root cause │                    │
        │──────────────────────────────────────────────────────────►│
        │                                      │  [Exit impersonation]
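The shape of the impersonation record and token in P11 can be illustrated as below. In the design the token is issued and signed by Keycloak; this sketch only shows the audit entry and the claims one would expect, with a 30-minute lifetime. Field names other than `impersonated_by`, `action`, `actor`, and `target` are assumptions.

```python
from datetime import datetime, timedelta, timezone

IMPERSONATION_TTL = timedelta(minutes=30)


def start_impersonation(actor, target_sub, org_id, now=None):
    """Return (audit_entry, token_claims) for one impersonation session."""
    now = now or datetime.now(timezone.utc)
    audit_entry = {
        "action": "impersonate",
        "actor": actor,
        "target": target_sub,
        "org_id": org_id,
        "at": now.isoformat(),
    }
    token_claims = {
        "sub": target_sub,
        "org_id": org_id,
        "impersonated_by": actor,
        "iat": int(now.timestamp()),
        "exp": int((now + IMPERSONATION_TTL).timestamp()),
    }
    return audit_entry, token_claims


def seconds_remaining(token_claims, now=None):
    """Seconds left on the session, for the countdown in the banner."""
    now = now or datetime.now(timezone.utc)
    return max(0, token_claims["exp"] - int(now.timestamp()))
```

Writing the audit entry before requesting the token, as the diagram does, means an impersonation can never exist without a log record.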

P12 — ERPNext Sales Order Activates a Tenant

  US (ERPNEXT)       ERPNEXT          TENANT REGISTRY      KEYCLOAK     IT ADMIN
       │               │                   │                   │            │
       │  Sales Order  │                   │                   │            │
       │  Submit ──── ►│                   │                   │            │
       │               │── webhook ───────►│                   │            │
       │               │   {order_id,      │                   │            │
       │               │    tenant_id,     │                   │            │
       │               │    products,      │                   │            │
       │               │    plan,          │                   │            │
       │               │    contract_start │                   │            │
       │               │    contract_end}  │                   │            │
       │               │                   │ tenant.status     │            │
       │               │                   │ = active          │            │
       │               │                   │ tenant_products   │            │
       │               │                   │ enabled=true      │            │
       │               │                   │── update claims ─►│            │
       │               │                   │   (protocol mapper│            │
       │               │                   │    picks up new   │            │
       │               │                   │    entitlements)  │            │
       │               │                   │── send email ─────────────────►│
       │               │                   │   "Subscription   │            │
       │               │                   │    now active"    │            │
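
A minimal sketch of what the Tenant Registry might do with the webhook payload above. The payload field names come from the diagram; the `Tenant` dataclass, the initial "pending" status, and the duplicate-delivery guard keyed on order_id are assumptions added for illustration, not part of the documented schema.

```python
from dataclasses import dataclass, field
from typing import Optional

REQUIRED_FIELDS = ("order_id", "tenant_id", "products", "plan",
                   "contract_start", "contract_end")


@dataclass
class Tenant:
    tenant_id: str
    status: str = "pending"          # assumed pre-activation status
    plan: Optional[str] = None
    products: dict = field(default_factory=dict)       # product -> enabled
    processed_orders: set = field(default_factory=set)


def apply_sales_order(registry, payload):
    """Apply a Sales Order webhook; return True if tenant state changed."""
    missing = [k for k in REQUIRED_FIELDS if k not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    tenant = registry[payload["tenant_id"]]
    if payload["order_id"] in tenant.processed_orders:
        return False  # webhook redelivered; nothing to do
    tenant.status = "active"
    tenant.plan = payload["plan"]
    for product in payload["products"]:
        tenant.products[product] = True  # tenant_products enabled=true
    tenant.processed_orders.add(payload["order_id"])
    return True
```

Treating the webhook as idempotent keeps a retried ERPNext delivery from sending a second activation email.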

P13 — Customer Browses Catalog and Requests a New Product

  USER (any role)    PORTAL              TENANT REGISTRY      ERPNEXT (CRM)
       │               │                       │                      │
       │ /acme/catalog │                       │                      │
       │──────────────►│                       │                      │
       │               │── GET /catalog ──────►│                      │
       │               │◄── product manifests +│                      │
       │               │    subscribed status  │                      │
       │◄── catalog page                       │                      │
       │   • CERTifAI    [✓ Subscribed]        │                      │
       │   • Compliance  [✓ Subscribed]        │                      │
       │   • Notetaker   [+ Request]           │                      │
       │   • Classifier  [+ Request]           │                      │
       │                                       │                      │
       │ click [Request] on Notetaker          │                      │
       │ Modal: "Why do you want this?"        │                      │
       │ + estimated seats / volume            │                      │
       │──────────────►│                       │                      │
       │               │── POST /catalog/      │                      │
       │               │     request ─────────►│                      │
       │               │   {tenant, product,   │                      │
       │               │    requested_by, note}│                      │
       │               │                       │── create CRM Lead ──►│
       │               │                       │   linked to Customer │
       │               │                       │◄── lead_id ──────────│
       │               │                       │  notify sales_owner  │
       │               │                       │  (email + ERPNext    │
       │               │                       │   activity)          │
       │◄── "We'll be in touch within 1 day" ──│                      │
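
The two registry responsibilities in this flow, marking catalog entries as subscribed and shaping the request body, can be sketched as follows. The manifest keys (`id`, `name`) and the function names are hypothetical; only the request fields tenant, product, requested_by, and note come from the diagram.

```python
def catalog_view(manifests, subscribed):
    """Mark each catalog entry as subscribed or requestable for one tenant."""
    return [
        {"id": m["id"], "name": m["name"],
         "action": "subscribed" if m["id"] in subscribed else "request"}
        for m in manifests
    ]


def build_catalog_request(tenant, product, requested_by, note, subscribed):
    """Body for POST /catalog/request; rejects products the tenant already has."""
    if product in subscribed:
        raise ValueError(f"{tenant} is already subscribed to {product}")
    return {"tenant": tenant, "product": product,
            "requested_by": requested_by, "note": note}
```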

P14 — Sales Rep Demos to a Prospect (Shared Demo Tenant)

  SALES REP        KEYCLOAK          PORTAL              DEMO TENANT
      │               │                 │                     │
      │ open Zoom with prospect, share screen                 │
      │                                                       │
      │ demo.breakpilot.com                                   │
      │────────────────────────────────►│                     │
      │               │                 │ Host: demo          │
      │               │                 │ → slug = demo       │
      │               │                 │ → tenant.kind=demo  │
      │               │                 │ tenant.status=demo  │
      │               │                 │                     │
      │               │ OIDC redirect   │                     │
      │◄──────────────│─────────────────│                     │
      │ login sales@breakpilot                                │
      │ realm_role=SALES_REP                                  │
      │──────────────►│                 │                     │
      │               │ verify SALES_REP allowed on demo only │
      │               │ issue JWT:                            │
      │               │   org_id=demo, org_roles=[IT_ADMIN],  │
      │               │   realm_roles=[SALES_REP],            │
      │               │   products=[certifai, compliance]     │
      │◄──────────────│                 │                     │
      │                                 │                     │
      │ /demo/dashboard ───────────────►│                     │
      │ /demo/products/certifai        ─►│ load custom elt   ►│
      │ /demo/products/compliance      ─►│ load custom elt   ►│
      │◄── show prospect every flow ────│                     │
      │                                 │                     │
      │ if prospect interested:                               │
      │   click [Request Trial] in /demo/catalog              │
      │   modal: prospect email, company, est. seats          │
      │   → POST /catalog/trial-request                       │
      │     creates CRM Lead in ERPNext, NOT a tenant         │
      │     sales_owner = the logged-in SALES_REP             │
      │                                                       │
      │ 03:00 nightly:                                        │
      │   cron → product /v1/tenants/demo/reset               │
      │   fixtures from catalog.seed_data_url restored        │
      │   demo is clean for next day                          │

Guardrails:

  • Keycloak policy: SALES_REP realm role MUST NOT be issued a token with org_id ≠ demo
  • Backstage policy: SALES_REP CANNOT see real-tenant data, CAN see CRM (their leads)
  • Real customer support is NEVER done from a SALES_REP login
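
The first two guardrails reduce to small predicates. This is a sketch under stated assumptions: in the design the token rule lives in Keycloak policy and the visibility rule in Backstage, and the resource dictionary shape (`kind`, `sales_owner`, `org_id`) is invented here for illustration.

```python
DEMO_ORG = "demo"


def may_issue_token(realm_roles, org_id):
    """Guardrail 1: a SALES_REP token is only ever scoped to the demo org."""
    if "SALES_REP" in realm_roles:
        return org_id == DEMO_ORG
    return True  # other roles are governed by their own checks, not shown here


def backstage_can_view(realm_roles, user, resource):
    """Guardrail 2: SALES_REP sees own CRM leads, never real-tenant data."""
    if "SALES_REP" not in realm_roles:
        return True  # out of scope for this sketch
    if resource["kind"] == "crm_lead":
        return resource["sales_owner"] == user
    if resource["kind"] == "tenant_data":
        return resource["org_id"] == DEMO_ORG  # assumption: demo data is fine
    return False  # deny anything not explicitly allowed
```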

P15 — Self-Serve Trial → Convert or Expire

  PROSPECT          PORTAL              TENANT REGISTRY      ERPNEXT     KEYCLOAK
      │               │                       │                  │           │
      │ breakpilot.com/start                  │                  │           │
      │──────────────►│                       │                  │           │
      │ form: email, company, password        │                  │           │
      │──────────────►│                       │                  │           │
      │               │── POST /trials ──────►│                  │           │
      │               │   {email, company,    │                  │           │
      │               │    requested_products} │                 │           │
      │               │                       │                  │           │
      │               │                       │ slugify(company) │           │
      │               │                       │ create tenant    │           │
      │               │                       │  status=trial    │           │
      │               │                       │  trial_ends_at = │           │
      │               │                       │    now + 14d     │           │
      │               │                       │ create Customer ►│           │
      │               │                       │   tier=Trial     │           │
      │               │                       │   sales_owner=   │           │
      │               │                       │     unassigned   │           │
      │               │                       │── create org ───────────────►│
      │               │                       │   + IT_ADMIN user            │
      │               │                       │   + verify email             │
      │               │                       │                  │           │
      │◄── magic link │                       │                  │           │
      │ click link, set password              │                  │           │
      │ land on /acme-trial/dashboard         │                  │           │
      │ banner: "Trial: 14 days left — Add billing to keep your data"        │
      │                                       │                  │           │
      │ ── customer uses platform normally ──                    │           │
      │                                       │                  │           │
      │ DAY 7 cron: trial_ends_at - 7d        │                  │           │
      │   → email IT_ADMIN + CXO              │                  │           │
      │   → CRM Activity: "Day-7 nudge"      ►│                  │           │
      │                                       │                  │           │
      │ DAY 12: same, urgent tone             │                  │           │
      │ DAY 14: trial_ends_at reached         │                  │           │
      │                                       │                  │           │
      │   IF customer added payment:          │                  │           │
      │     status: trial → active            │                  │           │
      │     Stripe subscription created       │                  │           │
      │     OR Sales Order in ERPNext signed  │                  │           │
      │     banner removed                    │                  │           │
      │                                                                     │
      │   ELSE:                                                             │
      │     status: trial → frozen            │                  │           │
      │     30-day grace: portal read-only, products return 402  │           │
      │     daily reminder email until day 44                    │           │
      │                                                                     │
      │ DAY 44: frozen → archived             │                  │           │
      │   GDPR export ZIP emailed to IT_ADMIN │                  │           │
      │   each product called: DELETE /v1/tenants/{id}/data      │           │
      │   30 days later: tenant row deleted (audit_log retained 7y)         │

Trial scoping:

  • All paid products are available in trial mode by default unless catalog.available_on_plans excludes trial
  • Usage-billed products (e.g., LiteLLM calls) get a hard cap during trial (manifest: trial_quota)
  • Customer can upgrade plan mid-trial; trial timer just stops, no proration
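
The trial timeline (14-day trial, nudges on day 7 and day 12, 30-day frozen grace, archive on day 44) can be expressed as a pure function of elapsed days. This is a sketch, not the registry's implementation: the function names are hypothetical, and it models only the day-14 decision shown in the diagram, so a payment method added mid-trial takes effect when the trial ends.

```python
from datetime import datetime, timedelta

TRIAL_DAYS = 14        # trial_ends_at = now + 14d
GRACE_DAYS = 30        # frozen window before archiving (day 44)
NUDGE_DAYS = (7, 12)   # reminder emails before the trial ends


def trial_ends_at(started_at: datetime) -> datetime:
    return started_at + timedelta(days=TRIAL_DAYS)


def trial_status(started_at: datetime, now: datetime, has_payment: bool) -> str:
    day = (now - started_at).days
    if day < TRIAL_DAYS:
        return "trial"
    if has_payment:
        return "active"      # trial -> active, banner removed
    if day < TRIAL_DAYS + GRACE_DAYS:
        return "frozen"      # read-only portal, products return 402
    return "archived"        # export emailed, product data deleted


def nudge_due(started_at: datetime, now: datetime) -> bool:
    # True on the days the cron should send the day-7 / day-12 email.
    return (now - started_at).days in NUDGE_DAYS
```

Keeping the state a function of timestamps means the nightly cron can be re-run safely: it recomputes the same status rather than advancing a counter.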

P16 — Customer Cancels and Offboards

  IT ADMIN           PORTAL              TENANT REGISTRY      PRODUCTS        ERPNEXT
       │               │                       │                  │              │
       │ /acme/settings/billing                │                  │              │
       │──────────────►│                       │                  │              │
       │ [Cancel Subscription]                 │                  │              │
       │ Modal:                                │                  │              │
       │  • reason (dropdown)                  │                  │              │
       │  • confirm typing "acme"              │                  │              │
       │  • shows: data retained 30d, then deleted                │              │
       │──────────────►│                       │                  │              │
       │               │── POST /tenants/      │                  │              │
       │               │     acme/cancel ─────►│                  │              │
       │               │                       │ status: active   │              │
       │               │                       │   → frozen       │              │
       │               │                       │ frozen_at = now  │              │
       │               │                       │ delete_at =      │              │
       │               │                       │   now + 30d      │              │
       │               │                       │── Stripe cancel  │              │
       │               │                       │   at_period_end  │              │
       │               │                       │── opportunity ──────────────────►│
       │               │                       │   stage=Lost     │              │
       │               │                       │   reason=...     │              │
       │               │                       │── notify sales_owner            │
       │               │                       │   (could reach out for save)    │
       │◄── confirmation page                  │                  │              │
       │   "Frozen until <date>. Download your data anytime."     │              │
       │                                       │                  │              │
       │ frozen state:                         │                  │              │
       │   portal works READ-ONLY              │                  │              │
       │   /export available (all products)    │                  │              │
       │   product APIs return 402 on writes   │                  │              │
       │                                       │                  │              │
       │ if customer changes mind within 30d:                     │              │
       │   [Reactivate] → status: frozen → active                 │              │
       │   no data loss                                           │              │
       │                                       │                  │              │
       │ DAY 30 cron:                          │                  │              │
       │   tenant.delete_at reached            │                  │              │
       │   build final export ZIP per product ►│ /v1/tenants/{id}/export        │
       │   email ZIP link to IT_ADMIN + CXO    │                  │              │
       │     (signed URL, 7-day TTL)           │                  │              │
       │   for each product:                                      │              │
       │     DELETE /v1/tenants/{id}/data ────►│                  │              │
       │   Keycloak: org archived, users disabled                 │              │
       │   ERPNext Customer: status=Inactive                      │              │
       │   tenant.status = archived                               │              │
       │                                       │                                 │
       │ audit_log retained 7y per GDPR / accounting              │              │

Self-serve vs. enterprise:

  • Stripe-billed customers cancel in-portal; flow above
  • ERPNext-billed (enterprise) customers send written notice; sales rep updates Sales Order; flow runs from /backstage/tenants/[id]/lifecycle with the same downstream effects
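
The cancel, reactivate, and frozen-state rules above can be sketched as follows. The `Tenant` dataclass and function names are hypothetical, and which HTTP methods count as writes is an assumption; the 30-day window, the typed-slug confirmation, and the 402 on writes come from the flow.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(days=30)
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}  # assumed definition of "write"


@dataclass
class Tenant:
    slug: str
    status: str = "active"
    frozen_at: Optional[datetime] = None
    delete_at: Optional[datetime] = None


def cancel(tenant: Tenant, confirm_text: str, now: datetime) -> None:
    if confirm_text != tenant.slug:
        raise ValueError("confirmation text does not match tenant slug")
    if tenant.status != "active":
        raise ValueError(f"cannot cancel a {tenant.status} tenant")
    tenant.status = "frozen"
    tenant.frozen_at = now
    tenant.delete_at = now + RETENTION


def reactivate(tenant: Tenant, now: datetime) -> None:
    if tenant.status != "frozen" or now >= tenant.delete_at:
        raise ValueError("reactivation window has closed")
    tenant.status = "active"
    tenant.frozen_at = tenant.delete_at = None


def product_api_status(tenant: Tenant, method: str) -> int:
    """HTTP status a product API would answer with for this tenant."""
    if tenant.status == "frozen" and method.upper() in WRITE_METHODS:
        return 402
    return 200
```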

Headless Product Flows

P1–P13 cover interactive products that ship a UI. Products declared as frontend.type = headless (see PRODUCT_INTEGRATION_SPEC.md §5) ship no frontend code; customers configure them through a portal-rendered UI and consume them via API/MCP from their own systems. Examples: a notetaker bot, a document classifier, a webhook router, a compliance reporter.

The portal still hosts these products end-to-end: the customer area, billing, audit, and Backstage all work the same. Only the "use the product" surface changes from a UI to API keys + webhooks.

H1 — Customer Enables a Headless Product

  IT ADMIN          PORTAL              TENANT REGISTRY      HEADLESS PRODUCT
      │               │                       │                      │
      │ /acme/products│                       │                      │
      │ /notetaker    │                       │                      │
      │──────────────►│                       │                      │
      │               │ load manifest         │                      │
      │               │ frontend.type =       │                      │
      │               │ "headless"            │                      │
      │               │                       │                      │
      │               │ render portal-owned   │                      │
      │               │ config page from      │                      │
      │               │ manifest sections:    │                      │
      │               │  • API Keys           │                      │
      │               │  • Webhooks           │                      │
      │               │  • Usage chart        │                      │
      │               │  • Docs link          │                      │
      │               │  • Code samples       │                      │
      │◄── page ──────│                       │                      │

H2 — Generate API Key for a Headless Product

  IT ADMIN          PORTAL              TENANT REGISTRY      HEADLESS PRODUCT
      │               │                       │                      │
      │ [Generate Key]│                       │                      │
      │  name: "prod" │                       │                      │
      │  scopes:[r,w] │                       │                      │
      │──────────────►│                       │                      │
      │               │── POST /api-keys ────►│                      │
      │               │   {tenant, scopes,    │                      │
      │               │    product, name}     │                      │
      │               │                       │ generate raw key     │
      │               │                       │ store HASH only      │
      │               │                       │ bind: tenant +       │
      │               │                       │       product +      │
      │               │                       │       scopes         │
      │               │◄── raw key (once) ────│                      │
      │◄── show once ─│                       │                      │
      │ "Copy now —   │                       │                      │
      │  won't show   │                       │                      │
      │  again"       │                       │                      │
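
The "store HASH only" step can be sketched with the standard library. The k_ prefix mirrors the example key in H3; the choice of SHA-256, the in-memory `store` dictionary, and the function name are assumptions for illustration.

```python
import hashlib
import secrets


def generate_api_key(store, tenant, product, name, scopes):
    """Create a key, persist only its hash, and return the raw key once."""
    raw = "k_" + secrets.token_urlsafe(32)
    digest = hashlib.sha256(raw.encode()).hexdigest()
    # The record binds tenant + product + scopes; the raw key is never stored.
    store[digest] = {"tenant": tenant, "product": product,
                     "name": name, "scopes": sorted(scopes)}
    return raw
```

Because the key is long and random, a fast hash is a reasonable fit here; a leaked registry table then reveals which keys exist but not the keys themselves.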

H3 — Customer's System Calls the Headless Product

  CUSTOMER SYSTEM         HEADLESS PRODUCT       TENANT REGISTRY
        │                       │                      │
        │ POST /v1/sessions     │                      │
        │ Auth: ApiKey k_xxx    │                      │
        │ X-Tenant: acme        │                      │
        │──────────────────────►│                      │
        │                       │ validate key ───────►│
        │                       │  → tenant_id,        │
        │                       │    scopes            │
        │                       │◄─────────────────────│
        │                       │ enforce scope        │
        │                       │ tenant_id in EVERY   │
        │                       │   DB query           │
        │                       │ process request      │
        │                       │ emit usage ─────────►│
        │                       │ emit audit ─────────►│
        │◄── 200 response ──────│                      │
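
A sketch of the validation step, assuming keys are stored as SHA-256 hashes mapped to a record with tenant and scopes. `AuthError`, `authorize`, and the `sessions` table are hypothetical names. The point it illustrates is that the tenant used for queries comes from the key record, and the X-Tenant header is only checked against it.

```python
import hashlib


class AuthError(Exception):
    def __init__(self, status, message):
        super().__init__(message)
        self.status = status


def authorize(store, auth_header, x_tenant, required_scope):
    """Return the tenant_id bound to the key, or raise AuthError."""
    scheme, _, raw = auth_header.partition(" ")
    if scheme != "ApiKey" or not raw:
        raise AuthError(401, "expected 'ApiKey <key>'")
    record = store.get(hashlib.sha256(raw.encode()).hexdigest())
    if record is None:
        raise AuthError(401, "unknown key")
    if x_tenant != record["tenant"]:
        raise AuthError(403, "key does not belong to this tenant")
    if required_scope not in record["scopes"]:
        raise AuthError(403, f"missing scope {required_scope}")
    return record["tenant"]


def sessions_query(tenant_id):
    # tenant_id is a bound parameter in every query, never string-formatted.
    return ("SELECT id, started_at FROM sessions WHERE tenant_id = %s",
            (tenant_id,))
```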

H4 — Async Result Delivered via Webhook

  HEADLESS PRODUCT      CUSTOMER WEBHOOK URL      PORTAL (delivery log)
        │                         │                        │
        │ async job finishes      │                        │
        │ load webhook config     │                        │
        │ for this tenant +       │                        │
        │ this event type         │                        │
        │                         │                        │
        │ POST customer URL ─────►│                        │
        │ Body: {event, result,   │                        │
        │  tenant, signature}     │                        │
        │◄── 200 ─────────────────│                        │
        │ log delivery ────────────────────────────────────►
        │   (success/fail, ts,    │                        │
        │    response code)       │                        │
        │                         │                        │
        │ if delivery fails:      │                        │
        │   retry with backoff    │                        │
        │   3 attempts, then      │                        │
        │   dead-letter           │                        │
        │   visible in portal at  │                        │
        │   /webhooks/deliveries  │                        │
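
The delivery loop (sign, POST, log, retry with backoff, dead-letter after three attempts) might look like the sketch below. The transport is injected as a `send` callable so the example runs without a network; the HMAC-SHA256 signature over the sorted JSON body, the doubling backoff, and all function names are assumptions, since the document fixes only the body fields and the three-attempt limit.

```python
import hashlib
import hmac
import json
import time

MAX_ATTEMPTS = 3


def sign(secret: bytes, body: bytes) -> str:
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def deliver(send, secret, event, result, tenant, delivery_log, dead_letter,
            sleep=time.sleep, base_delay=1.0):
    """POST a signed webhook; return True on a 2xx, False once dead-lettered."""
    payload = {"event": event, "result": result, "tenant": tenant}
    unsigned = json.dumps(payload, sort_keys=True).encode()
    payload["signature"] = sign(secret, unsigned)
    body = json.dumps(payload, sort_keys=True).encode()
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            status = send(body)
        except Exception:       # connection errors count as failed attempts
            status = None
        ok = status is not None and 200 <= status < 300
        delivery_log.append({"attempt": attempt, "status": status, "ok": ok})
        if ok:
            return True
        if attempt < MAX_ATTEMPTS:
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, then 2s
    dead_letter.append(body)
    return False


def verify(secret: bytes, body: bytes) -> bool:
    """Customer-side check: recompute the signature without the signature field."""
    payload = json.loads(body)
    claimed = payload.pop("signature")
    unsigned = json.dumps(payload, sort_keys=True).encode()
    return hmac.compare_digest(claimed, sign(secret, unsigned))
```

Every attempt lands in the delivery log, which is what lets /webhooks/deliveries show both the eventual outcome and the failures that preceded it.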

H5 — Headless Product Tile on Customer Dashboard

  USER              PORTAL              TENANT REGISTRY      HEADLESS PRODUCT
   │                  │                       │                      │
   │ /acme/dashboard  │                       │                      │
   │─────────────────►│                       │                      │
   │                  │ for each entitled     │                      │
   │                  │ product in JWT:       │                      │
   │                  │                       │                      │
   │                  │  type=interactive →   │                      │
   │                  │  render "Open" tile   │                      │
   │                  │                       │                      │
   │                  │  type=widget →        │                      │
   │                  │  load widget bundle   │                      │
   │                  │  render custom elt    │                      │
   │                  │                       │                      │
   │                  │  type=headless →      │                      │
   │                  │  GET /v1/usage ─────────────────────────────►│
   │                  │◄────────────────────── usage summary ────────│
   │                  │  render stat tile:    │                      │
   │                  │   "Notetaker          │                      │
   │                  │    142 sessions       │                      │
   │                  │    last 30d"          │                      │
   │                  │   click → goes to     │                      │
   │                  │   /products/notetaker │                      │
   │◄── dashboard ────│                       │                      │
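
The per-type branching on the dashboard can be sketched as a dispatch over the manifest's frontend type. The manifest keys (`frontend_type`, `widget_bundle`), the `sessions` field in the usage summary, and the tile dictionaries are hypothetical; `fetch_usage` stands in for the GET /v1/usage call.

```python
def build_tiles(entitled, manifests, fetch_usage):
    """Return one tile description per entitled product."""
    tiles = []
    for product in entitled:
        manifest = manifests[product]
        kind = manifest["frontend_type"]
        if kind == "interactive":
            tiles.append({"product": product, "tile": "open",
                          "href": f"/products/{product}"})
        elif kind == "widget":
            tiles.append({"product": product, "tile": "widget",
                          "bundle": manifest["widget_bundle"]})
        elif kind == "headless":
            usage = fetch_usage(product)  # only headless tiles call the product
            tiles.append({"product": product, "tile": "stat",
                          "text": f"{usage['sessions']} sessions last 30d",
                          "href": f"/products/{product}"})
        else:
            raise ValueError(f"unknown frontend type: {kind}")
    return tiles
```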

H6 — Backstage Operates a Headless Product

  US (BACKSTAGE)     PORTAL              TENANT REGISTRY      HEADLESS PRODUCT
       │               │                       │                      │
       │ /backstage/   │                       │                      │
       │ tenants/acme/ │                       │                      │
       │ products/     │                       │                      │
       │ notetaker     │                       │                      │
       │──────────────►│                       │                      │
       │               │ NO "Impersonate" btn  │                      │
       │               │ (no UI to enter)      │                      │
       │               │                       │                      │
       │               │ shows:                │                      │
       │               │  • Health             │                      │
       │               │  • Usage 30/90d       │                      │
       │               │  • API call errors    │                      │
       │               │  • Webhook deliveries │                      │
       │               │  • Failed deliveries  │                      │
       │               │  • Admin actions from │                      │
       │               │    manifest:          │                      │
       │               │     [Flush queue]     │                      │
       │               │     [Rotate keys]     │                      │
       │               │     [Reset state]     │                      │
       │ [Flush queue] │                       │                      │
       │──────────────►│                       │                      │
       │               │── service token ─────►│                      │
       │               │   POST /admin/flush ─────────────────────────►
       │               │                       │ audit event          │
       │               │◄─────────────────────────────────── ok ───── │
       │◄── done ──────│                       │                      │

9. Technology Decisions (Locked)

| Decision | Choice | Rationale |
|---|---|---|
| Identity | Keycloak, single realm | Already in CERTifAI; Organizations + IdP brokering built-in |
| Tenant model | Keycloak Organization per customer | Native isolation, JWT claims, no custom multi-tenant auth code |
| Subdomain routing | Orca-Proxy, wildcard cert | Consistent with existing infra; tenant from Host header |
| Secret management | Infisical, machine identity per service | Uniform across all services; path-namespaced per service |
| Business operations | ERPNext (Frappe) | CRM + sales + invoicing + HR in one; avoids building our own |
| Customer support | Frappe Helpdesk | Same Frappe bench as ERPNext; native customer-ticket-account link |
| Engineering issues | Gitea Issues | Already running Gitea; Frappe HD → Gitea via REST API (server script) |
| Data isolation | Logical (tenant_id / org_id columns) | Sufficient for Starter/Professional; physical isolation offered for Enterprise |
| Billing (self-serve) | Stripe (Starter, Professional) | Standard; portal billing page reads Stripe |
| Billing (enterprise) | ERPNext Sales Invoices | Manual invoicing, DATEV export for accountant |
| Customer portal | New Next.js 15 app | Clean slate; existing admin apps have product-specific chrome |
| Tenant Registry | New Go service | Thin glue layer; owns entitlements, IdP config, audit log |
| Products scope | CERTifAI + breakpilot-compliance only | Dataroom and pitch-deck out of scope |

10. Open Items / Phasing

Phase 0 — Foundation (pilot-ready, one real customer)

  • Orca-Proxy: wildcard TLS, subdomain routing table
  • Infisical: machine identities + secrets for all existing services
  • Keycloak: Organizations enabled, realm roles (incl. SALES_REP), one test org
  • Tenant Registry: core schema + API (/tenants CRUD + /activate), status enum
  • Backstage minimal: create tenant form, tenant list, impersonation
  • Portal login: subdomain detection → Keycloak OIDC → tenant context
  • CERTifAI: MongoDB-backed sessions, org_id query scoping, role enforcement
  • breakpilot-compliance: JWT → X-Tenant-ID validated at Next.js proxy
  • Demo tenant (slug demo) seeded; a sales rep can log in and walk a screen-shared prospect through it
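
The subdomain detection step in the portal login item can be sketched as a small pure function. It assumes tenants live exactly one label below breakpilot.com; `slug_from_host` is a hypothetical name, and custom domains (a Phase 4 item) are deliberately not handled.

```python
from typing import Optional

PLATFORM_DOMAIN = "breakpilot.com"


def slug_from_host(host: str) -> Optional[str]:
    """Map a Host header such as 'acme.breakpilot.com:443' to 'acme'."""
    hostname = host.split(":")[0].lower().rstrip(".")
    suffix = "." + PLATFORM_DOMAIN
    if not hostname.endswith(suffix):
        return None  # apex domain or a foreign host
    slug = hostname[: -len(suffix)]
    if not slug or "." in slug:
        return None  # empty or nested subdomain is not a tenant
    return slug
```

A None result would be the signal to show the generic landing page rather than start a tenant-scoped OIDC redirect.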

Phase 1 — Customer-Facing Portal

  • Full customer dashboard, product tiles, usage summary
  • User management and invite flow
  • IdP configuration wizard (OIDC + SAML)
  • Billing page (ERPNext invoices + Stripe usage)
  • Audit log page and CSV/PDF export
  • Frappe HD embedded in /[slug]/support/

Phase 2 — Business Operations

  • ERPNext configured: CRM, Sales Orders, Invoicing, HR
  • ERPNext → Tenant Registry webhook (Sales Order submit → tenant activate)
  • Frappe HD → Gitea escalation (server script)
  • Backstage health dashboard (service health, incidents)
  • Keycloak protocol mapper (products + plan injected into JWT)
  • Self-serve trial flow P15: /start form, 14-day timer, day-7/12/14 emails, trial → frozen → archived state machine
  • Cancel + offboard flow P16: cancel modal, 30-day frozen window, automated final-export ZIP, GDPR erasure call to every product
  • Demo nightly reset: cron at 03:00 Europe/Berlin calls each product's /v1/tenants/demo/reset

Phase 3 — Product API Surface

  • CERTifAI: OpenAPI spec, /api/v1/health + /api/v1/usage
  • breakpilot-compliance: OpenAPI spec, /api/v1/usage
  • Customer-facing API keys (IT Admin generates, scoped to their org)
  • LiteLLM per-tenant API key metering → usage data in portal

Phase 4 — Enterprise Tier

  • Physical data isolation option (dedicated PostgreSQL schema per tenant)
  • Customer-hosted LiteLLM (URL stored in tenant_products.config)
  • Custom domain support (compliance.acme.com → Orca-Proxy → portal)
  • MCP servers per product (CERTifAI MCP, compliance MCP)
  • SLA enforcement in Frappe HD per plan tier

End of document. Updated after design review 2026-05-11.