docs/IMPLEMENTATION_PLAN.md
sharang 03a5b4846e
chore(domain): yourplatform.com → breakpilot.com
Apply platform-domain decision (2026-05-18). No services touched; docs/config only.

Refs: M1.1
2026-05-18 20:28:41 +00:00


Implementation Plan — Breakpilot Platform

Companion to PLATFORM_ARCHITECTURE.md, INFRASTRUCTURE.md, and PRODUCT_INTEGRATION_SPEC.md.

This is the build plan for an AI coding agent (Claude Code, executing PRs against the listed repos). Each milestone is sized to fit in 1-3 PRs, ships independently, and leaves the system in a working state.


0. How to read this document

  • Milestones are named M{phase}.{n} and grouped by phase.
  • Each milestone has: Goal, Depends on, Repos/files, Deliverables, Acceptance, Tests, Gate, Effort (S = ≤1 day, M = 2-4 days, L = ≥1 week).
  • "Gate" is who/what approves the PR for merge. Standard is 1 human reviewer + green CI; some milestones add a manual sign-off.
  • Phases are ordered; milestones within a phase can be parallelised where dependencies allow.
  • The dependency graph at §11 is the source of truth — when in doubt, read it.

1. Cross-cutting conventions (apply to every PR in every repo)

1.1 Repo strategy

Polyrepo under a new Gitea org gitea.meghsakha.com/platform/. One repo per deployable unit. Existing product repos stay where they are.

Repos to create:

| Repo | Purpose | Created in |
| --- | --- | --- |
| platform/orca-platform | IaC for VMs, Orca manifests, DNS, TLS, backups | M1.1 |
| platform/tenant-registry | Go service: tenant glue, audit, API keys | M4.1 |
| platform/portal | Next.js 15: customer area + backstage | M5.1 |
| platform/docs | Architecture, integration spec, this plan, runbooks | M0.1 |
| platform/seed-data | Demo tenant fixtures per product | M13.1 |
| platform/design-tokens | CSS variables / fonts (consumed by product web comps) | M5.1 |

Existing repos that get changes (no new repos):

  • benjamin_boenisch/certifai — M6.1 / M6.2 / M6.3
  • benjamin_boenisch/breakpilot-compliance — M7.1 / M7.2

1.2 Per-repo scaffolding (must exist before any feature work)

Every new repo lands in M0.1 with:

/README.md              what this repo is, how to run, links to architecture
/CONTRIBUTING.md        branch model, commit format, how to open a PR
/CODEOWNERS             at least one mandatory reviewer (us)
/.gitea/
  /pull_request_template.md
  /issue_template/
    bug.md
    feature.md
/.gitea/workflows/
  ci.yaml               fmt → lint → test → build (per-language details in M0.2)
  release.yaml          on tag: build image, push to registry
/CHANGELOG.md           generated from conventional commits
/LICENSE                MIT for portal/docs; Apache-2.0 for libraries

1.3 Branch + commit conventions

  • Trunk-based. main is always deployable. Feature branches: feat/<short-slug>, fix/<short-slug>, chore/<short-slug>. Max lifetime 5 days.
  • Conventional Commits (feat:, fix:, chore:, docs:, refactor:, test:; breaking changes are marked with ! after the type, e.g. feat!:). Enforced by commitlint in CI.
  • Squash-merge to main. PR title becomes the commit message.
  • Direct push to main is blocked by Gitea branch protection.
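
The naming rules above are mechanical enough to check locally before CI runs. A minimal sketch, assuming slugs are lower-case letters, digits and hyphens (the plan does not define the slug alphabet); the helper names are hypothetical and commitlint remains the actual enforcement point:

```python
import re

# Assumption: a <short-slug> is lower-case letters, digits and single hyphens.
BRANCH_RE = re.compile(r"^(feat|fix|chore)/[a-z0-9]+(-[a-z0-9]+)*$")

# Conventional Commits header: type, optional (scope), optional "!" for a
# breaking change, then ": " and a non-empty description.
TITLE_RE = re.compile(
    r"^(feat|fix|chore|docs|refactor|test)(\([a-z0-9-]+\))?(!)?: \S.*$"
)


def is_valid_branch(name: str) -> bool:
    """True if the branch name follows the feat/fix/chore + slug convention."""
    return BRANCH_RE.match(name) is not None


def is_valid_pr_title(title: str) -> bool:
    """True if the PR title (the future squash commit) is a valid header."""
    return TITLE_RE.match(title) is not None
```

Because PRs are squash-merged, checking the PR title is enough to keep main's history in the conventional format.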

1.4 PR template (in every repo)

## What
<1-3 bullets>

## Why
<link to architecture section, milestone ID, or issue>

## How
<implementation notes for the reviewer>

## Test plan
- [ ] unit
- [ ] integration (if API surface changed)
- [ ] e2e (if user-facing flow changed)
- [ ] manual smoke on stage

## Risk
<what could break, blast radius, rollback plan>

## Linked milestone
M{phase}.{n}

1.5 CI checks (required before merge, configured in M0.2)

Per language defaults:

| Stack | Required checks |
| --- | --- |
| Go | gofmt -l . (no output), go vet, golangci-lint, go test ./..., go build |
| Rust | cargo fmt --check, cargo clippy -- -D warnings, cargo test -j8 |
| TypeScript | pnpm lint, pnpm typecheck, pnpm test, pnpm build |
| Python | ruff check, ruff format --check, mypy, pytest |
| All | commitlint, image build, container scan (trivy), SBOM upload |

1.6 Approval gates

  • Standard gate (most milestones): 1 human reviewer approves + all CI checks green. Enforced by Gitea branch protection on main.
  • CODEOWNERS auto-requests the right reviewer based on path.
  • Production-promotion gate (release tags only): manual sign-off by @sharang on the release issue + stage soak ≥ 24h.
  • Security gate (M2.x, M4.x, M14.x): security checklist in PR body completed.

1.7 Versioning + release strategy

  • Semver per repo. Container images carry three tags: :sha-<short>, :v1.4.2, :env-stage / :env-prod.
  • Stage auto-deploys on every merge to main (Gitea Actions → Orca apply against stage cluster).
  • Production deploys only when a release tag vX.Y.Z is created. Tag creation requires the production-promotion gate.
  • Rollback: orca rollout undo <service> flips back to previous image tag. RTO target ≤ 5 min for any single service.
  • Database migrations are forward-only and run as an init container before the service starts. Migrations that delete columns require two releases (1: stop writing, 2: drop).

1.8 Environments

Three Orca clusters, all on the same hardware until volume justifies separation:

| Env | Cluster name | Purpose | Data | Auto-deploy? |
| --- | --- | --- | --- | --- |
| dev | local | Developer machine, docker-compose | fixtures | n/a |
| stage | orca-stage | Pre-prod validation | seeded demo + synthetic customers | yes (on merge to main) |
| prod | orca-prod | Live customer traffic | real | tag + gate |

Domain pattern:

  • dev: *.localhost (mkcert)
  • stage: *.stage.breakpilot.com
  • prod: *.breakpilot.com

1.9 Observability + audit

  • SigNoz (already running at signoz.meghsakha.com) for traces, logs, metrics. Every service ships OTel SDK from day one.
  • Audit events in the Retraced-shape schema (PRODUCT_INTEGRATION_SPEC.md §8.4) emitted to Tenant Registry /audit from every service. Required for every state-changing endpoint.
  • Structured logs (JSON) only. No fmt.Println / console.log in committed code; CI rejects.

1.10 Secrets

  • Infisical machine identity per service, path /{env}/{service}/.
  • The only secret allowed in an Orca env file is the Keycloak DB URI (bootstrap exception — see INFRASTRUCTURE.md).
  • CI scans for committed secrets via gitleaks. Failures block merge.

1.11 Testing policy (mandatory; see also feedback_testing_everything)

  • Unit: every non-trivial function.
  • Integration: every API endpoint, against real Postgres/MongoDB via testcontainers. No mock databases.
  • E2E: every user-facing flow has at least one Playwright spec running against stage post-deploy.
  • Regression: when a bug is fixed, a failing test is added FIRST, then the fix.
  • No PR ships without tests. "Manual tested" is not acceptable except for IaC.

1.12 A/B testing (designed for, adopted later)

Every place where a future flag would gate behaviour MUST flow through a single featureFlags.evaluate(tenantId, flagKey) function. Initial implementation returns hard-coded values from manifest.yaml. Swap to Unleash/OpenFeature in M19.1 with zero call-site changes.
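
A minimal sketch of that single evaluation point, written in Python for illustration. The flag key, the per-tenant override shape and the in-memory dict standing in for values loaded from manifest.yaml are all assumptions; the plan only fixes the evaluate(tenantId, flagKey) call shape:

```python
from typing import Mapping

# Hypothetical stand-in for values loaded from manifest.yaml.
# Assumed shape: flag key -> default value plus optional per-tenant overrides.
MANIFEST_FLAGS: Mapping[str, dict] = {
    "new-billing-page": {"default": False, "tenants": {"acme": True}},
}


class FeatureFlags:
    """Single evaluation point; call sites never read flag storage directly."""

    def __init__(self, flags: Mapping[str, dict]) -> None:
        self._flags = flags

    def evaluate(self, tenant_id: str, flag_key: str) -> bool:
        flag = self._flags.get(flag_key)
        if flag is None:
            return False  # assumption: unknown flags evaluate to off
        overrides = flag.get("tenants", {})
        return bool(overrides.get(tenant_id, flag["default"]))
```

Keeping storage behind the class is what would let a later backend swap leave call sites untouched: only the constructor argument changes.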


2. Phase 0 — Foundations (M0.x-M3.x)

Goal: Repos exist, CI works, infra is provisioned and observable, identity + secrets are usable. No customer-visible features yet.

M0.1 — Bootstrap repos and docs

  • Depends on: nothing
  • Repos: platform/docs, platform/orca-platform, platform/portal, platform/tenant-registry, platform/design-tokens, platform/seed-data
  • Deliverables: create the Gitea platform org; for each repo add the §1.2 scaffolding; platform/docs ingests the existing PLATFORM_ARCHITECTURE.md, INFRASTRUCTURE.md, PRODUCT_INTEGRATION_SPEC.md, this plan.
  • Acceptance: every repo has a working README.md, CONTRIBUTING.md, CODEOWNERS, PR template.
  • Tests: n/a
  • Gate: standard
  • Effort: S

M0.2 — CI templates + branch protection

  • Depends on: M0.1
  • Repos: all of the above
  • Deliverables: .gitea/workflows/ci.yaml per repo (matching §1.5 by stack), Gitea branch protection on main (require PR, 1 review, status checks green, no direct push), commitlint, gitleaks, trivy configured.
  • Acceptance: a deliberately-broken PR is rejected by every check; a clean PR is mergeable.
  • Tests: smoke PR per repo demonstrating green CI.
  • Gate: standard
  • Effort: S

M0.3 — Self-hosted DNS + wildcard TLS

  • Depends on: M1.2 (vm-edge must exist before PowerDNS lands)
  • Repos: platform/orca-platform
  • Deliverables:
    • PowerDNS Authoritative on vm-edge (Orca-managed). PostgreSQL backend on same VM (small; ~100 records).
    • At the registrar (Benjamin's account): set ns1.breakpilot.com and ns2.breakpilot.com glue records pointing at vm-edge public IP; delegate the domain to those NS.
    • Zone file committed in orca-platform/dns/breakpilot.com.zone; Orca syncs into PowerDNS on apply.
    • Records: apex breakpilot.com, wildcards *.breakpilot.com + *.stage.breakpilot.com, plus auth., erp., mcp., cdn., mail., ns1., ns2., SPF/DKIM/DMARC TXT records (for M3.2).
    • Wildcard TLS via Let's Encrypt DNS-01 against PowerDNS (Lego's --dns=pdns provider); ACME credentials in Infisical at /prod/orca-proxy/PDNS_API_KEY.
    • Orca-Proxy reloads the cert via watch on the secret file; renewal cron runs at 02:00 daily.
  • Acceptance: dig @1.1.1.1 anything.breakpilot.com returns an answer; curl https://anything.breakpilot.com returns 404 from Orca-Proxy (no TLS error).
  • Tests: ACME renewal dry-run; PowerDNS zone-diff check in CI; reach via stage and prod subdomains; cert expiry page wired to SigNoz alert.
  • Gate: standard + manual DNS-delegation check by both founders (irreversible from registrar side without 24-48h propagation)
  • Effort: M (was S — registrar delegation + PowerDNS adds setup time vs. Cloudflare)

M1.1 — orca-platform repo (IaC)

  • Depends on: M0.1, M0.2
  • Repos: platform/orca-platform
  • Deliverables: directory layout per INFRASTRUCTURE.md; one Orca manifest per VM × service; per-env overlays (overlays/dev, overlays/stage, overlays/prod); a Makefile with make plan / make apply per env.
  • Acceptance: make plan ENV=stage produces a no-op diff once applied.
  • Tests: orca validate runs in CI; PRs that break a manifest fail.
  • Gate: standard
  • Effort: M

M1.2 — Provision VMs (locked topology)

  • Depends on: M1.1 (Orca manifest layout)
  • Repos: platform/orca-platform
  • Deliverables: the 4 VMs from INFRASTRUCTURE.md §1 provisioned on SysEleven (DUS2):
    • stage (m2.small, public IP) — runs app-plane code only, calls prod KC + Stalwart
    • vm-edge (m2.small, public IP) — Identity + Infra planes (orca-proxy, PowerDNS, Keycloak, pg-keycloak, Infisical, pg-infisical, Gitea)
    • vm-control (m2.medium) — Control plane (portal, tenant-registry, ERPNext, Frappe HD, MariaDB, Stalwart)
    • vm-data (m2.medium) — Data plane (CERTifAI, MongoDB, LiteLLM, compliance ×3, pg-app, Qdrant, MinIO)
    • Private network 10.0.0.0/16 between all four. Public ingress only via vm-edge (and stage's own IP for tester access).
    • SSH disabled; only orca exec for shell access.
  • Acceptance: every VM reachable from Orca control plane; private-network connectivity verified; resource limits per service set in manifest per INFRASTRUCTURE.md §6 co-tenant notes.
  • Tests: cold-start sequence from INFRASTRUCTURE.md §10 Scenario F runs successfully on stage VMs.
  • Gate: standard + manual sign-off (touches infra spend and 36M commitment decision)
  • Effort: M
  • Cost impact: see COST_PLAN.md §3. Initial run: ~€552/mo On-Demand, dropping to ~€310/mo after 36M-upfront commit in Month 4.

M1.3 — Backups, monitoring, on-call

  • Depends on: M1.2
  • Repos: platform/orca-platform
  • Deliverables: backup cron per VM per INFRASTRUCTURE.md §3 (Postgres pg_dump, MinIO bucket replication); SigNoz OTel collector running on every VM; alert routing to oncall@breakpilot.com; restore runbook in platform/docs/runbooks/restore.md.
  • Acceptance: restore drill on stage succeeds (script in platform/orca-platform/scripts/restore-drill.sh); SigNoz shows traces from a synthetic request.
  • Tests: disaster-recovery exercise per failure scenario in INFRASTRUCTURE.md §10 — at least Scenarios A, B, F validated on stage.
  • Gate: standard + manual sign-off
  • Effort: L

M2.1 — Keycloak deployment

  • Depends on: M1.2, M1.3
  • Repos: platform/orca-platform
  • Deliverables: Keycloak 26 on vm-edge with its Postgres backing store (pg-keycloak) on the same VM, per the M1.2 topology; exposed at auth.breakpilot.com and auth.stage.breakpilot.com. Realm import file in orca-platform/keycloak/realm-export.json (committed, source-of-truth).
  • Acceptance: master admin login works; realm breakpilot-prod exists in both envs.
  • Tests: automated realm-state diff in CI (kcadm against checked-in export).
  • Gate: standard + security checklist
  • Effort: M

M2.2 — Realm configuration: roles + protocol mappers + Organizations

  • Depends on: M2.1
  • Repos: platform/orca-platform (realm config)
  • Deliverables: Organizations feature enabled; realm roles BREAKPILOT_ADMIN, SUPPORT_ENGINEER, SALES_REP; org roles IT_ADMIN, CXO, FINANCE, LEGAL, USER; protocol mapper that calls Tenant Registry at token issuance for products, plan, tenant_status claims; SALES_REP guardrail policy (token only issuable with org_id = demo).
  • Acceptance: a test user gets the expected JWT claims; a SALES_REP user cannot get a JWT for a non-demo org (verified by integration test).
  • Tests: Keycloak integration suite in platform/tenant-registry/test/keycloak_test.go.
  • Gate: standard + security checklist
  • Effort: M
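
The SALES_REP guardrail in M2.2 reduces to one predicate that the integration test can assert against. A minimal sketch with a hypothetical function name; the real enforcement lives in the Keycloak policy, and treating a user who holds SALES_REP plus another role as still restricted is an assumption the plan does not settle:

```python
def may_issue_token(realm_roles: set[str], org_id: str) -> bool:
    """Return True if a token for org_id may be issued to this user.

    Assumption: holding SALES_REP restricts the user to the demo org
    even when other realm roles are also present (strictest reading).
    """
    if "SALES_REP" in realm_roles:
        return org_id == "demo"
    return True
```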

M3.1 — Infisical

  • Depends on: M1.2
  • Repos: platform/orca-platform
  • Deliverables: Infisical on vm-edge (per the M1.2 topology), machine identity per service, secret paths laid out per PRODUCT_INTEGRATION_SPEC.md §9.4.
  • Acceptance: a stub service can read its secrets at startup; rotating a secret in Infisical UI is picked up on next pod start.
  • Tests: smoke test container reads secrets.
  • Gate: standard + security checklist
  • Effort: S

M3.2 — Stalwart transactional email

  • Depends on: M0.3 (needs DNS records under our control), M3.1
  • Repos: platform/orca-platform
  • Deliverables:
    • Stalwart on vm-control (Orca-managed); reachable at mail.breakpilot.com.
    • DNS records added to the zone in M0.3: mail A record, MX → mail, SPF (v=spf1 mx -all), DKIM (Stalwart-generated public key), DMARC (p=quarantine; rua=mailto:dmarc@breakpilot.com), reverse DNS (PTR) configured at the cloud provider for the public IP that outbound mail egresses from. Per M1.2, vm-control has no public IP of its own, so coordinate with vm-edge: outbound mail must egress from a host with a clean PTR.
    • SMTP submission service account per platform sender: noreply@, oncall@, support@, billing@, dmarc@.
    • Outbound queue and bounce handler; failed deliveries surface as audit events.
    • Webhook receiver at /inbound/postmaster for bounce/complaint feedback loops (Gmail FBL, MS SNDS).
    • IP warming plan: write a platform/docs/runbooks/email-warming.md documenting the 4-8 week ramp from low daily volumes; first 2 weeks of trial nudges (M12.2) explicitly throttled.
  • Acceptance: test email from noreply@breakpilot.com to parnerkarsharang@gmail.com lands in inbox (not spam) on day 1; SPF/DKIM/DMARC all "pass" in Gmail's "show original" view; mail-tester.com score ≥ 9/10.
  • Tests: automated daily mail-tester check (failure pages on-call); bounce-handling integration test.
  • Gate: standard + security checklist + manual deliverability sign-off (DKIM keys are load-bearing)
  • Effort: L (deliverability tuning is the long tail)

Phase 0 exit criteria:

  • Stage cluster boots cold from cron-driven nightly stop/start using only INFRASTRUCTURE.md §5 ordering.
  • A synthetic HTTPS request to https://hello.stage.breakpilot.com reaches a stub container.
  • Restore drill on stage Postgres succeeds end-to-end.

3. Phase 1 — Control plane core (M4.x-M5.x)

Goal: Tenant Registry stores tenants; the portal authenticates a user and resolves their tenant. No products surfaced yet.

M4.1 — Tenant Registry: schema + migrations

  • Depends on: M1.2, M2.2
  • Repos: platform/tenant-registry
  • Deliverables: Go service scaffold; golang-migrate migrations for tenants, tenant_projects, tenant_products, tenant_idp_config, api_keys, audit_log per PLATFORM_ARCHITECTURE.md §5c; the tenant.status enum + tenant.kind column from the lifecycle spec.
  • Acceptance: make migrate-up on a fresh Postgres produces the documented schema.
  • Tests: migration up/down round-trip via testcontainers-go.
  • Gate: standard
  • Effort: M

M4.2 — Tenant Registry: REST API

  • Depends on: M4.1
  • Repos: platform/tenant-registry
  • Deliverables: OpenAPI 3.1 spec at /openapi.yaml; endpoints POST /tenants, GET /tenants/:id, POST /tenants/:id/activate, POST /tenants/:id/cancel, GET /catalog, POST /catalog/request, POST /catalog/trial-request, POST /api-keys, POST /internal/api-keys/verify, POST /audit, GET /audit.
  • Acceptance: every endpoint passes the OpenAPI contract test; returns documented errors for invalid input.
  • Tests: integration tests against real Postgres for every endpoint.
  • Gate: standard
  • Effort: L

M4.3 — Tenant Registry: Keycloak adapter

  • Depends on: M4.2, M2.2
  • Repos: platform/tenant-registry
  • Deliverables: package internal/keycloak that creates orgs, invites IT_ADMIN users, sets realm roles, and serves the protocol-mapper claims endpoint (the URL Keycloak hits during token issuance from M2.2).
  • Acceptance: creating a tenant via POST /tenants provisions a Keycloak org and one IT_ADMIN user; user receives invite email.
  • Tests: integration test against the stage Keycloak.
  • Gate: standard
  • Effort: M

M5.1 — Portal scaffold: subdomain routing + OIDC login

  • Depends on: M2.2, M4.3, M0.3
  • Repos: platform/portal, platform/design-tokens
  • Deliverables: Next.js 15 app on vm-control; middleware reads Host → extracts slug → calls Tenant Registry GET /tenants?slug= → injects tenant context; Keycloak OIDC login; logout; design-tokens package consumed by portal.
  • Acceptance: visiting https://acme.stage.breakpilot.com redirects to Keycloak; after login, user lands on /acme/dashboard (empty page) with valid session.
  • Tests: Playwright e2e: login + logout for an existing test tenant.
  • Gate: standard
  • Effort: M
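
The Host → slug step in M5.1 can be sketched as a pure function, shown here in Python although the portal itself is Next.js. The reserved list is an assumption built from the service subdomains named in M0.3 plus "stage"; the function name is hypothetical:

```python
from typing import Optional

# Assumption: service subdomains from M0.3 (and "stage" itself) are never tenants.
RESERVED = {"auth", "erp", "mcp", "cdn", "mail", "ns1", "ns2", "stage"}

# Longest suffix first so stage hosts are not matched against the prod base.
BASE_DOMAINS = ("stage.breakpilot.com", "breakpilot.com")


def tenant_slug(host: str) -> Optional[str]:
    """Extract the tenant slug from a Host header, or None if there is none."""
    hostname = host.split(":", 1)[0].lower().rstrip(".")  # drop port, normalise
    for base in BASE_DOMAINS:
        suffix = "." + base
        if hostname.endswith(suffix):
            label = hostname[: -len(suffix)]
            # Exactly one non-reserved label may precede the base domain.
            if not label or "." in label or label in RESERVED:
                return None
            return label
    return None
```

The returned slug would then be passed to Tenant Registry GET /tenants?slug=; a None result is the case where the middleware should not attempt a lookup at all.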

M5.2 — Portal: dashboard + backstage shells

  • Depends on: M5.1
  • Repos: platform/portal
  • Deliverables: customer dashboard route /[slug]/dashboard (renders product tiles from JWT products claim — empty initially), backstage routes per PLATFORM_ARCHITECTURE.md §5a skeleton, RBAC enforcement (§5a "Operating principles" — hide what user can't access), session refresh.
  • Acceptance: user with org_roles=[USER] cannot see settings or billing links; backstage routes return 403 for non-BREAKPILOT_ADMIN users.
  • Tests: Playwright spec per role × route matrix.
  • Gate: standard
  • Effort: M

M5.3 — Playwright e2e harness

  • Depends on: M5.2
  • Repos: platform/portal
  • Deliverables: Playwright config that runs against stage.breakpilot.com post-deploy; CI job e2e-stage triggered after stage deploy; failure pages on-call.
  • Acceptance: breaking change to login is caught in CI within 10 min of merge.
  • Tests: the suite itself.
  • Gate: standard
  • Effort: S

Phase 1 exit criteria:

  • A tenant created via POST /tenants results in a working login flow at <slug>.stage.breakpilot.com.
  • All Phase 1 routes have a passing Playwright spec running on every stage deploy.

4. Phase 2 — Existing product uplift (M6.x-M7.x, parallel)

Goal: CERTifAI and breakpilot-compliance both honour the JWT contract and surface a real product tile in the portal.

M6.1 — CERTifAI: org_id scoping at DB layer

  • Depends on: M2.2
  • Repos: benjamin_boenisch/certifai
  • Deliverables: MongoDB middleware that requires org_id on every query; backfill script for existing collections; per-tenant collection-level role checks (IT_ADMIN → Admin, etc.).
  • Acceptance: integration test attempting a cross-tenant read returns 403; existing single-tenant flows still work for tenant default.
  • Tests: unit + integration; regression tests for every existing controller.
  • Gate: standard + security checklist
  • Effort: L (4-6 weeks per prior gap analysis)
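
The "org_id on every query" rule in M6.1 can be illustrated as a guard that every data-access call passes its filter through. A minimal sketch using a plain dict shaped like a MongoDB filter; CERTifAI's actual language and driver are not stated in this plan, the names are hypothetical, and only top-level org_id keys are checked:

```python
class MissingOrgId(Exception):
    """Raised when a query is attempted without a tenant scope."""


def scoped_filter(org_id: str, query: dict) -> dict:
    """Return a copy of query pinned to org_id.

    Refuses unscoped calls and filters that name a different tenant.
    Assumption: only a top-level "org_id" key is inspected; nested
    operators would need their own handling.
    """
    if not org_id:
        raise MissingOrgId("org_id is required on every query")
    if "org_id" in query and query["org_id"] != org_id:
        raise PermissionError("cross-tenant filter rejected")
    return {**query, "org_id": org_id}
```

A handler would map PermissionError to the 403 the acceptance test expects.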

M6.2 — CERTifAI: JWT validation + role mapping

  • Depends on: M6.1
  • Repos: benjamin_boenisch/certifai
  • Deliverables: Keycloak JWKS validation middleware; role mapping per PLATFORM_ARCHITECTURE.md §6; tenant_status middleware (returns 402 on writes when frozen, 410 when archived, allows demo with no metering).
  • Acceptance: all four tenant.status states behave per spec; tested against a stage Keycloak.
  • Tests: integration tests per status value.
  • Gate: standard + security checklist
  • Effort: M
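
The tenant_status rules in M6.2 fit in a small decision function. A minimal sketch, assuming "writes" means any HTTP method other than GET, HEAD and OPTIONS; the function names are hypothetical:

```python
from typing import Optional

# Assumption: these methods count as reads; everything else is a write.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def status_gate(tenant_status: str, method: str) -> Optional[int]:
    """Return the HTTP status to short-circuit with, or None to continue."""
    if tenant_status == "archived":
        return 410  # archived tenants are gone for reads and writes alike
    if tenant_status == "frozen" and method.upper() not in SAFE_METHODS:
        return 402  # frozen tenants are read-only
    return None


def should_meter(tenant_status: str) -> bool:
    """Demo tenants are served but their usage is not metered."""
    return tenant_status != "demo"
```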

M6.3 — CERTifAI: manifest + integration assets

  • Depends on: M6.2
  • Repos: benjamin_boenisch/certifai
  • Deliverables: product.manifest.yaml per PRODUCT_INTEGRATION_SPEC.md §10 published to cdn.breakpilot.com; OpenAPI 3.1 spec; /v1/health, /v1/usage, /v1/tenants/:id/export, DELETE /v1/tenants/:id/data, POST /v1/tenants/demo/reset; web component certifai-dashboard per §5.A.
  • Acceptance: CERTifAI appears in the portal catalog; subscribed tenants can open it from the dashboard.
  • Tests: contract test that manifest validates against schema; web component renders inside portal shadow-DOM host.
  • Gate: standard
  • Effort: L

M7.1 — Compliance: JWT validation upgrade

  • Depends on: M2.2
  • Repos: benjamin_boenisch/breakpilot-compliance
  • Deliverables: Next.js proxy validates JWT against Keycloak JWKS (replacing today's X-Tenant-ID trust); tenant_status middleware as in M6.2.
  • Acceptance: spoofing X-Tenant-ID without a JWT returns 401; valid JWT for tenant A cannot read tenant B data.
  • Tests: integration tests for both auth and status states.
  • Gate: standard + security checklist
  • Effort: M (3-5 weeks per prior gap analysis)

M7.2 — Compliance: manifest + integration assets

  • Depends on: M7.1
  • Repos: benjamin_boenisch/breakpilot-compliance
  • Deliverables: same endpoint set as M6.3; web component (existing React → @r2wc/react-to-web-component per §5.A); manifest with supports_projects: true (already implemented).
  • Acceptance: compliance appears in portal catalog; opens from dashboard; project switching works inside the product.
  • Tests: as M6.3.
  • Gate: standard
  • Effort: M

Phase 2 exit criteria:

  • A real tenant on stage can subscribe to both products and use them through the portal.
  • Cross-product audit at /[slug]/audit shows events from both products in the Retraced schema.

5. Phase 3 — Business operations (M8.x-M9.x)

Goal: ERPNext and Frappe HD run, Sales Order → tenant activate works, tickets escalate to Gitea.

M8.1 — ERPNext deployment

  • Depends on: M1.2, M2.1
  • Repos: platform/orca-platform
  • Deliverables: Frappe + ERPNext on vm-control (separate Postgres database from tenant_registry — see INFRASTRUCTURE.md RISK-1); reached at erp.breakpilot.com; Keycloak OIDC; IP-restricted at Orca-Proxy.
  • Acceptance: we can log in; a Customer record can be created manually.
  • Tests: smoke test for OIDC; backup of Frappe filestore validated.
  • Gate: standard + manual sign-off (touches vm-control resources)
  • Effort: M

M8.2 — ERPNext customization

  • Depends on: M8.1
  • Repos: platform/orca-platform/erpnext-app/
  • Deliverables: custom Frappe app with: tenant_id field on Customer; sales_owner field on Lead; server scripts for the Sales Order → Tenant Registry webhook; Cancel workflow that calls Tenant Registry /cancel.
  • Acceptance: submitting a Sales Order in ERPNext triggers a tenant activation in stage Tenant Registry.
  • Tests: server-script unit tests (Frappe test harness); integration test exercises the full webhook.
  • Gate: standard
  • Effort: M

M8.3 — Self-serve billing (Polar.sh)

  • Depends on: M8.1, M5.2
  • Repos: platform/portal, platform/tenant-registry
  • Deliverables:
    • Polar.sh organization + products configured for Starter / Professional / per-seat tiers.
    • Polar Checkout embedded in portal /[slug]/billing/upgrade.
    • Webhook listener at tenant-registry /polar/webhook (HMAC-verified) handles subscription.created, subscription.updated, subscription.canceled, order.paid → flips tenant.status, mirrors the customer + invoice into ERPNext via REST.
    • Polar acts as Merchant of Record — they handle EU VAT MOSS, no per-country tax registration needed for our side.
    • Portal billing page reads invoices from ERPNext (single source of truth for accounting) but links out to Polar's customer portal for payment-method management.
  • Acceptance: signing up self-serve creates a tenant, a Polar subscription, an ERPNext Customer + Invoice, and a usable login; VAT line item appears correctly on the EU customer's invoice.
  • Tests: integration test against Polar sandbox; webhook replay test; tax calculation correct for at least DE, FR, NL, US.
  • Gate: standard + security checklist
  • Effort: L
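
The HMAC check and the webhook replay test in M8.3 can be sketched together. This assumes a hex-encoded HMAC-SHA256 over the raw body and an event id supplied by the sender; the exact signature header and scheme must be taken from Polar's documentation, and the class name and in-memory set are hypothetical:

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of a hex HMAC-SHA256 over the raw request body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class WebhookHandler:
    """Rejects bad signatures and ignores events that were already processed."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._seen: set[str] = set()  # a real service would persist this

    def handle(self, event_id: str, body: bytes, signature_hex: str) -> str:
        if not verify_signature(self._secret, body, signature_hex):
            return "rejected"
        if event_id in self._seen:
            return "duplicate"  # replayed delivery: acknowledge, do nothing
        self._seen.add(event_id)
        return "accepted"
```

Verifying before the duplicate check means an attacker cannot mark event ids as seen without knowing the secret.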

Why Polar.sh over Stripe / Lemon Squeezy: OSS-aligned, Merchant of Record (handles EU VAT MOSS automatically), developer-first, 4% + Stripe fees vs. Lemon's 5%. Stripe direct would require us to register for VAT in 27 countries — not viable for a 2-person team. See self-hosted-oss-first.

M9.1 — Frappe Helpdesk

  • Depends on: M8.1
  • Repos: platform/orca-platform
  • Deliverables: Frappe HD on the same Frappe bench; customer portal embedded at /[slug]/support/.
  • Acceptance: a customer user can submit a ticket; we receive it.
  • Tests: Playwright spec for ticket submission.
  • Gate: standard
  • Effort: S

M9.2 — HD → Gitea escalation

  • Depends on: M9.1
  • Repos: platform/orca-platform/erpnext-app/
  • Deliverables: server script that on a Ticket: Escalate to Engineering action creates a Gitea issue in the matching repo via Gitea REST API; reverse webhook from Gitea on issue close marks ticket resolved.
  • Acceptance: the round-trip works for a test ticket on stage.
  • Tests: integration test against stage Gitea.
  • Gate: standard
  • Effort: S

Phase 3 exit criteria:

  • ERPNext is the source of truth for billing/CRM/HR.
  • The full Lead → Quote → Sales Order → Tenant chain works on stage.

6. Phase 4 — Customer UX & lifecycle (M10.x-M14.x)

Goal: Every customer-facing flow from PLATFORM_ARCHITECTURE.md works end-to-end on stage.

M10.1 — Customer area: full surfaces

  • Depends on: M5.2, M6.3, M7.2
  • Repos: platform/portal
  • Deliverables: real implementations of /[slug]/dashboard, /[slug]/products/*, /[slug]/projects, /[slug]/settings/{identity,users,api-keys,integrations}, /[slug]/billing, /[slug]/audit, /[slug]/support.
  • Acceptance: every route is implemented, RBAC-gated, with empty/loading/error states.
  • Tests: one Playwright spec per route × primary role.
  • Gate: standard
  • Effort: L

M10.2 — Cross-product audit view

  • Depends on: M10.1, M4.2
  • Repos: platform/portal
  • Deliverables: audit page filters by product/actor/action/time; CSV + PDF export; events rendered from the Retraced-shape schema.
  • Acceptance: a DPO-style query ("show me everything user X did across all products last month") returns in <2s for a tenant with 100k events.
  • Tests: load test with synthetic events.
  • Gate: standard + security checklist
  • Effort: M

M11.1 — Catalog flow (P13)

  • Depends on: M4.2, M10.1
  • Repos: platform/portal, platform/tenant-registry
  • Deliverables: /[slug]/catalog UI per PLATFORM_ARCHITECTURE.md P13; "Request" button creates ERPNext CRM Lead.
  • Acceptance: customer requests a non-subscribed product; sales sees a Lead in ERPNext with the right sales_owner.
  • Tests: Playwright e2e covering the full P13 sequence.
  • Gate: standard
  • Effort: M

M12.1 — Self-serve trial (P15)

  • Depends on: M8.3, M11.1
  • Repos: platform/portal, platform/tenant-registry
  • Deliverables: public /start form; trial tenant provisioning (status=trial, trial_ends_at); banner; trial_quota enforcement (read by products from JWT).
  • Acceptance: prospect signs up; trial tenant with 14-day timer exists; quota enforced.
  • Tests: Playwright e2e signs up → uses → hits quota.
  • Gate: standard
  • Effort: M

M12.2 — Trial lifecycle cron + emails

  • Depends on: M12.1, M3.2 (Stalwart must be deliverability-clean)
  • Repos: platform/tenant-registry
  • Deliverables: scheduler in tenant-registry that runs day-7/12/14 emails; status transitions trial → active (on payment) or trial → frozen → archived; SMTP via Stalwart at mail.breakpilot.com:587; sender noreply@breakpilot.com; HTML + plaintext templates committed under tenant-registry/templates/email/; List-Unsubscribe headers per RFC 8058.
  • Acceptance: in a time-warped stage test (script that advances trial_ends_at), all transitions fire in order and all three emails land in Gmail inbox.
  • Tests: integration test with time injection; deliverability spot-check at each release.
  • Gate: standard
  • Effort: M
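
The day-7/12/14 schedule in M12.2 can be computed from trial_ends_at alone, which also makes it easy to time-warp in tests. A minimal sketch, assuming the 14-day trial from M12.1, day numbers counted from the trial start, and a hypothetical record of which emails were already sent:

```python
from datetime import date, timedelta

TRIAL_DAYS = 14            # trial length from M12.1
EMAIL_DAYS = (7, 12, 14)   # days on which a lifecycle email is due


def emails_due(trial_ends_at: date, today: date, already_sent: set[int]) -> list[int]:
    """Return the email days that are due today or overdue and not yet sent.

    Assumption: the trial start is trial_ends_at minus TRIAL_DAYS, so moving
    trial_ends_at in a test shifts the whole schedule.
    """
    start = trial_ends_at - timedelta(days=TRIAL_DAYS)
    day = (today - start).days
    return [d for d in EMAIL_DAYS if d <= day and d not in already_sent]
```

Returning overdue days as well means a scheduler that missed a run catches up on the next one instead of silently skipping an email.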

M13.1 — Demo tenant seeding

  • Depends on: M6.3, M7.2
  • Repos: platform/seed-data
  • Deliverables: per-product fixture archives (certifai/seed-v1.tar.gz, compliance/seed-v1.tar.gz); publishing pipeline to cdn.breakpilot.com; catalog.demo.seed_data_url populated in product manifests.
  • Acceptance: calling POST /v1/tenants/demo/reset on either product restores fixtures.
  • Tests: integration test asserts fixture state after reset.
  • Gate: standard
  • Effort: M

M13.2 — Sales demo flow (P14)

  • Depends on: M2.2, M13.1
  • Repos: platform/portal, platform/tenant-registry
  • Deliverables: demo tenant created in stage and prod with kind=demo, status=demo; SALES_REP role usable; backstage routes restricted to /backstage/leads and /backstage/demo; demo tenant audit events tagged {"demo": true} and hidden from real-tenant audit views.
  • Acceptance: sales rep logs in at demo.breakpilot.com, walks both products live, [Request Trial] modal creates a CRM Lead with sales_owner = the rep.
  • Tests: Playwright e2e for the sales walk-through.
  • Gate: standard + security checklist (SALES_REP guardrail enforcement is the load-bearing piece)
  • Effort: M

M13.3 — Nightly demo reset

  • Depends on: M13.2
  • Repos: platform/tenant-registry
  • Deliverables: cron at 03:00 Europe/Berlin calls each product's reset endpoint; failures page on-call.
  • Acceptance: after a deliberately-corrupted demo state, the next 03:00 reset restores fixtures.
  • Tests: test runs the reset manually + verifies fixture state.
  • Gate: standard
  • Effort: S

M14.1 — Cancel + frozen state (P16 part 1)

  • Depends on: M10.1, M6.2, M7.1
  • Repos: platform/portal, platform/tenant-registry
  • Deliverables: cancel modal with reason + typed-confirm; status active → frozen transition; Polar subscription set to cancel_at_period_end (billing runs through Polar.sh per M8.3); ERPNext Opportunity → Lost; reactivation path within 30 days.
  • Acceptance: test customer cancels; portal switches to read-only; reactivate restores active status without data loss.
  • Tests: Playwright e2e covering cancel + reactivate.
  • Gate: standard + security checklist
  • Effort: M
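
The status transitions mentioned across M12.2, M14.1 and M14.2 can be collected into one table so that every caller validates against the same rules. A minimal sketch; the transition set is inferred from those milestones, and applying the 30-day reactivation window to every frozen tenant is an assumption:

```python
from datetime import datetime, timedelta
from typing import Optional

# Transitions named in this plan: trial -> active (payment), trial -> frozen,
# active -> frozen (cancel), frozen -> active (reactivate), frozen -> archived.
ALLOWED = {
    ("trial", "active"),
    ("trial", "frozen"),
    ("active", "frozen"),
    ("frozen", "active"),
    ("frozen", "archived"),
}
REACTIVATION_WINDOW = timedelta(days=30)


def can_transition(
    current: str,
    target: str,
    frozen_at: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> bool:
    """True if moving from current to target is permitted."""
    if (current, target) not in ALLOWED:
        return False
    if (current, target) == ("frozen", "active"):
        # Reactivation needs both timestamps and must fall inside the window.
        if frozen_at is None or now is None:
            return False
        return now - frozen_at <= REACTIVATION_WINDOW
    return True
```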

M14.2 — Offboarding cron + final export (P16 part 2)

  • Depends on: M14.1, M6.3, M7.2
  • Repos: platform/tenant-registry
  • Deliverables: day-30 cron builds final export ZIP per product, emails signed URL (7-day TTL), calls DELETE /v1/tenants/:id/data on every subscribed product, archives Keycloak org, marks tenant.status = archived.
  • Acceptance: time-warped test runs the full P16 sequence end-to-end on stage; export ZIP contains data from both products; second post-archive request to either product returns 410.
  • Tests: integration test with time injection; GDPR-compliance regression suite added.
  • Gate: standard + security checklist + manual sign-off (irreversible operation)
  • Effort: L

Phase 4 exit criteria:

  • Every flow P1-P16 from PLATFORM_ARCHITECTURE.md has a passing Playwright spec.
  • Stage runs a full lifecycle: sign-up trial → convert → use → cancel → offboard, in an automated nightly job.
  • We can hand a prospect a real demo using demo.breakpilot.com.

7. Phase 5 — Headless products (M15.x-M17.x)

Goal: Make the platform host products with no UI of their own.

M15.1 — API key infrastructure

  • Depends on: M4.2, M10.1
  • Repos: platform/tenant-registry, platform/portal
  • Deliverables: API key CRUD per PRODUCT_INTEGRATION_SPEC.md §6.2; portal UI at /[slug]/settings/api-keys; POST /internal/api-keys/verify for products.
  • Acceptance: create key in portal; product call with key succeeds; revoke kills access within 60s.
  • Tests: integration tests for verify endpoint; Playwright for portal UI.
  • Gate: standard + security checklist (rotation + scope enforcement)
  • Effort: M
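
A rough Go sketch of the create / verify / revoke cycle behind an endpoint like POST /internal/api-keys/verify. Everything here is an assumption for illustration: the key format (`id.secret`), the in-memory store, and the names `KeyStore`, `Create`, `Verify`, and `Revoke` are hypothetical, and the real contract is whatever PRODUCT_INTEGRATION_SPEC.md §6.2 defines. The point shown is that only a hash of the secret is stored and that comparison is constant-time.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"strings"
	"sync"
)

type keyRecord struct {
	tenantID   string
	scopes     map[string]bool
	secretHash [32]byte // only the hash is stored, never the secret itself
	revoked    bool
}

// KeyStore is an in-memory stand-in for the real database table.
type KeyStore struct {
	mu      sync.RWMutex
	records map[string]*keyRecord // key ID -> record
}

func NewKeyStore() *KeyStore { return &KeyStore{records: map[string]*keyRecord{}} }

// Create issues a key of the (assumed) form "<id>.<secret>".
func (s *KeyStore) Create(tenantID string, scopes ...string) (string, error) {
	buf := make([]byte, 24)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	id, secret := hex.EncodeToString(buf[:8]), hex.EncodeToString(buf[8:])
	rec := &keyRecord{tenantID: tenantID, scopes: map[string]bool{}, secretHash: sha256.Sum256([]byte(secret))}
	for _, sc := range scopes {
		rec.scopes[sc] = true
	}
	s.mu.Lock()
	s.records[id] = rec
	s.mu.Unlock()
	return id + "." + secret, nil
}

// Verify returns the owning tenant if the key is valid, unrevoked, and in scope.
func (s *KeyStore) Verify(key, scope string) (string, bool) {
	id, secret, found := strings.Cut(key, ".")
	if !found {
		return "", false
	}
	s.mu.RLock()
	defer s.mu.RUnlock()
	rec, ok := s.records[id]
	if !ok || rec.revoked {
		return "", false
	}
	sum := sha256.Sum256([]byte(secret))
	if subtle.ConstantTimeCompare(sum[:], rec.secretHash[:]) != 1 || !rec.scopes[scope] {
		return "", false
	}
	return rec.tenantID, true
}

// Revoke marks the key as revoked; later Verify calls fail immediately.
func (s *KeyStore) Revoke(key string) {
	id, _, _ := strings.Cut(key, ".")
	s.mu.Lock()
	defer s.mu.Unlock()
	if rec, ok := s.records[id]; ok {
		rec.revoked = true
	}
}

func main() {
	store := NewKeyStore()
	key, _ := store.Create("tenant-a", "usage:write")
	fmt.Println(store.Verify(key, "usage:write"))
	store.Revoke(key)
	fmt.Println(store.Verify(key, "usage:write"))
}
```

If products cache verify responses, the cache TTL has to stay under 60 seconds for the "revoke kills access within 60s" acceptance criterion to hold.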

M15.2 — Webhook delivery

  • Depends on: M15.1
  • Repos: platform/tenant-registry, platform/portal
  • Deliverables: webhook config + delivery service per PLATFORM_ARCHITECTURE.md H4; portal page /[slug]/integrations; signed payloads; 3-attempt retry with backoff; dead-letter visible at /webhooks/deliveries.
  • Acceptance: a test webhook sent to https://requestbin.com is delivered; failed deliveries appear in the dead-letter view.
  • Tests: integration tests with a local sink.
  • Gate: standard
  • Effort: M

M16.1 — First headless product reference implementation

  • Depends on: M15.2
  • Repos: TBD (proof-of-concept can live in platform/docs/examples/headless-template/)
  • Deliverables: a minimal headless product (e.g., echo-bot) that implements the full §5.C contract: manifest, API, audit emit, usage emit, demo reset, GDPR endpoints.
  • Acceptance: echo-bot is bookable from catalog, works end-to-end, passes the same lifecycle test as Phase 4.
  • Tests: the lifecycle e2e from M14.2 extended to include echo-bot.
  • Gate: standard
  • Effort: M

M17.1 — MCP servers (Enterprise)

  • Depends on: M6.3, M7.2
  • Repos: benjamin_boenisch/certifai, benjamin_boenisch/breakpilot-compliance
  • Deliverables: MCP endpoints per PRODUCT_INTEGRATION_SPEC.md §10 mcp: block; gated on plan == enterprise; routed via mcp.breakpilot.com.
  • Acceptance: Claude Code can connect to mcp.breakpilot.com/certifai with a service token and call list_ai_agents.
  • Tests: MCP contract test using mcp-cli.
  • Gate: standard + security checklist
  • Effort: L

Phase 5 exit criteria:

  • A third-party (or us) can add a new headless product by following PRODUCT_INTEGRATION_SPEC.md and a referenced template, with no portal code changes required.

8. Phase 6 — Enterprise + scale (M18.x to M19.x)

These ship only when a paying customer requires them.

M18.1 — Custom domains

  • Depends on: M0.3, M10.1
  • Repos: platform/orca-platform, platform/portal
  • Deliverables: ACME on-demand TLS in Orca-Proxy; portal UI for customer to add domain; CNAME verification.
  • Acceptance: compliance.acme.com resolves and renders the Acme portal.
  • Tests: integration test with a synthetic domain.
  • Gate: standard
  • Effort: M

M18.2 — Physical data isolation

  • Depends on: M4.1, M6.1, M7.1
  • Repos: all data-plane products + tenant-registry
  • Deliverables: option per tenant for a dedicated Postgres / Mongo schema or database; provisioning automation; migration path from logical → physical.
  • Acceptance: an enterprise tenant runs on a dedicated schema; cross-tenant queries are physically impossible.
  • Tests: isolation enforcement test.
  • Gate: standard + security review + manual sign-off
  • Effort: L

M19.1 — A/B testing infra

  • Depends on: every milestone whose code calls featureFlags.evaluate()
  • Repos: new platform/feature-flags (Unleash on vm-control or hosted) + portal SDK shim
  • Deliverables: swap the hard-coded evaluate() from §1.12 to call Unleash; eval results land in audit events for reproducibility.
  • Acceptance: flipping a flag in Unleash changes behaviour for the targeted tenant set within 30s; behaviour for all other tenants is unchanged.
  • Tests: integration test asserts flag-driven branches.
  • Gate: standard
  • Effort: M
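
The swap from a hard-coded evaluator to a hosted one stays cheap if callers depend only on a narrow interface. The Go sketch below illustrates that idea under assumptions: the interface, type names, and event shape are hypothetical, and the actual `featureFlags.evaluate()` shim in the portal is presumably not written in Go. No Unleash client calls are shown; an Unleash-backed type would simply be another implementation of the same interface.

```go
package main

import "fmt"

// Evaluator is the only thing calling code depends on.
type Evaluator interface {
	Evaluate(flag, tenantID string) bool
}

// StaticEvaluator is a hard-coded table: flag -> set of enabled tenants.
type StaticEvaluator map[string]map[string]bool

func (s StaticEvaluator) Evaluate(flag, tenantID string) bool {
	return s[flag][tenantID] // missing flag or tenant yields false
}

// FlagEvalEvent is a hypothetical audit record for one evaluation.
type FlagEvalEvent struct {
	Flag     string
	TenantID string
	Result   bool
}

// AuditedEvaluator wraps any Evaluator and emits every result, so the
// backing implementation can change without touching the audit path.
type AuditedEvaluator struct {
	Inner Evaluator
	Emit  func(FlagEvalEvent)
}

func (a AuditedEvaluator) Evaluate(flag, tenantID string) bool {
	res := a.Inner.Evaluate(flag, tenantID)
	a.Emit(FlagEvalEvent{Flag: flag, TenantID: tenantID, Result: res})
	return res
}

func main() {
	flags := AuditedEvaluator{
		Inner: StaticEvaluator{"new-dashboard": {"tenant-a": true}},
		Emit:  func(e FlagEvalEvent) { fmt.Printf("audit: %+v\n", e) },
	}
	fmt.Println(flags.Evaluate("new-dashboard", "tenant-a"))
	fmt.Println(flags.Evaluate("new-dashboard", "tenant-b"))
}
```

Emitting the result at evaluation time is what makes a past decision reproducible even after the flag has been flipped again.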

9. Cross-cutting work (every phase, ongoing)

These are not milestones — they are commitments enforced by CI and process.

  • Regression suite expansion. Every bug fix lands with a regression test FIRST. Tracked by tests-added label on PRs; fix-without-test PRs are rejected by reviewer.
  • Security review per phase. End of each phase: dependency audit (cargo audit, npm audit, pip-audit), SAST scan (semgrep), threat model update in platform/docs/security/.
  • Disaster-recovery drills. Once per phase on stage: pick one scenario from INFRASTRUCTURE.md §10, run it, document time-to-recover in the runbook.
  • Doc currency. PR template requires the author to tick "docs updated" or "n/a" — CI fails on a missing tick.
  • OSS swap-in readiness. When adding metering / audit / SCIM / flag eval code, use the schema/interface noted in PRODUCT_INTEGRATION_SPEC.md §15 so swap-in stays cheap.

10. First-PR checklist for Claude Code

When starting work, the first sequence of PRs should be:

  1. PR-1 (M0.1): Create platform/docs with copied architecture docs + this plan. Land in 1 day.
  2. PR-2 to PR-7 (M0.1 continued): Bootstrap each of the other five repos with §1.2 scaffolding. Land in parallel.
  3. PR-8 (M0.2): CI templates + branch protection per repo.
  4. PR-9 (M1.1): orca-platform directory layout + first stub manifest.
  5. PR-10 (M1.2): VM provisioning (vm-edge, vm-identity, vm-secrets, vm-control first — DNS and Keycloak depend on these).
  6. PR-11 (M0.3): PowerDNS on vm-edge + zone file + registrar NS delegation + wildcard TLS via Let's Encrypt DNS-01.

After PR-11, the dependency graph fans out and parallel work begins.

For each PR, Claude Code MUST:

  • Open the PR with the §1.4 template filled in.
  • Link the milestone ID in PR body (Linked milestone: M0.1).
  • Wait for human approval (no self-merge — branch protection enforces).
  • After merge: verify the stage deploy succeeds before starting the next dependent PR.

11. Dependency graph

                                                ┌── M6.1 ── M6.2 ── M6.3 ──┐
                                                │                          │
                       ┌── M2.1 ── M2.2 ────────┤                          ├── M10.1 ── M10.2
                       │                        │                          │       │
M0.1 ── M0.2 ── M1.1 ──┼── M1.2 ── M0.3 ── M1.3 │                          │       ├── M11.1 ── M12.1 ── M12.2
                       │           │            │                          │       │                       │
                       │           └── M3.1 ── M3.2                        │       ├── M13.2 ── M13.3      │
                       │                                                   │       │           │           │
                       └─────────────────────── M4.1 ── M4.2 ── M4.3 ── M5.1 ── M5.2 ── M5.3   M13.1       │
                                                                                                            │
                                                M8.1 ── M8.2 ── M8.3 ── M9.1 ── M9.2 ──────────────────────┤
                                                                                                            │
                                                M15.1 ── M15.2 ── M16.1 ── M17.1                            │
                                                                                                            │
                                                                                                      M14.1 ── M14.2

Phase-6 (M18, M19) depends on Phase-4 completion + a paying customer.
M12.2 depends on M3.2 (Stalwart deliverability must be clean before trial emails go out).

Critical path (longest chain to first paying customer): M0.1 → M0.2 → M1.1 → M1.2 → M0.3 → M1.3 → M2.1 → M2.2 → M4.1 → M4.2 → M4.3 → M5.1 → M5.2 → M6.2 → M6.3 → M10.1 → M11.1 → M12.1

That's 18 milestones. With one full-time agent and standard human review pacing, plan for 9 to 13 weeks to reach the first paying-customer flow on stage (this includes 1 week added for the PowerDNS / DNS-delegation cycle vs. the prior Cloudflare path), plus 2 to 4 weeks for prod hardening and the Phase-4 lifecycle completion.

Note on M3.2 critical path: Stalwart IP warming (4 to 8 weeks) runs in the background, in parallel with other work. Start it immediately after M3.1 so warming finishes before M12.2 needs it. It is NOT on the critical path for the first paying customer (that customer can be onboarded by hand), but it IS on the critical path for self-serve trial volume.

Parallelism opportunities:

  • M6.x and M7.x can run fully in parallel (different repos, different stacks).
  • M8.x is independent of all data-plane work once M2.2 is done.
  • M15.x can begin as soon as M10.1 lands.

12. Open questions to resolve before starting

Resolved:

  • Email provider → Stalwart, self-hosted on vm-control. Plan in M3.2; 4 to 8 week IP warming acknowledged.
  • Stripe vs Lemon Squeezy → Polar.sh. Plan in M8.3.
  • Cloudflare account ownership → not used; DNS is self-hosted via PowerDNS on vm-edge (M0.3). Registrar account (Benjamin's) still needs documented 2FA recovery — see new DR item below.

Still open:

  • CDN host for cdn.breakpilot.com: self-hosted MinIO + Caddy on vm-edge is the OSS-aligned default; alternative is BunnyCDN (cheap, EU). Decide before M6.3 (manifest bundles + hero images).
  • Cloud provider for port 25 outbound. Stalwart needs unblocked port 25 to send mail. Hetzner blocks it by default and requires an unblock request with proof of intent + abuse contact; OVH and Scaleway unblock on request faster. Confirm with Benjamin which provider vm-control runs on. M3.2 is blocked if port 25 cannot be unblocked; the fallback is sending via a different provider's IP with reverse DNS.
  • Test data privacy. The demo tenant must contain ONLY synthetic data — confirm seed pipeline strips real PII even if our test orgs accidentally seed from prod.
  • Registrar + DNS bus-factor. Document who owns the registrar account, who has 2FA recovery codes, and the procedure to update NS records without that person available. Goes in platform/docs/runbooks/dr.md before M0.3 ships.
  • Internal CA. step-ca listed in INFRASTRUCTURE.md vm-edge as "optional" — decide whether inter-service mTLS is in scope for Phase 0 or deferred until Phase 4 (Enterprise tier).

End of document. Open items in §12 should be triaged before M0.1 starts; the bus-factor and port-25 items are the only hard blockers.