feat(dashboard): proactively refresh expired Keycloak tokens #91

Merged
sharang merged 1 commit from feat/dashboard-token-refresh into main 2026-06-17 20:01:43 +00:00

Summary

The dashboard stored a refresh_token in the session at login (auth.rs) but never used it. Once the access_token's 5-minute lifespan ran out, every subsequent agent call failed with 401 ExpiredSignature and the UI showed "unable to load X" until the user manually logged out and back in.

Fix

Before attaching the bearer in agent_client::attach_token:

  1. Decode the JWT's exp claim (no signature verification — the agent does that).
  2. If expired OR within REFRESH_SKEW_SECS (30s) of expiry, exchange refresh_token for a fresh pair via the realm's token endpoint.
  3. Persist the new access_token + (possibly rotated) refresh_token back into the session.
  4. Send the request with the fresh token.
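The expiry check in steps 1 and 2 can be sketched with the standard library alone. This is a minimal illustration, not the code in `agent_client`: the helper names `base64url_decode`, `extract_exp`, and `needs_refresh` are hypothetical, the hand-rolled decoder stands in for whatever base64 crate the dashboard uses, and the naive `exp` scan stands in for a real JSON parser. Only `REFRESH_SKEW_SECS` and its 30-second value come from the description above.

```rust
/// Skew window from the PR description: refresh when within 30s of expiry.
const REFRESH_SKEW_SECS: u64 = 30;

/// Decode base64url (the JWT segment encoding), with or without padding.
/// Returns None on any character outside the base64url alphabet.
fn base64url_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'-' => 62,
            b'_' => 63,
            b'=' => break, // tolerate optional padding
            _ => return None,
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Naive scan for `"exp": <digits>` in the payload JSON.
/// Assumption: good enough for a sketch; real code should parse the JSON.
fn extract_exp(payload: &str) -> Option<u64> {
    let idx = payload.find("\"exp\"")?;
    let rest = payload[idx + 5..].trim_start();
    let rest = rest.strip_prefix(':')?.trim_start();
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// True when the token is expired, within the skew window, or unreadable.
/// No signature verification happens here; the agent does that.
fn needs_refresh(token: &str, now_secs: u64) -> bool {
    let exp = token
        .split('.')
        .nth(1) // payload is the second dot-separated segment
        .and_then(base64url_decode)
        .and_then(|bytes| String::from_utf8(bytes).ok())
        .and_then(|json| extract_exp(&json));
    match exp {
        Some(exp) => now_secs.saturating_add(REFRESH_SKEW_SECS) >= exp,
        None => true, // defensive: malformed token or missing exp
    }
}
```

Treating every unreadable token as "needs refresh" mirrors the defensive cases in the test plan: a token the dashboard cannot decode is unlikely to be accepted by the agent either, so attempting a refresh costs little.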

If the refresh fails (the refresh_token has also expired or was rejected), the request falls through with the stale token. The agent's 401 then surfaces to the UI, which can prompt re-login. That is a clearer UX cue than failing silently at the dashboard layer.
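The persist-and-fall-through behaviour can be sketched as follows. Everything here is hypothetical scaffolding: `Session`, `TokenPair`, and `token_for_request` are illustrative names, the expiry decision is passed in as a plain `bool`, and the `refresh` closure stands in for the HTTP call to the realm's token endpoint, which the PR text does not show.

```rust
/// Hypothetical session shape; the dashboard's real session type is not shown in the PR.
#[derive(Debug, Clone, PartialEq)]
struct Session {
    access_token: String,
    refresh_token: String,
}

/// Result of a refresh exchange. `refresh_token` is None when the
/// identity provider did not rotate it.
struct TokenPair {
    access_token: String,
    refresh_token: Option<String>,
}

/// Returns the bearer token to attach to the outgoing request.
/// On a successful refresh the session is updated in place; on failure
/// the stale token is returned so the agent's 401 reaches the UI.
fn token_for_request<F>(session: &mut Session, needs_refresh: bool, refresh: F) -> String
where
    F: FnOnce(&str) -> Result<TokenPair, String>,
{
    if needs_refresh {
        if let Ok(pair) = refresh(&session.refresh_token) {
            session.access_token = pair.access_token;
            // Persist the rotated refresh_token only if one was sent.
            if let Some(rotated) = pair.refresh_token {
                session.refresh_token = rotated;
            }
        }
        // Err: deliberately fall through with the stale token.
    }
    session.access_token.clone()
}
```

Because the refresh happens before the request is built, the request body is constructed exactly once, and a failed refresh changes nothing in the session.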

Why proactive, not retry-on-401

  • Saves a wasted round-trip on every call once the token has aged past 5 minutes
  • No need to clone reqwest::RequestBuilder bodies for retry (some bodies aren't cloneable)
  • Same end state — fresh token reaches the agent

Test plan

  • cargo test -p compliance-dashboard --features server --no-default-features infrastructure::agent_client::tests (5 pass):
    • expired JWT → refresh
    • near-expiry within 30s skew → refresh
    • fresh JWT (plenty of life) → no refresh
    • malformed / empty / no-second-segment JWT → refresh (defensive)
    • JWT without exp claim → refresh (defensive)
  • Manual after deploy: dashboard works past the 5-min token lifespan without manual re-login.

Note on the other error

While diagnosing the original "unable to load repositories" symptom, the agent log surfaced two distinct failure modes:

  1. JWT validation failed: ExpiredSignature — what this PR fixes.
  2. JWT validation failed: JWT is missing tenant_id claim — a Keycloak realm config issue (user logging in lacks the M7.1 attributes that the protocol mappers consume). Being fixed separately by switching both services to the breakpilot-dev realm in orca-infra.

🤖 Generated with Claude Code

sharang added 1 commit 2026-06-17 19:40:38 +00:00
feat(dashboard): proactively refresh expired Keycloak tokens
CI / Check (pull_request) Successful in 8m7s
CI / Detect Changes (pull_request) Has been skipped
CI / Deploy Agent (pull_request) Has been skipped
CI / Deploy Dashboard (pull_request) Has been skipped
CI / Deploy Docs (pull_request) Has been skipped
CI / Deploy MCP (pull_request) Has been skipped
bec47f8c7d
The dashboard stored a refresh_token in the session at login (auth.rs)
but never used it. Once the access_token's 5-minute lifespan ran out,
every subsequent agent call failed with 401 ExpiredSignature. The UI
showed "unable to load X" until the user logged out and back in.

Fix: before attaching the bearer, decode the JWT's `exp` claim and
proactively refresh via the stored refresh_token if the token is
expired or within REFRESH_SKEW_SECS (30s) of expiry. Updates the
session with the new access_token (and rotated refresh_token if KC
sends one). Refresh failures fall through with the stale token so the
agent's 401 surfaces to the UI rather than failing the request at the
dashboard layer.

Why "proactive" instead of "retry on 401"
- Saves a wasted round-trip on every agent call once the token has
  aged past 5 min.
- Doesn't require cloning RequestBuilder bodies for retry.
- Same end state — fresh token reaches the agent.

Test plan
- cargo test -p compliance-dashboard --features server
  --no-default-features infrastructure::agent_client::tests — 5 pass:
    * expired JWT → refresh
    * near-expiry within skew window → refresh
    * fresh JWT → no refresh
    * malformed/empty JWT → refresh (defensive)
    * JWT without exp claim → refresh (defensive)
- Manual after deploy: dashboard works past the 5-min token lifespan
  without manual re-login.

Note
- The refresh code addresses the ExpiredSignature failure mode. The
  separate "JWT is missing tenant_id claim" 401 is a Keycloak realm
  config issue (the user logging in lacks the M7.1 attributes that
  the protocol mappers consume) and is fixed by realm/attribute
  config, not by this PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sharang merged commit 69c4f7bb78 into main 2026-06-17 20:01:43 +00:00
Reference: sharang/compliance-scanner-agent#91