[draft] High Availability Deployment Models page by lukeknep · Pull Request #4703 · temporalio/documentation

lukeknep · 2026-06-11T17:46:17Z

What does this PR do?

When using multi-region High Availability, Temporal Cloud customers often ask us how to decide where to deploy their Workers and other systems.

This page gives recommendations on common patterns for an overall High Availability strategy that a Temporal Cloud user can adopt in their architecture.

Notes to reviewers

Internal context: https://temporaltechnologies.slack.com/archives/C04V0LSU5S6/p1781117451071889?thread_ts=1781008921.964629&cid=C04V0LSU5S6

┆Attachments: EDU-6522 [draft] High Availability Deployment Models page

vercel · 2026-06-11T17:46:33Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| temporal-documentation | Ready | Preview, Comment | Jun 16, 2026 11:47pm |

github-actions · 2026-06-11T20:35:49Z

📖 Docs PR preview links

Cloud
- High Availability

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-06-15T20:57:10Z

Deployment failed with the following error:

The `vercel.json` schema validation failed with the following message: should NOT have additional property `public`

Learn More: https://vercel.com/docs/concepts/projects/project-configuration

thestephenstanton · 2026-06-17T13:51:05Z

+- **Active / Passive** — Workflows process in one region at a time, the "active" region. The other region is "passive" and ready for failover. This pattern has two variants:
+  - **[Active / Passive (Cold)](#active-cold)** — a.k.a. Active / Cold — Workers run in only one region at a time. After a failover, Workers start in the secondary region. The region where Workers run == the region where Workflows process. To fail over, Workers need a "cold start" in the other region.
+  - **[Active / Passive (Hot)](#active-hot)** — a.k.a. Active / Hot — Workers run in **both regions** simultaneously, but Workflows still process in only one region at any given time. The other region's Workers are on "hot" standby.
+- **[Active / Active](#active-active)** — Workflows process in both regions at the same time. Necessarily, Workers run in both regions at all times.


nit: "Necessarily" is an odd word to use here. I'd just remove it.

thestephenstanton · 2026-06-17T13:56:21Z

+Active / Cold Pattern: **On failover**
+
+- **The Namespace fails over automatically.** Temporal Cloud promotes the secondary region's replica to active. No action is needed to fail over the Namespace itself.
+- **You bring the Workers up in the secondary region.** Because no Workers were running there, they start from nothing — a "cold" start. Starting and scaling that fleet is your responsibility, ideally through tested automation. Until the Workers are running, no Workflows make progress.


I feel like the question everyone reading this is going to ask is: how do we detect a failover?

I know we have plans to answer this in H2, but is there something we want to tell them now? For example, should they have some sort of system that constantly queries which region is active, in order to detect a failover? Or do we just want to wait for the question and address it then?

It could just be one of those things where we fix the problem before we expect to be asked about it.

Another thing I thought about is how they would know when to scale down those Workers and do their own failback.
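
To make the "constantly querying" idea concrete, here is a minimal sketch of what such a detector might look like. It is only an illustration: `get_active_region` is a hypothetical callable that the user would supply (for example, by wrapping whatever call they use to describe the Namespace), not an existing Temporal API. The same change event would also cover the failback case, since a switch back to the original region is reported the same way.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class FailoverDetector:
    """Tracks the active region reported by a caller-supplied lookup."""

    # Hypothetical: returns the Namespace's current active region.
    # How this value is obtained is left to the caller.
    get_active_region: Callable[[], str]
    last_seen: Optional[str] = None
    events: List[Tuple[str, str]] = field(default_factory=list)

    def poll_once(self) -> Optional[Tuple[str, str]]:
        """Return (old, new) if the active region changed since the last poll."""
        current = self.get_active_region()
        previous, self.last_seen = self.last_seen, current
        # The first poll only records a baseline; it never reports a change.
        if previous is not None and previous != current:
            change = (previous, current)
            self.events.append(change)
            return change
        return None
```

A caller would run `poll_once()` on a timer and, on a non-`None` result, start Workers in the new region or scale down the old fleet. The polling interval adds directly to recovery time, so it would need to be chosen with that in mind.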

thestephenstanton · 2026-06-17T13:59:32Z

+Active / Cold Pattern: **Tradeoffs**
+
+- Highest overall recovery time of the three patterns, due to cold starting the Worker fleet after failover.
+- Depends on tested automation to bring up the secondary-region fleet quickly.


"Tested automation": I see this phrase three times, and as a user I would have no idea what it means.
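
One possible reading, sketched below, is "a scripted scale-up step plus a drill that measures how long it takes". Everything here is an assumption for illustration: `scale_to` and `ready_count` stand in for whatever the user's platform provides, and `FakeFleet` is a made-up stand-in so the drill can run without real infrastructure.

```python
from typing import Callable


def bring_up_workers(
    scale_to: Callable[[int], None],   # hypothetical: request N Workers
    ready_count: Callable[[], int],    # hypothetical: Workers currently ready
    desired: int,
    timeout_s: float,
    now: Callable[[], float],
    sleep: Callable[[float], None],
    poll_s: float = 1.0,
) -> float:
    """Scale the fleet, wait until `desired` Workers are ready, return seconds taken."""
    start = now()
    scale_to(desired)
    while ready_count() < desired:
        if now() - start >= timeout_s:
            raise TimeoutError(f"only {ready_count()} of {desired} Workers ready")
        sleep(poll_s)
    return now() - start


class FakeFleet:
    """Simulated fleet with its own clock, used only for the drill."""

    def __init__(self, workers_per_second: float) -> None:
        self.t = 0.0
        self.rate = workers_per_second
        self.target = 0
        self.scaled_at = None

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds  # advance simulated time instead of waiting

    def scale_to(self, n: int) -> None:
        self.target, self.scaled_at = n, self.t

    def ready_count(self) -> int:
        if self.scaled_at is None:
            return 0
        return min(self.target, int((self.t - self.scaled_at) * self.rate))


def run_drill(workers_per_second: float, desired: int, timeout_s: float) -> float:
    """Run the scale-up against the fake fleet and return the simulated cold-start time."""
    fleet = FakeFleet(workers_per_second)
    return bring_up_workers(
        fleet.scale_to, fleet.ready_count, desired, timeout_s, fleet.now, fleet.sleep
    )
```

In a real drill the fake would be replaced by the actual secondary-region fleet, and the returned duration would be compared against the recovery time the team is targeting.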

thestephenstanton · 2026-06-17T14:03:31Z

+
+- **Use the Namespace Endpoint.**
+   - Connect Workers through the [Namespace Endpoint](/cloud/namespaces#access-namespaces), which always connects to the Namespace in its active region and automatically fails over to the new region.
+   - **Rationale:** If a Temporal Cloud incident requires the Namespace to fail over while the rest of the primary region is healthy, the Workers in the primary region can still connect through the Namespace Endpoint and process Workflows. If the Workers use the Regional Endpoint for the primary region, they will not reliably connect to the Namespace during a Temporal Cloud incident in the primary region.


If the Workers use the Regional Endpoint for the primary region, they will not reliably connect to the Namespace during a Temporal Cloud incident in the primary region.

Won't they be forwarded?

Ah, I see the part lower down about turning off forwarding. This seems like it would be a really good feature to have in the Worker itself, passed in as a flag. If you know you are connecting to a Regional Endpoint and you don't want forwarding, seeing it all in one spot in the code is much clearer than setting the Regional Endpoint in the Worker and then making a separate CLI call externally.

Just a thought.

thestephenstanton · 2026-06-17T14:14:32Z

+- **Codec Servers and proxies** — run in both regions continuously.
+- **Databases and queues** — accessed from both regions; cross-region consistency must be designed for.
+
+### Dual Active (Multi-Active) {/* #dual-active */}


I'm a little confused about this one. Isn't this just taking the Active / Passive pattern and applying it to two Namespaces? I guess I'm confused about why this is here when we already have Active / Passive.

Is this pattern really just saying "you can have different Namespaces in different regions"?

lukeknep added 2 commits June 10, 2026 11:21

Disable forwarding setting for HA

ee33475

First draft of deployment models page

508d477

lukeknep requested a review from a team as a code owner June 11, 2026 17:46

edits to deployment models

d1a61cc

vercel Bot deployed to Preview June 11, 2026 18:03 View deployment

more edits to deployment models

3f98c90

vercel Bot deployed to Preview June 11, 2026 18:47 View deployment

more updates

810fa7d

vercel Bot deployed to Preview June 11, 2026 20:09 View deployment

Merge branch 'main' into ha-worker-deployments

2f486b3

vercel Bot deployed to Preview June 11, 2026 21:23 View deployment

Add High Availability deployment patterns docs page

6722da6

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot deployed to Preview June 12, 2026 04:18 View deployment

Updates to worker deployment patterns

24c9151

vercel Bot had a problem deploying to Preview June 15, 2026 20:57 Failure

updates

420713a

vercel Bot had a problem deploying to Preview June 15, 2026 22:32 Failure

sync-by-unito Bot assigned brianmacdonald-temporal Jun 16, 2026

updates

d6b5d36

vercel Bot had a problem deploying to Preview June 16, 2026 19:30 Failure

Merge branch 'main' into ha-worker-deployments

d1016d4

vercel Bot deployed to Preview June 16, 2026 23:47 View deployment

thestephenstanton reviewed Jun 17, 2026

lukeknep wants to merge 11 commits into main from ha-worker-deployments


3 participants
