docs(Sizing): rewrite as workload-driven guide #7592

Open
germangarces wants to merge 5 commits into main from feat/7200-sizing-scaling-docs

@germangarces
Member

@germangarces germangarces commented May 25, 2026

Thanks for submitting a PR! Please check the boxes below:

  • I have read the Contributing Guide.
  • I have added information to docs/ if required so people know about the feature.
  • I have filled in the "Changes" section below.
  • I have filled in the "How did you test this code" section below.

Changes

Closes #7200

Rewrite Sizing and Scaling docs as a workload-driven guide

Mental model:

  • Sections 1–4: figure out your tier
  • Section 5: Day-1 cache setting
  • Sections 6–8: advanced tuning when something specific demands it
  • Sections 9–10: operate it
  • Sections 11–12: edges / specialty

How did you test this code?

N/A

Signed-off-by: germangarces <german.garces@flagsmith.com>
@vercel

vercel Bot commented May 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

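| Project | Deployment | Actions | Updated (UTC) |
| ------- | ---------- | ---------------- | ------------------- |
| docs | Ready | Preview, Comment | May 25, 2026 1:50pm |

2 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| -------------------------- | ---------- | ------- | ------------------- |
| flagsmith-frontend-preview | Ignored | Preview | May 25, 2026 1:50pm |
| flagsmith-frontend-staging | Ignored | Preview | May 25, 2026 1:50pm |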

@github-actions github-actions Bot added the docs Documentation updates label May 25, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request significantly expands the 'Sizing and Scaling' documentation for self-hosted Flagsmith, introducing workload-driven sizing patterns, worked examples, and specific infrastructure recommendations across four tiers. It also adds comprehensive sections on cache configuration, monitoring metrics, and a scaling decision tree. Feedback indicates that the removal of specific environment variables for database replication (such as REPLICA_DATABASE_URLS and REPLICA_READ_STRATEGY) is a regression in documentation quality, as these details are essential for users implementing the recommended scaling strategies.

Signed-off-by: germangarces <german.garces@flagsmith.com>
Signed-off-by: germangarces <german.garces@flagsmith.com>
Signed-off-by: germangarces <german.garces@flagsmith.com>
Signed-off-by: germangarces <german.garces@flagsmith.com>
@germangarces germangarces marked this pull request as ready for review May 25, 2026 13:51
@germangarces germangarces requested a review from a team as a code owner May 25, 2026 13:51
@germangarces germangarces requested review from Holmus and adamvialpando and removed request for a team May 25, 2026 13:51
Contributor

@Holmus Holmus left a comment

This is gold and will be incredibly helpful for our self-hosted customers. Great work!

Some very minor comments. Also, did you look for places in the existing documentation where we can link to this page?


### B: Server-side service with local cache

Backend polls Flagsmith every 60 seconds for the full environment snapshot, then evaluates flags locally. No round-trip
Contributor

Add a reference to local evaluation: https://docs.flagsmith.com/integrating-with-flagsmith/sdks#local-evaluation.

Maybe even as a tip or a note: "Unsure if you should be using local or remote evaluation? Learn more here"

| If you change… | New RPS | New tier |
| ----------------------------------------------- | ------- | -------- |
| Pods scale up to 300 (same one environment) | 5 RPS | Small |
| Poll interval dropped to 10 s (default is 60 s) | 3 RPS | Medium |
Contributor

Double-check this: I guess either the second row should be Small or the first row should be Medium.
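
A rough sanity check of the arithmetic behind the two rows, as a minimal sketch. It assumes steady-state polling load is simply pods divided by the poll interval. The 30-pod figure for the second row is an assumption inferred from the table (it is the pod count that yields 3 RPS at a 10 s interval); `polling_rps` is a hypothetical helper, not part of Flagsmith.

```python
def polling_rps(pods: int, poll_interval_s: float) -> float:
    """Approximate steady-state requests per second when each pod polls once per interval."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return pods / poll_interval_s


# Row 1: 300 pods at the default 60 s interval.
row_1 = polling_rps(pods=300, poll_interval_s=60)

# Row 2: interval dropped to 10 s; 30 pods is an assumed baseline, not stated in the table.
row_2 = polling_rps(pods=30, poll_interval_s=10)

print(f"row 1: {row_1} RPS, row 2: {row_2} RPS")
```

Under these assumptions the first row carries more load than the second, so a higher tier on the second row looks inconsistent unless tiers depend on something other than RPS.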

| `CACHE_ENVIRONMENT_DOCUMENT_MODE` | `EXPIRING` | `PERSISTENT` at Large+ | Persistent mode survives pod restarts; warm-up cost amortised across the deployment. |
| `GET_IDENTITIES_ENDPOINT_CACHE_SECONDS` | `0` (off) | `30–60` | Cache the personalised response from a _GET_ identity request. _POST_ identity (which updates traits) always bypasses the cache. |

### Cache backend trade-offs
Contributor

Maybe clarify that this section is directly tied to `CACHE_ENVIRONMENT_DOCUMENT_BACKEND`.

- **Database (default).** Shared across pods. Cache hits still touch PostgreSQL. Fine through Medium.
- **LocMemCache.** Pod-local. Zero DB round-trip, but each pod warms separately and memory cost scales with pod count.
Best at Small / Medium with a small number of pods.
- **Redis / Memcached.** Shared, fast, off-DB. Adds a service you operate. Right at Large+.
Contributor

Do we have docs on how to set up Redis or Memcached? I'm wondering whether we can reference the Helm charts or other docs.

@Holmus Holmus removed the request for review from adamvialpando May 26, 2026 12:50

Labels

docs Documentation updates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docs: Scaling and SKU recommendations

2 participants