docs(Sizing): rewrite as workload-driven guide#7592
Signed-off-by: germangarces <german.garces@flagsmith.com>
Code Review
This pull request significantly expands the 'Sizing and Scaling' documentation for self-hosted Flagsmith, introducing workload-driven sizing patterns, worked examples, and specific infrastructure recommendations across four tiers. It also adds comprehensive sections on cache configuration, monitoring metrics, and a scaling decision tree. Feedback indicates that the removal of specific environment variables for database replication (such as REPLICA_DATABASE_URLS and REPLICA_READ_STRATEGY) is a regression in documentation quality, as these details are essential for users implementing the recommended scaling strategies.
Holmus left a comment
This is gold and will be incredibly helpful for our self-hosted customers. Great work!
Some very minor comments. Also, did you look for places in the existing documentation where we can link to this page?
### B: Server-side service with local cache

Backend polls Flagsmith every 60 seconds for the full environment snapshot, then evaluates flags locally. No round-trip
Add a reference to local evaluation: https://docs.flagsmith.com/integrating-with-flagsmith/sdks#local-evaluation.
Maybe even as a tip or a note: "Unsure if you should be using local or remote evaluation? Learn more here"
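To make the pattern in this hunk concrete for readers of the thread, here is a minimal sketch of "poll the snapshot, then evaluate locally". It is not the SDK's implementation: `LocalFlagCache`, `fetch_snapshot`, and the flat `{feature: enabled}` snapshot shape are hypothetical, and it refreshes lazily on read rather than from a background thread.

```python
import time
from typing import Callable, Dict, Optional


class LocalFlagCache:
    """Hypothetical sketch: keep a snapshot in memory, refetch when stale."""

    def __init__(
        self,
        fetch_snapshot: Callable[[], Dict[str, bool]],  # assumed: one API call per refresh
        refresh_interval_s: float = 60.0,  # the 60-second poll from the diff
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_snapshot
        self._interval = refresh_interval_s
        self._clock = clock
        self._snapshot: Dict[str, bool] = {}
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._interval:
            self._snapshot = self._fetch()
            self._fetched_at = now

    def is_enabled(self, feature: str, default: bool = False) -> bool:
        # Evaluation reads the in-memory snapshot; the network is only
        # touched when the snapshot is older than the refresh interval.
        self._refresh_if_stale()
        return self._snapshot.get(feature, default)
```

The point of the sketch is that API load depends on the number of polling processes and the interval, not on how many flag evaluations each process performs.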
| If you change…                                  | New RPS | New tier |
| ----------------------------------------------- | ------- | -------- |
| Pods scale up to 300 (same one environment)     | 5 RPS   | Small    |
| Poll interval dropped to 10 s (default is 60 s) | 3 RPS   | Medium   |
Double-check this. I think either the second row should be Small or the first row should be Medium.
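As a quick sanity check on the RPS column, the arithmetic can be sketched as below. It assumes each pod makes exactly one request per poll interval for the one environment, and the 30-pod baseline is inferred from the second row (3 RPS × 10 s), not stated in the excerpt. `polling_rps` is a hypothetical helper name.

```python
def polling_rps(pods: int, poll_interval_s: float) -> float:
    """Steady-state requests per second from pods polling one environment.

    Assumption: one request per pod per poll interval.
    """
    if pods < 0 or poll_interval_s <= 0:
        raise ValueError("pods must be >= 0 and poll_interval_s must be > 0")
    return pods / poll_interval_s


BASELINE_PODS = 30  # inferred from the table, not stated in the excerpt

row_1 = polling_rps(300, 60)            # pods scale up to 300, default 60 s poll
row_2 = polling_rps(BASELINE_PODS, 10)  # baseline pods, poll interval dropped to 10 s
print(row_1, row_2)
```

Under these assumptions both RPS values in the table are reproduced, so the apparent inconsistency sits in the tier column: the lower-RPS row is mapped to the higher tier.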
| `CACHE_ENVIRONMENT_DOCUMENT_MODE` | `EXPIRING` | `PERSISTENT` at Large+ | Persistent mode survives pod restarts; warm-up cost amortised across the deployment. |
| `GET_IDENTITIES_ENDPOINT_CACHE_SECONDS` | `0` (off) | `30–60` | Cache the personalised response from a _GET_ identity request. _POST_ identity (which updates traits) always bypasses the cache. |

### Cache backend trade-offs
Maybe clarify that this section is directly tied to CACHE_ENVIRONMENT_DOCUMENT_BACKEND.
- **Database (default).** Shared across pods. Cache hits still touch PostgreSQL. Fine through Medium.
- **LocMemCache.** Pod-local. Zero DB round-trip, but each pod warms separately and memory cost scales with pod count.
  Best at Small / Medium with a small number of pods.
- **Redis / Memcached.** Shared, fast, off-DB. Adds a service you operate. Right at Large+.
Do we have docs on how to set up Redis or Memcached? It would be good to link to the Helm charts or other docs here.
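One way to read the cache guidance quoted in this review is as a small decision helper. This is a sketch only: the function names, the `few_pods_threshold` cut-off, and the returned backend labels are hypothetical. The labels are descriptive and are not valid values for `CACHE_ENVIRONMENT_DOCUMENT_BACKEND`. Only the environment variable names and the `EXPIRING` / `PERSISTENT` / `30–60` values come from the diff.

```python
from typing import Dict


def recommended_cache_env(large_or_above: bool, identity_cache_seconds: int = 30) -> Dict[str, str]:
    """Environment overrides suggested by the two table rows in the diff."""
    if not 30 <= identity_cache_seconds <= 60:
        raise ValueError("the diff recommends 30-60 seconds")
    return {
        # EXPIRING is the default; PERSISTENT is recommended at Large+.
        "CACHE_ENVIRONMENT_DOCUMENT_MODE": "PERSISTENT" if large_or_above else "EXPIRING",
        # 0 (off) is the default; 30-60 is the recommended range.
        "GET_IDENTITIES_ENDPOINT_CACHE_SECONDS": str(identity_cache_seconds),
    }


def backend_family(large_or_above: bool, pod_count: int, few_pods_threshold: int = 5) -> str:
    """Map the three bullets to a descriptive label (not a real setting value).

    few_pods_threshold is an assumption: the diff only says
    "a small number of pods".
    """
    if large_or_above:
        return "redis-or-memcached"  # shared, fast, off-DB
    if pod_count <= few_pods_threshold:
        return "locmem"  # pod-local, each pod warms separately
    return "database"  # default, shared, hits still touch PostgreSQL


print(recommended_cache_env(large_or_above=True))
print(backend_family(large_or_above=False, pod_count=3))
```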
Thanks for submitting a PR! Please check the boxes below:
`docs/` if required so people know about the feature.
Changes
Closes #7200
Rewrite Sizing and Scaling docs as a workload-driven guide
Mental model:
How did you test this code?
N/A