Clarify v0-vs-v1 latency metric semantics and low-traffic percentiles #4735
Open
dustin-temporal wants to merge 1 commit into
Two OpenMetrics doc gaps surfaced by a customer alerting false alarm after migrating from the v0 query endpoint to v1 OpenMetrics:

- Migration guide: add a caution that v0 service_latency_sum/count is an average (~p50) and _bucket is a count, not a percentile. Comparing either against v1 _p95/_p99 reports higher values for identical traffic. Includes safe-migration steps and a pointer to the p99 latency SLO.
- Metrics reference: add a note that percentile metrics on low-traffic namespaces are computed from small per-minute samples, so a single slow request dominates p50/p95/p99. Recommends gating latency alerts on a minimum request count, and notes that pre-calculated percentiles cannot be re-aggregated into an accurate longer-window percentile.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
0b98052 to
8faa8b1
Compare
TimSimmons
approved these changes
Jun 17, 2026
What
Two small additions to the OpenMetrics docs to close gaps that caused a customer-side latency alerting false alarm after migrating from the v0 query endpoint to the v1 OpenMetrics endpoint.
1. Migration guide: `Percentile metrics` section

Adds a caution that the v0 latency metrics are a histogram, not a percentile:

- `v0_service_latency_sum / v0_service_latency_count` is an average (≈ p50).
- `v0_service_latency_bucket{le="..."}` only counts requests under a threshold.

Comparing either against `v1_service_latency_p95`/`_p99` is not like-for-like, so v1 will report higher values for identical traffic: a measurement change, not a regression.

2. Metrics reference: `Metric Conventions` section

Adds a note that percentile metrics on low-traffic namespaces are computed from small per-minute samples, so a single slow request dominates p50/p95/p99. Recommends gating latency alerts on a minimum request count (e.g. `service_request_count`) so sparse windows don't trigger them. Also notes that these pre-calculated percentiles cannot be re-aggregated into an accurate longer-window percentile, so widening the evaluation window does not by itself make a sparse sample meaningful. This is consistent with the existing per-metric "avoid aggregating this metric" caution.

Why
A customer migrated to v1 cloud metrics, set a per-namespace p95 alert against the 200 ms p99 SLO, and saw frequent `StartWorkflowExecution` latency spikes that turned out to be a metrics artifact: their low-RPS namespaces produced tiny per-minute samples where one slow request defined the whole quantile, and their v0 baseline had been an average rather than a percentile. No actual latency regression: the v0→v1 measurement change just made existing tail latency visible. These docs would have pre-empted the confusion.

Scope
Prose-only; no metric behavior changes. Two `.mdx` files, additive callouts only.

🤖 Generated with Claude Code
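
For reviewers, here is a minimal Python sketch of the two effects described in this PR. It is an illustration under stated assumptions, not the service's actual computation: `percentile` uses the nearest-rank method (an assumption about how a percentile might be taken), and the sample latencies, the `should_alert` helper, and the `min_requests=20` gate are all hypothetical.

```python
import math
from statistics import mean


def percentile(samples, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list.

    Assumption: nearest-rank is used here for simplicity; the real
    metrics pipeline may compute percentiles differently.
    """
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def should_alert(latencies_ms, threshold_ms, min_requests):
    """Fire only when the window holds at least min_requests samples.

    Hypothetical gate: sparse windows are skipped so that one slow
    request cannot define the whole quantile and trip the alert.
    """
    if len(latencies_ms) < min_requests:
        return False
    return percentile(latencies_ms, 95) > threshold_ms


# Hypothetical one-minute window on a low-traffic namespace:
# four fast requests and a single slow one (values in ms).
sparse_window = [20, 22, 25, 24, 400]

avg_ms = mean(sparse_window)             # what a sum/count ratio reports
p50_ms = percentile(sparse_window, 50)
p95_ms = percentile(sparse_window, 95)   # decided by the single slow request

print(f"average={avg_ms} ms, p50={p50_ms} ms, p95={p95_ms} ms")

# Ungated, the sparse window trips a 200 ms threshold; gated, it does not.
print(should_alert(sparse_window, threshold_ms=200, min_requests=1))
print(should_alert(sparse_window, threshold_ms=200, min_requests=20))
```

On this sample the sum/count average stays under the 200 ms threshold while p95 equals the one slow request, which is the shape of the false alarm described under Why. The gate only suppresses evaluation on sparse windows; it does not make a small sample statistically meaningful.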