Route deepseek-v4-flash to Fireworks when DeepSeek API is unhealthy#745
Open
jahooma wants to merge 1 commit into
Adds Fireworks as a transparent fallback for deepseek-v4-flash, gated by
a passive circuit breaker so we only divert when the official DeepSeek
API actually misbehaves.
- New deepseek-health.ts circuit breaker: 3 failures in 60s opens the
circuit for 5 min; the next request after expiry probes DeepSeek again
and resets on success. No background polling — every user request is
itself the probe.
- Tighter 60s headersTimeout for the Flash undici agent so dead-API
requests fail fast (the existing 30-min default is kept for reasoning
models on v4-pro).
- handleDeepSeek{Stream,NonStream} now wrap the fetch call so network
errors, timeouts, and 5xx/408/429 responses feed the breaker; 2xx
resets it.
- _post.ts routes to Fireworks when the circuit is open and adds inline
pre-stream failover so the first user to hit an outage also gets a
Fireworks response instead of an error.
- Adds accounts/fireworks/models/deepseek-v4-flash to the Fireworks
model + pricing maps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
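The breaker described in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in deepseek-health.ts: the class name `PassiveBreaker`, its method names, and the injectable clock are hypothetical. Only the thresholds (3 failures, 60s window, 5 min open) come from the description. How a failed probe after expiry is handled is not stated in the PR; here it simply counts as one fresh failure.

```typescript
// Hypothetical sketch of a passive circuit breaker.
// Thresholds are taken from the PR description; all names are illustrative.
const FAILURE_THRESHOLD = 3;
const FAILURE_WINDOW_MS = 60_000;
const OPEN_DURATION_MS = 5 * 60_000;

class PassiveBreaker {
  private recentFailures: number[] = [];
  private openUntil = 0;
  private readonly now: () => number;

  // The clock is injectable so the behaviour can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Called for network errors, timeouts, and 5xx/408/429 responses.
  recordFailure(): void {
    const t = this.now();
    // Keep only failures that are still inside the sliding window.
    this.recentFailures = this.recentFailures.filter(
      (failedAt) => t - failedAt < FAILURE_WINDOW_MS,
    );
    this.recentFailures.push(t);
    if (this.recentFailures.length >= FAILURE_THRESHOLD) {
      this.openUntil = t + OPEN_DURATION_MS;
    }
  }

  // Called for 2xx responses; clears all state.
  recordSuccess(): void {
    this.recentFailures = [];
    this.openUntil = 0;
  }

  // True while the circuit is open. Once openUntil has passed, the next
  // request goes to the primary API again and acts as the probe.
  shouldBypass(): boolean {
    return this.now() < this.openUntil;
  }
}
```

Because there is no timer or background task, the state only changes when a request reports its outcome, which matches the "every user request is itself the probe" design.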
Summary
- Adds Fireworks as a transparent fallback for `deepseek-v4-flash` (`accounts/fireworks/models/deepseek-v4-flash`).
- New passive circuit breaker (`deepseek-health.ts`): 3 failures within a 60s window open the circuit for 5 min, then the next request probes DeepSeek and resets on success. No background polling; every user request is the probe, so all pods converge naturally.
- Tighter 60s `headersTimeout` for the Flash undici agent so dead-API requests fail fast instead of hanging on the existing 30-min default (kept for reasoning models on v4-pro).
- `_post.ts` routes to Fireworks when the circuit is open, plus inline pre-stream failover so the first user to hit an outage also gets a Fireworks response instead of an error.
- Adds the model to `FIREWORKS_PRICING_MAP`: 0.14 / 0.03 / 0.28 per M tokens (input / cached / output).

How it works
- `createDeepSeekRequestTracked` wraps the DeepSeek fetch. Network errors, timeouts, and 5xx/408/429 responses call `recordDeepSeekFailure()`. 2xx responses call `recordDeepSeekSuccess()` (clears state).
- When `recentFailures.length >= 3` within the 60s window, `openUntil = now + 5min`.
- `_post.ts` calls `shouldBypassDeepSeek(model)` and, if true, sets `useDeepSeek = false` so the existing Fireworks branch picks up the same model id (now in `FIREWORKS_MODEL_MAP`).

Test plan
- Tests in `deepseek-health.test.ts`, all passing.
- `fireworks-deployment` and `fireworks-health` test suites still pass (75 llm-api tests total).
- `bun run typecheck` clean for changed files (pre-existing SDK errors unrelated).
- `deepseek/deepseek-v4-flash` calls succeed via DeepSeek when healthy and via Fireworks once the breaker opens (induce by temporarily pointing `DEEPSEEK_BASE_URL` at a sink, or watch real outage logs).

🤖 Generated with Claude Code
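As a rough illustration of the request flow described under "How it works", the sketch below shows how failure classification and pre-stream failover could fit together. It is an assumption-laden outline, not the code in `_post.ts`: `HealthTracker`, `StatusResponse`, `countsAsFailure`, and `sendWithFailover` are hypothetical names, and the two callbacks stand in for the real DeepSeek and Fireworks calls. Statuses outside 2xx, 5xx, 408, and 429 are left untouched here because the PR does not say how they are treated.

```typescript
// Hypothetical sketch of failure classification plus pre-stream failover.
// None of these names come from the PR; they only illustrate the flow.
interface HealthTracker {
  recordFailure(): void;
  recordSuccess(): void;
  shouldBypass(): boolean;
}

interface StatusResponse {
  status: number;
}

// Per the description: 5xx, 408, and 429 feed the breaker.
function countsAsFailure(status: number): boolean {
  return status >= 500 || status === 408 || status === 429;
}

async function sendWithFailover<R extends StatusResponse>(
  tracker: HealthTracker,
  callPrimary: () => Promise<R>,
  callFallback: () => Promise<R>,
): Promise<R> {
  // Circuit open: skip the primary API entirely.
  if (tracker.shouldBypass()) {
    return callFallback();
  }

  let response: R | undefined;
  try {
    response = await callPrimary();
  } catch {
    // Network errors and timeouts surface as thrown errors.
    response = undefined;
  }

  if (response === undefined || countsAsFailure(response.status)) {
    tracker.recordFailure();
    // Fail over before anything is streamed, so this caller still gets a reply.
    return callFallback();
  }

  if (response.status >= 200 && response.status < 300) {
    tracker.recordSuccess();
  }
  return response;
}
```

The failover happens before any bytes are sent to the client, which is what lets the first request that hits an outage be retried on the fallback without the caller noticing.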