Route deepseek-v4-flash to Fireworks when DeepSeek API is unhealthy #745

Open
jahooma wants to merge 1 commit into main from deepseek-fireworks-fallback

@jahooma jahooma commented May 24, 2026

Summary

  • Adds Fireworks as a transparent fallback for deepseek-v4-flash (accounts/fireworks/models/deepseek-v4-flash).
  • New passive circuit breaker (deepseek-health.ts): 3 failures / 60s window opens the circuit for 5 min, then the next request probes DeepSeek and resets on success. No background polling — every user request is the probe, so all pods converge naturally.
  • Tighter 60s headersTimeout for the Flash undici agent so dead-API requests fail fast instead of hanging on the existing 30-min default (kept for reasoning models on v4-pro).
  • _post.ts routes to Fireworks when the circuit is open, plus inline pre-stream failover so the first user to hit an outage also gets a Fireworks response instead of an error.
  • Pricing entry in FIREWORKS_PRICING_MAP: 0.14 / 0.03 / 0.28 per M tokens (input / cached / output).
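
To make the pricing entry concrete, here is a minimal sketch of how a per-request cost could be derived from the three rates. The `TokenPricing` shape, the `estimateCost` helper, and the rule that cached tokens are a subset of input tokens billed at the cached rate are all assumptions for illustration; the actual layout of FIREWORKS_PRICING_MAP is not shown in this PR description.

```typescript
// Hypothetical shape; the real FIREWORKS_PRICING_MAP entry may look different.
interface TokenPricing {
  inputPerM: number // price per 1M uncached input tokens
  cachedInputPerM: number // price per 1M cached input tokens
  outputPerM: number // price per 1M output tokens
}

// Rates quoted in the summary (input / cached / output).
const DEEPSEEK_V4_FLASH_ON_FIREWORKS: TokenPricing = {
  inputPerM: 0.14,
  cachedInputPerM: 0.03,
  outputPerM: 0.28,
}

interface TokenUsage {
  inputTokens: number // total input tokens, including cached ones (assumption)
  cachedInputTokens: number
  outputTokens: number
}

// Assumes cached tokens are billed at the cached rate instead of the input rate.
function estimateCost(pricing: TokenPricing, usage: TokenUsage): number {
  const uncachedInput = usage.inputTokens - usage.cachedInputTokens
  const total =
    uncachedInput * pricing.inputPerM +
    usage.cachedInputTokens * pricing.cachedInputPerM +
    usage.outputTokens * pricing.outputPerM
  return total / 1_000_000
}
```

Under these assumptions, a request with 1M uncached input tokens and 1M output tokens would cost roughly 0.42 in the map's price unit.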

How it works

  1. createDeepSeekRequestTracked wraps the DeepSeek fetch. Network errors, timeouts, 5xx/408/429 → recordDeepSeekFailure(). 2xx → recordDeepSeekSuccess() (clears state).
  2. When recentFailures.length >= 3 within the 60s window, openUntil = now + 5min.
  3. Routing in _post.ts calls shouldBypassDeepSeek(model) and, if true, sets useDeepSeek = false so the existing Fireworks branch picks up the same model id (now in FIREWORKS_MODEL_MAP).
  4. After cooldown expires, the next request retries DeepSeek directly. Success resets; another failure re-opens.
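
The four steps above can be sketched as a small in-memory breaker. This is an illustrative approximation, not the contents of deepseek-health.ts: the function names and thresholds come from this description, but the explicit `now` parameters (added so the logic can be tested without real clocks), the constant names, and the exact re-open handling after cooldown are assumptions.

```typescript
// Thresholds as stated in the PR description.
const FAILURE_THRESHOLD = 3
const FAILURE_WINDOW_MS = 60_000
const COOLDOWN_MS = 5 * 60_000
const FLASH_MODEL = 'deepseek/deepseek-v4-flash'

// Per-process state; each pod keeps its own copy (no shared store assumed).
let recentFailures: number[] = []
let openUntil = 0

function recordDeepSeekFailure(now: number = Date.now()): void {
  // A failed probe after the cooldown re-opens the circuit immediately.
  if (openUntil !== 0 && now >= openUntil) {
    recentFailures = [now]
    openUntil = now + COOLDOWN_MS
    return
  }
  // Otherwise keep only failures inside the sliding window and add this one.
  recentFailures = recentFailures.filter((t) => now - t < FAILURE_WINDOW_MS)
  recentFailures.push(now)
  if (recentFailures.length >= FAILURE_THRESHOLD) {
    openUntil = now + COOLDOWN_MS
  }
}

function recordDeepSeekSuccess(): void {
  // Any 2xx clears all breaker state.
  recentFailures = []
  openUntil = 0
}

function shouldBypassDeepSeek(model: string, now: number = Date.now()): boolean {
  // Only the Flash model is covered by the fallback.
  if (model !== FLASH_MODEL) return false
  // Once the cooldown has expired this returns false, so the next request probes DeepSeek.
  return now < openUntil
}
```

Because the state lives in process memory, each pod opens and closes its own circuit based on the traffic it sees, which is what lets the design skip background polling.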

Test plan

  • Unit tests for circuit breaker and outage classifier (11 new tests in deepseek-health.test.ts, all passing).
  • Existing fireworks-deployment and fireworks-health test suites still pass (75 llm-api tests total).
  • bun run typecheck is clean for the changed files (the pre-existing SDK errors are unrelated).
  • Manual verification once deployed: confirm deepseek/deepseek-v4-flash calls succeed via DeepSeek when healthy and via Fireworks once the breaker opens (induce by temporarily pointing DEEPSEEK_BASE_URL at a sink, or watch real outage logs).

🤖 Generated with Claude Code

Adds Fireworks as a transparent fallback for deepseek-v4-flash, gated by
a passive circuit breaker so we only divert when the official DeepSeek
API actually misbehaves.

- New deepseek-health.ts circuit breaker: 3 failures in 60s opens the
  circuit for 5 min; the next request after expiry probes DeepSeek again
  and resets on success. No background polling — every user request is
  itself the probe.
- Tighter 60s headersTimeout for the Flash undici agent so dead-API
  requests fail fast (the existing 30-min default is kept for reasoning
  models on v4-pro).
- handleDeepSeek{Stream,NonStream} now wrap the fetch call so network
  errors, timeouts, and 5xx/408/429 responses feed the breaker; 2xx
  resets it.
- _post.ts routes to Fireworks when the circuit is open and adds inline
  pre-stream failover so the first user to hit an outage also gets a
  Fireworks response instead of an error.
- Adds accounts/fireworks/models/deepseek-v4-flash to the Fireworks
  model + pricing maps.
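
The wrapping and pre-stream failover described above could look roughly like the sketch below. All names here (`UpstreamResponse`, `isOutageStatus`, `HealthHooks`, `requestWithFailover`) are hypothetical and are not taken from _post.ts or handleDeepSeek{Stream,NonStream}; the upstream calls are injected as plain functions so the sketch stays independent of fetch and undici typings.

```typescript
// Minimal response shape; a real handler would work with a fetch Response.
interface UpstreamResponse {
  status: number
}

// Statuses the PR treats as outage signals: any 5xx, plus 408 and 429.
function isOutageStatus(status: number): boolean {
  return (status >= 500 && status <= 599) || status === 408 || status === 429
}

// Hooks into the circuit breaker (hypothetical interface).
interface HealthHooks {
  onFailure: () => void
  onSuccess: () => void
}

// Tries DeepSeek first; if it fails before any response is streamed back,
// records the failure and serves the same request from Fireworks instead.
async function requestWithFailover<T extends UpstreamResponse>(
  callDeepSeek: () => Promise<T>,
  callFireworks: () => Promise<T>,
  health: HealthHooks,
): Promise<T> {
  try {
    const res = await callDeepSeek()
    if (isOutageStatus(res.status)) {
      health.onFailure()
      return callFireworks()
    }
    // Only a 2xx resets the breaker; other statuses (e.g. 400) pass through untouched.
    if (res.status >= 200 && res.status < 300) health.onSuccess()
    return res
  } catch {
    // Network error or headers timeout: nothing has reached the client yet.
    health.onFailure()
    return callFireworks()
  }
}
```

The failover only makes sense before streaming begins; once bytes have been sent to the client, switching providers mid-response would not be transparent, which is presumably why the PR limits it to the pre-stream phase.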

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>