
Kubernetes API client QPS/Burst rate limits are hardcoded, causing throttling at high WORKERS concurrency #1280

@mcornea


Describe the bug
When running NTH in Queue Processor mode with WORKERS >= 50, the Kubernetes API client is severely throttled by the hardcoded client-go default rate limits (QPS=5, Burst=10). All workers share a single clientset built from the config returned by rest.InClusterConfig(), so they also share one rate limiter.

With WORKERS=50 dispatching simultaneously during a correlated spot interruption, ~500 API calls compete for the 5 QPS rate limit. Workers are blocked for 8-10 seconds per API call due to client-side throttling:

INF Waited for 7.994525579s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver:6443/api/v1/nodes?labelSelector=kubernetes.io%2Fhostname
INF Waited for 9.994951525s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver:6443/api/v1/nodes?labelSelector=kubernetes.io%2Fhostname
INF Waited for 9.971182606s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver:6443/api/v1/nodes?labelSelector=kubernetes.io%2Fhostname

Steps to reproduce

  1. Deploy NTH in Queue Processor mode with WORKERS=50 (or higher)
  2. Trigger 50+ concurrent spot interruptions (e.g., via AWS FIS aws:ec2:send-spot-instance-interruptions)
  3. Observe NTH logs: continuous client-side throttling messages with 8-10 second waits, plus an "all workers busy, waiting" message every second

Expected outcome
NTH should allow configuring the Kubernetes API client QPS and burst rate limits via environment variables or CLI flags so that high-concurrency deployments can avoid client-side throttling.
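
A minimal sketch of what this could look like, not a proposed final implementation. The environment variable names KUBE_API_QPS and KUBE_API_BURST, the clientRateLimits type, and the rateLimitsFromEnv helper are all hypothetical; none of them exist in NTH today. The sketch only covers parsing and validation, falling back to the client-go defaults mentioned above when a variable is unset:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// clientRateLimits mirrors the two rest.Config fields that control
// client-side throttling (QPS is float32 and Burst is int in client-go).
type clientRateLimits struct {
	QPS   float32
	Burst int
}

// rateLimitsFromEnv reads the hypothetical KUBE_API_QPS and KUBE_API_BURST
// variables through getenv, keeping the defaults (5 and 10) when unset.
func rateLimitsFromEnv(getenv func(string) string) (clientRateLimits, error) {
	limits := clientRateLimits{QPS: 5, Burst: 10}
	if v := getenv("KUBE_API_QPS"); v != "" {
		q, err := strconv.ParseFloat(v, 32)
		if err != nil || q <= 0 {
			return clientRateLimits{}, fmt.Errorf("invalid KUBE_API_QPS %q", v)
		}
		limits.QPS = float32(q)
	}
	if v := getenv("KUBE_API_BURST"); v != "" {
		b, err := strconv.Atoi(v)
		if err != nil || b <= 0 {
			return clientRateLimits{}, fmt.Errorf("invalid KUBE_API_BURST %q", v)
		}
		limits.Burst = b
	}
	return limits, nil
}

func main() {
	limits, err := rateLimitsFromEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// In NTH these values would be assigned to config.QPS and config.Burst on
	// the *rest.Config from rest.InClusterConfig() before building the clientset.
	fmt.Printf("QPS=%v Burst=%d\n", limits.QPS, limits.Burst)
}
```

Taking getenv as a parameter keeps the helper easy to test without touching the process environment; the same parsing could equally back CLI flags.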

Environment

  • NTH Version: latest main branch (also reproduced on OpenShift fork)
  • NTH Mode: Queue Processor (SQS)
  • Kubernetes version: v1.35.5 (OpenShift 4.22)
  • Platform: ROSA HCP on AWS, us-east-2
