fix(feed): Improve NVD feed download resilience (x2)#3344
Conversation
📝 WalkthroughWalkthroughThis PR makes two small numeric configuration changes in the NVD vulnerability loader package: the HTTP client timeout is increased from 6 to 10 minutes, and the feed download retry limit is increased from 5 to 10 attempts. No other logic is modified. ChangesNVD Loader Resilience Tuning
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/vulnloader/nvdloader/loader_feed.go (1)
53-68: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy liftWorst-case retry duration grows substantially with unbounded exponential backoff.
With
maxRetries = 10and backoff doubling from 10s each attempt, the worst-case cumulative sleep time before giving up on a single year is10*(2^9-1) ≈ 5110s(~85 minutes), on top of per-attempt HTTP timeouts (now 10 minutes each, perloader_api.go). Since this loop runs for every year from 2002 to present sequentially, a few consecutive failing years could push the overall job well past typical CI timeout budgets, defeating the goal of improving resilience. Consider capping the backoff (e.g., viamin(backoff, maxBackoff)) so retries don't grow unbounded.♻️ Cap exponential backoff
+ const maxBackoff = 2 * time.Minute const maxRetries = 10 backoff := 10 * time.Second var apiFeed *apischema.CVEAPIJSON20 for attempt := 1; ; attempt++ { var err error apiFeed, err = fetchFeed(url, year) if err == nil { break } if attempt >= maxRetries { return errors.Wrapf(err, "failed to download feed for year %d after %d attempts", year, attempt) } log.Warnf("Feed year %d: attempt %d failed: %v; retrying in %s", year, attempt, err, backoff) time.Sleep(backoff) - backoff *= 2 + backoff *= 2 + if backoff > maxBackoff { + backoff = maxBackoff + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/vulnloader/nvdloader/loader_feed.go` around lines 53 - 68, The retry loop in fetchFeed currently uses uncapped exponential backoff, which can make loader_feed.go spend far too long retrying a failing year. Update the retry logic around maxRetries/backoff in the feed download loop to cap the sleep duration with a maximum backoff value, while still doubling between attempts until that cap is reached. Keep the existing retry logging and error handling intact, but ensure the backoff growth in this loop cannot become unbounded.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/vulnloader/nvdloader/loader_feed.go`:
- Around line 53-68: The retry loop in fetchFeed currently uses uncapped
exponential backoff, which can make loader_feed.go spend far too long retrying a
failing year. Update the retry logic around maxRetries/backoff in the feed
download loop to cap the sleep duration with a maximum backoff value, while
still doubling between attempts until that cap is reached. Keep the existing
retry logging and error handling intact, but ensure the backoff growth in this
loop cannot become unbounded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 8200de36-faa1-4359-bd95-b6519545d632
📒 Files selected for processing (2)
pkg/vulnloader/nvdloader/loader_api.gopkg/vulnloader/nvdloader/loader_feed.go
I think doing retries as needed will be sufficient. If yall are curious about what I was referring to, here's an example from when I bumped the konflux timeout last time: |
Expands on #3343 to increase the retires / timeouts - based on observed status in https://github.com/stackrox/scanner/actions/runs/28467785361/job/84372363491 riding the threshold (update: and failing)