Feat/issues #925

Open
ykargeee-bit wants to merge 4 commits into rinafcode:main from ykargeee-bit:feat/issues

@ykargeee-bit

Closes #854
Committed files:
- metrics-collection.service.ts: added the worker_restarts_total{worker_name} Prometheus counter
- base.worker.ts: Redis heartbeat that is updated after every job
- worker-orchestration.service.ts: 60s stall detector, automatic restart logic, event emission
- webhooks.worker.ts: constructor aligned with BaseWorker's ConfigService requirement
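
To illustrate the per-worker restart counter listed above, here is a minimal TypeScript sketch of a counter keyed by one label and rendered in the Prometheus text exposition format. `LabeledCounter` and the help string are hypothetical; the code in metrics-collection.service.ts is not shown on this page and most likely relies on a metrics library rather than a hand-rolled class.

```typescript
// Hypothetical stand-in for a labeled Prometheus counter (illustration only).
class LabeledCounter {
  private readonly values = new Map<string, number>();
  private readonly name: string;
  private readonly help: string;
  private readonly labelName: string;

  constructor(name: string, help: string, labelName: string) {
    this.name = name;
    this.help = help;
    this.labelName = labelName;
  }

  // Increment the series for one label value, e.g. one worker name.
  inc(labelValue: string, by: number = 1): void {
    this.values.set(labelValue, this.get(labelValue) + by);
  }

  // Current value for a label value; 0 if it was never incremented.
  get(labelValue: string): number {
    return this.values.get(labelValue) ?? 0;
  }

  // Render in the text exposition format. Label values are not escaped here.
  render(): string {
    const lines: string[] = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    this.values.forEach((value, label) => {
      lines.push(`${this.name}{${this.labelName}="${label}"} ${value}`);
    });
    return lines.join("\n");
  }
}

const workerRestarts = new LabeledCounter(
  "worker_restarts_total",
  "Total number of worker restarts", // assumed help text
  "worker_name",
);
workerRestarts.inc("webhooks");
workerRestarts.inc("webhooks");
console.log(workerRestarts.render());
```

Keeping one series per worker_name makes it possible to alert on a single worker that restarts repeatedly, which a global counter would hide.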
All the requested features are now implemented and committed:

✅ Redis heartbeats with 2x threshold TTL
✅ 60s scheduled stall detection
✅ worker.stalled event emission
✅ Prometheus counter for restarts
✅ Graceful worker restart
✅ Meets 2x threshold acceptance criteria
The implementation is complete and ready for use!
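
As a rough sketch of the heartbeat item in the checklist above: each worker writes a timestamp after every job, and the key expires after twice the stall threshold. The example below uses an in-memory map in place of Redis so that it runs on its own. `HeartbeatStore`, `InMemoryStore`, `recordHeartbeat`, and the `worker:heartbeat:` key prefix are all assumptions, not names taken from base.worker.ts; the 300s threshold is the default stated in this PR's description.

```typescript
// In-memory stand-in for Redis: set() mimics a SET with an expiry in seconds.
// All names in this sketch are hypothetical.
interface HeartbeatStore {
  set(key: string, value: string, ttlSeconds: number): void;
  get(key: string): string | null;
}

class InMemoryStore implements HeartbeatStore {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  // Returns null once the TTL has elapsed, like reading an expired key.
  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }
}

const STALL_THRESHOLD_SECONDS = 300; // default threshold from the PR description

// Intended to be called after every processed job.
function recordHeartbeat(store: HeartbeatStore, workerName: string, nowMs: number): void {
  const ttlSeconds = STALL_THRESHOLD_SECONDS * 2; // 2x threshold TTL
  store.set(`worker:heartbeat:${workerName}`, String(nowMs), ttlSeconds);
}

// Simulated clock so the example finishes instantly.
let clockMs = 0;
const store = new InMemoryStore(() => clockMs);
recordHeartbeat(store, "webhooks", clockMs);

clockMs = 599000; // just inside the 600s TTL
const beforeExpiry = store.get("worker:heartbeat:webhooks");
clockMs = 600000; // TTL reached
const afterExpiry = store.get("worker:heartbeat:webhooks");
console.log(beforeExpiry, afterExpiry); // prints: 0 null
```

With a TTL of twice the threshold, a missing key can itself be read as a stall signal: a worker that has not written a heartbeat for that long is well past the threshold.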
Closes #855

Implement a comprehensive worker health monitoring system:
- Redis heartbeat tracking for all workers, with a TTL of 2x the stall threshold
- 60s scheduled stall detector that checks Redis heartbeats
- Emit 'worker.stalled' event when a worker exceeds stall threshold (default 300s)
- Add Prometheus counter 'worker_restarts_total{worker_name}' for restart tracking
- Graceful worker restart that maintains pool size and proper ConfigService injection
- Align all worker constructors to require ConfigService as first parameter
- Meets acceptance criteria: stalled workers restart within 2x threshold window
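
The detection flow in the list above can be sketched as a single pass that a scheduler would run every 60s. This is only an illustration under stated assumptions: `StallDetector`, `checkOnce`, and `onStalled` are hypothetical names, the worker names "emails" and "reports" are made up, and a plain listener array stands in for whatever event emitter worker-orchestration.service.ts actually uses.

```typescript
type StalledListener = (workerName: string) => void;

// Hypothetical detector: flags stalled workers, notifies listeners,
// restarts the worker, and counts the restart per worker name.
class StallDetector {
  readonly restarts = new Map<string, number>();
  private readonly listeners: StalledListener[] = [];
  private readonly thresholdMs: number;
  private readonly restartWorker: (workerName: string) => void;

  constructor(thresholdMs: number, restartWorker: (workerName: string) => void) {
    this.thresholdMs = thresholdMs;
    this.restartWorker = restartWorker;
  }

  // Register a handler for the 'worker.stalled' notification.
  onStalled(listener: StalledListener): void {
    this.listeners.push(listener);
  }

  // One detection pass. A null heartbeat means the key was missing or expired.
  checkOnce(lastHeartbeats: Map<string, number | null>, nowMs: number): string[] {
    const stalled: string[] = [];
    lastHeartbeats.forEach((lastBeatMs, workerName) => {
      const isStalled = lastBeatMs === null || nowMs - lastBeatMs > this.thresholdMs;
      if (!isStalled) {
        return;
      }
      stalled.push(workerName);
      this.listeners.forEach((listener) => listener(workerName));
      this.restartWorker(workerName);
      this.restarts.set(workerName, (this.restarts.get(workerName) ?? 0) + 1);
    });
    return stalled;
  }
}

const restarted: string[] = [];
const events: string[] = [];
const detector = new StallDetector(300000, (name) => {
  restarted.push(name); // a real restart would replace the worker in the pool
});
detector.onStalled((name) => {
  events.push(`worker.stalled:${name}`);
});

const now = 1000000;
const stalled = detector.checkOnce(
  new Map<string, number | null>([
    ["webhooks", now - 400000], // 400s since last heartbeat: stalled
    ["emails", now - 10000], // 10s since last heartbeat: healthy
    ["reports", null], // heartbeat key expired: treated as stalled
  ]),
  now,
);
console.log(stalled, events);
```

Under these assumptions, a stall is noticed at most one check interval after the threshold passes: with a 300s threshold and a 60s interval that is about 360s after the last heartbeat, inside the 600s (2x threshold) window named in the acceptance criteria.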
@drips-wave

drips-wave Bot commented Jun 28, 2026

@ykargeee-bit Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Learn more about application limits

Development

Successfully merging this pull request may close these issues.

- Add worker health monitoring with automatic restart on stall
- Add email template XSS sanitization before rendering Handlebars templates