fix: after pool has entered into broken state, log error and recreate the process pool #36

Merged
ktstrader merged 1 commit into main from fix/BED-8692
Jun 23, 2026

@ktstrader

Contributor

Description

Change to src/openhound/scheduler/service.py:

  • Import BrokenProcessPool from concurrent.futures.process.
  • Add a private _reset_executor() helper that shuts down the broken pool without blocking and creates a fresh ProcessPoolExecutor with the same settings (max_workers=1, max_tasks_per_child=1).
  • Add a dedicated except BrokenProcessPool branch to _handle_completed_job, placed before the existing generic except Exception. It reports the in-flight job as FAILED to BHE and calls _reset_executor(). The existing finally block continues to clear future/job_running.
  • Wrap the executor.submit(...) call in _start_job in a try/except BrokenProcessPool block. On failure, it clears job_running/future, rebuilds the executor, and reports the job as FAILED to BHE so that the job does not remain in the Running state.
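
The recovery flow described in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the code in service.py: the class name SchedulerSketch, the report_failed callback (standing in for the BHE client), and the executor_factory parameter are all hypothetical and exist only to keep the example self-contained.

```python
import logging
from concurrent.futures import Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

logger = logging.getLogger(__name__)


def _default_executor():
    # Settings named in the description; max_tasks_per_child needs Python 3.11+.
    return ProcessPoolExecutor(max_workers=1, max_tasks_per_child=1)


class SchedulerSketch:
    """Hypothetical stand-in for the scheduler service (assumption, not the real class)."""

    def __init__(self, report_failed, executor_factory=_default_executor):
        # report_failed(job_id, message) is a hypothetical callback replacing the BHE client.
        self.report_failed = report_failed
        self._executor_factory = executor_factory
        self.executor = executor_factory()
        self.future = None
        self.job_running = False

    def _reset_executor(self):
        # Do not wait on a pool that is already broken; just replace it.
        self.executor.shutdown(wait=False)
        self.executor = self._executor_factory()

    def _start_job(self, job_id, fn, *args):
        self.job_running = True
        try:
            self.future = self.executor.submit(fn, *args)
        except BrokenProcessPool as exc:
            logger.error("process pool broken on submit: %s", exc)
            # Clear local state, rebuild the pool, and report the job as failed
            # so it does not stay in a running state remotely.
            self.job_running = False
            self.future = None
            self._reset_executor()
            self.report_failed(job_id, str(exc))

    def _handle_completed_job(self, job_id):
        try:
            self.future.result()
        except BrokenProcessPool as exc:
            # Must come before the generic handler, since it is also an Exception.
            logger.error("process pool broken while running job: %s", exc)
            self.report_failed(job_id, str(exc))
            self._reset_executor()
        except Exception as exc:
            self.report_failed(job_id, str(exc))
        finally:
            # State is cleared on every path, including the broken-pool one.
            self.future = None
            self.job_running = False
```

Ordering matters here: BrokenProcessPool derives from Exception, so a dedicated branch placed after the generic one would never run.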

Two tests added to tests/test_bhe_job_scheduling.py:

  • test_poll_recovers_from_broken_process_pool — sets a BrokenProcessPool exception on the future, asserts that job_running/future are cleared, the executor instance is replaced, and BHE receives FAILED with the expected message.
  • test_start_job_recovers_when_submit_raises_broken_pool — monkeypatches executor.submit to raise BrokenProcessPool, asserts that both start_job and end_job were called on BHE, local state is cleared, and the executor instance is replaced.
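
Both tests depend on simulating a broken pool without killing a real worker process. A minimal sketch of the two test doubles is shown below. The helper names make_broken_future and make_broken_executor are hypothetical, and unittest.mock is used here in place of the monkeypatching that the actual tests use.

```python
from concurrent.futures import Future
from concurrent.futures.process import BrokenProcessPool
from unittest import mock


def make_broken_future(message="worker died"):
    # A completed future whose result() raises BrokenProcessPool,
    # mimicking a job whose worker process terminated abruptly.
    future = Future()
    future.set_exception(BrokenProcessPool(message))
    return future


def make_broken_executor(message="pool is not usable"):
    # An executor double whose submit() raises, mimicking a pool
    # that is already in the broken state when a new job starts.
    executor = mock.Mock()
    executor.submit.side_effect = BrokenProcessPool(message)
    return executor
```

A test would assign the broken future (or executor) to the scheduler under test, trigger the poll or start path, and then assert that the executor instance changed and that the local state was cleared.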

Motivation and Context

Resolves: BED-8692

@ktstrader ktstrader self-assigned this Jun 23, 2026
@d3vzer0 d3vzer0 self-requested a review June 23, 2026 23:01
@ktstrader ktstrader merged commit 056f802 into main Jun 23, 2026
2 checks passed

2 participants