fix(proxy): make CODE cold-start handling request-independent and status-driven#343

Open
joshtrichards wants to merge 4 commits into
CollaboraOnline:masterfrom
joshtrichards:jtr/fix-backend-robustness

Conversation

Contributor

@joshtrichards joshtrichards commented May 29, 2026

Summary

Improve cold-start handling in proxy.php for the built-in CODE server.

Previously, startup was already launched in the background, but request handling could still wait indefinitely for readiness and cold-start progress was not tracked explicitly across requests. This made startup behavior harder to reason about during slow starts, repeated requests, or failure scenarios.

This change makes cold-start progress more request-independent and status-driven by:

  • tracking startup-in-progress state explicitly
  • avoiding indefinite request-scoped waiting
  • making ?status handling more robust during startup
  • preserving existing status-based behavior while adding better startup coordination

Related frontend changes (this PR does not depend on them): nextcloud/richdocuments#5705

Changes

Startup coordination

  • add a coolwsd.starting marker file in the temp directory
  • track startup age and detect stale startup markers
  • prevent duplicate startup attempts while a fresh startup is already in progress
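
The marker-file coordination above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP in proxy.php: the function names, the 120-second staleness threshold, and the use of the file's modification time as the startup timestamp are all assumptions. Only the `coolwsd.starting` file name and its location in the temp directory come from this PR.

```python
import os
import tempfile
import time

# Hypothetical constants: only the "coolwsd.starting" name and the temp
# directory location are taken from the PR description.
MARKER = os.path.join(tempfile.gettempdir(), "coolwsd.starting")
STALE_AFTER = 120.0  # seconds; assumed value


def marker_age(path=MARKER):
    """Seconds since the marker was written, or None if it does not exist."""
    try:
        return max(0.0, time.time() - os.path.getmtime(path))
    except FileNotFoundError:
        return None


def try_claim_startup(path=MARKER, stale_after=STALE_AFTER):
    """Return True if this caller should launch the server."""
    age = marker_age(path)
    if age is not None and age >= stale_after:
        # Stale marker: the earlier attempt presumably died, so clear it.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # another request cleared it first
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller succeeds.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # a fresh startup is already in progress
    os.close(fd)
    return True
```

Exclusive creation is what keeps concurrent requests from launching the server twice. The stale-marker cleanup in this sketch is not atomic on its own, so a real implementation may need extra care there.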

Startup lifecycle

  • keep startup progress independent of the initiating HTTP request
  • change startCoolwsd() to trigger startup and return quickly
  • retain background launch behavior while making readiness handling less request-bound
  • clear stale pid/startup state where appropriate

Status endpoint behavior

  • make ?status handling more robust during cold start
  • preserve existing status values such as OK, starting, restarting, and error
  • include elapsed startup time for transitional states
  • keep the status endpoint responsive while startup is still in progress
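
One way to picture the status mapping is a pure function from observed state to a response payload. This is a hedged Python sketch, not the PHP in proxy.php: the payload field names, the staleness threshold, and the rule that a stale marker maps to `error` are assumptions. `restarting` is left out because this description does not say when it is emitted.

```python
def startup_status(reachable, marker_age, stale_after=120.0):
    """Map observed state to a status payload (field names are hypothetical).

    reachable:  True if the server currently accepts connections.
    marker_age: seconds since the startup marker was written, or None.
    """
    if reachable:
        return {"status": "OK"}
    if marker_age is None:
        # Nothing is starting yet; a caller would trigger startup here.
        return {"status": "starting", "elapsed": 0}
    if marker_age >= stale_after:
        # Assumed rule: a marker older than the threshold means startup failed.
        return {"status": "error"}
    # Transitional state: report how long startup has been running.
    return {"status": "starting", "elapsed": int(marker_age)}
```

Because the function only inspects state and never waits, a handler built this way can answer immediately regardless of how far startup has progressed.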

Request-path behavior

  • replace potentially indefinite waits with bounded readiness checks
  • allow a short fast-path wait for cases where CODE becomes ready almost immediately
  • fail normal proxy requests cleanly if the service is still starting instead of hanging indefinitely
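
A bounded readiness check of this kind might look like the following. This is a minimal Python sketch, not the PHP in proxy.php: the function name, the 2-second window, and the polling interval are assumptions, and `is_ready` stands in for whatever readiness probe the proxy uses.

```python
import time


def wait_until_ready(is_ready, timeout=2.0, interval=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout elapses.

    Returns True if the service became ready in time, False otherwise.
    The probe always runs at least once, even with timeout=0.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

When this returns False, the caller can send an error response right away instead of holding the connection open. The exact response proxy.php sends in that case is not specified in this description.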

Hardening

  • use reachability checks in places where socket availability matters more than PID presence
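
A process can have a live PID while its socket is not yet accepting connections, which is why a connect-based probe is the more direct signal. The following is an illustrative Python sketch with a hypothetical function name, not the PHP in proxy.php. The host and port are parameters because this description does not state them.

```python
import socket


def is_reachable(host, port, timeout=0.5):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```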

Why

This makes the proxy a better backend for clients that poll readiness during cold start, especially during first-run extraction or slower startup scenarios.

It also improves operational behavior by making startup:

  • easier to diagnose
  • less sensitive to request timing
  • less prone to duplicate launch attempts

User/admin impact

Users

  • fewer hangs during cold start
  • faster, clearer behavior when startup is still in progress

Administrators

  • more reliable ?status behavior during startup
  • clearer distinction between “still starting” and “stale/broken”
  • better behavior under repeated requests or slower cold starts

Review notes

Most of the code churn is due to extracting repeated cold-start checks into helpers.

I tried to split this into small commits by concern:

  1. startup-state tracking
    • Add a lightweight file-based marker to coordinate cold start across requests and to detect stale startup attempts.
  2. detached cold-start behavior
    • Replace indefinite startup waiting with a short readiness window and let cold-start progress continue independently in the background.
  3. status endpoint handling
    • Reuse the startup marker and short readiness wait in ?status so clients can poll startup progress reliably.
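  4. normal request handling
    • Replace potentially indefinite waits on the normal proxy path with bounded readiness checks, and fail cleanly if CODE is still starting.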

The intended functional changes are mainly in 2–4; 1 provides the shared state mechanism.

Testing

Suggested manual verification:

  1. Ensure the built-in CODE server is not running
  2. Request proxy.php?status
  3. Confirm it returns transitional startup status during cold start and OK once ready
  4. Confirm repeated ?status requests during startup do not trigger duplicate launches
  5. Confirm normal proxy requests no longer hang indefinitely if startup is slow
  6. Confirm stale startup state is eventually cleared when startup never completes
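
Steps 2 and 3 could be partly automated with a small polling helper. This is a Python sketch under assumptions: `fetch_status` is a hypothetical callable that requests `proxy.php?status` and returns the status string. The response format is not described here, so parsing is left to the caller.

```python
import time


def poll_status(fetch_status, timeout=180.0, interval=1.0,
                clock=time.monotonic, sleep=time.sleep):
    """Poll until the status is "OK" or the timeout elapses.

    Returns the sequence of statuses seen, with consecutive repeats
    collapsed, e.g. ["starting", "OK"] for a successful cold start.
    """
    seen = []
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "OK" or clock() >= deadline:
            return seen
        sleep(interval)
```

A result ending in `"OK"` after one or more transitional values would match step 3. A result that never reaches `"OK"` points to the slow or failed startup cases in steps 5 and 6.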

@joshtrichards joshtrichards changed the title Jtr/fix backend robustness fix(proxy): make CODE cold-start handling request-independent and status-driven May 29, 2026
@joshtrichards joshtrichards changed the title fix(proxy): make CODE cold-start handling request-independent and status-driven WIP - fix(proxy): make CODE cold-start handling request-independent and status-driven May 29, 2026
@joshtrichards joshtrichards changed the title WIP - fix(proxy): make CODE cold-start handling request-independent and status-driven fix(proxy): make CODE cold-start handling request-independent and status-driven May 29, 2026
@joshtrichards joshtrichards marked this pull request as ready for review May 29, 2026 15:39