fix(scrapy): async-thread startup race, shutdown lifecycle, and timeout setting by vdusek · Pull Request #979 · apify/apify-sdk-python

vdusek · 2026-06-12T14:06:25Z

Description

Fixes several defects in the Scrapy integration's background event-loop thread (AsyncThread), the scheduler, and the HTTP cache storage, and makes the loop timeout configurable.

Fixes

run_coro startup race — the is_running() guard fired spuriously when a coroutine was submitted before the loop thread reached run_forever() (observed ~122/500 in scheduler.open()). It now guards on is_closed(). A coroutine queued on a not-yet-running loop runs once the loop starts; only a closed loop raises.
close() thread leak — if task cancellation timed out or raised, the loop was never stopped or joined. Stop, join, and the forced-shutdown fallback now run in a finally, and the original error still propagates.
close() second call — a repeated close raised RuntimeError: Event loop is closed. An is_closed() early-return makes it a no-op.
close() ignored its timeout for the cancellation step (it used the constructor default). It now passes the caller's timeout through.
run_coro timeout left the coroutine running. It now cancels the future on timeout.
HTTP cache open/cleanup thread leaks — open_spider now closes the thread if opening the key-value store fails (matching ApifyScheduler.open). The expiration sweep runs inside try with close() in a finally.
Configurable timeout (refactor(scrapy): make AsyncThread timeout configurable #955) — new APIFY_ASYNC_THREAD_TIMEOUT_SECS setting, wired into the scheduler (via from_crawler) and the cache storage.
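
As an illustration only (this is not the code from the PR), the run_coro and close() behaviour listed above could be sketched roughly as follows. The class name AsyncThreadSketch, its attribute names, the helper _cancel_pending_tasks, and the 60-second default are assumptions made for the example. The forced-shutdown fallback mentioned above is omitted.

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FutureTimeoutError
from typing import Any, Coroutine, Optional


class AsyncThreadSketch:
    """Hypothetical stand-in for AsyncThread, not the SDK's implementation."""

    def __init__(self, default_timeout: float = 60.0) -> None:
        self._default_timeout = default_timeout
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run_coro(self, coro: Coroutine[Any, Any, Any], timeout: Optional[float] = None) -> Any:
        # Guard on is_closed(), not is_running(): a loop whose thread has not
        # reached run_forever() yet still accepts work and runs it once started.
        if self._loop.is_closed():
            coro.close()  # avoid a "coroutine was never awaited" warning
            raise RuntimeError('Event loop is closed.')
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        try:
            return future.result(self._default_timeout if timeout is None else timeout)
        except FutureTimeoutError:
            future.cancel()  # do not leave the coroutine running in the background
            raise

    def close(self, timeout: Optional[float] = None) -> None:
        if self._loop.is_closed():
            return  # a repeated close() is a no-op
        timeout = self._default_timeout if timeout is None else timeout
        try:
            # The caller's timeout also bounds the cancellation step.
            self.run_coro(self._cancel_pending_tasks(), timeout=timeout)
        finally:
            # Stop and join even if cancellation timed out or raised.
            self._loop.call_soon_threadsafe(self._loop.stop)
            self._thread.join(timeout=timeout)
            if not self._thread.is_alive():
                self._loop.close()

    @staticmethod
    async def _cancel_pending_tasks() -> None:
        # Cancel every task on the loop except the one running this coroutine.
        current = asyncio.current_task()
        tasks = [task for task in asyncio.all_tasks() if task is not current]
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Because the exception raised inside the try block is not caught, it still propagates to the caller after the finally block has stopped and joined the thread.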

Error logging

The integration now follows consistent conventions for caught exceptions:

except … as exc: → logger.warning(f'… {exc}'), swallowed — for expected, recoverable conditions handled locally: a malformed or legacy stored payload skipped as a cache/queue miss, or non-UTF-8 headers preserved in the serialized request. A short message plus the exception text, with no traceback, because it is not a bug.
except Exception: → logger.exception('…'), swallowed — for unexpected failures handled at a terminal point: the cleanup sweep, shutdown, or skip-and-continue. logger.exception attaches the full traceback, and nothing re-raises because the error is handled here.
except …: → raise (no logging) — when the error is re-raised and the caller or Scrapy logs it with a traceback anyway. run_coro's timeout path cancels the future and re-raises without logging, so the failure is reported once.
except Exception: → logger.exception('…'); raise — the boundary log, used only where local context materially helps and the propagated error would otherwise be logged only generically or not at all. The scheduler's next_request / enqueue_request / has_pending_requests are called synchronously by the Scrapy engine (not inside a Deferred), so without this log the Apify-specific context would be lost.
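
The four conventions above can be shown side by side in a minimal sketch. All function names, the logger name, and the log messages here are hypothetical and chosen only for illustration; they are not taken from the integration's code.

```python
import json
import logging
from typing import Any, Callable, Iterable

logger = logging.getLogger('apify_scrapy_sketch')


def load_cached_payload(raw: str) -> Any:
    """Expected, recoverable condition: short warning, no traceback, swallowed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.warning(f'Skipping malformed cached payload: {exc}')
        return None  # treated as a cache miss


def sweep_expired(delete_entry: Callable[[str], None], keys: Iterable[str]) -> int:
    """Unexpected failure at a terminal point: full traceback, swallowed."""
    removed = 0
    for key in keys:
        try:
            delete_entry(key)
            removed += 1
        except Exception:
            logger.exception('Failed to delete an expired cache entry, continuing')
    return removed


def fetch_or_reraise(fetch: Callable[[], Any]) -> Any:
    """The caller logs the error anyway: re-raise without logging."""
    try:
        return fetch()
    except TimeoutError:
        # Local cleanup (for example cancelling a future) would go here.
        raise


def next_request(fetch: Callable[[], Any]) -> Any:
    """Boundary log: add local context, then re-raise."""
    try:
        return fetch()
    except Exception:
        logger.exception('Failed to fetch the next request from the request queue')
        raise
```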

Why logger.exception replaced traceback.print_exc(): traceback.print_exc() writes a bare traceback straight to stderr, bypassing logging entirely. It has no level, no logger name, no message, and ignores Scrapy's and the SDK's log configuration and handlers. logger.exception(msg) logs at ERROR through the configured logging, so it is routed, formatted, and filterable like every other log line. It adds a message explaining what failed and still attaches the full traceback automatically, which makes including the exception object in the message ({exc}) redundant (ruff TRY401).
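
A small standalone demonstration of that difference, using only the standard library. The logger name, format string, and message are assumptions for the example. The handler writes to an in-memory stream so the routed, formatted output can be inspected.

```python
import io
import logging

logger = logging.getLogger('apify_scrapy_sketch.logging_demo')
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s %(name)s: %(message)s'))
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger


def risky() -> None:
    raise ValueError('boom')


try:
    risky()
except Exception:
    # Logged at ERROR through the configured handler; the traceback is
    # appended automatically, so the message does not need to include {exc}.
    logger.exception('Cleanup sweep failed')

output = stream.getvalue()
```

The captured output carries a level, a logger name, and a message, followed by the traceback, whereas traceback.print_exc() would have written only the traceback to stderr.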

Tests

New tests/unit/scrapy/test_async_thread.py covers the startup race, run-after-close, timeout cancellation, idempotent close, the caller timeout reaching the shutdown step, and stop/join when task cancellation fails. The scheduler and HTTP cache test modules gain coverage for the timeout setting, closing the thread on open failure, and the cleanup-failure path still closing the thread.
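
For readers unfamiliar with the startup race, one way a test for it might look is sketched below with plain asyncio and threading. This is not the test added in the PR; the function name is hypothetical and the sketch does not use AsyncThread itself.

```python
import asyncio
import threading


def test_coroutine_submitted_before_loop_starts_still_runs() -> None:
    loop = asyncio.new_event_loop()

    async def answer() -> int:
        return 42

    # The loop thread has not started yet, so is_running() is False,
    # but the loop is not closed and still accepts work.
    assert not loop.is_running()
    assert not loop.is_closed()
    future = asyncio.run_coroutine_threadsafe(answer(), loop)

    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    try:
        # The queued coroutine runs once the loop starts.
        assert future.result(timeout=5) == 42
    finally:
        loop.call_soon_threadsafe(loop.stop)
        thread.join(timeout=5)
        loop.close()
```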

codecov · 2026-06-12T14:07:49Z

Codecov Report

❌ Patch coverage is 71.42857% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.60%. Comparing base (0daca28) to head (3c4a3ec).
⚠️ Report is 32 commits behind head on master.

Files with missing lines	Patch %	Lines
src/apify/scrapy/extensions/_httpcache.py	76.27%	14 Missing ⚠️
src/apify/scrapy/scheduler.py	55.55%	8 Missing ⚠️
src/apify/scrapy/_async_thread.py	78.94%	4 Missing ⚠️
src/apify/scrapy/requests.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #979      +/-   ##
==========================================
+ Coverage   89.90%   91.60%   +1.69%     
==========================================
  Files          49       49              
  Lines        3091     3168      +77     
==========================================
+ Hits         2779     2902     +123     
+ Misses        312      266      -46

Flag	Coverage Δ
e2e	`35.73% <0.00%> (-0.18%)`	⬇️
integration	`56.47% <0.00%> (-0.41%)`	⬇️
unit	`80.49% <71.42%> (+1.74%)`	⬆️



Pijukatel · 2026-06-18T06:48:21Z

-        value = self._async_thread.run_coro(self._kvs.get_value(key))
+        try:
+            value = self._async_thread.run_coro(self._kvs.get_value(key))
+        except Exception:


This pattern is really hard to understand without more context. I think it deserves an inline comment, because the natural thing to do is to remove the try/except, as it seems redundant.

vdusek · 2026-06-18

It is used everywhere. IIRC, there were some issues with exception propagation from a separate thread and its event loop. It was extremely painful to make this work, so I definitely wouldn't change it.

Pijukatel · 2026-06-18

Catching all exceptions to add an extra log and then re-raise is an intentional, non-standard pattern that exists because of external circumstances. Those circumstances are not obvious at this level, which is why I am asking for a comment, so that a reader without the context can understand the intent of the code.

I am not asking to change this, but to help protect it by adding a comment explaining it.

vdusek · 2026-06-18

Ah, okay, I just thought you wanted an explanation, sorry, adding it.

fix: Resolve AsyncThread.run_coro startup race

5cca584

vdusek added adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. labels Jun 12, 2026

vdusek self-assigned this Jun 12, 2026

github-actions Bot added this to the 142nd sprint - Tooling team milestone Jun 12, 2026

github-actions Bot added the tested Temporary label used only programatically for some analytics. label Jun 12, 2026

vdusek added 3 commits June 12, 2026 17:12

fix(scrapy): async-thread shutdown, duplicate error logs, and timeout…

c025745

… setting

Merge remote-tracking branch 'origin/master' into fix/async-thread-st…

37aa4cb

…artup-race

Merge remote-tracking branch 'origin/master' into fix/scrapy-async-th…

2275ad3

…read-lifecycle

vdusek changed the title ~~fix: Resolve AsyncThread.run_coro startup race~~ fix(scrapy): Resolve AsyncThread.run_coro startup race Jun 12, 2026

vdusek requested a review from Pijukatel June 12, 2026 17:03

vdusek marked this pull request as ready for review June 12, 2026 17:04

vdusek added 4 commits June 12, 2026 19:07

test: remove redundant section-header comments

8540470

fix(scrapy): keep traceback.print_exc() on background-loop coroutine …

50aa539

…errors

fix(scrapy): don't leak the async-thread event loop on cache-open fai…

98edce2

…lure or shutdown error

merge: consolidate startup-race fix (#979) with lifecycle fixes (#980)

a60a55c

vdusek changed the title ~~fix(scrapy): Resolve AsyncThread.run_coro startup race~~ fix(scrapy): async-thread startup race, shutdown lifecycle, and timeout setting Jun 13, 2026

vdusek mentioned this pull request Jun 13, 2026

fix(scrapy): async-thread shutdown, duplicate error logs, and timeout setting #980

Closed

vdusek marked this pull request as draft June 13, 2026 08:14

vdusek removed the request for review from Pijukatel June 13, 2026 08:14

fix(scrapy): apply timeout to async-thread close, log errors via logg…

29ec5ff

…er.exception

vdusek marked this pull request as ready for review June 17, 2026 12:24

vdusek added 3 commits June 17, 2026 14:36

refactor(scrapy): drop redundant exception object from logger.excepti…

8b7b8c4

…on (TRY401)

update

5e56424

refactor(scrapy): align warning/exception logging in skip and cleanup…

4a5a77e

… paths

vdusek requested a review from Pijukatel June 17, 2026 12:56

Pijukatel reviewed Jun 18, 2026

View reviewed changes

vdusek requested a review from Pijukatel June 18, 2026 10:13

Address feedback

3c4a3ec

vdusek force-pushed the fix/async-thread-startup-race branch from aba09c6 to 3c4a3ec Compare June 18, 2026 11:06

vdusek wants to merge 13 commits into master from fix/async-thread-startup-race


3 participants
