RequestQueue.add_request does not retry unprocessed requests (inconsistent with add_requests) #1975

@vdusek

Summary

RequestQueue.add_request (singular) does not retry requests that the storage client returns as unprocessed, whereas RequestQueue.add_requests (plural) does. On a best-effort backend (e.g. the Apify platform's batch_add_requests endpoint, which may legitimately return a request as unprocessed), a single add_request call can therefore drop the request, logging only a warning and returning None, while the same request added via add_requests would be retried and survive.

The two methods should behave consistently: a single add should be as durable as a batched add.

Current behavior

In src/crawlee/storages/_request_queue.py:

  • add_request does exactly one add_batch_of_requests([request]) call. If the response comes back with unprocessed_requests, it logs a warning and returns None — no retry:

    async def add_request(self, request, *, forefront=False) -> ProcessedRequest | None:
        request = self._transform_request(request)
        response = await self._client.add_batch_of_requests([request], forefront=forefront)
        if response.processed_requests:
            return response.processed_requests[0]
        if response.unprocessed_requests:
            logger.warning(f'Request {request.url} was not processed ...')
        ...
        return None
  • add_requests routes through _process_batch, which retries unprocessed requests up to 5 times with a linearly growing backoff (base_retry_wait * attempt):

    async def _process_batch(self, batch, *, base_retry_wait, attempt=1, forefront=False) -> None:
        max_attempts = 5
        response = await self._client.add_batch_of_requests(batch, forefront=forefront)
        if response.unprocessed_requests:
            if attempt > max_attempts:
                logger.warning(...)
            else:
                retry_batch = [...]
                await asyncio.sleep((base_retry_wait * attempt).total_seconds())
                await self._process_batch(retry_batch, base_retry_wait=base_retry_wait, attempt=attempt + 1, forefront=forefront)
        ...

The retry asymmetry lives entirely in RequestQueue; storage clients (including Apify's add_batch_of_requests) intentionally do a single best-effort call and report what was/wasn't processed. So this affects every backend whose add_batch_of_requests can return unprocessed requests, not just Apify.
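
To make the asymmetry concrete, here is a minimal, self-contained simulation. It does not import crawlee: FakeResponse, FlakyClient, add_single and add_with_retry are hypothetical stand-ins that only mirror the control flow of the two excerpts above, requests are plain strings, and the backoff is set to zero so the demo runs instantly.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the storage client's add-batch response."""

    processed_requests: list[str] = field(default_factory=list)
    unprocessed_requests: list[str] = field(default_factory=list)


class FlakyClient:
    """Hypothetical best-effort client: rejects every batch for the first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def add_batch_of_requests(self, batch: list[str]) -> FakeResponse:
        self.calls += 1
        if self.calls <= self.failures:
            return FakeResponse(unprocessed_requests=list(batch))
        return FakeResponse(processed_requests=list(batch))


async def add_single(client: FlakyClient, request: str) -> str | None:
    """Mirrors the quoted add_request: exactly one call, no retry."""
    response = await client.add_batch_of_requests([request])
    return response.processed_requests[0] if response.processed_requests else None


async def add_with_retry(
    client: FlakyClient,
    batch: list[str],
    *,
    base_retry_wait: timedelta,
    attempt: int = 1,
    max_attempts: int = 5,
) -> list[str]:
    """Mirrors the quoted _process_batch, but returns what was eventually processed."""
    response = await client.add_batch_of_requests(batch)
    processed = list(response.processed_requests)
    if response.unprocessed_requests and attempt <= max_attempts:
        await asyncio.sleep((base_retry_wait * attempt).total_seconds())
        processed += await add_with_retry(
            client,
            list(response.unprocessed_requests),
            base_retry_wait=base_retry_wait,
            attempt=attempt + 1,
            max_attempts=max_attempts,
        )
    return processed


async def demo() -> tuple[str | None, list[str]]:
    url = 'https://example.com'
    no_wait = timedelta(0)  # zero backoff keeps the demo fast
    single = await add_single(FlakyClient(failures=2), url)
    batched = await add_with_retry(FlakyClient(failures=2), [url], base_retry_wait=no_wait)
    return single, batched


single_result, batched_result = asyncio.run(demo())
print(single_result)   # None: the request was dropped after one attempt
print(batched_result)  # ['https://example.com']: the third call succeeded
```

With a client that rejects the first two calls, the single-call path loses the request while the retrying path recovers it on the third call.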

Impact

This surfaced as an intermittent e2e failure in apify-sdk-python: a pre-reboot add_request came back unprocessed, the request was dropped, and the post-reboot fetch_next_request returned None. The workaround there was to switch the singular adds to add_requests purely to get the retry behavior (apify/apify-sdk-python#1000). That's a band-aid — the real inconsistency is here.

Proposed fix

Make add_request retry unprocessed requests, reusing the existing _process_batch machinery so the retry policy stays in one place:

  • Have _process_batch return its AddRequestsResponse (it already computes it; it currently returns None).
  • Rewrite add_request to delegate to _process_batch([request], ...) and return response.processed_requests[0], returning None only after retries are exhausted.

This preserves the existing ProcessedRequest | None return contract and the blocking semantics: for a single request, add_requests already runs the first batch synchronously, so the background task is a no-op. It is also safe, because adds are idempotent by unique_key.
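
A rough sketch of what the two changes could look like follows. It is not the actual crawlee code: Response, FlakyClient and SketchQueue are hypothetical stand-ins, requests are plain strings, and merging the processed requests from each attempt into one response is an assumption about how _process_batch would report its result.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class Response:
    """Hypothetical stand-in for AddRequestsResponse."""

    processed_requests: list[str] = field(default_factory=list)
    unprocessed_requests: list[str] = field(default_factory=list)


class FlakyClient:
    """Hypothetical client that rejects every batch for the first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def add_batch_of_requests(self, batch: list[str], *, forefront: bool = False) -> Response:
        self.calls += 1
        if self.calls <= self.failures:
            return Response(unprocessed_requests=list(batch))
        return Response(processed_requests=list(batch))


class SketchQueue:
    """Hypothetical queue showing only the two methods touched by the proposal."""

    def __init__(self, client: FlakyClient, base_retry_wait: timedelta = timedelta(0)) -> None:
        self._client = client
        self._base_retry_wait = base_retry_wait  # zero by default so the sketch runs fast

    async def _process_batch(
        self, batch: list[str], *, base_retry_wait: timedelta, attempt: int = 1, forefront: bool = False
    ) -> Response:
        max_attempts = 5
        response = await self._client.add_batch_of_requests(batch, forefront=forefront)
        if not response.unprocessed_requests or attempt > max_attempts:
            return response  # everything processed, or retries exhausted
        await asyncio.sleep((base_retry_wait * attempt).total_seconds())
        retried = await self._process_batch(
            list(response.unprocessed_requests),
            base_retry_wait=base_retry_wait,
            attempt=attempt + 1,
            forefront=forefront,
        )
        # Assumption: report everything processed across attempts in a single response.
        return Response(
            processed_requests=response.processed_requests + retried.processed_requests,
            unprocessed_requests=retried.unprocessed_requests,
        )

    async def add_request(self, request: str, *, forefront: bool = False) -> str | None:
        # Delegate to the shared retry path; None only after retries are exhausted.
        response = await self._process_batch(
            [request], base_retry_wait=self._base_retry_wait, forefront=forefront
        )
        return response.processed_requests[0] if response.processed_requests else None


recovering = FlakyClient(failures=2)
always_failing = FlakyClient(failures=100)
recovered = asyncio.run(SketchQueue(recovering).add_request('https://example.com'))
dropped = asyncio.run(SketchQueue(always_failing).add_request('https://example.com'))
print(recovered, recovering.calls)    # https://example.com 3
print(dropped, always_failing.calls)  # None 6
```

The always-failing client is called six times (the initial call plus five retries) before add_request gives up and returns None, which matches the attempt > max_attempts check in the quoted _process_batch.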

Caveat worth noting

This makes add_request blocking-with-backoff on the unhappy path (worst case a few seconds of sleeps before returning None), where today it returns immediately. That's the intended trade — durability over a fast silent failure — but it is a behavior change on the failure path.
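
For a sense of scale, the worst-case total sleep can be worked out from the quoted _process_batch excerpt. The helper below is a hypothetical illustration, the 1-second base_retry_wait is an assumed example value rather than a confirmed default, and the calculation assumes the elided parts of the excerpt add no further sleeps.

```python
from datetime import timedelta


def worst_case_sleep(base_retry_wait: timedelta, max_attempts: int = 5) -> timedelta:
    """Total backoff when every attempt comes back unprocessed.

    Attempts 1..max_attempts each sleep base_retry_wait * attempt before retrying;
    the attempt after that only logs a warning and does not sleep.
    """
    waits = (base_retry_wait * attempt for attempt in range(1, max_attempts + 1))
    return sum(waits, timedelta(0))


# Assumed example value, not taken from the source.
total = worst_case_sleep(timedelta(seconds=1))
print(total.total_seconds())  # 15.0, i.e. (1 + 2 + 3 + 4 + 5) * 1 second
```

The total scales linearly with base_retry_wait, so the actual delay on the failure path depends on whatever value add_request ends up passing.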
