Summary
RequestQueue.add_request (singular) does not retry requests that the storage client returns as unprocessed, whereas RequestQueue.add_requests (plural) does. On a best-effort backend (e.g. the Apify platform's batch_add_requests endpoint, which may legitimately return a request as unprocessed), a single add_request call can therefore drop the request, logging only a warning and returning None, while the same request added via add_requests would be retried and survive.
The two methods should behave consistently: a single add should be as durable as a batched add.
Current behavior
In src/crawlee/storages/_request_queue.py:
- add_request does exactly one add_batch_of_requests([request]) call. If the response comes back with unprocessed_requests, it logs a warning and returns None, with no retry:
    async def add_request(self, request, *, forefront=False) -> ProcessedRequest | None:
        request = self._transform_request(request)
        response = await self._client.add_batch_of_requests([request], forefront=forefront)
        if response.processed_requests:
            return response.processed_requests[0]
        if response.unprocessed_requests:
            logger.warning(f'Request {request.url} was not processed ...')
        ...
        return None
- add_requests routes through _process_batch, which retries unprocessed requests up to 5 attempts with backoff (base_retry_wait * attempt):
    async def _process_batch(self, batch, *, base_retry_wait, attempt=1, forefront=False) -> None:
        max_attempts = 5
        response = await self._client.add_batch_of_requests(batch, forefront=forefront)
        if response.unprocessed_requests:
            if attempt > max_attempts:
                logger.warning(...)
            else:
                retry_batch = [...]
                await asyncio.sleep((base_retry_wait * attempt).total_seconds())
                await self._process_batch(
                    retry_batch,
                    base_retry_wait=base_retry_wait,
                    attempt=attempt + 1,
                    forefront=forefront,
                )
        ...
The retry asymmetry lives entirely in RequestQueue; storage clients (including Apify's add_batch_of_requests) intentionally do a single best-effort call and report what was/wasn't processed. So this affects every backend whose add_batch_of_requests can return unprocessed requests, not just Apify.
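A minimal, self-contained sketch of the asymmetry, for illustration only. FakeResponse, FlakyClient, add_single and add_batched are hypothetical stand-ins rather than crawlee classes, and requests are plain strings; only the control flow follows the two excerpts above.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the storage client's response object."""

    processed_requests: list[str] = field(default_factory=list)
    unprocessed_requests: list[str] = field(default_factory=list)


class FlakyClient:
    """Hypothetical best-effort client: the first `failures` calls process nothing."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def add_batch_of_requests(self, batch: list[str], *, forefront: bool = False) -> FakeResponse:
        self.calls += 1
        if self.calls <= self.failures:
            return FakeResponse(unprocessed_requests=list(batch))
        return FakeResponse(processed_requests=list(batch))


async def add_single(client: FlakyClient, request: str) -> str | None:
    # Singular path: exactly one call; an unprocessed request is given up on.
    response = await client.add_batch_of_requests([request])
    if response.processed_requests:
        return response.processed_requests[0]
    return None


async def add_batched(
    client: FlakyClient, batch: list[str], *, base_retry_wait: timedelta, attempt: int = 1
) -> list[str]:
    # Plural path: unprocessed requests are retried with linear backoff.
    max_attempts = 5
    response = await client.add_batch_of_requests(batch)
    processed = list(response.processed_requests)
    if response.unprocessed_requests and attempt <= max_attempts:
        await asyncio.sleep((base_retry_wait * attempt).total_seconds())
        processed += await add_batched(
            client, list(response.unprocessed_requests), base_retry_wait=base_retry_wait, attempt=attempt + 1
        )
    return processed


# Same request, same one-off backend hiccup, different outcome.
single_result = asyncio.run(add_single(FlakyClient(failures=1), 'https://example.com'))
batched_result = asyncio.run(
    add_batched(FlakyClient(failures=1), ['https://example.com'], base_retry_wait=timedelta(milliseconds=1))
)
```

With one transient failure, single_result is None while batched_result contains the request.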
Impact
This surfaced as an intermittent e2e failure in apify-sdk-python: a pre-reboot add_request came back unprocessed, the request was dropped, and the post-reboot fetch_next_request returned None. The workaround there was to switch the singular adds to add_requests purely to get the retry behavior (apify/apify-sdk-python#1000). That's a band-aid — the real inconsistency is here.
Proposed fix
Make add_request retry unprocessed requests, reusing the existing _process_batch machinery so the retry policy stays in one place:
- Have _process_batch return its AddRequestsResponse (it already computes it; it currently returns None).
- Rewrite add_request to delegate to _process_batch([request], ...) and return response.processed_requests[0], returning None only after retries are exhausted.
This preserves the existing ProcessedRequest | None return contract and the blocking semantics (for a single request, add_requests already runs the first batch synchronously — the background task is a no-op), and is safe because adds are idempotent by unique_key.
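A rough sketch of what the two steps could look like, not the actual patch. SketchResponse, SketchClient and QueueSketch are hypothetical stand-ins, requests are plain strings, and merging the processed requests across attempts is an assumption of this sketch (for a single request, the last attempt's response alone would be enough).

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class SketchResponse:
    """Hypothetical stand-in for AddRequestsResponse."""

    processed_requests: list[str] = field(default_factory=list)
    unprocessed_requests: list[str] = field(default_factory=list)


class SketchClient:
    """Hypothetical client that leaves every request unprocessed for the first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def add_batch_of_requests(self, batch: list[str], *, forefront: bool = False) -> SketchResponse:
        self.calls += 1
        if self.calls <= self.failures:
            return SketchResponse(unprocessed_requests=list(batch))
        return SketchResponse(processed_requests=list(batch))


class QueueSketch:
    """Hypothetical queue showing only the two methods touched by the proposal."""

    def __init__(self, client: SketchClient, base_retry_wait: timedelta) -> None:
        self._client = client
        self._base_retry_wait = base_retry_wait

    async def _process_batch(
        self, batch: list[str], *, attempt: int = 1, forefront: bool = False
    ) -> SketchResponse:
        max_attempts = 5
        response = await self._client.add_batch_of_requests(batch, forefront=forefront)
        # Stop when everything was processed or the retry budget is spent.
        if not response.unprocessed_requests or attempt > max_attempts:
            return response
        await asyncio.sleep((self._base_retry_wait * attempt).total_seconds())
        retried = await self._process_batch(
            list(response.unprocessed_requests), attempt=attempt + 1, forefront=forefront
        )
        # Combine what this attempt processed with what the retries processed.
        return SketchResponse(
            processed_requests=response.processed_requests + retried.processed_requests,
            unprocessed_requests=retried.unprocessed_requests,
        )

    async def add_request(self, request: str, *, forefront: bool = False) -> str | None:
        # Delegate to the shared retry path; None only once retries are exhausted.
        response = await self._process_batch([request], forefront=forefront)
        if response.processed_requests:
            return response.processed_requests[0]
        return None


recovering_client = SketchClient(failures=2)
recovered = asyncio.run(
    QueueSketch(recovering_client, timedelta(milliseconds=1)).add_request('https://example.com')
)

exhausted_client = SketchClient(failures=100)
exhausted = asyncio.run(
    QueueSketch(exhausted_client, timedelta(milliseconds=1)).add_request('https://example.com')
)
```

In this sketch a backend that fails twice still yields the request on the third call, while a backend that never recovers returns None after the initial call plus five retries.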
Caveat worth noting
This makes add_request blocking-with-backoff on the unhappy path (worst case a few seconds of sleeps before returning None), where today it returns immediately. That's the intended trade — durability over a fast silent failure — but it is a behavior change on the failure path.
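To put a number on the worst case: if the quoted _process_batch excerpt is read literally, it sleeps base_retry_wait * attempt for attempts 1 through 5, so the total is 15 * base_retry_wait. The helper below is a hypothetical illustration, and the 200 ms value is an assumed example, not a crawlee default.

```python
from datetime import timedelta


def worst_case_wait(base_retry_wait: timedelta, max_attempts: int = 5) -> timedelta:
    """Total sleep time when every attempt keeps coming back unprocessed (hypothetical helper)."""
    # One sleep of base_retry_wait * attempt for each attempt 1..max_attempts.
    return sum(
        (base_retry_wait * attempt for attempt in range(1, max_attempts + 1)),
        timedelta(),
    )


# Assumed example value: with a 200 ms base wait the sleeps add up to 3 seconds.
total_wait = worst_case_wait(timedelta(milliseconds=200))
```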