
Fix: AsyncWebCrawler blocks event loop with scrap, use ascrap instead. #1986

Open
jtgi wants to merge 1 commit into unclecode:develop from jtgi:bugfix/use-ascrap-in-async-webcrawler




jtgi commented May 26, 2026

Summary

AsyncWebCrawler.arun() ends up calling the synchronous scrap() method (via aprocess_html()), which blocks the event loop. It should call the asynchronous ascrap() instead.

Discovered while investigating a production capacity issue.
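To illustrate the problem, here is a minimal, self-contained sketch (Python 3.9+ for asyncio.to_thread) that does not use crawl4ai. The function scrap() is a hypothetical stand-in that calls time.sleep() to mimic blocking parse work; a heartbeat task counts how often the event loop gets to run while scraping is in progress.

```python
import asyncio
import time


def scrap(html: str) -> int:
    """Hypothetical stand-in for a synchronous scraper (NOT crawl4ai's).

    Sleeps to mimic blocking parse work, then returns the document length.
    """
    time.sleep(0.2)
    return len(html)


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    """Records a timestamp roughly every 10 ms whenever the loop is free to run it."""
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def run(use_thread: bool) -> int:
    """Scrapes once and returns how many heartbeat ticks happened meanwhile."""
    ticks: list = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # yield once so the heartbeat task starts
    if use_thread:
        # Offload the blocking call to a worker thread; the loop keeps running.
        await asyncio.to_thread(scrap, "<html></html>")
    else:
        # Called directly, scrap() holds the event loop for the full 0.2 s.
        scrap("<html></html>")
    stop.set()
    await hb
    return len(ticks)


if __name__ == "__main__":
    blocking = asyncio.run(run(use_thread=False))
    threaded = asyncio.run(run(use_thread=True))
    print(f"heartbeat ticks while blocking: {blocking}, with to_thread: {threaded}")
```

Called directly, scrap() starves the loop, so the heartbeat records only its first tick; run through asyncio.to_thread(), the loop keeps serving the heartbeat at roughly one tick every 10 ms.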

Testing

aprocess_html() was calling the synchronous scrap() method directly on the event loop, blocking it during lxml DOM parsing and HTML processing. All ContentScrapingStrategy subclasses already implement ascrap() via asyncio.to_thread(), so this change makes aprocess_html() await ascrap() instead.
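The shape of the fix can be sketched as follows. The classes below are simplified, hypothetical stand-ins, not crawl4ai's real ContentScrapingStrategy or AsyncWebCrawler; the scrap(url, html, **kwargs) signature, the dict return value, and the SlowStrategy/Crawler names are assumptions for illustration. ascrap() offloads the blocking scrap() with asyncio.to_thread(), and aprocess_html() awaits it, so several pages can be processed concurrently without stalling the loop.

```python
import asyncio
import time
from abc import ABC, abstractmethod


class ContentScrapingStrategy(ABC):
    """Simplified stand-in, NOT the crawl4ai class; signatures are assumptions."""

    @abstractmethod
    def scrap(self, url: str, html: str, **kwargs) -> dict:
        ...

    async def ascrap(self, url: str, html: str, **kwargs) -> dict:
        # Run the blocking scrap() in the default thread pool so the loop stays free.
        return await asyncio.to_thread(self.scrap, url, html, **kwargs)


class SlowStrategy(ContentScrapingStrategy):
    """Hypothetical strategy whose scrap() blocks for 0.1 s."""

    def scrap(self, url: str, html: str, **kwargs) -> dict:
        time.sleep(0.1)  # stands in for blocking DOM parsing
        return {"url": url, "length": len(html)}


class Crawler:
    """Hypothetical minimal crawler showing only the aprocess_html() change."""

    def __init__(self, strategy: ContentScrapingStrategy) -> None:
        self.strategy = strategy

    async def aprocess_html(self, url: str, html: str) -> dict:
        # Before: result = self.strategy.scrap(url, html)  # blocks the event loop
        return await self.strategy.ascrap(url, html)


async def main() -> tuple:
    """Processes five pages concurrently and returns (results, elapsed seconds)."""
    crawler = Crawler(SlowStrategy())
    pages = [(f"https://example.com/{i}", "<p>hi</p>") for i in range(5)]
    start = time.perf_counter()
    results = await asyncio.gather(*(crawler.aprocess_html(u, h) for u, h in pages))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(f"{len(results)} pages in {elapsed:.2f} s")
```

With five pages and a 0.1 s blocking scrap(), gather() finishes in about 0.1 s rather than 0.5 s, because each call runs in the default thread pool executor. For parsing work that holds the GIL, the threads mainly keep the loop responsive rather than adding CPU parallelism.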
jtgi changed the title from "Fix: Use ascrap() in AsyncWebCrawler.aprocess_html() to avoid blocking event loop" to "Fix: AsyncWebCrawler.aprocess_html() blocks event loop with scrap, use ascrap instead." on May 26, 2026
jtgi changed the title from "Fix: AsyncWebCrawler.aprocess_html() blocks event loop with scrap, use ascrap instead." to "Fix: AsyncWebCrawler blocks event loop with scrap, use ascrap instead." on May 26, 2026
