Pr/develop to 5fa3703 fix: multiple bug fixes, security hardening, Docker improvements and MCP enhancements #1978

Open
wchy1128 wants to merge 10 commits into unclecode:develop from wchy1128:pr/develop-to-5fa3703
wchy1128 commented May 24, 2026

Fix: HTTP status code not updated after JS challenge

  • async_crawler_strategy.py: Added same-domain document response tracking (_js_nav_status). When the initial goto
    returns 403/503 but a JS challenge triggers a same-domain navigation with the real status code, the final status is
    automatically corrected. Resolves crawl failures on sites like Zhihu.
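
The correction described above can be sketched roughly as follows. This is a minimal illustration, not the code in async_crawler_strategy.py: `DocResponse`, `resolve_final_status`, and `CHALLENGE_STATUSES` are hypothetical names, and "same-domain" is assumed here to mean an exact hostname match.

```python
from typing import Iterable, NamedTuple, Optional
from urllib.parse import urlparse


class DocResponse(NamedTuple):
    """Hypothetical record of one document response seen after the initial goto."""
    url: str
    status: int


# Statuses treated as "possibly a JS challenge" (taken from the 403/503 case above).
CHALLENGE_STATUSES = {403, 503}


def resolve_final_status(
    initial_url: str,
    initial_status: int,
    later_responses: Iterable[DocResponse],
) -> int:
    """Return the status to report for a crawl.

    If the initial navigation returned 403/503 and a later document response
    arrived from the same host, report the status of the last such response.
    Otherwise keep the initial status.
    """
    if initial_status not in CHALLENGE_STATUSES:
        return initial_status

    host = urlparse(initial_url).hostname
    js_nav_status: Optional[int] = None
    for resp in later_responses:
        # Assumption: only exact hostname matches count as same-domain.
        if urlparse(resp.url).hostname == host:
            js_nav_status = resp.status

    return js_nav_status if js_nav_status is not None else initial_status
```

For example, an initial 403 followed by a same-host document response with status 200 would be reported as 200, while responses from other hosts leave the 403 unchanged.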

Unify verbose configuration strategy

  • async_configs.py: Changed BrowserConfig.verbose and CrawlerRunConfig.verbose defaults from True to False to reduce
    noisy logs.
  • cli.py: Fixed verbose precedence so that the CLI --verbose flag overrides the global config, which overrides the default (False).
  • config.py: Added DB_VERBOSE config option to independently control database logging.
  • async_database.py: The database logger's verbose setting is now read from ~/.crawl4ai/global.yml (DB_VERBOSE).
  • async_webcrawler.py: When verbose is enabled, prints full BrowserConfig/CrawlerRunConfig and anti-bot block reasons.
  • Tests: Updated test_verbose_default_false to match the new default.
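
The precedence rule above (CLI flag > global config > default) can be illustrated with a small sketch. The function names and the `VERBOSE` key are assumptions made for this example; only `DB_VERBOSE` is named in this PR. The default of False for `DB_VERBOSE` is also assumed, and `cli_flag=None` is used to mean "flag not passed".

```python
from typing import Any, Mapping, Optional

DEFAULT_VERBOSE = False  # new default introduced by this PR


def resolve_verbose(cli_flag: Optional[bool], global_config: Mapping[str, Any]) -> bool:
    """Pick the effective verbose value: CLI flag > global config > default."""
    if cli_flag is not None:
        return cli_flag
    # Assumption: the global config stores the setting under a "VERBOSE" key.
    if "VERBOSE" in global_config:
        return bool(global_config["VERBOSE"])
    return DEFAULT_VERBOSE


def resolve_db_verbose(global_config: Mapping[str, Any]) -> bool:
    """Database logging is controlled independently through DB_VERBOSE."""
    # Assumption: DB_VERBOSE falls back to False when it is not set.
    return bool(global_config.get("DB_VERBOSE", False))
```

Keeping the database setting in its own function reflects the intent that enabling crawler verbosity does not also enable database logging.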

Fix: MCP ask node schema returning empty

  • Fixed the schema definition for the MCP ask endpoint.

MCP nodes support full config passthrough

  • mcp_bridge.py: All MCP nodes now uniformly accept BrowserConfig and CrawlerRunConfig parameters. Improved config
    loading logic.

Add MCP ask node parameter constraints

  • Enhanced schema definitions with parameter constraint descriptions.

Docker Deployment Changes (deploy/docker/)

New custom image build pipeline

  • Added Dockerfile.custom: Builds on the official image, pulls code from a GitHub fork — no local files needed.
  • Added build-custom-image.sh: Convenience build script supporting repo/branch/tag/base-image arguments.
  • Removed old Dockerfile.patch (manual patching no longer needed).

Improved debug playground + new result preview page

  • server.py: Added /playground2 route for viewing crawled HTML/Markdown results.
  • playground/index.html: Improved debug playground with --verbose and --json-ensure-ascii config options; fixed
    innerHTML causing extra link loading.
  • playground2/index.html: New result preview page (~1300 lines) for real-time crawl result inspection.

Fix: Regex deprecation warning

  • Fixed Python regex deprecation warnings on startup.
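
This description does not say which warning was involved. One common cause of regex-related deprecation warnings at import time is a pattern written as a normal string literal with escapes such as `\d`, which Python flags as an invalid escape sequence. The sketch below shows the usual remedy, a raw string; `NUMBER_RE` and `extract_numbers` are hypothetical names and are not taken from this PR.

```python
import re

# Raw string literal: backslashes reach the regex engine unchanged, so Python
# does not warn about an invalid escape sequence when the module is compiled.
NUMBER_RE = re.compile(r"\d+(\.\d+)?")


def extract_numbers(text: str) -> list:
    """Return every integer or decimal number found in text, as strings."""
    return [match.group(0) for match in NUMBER_RE.finditer(text)]
```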

Total: 17 files changed, +1740 / -141 lines. Main focus: JS challenge status code fix, verbose config overhaul, Docker
custom image build, debug/preview page enhancements.

Test plan

  • Verify Docker build and playground pages work
  • Confirm MCP endpoints handle browser/crawler configs correctly
  • Run regression test suite (c4ai-check passed)
