-sNODERAWSOCKETS DNS support#27162

Open
guybedford wants to merge 7 commits into
emscripten-core:mainfrom
guybedford:nodenet-dns

@guybedford guybedford commented Jun 23, 2026

Collaborator

This is a follow-on to #27080 and is based on that PR. See only the last commit of this PR for the exact diff.

This adds support for using Node.js's DNS resolver through getaddrinfo() and an async counterpart, supporting both sync and JSPI modes under -sNODERAWSOCKETS.

To support non-JSPI builds, we introduce a new async DNS syscall, emscripten_dns_lookup_async(node, service, hints), as a direct async conversion of getaddrinfo. It can be used as a system integration point for async DNS resolution in Emscripten. It returns a pollable socket fd that can either be polled for completion or have a listener attached via the existing socket callback emscripten_set_socket_message_callback. On completion, emscripten_dns_lookup_result(fd, **addrinfo) reads the result. The direct getaddrinfo form shares a cache with the async form, so sync resolution can be "pre-warmed" by async resolution in environments that cannot upgrade to the async syscall. If no synchronous resolution is available, EAI_AGAIN is returned.

For example, when integrating this API with a runtime like Rust Tokio, it becomes possible to support full client connect lifecycles without JSPI by specializing the emscripten target to this async DNS API, while still supporting JSPI builds.

Under JSPI mode, the above continues to work, but getaddrinfo can also do full DNS resolution asynchronously per standard semantics, and does not use an internal cache at all.

DNS lookups first check the /etc/hosts file, then the internal cache (for non-JSPI builds), before doing a full async call via Node.js's DNS module.

Adds a new NODERAWSOCKETS setting that backs the POSIX sockets API directly
with Node.js's node:net and node:dgram, giving real, non-blocking TCP and UDP
sockets without WebSockets, an external proxy process, or pthreads. This is the
sockets counterpart to NODERAWFS: where NODERAWFS gives direct access to the
host filesystem, this gives direct access to host sockets.

Unlike PROXY_POSIX_SOCKETS this is single-threaded and event-driven: socket
readiness is delivered through the same emscripten_set_socket_*_callback hooks
the default WebSocket backend uses, so it drops into existing readiness reactors
unchanged. Under -pthread the socket syscalls are proxied to the main thread, so
the backend always runs on node's event loop and a SharedArrayBuffer heap is
safe.

Supported:

* TCP clients: connect, send, recv, shutdown and close, with non-blocking
  semantics and backpressure (send reports EAGAIN rather than buffering
  unboundedly).
* TCP servers: bind, listen, accept, getsockname/getpeername.
* UDP: bind, connect, sendto/recvfrom, with connected-peer filtering.
* IPv4 and IPv6 (AF_INET6): TCP and UDP over v6, including IPV6_V6ONLY.
* get/setsockopt: SO_ERROR, SO_KEEPALIVE and TCP_KEEPIDLE, TCP_NODELAY,
  SO_RCVBUF/SO_SNDBUF, SO_BROADCAST, IP_TTL, SO_REUSEPORT and IPV6_V6ONLY.
  Options are mirrored to a cache (the getsockopt source of truth) and projected
  onto the live socket; we only report options we can actually honor (e.g.
  SO_REUSEADDR reads back as 1 since libuv forces it on, and IPV6_V6ONLY returns
  EINVAL if changed after bind).
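
As a rough illustration of the set-then-read-back behaviour described above, the following sketch sets TCP_NODELAY on a fresh TCP socket and reads it back through getsockopt. It uses only standard POSIX calls and is not taken from the PR's tests; the helper name `nodelay_roundtrip` is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sets TCP_NODELAY on a fresh TCP socket and reads it back.
 * Returns the value reported by getsockopt, or -1 on any failure. */
int nodelay_roundtrip(void) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  int on = 1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) != 0) {
    close(fd);
    return -1;
  }

  int got = 0;
  socklen_t len = sizeof got;
  int rc = getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &got, &len);
  close(fd);
  if (rc != 0) return -1;

  assert(len == sizeof got); /* the option is expected to be int-sized */
  return got;
}
```

A nonzero return indicates that the option was accepted and is reported back, which is the property the option cache is meant to preserve.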

Binding is eager and synchronous, so a conflict surfaces as EADDRINUSE at bind()
and getsockname() reports the kernel-assigned ephemeral port immediately - there
is no deferred-bind or lazy-handle promotion. A bound socket is a role-neutral
handle, adopted as-is by listen() (server.listen) or connect() (net.Socket), and
released by close() only if it was never adopted. Bind-time options (ipv6Only,
reusePort) are passed to the handle at construction. The bind primitive is
selected once per capability:

* the public, synchronous net.BoundHandle (and dgram bindSync/connectSync) when
  the Node.js runtime provides them; and
* the private tcp_wrap/udp_wrap bindings as a fallback on Node.js versions that
  do not (bind6/send6 for IPv6).
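
The eager-bind semantics can be observed from C with plain POSIX calls. The sketch below binds to an ephemeral loopback port, reads the assigned port with getsockname() immediately, and then checks that a second bind to the same port fails. The helper names (`bind_loopback`, `local_port`, `bind_conflict_errno`) are hypothetical and the code is not part of the PR.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Binds a TCP socket to 127.0.0.1:port (host order; 0 means ephemeral).
 * Returns the fd, or -1 with errno describing the failure. */
static int bind_loopback(uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  struct sockaddr_in addr;
  memset(&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
    int saved = errno; /* close() may clobber errno */
    close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}

/* Returns the local port of a bound socket in host order, or 0 on failure. */
static uint16_t local_port(int fd) {
  struct sockaddr_in addr;
  socklen_t len = sizeof addr;
  if (getsockname(fd, (struct sockaddr *)&addr, &len) != 0) return 0;
  return ntohs(addr.sin_port);
}

/* Binds one socket to an ephemeral port, then binds a second socket to the
 * same port. Returns the errno of the second bind (0 if it succeeded),
 * or -1 if the first bind could not be set up. */
int bind_conflict_errno(void) {
  int first = bind_loopback(0);
  if (first < 0) return -1;

  uint16_t port = local_port(first);
  assert(port != 0); /* the port is known right after bind() */

  int second = bind_loopback(port);
  int err = second < 0 ? errno : 0;
  if (second >= 0) close(second);
  close(first);
  return err;
}
```

On a typical host stack the second bind reports EADDRINUSE, which is the behaviour the description says surfaces synchronously at bind().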

Details:

* new node backend in src/lib/libsockfs_node.js, pulled in only under
  -sNODERAWSOCKETS, implementing the sock_ops contract
* __syscall_setsockopt and __syscall_shutdown now live in JS, routing to the
  backend under NODERAWSOCKETS (else reporting the option/feature as
  unsupported), avoiding a libstubs variation
* tests under test/sockets exercise TCP echo, server accept/echo (including
  listen-without-bind autobind), client source-port bind plus synchronous
  EADDRINUSE, client semantics (EISCONN, half-close, EPIPE), backpressure,
  connection refused, UDP echo/connect, and IPv6 TCP/UDP over ::1 (including
  IPV6_V6ONLY before/after bind); all build and run natively against the host
  stack and run under node, including PROXY_TO_PTHREAD variants
Under -sNODERAWSOCKETS getaddrinfo() previously fabricated fake addresses via
DNS.lookup_name. This adds real resolution backed by node:dns, plus a general
asynchronous getaddrinfo so clients can resolve names without blocking.

getaddrinfo() now resolves numeric addresses and /etc/hosts entries (read fresh
through emscripten's FS) synchronously, and returns a full addrinfo linked list
(one node per resolved address) rather than a single entry. For a real hostname:

- without JSPI it returns EAI_AGAIN (no synchronous DNS); resolve it via the
  async API below and read the result
- under JSPI it suspends the wasm stack on the real node:dns lookup and returns
  the resolved addresses directly (gated on ASYNCIFY == 2; non-JSPI unchanged)
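
Since getaddrinfo() now returns one node per resolved address, callers should walk the ai_next chain rather than read only the head. A minimal sketch using standard POSIX calls follows; the helper name `count_addresses` is hypothetical.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Resolves host/service and returns the number of addrinfo nodes in the
 * result list, or -1 if getaddrinfo fails. */
int count_addresses(const char *host, const char *service) {
  assert(host != NULL);

  struct addrinfo hints;
  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 */
  hints.ai_socktype = SOCK_STREAM; /* avoid one entry per socket type */

  struct addrinfo *res = NULL;
  int rc = getaddrinfo(host, service, &hints, &res);
  /* Per this PR, a real hostname in a non-JSPI build yields EAI_AGAIN here. */
  if (rc != 0) return -1;

  int n = 0;
  for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) n++;

  freeaddrinfo(res); /* frees the whole chain */
  return n;
}
```

For example, `count_addresses("127.0.0.1", "80")` resolves a numeric address, which the description says is handled synchronously in all build modes.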

The async API (available in all builds, not just -sNODERAWSOCKETS):

- emscripten_dns_lookup_async(node, service, hints) takes the same inputs as
  getaddrinfo() and returns a pollable fd that becomes readable, and fires the
  emscripten_set_socket_message_callback callback, when resolution completes
- emscripten_dns_lookup_result(fd, struct addrinfo **res) reads the outcome:
  0 on success, writing the addrinfo list head to *res (freed with freeaddrinfo,
  as for getaddrinfo), or an EAI_* code on failure
- with -sNODERAWSOCKETS a hostname is resolved via node:dns; otherwise (and for
  numeric or /etc/hosts names) resolution is synchronous and the fd is simply
  readable on the next turn, so integration code need not branch on the backend
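
A two-phase integration along the lines above might look like the following sketch. The two prototypes are inferred from this description, so the return types, the header they live in, the negative-fd error convention, and the need to close the fd after reading are all assumptions. `dns_request`, `dns_start` and `dns_try_finish` are hypothetical names, and the non-Emscripten branch is only a stand-in so the sketch also builds natively.

```c
#include <assert.h>
#include <netdb.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#ifdef __EMSCRIPTEN__
/* Assumed prototypes, inferred from the PR description. */
int emscripten_dns_lookup_async(const char *node, const char *service,
                                const struct addrinfo *hints);
int emscripten_dns_lookup_result(int fd, struct addrinfo **res);
#endif

typedef struct {
  int fd;               /* pollable fd under Emscripten, -1 otherwise */
  int rc;               /* native stand-in: getaddrinfo return code */
  struct addrinfo *res; /* native stand-in: stashed result */
} dns_request;

/* Starts a lookup; never blocks under Emscripten. */
void dns_start(dns_request *req, const char *node, const char *service,
               const struct addrinfo *hints) {
  assert(req != NULL);
  req->fd = -1;
  req->rc = 0;
  req->res = NULL;
#ifdef __EMSCRIPTEN__
  req->fd = emscripten_dns_lookup_async(node, service, hints);
#else
  req->rc = getaddrinfo(node, service, hints, &req->res);
#endif
}

/* Returns 1 and fills *rc and *out once the lookup is done, 0 while it is
 * still pending. Intended to be called from a reactor on each turn. */
int dns_try_finish(dns_request *req, struct addrinfo **out, int *rc) {
  assert(req != NULL && out != NULL && rc != NULL);
#ifdef __EMSCRIPTEN__
  if (req->fd < 0) { /* assumption: a negative fd signals failure */
    *rc = EAI_FAIL;
    return 1;
  }
  struct pollfd pfd = { .fd = req->fd, .events = POLLIN, .revents = 0 };
  if (poll(&pfd, 1, 0) <= 0 || !(pfd.revents & POLLIN)) return 0;
  *rc = emscripten_dns_lookup_result(req->fd, out);
  close(req->fd); /* assumption: the caller closes the fd after reading */
  req->fd = -1;
  return 1;
#else
  *rc = req->rc;
  *out = req->res;
  req->res = NULL;
  return 1;
#endif
}
```

The zero-timeout poll matters in non-JSPI builds: a blocking wait would keep Node's event loop from running, so the lookup could never complete. An alternative is to register a handler through emscripten_set_socket_message_callback and call emscripten_dns_lookup_result from there.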

Memory is minted only when the caller takes the result, so closing the fd
without reading leaks nothing; the whole addrinfo chain is owned by the caller
and freed uniformly by freeaddrinfo.

Internally getaddrinfo is split into reusable stages - parse (getAddrInfo),
resolve (resolveAddrInfo, node:dns), and mint (writeAddrInfoList) - threading a
single descriptor through, which both the sync and async entry points share.

- freeaddrinfo now walks and frees the whole ai_next chain (previously only the
  head node + its ai_addr)
- adds EAI_AGAIN to the generated struct info

Tested with test_dns_async (static /etc/hosts, multi-address list, async
localhost), test_dns_callback (completion via the socket message callback),
test_dns_async_net (real hostname over the network), test_dns_async_default
(the async API without -sNODERAWSOCKETS), and test_dns_jspi (JSPI blocking
resolution), including -pthread/PROXY_TO_PTHREAD variants.