Problem
When a queue-group publish reaches a server with no local subscriber, NATS
picks among remote-route candidates uniformly at random
(client.go:5321-5380). In a multi-AZ cluster, a large share of the publishes
that need forwarding cross AZ boundaries even when a same-AZ peer is
available: about half with two AZs and one candidate in each. For a
high-throughput firehose, this adds up to real money in AWS cross-AZ
data-transfer charges.
Proposal
Add an opt-in cluster.prefer_matching_tags boolean. When set, queue routing
prefers remote peers whose server_tags overlap with the local server's
tags. It defaults to off, leaving current behavior unchanged.
Priority becomes: directly connected sub > tag-matching peer > non-matching
peer. server_tags already exists as a top-level option; this just teaches
queue routing to consult it.
Empirical impact
Kind cluster, 4 NATS pods (2 tagged az1, 2 tagged az2), one queue sub per
zone. 10 trials × 100 messages from a publisher with no local sub.
| | Cross-tag deliveries (2,000 publishes) |
|---|---|
| nats:latest (today) | 982 / 2,000 (~49%) |
| Patched, prefer_matching_tags: true | 0 / 2,000 (0%) |
Scope
Core NATS routes only — no JetStream, leafnodes, or gateways. Reuses
opts.Tags; one new boolean. Opt-in.
This is conceptually similar to Kafka's KIP-392
but applied to core NATS queue-group routing rather than consumer reads. The
broader JetStream / consumer-side angle is being discussed in #8007; this
issue is the smaller core-NATS slice.