
Reduce cross-AZ traffic for core NATS queue groups via tag-aware routing #8072

@bobo

Problem

When a queue-group publish reaches a server with no local subscriber, NATS
picks among the remote-route candidates uniformly at random
(client.go:5321-5380). In a multi-AZ cluster with subscribers spread evenly
across two AZs, roughly half of the publishes that need forwarding therefore
cross an AZ boundary, even when a same-AZ peer is available. AWS bills cross-AZ
data transfer, so for a high-throughput firehose this is a real cost.

Proposal

Add an opt-in boolean, cluster.prefer_matching_tags. When set, queue routing
prefers remote peers whose server_tags overlap with (share at least one tag
with) the local server's tags. It defaults to off, so existing behavior is
unchanged.
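The proposal does not spell out what "overlap" means when a server carries several tags. One plausible reading is a non-empty intersection; the sketch below assumes that reading, and `tagsOverlap` is a hypothetical helper, not a nats-server function.

```go
package main

import "fmt"

// tagsOverlap reports whether two servers share at least one server tag.
// Hypothetical helper: it assumes "overlap" means a non-empty intersection
// and compares tags as exact strings; the real patch may differ.
func tagsOverlap(local, remote []string) bool {
	seen := make(map[string]struct{}, len(local))
	for _, t := range local {
		seen[t] = struct{}{}
	}
	for _, t := range remote {
		if _, ok := seen[t]; ok {
			return true
		}
	}
	return false
}

func main() {
	local := []string{"az1", "region:us-east"} // "region:us-east" is a made-up tag
	fmt.Println(tagsOverlap(local, []string{"az1"}))            // true
	fmt.Println(tagsOverlap(local, []string{"az2"}))            // false
	fmt.Println(tagsOverlap(local, []string{"region:us-east"})) // true
}
```

Under this reading any shared tag counts, so a broad tag like the hypothetical region:us-east above would make every same-region peer "matching" and cancel out the AZ preference. Servers would need to carry only tags that should influence routing.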

The delivery priority becomes: local (directly connected) subscriber >
tag-matching peer > non-matching peer. server_tags already exists as a
top-level option; this change only teaches queue routing to consult it.

Empirical impact

Setup: a kind cluster with 4 NATS pods (2 tagged az1, 2 tagged az2) and one
queue subscriber per zone. 10 trials × 100 messages, sent from a publisher
connected to a server with no local subscriber.

Build                                  Cross-tag deliveries (2,000 publishes)
nats:latest (today)                    982 / 2,000 (~49%)
Patched, prefer_matching_tags: true    0 / 2,000 (0%)

Scope

Core NATS routes only — no JetStream, leafnodes, or gateways. Reuses
opts.Tags; one new boolean. Opt-in.

This is conceptually similar to Kafka's KIP-392
but applied to core NATS queue-group routing rather than consumer reads. The
broader JetStream / consumer-side angle is being discussed in #8007; this
issue is the smaller core-NATS slice.
