Implement TABLETS_ROUTING_V2 by dawmd · Pull Request #913 · scylladb/python-driver

dawmd · 2026-06-26T16:55:06Z

Placeholder for the time being. I'll edit the cover letter tomorrow.

The PR was tested against scylladb/scylladb#30291.

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

Add per-connection negotiation of the TABLETS_ROUTING_V2 extension, the successor to TABLETS_ROUTING_V1. When the server advertises it in the SUPPORTED response, the driver echoes it back during STARTUP to opt in; a driver that negotiates v2 does not negotiate v1. While the feature is experimental the wire name carries the `_EXPERIMENTAL` suffix (TABLETS_ROUTING_V2_EXPERIMENTAL), and the server only advertises it when started with the `strongly-consistent-tables` experimental feature enabled. Also add the optional trailing tablet_version_block byte to the EXECUTE message body, written only when set, so later commits can carry the cached tablet version to the server.

Store the server-provided 64-bit tablet_version on each cached Tablet (None until learned) and add helpers to encode it into the one-byte tablet_version_block exchanged on the wire: * choose_tablet_version_block() packs a randomly chosen block index in the high nibble and that block's value in the low nibble, matching the server's locator::compare_tablet_version_block layout. Blocks are indexed from the least significant bits, so block i covers bits [i*4, i*4 + 4) of the version. A random index avoids any shared mutable counter on the hot path while still probing every nibble often enough to detect a server-side version change quickly. * random_tablet_version_block() returns a random byte for cold start, when no version is cached yet.

Add KeyspaceMetadata.strongly_consistent, derived from the per-keyspace `consistency` option in system_schema.scylla_keyspaces. A keyspace is strongly consistent when its consistency is `local` or `global` (null/`eventual` means eventually consistent). The lookup is cached per schema refresh and degrades to "eventually consistent" on non-Scylla clusters or older Scylla versions that lack the table or column. This flag lets the leader-aware routing logic tell which tablet tables actually have a Raft leader.

With TABLETS_ROUTING_V2 the server returns, on a tablet_version mismatch, the tablet's replica set with the Raft leader first plus the new tablet_version. Use this to route strongly-consistent reads and writes straight to the leader: * Every EXECUTE on a V2 connection carries a tablet_version_block computed from the cached version (or a random byte on cold start and for non-single-partition requests). The block is attached per connection, keyed off the connection that actually serves the request, so a rolling upgrade that mixes v1/v2 connections never desyncs the frame. * On the response, the routing payload is parsed according to what the serving connection negotiated; the v2 tuple additionally carries the tablet_version, which is stored back on the tablet. * TokenAwarePolicy yields the leader (the first replica) first, but only for strongly-consistent keyspaces -- a tablet_version is assigned to eventually-consistent tablet tables too and must not be mistaken for a leader hint. To keep this hot path cheap, the ring token is memoized once per statement (Statement.routing_token) and reused by the load balancing policy, the shard selection in the pool, and the tablet_version_block computation, instead of re-hashing the routing key three times.

Cover the end-to-end behaviour against a live ScyllaDB started with the `strongly-consistent-tables` experimental feature: v2 negotiation, payload-driven cache population, the tablet_version_block matching rules (no payload on a matching block, exactly one matching value per index, precedence over v1), the protocol error on a missing block, and leader-aware routing targeting the Raft leader for a strongly-consistent keyspace.

coderabbitai · 2026-06-26T16:55:22Z

📝 Walkthrough

Walkthrough

The driver adds tablets routing v2 support across feature negotiation, tablet version block generation, execute message framing, and shard selection. Response handling now parses tablets-routing-v1/v2 payloads into cluster tablet metadata, and schema parsing marks keyspaces as strongly consistent from Scylla system tables. Token-aware routing now uses tablet replicas and prefers the tablet leader for strongly consistent keyspaces.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ResponseFuture
  participant Statement
  participant HostConnection
  participant ExecuteMessage
  participant ScyllaServer
  participant ClusterMetadata

  Client->>ResponseFuture: execute(query)
  ResponseFuture->>Statement: routing_token()
  ResponseFuture->>ResponseFuture: _compute_tablet_version_block(query)
  ResponseFuture->>HostConnection: borrow_connection(...)
  HostConnection-->>ResponseFuture: connection
  ResponseFuture->>ExecuteMessage: set tablet_version_block
  ResponseFuture->>ScyllaServer: send EXECUTE
  ScyllaServer-->>ResponseFuture: response + tablets-routing payload
  ResponseFuture->>ClusterMetadata: register tablets

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description is mostly a placeholder and does not summarize the changes or complete the required checklist.	Replace the placeholder with a real PR summary, explain why the changes are needed, and complete the checklist items relevant to this patch.
Docstring Coverage	⚠️ Warning	Docstring coverage is 54.69% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and accurately summarizes the main change: adding TABLETS_ROUTING_V2 support.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

Copilot

Pull request overview

Implements Scylla TABLETS_ROUTING_V2 negotiation and request/response handling in the driver, including encoding the per-request tablet_version_block, parsing the new tablets-routing-v2 custom payload (with tablet_version), and adding leader-aware routing behavior for strongly-consistent keyspaces.

Changes:

Add TABLETS_ROUTING_V2 protocol feature negotiation and ensure V2 subsumes V1 in STARTUP options.
Attach a per-request tablet_version_block to EXECUTE messages on V2-negotiated connections; parse and cache V2 tablet routing payloads including tablet_version.
Introduce strongly-consistent keyspace detection (Scylla-only) and leader-first routing in TokenAwarePolicy; add unit and integration coverage.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/unit/test_tablets.py	Adds unit tests for `tablet_version_block` encoding and for storing `tablet_version` on `Tablet`.
tests/unit/test_protocol_features.py	Adds negotiation tests ensuring V2 is preferred over V1 in STARTUP options.
tests/unit/test_policies.py	Adds unit tests for leader-aware routing behavior (leader-first, fallback, and non-SC cases).
tests/integration/standard/test_tablets_routing_v2.py	Adds opt-in end-to-end tests validating negotiation, payload decoding, and wire behavior against a V2-capable Scylla cluster.
cassandra/tablets.py	Adds `tablet_version_block` helpers and extends `Tablet` to store `tablet_version`.
cassandra/query.py	Adds memoization for routing-key token computation via `Statement.routing_token()`.
cassandra/protocol.py	Extends `ExecuteMessage` to optionally append `tablet_version_block` byte when present.
cassandra/protocol_features.py	Adds TABLETS_ROUTING_V2 constant and negotiation logic; makes V2 mutually exclusive with V1 in STARTUP.
cassandra/pool.py	Adds per-pool V2 capability detection and reuses memoized routing token to avoid repeated hashing.
cassandra/policies.py	Adds leader-first routing for strongly-consistent keyspaces when tablet replicas are available.
cassandra/metadata.py	Adds `KeyspaceMetadata.strongly_consistent` and populates it from Scylla `system_schema.scylla_keyspaces`.
cassandra/cluster.py	Computes/attaches `tablet_version_block` per connection and parses V2 routing payloads based on the serving connection’s negotiated features.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        token = self._routing_token
+        if token is None:
+            token = token_class.from_key(routing_key)
+            self._routing_token = token
+        return token


+    @property
+    def tablets_routing_v2(self):
+        return any(c.features.tablets_routing_v2 for c in self._connections.values())


+            if isinstance(message, ExecuteMessage):
+                # Attach the tablet_version_block only when the *specific connection*
+                # we are about to send on negotiated TABLETS_ROUTING_V2. The server
+                # reads the trailing tablet_version_block byte only on connections
+                # that negotiated V2 (gated on the cluster-wide feature), so keying
+                # off the borrowed connection -- which is already in hand here, since
+                # borrow_connection() ran above -- is both necessary and sufficient:
+                #   * a V2 connection always gets the block, even if this pool was
+                #     created (and any cached flag latched) before the cluster
+                #     feature was enabled, e.g. mid rolling-upgrade;
+                #   * a non-V2 connection never gets it, even if a sibling shard
+                #     connection in the same pool already negotiated V2 -- which can
+                #     happen transiently while connections opened before and after
+                #     the feature flip coexist. Attaching the block to a non-V2
+                #     connection would leave an unread trailing byte and desync the
+                #     frame, so a pool-level flag cannot get this right regardless of
+                #     how it is latched.
+                if connection.features.tablets_routing_v2:
+                    message.tablet_version_block = self.session._compute_tablet_version_block(self.query)
+                else:
+                    message.tablet_version_block = None
+


+        """
+        Compute the tablet_version_block byte for a BoundStatement.
+        Returns an int in [0, 255] or None if V2 should not be used.
+        """


+                if connection.features.tablets_routing_v2:
+                    message.tablet_version_block = self.session._compute_tablet_version_block(self.query)
+                else:
+                    message.tablet_version_block = None


coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (2)

tests/unit/test_policies.py (1)
1043-1082: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Optional: test name/docstring don't match what's actually exercised.

This test asserts the "no tablet_version (V1)" path, but the production code (make_query_plan) never reads tablet.tablet_version; leader-first is gated solely on ks_meta.strongly_consistent, which is set to False here. So the test passes because of the strongly_consistent=False flag, not because tablet_version=None. Consider renaming/reframing it (or setting strongly_consistent=True with tablet_version=None) so the test actually guards the behavior its name implies; otherwise it overlaps with test_no_leader_routing_for_eventually_consistent_keyspace.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_policies.py` around lines 1043 - 1082, The test name and
docstring in test_no_leader_routing_without_tablet_version do not match the
behavior actually being exercised, because make_query_plan only gates
leader-first routing on keyspace metadata strong consistency, not
Tablet.tablet_version. Update the test so it truly covers the intended case by
either renaming/reframing it to reflect strongly_consistent=False, or by setting
cluster.metadata.keyspaces['ks'].strongly_consistent to True while keeping
tablet_version=None to verify the V1 tablet path; keep the assertions aligned
with the TokenAwarePolicy routing behavior.
cassandra/policies.py (1)
510-514: 🚀 Performance & Scalability | 🔵 Trivial

Optional: Cache the child query plan to prevent double invocation and potential inconsistency.

child.make_query_plan(keyspace, query) is invoked at line 512 to filter replicas and is called again at line 551 to build the final plan. If the child policy is stateful (e.g., RoundRobinPolicy or DCAwareRoundRobinPolicy), the second invocation may advance internal pointers and return a different host sequence, causing the filtering logic to diverge from the execution order and introducing unnecessary overhead.
♻️ Suggested approach

Materialize the child plan once and reuse it:
-        if tablet is not None:
-            replicas_mapped = set(map(lambda r: r[0], tablet.replicas))
-            child_plan = child.make_query_plan(keyspace, query)
-
-            replicas = [host for host in child_plan if host.host_id in replicas_mapped]
+        child_plan = list(child.make_query_plan(keyspace, query))
+        if tablet is not None:
+            replicas_mapped = set(map(lambda r: r[0], tablet.replicas))
+            replicas = [host for host in child_plan if host.host_id in replicas_mapped]
Update the subsequent loop (line 551) to iterate over the cached child_plan instead of re-invoking child.make_query_plan(...).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cassandra/policies.py` around lines 510 - 514, Cache the result of
child.make_query_plan(keyspace, query) in the policy logic so it is only
evaluated once, then reuse that same child_plan for both the replica filtering
in the tablet block and the final plan construction loop. Update the query
planning flow in the policy method that handles tablet replicas to iterate over
the cached child_plan instead of calling child.make_query_plan(...) a second
time, which avoids stateful policy divergence and redundant work.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cassandra/cluster.py`:
- Around line 5055-5075: The ExecuteMessage tablet_version_block is only being
set in the pooled connection path, so control-connection fallback can miss the
mandatory V2 trailing byte. Update _query_control_connection() in
cassandra/cluster.py to use the same tablets_routing_v2 check and
_compute_tablet_version_block(self.query) logic as the ExecuteMessage handling
in the main send path, placing it after self._connection = connection and before
connection.send_msg(...) so V2 control connections always include the block and
non-V2 ones do not.
- Around line 5212-5231: The tablet-routing cache is using only
self.query.keyspace when adding tablets, which can store entries under a None
key and miss cache hits for session-level keyspaces. Update the tablet handling
in ResponseFuture to use the effective keyspace consistently, matching
_compute_tablet_version_block() by falling back to self.keyspace when
self.query.keyspace is unset, and then pass that resolved keyspace into
metadata._tablets.add_tablet().

In `@cassandra/metadata.py`:
- Around line 2658-2660: The fallback in metadata schema refresh is too broad
because the except block in the system_schema.scylla_keyspaces read path catches
every Exception, which masks unexpected failures. Narrow the handler around the
code in metadata.py that logs “Could not read system_schema.scylla_keyspaces” so
it only catches the specific expected driver/server schema-unavailable error(s),
and let other refresh errors propagate. Keep the existing debug log and exc_info
behavior for the expected case, but do not use a blanket Exception catch in this
branch.

In `@cassandra/pool.py`:
- Around line 451-453: The tablets_routing_v2 property in Pool should not
iterate the live _connections view directly, because concurrent
replacement/shutdown can mutate it and closed connections can incorrectly keep
V2 enabled. Snapshot the current connections first, then compute the flag only
from still-open/live connections before checking each connection’s
features.tablets_routing_v2.

In `@cassandra/query.py`:
- Line 278: The cached routing token in Statement is being reused without
considering token_class, which can cause the wrong ring/token to be used across
sessions or clusters. Update the token cache logic in Statement-related routing
code so _routing_token is keyed by token_class as well, and make the
lookup/reuse path in the affected routing methods distinguish tokens by the
partitioner class used. Ensure the changes cover the Statement fields and the
routing/token selection flow that currently reads and writes _routing_token.

---

Nitpick comments:
In `@cassandra/policies.py`:
- Around line 510-514: Cache the result of child.make_query_plan(keyspace,
query) in the policy logic so it is only evaluated once, then reuse that same
child_plan for both the replica filtering in the tablet block and the final plan
construction loop. Update the query planning flow in the policy method that
handles tablet replicas to iterate over the cached child_plan instead of calling
child.make_query_plan(...) a second time, which avoids stateful policy
divergence and redundant work.

In `@tests/unit/test_policies.py`:
- Around line 1043-1082: The test name and docstring in
test_no_leader_routing_without_tablet_version do not match the behavior actually
being exercised, because make_query_plan only gates leader-first routing on
keyspace metadata strong consistency, not Tablet.tablet_version. Update the test
so it truly covers the intended case by either renaming/reframing it to reflect
strongly_consistent=False, or by setting
cluster.metadata.keyspaces['ks'].strongly_consistent to True while keeping
tablet_version=None to verify the V1 tablet path; keep the assertions aligned
with the TokenAwarePolicy routing behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 587d3541-27be-4132-8dda-c90e66cff85c

📥 Commits

Reviewing files that changed from the base of the PR and between c1bfd54 and 7cbb8a9.

📒 Files selected for processing (12)

cassandra/cluster.py
cassandra/metadata.py
cassandra/policies.py
cassandra/pool.py
cassandra/protocol.py
cassandra/protocol_features.py
cassandra/query.py
cassandra/tablets.py
tests/integration/standard/test_tablets_routing_v2.py
tests/unit/test_policies.py
tests/unit/test_protocol_features.py
tests/unit/test_tablets.py

coderabbitai · 2026-06-26T17:04:03Z

+            if isinstance(message, ExecuteMessage):
+                # Attach the tablet_version_block only when the *specific connection*
+                # we are about to send on negotiated TABLETS_ROUTING_V2. The server
+                # reads the trailing tablet_version_block byte only on connections
+                # that negotiated V2 (gated on the cluster-wide feature), so keying
+                # off the borrowed connection -- which is already in hand here, since
+                # borrow_connection() ran above -- is both necessary and sufficient:
+                #   * a V2 connection always gets the block, even if this pool was
+                #     created (and any cached flag latched) before the cluster
+                #     feature was enabled, e.g. mid rolling-upgrade;
+                #   * a non-V2 connection never gets it, even if a sibling shard
+                #     connection in the same pool already negotiated V2 -- which can
+                #     happen transiently while connections opened before and after
+                #     the feature flip coexist. Attaching the block to a non-V2
+                #     connection would leave an unread trailing byte and desync the
+                #     frame, so a pool-level flag cannot get this right regardless of
+                #     how it is latched.
+                if connection.features.tablets_routing_v2:
+                    message.tablet_version_block = self.session._compute_tablet_version_block(self.query)
+                else:
+                    message.tablet_version_block = None


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Attach the V2 block on control-connection fallback too.

Line 5072 handles pooled connections, but _query_control_connection() can also send an ExecuteMessage; on a V2 control connection that path omits the mandatory trailing byte and can trigger a protocol error.

Proposed direction

+ def _set_tablet_version_block_for_connection(self, message, connection): + if not isinstance(message, ExecuteMessage): + return + if connection.features.tablets_routing_v2: + message.tablet_version_block = self.session._compute_tablet_version_block(self.query) + else: + message.tablet_version_block = None + def _query(self, host, message=None, cb=None): ... - if isinstance(message, ExecuteMessage): - ... - if connection.features.tablets_routing_v2: - message.tablet_version_block = self.session._compute_tablet_version_block(self.query) - else: - message.tablet_version_block = None + self._set_tablet_version_block_for_connection(message, connection)

Then call the same helper in _query_control_connection() after self._connection = connection and before connection.send_msg(...).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cassandra/cluster.py` around lines 5055 - 5075, The ExecuteMessage tablet_version_block is only being set in the pooled connection path, so control-connection fallback can miss the mandatory V2 trailing byte. Update _query_control_connection() in cassandra/cluster.py to use the same tablets_routing_v2 check and _compute_tablet_version_block(self.query) logic as the ExecuteMessage handling in the main send path, placing it after self._connection = connection and before connection.send_msg(...) so V2 control connections always include the block and non-V2 ones do not.

coderabbitai · 2026-06-26T17:04:03Z

+                    keyspace = self.query.keyspace
+                    table = self.query.table
+                    if tablet:
+                        self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)
+                elif connection.features.tablets_routing_v1 and 'tablets-routing-v1' in self._custom_payload:
+                    protocol = self.session.cluster.protocol_version
+                    info = self._custom_payload.get('tablets-routing-v1')
+                    ctype = ResponseFuture._TABLET_ROUTING_CTYPE
+                    if ctype is None:
+                        ctype = types.lookup_casstype('TupleType(LongType, LongType, ListType(TupleType(UUIDType, Int32Type)))')
+                        ResponseFuture._TABLET_ROUTING_CTYPE = ctype
+                    tablet_routing_info = ctype.from_binary(info, protocol)
+                    first_token = tablet_routing_info[0]
+                    last_token = tablet_routing_info[1]
+                    tablet_replicas = tablet_routing_info[2]
+                    tablet = Tablet.from_row(first_token, last_token, tablet_replicas)
+                    keyspace = self.query.keyspace
+                    table = self.query.table
+                    if tablet:
+                        self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Cache routing payloads under the effective keyspace.

Line 5212 and Line 5228 use only self.query.keyspace, but _compute_tablet_version_block() looks up tablets with query.keyspace or self.keyspace. Prepared statements executed in a session keyspace can therefore cache under (None, table) and miss the cache on every later request.

Proposed fix

- keyspace = self.query.keyspace + keyspace = self.query.keyspace or self.session.keyspace table = self.query.table - if tablet: + if tablet and keyspace and table: self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet) ... - keyspace = self.query.keyspace + keyspace = self.query.keyspace or self.session.keyspace table = self.query.table - if tablet: + if tablet and keyspace and table: self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

keyspace = self.query.keyspace

table = self.query.table

if tablet:

self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)

elif connection.features.tablets_routing_v1 and 'tablets-routing-v1' in self._custom_payload:

protocol = self.session.cluster.protocol_version

info = self._custom_payload.get('tablets-routing-v1')

ctype = ResponseFuture._TABLET_ROUTING_CTYPE

if ctype is None:

ctype = types.lookup_casstype('TupleType(LongType, LongType, ListType(TupleType(UUIDType, Int32Type)))')

ResponseFuture._TABLET_ROUTING_CTYPE = ctype

tablet_routing_info = ctype.from_binary(info, protocol)

first_token = tablet_routing_info[0]

last_token = tablet_routing_info[1]

tablet_replicas = tablet_routing_info[2]

tablet = Tablet.from_row(first_token, last_token, tablet_replicas)

keyspace = self.query.keyspace

table = self.query.table

if tablet:

self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)

keyspace = self.query.keyspace or self.session.keyspace

table = self.query.table

if tablet and keyspace and table:

self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)

elif connection.features.tablets_routing_v1 and 'tablets-routing-v1' in self._custom_payload:

protocol = self.session.cluster.protocol_version

info = self._custom_payload.get('tablets-routing-v1')

ctype = ResponseFuture._TABLET_ROUTING_CTYPE

if ctype is None:

ctype = types.lookup_casstype('TupleType(LongType, LongType, ListType(TupleType(UUIDType, Int32Type)))')

ResponseFuture._TABLET_ROUTING_CTYPE = ctype

tablet_routing_info = ctype.from_binary(info, protocol)

first_token = tablet_routing_info[0]

last_token = tablet_routing_info[1]

tablet_replicas = tablet_routing_info[2]

tablet = Tablet.from_row(first_token, last_token, tablet_replicas)

keyspace = self.query.keyspace or self.session.keyspace

table = self.query.table

if tablet and keyspace and table:

self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cassandra/cluster.py` around lines 5212 - 5231, The tablet-routing cache is using only self.query.keyspace when adding tablets, which can store entries under a None key and miss cache hits for session-level keyspaces. Update the tablet handling in ResponseFuture to use the effective keyspace consistently, matching _compute_tablet_version_block() by falling back to self.keyspace when self.query.keyspace is unset, and then pass that resolved keyspace into metadata._tablets.add_tablet().

coderabbitai · 2026-06-26T17:04:03Z

+        except Exception:
+            log.debug("Could not read system_schema.scylla_keyspaces; treating all "
+                      "keyspaces as eventually consistent", exc_info=True)


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Narrow the Scylla-keyspaces fallback exception.

Line 2658 should tolerate the expected “table/column unavailable” schema error, but catching every Exception also hides timeouts, connection failures, and parser bugs; Ruff flags this as BLE001. Please catch the specific expected driver/server exception(s) and let unexpected refresh failures surface.

🧰 Tools

🪛 Ruff (0.15.18)

[warning] 2658-2658: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cassandra/metadata.py` around lines 2658 - 2660, The fallback in metadata schema refresh is too broad because the except block in the system_schema.scylla_keyspaces read path catches every Exception, which masks unexpected failures. Narrow the handler around the code in metadata.py that logs “Could not read system_schema.scylla_keyspaces” so it only catches the specific expected driver/server schema-unavailable error(s), and let other refresh errors propagate. Keep the existing debug log and exc_info behavior for the expected case, but do not use a blanket Exception catch in this branch.

Source: Linters/SAST tools

coderabbitai · 2026-06-26T17:04:03Z

+    @property
+    def tablets_routing_v2(self):
+        return any(c.features.tablets_routing_v2 for c in self._connections.values())


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Snapshot live connections before scanning V2 flags.

Line 453 iterates the mutable _connections dict view while replacement/shutdown paths can mutate it, which can raise RuntimeError on the request path; it also lets closed V2 connections keep the pool in V2 mode.

Proposed fix

`@property` def tablets_routing_v2(self): - return any(c.features.tablets_routing_v2 for c in self._connections.values()) + with self._lock: + connections = tuple(self._connections.values()) + return any( + not c.is_closed and not c.is_defunct and c.features.tablets_routing_v2 + for c in connections + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@property

def tablets_routing_v2(self):

return any(c.features.tablets_routing_v2 for c in self._connections.values())

`@property`

def tablets_routing_v2(self):

with self._lock:

connections = tuple(self._connections.values())

return any(

not c.is_closed and not c.is_defunct and c.features.tablets_routing_v2

for c in connections

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cassandra/pool.py` around lines 451 - 453, The tablets_routing_v2 property in Pool should not iterate the live _connections view directly, because concurrent replacement/shutdown can mutate it and closed connections can incorrectly keep V2 enabled. Snapshot the current connections first, then compute the flag only from still-open/live connections before checking each connection’s features.tablets_routing_v2.

coderabbitai · 2026-06-26T17:04:04Z


    _serial_consistency_level = None
    _routing_key = None
+    _routing_token = None


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Key the cached token by token_class too.

Line 353 reuses _routing_token regardless of which partitioner class was passed. If the same Statement is executed against another session/cluster with a different token_class, tablet lookup and shard selection can use a token from the wrong ring.

Proposed fix

_serial_consistency_level = None _routing_key = None _routing_token = None + _routing_token_class = None @@ if consistency_level is not None: self.consistency_level = consistency_level self._routing_key = routing_key + self._routing_token = None + self._routing_token_class = None @@ # The memoized ring token is derived from the routing key; invalidate it. self._routing_token = None + self._routing_token_class = None def _del_routing_key(self): self._routing_key = None self._routing_token = None + self._routing_token_class = None @@ token = self._routing_token - if token is None: + if token is None or self._routing_token_class is not token_class: token = token_class.from_key(routing_key) self._routing_token = token + self._routing_token_class = token_class return token

Also applies to: 318-323, 338-357

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cassandra/query.py` at line 278, The cached routing token in Statement is being reused without considering token_class, which can cause the wrong ring/token to be used across sessions or clusters. Update the token cache logic in Statement-related routing code so _routing_token is keyed by token_class as well, and make the lookup/reuse path in the affected routing methods distinguish tokens by the partitioner class used. Ensure the changes cover the Statement fields and the routing/token selection flow that currently reads and writes _routing_token.

dawmd added 5 commits June 26, 2026 16:13

dawmd requested review from Copilot and piodul June 26, 2026 16:55

dawmd self-assigned this Jun 26, 2026

Copilot started reviewing on behalf of dawmd June 26, 2026 16:55 View session

Copilot AI reviewed Jun 26, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement TABLETS_ROUTING_V2#913

Implement TABLETS_ROUTING_V2#913
dawmd wants to merge 5 commits into
scylladb:masterfrom
dawmd:leader-awareness

dawmd commented Jun 26, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading

Walkthrough

Sequence Diagram(s)

❌ Failed checks (2 warnings)

Uh oh!

Copilot AI left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

dawmd commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pre-review checklist

Uh oh!

coderabbitai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Sequence Diagram(s)

❌ Failed checks (2 warnings)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dawmd commented Jun 26, 2026 •

edited

Loading

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading