Implement TABLETS_ROUTING_V2#913
Conversation
Add per-connection negotiation of the TABLETS_ROUTING_V2 extension, the successor to TABLETS_ROUTING_V1. When the server advertises it in the SUPPORTED response, the driver echoes it back during STARTUP to opt in; a driver that negotiates v2 does not negotiate v1. While the feature is experimental the wire name carries the `_EXPERIMENTAL` suffix (TABLETS_ROUTING_V2_EXPERIMENTAL), and the server only advertises it when started with the `strongly-consistent-tables` experimental feature enabled. Also add the optional trailing tablet_version_block byte to the EXECUTE message body, written only when set, so later commits can carry the cached tablet version to the server.
Store the server-provided 64-bit tablet_version on each cached Tablet (None until learned) and add helpers to encode it into the one-byte tablet_version_block exchanged on the wire: * choose_tablet_version_block() packs a randomly chosen block index in the high nibble and that block's value in the low nibble, matching the server's locator::compare_tablet_version_block layout. Blocks are indexed from the least significant bits, so block i covers bits [i*4, i*4 + 4) of the version. A random index avoids any shared mutable counter on the hot path while still probing every nibble often enough to detect a server-side version change quickly. * random_tablet_version_block() returns a random byte for cold start, when no version is cached yet.
Add KeyspaceMetadata.strongly_consistent, derived from the per-keyspace `consistency` option in system_schema.scylla_keyspaces. A keyspace is strongly consistent when its consistency is `local` or `global` (null/`eventual` means eventually consistent). The lookup is cached per schema refresh and degrades to "eventually consistent" on non-Scylla clusters or older Scylla versions that lack the table or column. This flag lets the leader-aware routing logic tell which tablet tables actually have a Raft leader.
With TABLETS_ROUTING_V2 the server returns, on a tablet_version mismatch, the tablet's replica set with the Raft leader first plus the new tablet_version. Use this to route strongly-consistent reads and writes straight to the leader: * Every EXECUTE on a V2 connection carries a tablet_version_block computed from the cached version (or a random byte on cold start and for non-single-partition requests). The block is attached per connection, keyed off the connection that actually serves the request, so a rolling upgrade that mixes v1/v2 connections never desyncs the frame. * On the response, the routing payload is parsed according to what the serving connection negotiated; the v2 tuple additionally carries the tablet_version, which is stored back on the tablet. * TokenAwarePolicy yields the leader (the first replica) first, but only for strongly-consistent keyspaces -- a tablet_version is assigned to eventually-consistent tablet tables too and must not be mistaken for a leader hint. To keep this hot path cheap, the ring token is memoized once per statement (Statement.routing_token) and reused by the load balancing policy, the shard selection in the pool, and the tablet_version_block computation, instead of re-hashing the routing key three times.
Cover the end-to-end behaviour against a live ScyllaDB started with the `strongly-consistent-tables` experimental feature: v2 negotiation, payload-driven cache population, the tablet_version_block matching rules (no payload on a matching block, exactly one matching value per index, precedence over v1), the protocol error on a missing block, and leader-aware routing targeting the Raft leader for a strongly-consistent keyspace.
📝 WalkthroughWalkthroughThe driver adds tablets routing v2 support across feature negotiation, tablet version block generation, execute message framing, and shard selection. Response handling now parses tablets-routing-v1/v2 payloads into cluster tablet metadata, and schema parsing marks keyspaces as strongly consistent from Scylla system tables. Token-aware routing now uses tablet replicas and prefers the tablet leader for strongly consistent keyspaces. Sequence Diagram(s)sequenceDiagram
participant Client
participant ResponseFuture
participant Statement
participant HostConnection
participant ExecuteMessage
participant ScyllaServer
participant ClusterMetadata
Client->>ResponseFuture: execute(query)
ResponseFuture->>Statement: routing_token()
ResponseFuture->>ResponseFuture: _compute_tablet_version_block(query)
ResponseFuture->>HostConnection: borrow_connection(...)
HostConnection-->>ResponseFuture: connection
ResponseFuture->>ExecuteMessage: set tablet_version_block
ResponseFuture->>ScyllaServer: send EXECUTE
ScyllaServer-->>ResponseFuture: response + tablets-routing payload
ResponseFuture->>ClusterMetadata: register tablets
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Implements Scylla TABLETS_ROUTING_V2 negotiation and request/response handling in the driver, including encoding the per-request tablet_version_block, parsing the new tablets-routing-v2 custom payload (with tablet_version), and adding leader-aware routing behavior for strongly-consistent keyspaces.
Changes:
- Add TABLETS_ROUTING_V2 protocol feature negotiation and ensure V2 subsumes V1 in STARTUP options.
- Attach a per-request
tablet_version_blockto EXECUTE messages on V2-negotiated connections; parse and cache V2 tablet routing payloads includingtablet_version. - Introduce strongly-consistent keyspace detection (Scylla-only) and leader-first routing in
TokenAwarePolicy; add unit and integration coverage.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_tablets.py | Adds unit tests for tablet_version_block encoding and for storing tablet_version on Tablet. |
| tests/unit/test_protocol_features.py | Adds negotiation tests ensuring V2 is preferred over V1 in STARTUP options. |
| tests/unit/test_policies.py | Adds unit tests for leader-aware routing behavior (leader-first, fallback, and non-SC cases). |
| tests/integration/standard/test_tablets_routing_v2.py | Adds opt-in end-to-end tests validating negotiation, payload decoding, and wire behavior against a V2-capable Scylla cluster. |
| cassandra/tablets.py | Adds tablet_version_block helpers and extends Tablet to store tablet_version. |
| cassandra/query.py | Adds memoization for routing-key token computation via Statement.routing_token(). |
| cassandra/protocol.py | Extends ExecuteMessage to optionally append tablet_version_block byte when present. |
| cassandra/protocol_features.py | Adds TABLETS_ROUTING_V2 constant and negotiation logic; makes V2 mutually exclusive with V1 in STARTUP. |
| cassandra/pool.py | Adds per-pool V2 capability detection and reuses memoized routing token to avoid repeated hashing. |
| cassandra/policies.py | Adds leader-first routing for strongly-consistent keyspaces when tablet replicas are available. |
| cassandra/metadata.py | Adds KeyspaceMetadata.strongly_consistent and populates it from Scylla system_schema.scylla_keyspaces. |
| cassandra/cluster.py | Computes/attaches tablet_version_block per connection and parses V2 routing payloads based on the serving connection’s negotiated features. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| token = self._routing_token | ||
| if token is None: | ||
| token = token_class.from_key(routing_key) | ||
| self._routing_token = token | ||
| return token |
| @property | ||
| def tablets_routing_v2(self): | ||
| return any(c.features.tablets_routing_v2 for c in self._connections.values()) |
| if isinstance(message, ExecuteMessage): | ||
| # Attach the tablet_version_block only when the *specific connection* | ||
| # we are about to send on negotiated TABLETS_ROUTING_V2. The server | ||
| # reads the trailing tablet_version_block byte only on connections | ||
| # that negotiated V2 (gated on the cluster-wide feature), so keying | ||
| # off the borrowed connection -- which is already in hand here, since | ||
| # borrow_connection() ran above -- is both necessary and sufficient: | ||
| # * a V2 connection always gets the block, even if this pool was | ||
| # created (and any cached flag latched) before the cluster | ||
| # feature was enabled, e.g. mid rolling-upgrade; | ||
| # * a non-V2 connection never gets it, even if a sibling shard | ||
| # connection in the same pool already negotiated V2 -- which can | ||
| # happen transiently while connections opened before and after | ||
| # the feature flip coexist. Attaching the block to a non-V2 | ||
| # connection would leave an unread trailing byte and desync the | ||
| # frame, so a pool-level flag cannot get this right regardless of | ||
| # how it is latched. | ||
| if connection.features.tablets_routing_v2: | ||
| message.tablet_version_block = self.session._compute_tablet_version_block(self.query) | ||
| else: | ||
| message.tablet_version_block = None | ||
|
|
| """ | ||
| Compute the tablet_version_block byte for a BoundStatement. | ||
| Returns an int in [0, 255] or None if V2 should not be used. | ||
| """ |
| if connection.features.tablets_routing_v2: | ||
| message.tablet_version_block = self.session._compute_tablet_version_block(self.query) | ||
| else: | ||
| message.tablet_version_block = None |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
tests/unit/test_policies.py (1)
1043-1082: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueOptional: test name/docstring don't match what's actually exercised.
This test asserts the "no tablet_version (V1)" path, but the production code (
make_query_plan) never readstablet.tablet_version; leader-first is gated solely onks_meta.strongly_consistent, which is set toFalsehere. So the test passes because of thestrongly_consistent=Falseflag, not becausetablet_version=None. Consider renaming/reframing it (or settingstrongly_consistent=Truewithtablet_version=None) so the test actually guards the behavior its name implies; otherwise it overlaps withtest_no_leader_routing_for_eventually_consistent_keyspace.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_policies.py` around lines 1043 - 1082, The test name and docstring in test_no_leader_routing_without_tablet_version do not match the behavior actually being exercised, because make_query_plan only gates leader-first routing on keyspace metadata strong consistency, not Tablet.tablet_version. Update the test so it truly covers the intended case by either renaming/reframing it to reflect strongly_consistent=False, or by setting cluster.metadata.keyspaces['ks'].strongly_consistent to True while keeping tablet_version=None to verify the V1 tablet path; keep the assertions aligned with the TokenAwarePolicy routing behavior.cassandra/policies.py (1)
510-514: 🚀 Performance & Scalability | 🔵 TrivialOptional: Cache the child query plan to prevent double invocation and potential inconsistency.
child.make_query_plan(keyspace, query)is invoked at line 512 to filter replicas and is called again at line 551 to build the final plan. If the child policy is stateful (e.g.,RoundRobinPolicyorDCAwareRoundRobinPolicy), the second invocation may advance internal pointers and return a different host sequence, causing the filtering logic to diverge from the execution order and introducing unnecessary overhead.♻️ Suggested approach
Materialize the child plan once and reuse it:
- if tablet is not None: - replicas_mapped = set(map(lambda r: r[0], tablet.replicas)) - child_plan = child.make_query_plan(keyspace, query) - - replicas = [host for host in child_plan if host.host_id in replicas_mapped] + child_plan = list(child.make_query_plan(keyspace, query)) + if tablet is not None: + replicas_mapped = set(map(lambda r: r[0], tablet.replicas)) + replicas = [host for host in child_plan if host.host_id in replicas_mapped]Update the subsequent loop (line 551) to iterate over the cached
child_planinstead of re-invokingchild.make_query_plan(...).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cassandra/policies.py` around lines 510 - 514, Cache the result of child.make_query_plan(keyspace, query) in the policy logic so it is only evaluated once, then reuse that same child_plan for both the replica filtering in the tablet block and the final plan construction loop. Update the query planning flow in the policy method that handles tablet replicas to iterate over the cached child_plan instead of calling child.make_query_plan(...) a second time, which avoids stateful policy divergence and redundant work.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cassandra/cluster.py`:
- Around line 5055-5075: The ExecuteMessage tablet_version_block is only being
set in the pooled connection path, so control-connection fallback can miss the
mandatory V2 trailing byte. Update _query_control_connection() in
cassandra/cluster.py to use the same tablets_routing_v2 check and
_compute_tablet_version_block(self.query) logic as the ExecuteMessage handling
in the main send path, placing it after self._connection = connection and before
connection.send_msg(...) so V2 control connections always include the block and
non-V2 ones do not.
- Around line 5212-5231: The tablet-routing cache is using only
self.query.keyspace when adding tablets, which can store entries under a None
key and miss cache hits for session-level keyspaces. Update the tablet handling
in ResponseFuture to use the effective keyspace consistently, matching
_compute_tablet_version_block() by falling back to self.keyspace when
self.query.keyspace is unset, and then pass that resolved keyspace into
metadata._tablets.add_tablet().
In `@cassandra/metadata.py`:
- Around line 2658-2660: The fallback in metadata schema refresh is too broad
because the except block in the system_schema.scylla_keyspaces read path catches
every Exception, which masks unexpected failures. Narrow the handler around the
code in metadata.py that logs “Could not read system_schema.scylla_keyspaces” so
it only catches the specific expected driver/server schema-unavailable error(s),
and let other refresh errors propagate. Keep the existing debug log and exc_info
behavior for the expected case, but do not use a blanket Exception catch in this
branch.
In `@cassandra/pool.py`:
- Around line 451-453: The tablets_routing_v2 property in Pool should not
iterate the live _connections view directly, because concurrent
replacement/shutdown can mutate it and closed connections can incorrectly keep
V2 enabled. Snapshot the current connections first, then compute the flag only
from still-open/live connections before checking each connection’s
features.tablets_routing_v2.
In `@cassandra/query.py`:
- Line 278: The cached routing token in Statement is being reused without
considering token_class, which can cause the wrong ring/token to be used across
sessions or clusters. Update the token cache logic in Statement-related routing
code so _routing_token is keyed by token_class as well, and make the
lookup/reuse path in the affected routing methods distinguish tokens by the
partitioner class used. Ensure the changes cover the Statement fields and the
routing/token selection flow that currently reads and writes _routing_token.
---
Nitpick comments:
In `@cassandra/policies.py`:
- Around line 510-514: Cache the result of child.make_query_plan(keyspace,
query) in the policy logic so it is only evaluated once, then reuse that same
child_plan for both the replica filtering in the tablet block and the final plan
construction loop. Update the query planning flow in the policy method that
handles tablet replicas to iterate over the cached child_plan instead of calling
child.make_query_plan(...) a second time, which avoids stateful policy
divergence and redundant work.
In `@tests/unit/test_policies.py`:
- Around line 1043-1082: The test name and docstring in
test_no_leader_routing_without_tablet_version do not match the behavior actually
being exercised, because make_query_plan only gates leader-first routing on
keyspace metadata strong consistency, not Tablet.tablet_version. Update the test
so it truly covers the intended case by either renaming/reframing it to reflect
strongly_consistent=False, or by setting
cluster.metadata.keyspaces['ks'].strongly_consistent to True while keeping
tablet_version=None to verify the V1 tablet path; keep the assertions aligned
with the TokenAwarePolicy routing behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 587d3541-27be-4132-8dda-c90e66cff85c
📒 Files selected for processing (12)
cassandra/cluster.pycassandra/metadata.pycassandra/policies.pycassandra/pool.pycassandra/protocol.pycassandra/protocol_features.pycassandra/query.pycassandra/tablets.pytests/integration/standard/test_tablets_routing_v2.pytests/unit/test_policies.pytests/unit/test_protocol_features.pytests/unit/test_tablets.py
| if isinstance(message, ExecuteMessage): | ||
| # Attach the tablet_version_block only when the *specific connection* | ||
| # we are about to send on negotiated TABLETS_ROUTING_V2. The server | ||
| # reads the trailing tablet_version_block byte only on connections | ||
| # that negotiated V2 (gated on the cluster-wide feature), so keying | ||
| # off the borrowed connection -- which is already in hand here, since | ||
| # borrow_connection() ran above -- is both necessary and sufficient: | ||
| # * a V2 connection always gets the block, even if this pool was | ||
| # created (and any cached flag latched) before the cluster | ||
| # feature was enabled, e.g. mid rolling-upgrade; | ||
| # * a non-V2 connection never gets it, even if a sibling shard | ||
| # connection in the same pool already negotiated V2 -- which can | ||
| # happen transiently while connections opened before and after | ||
| # the feature flip coexist. Attaching the block to a non-V2 | ||
| # connection would leave an unread trailing byte and desync the | ||
| # frame, so a pool-level flag cannot get this right regardless of | ||
| # how it is latched. | ||
| if connection.features.tablets_routing_v2: | ||
| message.tablet_version_block = self.session._compute_tablet_version_block(self.query) | ||
| else: | ||
| message.tablet_version_block = None |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Attach the V2 block on control-connection fallback too.
Line 5072 handles pooled connections, but _query_control_connection() can also send an ExecuteMessage; on a V2 control connection that path omits the mandatory trailing byte and can trigger a protocol error.
Proposed direction
+ def _set_tablet_version_block_for_connection(self, message, connection):
+ if not isinstance(message, ExecuteMessage):
+ return
+ if connection.features.tablets_routing_v2:
+ message.tablet_version_block = self.session._compute_tablet_version_block(self.query)
+ else:
+ message.tablet_version_block = None
+
def _query(self, host, message=None, cb=None):
...
- if isinstance(message, ExecuteMessage):
- ...
- if connection.features.tablets_routing_v2:
- message.tablet_version_block = self.session._compute_tablet_version_block(self.query)
- else:
- message.tablet_version_block = None
+ self._set_tablet_version_block_for_connection(message, connection)Then call the same helper in _query_control_connection() after self._connection = connection and before connection.send_msg(...).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cassandra/cluster.py` around lines 5055 - 5075, The ExecuteMessage
tablet_version_block is only being set in the pooled connection path, so
control-connection fallback can miss the mandatory V2 trailing byte. Update
_query_control_connection() in cassandra/cluster.py to use the same
tablets_routing_v2 check and _compute_tablet_version_block(self.query) logic as
the ExecuteMessage handling in the main send path, placing it after
self._connection = connection and before connection.send_msg(...) so V2 control
connections always include the block and non-V2 ones do not.
| keyspace = self.query.keyspace | ||
| table = self.query.table | ||
| if tablet: | ||
| self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet) | ||
| elif connection.features.tablets_routing_v1 and 'tablets-routing-v1' in self._custom_payload: | ||
| protocol = self.session.cluster.protocol_version | ||
| info = self._custom_payload.get('tablets-routing-v1') | ||
| ctype = ResponseFuture._TABLET_ROUTING_CTYPE | ||
| if ctype is None: | ||
| ctype = types.lookup_casstype('TupleType(LongType, LongType, ListType(TupleType(UUIDType, Int32Type)))') | ||
| ResponseFuture._TABLET_ROUTING_CTYPE = ctype | ||
| tablet_routing_info = ctype.from_binary(info, protocol) | ||
| first_token = tablet_routing_info[0] | ||
| last_token = tablet_routing_info[1] | ||
| tablet_replicas = tablet_routing_info[2] | ||
| tablet = Tablet.from_row(first_token, last_token, tablet_replicas) | ||
| keyspace = self.query.keyspace | ||
| table = self.query.table | ||
| if tablet: | ||
| self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Cache routing payloads under the effective keyspace.
Line 5212 and Line 5228 use only self.query.keyspace, but _compute_tablet_version_block() looks up tablets with query.keyspace or self.keyspace. Prepared statements executed in a session keyspace can therefore cache under (None, table) and miss the cache on every later request.
Proposed fix
- keyspace = self.query.keyspace
+ keyspace = self.query.keyspace or self.session.keyspace
table = self.query.table
- if tablet:
+ if tablet and keyspace and table:
self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)
...
- keyspace = self.query.keyspace
+ keyspace = self.query.keyspace or self.session.keyspace
table = self.query.table
- if tablet:
+ if tablet and keyspace and table:
self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| keyspace = self.query.keyspace | |
| table = self.query.table | |
| if tablet: | |
| self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet) | |
| elif connection.features.tablets_routing_v1 and 'tablets-routing-v1' in self._custom_payload: | |
| protocol = self.session.cluster.protocol_version | |
| info = self._custom_payload.get('tablets-routing-v1') | |
| ctype = ResponseFuture._TABLET_ROUTING_CTYPE | |
| if ctype is None: | |
| ctype = types.lookup_casstype('TupleType(LongType, LongType, ListType(TupleType(UUIDType, Int32Type)))') | |
| ResponseFuture._TABLET_ROUTING_CTYPE = ctype | |
| tablet_routing_info = ctype.from_binary(info, protocol) | |
| first_token = tablet_routing_info[0] | |
| last_token = tablet_routing_info[1] | |
| tablet_replicas = tablet_routing_info[2] | |
| tablet = Tablet.from_row(first_token, last_token, tablet_replicas) | |
| keyspace = self.query.keyspace | |
| table = self.query.table | |
| if tablet: | |
| self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet) | |
| keyspace = self.query.keyspace or self.session.keyspace | |
| table = self.query.table | |
| if tablet and keyspace and table: | |
| self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet) | |
| elif connection.features.tablets_routing_v1 and 'tablets-routing-v1' in self._custom_payload: | |
| protocol = self.session.cluster.protocol_version | |
| info = self._custom_payload.get('tablets-routing-v1') | |
| ctype = ResponseFuture._TABLET_ROUTING_CTYPE | |
| if ctype is None: | |
| ctype = types.lookup_casstype('TupleType(LongType, LongType, ListType(TupleType(UUIDType, Int32Type)))') | |
| ResponseFuture._TABLET_ROUTING_CTYPE = ctype | |
| tablet_routing_info = ctype.from_binary(info, protocol) | |
| first_token = tablet_routing_info[0] | |
| last_token = tablet_routing_info[1] | |
| tablet_replicas = tablet_routing_info[2] | |
| tablet = Tablet.from_row(first_token, last_token, tablet_replicas) | |
| keyspace = self.query.keyspace or self.session.keyspace | |
| table = self.query.table | |
| if tablet and keyspace and table: | |
| self.session.cluster.metadata._tablets.add_tablet(keyspace, table, tablet) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cassandra/cluster.py` around lines 5212 - 5231, The tablet-routing cache is
using only self.query.keyspace when adding tablets, which can store entries
under a None key and miss cache hits for session-level keyspaces. Update the
tablet handling in ResponseFuture to use the effective keyspace consistently,
matching _compute_tablet_version_block() by falling back to self.keyspace when
self.query.keyspace is unset, and then pass that resolved keyspace into
metadata._tablets.add_tablet().
| except Exception: | ||
| log.debug("Could not read system_schema.scylla_keyspaces; treating all " | ||
| "keyspaces as eventually consistent", exc_info=True) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Narrow the Scylla-keyspaces fallback exception.
Line 2658 should tolerate the expected “table/column unavailable” schema error, but catching every Exception also hides timeouts, connection failures, and parser bugs; Ruff flags this as BLE001. Please catch the specific expected driver/server exception(s) and let unexpected refresh failures surface.
🧰 Tools
🪛 Ruff (0.15.18)
[warning] 2658-2658: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cassandra/metadata.py` around lines 2658 - 2660, The fallback in metadata
schema refresh is too broad because the except block in the
system_schema.scylla_keyspaces read path catches every Exception, which masks
unexpected failures. Narrow the handler around the code in metadata.py that logs
“Could not read system_schema.scylla_keyspaces” so it only catches the specific
expected driver/server schema-unavailable error(s), and let other refresh errors
propagate. Keep the existing debug log and exc_info behavior for the expected
case, but do not use a blanket Exception catch in this branch.
Source: Linters/SAST tools
| @property | ||
| def tablets_routing_v2(self): | ||
| return any(c.features.tablets_routing_v2 for c in self._connections.values()) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Snapshot live connections before scanning V2 flags.
Line 453 iterates the mutable _connections dict view while replacement/shutdown paths can mutate it, which can raise RuntimeError on the request path; it also lets closed V2 connections keep the pool in V2 mode.
Proposed fix
`@property`
def tablets_routing_v2(self):
- return any(c.features.tablets_routing_v2 for c in self._connections.values())
+ with self._lock:
+ connections = tuple(self._connections.values())
+ return any(
+ not c.is_closed and not c.is_defunct and c.features.tablets_routing_v2
+ for c in connections
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @property | |
| def tablets_routing_v2(self): | |
| return any(c.features.tablets_routing_v2 for c in self._connections.values()) | |
| `@property` | |
| def tablets_routing_v2(self): | |
| with self._lock: | |
| connections = tuple(self._connections.values()) | |
| return any( | |
| not c.is_closed and not c.is_defunct and c.features.tablets_routing_v2 | |
| for c in connections | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cassandra/pool.py` around lines 451 - 453, The tablets_routing_v2 property in
Pool should not iterate the live _connections view directly, because concurrent
replacement/shutdown can mutate it and closed connections can incorrectly keep
V2 enabled. Snapshot the current connections first, then compute the flag only
from still-open/live connections before checking each connection’s
features.tablets_routing_v2.
|
|
||
| _serial_consistency_level = None | ||
| _routing_key = None | ||
| _routing_token = None |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Key the cached token by token_class too.
Line 353 reuses _routing_token regardless of which partitioner class was passed. If the same Statement is executed against another session/cluster with a different token_class, tablet lookup and shard selection can use a token from the wrong ring.
Proposed fix
_serial_consistency_level = None
_routing_key = None
_routing_token = None
+ _routing_token_class = None
@@
if consistency_level is not None:
self.consistency_level = consistency_level
self._routing_key = routing_key
+ self._routing_token = None
+ self._routing_token_class = None
@@
# The memoized ring token is derived from the routing key; invalidate it.
self._routing_token = None
+ self._routing_token_class = None
def _del_routing_key(self):
self._routing_key = None
self._routing_token = None
+ self._routing_token_class = None
@@
token = self._routing_token
- if token is None:
+ if token is None or self._routing_token_class is not token_class:
token = token_class.from_key(routing_key)
self._routing_token = token
+ self._routing_token_class = token_class
return tokenAlso applies to: 318-323, 338-357
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cassandra/query.py` at line 278, The cached routing token in Statement is
being reused without considering token_class, which can cause the wrong
ring/token to be used across sessions or clusters. Update the token cache logic
in Statement-related routing code so _routing_token is keyed by token_class as
well, and make the lookup/reuse path in the affected routing methods distinguish
tokens by the partitioner class used. Ensure the changes cover the Statement
fields and the routing/token selection flow that currently reads and writes
_routing_token.
Placeholder for the time being. I'll edit the cover letter tomorrow.
The PR was tested against scylladb/scylladb#30291.
Pre-review checklist
./docs/source/.Fixes:annotations to PR description.