Add fixed_capacity_map to cudax by srinivasyadav18 · Pull Request #7705 · NVIDIA/cccl

srinivasyadav18 · 2026-02-18T04:14:14Z

Description

Part of #7463

This PR migrates cuCollections static_map's insert and contains operations into cudax as cuda::experimental::cuco::fixed_capacity_map.

Minimal scope: implements insert, contains, clear, and trivial accessors, with capacity validation provided by make_valid_capacity and is_valid_capacity. Tests mirror the cuCollections layout and use a parameterized matrix covering key type, probing scheme, CG size, and bucket size.
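
For readers unfamiliar with the container, the operations listed above can be sketched with a host-only analogue. This is a rough illustration, not the cudax implementation: the class name `toy_fixed_capacity_map`, its members, the `int32_t` key and value types, and the use of `std::atomic` in place of device atomics are all assumptions made for this example. It uses linear probing with an empty-key sentinel and claims a slot with a compare-and-swap on the key.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical host-only analogue; none of these names come from cudax.
class toy_fixed_capacity_map
{
public:
  toy_fixed_capacity_map(std::size_t capacity, std::int32_t empty_key)
      : empty_key_{empty_key}, keys_(capacity), values_(capacity)
  {
    clear();
  }

  // Reset every slot to the empty-key sentinel.
  void clear()
  {
    for (auto& k : keys_) { k.store(empty_key_); }
  }

  // Returns true if the key was newly inserted, false if it was already
  // present or no free slot was found.
  bool insert(std::int32_t key, std::int32_t value)
  {
    const std::size_t n = keys_.size();
    if (n == 0) { return false; }
    std::size_t slot = std::hash<std::int32_t>{}(key) % n;
    for (std::size_t probe = 0; probe < n; ++probe)
    {
      std::int32_t expected = empty_key_;
      // Claim the slot only if it still holds the sentinel.
      if (keys_[slot].compare_exchange_strong(expected, key))
      {
        values_[slot] = value; // payload is written after the key is claimed
        return true;
      }
      if (expected == key) { return false; } // duplicate key
      slot = (slot + 1) % n;                 // linear probing
    }
    return false;
  }

  bool contains(std::int32_t key) const
  {
    const std::size_t n = keys_.size();
    if (n == 0) { return false; }
    std::size_t slot = std::hash<std::int32_t>{}(key) % n;
    for (std::size_t probe = 0; probe < n; ++probe)
    {
      const std::int32_t current = keys_[slot].load();
      if (current == key) { return true; }
      if (current == empty_key_) { return false; } // end of the probe chain
      slot = (slot + 1) % n;
    }
    return false;
  }

  std::size_t capacity() const { return keys_.size(); }

private:
  std::int32_t empty_key_;
  std::vector<std::atomic<std::int32_t>> keys_;
  std::vector<std::int32_t> values_;
};
```

The sentinel must never be used as a real key, which is why the sketch takes it as an explicit constructor argument.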

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2026-02-18T04:14:18Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

copy-pr-bot · 2026-05-21T20:46:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-03T20:50:32Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a complete open-addressing hash table infrastructure for CUDA Experimental, comprising device reference operations, grid kernels, host orchestration, and a public static_map container with static/dynamic capacity modes and optional key erasure, plus comprehensive test coverage.

Changes

Open-addressing and static_map port

| Layer / File(s) | Summary |
| --- | --- |
| Type traits and bitwise comparison `cudax/include/cuda/experimental/__cuco/traits.hpp`, `cudax/include/cuda/experimental/__cuco/__detail/bitwise_compare.cuh` | `is_bitwise_comparable`, `is_tuple_like` traits and aligned `__bitwise_compare` template support bitwise-safe type detection and fast equality paths (4/8-byte specializations via reinterpretation, general `memcmp` fallback). |
| Prime utilities and capacity rounding `cudax/include/cuda/experimental/__cuco/__detail/prime.hpp`, `cudax/include/cuda/experimental/__cuco/capacity.cuh` | Deterministic 64-bit primality testing via trial division + Miller–Rabin, modular arithmetic with `__int128` fast path, and `make_valid_capacity` rounding for linear/double-hashing with overflow guards. |
| Probing schemes and iterator base `cudax/include/cuda/experimental/__cuco/__detail/probing_scheme_base.cuh`, `cudax/include/cuda/experimental/__cuco/probing_scheme.cuh` | `__probing_scheme_base<CgSize>` and `__probing_iterator` for bucket traversal; public `linear_probing` and `double_hashing` templates with cooperative-group tile-rank stride distribution. |
| Sentinel types and kernel utilities `cudax/include/cuda/experimental/__cuco/types.cuh`, `cudax/include/cuda/experimental/__cuco/__detail/types.cuh`, `cudax/include/cuda/experimental/__cuco/__detail/utils.cuh`, `cudax/include/cuda/experimental/__cuco/__detail/utils.hpp` | Strong-type sentinel wrappers (`empty_key`, `empty_value`, `erased_key`), mdspan extent aliases, and grid-launch helpers (global thread ID, grid stride, occupancy sizing, tile-size traits). |
| Equality wrapper for probing `cudax/include/cuda/experimental/__cuco/__detail/equal_wrapper.cuh` | Combines `__bitwise_compare` sentinel checks with key equality, returning three-way results and branching on insert vs. query mode for duplicate control. |
| Slot storage and device reference core `cudax/include/cuda/experimental/__cuco/__open_addressing/slot_storage_ref.cuh`, `cudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_ref_impl.cuh` | `__slot_storage_ref` non-owning bucket view and `__open_addressing_ref_impl` device-side operations (probing, CAS-based insert with `packed_cas`/`back_to_back_cas`/`cas_dependent_write` dispatch, contains, cooperative-group variants). |
| Grid kernels for bulk operations `cudax/include/cuda/experimental/__cuco/__open_addressing/kernels.cuh` | Grid-stride conditional `__insert_if_n`, `__fill`, and `__contains_if_n` kernels with `_CgSize==1` direct vs. `_CgSize!=1` tiled cooperative execution paths. |
| Host orchestration and memory `cudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_impl.cuh` | Device-allocated slot buffer, async/sync clear/insert/contains with stream refs, device counter for success counting, bucket-count computation from capacity or load factor. |
| Public static_map container `cudax/include/cuda/experimental/__cuco/static_map.cuh`, `cudax/include/cuda/experimental/__cuco/static_map_ref.cuh` | SFINAE-selected constructors for static/dynamic capacity and erasure modes; `clear`, `insert`, `contains` forwarding; device-side `static_map_ref` with trivially-copyable ref semantics. |
| Capacity, insert, and sentinel tests `cudax/test/cuco/static_map/test_capacity.cu`, `cudax/test/cuco/static_map/test_insert_and_contains.cu`, `cudax/test/cuco/static_map/test_key_sentinel.cu`, `cudax/test/cuco/static_map/test_shared_memory.cu`, `cudax/test/cuco/utility/test_capacity.cu`, `cudax/test/CMakeLists.txt` | Validates dynamic/static capacity computation, insert/contains workflows, shared-memory sizing via `capacity_v`, sentinel handling, and load-factor rounding; updates `strong_type.cuh` documentation. |
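
To make the "Prime utilities" row more concrete, here is a minimal host-side sketch of deterministic 64-bit primality testing (trial division followed by Miller–Rabin) and a next-prime search with an overflow guard. The function names `mul_mod`, `pow_mod`, `is_prime_u64`, and `next_prime_u64` are hypothetical and are not the names used in `prime.hpp`; how `make_valid_capacity` actually consumes such a primitive is not shown here. The sketch relies on `unsigned __int128`, a GCC/Clang extension, for the modular multiplication.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper names; not the functions in prime.hpp.
// (a * b) mod m without overflow, using a 128-bit intermediate.
inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b, std::uint64_t m)
{
  return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) % m);
}

// base^exp mod m by square-and-multiply.
inline std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m)
{
  std::uint64_t result = 1 % m;
  base %= m;
  while (exp > 0)
  {
    if (exp & 1) { result = mul_mod(result, base, m); }
    base = mul_mod(base, base, m);
    exp >>= 1;
  }
  return result;
}

inline bool is_prime_u64(std::uint64_t n)
{
  // The first twelve primes are a deterministic witness set for all 64-bit n.
  constexpr std::uint64_t bases[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
  if (n < 2) { return false; }
  // Trial division; also settles the case where n is one of the bases.
  for (std::uint64_t p : bases)
  {
    if (n % p == 0) { return n == p; }
  }
  // Write n - 1 = d * 2^r with d odd.
  std::uint64_t d = n - 1;
  int r = 0;
  while ((d & 1) == 0) { d >>= 1; ++r; }
  for (std::uint64_t a : bases)
  {
    std::uint64_t x = pow_mod(a, d, n);
    if (x == 1 || x == n - 1) { continue; }
    bool composite = true;
    for (int i = 1; i < r; ++i)
    {
      x = mul_mod(x, x, n);
      if (x == n - 1) { composite = false; break; }
    }
    if (composite) { return false; }
  }
  return true;
}

// Smallest prime >= n, or 0 if no such prime fits in 64 bits.
inline std::uint64_t next_prime_u64(std::uint64_t n)
{
  if (n <= 2) { return 2; }
  std::uint64_t candidate = n | 1; // first odd number >= n
  while (true)
  {
    if (is_prime_u64(candidate)) { return candidate; }
    // Overflow guard: stop before candidate + 2 would wrap around.
    if (candidate > std::numeric_limits<std::uint64_t>::max() - 2) { return 0; }
    candidate += 2;
  }
}
```

Returning 0 on overflow is an arbitrary choice for this sketch; a real implementation could equally report the failure through an exception or an optional.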

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Port OpenAddressing [`#7463`] | ✅ | |
| Port `static_map` [`#7463`] | ✅ | |

Suggested labels

cudax

Suggested reviewers

andralex
pciolkosz
gevtushenko

Comment `@coderabbitai help` to get the list of available commands.

coderabbitai

Actionable comments posted: 18

🧹 Nitpick comments (3)

cudax/include/cuda/experimental/__cuco/__detail/extent.cuh (1)

131-148: ⚡ Quick win

suggestion: Mark these header variable templates inline. They are namespace-scope constexpr definitions in a header, and the local CCCL rule requires the explicit inline spelling for this pattern.

As per coding guidelines, "All constexpr variables at namespace/global scope must use inline, including template variables."
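
As a small illustration of the spelling the guideline asks for, the sketch below declares a namespace-scope constant and a variable template with explicit `inline`. The names `toy_dynamic_extent` and `toy_is_dynamic_v` are hypothetical and are not the templates in `extent.cuh`; the example assumes C++17 or later, where inline variables are available.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not the templates in extent.cuh.

// Without `inline`, a namespace-scope constexpr variable has internal linkage,
// so each translation unit that includes the header gets its own copy.
inline constexpr std::size_t toy_dynamic_extent = static_cast<std::size_t>(-1);

// The guideline asks for the same explicit spelling on variable templates.
template <std::size_t Extent>
inline constexpr bool toy_is_dynamic_v = (Extent == toy_dynamic_extent);
```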

cudax/include/cuda/experimental/__cuco/probing_scheme.cuh (1)

24-31: ⚡ Quick win

suggestion: Wrap this header with the standard CCCL prologue/epilogue pair. The file enters code directly after its includes and never closes with #include <cuda/std/__cccl/epilogue.h>, unlike the other new headers in this cohort.

As per coding guidelines, "The last included header before code must be #include <cuda/std/__cccl/prologue.h>, and #include <cuda/std/__cccl/epilogue.h> must be at the end of a file."

Also applies to: 264-264

cudax/include/cuda/experimental/__cuco/__static_map/kernels.cuh (1)

119-235: suggestion: Please attach benchmark results for this fast path before merge. This shared-memory kernel adds a new execution path and tuning heuristic, so we need the perf numbers that justify it on the supported toolchains and architectures. As per coding guidelines, "Do not commit SASS code changes without running benchmarks to check for performance regressions."

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d2eb011c-f333-4929-a09b-f09102640ec3

📥 Commits

Reviewing files that changed from the base of the PR and between 75c7b14 and 134736e.

📒 Files selected for processing (22)

cudax/include/cuda/experimental/__cuco/__detail/bitwise_compare.cuh
cudax/include/cuda/experimental/__cuco/__detail/equal_wrapper.cuh
cudax/include/cuda/experimental/__cuco/__detail/extent.cuh
cudax/include/cuda/experimental/__cuco/__detail/prime.hpp
cudax/include/cuda/experimental/__cuco/__detail/probing_scheme_base.cuh
cudax/include/cuda/experimental/__cuco/__detail/types.cuh
cudax/include/cuda/experimental/__cuco/__detail/utils.cuh
cudax/include/cuda/experimental/__cuco/__detail/utils.hpp
cudax/include/cuda/experimental/__cuco/__open_addressing/functors.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/kernels.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_impl.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_ref_impl.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/slot_storage_ref.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/types.cuh
cudax/include/cuda/experimental/__cuco/__static_map/kernels.cuh
cudax/include/cuda/experimental/__cuco/__utility/strong_type.cuh
cudax/include/cuda/experimental/__cuco/probing_scheme.cuh
cudax/include/cuda/experimental/__cuco/static_map.cuh
cudax/include/cuda/experimental/__cuco/static_map_ref.cuh
cudax/include/cuda/experimental/__cuco/traits.hpp
cudax/test/CMakeLists.txt
cudax/test/cuco/static_map/test_static_map.cu

PointKernel

Looks good from my end

PointKernel · 2026-06-25T19:25:30Z

/ok to test 40b602a

PointKernel · 2026-06-26T18:15:35Z

/ok to test 058c039

srinivasyadav18 · 2026-06-30T14:43:35Z

/ok to test e986cab

davebayer · 2026-06-30T17:27:24Z

+
+public:
+  //! @brief Constructs an open addressing implementation with the given capacity.
+  _CCCL_HOST __open_addressing_impl(


Please, use _CCCL_API/_CCCL_HOST_API/_CCCL_DEVICE_API instead of _CCCL_HOST_DEVICE/_CCCL_HOST/_CCCL_DEVICE

Good point. Updated to the _API family. This file only used _CCCL_HOST, so it is now _CCCL_HOST_API. For host+device cases I am keeping _CCCL_HOST_DEVICE_API rather than _CCCL_API: cudax was intentionally moved off _CCCL_API to _CCCL_HOST_DEVICE_API in #8955 because _CCCL_API carries _CCCL_TILE, which is not usable in tile mode.

davebayer · 2026-06-30T17:28:59Z

+    [[maybe_unused]] const auto __status = CUB_NS_QUALIFIER::DeviceTransform::Fill(
+      __slots.data(), static_cast<detail::__index_type>(__n), __empty_slot_sentinel, __stream);
+    _CCCL_ASSERT(__status == cudaSuccess, "cuco: failed to clear slot storage");


Shouldn't these _CCCL_THROW(::cuda::cuda_error, ...) instead?

Done via _CCCL_TRY_CUDA_API (CCCL's counterpart to CUCO_CUDA_TRY; it throws cuda_error). Dropped noexcept from the three async methods so that the throw is valid.
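
The difference between asserting on a status code and throwing can be sketched without any CUDA dependency. Everything below is a stand-in: `toy_status` plays the role of `cudaError_t`, `toy_error` the role of an exception type such as `cuda_error`, and `throw_on_failure` is a hypothetical helper, not the `_CCCL_TRY_CUDA_API` macro itself.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these are CCCL or CUDA names.
enum class toy_status
{
  success,
  launch_failure
};

struct toy_error : std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// Unlike an assert, this check is not compiled out in release builds.
inline void throw_on_failure(toy_status status, const char* what)
{
  if (status != toy_status::success)
  {
    throw toy_error{std::string{"toy: "} + what};
  }
}

// Deliberately not noexcept: an exception escaping a noexcept function
// calls std::terminate instead of reaching the caller.
inline void toy_clear_async(toy_status simulated_result)
{
  throw_on_failure(simulated_result, "failed to clear slot storage");
}
```

The `simulated_result` parameter exists only so the failure path can be exercised on the host.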

davebayer · 2026-06-30T17:41:07Z

+#ifndef _CUDAX___CUCO_DETAIL_OPEN_ADDRESSING_KERNELS_CUH
+#define _CUDAX___CUCO_DETAIL_OPEN_ADDRESSING_KERNELS_CUH
+
+#include <cuda/__cccl_config>


Use this config instead, please

Suggested change

-#include <cuda/__cccl_config>
+#include <cuda/std/detail/__config>

<cuda/__cccl_config> is the convention the __cuco headers follow, and usage is roughly even across cudax (~138 vs ~139). Is there a best-practice guideline on which one to use?

davebayer · 2026-06-30T17:52:23Z

+  [[nodiscard]] _CCCL_HOST static ::cuda::device_memory_pool_ref __default_memory_resource()
+  {
+    return ::cuda::device_default_memory_pool(::cuda::device_ref{::cuda::__driver::__ctxGetDevice()});
+  }


Again, this is not ideal

davebayer · 2026-06-30T17:53:20Z

+    const _KeyEqual& __pred                = {},
+    const _ProbingScheme& __probing_scheme = {},
+    _MemoryResource __mr                   = __default_memory_resource(),
+    ::cuda::stream_ref __stream            = cudaStream_t{nullptr})


In cccl runtime, we pretend like there is no default stream and usually put it as the first parameter

The default MR and stream are intentional, aiming for canonical STL ergonomics: a user can write auto m = cuco::static_map{empty_key{-1}, 100}; and get a working device map without thinking about streams or memory resources, much like std::unordered_map<int>{100} hides the allocator, or thrust::sort(thrust::device, ...) handles its own temp allocations and stream internally. That said, I'm not sure how valuable that STL-style convenience really is for the GPU case, so I'm open to dropping the defaults and taking the stream as an explicit first parameter if that's the cccl runtime preference. WDYT?

davebayer · 2026-06-30T17:54:06Z

+    empty_value<_Tp> __empty_value_sentinel,
+    const _KeyEqual& __pred                = {},
+    const _ProbingScheme& __probing_scheme = {},
+    _MemoryResource __mr                   = __default_memory_resource(),


The memory resource should be explicit

davebayer · 2026-06-30T17:54:27Z

+  //! @brief Erases all elements from the container. After this call, `size()` returns zero.
+  //!
+  //! @param __stream CUDA stream this operation is executed in
+  void clear(::cuda::stream_ref __stream = cudaStream_t{nullptr})


Suggested change

-void clear(::cuda::stream_ref __stream = cudaStream_t{nullptr})
+void clear(::cuda::stream_ref __stream)

If we used the same approach as @pciolkosz used in cuda::buffer, this class should store the stream it was created with to perform these operations

I'd avoid storing the stream as a member. The map has to support being used across multiple streams (e.g. built on one stream, then inserts/queries issued on others) since we have downstream use cases relying on that, so binding it to the creation stream would break them. The stream should stay a per-operation argument.
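
The API shape being argued for here can be sketched in plain C++. All names are hypothetical: `toy_stream_ref` stands in for a non-owning stream handle, and `toy_streamed_map` only records which stream each call was issued on, so the sketch shows the calling convention rather than any real asynchronous behavior.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a non-owning stream handle.
struct toy_stream_ref
{
  int id;
};

// One recorded call: which operation ran and on which stream.
struct toy_op
{
  std::string name;
  int stream_id;
};

// The map stores no stream; every operation takes one explicitly,
// so a single instance can be driven from several streams.
class toy_streamed_map
{
public:
  void clear(toy_stream_ref stream) { ops_.push_back({"clear", stream.id}); }
  void insert(int /*key*/, toy_stream_ref stream) { ops_.push_back({"insert", stream.id}); }
  void contains(int /*key*/, toy_stream_ref stream) { ops_.push_back({"contains", stream.id}); }

  const std::vector<toy_op>& ops() const { return ops_; }

private:
  std::vector<toy_op> ops_;
};
```

With this shape, a map built on one stream can be queried on another without any change to the object itself; ordering between the streams remains the caller's responsibility.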

github-actions · 2026-06-30T18:23:16Z

🥳 CI Workflow Results

🟩 Finished in 3h 37m: Pass: 100%/55 | Total: 8h 34m | Max: 56m 50s | Hits: 75%/46418

See results here.

sleeepyjack · 2026-06-30T23:26:50Z

+//! stencil returns true.
+template <int _CgSize, int _BlockSize, class _InputIt, class _StencilIt, class _Predicate, class _Ref>
+_CCCL_KERNEL_ATTRIBUTES void
+__insert_if_n(_InputIt __first, detail::__index_type __n, _StencilIt __stencil, _Predicate __pred, _Ref __ref)


All these kernels are missing launch bounds

Good catch. Fixed

PointKernel · 2026-07-01T00:08:28Z

/ok to test 2d119f5

github-project-automation Bot added this to CCCL Feb 18, 2026

github-project-automation Bot moved this to Todo in CCCL Feb 18, 2026

cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL Feb 18, 2026

PointKernel self-requested a review February 18, 2026 20:10

PointKernel requested changes Feb 18, 2026

srinivasyadav18 force-pushed the cuco_static_map branch from 0edb761 to 65368db Compare April 15, 2026 22:53

srinivasyadav18 added 5 commits April 22, 2026 17:30

initial migration of OA and static_map

9a6f714

cleanups

874285c

temporary WAR for call to ~buffer from cuda::counting_iterator[]

a1337e9

use shared memory buffer flushing in kernels;cleanups

f2820a6

refactor extent's to use size_type like std::span

b0d0702

srinivasyadav18 force-pushed the cuco_static_map branch from 65368db to b0d0702 Compare April 23, 2026 00:32

srinivasyadav18 added 3 commits April 22, 2026 17:46

simlify primes usage

1194f02

docs and cleanups

9d38c38

more refactorings

875a47c

PointKernel marked this pull request as ready for review June 3, 2026 18:11

PointKernel requested a review from a team as a code owner June 3, 2026 18:11

PointKernel requested a review from andralex June 3, 2026 18:11

cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL Jun 3, 2026

PointKernel added 7 commits June 3, 2026 18:27

Merge upstream/main into cuco_static_map; align _CCCL_API -> _CCCL_HO…

f169d17

…ST_DEVICE_API

Code formatting

3d2dd24

Update docs for probing scheme

f96351f

Remove outer logic and get rid of count and retrieve APIs for map

edf4fd3

Replace thrust fancy iters with cuda:: ones

aa6b827

Inclusion cleanups + remove a circular inclusion

a8bc527

Fix outdated docs + comments

134736e

coderabbitai Bot reviewed Jun 3, 2026

PointKernel approved these changes Jun 24, 2026

PointKernel requested review from fbusato and sleeepyjack June 24, 2026 00:32

sleeepyjack approved these changes Jun 25, 2026

Comment thread cudax/include/cuda/experimental/__cuco/detail/open_addressing/kernels.cuh

Merge branch 'main' into cuco_static_map

40b602a

PointKernel added 2 commits June 25, 2026 20:14

Merge remote-tracking branch 'upstream/main' into cuco_static_map

c73b061

Merge remote-tracking branch 'srinivas/cuco_static_map' into cuco_sta…

04378ae

…tic_map

PointKernel added 3 commits June 26, 2026 17:34

Fix compiler issues

cb3451e

Merge remote-tracking branch 'upstream/main' into cuco_static_map

82c768c

Fix shared memort init

058c039

Merge branch 'main' into cuco_static_map

e986cab

davebayer reviewed Jun 30, 2026

PointKernel added 7 commits June 30, 2026 21:15

Throw cuda_error on CUB launch failures and use _CCCL_HOST_API

9540d40

Convert SFINAE to _CCCL_TEMPLATE/_CCCL_REQUIRES

760a3b8

Remove unused __max_occupancy_grid_size helper

f37dc5d

Drop redundant explicit enumerator values in __equal_result

82baf5b

Use cuda::add_overflow in __next_prime overflow check

6fc3358

Merge remote-tracking branch 'upstream/main' into cuco_static_map

950f32c

Merge remote-tracking branch 'srinivas/cuco_static_map' into cuco_sta…

35659af

…tic_map

sleeepyjack mentioned this pull request Jun 30, 2026

[CUDAX] [CUCO] Use cuda::launch and thread hierarchies in cuco #9656

Open

sleeepyjack reviewed Jun 30, 2026

PointKernel added 2 commits July 1, 2026 00:07

Add launch bounds to open-addressing kernels

b8a959a

Merge remote-tracking branch 'upstream/main' into cuco_static_map

2d119f5
