Refactor RGF kernel for memory optimization and cleanup#30

Merged: AsymmetryChou merged 8 commits into deepmodeling:main from AsymmetryChou:rgf_acc on Jun 19, 2026

@AsymmetryChou

Collaborator

This pull request focuses on improving memory management and efficiency in the recursive Green's function (RGF) calculations, particularly when running on CUDA devices. The main changes include more aggressive freeing of intermediate tensors during the RGF sweeps, the introduction of an automatic energy batch size selection based on available GPU memory, and user guidance for optimal CUDA allocator settings. These updates help reduce memory fragmentation and improve performance for large energy grids.

Memory management and efficiency improvements:

  • The RGF kernel (recursive_green_cal.py) now frees per-slot tensors (mat_d, mat_l, mat_u, gr_left) as soon as they are no longer needed, reducing GPU memory fragmentation and peak usage. gr_left is retained only when it is required for the lesser/greater Green's function calculations.
  • The device property class (device_property.py) exposes a release_greenfuncs method to explicitly free Green's function storage and trigger CUDA cache cleanup between energy chunks.
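The freeing pattern described in the two bullets above can be illustrated with a minimal, dependency-free sketch. It is not the code from recursive_green_cal.py or device_property.py: the blocks are plain 1x1 scalars rather than CUDA tensors, `left_sweep`, `DevicePropertySketch`, and the `greenfuncs` attribute are hypothetical names, `mat_d` is assumed to hold the diagonal blocks of E*S - H, and only the left-connected sweep is shown. The point is that dropping the last reference to a block right after it has been consumed lets the allocator reuse that memory before the next block is processed.

```python
import gc

try:  # torch is optional here; the sketch also runs without it
    import torch
except ImportError:
    torch = None


def left_sweep(mat_d, mat_l, mat_u, keep_gr_left=False):
    """Left-connected sweep over 1x1 blocks of A = E*S - H.

    Assumed layout (hypothetical): mat_d[i] = A[i, i],
    mat_l[i] = A[i+1, i], mat_u[i] = A[i, i+1].
    The input lists are mutated: consumed entries are set to None.
    """
    n = len(mat_d)
    gr_left = [None] * n
    gr_left[0] = 1.0 / mat_d[0]
    mat_d[0] = None  # diagonal block 0 is not needed again in this sweep
    for i in range(1, n):
        coupling = mat_l[i - 1] * gr_left[i - 1] * mat_u[i - 1]
        gr_left[i] = 1.0 / (mat_d[i] - coupling)
        # Release the inputs consumed by this step. A full RGF also needs
        # the couplings in its backward sweep and would release them there.
        mat_d[i] = None
        mat_l[i - 1] = None
        mat_u[i - 1] = None
        if not keep_gr_left:
            # Keep earlier entries only when a later stage asks for them.
            gr_left[i - 1] = None
    return gr_left


class DevicePropertySketch:
    """Hypothetical stand-in for a class that stores Green's functions."""

    def __init__(self):
        self.greenfuncs = {}

    def release_greenfuncs(self):
        # Drop the stored results, then ask PyTorch to return cached
        # blocks to the driver if a CUDA device is actually in use.
        self.greenfuncs.clear()
        gc.collect()
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
```

With `keep_gr_left=False` only the last entry of the returned list survives, which is enough when just the final diagonal block is needed. Passing `keep_gr_left=True` trades memory for having every left-connected block available afterwards.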

Batch size and resource management:

  • Added an _auto_chunk_size method in the NEGF runner (NEGF.py) that picks an energy batch size from the GPU memory currently available, so that batches are as large as possible without exhausting device memory. negf_compute falls back to this value when the user does not set a batch size.
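One way such a selection could work is sketched below. This is an illustration under stated assumptions, not the logic of _auto_chunk_size: the function names, the `safety` fraction, the count of live tensors per block, and the complex128 item size are all hypothetical choices. Only `torch.cuda.mem_get_info` is a real PyTorch call, and it is used only when torch and a CUDA device are present.

```python
def estimate_bytes_per_energy(block_sizes, live_tensors_per_block=6, itemsize=16):
    """Rough memory cost of one energy point.

    Assumes (hypothetically) that each diagonal block of size n keeps
    `live_tensors_per_block` dense n x n tensors alive, stored as
    complex128 (16 bytes per element).
    """
    return sum(n * n for n in block_sizes) * live_tensors_per_block * itemsize


def auto_chunk_size(free_bytes, bytes_per_energy, n_energies, safety=0.5):
    """Largest batch that fits in a `safety` fraction of free memory."""
    if bytes_per_energy <= 0 or n_energies <= 0:
        raise ValueError("bytes_per_energy and n_energies must be positive")
    usable = int(free_bytes * safety)
    # Always process at least one energy; never exceed the grid length.
    return max(1, min(usable // bytes_per_energy, n_energies))


def energy_chunks(n_energies, chunk_size):
    """Yield (start, stop) index pairs covering the energy grid."""
    for start in range(0, n_energies, chunk_size):
        yield start, min(start + chunk_size, n_energies)


def free_gpu_bytes():
    """Free bytes on the current CUDA device, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free, _total = torch.cuda.mem_get_info()
    return free
```

A caller would typically combine these as `auto_chunk_size(free_gpu_bytes(), estimate_bytes_per_energy(sizes), n_energies)` and then loop over `energy_chunks`, releasing stored Green's functions between chunks. The `safety` fraction leaves headroom for temporaries that the per-energy estimate does not capture.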

User guidance for CUDA configuration:

  • The NEGF runner now warns users if the recommended expandable_segments CUDA allocator option is not set, which helps avoid memory fragmentation for long energy grids.
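A check of this kind might look like the sketch below. The PR description does not say how the runner detects the setting, so this assumes it is read from the `PYTORCH_CUDA_ALLOC_CONF` environment variable, which is where PyTorch documents the `expandable_segments` option. The function names and the warning text are hypothetical.

```python
import os
import warnings


def expandable_segments_enabled(env=None):
    """Return True if expandable_segments:True appears in the allocator config."""
    env = os.environ if env is None else env
    conf = env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    # The variable holds comma-separated key:value pairs.
    for item in conf.split(","):
        key, _, value = item.strip().partition(":")
        if key.strip() == "expandable_segments" and value.strip().lower() == "true":
            return True
    return False


def warn_if_not_expandable(env=None):
    """Emit a warning when the option is missing; return True if it warned."""
    if expandable_segments_enabled(env):
        return False
    warnings.warn(
        "PYTORCH_CUDA_ALLOC_CONF does not enable expandable_segments; "
        "long energy grids may fragment GPU memory.",
        RuntimeWarning,
        stacklevel=2,
    )
    return True
```

Because the allocator reads its configuration when it is initialized, the variable generally has to be set before the process starts using CUDA, for example by exporting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the shell that launches the run.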

Minor optimizations and code cleanup:

  • Avoids unnecessary tensor copies for L/U blocks in the RGF kernel, saving memory.
  • Removes unused code from the current calculation and adjusts method signatures for clarity.

These changes collectively improve the scalability and robustness of NEGF calculations, especially for large systems and energy grids on CUDA-enabled hardware.

@AsymmetryChou AsymmetryChou merged commit 560b47f into deepmodeling:main Jun 19, 2026
1 check passed