Refactor RGF kernel for memory optimization and cleanup #30
Merged
This pull request focuses on improving memory management and efficiency in the recursive Green's function (RGF) calculations, particularly when running on CUDA devices. The main changes include more aggressive freeing of intermediate tensors during the RGF sweeps, the introduction of an automatic energy batch size selection based on available GPU memory, and user guidance for optimal CUDA allocator settings. These updates help reduce memory fragmentation and improve performance for large energy grids.
Memory management and efficiency improvements:
- `recursive_green_cal.py`: now frees the per-slot tensors (`mat_d`, `mat_l`, `mat_u`, `gr_left`) as soon as they are no longer needed, reducing GPU memory fragmentation and peak usage. `gr_left` is retained only when required for the lesser/greater Green's function calculations. [1] [2] [3] [4]
- `device_property.py`: exposes a `release_greenfuncs` method to explicitly free Green's function storage and trigger CUDA cache cleanup between energy chunks.
Batch size and resource management:
- `_auto_chunk_size` method in the NEGF runner (`NEGF.py`): automatically determines a suitable energy batch size from the available GPU memory, balancing performance and memory safety. `negf_compute` uses it when the user has not set a batch size. [1] [2]
User guidance for CUDA configuration:
- Guidance is given to the user when the `expandable_segments` CUDA allocator option is not set; enabling it helps avoid memory fragmentation for long energy grids.
Minor optimizations and code cleanup:
These changes collectively improve the scalability and robustness of NEGF calculations, especially for large systems and energy grids on CUDA-enabled hardware.
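For reviewers who want a concrete picture of the batch-size selection and the allocator check described above, here is a minimal sketch. It is not the code from this PR: `auto_chunk_size`, `expandable_segments_enabled`, the `safety` factor, and the per-energy memory estimate are all hypothetical names and assumptions. It also assumes the project runs on PyTorch, where the allocator option is set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable.

```python
import os
from typing import Mapping, Optional


def auto_chunk_size(free_bytes: int, bytes_per_energy: int, n_energies: int,
                    safety: float = 0.5) -> int:
    """Pick how many energy points to process per batch.

    Hypothetical stand-in for `_auto_chunk_size`; the actual heuristic used
    in the PR is not shown in this description.
    """
    if bytes_per_energy <= 0 or n_energies <= 0:
        raise ValueError("bytes_per_energy and n_energies must be positive")
    budget = int(free_bytes * safety)      # leave headroom for temporaries
    fits = budget // bytes_per_energy      # energy points that fit in the budget
    return max(1, min(n_energies, fits))   # at least 1, at most the whole grid


def expandable_segments_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if PYTORCH_CUDA_ALLOC_CONF requests expandable segments."""
    env = os.environ if env is None else env
    conf = env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    # The variable holds comma-separated "key:value" options.
    options = [opt.strip() for opt in conf.split(",") if opt.strip()]
    return "expandable_segments:True" in options
```

In a PyTorch setting, `free_bytes` could come from `torch.cuda.mem_get_info()`, which returns the free and total device memory in bytes, and the runner could print a hint to set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` when the check returns `False`.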