Understanding GPU memory usage is crucial for deep learning practitioners. PyTorch provides several tools to track and manage memory allocation. This article explains how to visualize GPU memory, identify memory bottlenecks, and optimize your models.
Monitoring Memory with torch.cuda
The `torch.cuda` module offers functions for querying memory statistics. `torch.cuda.memory_allocated()` returns the bytes currently occupied by live tensors, while `torch.cuda.memory_reserved()` returns the bytes held by the caching allocator, which may exceed the allocated amount. `torch.cuda.memory_summary()` prints a detailed, human-readable breakdown.
```python
import torch

# Requires a CUDA-capable device.
print(torch.cuda.memory_summary())
```
The output is a table of allocated, active, and reserved memory, plus allocation and free counts, broken down by the caching allocator's large and small pools.
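To see the difference between the two counters in practice, you can measure `memory_allocated()` before and after creating a tensor. The `bytes_to_mib` helper below is our own naming, not part of PyTorch; it just makes the byte counts readable.

```python
import torch

def bytes_to_mib(n: int) -> float:
    """Convert a byte count to mebibytes for readable logging."""
    return n / (1024 ** 2)

if torch.cuda.is_available():
    before = torch.cuda.memory_allocated()
    x = torch.empty(1024, 1024, device="cuda")  # ~4 MiB of float32
    after = torch.cuda.memory_allocated()
    print(f"tensor grew allocated memory by {bytes_to_mib(after - before):.1f} MiB")
    print(f"reserved by the caching allocator: "
          f"{bytes_to_mib(torch.cuda.memory_reserved()):.1f} MiB")
```

You will typically see `memory_reserved()` stay flat or jump in larger steps than `memory_allocated()`, because the allocator hands out memory from cached blocks.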
Visualizing Memory with torch.cuda.memory_snapshot()
For a finer-grained view, `memory_snapshot()` returns the current state of the caching allocator as a list of segment records, each describing its blocks and their sizes. For a dynamic view over time, recent PyTorch versions can also record an allocation history with `torch.cuda.memory._record_memory_history()` and dump it with `torch.cuda.memory._dump_snapshot()` (note these are private APIs and may change); the resulting file can be explored in the interactive viewer at pytorch.org/memory_viz, or you can build custom plots from the raw data.
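A minimal sketch of working with the raw snapshot, assuming each segment record carries `total_size` and `allocated_size` keys (the snapshot format is version-dependent, so treat this as illustrative):

```python
import torch

def summarize_snapshot(snapshot):
    """Sum reserved and allocated bytes across caching-allocator segments.

    Assumes each segment dict exposes 'total_size' and 'allocated_size',
    as torch.cuda.memory_snapshot() has returned in recent versions.
    """
    total = sum(seg["total_size"] for seg in snapshot)
    allocated = sum(seg["allocated_size"] for seg in snapshot)
    return total, allocated

if torch.cuda.is_available():
    reserved, allocated = summarize_snapshot(torch.cuda.memory_snapshot())
    print(f"reserved: {reserved} B, allocated: {allocated} B")
```

The gap between the two sums is memory the allocator is caching for reuse rather than memory your tensors occupy.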
Using torch.cuda.set_per_process_memory_fraction()
To cap how much of the GPU a process may use, set a fraction of total device memory. Allocations beyond the cap raise an out-of-memory error within that process, which keeps it from starving other processes sharing the GPU.
```python
torch.cuda.set_per_process_memory_fraction(0.8)  # cap this process at 80% of total device memory
```
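The function also accepts an optional `device` argument, so on a multi-GPU machine you can apply the cap per device. A guarded sketch (the 0.8 fraction is an arbitrary example value):

```python
import torch

if torch.cuda.is_available():
    # Apply the same 80% cap to every visible device for this process.
    for device_id in range(torch.cuda.device_count()):
        torch.cuda.set_per_process_memory_fraction(0.8, device=device_id)
```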
Practical Tips
- Use `del` to drop the last reference to a tensor explicitly, returning its memory to the caching allocator.
- Run `torch.cuda.empty_cache()` to release cached blocks back to the driver; note that later allocations may be slower because the cache must be refilled.
- Profile with the PyTorch Profiler (`profile_memory=True`) to see per-operator memory usage.
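The profiler tip above can be sketched as follows; this variant profiles on CPU so it runs without a GPU (add `ProfilerActivity.CUDA` to the activities list when a device is available), and `run_step` is a stand-in for your own workload:

```python
import torch
from torch.profiler import profile, ProfilerActivity

def run_step():
    # Placeholder workload; substitute your model's forward/backward pass.
    x = torch.randn(256, 256)
    return x @ x

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    run_step()

# Sort operators by how much memory they allocated themselves.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```

The table makes it easy to spot which operators dominate memory, which is often the fastest route to finding a bottleneck.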
By combining these tools, you can gain deep insight into GPU memory behavior and write more efficient code.