TIL: PyTorch GPU Memory Management
Today I learned several techniques for effectively managing GPU memory when working with PyTorch models.
Monitoring Memory Usage
First, it’s important to monitor your GPU memory consumption:
import torch
# Current allocation
print(f"Current GPU memory allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
# Maximum allocation
print(f"Maximum GPU memory allocated: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
# Cache size
print(f"Current GPU memory cached: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
Clearing Unused Memory
PyTorch frees a tensor’s memory once nothing references it, but the caching allocator keeps that memory reserved for reuse instead of returning it to the driver, so tools like nvidia-smi can still report it as occupied. You can release the cached blocks manually:
# Empty cache
torch.cuda.empty_cache()
# To explicitly free memory of specific tensors
del tensor
torch.cuda.empty_cache()
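A handy pattern is a small helper (the name free_gpu_memory is just illustrative, not a PyTorch API) that also runs Python’s garbage collector first, since tensors kept alive only by reference cycles are not deallocated until gc runs:
import gc
def free_gpu_memory():
    # Collect objects held only by reference cycles so their GPU tensors
    # are actually deallocated...
    gc.collect()
    # ...then return the now-unused cached blocks to the driver
    torch.cuda.empty_cache()
free_gpu_memory()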
Using Mixed Precision Training
Mixed precision training runs most operations in a lower-precision datatype (float16 instead of float32), which roughly halves the memory needed for activations:
from torch.cuda.amp import autocast, GradScaler
# Initialize scaler
scaler = GradScaler()
# In training loop
with autocast():
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
# Scale loss and do backward pass
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
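For context, here is a minimal sketch of a complete training step showing where the device transfer and zero_grad fit in; model, optimizer, loss_fn, and train_loader are assumed to be defined elsewhere:
for inputs, targets in train_loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in float16 where it is safe to do so
    with autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
    # Backward pass and optimizer step use the scaled loss to avoid
    # float16 gradient underflow
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()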
Gradient Checkpointing
For very large models, gradient checkpointing trades computation for memory by recomputing activations during the backward pass instead of storing them all through the forward pass:
# Enable gradient checkpointing (this helper is provided by
# Hugging Face Transformers models, not by plain nn.Module)
model.gradient_checkpointing_enable()
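For plain PyTorch modules there is no such one-liner; the equivalent is to wrap memory-hungry sub-modules with torch.utils.checkpoint.checkpoint. A minimal sketch with made-up blocks:
import torch.nn as nn
from torch.utils.checkpoint import checkpoint
class CheckpointedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Hypothetical blocks; any expensive sub-module works the same way
        self.block1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
    def forward(self, x):
        # Activations inside each block are recomputed during backward
        # instead of being kept in GPU memory
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return x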
Efficient Data Loading
The DataLoader can also be a source of memory and throughput problems if misconfigured:
from torch.utils.data import DataLoader

# pin_memory=True stages batches in page-locked host memory for faster
# transfer to the GPU; num_workers loads batches in parallel processes
train_loader = DataLoader(
    dataset,
    batch_size=32,
    pin_memory=True,
    num_workers=4,
)
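Pinned memory only pays off if the host-to-GPU copy is issued asynchronously, which means passing non_blocking=True when moving batches inside the training loop (a sketch, assuming the batches are plain tensors):
for inputs, targets in train_loader:
    # Asynchronous copy from pinned host memory to the GPU
    inputs = inputs.to("cuda", non_blocking=True)
    targets = targets.to("cuda", non_blocking=True)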
Layer Freezing
Freezing early layers reduces memory requirements during fine-tuning, since frozen parameters need no gradients or optimizer state:
# Freeze early layers (here assuming the model exposes a feature_extractor
# attribute; the name depends on the architecture)
for param in model.feature_extractor.parameters():
    param.requires_grad = False
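A related trick is to build the optimizer from only the trainable parameters, which guarantees that no optimizer state (e.g. Adam’s moment buffers) is ever allocated for the frozen ones. A sketch:
# Pass only the trainable parameters to the optimizer
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-4)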
These techniques have helped me train larger models on limited GPU resources, avoiding the dreaded “CUDA out of memory” errors.