TIL: PyTorch GPU Memory Management

Today I learned several techniques for effectively managing GPU memory when working with PyTorch models.

Monitoring Memory Usage

First, it’s important to monitor your GPU memory consumption:

import torch

# Current allocation
print(f"Current GPU memory allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")

# Maximum allocation
print(f"Maximum GPU memory allocated: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")

# Memory reserved by the caching allocator (allocated + cached)
print(f"Current GPU memory reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")

Clearing Unused Memory

Strictly speaking, PyTorch does free a tensor's GPU memory once the last reference to it goes away, but its caching allocator keeps the freed blocks reserved for reuse instead of returning them to the driver, so tools like nvidia-smi still report them as in use. You can release the cached blocks manually:

# Return cached, unused blocks to the driver
torch.cuda.empty_cache()

# To free a specific tensor, drop its last reference first, then empty the cache
del tensor
torch.cuda.empty_cache()

Using Mixed Precision Training

Mixed precision training runs most operations in float16 instead of float32, which roughly halves the memory used by most activations and is often faster on GPUs with Tensor Cores:

from torch.cuda.amp import autocast, GradScaler

# Initialize the gradient scaler once, outside the training loop
scaler = GradScaler()

for inputs, targets in train_loader:
    optimizer.zero_grad()

    # Run the forward pass in float16 where it is safe to do so
    with autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    # Scale the loss to avoid float16 gradient underflow, then step
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
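
On Ampere or newer GPUs, autocasting to bfloat16 is a common alternative: it gives the same memory savings but has a wider dynamic range, so gradient scaling is usually unnecessary. A sketch, assuming the same model, loss_fn and optimizer as above:

# bfloat16 keeps float32's exponent range, so no GradScaler is needed
with autocast(dtype=torch.bfloat16):
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)

loss.backward()
optimizer.step()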

Gradient Checkpointing

For very large models, gradient checkpointing trades computation for memory: instead of storing every intermediate activation for the backward pass, checkpointed segments recompute their activations when needed.

# Hugging Face Transformers models expose a convenience helper for this
model.gradient_checkpointing_enable()
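
For plain PyTorch modules there is no one-liner, but you can wrap expensive sub-modules in torch.utils.checkpoint.checkpoint yourself. A minimal sketch with a made-up two-block MLP (the names and sizes are only illustrative; use_reentrant must be passed explicitly on recent PyTorch versions):

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.head = nn.Linear(1024, 10)

    def forward(self, x):
        # Activations inside the checkpointed blocks are not stored;
        # they are recomputed during the backward pass
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return self.head(x)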

Efficient Data Loading

The DataLoader can also cause memory and throughput problems if misconfigured: pin_memory=True keeps batches in page-locked host memory so copies to the GPU are faster (and can be made asynchronous), while num_workers prefetches batches in background processes so the GPU isn't left waiting:

from torch.utils.data import DataLoader

# pin_memory=True stages batches in page-locked host memory for faster GPU transfer
train_loader = DataLoader(
    dataset,
    batch_size=32,
    pin_memory=True,
    num_workers=4,
)
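
Pinned memory pays off when the copy to the GPU is made non-blocking; a short sketch of the transfer step, assuming each batch is an (inputs, targets) pair and the model and loss_fn from earlier:

for inputs, targets in train_loader:
    # non_blocking=True lets the copy overlap with computation on the GPU
    inputs = inputs.cuda(non_blocking=True)
    targets = targets.cuda(non_blocking=True)
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)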

Layer Freezing

Freezing early layers reduces memory during fine-tuning, because frozen parameters need no stored gradients (and, if the optimizer only sees the trainable parameters, no optimizer state either):

# Freeze early layers
for param in model.feature_extractor.parameters():
    param.requires_grad = False
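
To also skip the optimizer state for the frozen layers, build the optimizer over only the parameters that still require gradients. A sketch, assuming AdamW and an illustrative learning rate:

# Only trainable parameters get gradients and optimizer state
# (e.g. AdamW's momentum and variance buffers)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,
)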

These techniques have helped me train larger models on limited GPU resources, avoiding the dreaded “CUDA out of memory” errors.