TIL: Hyperparameter Tuning with Ray Tune

Today I learned how to efficiently optimize hyperparameters for machine learning models using Ray Tune, a powerful library for distributed hyperparameter tuning.

Why Ray Tune?

After struggling with manual hyperparameter tuning and grid search approaches that took days to complete, I discovered Ray Tune, which offers several advantages:

  1. Distributed execution: Parallelizes trials across CPU cores and machines
  2. Early stopping: Automatically terminates underperforming trials
  3. Advanced search algorithms: Bayesian optimization, HyperBand, and more
  4. Resource management: Efficiently allocates CPU/GPU resources (see the sketch after this list)
  5. Integration: Works with PyTorch, TensorFlow, scikit-learn, and more
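
To give a sense of points 1 and 4, here's a minimal sketch of how trials get mapped onto hardware via tune.run's resources_per_trial argument (train_fn is a hypothetical training function and the numbers are just illustrative):

from ray import tune

# Each trial is allocated 2 CPUs and half a GPU; Ray runs as many
# trials in parallel as the available resources allow.
analysis = tune.run(
    train_fn,                                    # hypothetical trainable
    num_samples=20,
    resources_per_trial={'cpu': 2, 'gpu': 0.5}
)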

Key Components

Search Space Definition

Ray Tune supports various sampling methods for defining hyperparameter search spaces:

from ray import tune

search_space = {
    'lr': tune.loguniform(1e-4, 1e-1),                 # Log-uniform distribution
    'hidden_units': tune.choice([64, 128, 256, 512]),  # Discrete choices
    'dropout': tune.uniform(0.1, 0.5),                 # Uniform distribution
    'activation': tune.grid_search(['relu', 'tanh'])   # Grid search these values
}

Search Algorithms

Ray Tune integrates with various optimization libraries:

  1. Bayesian Optimization (via HyperOpt): Builds a probabilistic model of the objective function (sketched after this list)
  2. Population-Based Training: Evolves a population of models via genetic-algorithm principles; in Ray Tune it's exposed as a scheduler (sketched in the next section)
  3. HyperBand/ASHA: Efficiently allocates resources to promising configurations
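
As a concrete example of the first option, here's a sketch of plugging a HyperOpt-backed search into tune.run. It assumes the hyperopt package is installed and uses the Ray 1.x-era import path (newer releases moved it to ray.tune.search.hyperopt); objective and search_space are the ones defined elsewhere in this post:

from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

# HyperOpt's Tree-structured Parzen Estimator models the objective and
# proposes configurations likely to improve on the trials seen so far.
search_alg = HyperOptSearch(metric='loss', mode='min')

# Note: search algorithms can't expand tune.grid_search entries, so the
# 'activation' key in search_space would need tune.choice here instead.
analysis = tune.run(
    objective,
    config=search_space,
    search_alg=search_alg,
    num_samples=50
)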

Schedulers for Early Stopping

Schedulers determine which trials should be terminated early:

from ray.tune.schedulers import ASHAScheduler

scheduler = ASHAScheduler(
    metric='val_loss',
    mode='min',
    max_t=100,           # Maximum number of training iterations
    grace_period=10,     # Minimum iterations before a trial can be stopped
    reduction_factor=2   # At each rung, only ~1/2 of trials are promoted
)
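
Population-Based Training, mentioned in the previous section, is also exposed as a scheduler. A minimal sketch, with purely illustrative mutation values:

from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr='training_iteration',
    metric='val_loss',
    mode='min',
    perturbation_interval=5,             # consider exploit/explore every 5 iterations
    hyperparam_mutations={
        'lr': [1e-4, 1e-3, 1e-2, 1e-1],  # values underperforming trials can mutate to
        'dropout': [0.1, 0.3, 0.5]
    }
)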

Complete Example

Here’s how I implemented a complete hyperparameter tuning workflow:

import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Define the objective function
def objective(config):
    # Create and train model with the hyperparameters in config
    model = create_model(
        learning_rate=config['lr'],
        hidden_units=config['hidden_units'],
        dropout=config['dropout'],
        activation=config['activation']
    )

    # Train and evaluate
    for epoch in range(10):
        train_loss = train_epoch(model)
        val_loss = validate(model)

        # Report metrics to Ray Tune
        tune.report(loss=val_loss, training_loss=train_loss)

# Run hyperparameter search
result = tune.run(
    objective,
    config=search_space,   # the search space defined above
    scheduler=ASHAScheduler(metric='loss', mode='min'),
    num_samples=50         # sampled configs (doubled by the 2-value activation grid)
)

# Get best configuration
best_config = result.get_best_config(metric='loss', mode='min')
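
tune.run returns an analysis object, so there's more to look at than just the best config. A quick sketch of how I inspect the results, assuming the legacy ExperimentAnalysis API that matches the code above:

# Best trial and the metrics it last reported
best_trial = result.get_best_trial(metric='loss', mode='min')
print(best_trial.last_result['loss'])

# One row per trial, with hyperparameters flattened into 'config/...' columns
df = result.dataframe(metric='loss', mode='min')
print(df[['config/lr', 'config/hidden_units', 'loss']].sort_values('loss').head())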

Using Ray Tune reduced my hyperparameter optimization time from days to hours while finding better configurations than my manual tuning efforts. The early stopping feature alone saved approximately 70% of computation time by terminating unpromising trials.