
PyTorch Lightning Hyperparameter Optimization with Optuna
Image by Author | Ideogram
PyTorch Lightning emerged in recent years as a high-level alternative to the classical PyTorch library for deep learning modeling. It simplifies the process of training, validating, and deploying models. When it comes to hyperparameter optimization, that is, the process of finding the set of model hyperparameters that maximizes performance on a given task, Optuna is a useful tool to pair with PyTorch Lightning, thanks to its seamless integration and the efficient search algorithms it provides for finding the best setting for your model among a huge number of possible configurations.
This article shows how to use PyTorch Lightning and Optuna together to guide the hyperparameter optimization process for a deep learning model. A basic knowledge of how to build and train neural networks in practice, ideally with PyTorch, is recommended.
Step-by-Step Process
The process begins by installing and importing a series of necessary libraries and modules, including PyTorch Lightning and Optuna. The initial installation may take some time to complete.
pip install pytorch_lightning
pip install optuna
pip install optuna-integration[pytorch_lightning]
Now, the necessary imports:
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
from pytorch_lightning.loggers import TensorBoardLogger

import optuna
from optuna.integration import PyTorchLightningPruningCallback
When building neural network models with PyTorch Lightning, it is common practice to set a random seed for reproducibility. You can do this by adding pl.seed_everything(42) at the beginning of your code, right after the imports.
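For example, a minimal setup could look like this (the workers=True flag, which also seeds DataLoader worker processes, is an optional addition not mentioned above):

# Seed Python, NumPy, and PyTorch random number generators for reproducibility
pl.seed_everything(42, workers=True)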
Next, we define our neural network architecture by creating a class that inherits from pl.LightningModule, Lightning's counterpart to PyTorch's nn.Module class.
class MNISTClassifier(pl.LightningModule):
    def __init__(self, layer_1_size=128, layer_2_size=256, learning_rate=1e-3, dropout_rate=0.5):
        super().__init__()
        self.save_hyperparameters()

        # Neural network architecture
        self.layer_1 = nn.Linear(28 * 28, self.hparams.layer_1_size)
        self.layer_2 = nn.Linear(self.hparams.layer_1_size, self.hparams.layer_2_size)
        self.layer_3 = nn.Linear(self.hparams.layer_2_size, 10)
        self.dropout = nn.Dropout(self.hparams.dropout_rate)

    def forward(self, x):
        # Flatten the input images
        batch_size, _, _, _ = x.size()
        x = x.view(batch_size, -1)

        # Forward pass
        x = F.relu(self.layer_1(x))
        x = self.dropout(x)
        x = F.relu(self.layer_2(x))
        x = self.dropout(x)
        x = self.layer_3(x)

        return F.log_softmax(x, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        self.log('val_loss', loss, prog_bar=True)
        self.log('val_acc', acc, prog_bar=True)
        return {'val_loss': loss, 'val_acc': acc}

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        self.log('test_loss', loss, prog_bar=True)
        self.log('test_acc', acc, prog_bar=True)
        return {'test_loss': loss, 'test_acc': acc}

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)
        return optimizer
We named our class MNISTClassifier because it is a simple feed-forward neural network classifier that we will train on the MNIST dataset for low-resolution image classification. It consists of a small number of linear layers, preceded by an input layer that flattens the two-dimensional image data, with ReLU activations in between. The class also defines the forward() method for forward passes, along with methods implementing the training, validation, and test steps.
Outside the newly defined class, the following function will come in handy to calculate the mean accuracy across a set of predictions:
def accuracy(preds, y):
    return (preds == y).float().mean()
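As a side note, if the torchmetrics package happens to be installed in your environment (an assumption, since it is not among the installs above), its built-in metric could serve the same purpose. A minimal sketch:

# Sketch assuming torchmetrics is installed; an alternative to the manual helper above
from torchmetrics.functional import accuracy as tm_accuracy

def accuracy(preds, y):
    # Multiclass accuracy over the 10 MNIST digit classes
    return tm_accuracy(preds, y, task="multiclass", num_classes=10)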
The following function establishes a data preparation pipeline that we will later apply to the MNIST dataset. It converts the dataset to tensors, normalizes it based on a priori known statistics of this dataset, downloads the data, splits the original training data into training and validation sets, and creates one DataLoader object for each of the three data subsets.
def prepare_data():
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))  # Mean and stdev of the MNIST dataset
    ])

    # Download the dataset
    mnist_train = datasets.MNIST('data', train=True, download=True, transform=transform)
    mnist_test = datasets.MNIST('data', train=False, download=True, transform=transform)

    # Split the original training data into training and validation sets
    mnist_train, mnist_val = random_split(mnist_train, [55000, 5000])

    # Create DataLoaders for PyTorch data management
    train_loader = DataLoader(mnist_train, batch_size=64, shuffle=True)
    val_loader = DataLoader(mnist_val, batch_size=64)
    test_loader = DataLoader(mnist_test, batch_size=64)

    return train_loader, val_loader, test_loader
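As a quick, purely illustrative sanity check (not part of the original workflow), you can confirm that the loaders yield batches of the expected shape:

# Illustrative check of the data pipeline
train_loader, val_loader, test_loader = prepare_data()
images, labels = next(iter(train_loader))
print(images.shape)  # expected: torch.Size([64, 1, 28, 28])
print(labels.shape)  # expected: torch.Size([64])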
The objective() function is the core element Optuna provides for defining a hyperparameter optimization job. We define it by first declaring a search space: the hyperparameters to tune and the ranges of values to try for them. These can be architectural hyperparameters related to the layers of the neural network, or hyperparameters related to the training algorithm, such as the learning rate and the dropout rate used to fight issues like overfitting.
Inside this function, we initialize and train a model for each candidate hyperparameter setting, with an early stopping callback included in case the model stops improving early. As is standard in PyTorch Lightning, a Trainer object is used to run the training process.
# Define the objective function for Optuna
def objective(trial):
    # Set the hyperparameters to optimize
    layer_1_size = trial.suggest_int('layer_1_size', 64, 256)
    layer_2_size = trial.suggest_int('layer_2_size', 128, 512)
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True)
    dropout_rate = trial.suggest_float('dropout_rate', 0.2, 0.7)

    # Create the model with the trial hyperparameters
    model = MNISTClassifier(
        layer_1_size=layer_1_size,
        layer_2_size=layer_2_size,
        learning_rate=learning_rate,
        dropout_rate=dropout_rate
    )

    # Early stopping callback
    early_stop_callback = EarlyStopping(
        monitor='val_loss',
        patience=5,
        verbose=False,
        mode='min'
    )

    # Optuna pruning callback
    pruning_callback = PyTorchLightningPruningCallback(trial, monitor='val_loss')

    # Logger
    logger = TensorBoardLogger(save_dir=os.getcwd(), name=f"optuna_logs/trial_{trial.number}")

    # Create the trainer
    trainer = pl.Trainer(
        max_epochs=10,
        callbacks=[early_stop_callback, pruning_callback],
        logger=logger,
        enable_progress_bar=False,
        enable_model_summary=False
    )

    # Prepare the data
    train_loader, val_loader, test_loader = prepare_data()

    # Train the model
    trainer.fit(model, train_loader, val_loader)

    # Return the final validation loss
    return trainer.callback_metrics['val_loss'].item()
The next function, run_optimization(), governs the overall optimization process: it creates an Optuna study, sets the number of trials to run, and optimizes the objective function defined above.
def run_optimization(n_trials=20):
    pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=10)
    study = optuna.create_study(direction='minimize', pruner=pruner)
    study.optimize(objective, n_trials=n_trials)

    print("Best trial:")
    trial = study.best_trial
    print(f"  Value: {trial.value}")
    print("  Params: ")
    for key, value in trial.params.items():
        print(f"    {key}: {value}")

    return study
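Optionally, an Optuna study can be persisted to a local database so that an interrupted optimization can be resumed later. The following is a sketch under the assumption that a local SQLite file is acceptable; the study and file names are arbitrary:

# Sketch: persist the study to a local SQLite file so it can be resumed
pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=10)
study = optuna.create_study(
    study_name='mnist_lightning_tuning',   # arbitrary name (assumption)
    storage='sqlite:///mnist_optuna.db',   # local SQLite file (assumption)
    load_if_exists=True,                   # reuse the study if it already exists
    direction='minimize',
    pruner=pruner
)
study.optimize(objective, n_trials=20)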
Once the hyperparameter optimization process has been completed and the best model configuration has been identified, another function is needed to take these results and evaluate that model's performance on a test set for final validation.
def test_best_model(study):
    # Get the best hyperparameters
    best_params = study.best_trial.params

    # Create the model with the best hyperparameters
    model = MNISTClassifier(
        layer_1_size=best_params['layer_1_size'],
        layer_2_size=best_params['layer_2_size'],
        learning_rate=best_params['learning_rate'],
        dropout_rate=best_params['dropout_rate']
    )

    # Create a trainer instance
    trainer = pl.Trainer(max_epochs=10)

    # Prepare the data
    train_loader, val_loader, test_loader = prepare_data()

    # Retrain the model with the best hyperparameters
    trainer.fit(model, train_loader, val_loader)

    # Evaluate the model on the test data
    results = trainer.test(model, test_loader)
    return results
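If you also want to keep the retrained model, Lightning can save it as a checkpoint and restore it later; since save_hyperparameters() was called in __init__, the hyperparameters are restored as well. A minimal sketch (the file name is an arbitrary choice) that could be appended inside test_best_model():

# Sketch: persist and reload the retrained best model (file name is arbitrary)
trainer.save_checkpoint("best_mnist_classifier.ckpt")

restored_model = MNISTClassifier.load_from_checkpoint("best_mnist_classifier.ckpt")
restored_model.eval()  # switch to evaluation mode for inference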
Now that we’ve got all of the lessons and features we want, we finalize with a demo that places all of it collectively.
study = run_optimization(n_trials=5)

# Visualize the results
try:
    # Plot the optimization history
    optuna.visualization.plot_optimization_history(study)

    # Plot parameter importances
    optuna.visualization.plot_param_importances(study)

    # Plot a parallel coordinate plot
    optuna.visualization.plot_parallel_coordinate(study)
except ImportError:
    print("Visualization requires plotly. Install with: pip install plotly")

# Test the best model
results = test_best_model(study)
print(f"Test results with best hyperparameters: {results}")
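Keep in mind that the optuna.visualization.plot_* functions return plotly figures, which render automatically in notebooks but not in plain scripts; there you would display or save them explicitly. A small sketch with arbitrary output file names:

# Sketch: export the Optuna plots to HTML files (file names are arbitrary)
fig = optuna.visualization.plot_optimization_history(study)
fig.write_html("optimization_history.html")

fig = optuna.visualization.plot_param_importances(study)
fig.write_html("param_importances.html")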
Here’s a TL;DR breakdown of the workflow:
- Run a study (a set of Optuna experiments), specifying the number of trials
- Produce a series of visualizations of the optimization process alongside the training procedures
- Once the best model is found, expose it to the test data to further evaluate it
Wrapping Up
This article illustrated how to use PyTorch Lightning and Optuna together to perform efficient and effective hyperparameter optimization for neural network models. Optuna provides advanced algorithms for model tuning, while PyTorch Lightning builds on top of PyTorch to further simplify neural network modeling at a higher level.