
Mastering Hyperparameter Tuning with TensorFlow and GCP | ML Study Jams Day 5

TensorFlow User Group Islamabad
4 min read · Jul 29, 2024


Function Approximation in ML

Function approximation is a key concept in machine learning. It refers to the process of finding a function f that approximates a target function f* as closely as possible. The target function f* represents the true relationship between inputs and outputs in the data.

The error between the actual output and the predicted output for a single instance is called the loss function. Aggregating the loss over all training instances gives the cost function (hence the statement below).

The Goal Is To Lower This Cost Function
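To make the loss/cost distinction concrete, here is a minimal pure-Python sketch. It assumes a squared-error loss; the function names are illustrative, not from any library:

```python
# Squared-error loss for a single instance: how far one prediction is from its target.
def loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

# Cost: the average loss over every instance in the dataset.
def cost(y_true_all, y_pred_all):
    return sum(loss(t, p) for t, p in zip(y_true_all, y_pred_all)) / len(y_true_all)

targets = [1.0, 2.0, 3.0]
preds = [1.5, 2.0, 2.0]
print(cost(targets, preds))  # (0.25 + 0.0 + 1.0) / 3 ≈ 0.4167
```

Training then amounts to adjusting the model so this single number goes down.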

Main Components Of Machine Learning

  1. Optimization Algorithm
  2. Parameters
  3. Hyperparameters

Optimization Algorithm

An optimization algorithm in machine learning is a method used to adjust the parameters of a model in order to minimize (or maximize) a predefined objective function, typically a loss function. The purpose of optimization in machine learning is to find the best possible parameters that make the model perform well on given tasks.

Common Optimization Algorithms

Gradient Descent: The most widely used optimization algorithm. It updates the parameters in the direction opposite to the gradient of the objective function with respect to those parameters.

For Complete Understanding: Gradient-Descent
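As a quick sketch of the update rule, here is gradient descent minimizing a toy function f(w) = (w − 3)², chosen only because its gradient is easy to write by hand:

```python
# Gradient of f(w) = (w - 3)^2 is 2 * (w - 3); the minimum is at w = 3.
def gradient(w):
    return 2 * (w - 3)

w = 0.0   # initial parameter guess
lr = 0.1  # learning rate (step size)
for _ in range(100):
    # Step opposite the gradient to reduce f.
    w -= lr * gradient(w)

print(round(w, 4))  # converges to 3.0
```

Each iteration shrinks the distance to the minimum by a constant factor here; real models apply the same rule to millions of parameters at once.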

Stochastic Gradient Descent (SGD): A variation of gradient descent that updates the parameters using only one or a few training examples at a time. This makes the updates noisy but allows the algorithm to potentially escape local minima and converge faster.

For Complete Understanding: SGD

Mini-batch Gradient Descent: A compromise between batch gradient descent and SGD. It updates the parameters using a small batch of training examples.

For Complete Understanding: Mini-batch
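The three variants differ only in how many examples feed each update. A minimal sketch, fitting a slope w for y ≈ w·x on made-up data: with batch_size=1 this is plain SGD, and with batch_size=len(data) it is full-batch gradient descent.

```python
import random

random.seed(0)
# Synthetic data lying exactly on y = 2x, so the true slope is 2.0.
data = [(x, 2.0 * x) for x in range(1, 21)]

w, lr, batch_size = 0.0, 0.001, 4
for epoch in range(50):
    random.shuffle(data)  # fresh mini-batches each epoch
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Gradient of mean squared error with respect to w over this batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad

print(round(w, 3))  # ≈ 2.0, the true slope
```

Smaller batches give noisier but cheaper updates; larger batches give smoother but more expensive ones.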

Parameters

Parameters are the internal variables of a model that are learned from the training data. These parameters are crucial as they define the model’s behavior and influence its predictions. They are adjusted during the training process to minimize the difference between the predicted outputs and the actual outputs, usually by minimizing a loss function.

Hyperparameters

Hyperparameters, unlike parameters, are configuration settings you choose before training rather than values learned from the data. Just like you might adjust the temperature, cooking time, or seasoning levels to achieve the best flavor in a recipe, you adjust hyperparameters to get the best performance from your machine learning model.

Some Examples

Regularization Parameters (λ): Parameters like L1 or L2 regularization that prevent overfitting by penalizing large weights.

Dropout Rate: The fraction of neurons to drop during training to prevent overfitting in neural networks.

Learning Rate (η): Controls how much to change the model’s parameters with respect to the loss gradient.
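To see where two of these hyperparameters enter the math, here is one L2-regularized gradient step in pure Python. The function name is illustrative; eta is the learning rate η and lam is the regularization strength λ:

```python
# One gradient step with L2 regularization: eta scales the step size,
# and lam adds a pull toward smaller weights (weight decay).
def update(w, grad, eta=0.1, lam=0.01):
    return w - eta * (grad + lam * w)

w = 5.0
w = update(w, grad=2.0)  # 5.0 - 0.1 * (2.0 + 0.01 * 5.0) = 4.795
print(w)
```

Neither η nor λ is learned by the model; both must be chosen (and tuned) by you.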

Problems With Hyperparameters

▸Choosing the right hyperparameters can be complex and time-consuming due to the large number of possible combinations.

▸Different models and datasets require different hyperparameter values, and there is no one-size-fits-all solution.

▸Hyperparameter tuning, especially with methods like grid search or random search, can be computationally expensive.

▸Hyperparameters tuned for one dataset might not generalize well to another dataset.

Solutions

1. Automated Hyperparameter Tuning

  • Grid Search: Systematically tries all possible combinations within a predefined set of hyperparameters. While exhaustive, it’s computationally expensive.
  • Random Search: Randomly samples hyperparameters within a specified range. Often more efficient than grid search in high-dimensional spaces.

  • Bayesian Optimization: Uses a probabilistic model to find the best hyperparameters by learning from previous evaluations. Libraries like Hyperopt, Optuna, and Scikit-optimize implement this approach.
  • Automated Machine Learning (AutoML): Platforms like AutoKeras, TPOT, and Google Cloud AutoML automatically search for the best model and hyperparameters.
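Grid search and random search can be sketched in a few lines of pure Python. The score function below is a toy stand-in for validation accuracy (in practice it would train and evaluate a real model), and the best point is placed at lr=0.01, dropout=0.3 purely for illustration:

```python
import itertools
import random

# Toy objective: pretend validation score as a function of two hyperparameters.
def score(lr, dropout):
    return -((lr - 0.01) ** 2 + (dropout - 0.3) ** 2)

grid = {"lr": [0.001, 0.01, 0.1], "dropout": [0.1, 0.3, 0.5]}

# Grid search: evaluate every combination (3 x 3 = 9 runs here).
best_grid = max(itertools.product(grid["lr"], grid["dropout"]),
                key=lambda c: score(*c))

# Random search: sample a fixed budget of combinations instead.
random.seed(42)
samples = [(random.choice(grid["lr"]), random.choice(grid["dropout"]))
           for _ in range(5)]
best_rand = max(samples, key=lambda c: score(*c))

print(best_grid)  # (0.01, 0.3)
```

Notice the budgets: grid search cost grows multiplicatively with each new hyperparameter, while random search lets you cap the number of runs.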

2. Cross-Validation

Cross-validation helps in assessing how the model generalizes to an independent dataset and is useful for comparing the performance of different hyperparameter settings. Common strategies include:

  • k-Fold Cross-Validation: Splits the data into k subsets and trains the model k times, each time using a different subset as the validation set and the remaining data as the training set.
  • Stratified k-Fold Cross-Validation: Ensures that each fold has a representative proportion of classes, useful for imbalanced datasets.
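A minimal sketch of the k-fold split itself (indices only, no shuffling or stratification; helper name is illustrative):

```python
# Split n_samples indices into k folds; each fold serves once as the validation set.
def k_fold(n_samples, k):
    indices = list(range(n_samples))
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        # Last fold absorbs any remainder so every sample is used exactly once.
        end = n_samples if i == k - 1 else start + fold_size
        val = indices[start:end]
        train = indices[:start] + indices[end:]
        folds.append((train, val))
    return folds

for train_idx, val_idx in k_fold(10, 5):
    print(val_idx)  # [0, 1], [2, 3], [4, 5], [6, 7], [8, 9]
```

You would train once per fold with a given hyperparameter setting and average the k validation scores before comparing settings.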

Resources:

An Intuitive Understanding of the Basics: Medium (Recommended)

Kaggle: Hyperparameter-Notebook (Python Code For Everything Discussed)

TensorFlow Tutorial: Intro to Keras Tuner

Session Code: GitHub-Repo

Scikit-Learn Documentation: Hyperparameters

TensorFlow Documentation: Automated Hyperparameter

About Us

TFUG Islamabad is the premier community for TensorFlow enthusiasts. Our group is dedicated to bringing together professionals, researchers, and enthusiasts from diverse backgrounds who share a common interest in AI.

References:

🎙️ Featured Speaker: Imran us Salam

📽️ Video Link: Watch the session here

Written By:

Muhammad Ali


TensorFlow User Group Islamabad

• A hub for TensorFlow enthusiasts • 1st Pakistani community to win Google AI/ML Award 1H2024 • 20+ events since Mar 2024 • 3000+ YouTube views • 1800+ members