Developing machine learning models to solve business problems involves trying different ML models to find the one that fits the problem best, as well as different model architectures specific to the selected model. In this article, we will explore the process of selecting hyperparameters, the parameters that define the architecture of a model.
What are model hyperparameters?
In machine learning, a hyperparameter refers to a parameter that is not learned from the training data but is set by the practitioner before the training process.
Model hyperparameters and model parameters are sometimes used interchangeably but they are not the same:
- Model parameters are the properties of the training data that are learned by the ML model during training. So, parameters are internal to the model. For instance, the coefficients in regression or the weights and biases in a neural network are model parameters since they are estimated through the training process.
- Model hyperparameters are configurations of the model that are set before training and determine how the training process runs. Hyperparameters are external to the model. The “hyper-” prefix implies that they are higher-level parameters that control the learning process. Some examples of hyperparameters include:
- Number of hidden layers in a neural network
- Number of leaves of a decision tree
- Learning rate of a gradient descent
- The ratio between the training set and test set
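The distinction can be made concrete in code. Below is a minimal sketch using scikit-learn (the dataset is a toy example invented for illustration): the hyperparameters are passed in before training, while the parameters only exist after fitting.

```python
# Hyperparameters vs. parameters, illustrated with scikit-learn.
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Hyperparameters: chosen by the practitioner before fit() is called.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)

# Parameters: estimated from the training data during fitting.
X = [[1], [2], [3], [4]]  # toy data where y = 2x
y = [2, 4, 6, 8]
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # learned coefficients, not set by hand
```

Here `max_depth` and `min_samples_leaf` never change during training, whereas `coef_` and `intercept_` are produced by it.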
What is hyperparameter optimization?
Hyperparameter optimization, also called hyperparameter tuning, is the process of searching for a set of hyperparameters that gives the best model results on a given dataset.
Why is hyperparameter optimization important?
Tuning hyperparameters helps machine learning models generalize well. Generalization refers to the ability of the model to perform well on training data as well as on new data. A model fails to generalize due to:
- Overfitting: The model learns the specific patterns of the training dataset so well that it performs poorly on the test dataset. This means that the model is useful only with the training dataset and does not generalize to new data.
- Underfitting: The model performs poorly both on training data and test data.
For instance, in a decision tree model, the maximum number of splits a tree can make before making predictions (the depth of the tree) is a hyperparameter.
- If the depth is too high, the model will yield a large number of categories that are specific to the training set (overfitting).
- If the depth is too low, the model will classify the data into a small number of broad categories that are not useful (underfitting).
- The optimal depth can be found by tuning the depth hyperparameter: trying different depths, running the model with each, and comparing the results.
What are hyperparameter optimization techniques?
Apart from manual trial and error, there are three main methods to find the optimal set of values for hyperparameters:
- Grid search means defining a set of candidate values for each hyperparameter, running the model with every possible combination of these values, and picking the combination that produces the best results. Grid search involves guesswork since the values to be tried are set manually by the practitioner.
- Random search involves picking random combinations of hyperparameter values from given statistical distributions to find an optimal set of values. The advantage of random search over grid search is that it allows searching in a wider range of values without increasing the number of trials.
- Bayesian search is a sequential method that uses the results of the previous sets of hyperparameters to improve the next search process. Bayesian search reduces the optimization time, especially for models trained on a large amount of data.
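The first two techniques can be contrasted in a short sketch, assuming scikit-learn and a synthetic dataset invented for illustration: grid search exhaustively tries every listed combination, while random search samples a fixed number of combinations from wider distributions.

```python
# Grid search vs. random search with scikit-learn.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Grid search: all 3 x 3 = 9 combinations of the listed values are tried.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]},
    cv=3,
).fit(X, y)

# Random search: also 9 trials, but sampled from much wider ranges,
# so more of the search space is covered for the same budget.
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions={"max_depth": randint(1, 20),
                         "min_samples_leaf": randint(1, 20)},
    n_iter=9, cv=3, random_state=0,
).fit(X, y)

print(grid.best_params_, rand.best_params_)
```

Both searches run the same number of model fits here, but random search explores values the grid never lists, which is the advantage noted above.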
What are the tools for hyperparameter optimization?
There are both open source and commercial tools for hyperparameter tuning.
MLOps platforms that provide end-to-end machine learning lifecycle management also include tools for hyperparameter optimization. You can check our article on MLOps tools both for MLOps platforms and for tools that carry out specific tasks within MLOps practices. Also, feel free to check our sortable/filterable list of MLOps platforms.
If you have questions about hyperparameter optimization and its tools, we can help.
Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 60% of Fortune 500 every month.
Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE, NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and media that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised businesses on their enterprise software, automation, cloud, AI / ML and other technology related decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.