Automated machine learning (AutoML) has the potential to significantly increase the productivity of data scientists and to democratize machine learning tools. Gartner predicted that more than 40% of data science tasks would be automated by 2020. As the need for data scientists increases, autoML tools and services are becoming more popular, helping companies use machine learning to extract business insights effectively and at scale. AutoML can be a powerful answer to the well-documented scarcity of data scientists.
What is automated machine learning?
Automated Machine Learning (AutoML) is an emerging technology that automates manual and repetitive machine learning tasks. Automating these tasks can accelerate processes, reduce errors and costs, and produce more accurate results, because it lets businesses compare algorithms and select the best-performing one. Here is Wikipedia’s definition of autoML:
Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.
Which machine learning processes to automate?
AutoML services aim to automate some or all steps of the machine learning process, which include:
- Data pre-processing: Improving data quality and converting unstructured, raw data into a structured format using methods such as data cleaning, data integration, data transformation, and data reduction.
- Feature engineering: Creating features that are more compatible with machine learning algorithms by analyzing the input data.
- Feature extraction: Combining different features or datasets to generate new features that enable more accurate results and reduce the size of the data being processed.
- Feature selection: Keeping only the features that are useful for the model.
- Algorithm selection & hyperparameter optimization: Choosing well-performing algorithms and hyperparameters without human intervention.
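To make the feature selection step a little more concrete, here is a minimal sketch in plain Python. It is not how any particular autoML product works. It simply ranks features by the absolute value of their Pearson correlation with the target and keeps the top ones. The function names (`pearson`, `select_features`) and the toy data are assumptions made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, top_k):
    """Rank named feature columns by |correlation| with the target, keep top_k."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Toy data, invented purely for illustration.
columns = {
    "hours_active": [1, 2, 3, 4, 5, 6],
    "constant_flag": [1, 1, 1, 1, 1, 1],
    "noise": [3, 1, 3, 1, 3, 1],
}
target = [2, 4, 6, 8, 10, 12]

selected, scores = select_features(columns, target, top_k=1)
print(selected, scores)
```

Correlation is only one possible scoring rule, and it misses non-linear relationships. Real tools typically combine several criteria, so treat this as a sketch of the idea rather than a recipe.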
Because the accuracy of a machine learning solution can be measured, automated systems can tune the data, features, algorithms and hyperparameters to generate accurate models, relying on established machine learning knowledge and trial and error.
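The trial-and-error idea can be sketched as a simple search loop: train every candidate in a search space, measure each one on held-out data, and keep the best. The sketch below is a simplified illustration in plain Python, not the method of any specific vendor. The two toy learners, the `search_space` list, the `auto_select` function and the data are all hypothetical.

```python
from collections import Counter


def majority_baseline(train, _params):
    """Always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def knn(train, params):
    """k-nearest-neighbours classifier on a single numeric feature."""
    k = params["k"]

    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]

    return predict


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Search space: (name, trainer, hyperparameters). Entirely illustrative.
search_space = [("majority", majority_baseline, {})]
search_space += [(f"knn(k={k})", knn, {"k": k}) for k in (1, 3, 5)]


def auto_select(train, valid):
    """Train every candidate and return the one with the best validation accuracy."""
    results = []
    for name, trainer, params in search_space:
        model = trainer(train, params)
        results.append((accuracy(model, valid), name))
    best_score, best_name = max(results, key=lambda r: r[0])
    return best_name, best_score, results


# Toy data: the label is 1 when the feature is above roughly 5.
train = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
valid = [(0, 0), (2.5, 0), (4.4, 0), (5.6, 1), (8.5, 1), (11, 1)]

best_name, best_score, results = auto_select(train, valid)
print(best_name, best_score)
```

Exhaustively trying every candidate works only for small search spaces. Production tools usually rely on smarter search strategies and on cross-validation instead of a single validation split.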
Please see the image below by DataRobot, a leading autoML vendor, where areas highlighted in gray illustrate which parts of the machine learning process are automated via autoML.
Why is it important now?
Need for more data scientists
As data science becomes a more integrated part of our lives, businesses need more solutions in this field and demand more data scientists to build these solutions. Without data science methods, companies might be unable to understand their processes, monitor performance levels, or take certain actions to prevent huge losses.
A 2017 IBM report stated that the demand for data scientists would increase by 28% by 2020. The same report indicated that it takes 43-51 days on average to fill a data scientist position. Considering the scarcity of data scientists and the time it takes to build data science solutions, autoML solutions can help businesses meet their demand for data science work.
Errors in applying machine learning algorithms
It is up to data scientists to implement machine learning algorithms and choose the method that works best for the business case. However, the implementation process is prone to human error and bias. AutoML tools can automate this process and run a broader set of machine learning algorithms, including ones the data scientists might not have considered, to select the best one.
Today, Facebook trains around 300,000 machine learning models to improve its machine learning processes and has even created an AutoML engineer, named Asimo, that automatically generates improved versions of existing models.
Because these capabilities accelerate machine learning processes, autoML solutions can improve the return on investment (ROI) of machine learning projects.
Have we reached peak autoML?
When we look at the interest in autoML, we observe an increasing trend since the beginning of 2017. The advances in AI algorithms and the increasing popularity of automation technologies might be the reasons for this growth.
Having become popular in only a few years, the autoML market generated $270 million in revenue in 2019 and is expected to reach $14,512 million by 2030, advancing at a CAGR of 43.7% during the forecast period (2020–2030). Given this, we believe autoML has not reached its peak and that interest in autoML will continue to grow.
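As a quick sanity check on these figures, the compound annual growth rate (CAGR) can be recomputed from the two revenue numbers. The sketch below assumes 11 compounding periods (2019 to 2030). That period count is our assumption, since the source only states the forecast window, and the `cagr` helper is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


start_2019 = 270.0      # market revenue in 2019, in $ millions (from the text)
end_2030 = 14_512.0     # forecast revenue in 2030, in $ millions (from the text)
years = 2030 - 2019     # assumption: 11 compounding periods

rate = cagr(start_2019, end_2030, years)

# Going the other way: compound the quoted 43.7% rate forward from 2019.
projected = start_2019 * (1 + 0.437) ** years

print(f"implied CAGR: {rate:.2%}")
print(f"projected 2030 revenue at 43.7%: ${projected:,.0f} million")
```

Under that assumption the implied rate lands very close to the quoted 43.7%, and compounding 43.7% forward gives a 2030 figure within about half a percent of $14,512 million, so the numbers are internally consistent.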
What are the benefits of autoML?
- Cost reductions
- Increased productivity for data scientists
- Democratization of machine learning reduces demand for data scientists
- Increased revenues and customer satisfaction
- Rolling out more models with increased accuracy can improve other, less tangible business results as well. For example, models lead to automation, which improves employee engagement by allowing employees to focus on more interesting tasks.
Why do we still rely so much on data scientists when there are autoML approaches?
Data scientists have two advantages in model building compared to current autoML approaches:
- Conformance to custom specifications: Most autoML tools optimize for model performance. However, that is just one of the specifications of real-life machine learning projects. For example:
  - If a model needs to be embedded in edge devices, computing and storage requirements force companies to choose simpler models.
  - If explainability is desirable, only certain types of models can be used.
- Model performance: On Kaggle, the machine learning competition community, humans are still easily beating models generated by autoML tools. AutoML tools have yet to win any data science competitions.
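Returning to the first point, conformance to custom specifications amounts to choosing the best model among those that satisfy the project's constraints, not the best model overall. The following minimal sketch shows that idea. The `Candidate` class, the `pick_model` function, the model names and every number in it are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float     # validation accuracy (made-up value)
    size_mb: float      # model size on disk (made-up value)
    explainable: bool   # whether stakeholders accept it as explainable (assumption)


def pick_model(candidates, max_size_mb=None, require_explainable=False):
    """Return the most accurate candidate that satisfies the constraints, or None."""
    feasible = [
        c for c in candidates
        if (max_size_mb is None or c.size_mb <= max_size_mb)
        and (not require_explainable or c.explainable)
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


# Hypothetical candidates, e.g. the output of an automated search.
candidates = [
    Candidate("large_ensemble", 0.94, 850.0, False),
    Candidate("small_tree_ensemble", 0.92, 40.0, False),
    Candidate("linear_model", 0.88, 0.5, True),
]

unconstrained = pick_model(candidates)
edge_device = pick_model(candidates, max_size_mb=50)
must_explain = pick_model(candidates, require_explainable=True)

print(unconstrained.name, edge_device.name, must_explain.name)
```

With these invented numbers, each constraint changes which model wins, which is the point: a tool that optimizes accuracy alone would always return the first candidate.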
Over time, it is likely that autoML tools will grow stronger and these advantages will diminish or disappear. More importantly, data scientists and their managers are responsible for important tasks beyond modeling:
- Identify the models to be built. The first step in data science is also the most important one. It requires understanding the business, the data accessible through internal and external resources, data quality issues, privacy and computing requirements, and organizational challenges.
- Manage the human aspects of model implementation. They convince subject matter experts and executives of the superiority of the model compared to the current solution. They explain possible shortcomings of the model and take steps to overcome those shortcomings.
These are more fundamental strengths of data scientists and solving these problems will not come into the realm of machines for quite some time. However, that does not mean that data scientists are uniquely qualified to handle these challenges.
We expect the basics of data science to become as much common knowledge as the basics of statistics are today. While not everyone is familiar with advanced statistics, critical concepts like distributions, variance and mean are common knowledge and inform corporate decisions. Just as Excel democratized data storage and manipulation and augmented all white-collar workers, autoML tools have the potential to democratize data science for companies.
Which autoML tool should we start with?
We analyzed the ecosystem of autoML providers in this comprehensive article. You can also see our sortable list of the most recent AutoML vendors on our website. Here are some of the leading vendors:
- Google Cloud AutoML
If you are interested, feel free to read our AutoML case studies article. AutoML is an important part of the future of AI. For more on the trends shaping AI, feel free to read our research on the future of AI.
Featured image source: gooddata.com