AIMultiple Research

What is Model Drift? Types & 4 Ways to Overcome in 2024

Cem Dilmegani
Updated on Jan 11

Business environments change constantly: customer habits shift, economic pressures build, and crises such as the Covid-19 pandemic strike. As a result, the predictive accuracy of deployed machine learning models should be expected to decrease over time.

In this article, we’ll explore why machine learning models degrade over time, or “drift”, and how you can detect and prevent it.

What is model drift?

Model drift, also called model decay, refers to the degradation of machine learning model performance over time. This means that the model suddenly or gradually starts to provide predictions with lower accuracy compared to its performance during the training period.

What are the types of model drift?

There are two main types of model drift:

Concept drift

Concept drift happens when the relationship between input variables (or independent variables) and the target variable (or dependent variable) changes. This means that the definition of what we are trying to predict changes so that our algorithm provides inaccurate predictions. This change can be gradual, sudden, or recurring.

  • Gradual concept drift: The change in fraudulent behavior is an example of gradual concept drift. As fraud detection methods become more sophisticated, fraudsters adapt to evade fraud detection systems by developing new strategies. An ML model trained on historical fraudulent transaction data would be unable to classify a new strategy as fraud. This means that the performance of the model would degrade because what is classified as fraud has changed over time.
  • Sudden concept drift: The Covid-19 pandemic suddenly changed consumer behavior. For instance, consumer spending on recreational durable goods such as home fitness equipment increased by 18%, while spending on transportation services decreased by 23% in 2020. A demand forecasting model trained with pre-pandemic data would not predict these changes in consumer habits.
  • Recurring concept drift: This is also called seasonality. For instance, retail sales increase significantly during the Christmas season or on Black Friday. An ML model that does not take these known recurring trend changes into account would provide inaccurate predictions for these periods.
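To make the idea concrete, here is a minimal sketch of concept drift in the fraud setting described above. Everything in it is a hypothetical illustration rather than a real fraud model: the single "amount" feature, the cut-off values 50 and 30, and the helper names `fit_threshold` and `accuracy` are all assumptions. The inputs stay exactly the same; only the relationship between input and label changes, and accuracy drops as a result.

```python
def accuracy(threshold, xs, ys):
    """Share of examples where the rule 'x >= threshold' matches the label."""
    correct = sum((x >= threshold) == y for x, y in zip(xs, ys))
    return correct / len(xs)


def fit_threshold(xs, ys):
    """Pick the cut-off on x that maximizes accuracy on the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = accuracy(t, xs, ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Hypothetical feature: transaction amount bucketed into 0..99.
xs = list(range(100))

# Old concept (assumed): amounts of 50 and above are fraudulent.
old_labels = [x >= 50 for x in xs]

# New concept (assumed): fraudsters adapted, so amounts of 30 and above are fraudulent.
new_labels = [x >= 30 for x in xs]

threshold = fit_threshold(xs, old_labels)           # learned on historical data
acc_before = accuracy(threshold, xs, old_labels)    # evaluated under the old concept
acc_after = accuracy(threshold, xs, new_labels)     # same inputs, new concept

print(f"threshold={threshold}, before={acc_before:.2f}, after={acc_after:.2f}")
```

The model itself never changed. It simply keeps applying a rule that no longer matches what counts as fraud.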

Data drift

Figure 1. Changing age distribution can cause data drift.

Source: Evidently.ai 

Data drift occurs when the statistical properties of the input data change. For instance, as an online platform grows, the age distribution of its users may change over time. Since younger and older users do not have the same usage habits, a model trained mostly on younger users' data would provide inaccurate predictions for older users' behavior.
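One common way to quantify such a shift is the Population Stability Index (PSI), which compares the share of observations per bin between a reference sample and a current sample. The sketch below is a minimal, standard-library-only version, not a prescribed method. The age values, the bin edges, and the function name `psi` are hypothetical, and the 0.25 cut-off in the comment is a frequently quoted rule of thumb rather than a universal standard. The function assumes at least one value of each sample falls inside the bins.

```python
import math


def psi(reference, current, bin_edges, eps=1e-6):
    """Population Stability Index between two samples over shared bins."""

    def shares(values):
        counts = [0] * (len(bin_edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                in_bin = bin_edges[i] <= v < bin_edges[i + 1]
                # The last bin also includes its right edge.
                if in_bin or (is_last and v == bin_edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical user ages at training time and in production.
training_ages = [22] * 60 + [35] * 30 + [60] * 10
production_ages = [22] * 30 + [35] * 30 + [60] * 40
age_bins = [18, 30, 50, 100]  # assumed bin edges

score = psi(training_ages, production_ages, age_bins)
# Rule of thumb (assumption): values above roughly 0.25 suggest a major shift.
print(f"PSI = {score:.3f}")
```

A PSI near zero means the two distributions look alike. In this made-up example the share of users aged 50 and over grows from 10% to 40%, so the score is well above the rule-of-thumb threshold.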

How to deal with model drift?

Monitor the performance of the model

In order to deal with model drift, data scientists should first be able to detect it before it causes major problems for end-users. Determining model performance metrics and continuously monitoring the performance of your model against them is therefore key to the long-term success of ML models. Feel free to check our article on model monitoring for different types of metrics and methods.

There are specialized model monitoring tools and also MLOps platforms that provide model monitoring capabilities for drift detection. You can check our article on MLOps tools for a selection of tools, and also our data-driven list of MLOps platforms.
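As a rough illustration of what continuous monitoring can look like, the sketch below tracks accuracy over a sliding window of recent predictions and flags a possible drift when it falls too far below a baseline. The class name `AccuracyMonitor`, the baseline of 0.90, the window size, and the tolerance are all assumptions chosen for the example. Dedicated monitoring tools offer far more than this.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, baseline, window=100, tolerance=0.05):
        self.baseline = baseline      # accuracy measured at validation time
        self.tolerance = tolerance    # how much degradation is acceptable
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual):
        """Store whether one prediction matched the label that arrived later."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        # Only judge once the window is full, to avoid noisy early alerts.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.90, window=10, tolerance=0.05)

for _ in range(10):                 # ten correct predictions
    monitor.record("fraud", "fraud")
alert_while_healthy = monitor.drift_suspected()

for _ in range(3):                  # three recent mistakes
    monitor.record("fraud", "legit")
alert_after_errors = monitor.drift_suspected()

print(alert_while_healthy, alert_after_errors, monitor.rolling_accuracy())
```

This approach needs ground-truth labels, which often arrive with a delay. Until they do, input-distribution checks can serve as an earlier warning.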

Check data quality

Some rapid performance changes can be due to problems in training data quality, such as biases in the data, rather than concept or data drift. If that is the case, the problem would reveal itself early when you apply your model in a real-world use case. You can check our article on bias in AI for different methods to fix it.

As an example, Google Health developed a deep learning model to detect a retina disease from patients’ eye scans. The model had 90% accuracy during its training phase but failed to provide accurate results in real life. This is because the model was trained with high-quality eye scans, while real-world eye scans were lower in quality.

Ignoring known seasonality is also a data quality issue. If your training data does not include recurring changes, such as soaring retail sales during the Christmas season, the gap can be easily fixed by adding data that covers those periods.

Retrain the model

Figure 2. Periodically retraining the model can keep model quality high.

Source: Databricks 

If you detect a concept or data drift, you can retrain your model with more recent data. Depending on the nature of the drift, there are different approaches:

  • Use only recent data if old data has become outdated,
  • Use all available data if the old data wouldn’t cause inaccurate model predictions,
  • If the deployed model allows weighting, use all available data but assign higher weights to recent data so that the model pays less attention to old data.
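The weighting option can be sketched with an exponential decay, where an observation loses half of its weight every `half_life_days`. The function names, the 90-day half-life, and the sales figures below are hypothetical. A weighted mean stands in for a real model here. In practice, weights like these would be passed to a training routine that supports per-sample weights, such as the `sample_weight` argument that many scikit-learn estimators accept in `fit`.

```python
def recency_weights(ages_in_days, half_life_days=90.0):
    """Exponential decay: weight halves every `half_life_days` (assumed scheme)."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


def weighted_mean(values, weights):
    """Stand-in for a model that supports per-sample weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical daily sales observations, oldest first.
ages = [360, 180, 90, 0]               # days since each observation
sales = [100.0, 100.0, 140.0, 160.0]

weights = recency_weights(ages)
plain_estimate = sum(sales) / len(sales)
weighted_estimate = weighted_mean(sales, weights)

print(weights)
print(f"plain={plain_estimate:.1f}, recency-weighted={weighted_estimate:.1f}")
```

The recency-weighted estimate sits closer to the latest observations, which is the intended effect: old data still contributes, but it no longer dominates.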

Another option is online learning where the model continuously learns in real-time with the data feed. This will enable the model to keep itself up to date with evolving datasets.
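A minimal sketch of online learning, assuming a one-feature linear model updated by stochastic gradient descent after every observation, could look like the following. The class name `OnlineLinearModel`, the learning rate, and the synthetic data stream (whose slope changes from 2 to 5 halfway through) are assumptions made for the illustration.

```python
class OnlineLinearModel:
    """y ~ w * x + b, updated one observation at a time with SGD."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # Gradient step on the squared error of a single observation.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error


def stream(slope, steps):
    """Hypothetical noise-free data feed where y = slope * x."""
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    for i in range(steps):
        x = xs[i % len(xs)]
        yield x, slope * x


model = OnlineLinearModel()

for x, y in stream(slope=2.0, steps=2000):   # relationship before the drift
    model.learn_one(x, y)
before_drift = model.predict(1.0)

for x, y in stream(slope=5.0, steps=2000):   # relationship after the drift
    model.learn_one(x, y)
after_drift = model.predict(1.0)

print(f"before drift: {before_drift:.2f}, after drift: {after_drift:.2f}")
```

Because the model keeps learning from the feed, its prediction follows the new relationship without a separate retraining job. The trade-off is that it will just as readily follow noisy or faulty data, so monitoring remains necessary.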

Tune the model

If retraining the model doesn’t suffice, rebuilding the model can also help. This is because the original model was designed with the old training data in mind. Running multiple experiments with different features, hyperparameters, model architectures, etc. can help you update your model to keep it in line with new data.
