
Reproducible AI: Why it Matters & How to Improve it in 2024?

The ability to replicate an experiment or a study and obtain the same results using the same methodology is a crucial part of the scientific method. This is called reproducibility in scientific research, and it is just as important for artificial intelligence (AI) and machine learning (ML) applications. However:

  • Only ~5% of AI researchers share their source code, and less than a third share their test data in research papers.
  • Less than a third of AI research is reproducible, i.e. verifiable.

This is commonly referred to as the reproducibility or replication crisis in AI. In this article, we’ll explore why reproducibility is important for AI and how businesses can improve reproducibility in their AI applications.

What is reproducibility in artificial intelligence?

In the context of AI, reproducibility refers to the ability to achieve the same or similar results using the same dataset and AI algorithm within the same environment. Here,

  • The dataset refers to the training data that the AI algorithm takes as input to learn to make predictions,
  • The AI algorithm consists of the model type, model parameters and hyperparameters, features, and other code,
  • The environment refers to the software and hardware used to run the algorithm.

To achieve reproducibility in AI systems, changes in all three components must be tracked and recorded.
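
As an illustration, below is a minimal, tool-agnostic sketch of recording these three components so that a training run can be traced and repeated later. The file paths, model parameters, and captured fields are illustrative assumptions rather than prescriptions from this article.

```python
# A minimal sketch (assumed example) of recording the three components above
# so a training run can be reproduced later.
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def fingerprint_dataset(path: str) -> str:
    """Return a content hash so any change to the training data is detectable."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha.update(chunk)
    return sha.hexdigest()

run_record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    # 1. Dataset: identify the exact training data used (path is hypothetical)
    "dataset": {
        "path": "data/train.csv",
        "sha256": fingerprint_dataset("data/train.csv"),
    },
    # 2. Algorithm: model type, parameters, hyperparameters, and random seed
    "algorithm": {
        "model": "random_forest",
        "n_estimators": 200,
        "max_depth": 8,
        "seed": 42,
    },
    # 3. Environment: the software (and, where relevant, hardware) that ran it
    "environment": {
        "python": sys.version,
        "platform": platform.platform(),
    },
}

# Persist the record next to the model artifacts so the run can be audited
with open("run_record.json", "w") as f:
    json.dump(run_record, f, indent=2)
```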

Why is reproducibility important in AI?

Reproducibility is crucial for both AI research and AI applications in the enterprise because:

  • For AI / ML research, scientific progress depends on the ability of independent researchers to scrutinize and reproduce the results of a study. Machine learning cannot be improved or applied in other areas if its essential components are not documented for reproducibility. A lack of reproducibility blurs the line between scientific production and marketing.
  • For AI applications in business, reproducibility would enable
    • building AI systems that are less error-prone, which benefits businesses and their customers
    • increased reliability and predictability, since businesses can understand which components lead to which results. This is necessary to convince decision makers to scale AI systems and enable more users to benefit from them
    • improved communication and collaboration between different teams

How to improve reproducibility in AI?

The best way to achieve AI reproducibility in the enterprise is to leverage MLOps best practices. MLOps involves streamlining the artificial intelligence and machine learning lifecycle with automation and a unified framework within an organization.

Some MLOps tools and techniques that facilitate reproducibility are: 

  • Experiment tracking: Developing AI and ML models is an iterative process where practitioners experiment with different model components such as datasets, model parameters, and code. Experiment tracking tools help record important information about these experiments in a structured manner (see the sketch after this list).
  • Data:
    • Lineage: Data lineage records and visualizes where the data originates, what happens to it, and where it goes over the data lifecycle.
    • Versioning: AI systems are often trained on dynamic datasets that reflect the changes in the underlying environment. Data versioning tools help companies store different versions of data that were created or changed at specific points in time.
  • Model
    • Versioning: Similarly, model versioning tools help keep track of different versions of AI models with different model types, parameters, hyperparameters, etc., and allow companies to compare them.
    • Registry: A model registry is a central repository for all models and their metadata. This helps data scientists access different models and their properties at different times.
    • Feature stores: Features are attributes of the training data that are relevant to the problem you would like to solve with your AI model. After feature engineering, feature stores standardize and store different features of the data for easier reuse.
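
As a concrete illustration of experiment tracking (and logging a model as a versioned artifact), below is a minimal sketch using MLflow as one example of such a tool; the article does not prescribe a specific product, and the experiment name, dataset, and hyperparameters are illustrative assumptions.

```python
# A minimal experiment-tracking sketch using MLflow as one possible tool.
# The experiment name, dataset, and hyperparameter values are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

mlflow.set_experiment("reproducibility-demo")  # hypothetical experiment name

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # fixed seed for a reproducible split
)

params = {"C": 1.0, "max_iter": 200, "random_state": 42}  # illustrative values

with mlflow.start_run(run_name="logreg-baseline"):
    # Record the exact hyperparameters of this run so it can be repeated
    mlflow.log_params(params)

    model = LogisticRegression(**params).fit(X_train, y_train)

    # Record the resulting metric so different runs can be compared
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))

    # Store the trained model as a versioned artifact; passing
    # registered_model_name here would also add it to MLflow's model
    # registry, which requires a registry-backed tracking server.
    mlflow.sklearn.log_model(model, "model")
```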

Feel free to check our article on MLOps tools and our data-driven list of MLOps platforms for more.

Apart from the tools, MLOps also helps businesses improve reproducibility by facilitating communication between data scientists, IT staff, subject matter experts, and operations professionals.

If you have other questions about AI/ML reproducibility or MLOps, feel free to reach out.


Comments

Richard Rudd-Orthner
Oct 04, 2023 at 09:14

I have been working on this and have achieved it on CPU. Repeatable determinism, or reproducibility, is a keystone of dependable systems, and when applied in convolutional networks it can yield higher accuracy.

These are some of the academically peer-reviewed publications made in IEEE venues.

• [1] R. Rudd-Orthner and L. Mihaylova, “Non-random weight initialisation in deep learning networks for repeatable determinism,” in Peer Reviewed Proc. of the 10th IEEE International Conference on Dependable Systems, Services and Technologies (DESSERT-19), Leeds, UK, 2019.
o This conference paper proved that an alternative to random initialisation was possible, providing almost equal performance but with reproducibility. Presented at the UK, Ukraine and Northern Ireland IEEE branches conference in Leeds.

• [2] R. Rudd-Orthner and L. Mihaylova, “Repeatable determinism using non-random weight initialisations in smart city applications of deep learning,” Journal of Reliable Intelligent Environments, Smart Cities special edition, vol. 6, no. 1, pp. 31-49, 2020.
o This journal paper raised the result to equivalent performance by using the limits from He and Xavier and made the previous reproducibility approach a more general case for general use, although it was limited to Dense layers.

• [3] R. Rudd-Orthner and L. Mihaylova, “Non-random weight initialisation in deep convolutional networks applied to safety critical artificial intelligence,” in Peer Reviewed Proc. of the 13th International Conference on Developments in eSystems Engineering (DeSe), Liverpool, UK, 2020.
o This conference paper demonstrated an approach for Convolutional layers as an alternative to random initialisation, providing higher performance with reproducibility. Presented at the UK and UAE IEEE branches conference in Liverpool, held virtually.

• [4] R. Rudd-Orthner and L. Mihaylova, “Deep convnet: non-random weight initialization for repeatable determinism with FSGM,” Sensors, vol. 21, no. 14, p. 4772, 2021.
o This journal paper extended the work into colour image proofs and used the FSGM cyber attack as a method for measuring the effect in transferred learning.

• [5] R. Rudd-Orthner and L. Mihaylova, “Multi-type aircraft of remote sensing images: MTARSI2,” Zenodo, 30 June 2021. [Online]. Available: https://zenodo.org/record/5044950#.YcWalmDP2Ul. [Accessed 30 June 2021].
o This was the colour dataset used.

• [6] R. Rudd-Orthner, “Artificial Intelligence Methods for Security and Cyber Security Systems,” University of Sheffield, Sheffield, UK, 2022.
o This is the final full write-up in this context and with other approaches.

