Updated on Apr 2, 2025

AI Chips: A Guide to Cost-efficient AI Training & Inference

Over the last decade, machine learning, and especially deep neural networks, has played a critical role in the emergence of commercial AI applications. Deep neural networks were successfully implemented in the early 2010s thanks to the increased computational capacity of modern computing hardware. AI hardware is a new generation of hardware custom-built for machine learning applications.

As AI and its applications grow, the race to develop cheaper, faster chips will intensify among tech giants. Companies can rent hardware from cloud providers like AWS SageMaker or purchase their own. Owning hardware can lower costs if utilization is high; otherwise, relying on cloud vendors is the better option.

What are AI chips?

AI chips (also called AI hardware or AI accelerators) are specially designed accelerators for applications based on artificial neural networks (ANNs). Most commercial ANN applications are deep learning applications.

ANNs belong to machine learning, a subfield of artificial intelligence, and are inspired by the human brain. They consist of layers of artificial neurons, which are mathematical functions modeled on how biological neurons work. ANNs can be built as deep networks with multiple layers; machine learning with such networks is called deep learning. Deep learning has 2 main use cases:

  • Training: A deep ANN is fed thousands of labeled examples so it can learn to identify patterns. Training is time-consuming and computationally intensive.
  • Inference: As a result of the training process, the ANN can make predictions based on new inputs (see the sketch after this list).
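To make the distinction concrete, here is a minimal sketch of the two phases in PyTorch. The model size, data, and hyperparameters are illustrative placeholders, not taken from any particular application:

```python
# Minimal sketch of the two deep learning phases using PyTorch.
# Model, data, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

# A small feed-forward ANN: layers of artificial neurons.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# --- Training: feed labeled data repeatedly so the network learns patterns.
# This is the compute-intensive phase (forward + backward pass per batch).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
inputs = torch.randn(64, 10)             # 64 examples (random stand-ins)
labels = torch.randint(0, 2, (64,))      # their labels
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()                      # gradient computation: the costly part
    optimizer.step()

# --- Inference: a single forward pass on a new input; far cheaper.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 10)).argmax(dim=1)
    print(prediction)
```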

General-purpose chips can run ANN applications, but they are not the most efficient solution. AI chips are customized for different ANN applications. For example, in IoT applications with battery-operated devices, AI chips must be small and energy-efficient. This requires chip manufacturers to make different architectural choices based on the specific needs of each application.

What are an AI chip’s components?

The hardware infrastructure of an AI chip consists of three parts: computing, storage, and networking. While computing (processing) speed has improved rapidly in recent years, storage and networking performance still need more time to catch up. Hardware giants like Intel, IBM, and Nvidia are competing to improve the storage and networking modules of the hardware infrastructure.

(Figure: the computing, storage, and networking parts of AI hardware infrastructure. Source: McKinsey)

Why do AI chips outperform general-purpose chips?

General-purpose hardware uses arithmetic blocks for basic in-memory calculations. This largely serial processing does not deliver sufficient performance for deep learning techniques:

  • Neural networks need many parallel, simple arithmetic operations
  • Powerful general-purpose chips cannot support a high number of simple, simultaneous operations
  • AI-optimized hardware includes numerous less powerful cores, which enables parallel processing (see the sketch after this list)
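To see why this workload is so parallel, note that a neural network layer boils down to one large matrix multiplication, i.e., a huge number of independent multiply-accumulate (MAC) operations. A minimal NumPy sketch with illustrative shapes:

```python
# Sketch: a neural network layer is mostly one big matrix multiply,
# i.e., many independent multiply-accumulates that a parallel
# accelerator can run simultaneously. Shapes are illustrative.
import numpy as np

batch, d_in, d_out = 256, 1024, 1024
x = np.random.randn(batch, d_in).astype(np.float32)   # activations
w = np.random.randn(d_in, d_out).astype(np.float32)   # weights

# One layer's forward pass: batch * d_in * d_out multiply-accumulates
# (~268 million here), with no dependency between output elements:
# exactly the workload parallel AI hardware is built for.
y = x @ w
print(f"{batch * d_in * d_out / 1e6:.0f}M independent MACs in one layer")
```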

AI accelerators bring the following advantages over general-purpose hardware:

  • Faster computation: Artificial intelligence applications typically require parallel computation to run sophisticated training models and algorithms. AI hardware provides more parallel processing capability, estimated to deliver up to 10 times more computing power in ANN applications than traditional semiconductor devices at similar price points.
  • High-bandwidth memory: Specialized AI hardware is estimated to provide 4-5 times more memory bandwidth than traditional chips. This is necessary because parallel processing requires significantly more bandwidth between processors for efficient performance (see the back-of-envelope sketch below).
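A back-of-envelope calculation shows why bandwidth, rather than raw compute, often limits AI hardware. The chip figures below are hypothetical placeholders, not vendor specifications:

```python
# Back-of-envelope check of why memory bandwidth matters, using a
# hypothetical chip (all numbers are illustrative, not vendor specs).
peak_flops = 100e12        # 100 TFLOP/s hypothetical compute peak
bandwidth  = 1e12          # 1 TB/s hypothetical memory bandwidth

# A batch-1 matrix-vector step (common in inference) reads each weight
# once and does ~2 FLOPs per 4-byte weight read:
flops_per_byte = 2 / 4
achievable = bandwidth * flops_per_byte     # 0.5 TFLOP/s
print(f"bandwidth-limited: {achievable/1e12:.1f} TFLOP/s "
      f"of a {peak_flops/1e12:.0f} TFLOP/s peak")
# The compute units sit idle ~99.5% of the time unless bandwidth rises
# or data is reused (batching, caching); hence high-bandwidth memory.
```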

Why are AI chips important now?

Deep neural network powered solutions make up most commercial AI applications. The number and importance of these applications have been growing strongly since the 2010s and are expected to keep growing at a similar pace. For example, McKinsey predicts that AI applications will generate $4-6 trillion in value annually.

Another recent McKinsey study states that AI-associated semiconductors will see approximately 18 percent annual growth over the next few years, five times more than semiconductors used in non-AI applications. The same study estimates that AI hardware will become a $67 billion market in revenue.

5 AI chip design approaches

AI chips use novel architectures for improved performance. We sorted some of these approaches from most commonplace to most emerging:

  • GPUs: Graphics Processing Units were originally designed to accelerate graphics rendering via parallel computing. The same approach is also effective at training deep learning models, and GPUs are currently among the most common hardware used by deep learning software developers (see the sketch after this list).
  • Wafer chips: For example, Cerebras builds wafer-scale chips: a 46,225 square millimeter (~72 square inch) silicon wafer containing 1.2 trillion transistors on a single chip. Thanks to its large area, the chip carries 400,000 processing cores. Such large chips exhibit economies of scale but present novel materials science and physics challenges.
  • Reconfigurable neural processing units (NPUs): This architecture provides parallel computing and pooling to increase overall performance. It is specialized for Convolutional Neural Network (CNN) applications, a popular ANN architecture for image recognition. San Diego and Taipei based low-power edge AI startup Kneron licenses the reconfigurable NPU architecture on which its chips are based. Because the architecture can be reconfigured to switch between models in real time, the hardware can be optimized for the needs of each application. The US National Institute of Standards and Technology (NIST) recognized Kneron’s facial recognition model as the best performing model under 100 MB.
  • Neuromorphic chip architectures: These attempt to mimic brain cells using novel approaches from adjacent fields such as materials science and neuroscience. These chips can have speed and efficiency advantages in training neural networks. Intel has been producing such chips for the research community since 2017 under the names Loihi and Pohoiki.
  • Analog memory-based technologies: Digital systems built on 0’s and 1’s dominate today’s computing world, whereas analog techniques use continuously variable signals without discrete levels. An IBM research team demonstrated that large arrays of analog memory devices can achieve accuracy similar to GPUs in deep learning applications.
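Because GPUs remain the most common choice, here is a minimal sketch of how a deep learning framework such as PyTorch targets one; the same code falls back to the CPU when no GPU is present:

```python
# Minimal sketch of how deep learning frameworks target a GPU:
# the same code runs on CPU or an accelerator by moving tensors.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(1024, 1024).to(device)   # weights on the accelerator
x = torch.randn(512, 1024, device=device)        # data on the accelerator
y = model(x)                                     # matmul runs in parallel on GPU
print(f"ran on: {device}")
```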

5 important criteria for assessing AI hardware

The needs of the team are the top priority. Cloud solutions like AWS SageMaker allow quick model training by scaling to numerous GPUs, but they come at higher cost than on-premise hardware. While the cloud is ideal for initial testing, it may not suit large teams building mature applications that would keep the company's own AI hardware highly utilized.

Once you decide that your company needs to buy its own AI chips, these are the important chip characteristics to assess:

  • Processing speed: AI hardware enables faster training and inference with neural networks.
    • Faster training lets machine learning engineers try different deep learning approaches or optimize the structure of their neural network (hyperparameter optimization)
    • Faster inference (i.e., predictions) is critical for applications like autonomous driving
  • Development platforms: Building applications on a standalone chip is challenging, as the chip needs supporting hardware and software before developers can build applications on it using high-level programming languages. An AI accelerator that lacks a development board is challenging to use at first and difficult to benchmark.
  • Power requirements: Chips that will run on battery need to work within a limited power budget to maximize device lifetime.
  • Size: In IoT applications, chip size may matter, for example in mobile phones or other small devices.
  • Cost: As always, the Total Cost of Ownership of the device is critical for any procurement decision (see the break-even sketch after this list).
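As a worked example for the cost criterion, the sketch below computes the utilization level at which owning hardware breaks even with renting from the cloud. All prices and lifetimes are made-up placeholders; substitute your own quotes before drawing conclusions:

```python
# Illustrative cloud-vs-own-hardware break-even calculation.
# All figures are hypothetical placeholders, not real prices.
cloud_rate = 3.00          # $/accelerator-hour on a cloud service
purchase_price = 25_000.0  # $ per accelerator purchased outright
opex_per_hour = 0.40       # power, cooling, ops ($/hour)
lifetime_hours = 3 * 365 * 24   # assumed 3-year useful life

# Utilization at which owning costs the same as renting:
own_cost_per_hour = purchase_price / lifetime_hours + opex_per_hour
breakeven_util = own_cost_per_hour / cloud_rate
print(f"own: ${own_cost_per_hour:.2f}/h, break-even at "
      f"{breakeven_util:.0%} utilization")
# Below this utilization, renting from a cloud vendor is cheaper;
# above it, buying pays off, matching the rule of thumb above.
```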

Top AI hardware benchmarks

An objective performance benchmark of AI hardware on deep learning applications is difficult to obtain. Existing benchmarks tend to compare different AI hardware in terms of speed and power consumption.

Here is a Medium article that compares several AI accelerators with both an Apple computer and some development boards.

Comparison of chips

This benchmark compares several devices with a simple computer and a MacBook Pro; the first six rows are different AI accelerators. While the MacBook's performance beats some AI accelerators, its power consumption and price make it prohibitively expensive for this purpose.

Finding an objective benchmark for general deep learning usage is challenging, so users of both cloud and on-premise AI hardware should benchmark these systems with their own applications. Benchmarking cloud services is easier, while testing one's own hardware can be more time-consuming. If a company uses common AI hardware, it can benchmark it through cloud services that share hardware specs; if not, it may need to request sample hardware from the supplier for testing.
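A minimal sketch of such an in-house benchmark, measuring inference latency and throughput on a candidate device with PyTorch; the model and batch shapes are placeholders for your own workload:

```python
# Sketch: benchmark a candidate device with your own model rather
# than published numbers. Model and shapes are placeholders.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                            torch.nn.Linear(512, 10)).to(device)
batch = torch.randn(256, 512, device=device)

def timed(fn, iters=100):
    # Warm up, then time; synchronize so GPU work is actually counted.
    for _ in range(10):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

latency = timed(lambda: model(batch))
print(f"{device}: {latency*1e3:.2f} ms/batch, "
      f"{batch.shape[0]/latency:,.0f} samples/sec")
```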

Leading providers of AI hardware

The technical complexity of producing a working semiconductor device makes it very hard for startups or small teams to build AI hardware, which is why established hardware leaders dominate the AI hardware industry. According to Forbes, even Intel, with numerous world-class engineers and a strong research background, needed 3 years of work to build the Nervana neural network processor.

We have prepared a comprehensive, sortable list of companies working on AI chips. You can also see some of the companies working on AI hardware below:

  • Advanced Micro Devices (AMD)
  • Apple
  • Arm
  • Baidu
  • Google (Alphabet)
  • Graphcore
  • Huawei
  • IBM
  • Intel
  • Microsoft
  • Nvidia
  • Texas Instruments
  • Qualcomm
  • Xilinx

For an in-depth analysis of AI chip makers, you can read our Top 10 AI Chip Makers: In-depth Guide article.

 Also, don’t forget to read our article on TinyML.

If you have questions about how AI hardware can help your business, we can help:

Find the Right Vendors
