What is AI inference?
AI inference is the process of running new data through a trained AI model, for example, using a trained image-recognition model to classify photos or a trained speech model to transcribe audio. Once a model is deployed, its algorithmic logic (such as the convolutional layers of a CNN or the attention mechanism of a Transformer) and computational flow (input and output formats, precision requirements) are essentially fixed and rarely need adjustment.
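As a minimal sketch of what inference looks like in software, the snippet below runs a pre-trained image-classification model on a single photo. The model choice (torchvision's ResNet-50) and the file name are illustrative assumptions, not tied to any particular vendor or chip.

```python
# Minimal inference sketch (illustrative): load a pre-trained model and classify one image.
# The model (ResNet-50) and the file name "photo.jpg" are assumptions for this example.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()  # inference mode: weights are fixed, no training updates

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # add a batch dimension

with torch.no_grad():  # no gradients: inference only runs the forward pass
    logits = model(image)
    predicted_class = logits.argmax(dim=1).item()
print(predicted_class)
```

The key point for this article is the `model.eval()` / `torch.no_grad()` pattern: at inference time the computation is a fixed forward pass, which is exactly what makes it a candidate for fixed-function hardware.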
AI Chip Types
Based on their application, AI chips can be divided into two categories: AI training chips and AI inference chips.
Based on their architecture, AI chips can be divided into two categories: GPUs and ASICs. This blog has already covered this topic extensively; for details, please refer to my posts "Top vendors and uses of GPU", "Comparison of AI chips GPU and ASIC", and "ASIC market is getting bigger, and related listed companies in the US and Taiwan".
AI inference chips
AI Training Market
The AI training chip market has few competitors; Nvidia alone holds over 90% of the market share. Its Blackwell architecture supports training of models with up to 1.8 trillion parameters, and fifth-generation NVLink enables seamless interconnection of 72-GPU clusters.
AMD is the only other company with a meaningful share of the AI training market, but its share is on an entirely different scale from Nvidia's. Intel's Gaudi chips have virtually no market presence, with a share of less than 1%.
AI inference chips are primarily ASICs
Because AI inference runs algorithms unique to each manufacturer, the chips must be customized. Customized chips are essentially ASICs, which is why AI inference chips are primarily ASICs.
AI Inference Chip Market Size
Key Advantages of ASICs
Suitable for Inference
As mentioned earlier, AI inference runs algorithms unique to each manufacturer, and the chips must be customized to maximize the efficiency of those algorithms and the distinctive features each manufacturer needs.
This kind of customized chip is, by nature, an ASIC. This is why, in addition to purchasing large quantities of general-purpose GPUs, each manufacturer develops its own ASICs to handle the AI inference workloads it requires.
Removing flexibility leads to faster performance
"Fixedness" is the core advantage of ASICs: the hardware architecture is customized for a single task. The computational logic and data paths of the inference algorithm can be hardened directly into the chip, eliminating the general-purpose units that inference does not need (such as the dynamic scheduling modules and general-purpose memory controllers that GPUs carry for training), so that hardware resources are devoted entirely to inference computation.
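A rough software analogy for this "hardening": once input shapes, precision, and the operator graph are frozen, the model can be compiled into a static artifact with no general-purpose flexibility left. The sketch below, a hedged illustration using PyTorch's ONNX export with an assumed model and output file name, shows that freezing step; an ASIC carries the same idea all the way down into silicon.

```python
# Software analogy (illustrative): freeze a trained model into a static, fixed-shape graph.
# An ASIC goes further by baking the fixed dataflow into hardware.
import torch
from torchvision import models

model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.DEFAULT)
model.eval()

# Fix the input format once and for all: a batch of one 3x224x224 image.
example_input = torch.randn(1, 3, 224, 224)

# Export a static graph; training machinery and dynamic flexibility are dropped.
torch.onnx.export(
    model,
    example_input,
    "mobilenet_fixed.onnx",  # assumed output file name for this example
    input_names=["image"],
    output_names=["logits"],
)
```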
Cost Efficiency
Inference workloads are far more sensitive than training to energy efficiency (computing power per watt of power consumption) and cost, and ASICs have a crushing advantage in both areas.
In terms of energy efficiency, Google's TPU v5e is roughly three times more energy efficient than Nvidia's H100.
In terms of cost, AWS's Trainium 2 offers a 30%-40% better price-performance ratio than the H100 for inference tasks, and the unit computing-power cost of Google's TPU v5 and Amazon's Trainium 2 is only about 70% and 60% of the H100's, respectively.
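To make those ratios concrete, here is a back-of-the-envelope sketch using only the rough figures quoted above; all absolute numbers are normalized placeholders, and only the ratios carry meaning.

```python
# Back-of-the-envelope comparison using the rough figures quoted above (illustrative only).
# "Perf per watt" is useful compute per watt; "cost per compute" is dollars per unit of
# delivered compute. Values are normalized to the H100 = 1.0; only ratios matter.

h100_perf_per_watt = 1.0
tpu_v5e_perf_per_watt = 3.0       # ~3x the H100, per the figure above

tpu_v5_cost_per_compute = 0.7     # ~70% of the H100
trainium2_cost_per_compute = 0.6  # ~60% of the H100

# For a fixed inference workload, energy use scales inversely with perf/watt.
energy_saving_tpu_v5e = 1 - h100_perf_per_watt / tpu_v5e_perf_per_watt
print(f"TPU v5e energy saving vs H100: {energy_saving_tpu_v5e:.0%}")   # ~67%
print(f"TPU v5 compute cost vs H100: {tpu_v5_cost_per_compute:.0%}")
print(f"Trainium 2 compute cost vs H100: {trainium2_cost_per_compute:.0%}")
```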
A large model may require only dozens to hundreds of chips (e.g., GPUs) for training, but the inference phase may require tens or even hundreds of thousands of chips (ChatGPT's inference cluster, for example, is more than 10 times the size of its training cluster). At those volumes, the one-time design cost of a custom ASIC is amortized across an enormous fleet, so customized ASIC designs can reduce the cost per chip.
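The amortization argument is easiest to see with numbers. The sketch below is purely illustrative; every figure (NRE, unit costs, volumes) is an assumption chosen to show the mechanics, not an actual price from any vendor.

```python
# Illustrative amortization of ASIC non-recurring engineering (NRE) cost.
# All figures are made-up assumptions, not real vendor prices.

asic_nre = 300_000_000    # one-time design/tape-out cost, assumed ($)
asic_unit_cost = 3_000    # assumed manufacturing cost per ASIC ($)
gpu_unit_price = 30_000   # assumed purchase price per general-purpose GPU ($)

def asic_cost_per_chip(volume: int) -> float:
    """Effective per-chip cost once NRE is spread over the deployed volume."""
    return asic_nre / volume + asic_unit_cost

for volume in (1_000, 10_000, 100_000, 500_000):
    print(f"{volume:>7} chips: ASIC ${asic_cost_per_chip(volume):>10,.0f} vs GPU ${gpu_unit_price:,}")

# At small (training-scale) volumes the NRE dominates; at inference-scale volumes
# the effective ASIC cost falls far below the GPU price.
```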
Disadvantages of ASICs
Long time to market
An ASIC design cycle can take one to two years, while AI models iterate extremely rapidly (major large-model generations often arrive only about a year apart). If the model architecture an ASIC was designed around becomes outdated (for example, if Transformers displace CNNs), the chip can be rendered obsolete.
ASICs are less suitable for AI training
By the same token, ASICs are relatively weak for training tasks. Training algorithms evolve rapidly and demand flexibility, so an ASIC used for training risks becoming obsolete as soon as the algorithm changes, making it far less cost-effective.
Top Inference Chips on the Market
Famous inference chips
Almost every world-renowned tech giant you’re familiar with, including Apple, Amazon, Alphabet, Meta, Microsoft, Tencent, ByteDance, Alibaba, and OpenAI, has deployed, is in the process of deploying, or is commissioning chip designers to develop inference chips.
Mostly Outsourced Design
In the ASIC market, the major AI companies are mostly software companies that lack a deep bench of chip-design talent, so they must outsource the chip design work.
Currently, Broadcom holds the top spot with a 55%-60% market share, and Marvell is second with a 13%-15% share.
Notable Deployed Inference Chips
The following is a list of notable deployed inference chips, excluding those currently under design.
| Company | Product | Architecture | Application |
| --- | --- | --- | --- |
| Alphabet | TPU series | ASIC | Inference, training |
| Amazon | Inferentia, Trainium | ASIC | Inference (Inferentia); training (Trainium) |
| Microsoft | Maia 100 | ASIC | Inference, training |
| Meta | MTIA series | ASIC | Inference, training |
| Huawei HiSilicon | Ascend 910 series | ASIC | Inference, training |
| Cambricon Tech | MLU series | ASIC | Inference, training |
Other vendors
Note that AI chips from Nvidia, AMD, and Intel can also be used for inference, although their inference performance is not as impressive as their training performance.
In addition, several smaller startups, including SambaNova, Cerebras Systems, Graphcore, Groq, Tenstorrent, Hailo, Mythic, and KAIST's C-Transformer, have launched AI chips that can also be used for inference. However, their shipment volumes are small and cannot compare with the inference chips the tech giants design in-house.

Related articles
- "AI inference chips vs. training chips"
- "China ditches US AI chips and decides to go its own way"
- "A must-read for Nvidia investors: "The Thinking Machine""
- "How GPU farm CoreWeave makes money?"
- "Geoffrey Hinton, 2024 Nobel Physics winner, inadvertently helped Nvidia transform to AI overlord"
- "Chinese AI progress and top companies"
- "DeepSeek routed the global AI and stock"
- "Top vendors and uses of GPU"
- "How does CUDA strengthen the moat of Nvidia's monopoly?"
- "Comparison of AI chips GPU and ASIC"
- "ASIC market is getting bigger, and related listed companies in the US and Taiwan"
- "Significant changes in Broadcom's business approach"
- "The reasons behind Broadcom share price's consistent outperformance"
- "How low-key Marvell makes money?"
- "How does Nvidia make money? Nvidia is changing the gaming rules"
- "The reasons for Nvidia's monopoly and the challenges it faces"
- "Why Nvidia failed to acquire ARM?"
- "Revisiting Nvidia: The Absolute Leader in Artificial Intelligence, Data Center, and Graphics"
- "Data center, a rapidly growing semiconductor field"
Disclaimer
- The content of this site reflects the author's personal opinions and is for reference only. I am not responsible for the correctness, opinions, or timeliness of the articles and information; readers must make their own judgments.
- I shall not be liable for any damages or other legal liabilities arising from losses caused, directly or indirectly, by readers relying on or referring to the information on this site in any investment decision.