AI inference chips vs. training chips


What is AI inference?

AI inference is the process of running new data through a trained AI model — for example, using a trained image-recognition model to classify photos, or a trained speech model to transcribe audio. Once a model is deployed, its algorithmic logic (such as the convolutional layers of a CNN or the attention mechanism of a Transformer) and computational flow (input and output formats, precision requirements) are fixed and rarely need adjustment.

AI Chip Types

Based on their application, AI chips can be divided into two categories: AI training chips and AI inference chips.

Based on their chip architecture, AI chips can be divided into two categories: GPUs and ASICs. This blog has already covered this topic extensively; for details, please refer to my posts “Top vendors and uses of GPU”, “Comparison of AI chips GPU and ASIC”, and “ASIC market is getting bigger, and related listed companies in the US and Taiwan”.

AI inference chips

AI Training Market

The AI training chip market has few competitors; Nvidia alone holds over 90% of the market share. Its Blackwell architecture targets the training of models on the scale of 1.8 trillion parameters, and fifth-generation NVLink enables seamless interconnection of 72-GPU clusters (as in the GB200 NVL72).

AMD is the only other company with a meaningful share of the AI training market, but its share is an order of magnitude smaller than Nvidia’s. Intel’s Gaudi chips have virtually no market presence, with a share of less than 1%.

AI inference chips are primarily ASICs

Because each manufacturer’s AI inference workload is built around its own models and algorithms, the chips serving it are typically customized for those workloads. Customized chips are, by definition, ASICs — which is why AI inference chips are primarily ASICs.

AI Inference Chip Market size

According to Verified Market Research, the AI inference chip market size was $15.8 billion in 2023 and is expected to reach $90.6 billion by 2030, with a compound annual growth rate of 22.6% during the 2024-2030 forecast period.

Key Advantages of ASICs

Suitable for Inference

As mentioned earlier, AI inference involves algorithms unique to each manufacturer, so the chip must be customized to maximize the efficiency of those algorithms and to support each manufacturer’s distinctive features and specific needs.

Such customization requires an ASIC. This is why, in addition to purchasing large quantities of general-purpose GPUs, the major AI companies are developing their own ASICs to handle the AI inference workloads they require.

Removing flexibility leads to faster performance

“Fixedness” is the core advantage of ASICs—customizing the hardware architecture for a single task. The computational logic and data paths of the inference algorithm can be directly “hardened” into the chip, eliminating all irrelevant general-purpose computing units (such as the dynamic scheduling module and general-purpose memory controller used for training in GPUs), allowing hardware resources to be 100% dedicated to inference computing.
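As a loose software analogy (my own illustration, not the author’s), the difference between a general-purpose chip and a hardened ASIC datapath resembles the difference between an interpreter that dispatches operations at run time and a function whose sequence of steps is fixed at build time. All names below are invented for illustration.

```python
# Software analogy for "hardening" a computation: the flexible version
# dispatches each step at run time (like a general-purpose pipeline),
# while the fixed version bakes one known sequence of steps into a
# single fused function (like an ASIC datapath). Names are illustrative.

OPS = {
    "scale": lambda x: x * 2.0,
    "shift": lambda x: x + 1.0,
    "clip":  lambda x: max(0.0, x),  # a ReLU-like step
}

def flexible_pipeline(x, program):
    """General-purpose: can run any program, pays dispatch cost per step."""
    for op_name in program:
        x = OPS[op_name](x)          # run-time lookup and indirect call
    return x

def hardened_pipeline(x):
    """Fixed-function: the same three steps, fused and inlined."""
    return max(0.0, x * 2.0 + 1.0)   # no dispatch, no unused machinery

# Both compute the same result; the hardened version simply cannot run
# any other program -- which is exactly the ASIC trade-off.
program = ["scale", "shift", "clip"]
assert flexible_pipeline(-3.0, program) == hardened_pipeline(-3.0) == 0.0
assert flexible_pipeline(2.0, program) == hardened_pipeline(2.0) == 5.0
```

The flexibility the ASIC gives up is precisely the dispatch machinery it no longer has to carry.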

Cost Efficiency

Inference scenarios are far more sensitive than training to energy efficiency (computing power per watt) and cost, and ASICs have a crushing advantage in both areas.

In terms of energy efficiency, Google’s TPU v5e is roughly three times more energy efficient than Nvidia’s H100.

In terms of cost, Amazon’s Trainium 2 offers a 30%-40% better price-performance ratio than the H100 on inference tasks, and the unit computing cost of Google’s TPU v5 and Amazon’s Trainium 2 is only about 70% and 60% of the H100’s, respectively.
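To make the perf/W claim concrete, here is a back-of-the-envelope fleet calculation. The absolute throughput and power figures are invented placeholders; only the “3x more energy efficient” ratio comes from the text above.

```python
# Illustrative fleet math for the perf/W claim: if an accelerator
# delivers 3x the inference throughput per watt, a fleet serving the
# same load draws one third of the power. Absolute numbers are
# placeholders; only the 3x ratio is taken from the text.

def fleet_power_kw(target_throughput, perf_per_watt):
    """Power (kW) needed to serve target_throughput at a given perf/W."""
    return target_throughput / perf_per_watt / 1000  # watts -> kW

target = 3_000_000                                       # arbitrary units
gpu_fleet  = fleet_power_kw(target, perf_per_watt=1.0)   # baseline
asic_fleet = fleet_power_kw(target, perf_per_watt=3.0)   # 3x efficient

assert gpu_fleet == 3 * asic_fleet   # 3x perf/W -> 1/3 the power draw
print(f"baseline fleet: {gpu_fleet:.0f} kW, 3x-efficient fleet: {asic_fleet:.0f} kW")
```

At datacenter scale, a two-thirds reduction in power draw compounds into both energy bills and cooling capacity, which is why perf/W dominates inference procurement.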

Training a large model is a one-off job on a single (if large) cluster, but inference must run continuously at user scale and may require tens or even hundreds of thousands of chips (ChatGPT’s inference cluster, for example, is reportedly more than 10 times the size of its training cluster). At that volume, a customized ASIC’s lower cost per chip dominates the economics.
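The fleet-size argument can be sketched as a break-even calculation: a custom ASIC only pays off once its one-time design cost (NRE — non-recurring engineering) is spread over enough chips. All dollar figures below are hypothetical.

```python
# Hypothetical break-even for a custom inference ASIC: the one-time
# design cost (NRE) must be amortized over the fleet. The ASIC wins
# once per-chip savings times fleet size exceeds the NRE. All numbers
# are invented for illustration.

def breakeven_chips(nre_usd, gpu_unit_cost, asic_unit_cost):
    """Fleet size at which total ASIC cost drops below total GPU cost."""
    savings_per_chip = gpu_unit_cost - asic_unit_cost
    return nre_usd / savings_per_chip

n = breakeven_chips(nre_usd=500_000_000,   # design + tape-out, hypothetical
                    gpu_unit_cost=30_000,
                    asic_unit_cost=10_000)
print(f"break-even fleet size: {n:,.0f} chips")
# At training-scale fleets the NRE dominates and general-purpose GPUs
# win; at inference-scale fleets (hundreds of thousands of chips) the
# ASIC is far cheaper overall.
```

This is the same logic behind why ASICs suit inference but not training: only inference fleets reliably reach the volumes that amortize the design cost.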

Disadvantages of ASICs

Long time to market

The ASIC design cycle can take one to two years, while AI models iterate extremely rapidly (large models went from GPT-3 to GPT-4, for example, within a few years). If the model class an ASIC was designed for becomes outdated (as when Transformers displaced CNNs), the chip may become obsolete before it ships.

ASICs are less suitable for AI training

Similarly, ASICs are relatively weak for training tasks. Training algorithms evolve rapidly and requirements are fluid, so an ASIC built for training risks becoming obsolete whenever the algorithms change, making it far less cost-effective.

Top Inference Chips on the Market

Famous inference chips

Almost every world-renowned tech giant you’re familiar with, including Apple, Amazon, Alphabet, Meta, Microsoft, Tencent, ByteDance, Alibaba, and OpenAI, has deployed, is in the process of deploying, or is commissioning chip designers to develop inference chips.

Mostly Outsourced Design

In the ASIC market, major AI companies are mostly software companies and lack a deep bench of chip design talent, so they must outsource chip design.

Currently, Broadcom holds the top spot with a 55%-60% market share, and Marvell is second with a 13%-15% share.

Notable Deployed Inference Chips

The following is a list of notable deployed inference chips, excluding those currently under design.

| Company name | Product | Architecture | Application |
| --- | --- | --- | --- |
| Alphabet | TPU series | ASIC | Inference, training |
| Amazon | Inferentia, Trainium | ASIC | Inference (Inferentia); training (Trainium) |
| Microsoft | Maia 100 | ASIC | Inference, training |
| Meta | MTIA series | ASIC | Inference, training |
| Huawei HiSilicon | Ascend 910 series | ASIC | Inference, training |
| Cambricon Tech | MLU series | ASIC | Inference, training |

Other vendors

Note that AI chips from Nvidia, AMD, and Intel can also be used for inference, though the performance isn’t as impressive as when used for training.

In addition, several smaller startups — including SambaNova, Cerebras Systems, Graphcore, Groq, Tenstorrent, Hailo, and Mythic — as well as research efforts such as KAIST’s C-Transformer, have launched AI chips usable for inference. However, their shipment volumes are small compared with the AI inference chips designed in-house by the tech giants.


Disclaimer

  • The content of this site reflects the author’s personal opinions and is for reference only. I make no guarantees regarding the correctness, completeness, or timeliness of the articles and information; readers must exercise their own judgment.
  • I shall not be liable for any direct or indirect losses, damages, or other legal liabilities arising from readers’ reliance on or reference to the information on this site, including any investment decisions made on that basis.