What is AI inference?
AI inference is the process of running new data through a trained AI model, for example, using a trained image-recognition model to classify photos or a trained speech model to transcribe audio. Once a model is deployed, its algorithmic logic (such as the convolutional layers of a CNN or the attention mechanism of a Transformer) and computational flow (input and output formats, precision requirements) are essentially fixed and rarely need adjustment.
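As a minimal sketch of what inference looks like in software, the snippet below runs a pre-trained image-classification model on a single photo. The model choice (torchvision's ResNet-50) and the file name are illustrative assumptions, not tied to any particular vendor or chip.

```python
# Minimal inference sketch (illustrative): load a pre-trained model and classify one image.
# The model (ResNet-50) and the file name "photo.jpg" are assumptions for this example.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()  # inference mode: weights are fixed, no training updates

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # add a batch dimension

with torch.no_grad():  # no gradients: inference only runs the forward pass
    logits = model(image)
    predicted_class = logits.argmax(dim=1).item()
print(predicted_class)
```

The key point for this article is the `model.eval()` / `torch.no_grad()` pattern: at inference time the computation is a fixed forward pass, which is exactly what makes it a candidate for fixed-function hardware.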
AI Chip Types
Based on their application, AI chips can be divided into two categories: AI training chips and AI inference chips.
Based on their architecture, AI chips can be divided into two categories: GPUs and ASICs. This blog has already covered this topic extensively; for details, please refer to my posts "Top vendors and uses of GPU", "Comparison of AI chips GPU and ASIC", and "ASIC market is getting bigger, and related listed companies in the US and Taiwan".
AI inference chips
AI Training Market
The AI training chip market has few competitors; Nvidia alone holds over 90% of the market share. Its Blackwell architecture supports training of models with up to 1.8 trillion parameters, and fifth-generation NVLink enables seamless interconnection of 72-GPU clusters.
AMD is the only other company with a meaningful share of the AI training market, but its share is on an entirely different scale from Nvidia's. Intel's Gaudi chips have virtually no market presence, with a share of less than 1%.
AI inference chips are primarily ASICs
Because AI inference runs algorithms unique to each manufacturer, the chips must be customized. Customized chips are essentially ASICs, which is why AI inference chips are primarily ASICs.
AI Inference Chip Market Size
Key Advantages of ASICs
Suitable for Inference
As mentioned earlier, AI inference runs algorithms unique to each manufacturer, and the chips must be customized to maximize the efficiency of those algorithms and the distinctive features each manufacturer needs.
This kind of customized chip is, by nature, an ASIC. This is why, in addition to purchasing large quantities of general-purpose GPUs, each manufacturer develops its own ASICs to handle the AI inference workloads it requires.
Removing flexibility leads to faster performance
"Fixedness" is the core advantage of ASICs: the hardware architecture is customized for a single task. The computational logic and data paths of the inference algorithm can be hardened directly into the chip, eliminating the general-purpose units that inference does not need (such as the dynamic scheduling modules and general-purpose memory controllers that GPUs carry for training), so that hardware resources are devoted entirely to inference computation.
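A rough software analogy for this "hardening": once input shapes, precision, and the operator graph are frozen, the model can be compiled into a static artifact with no general-purpose flexibility left. The sketch below, a hedged illustration using PyTorch's ONNX export with an assumed model and output file name, shows that freezing step; an ASIC carries the same idea all the way down into silicon.

```python
# Software analogy (illustrative): freeze a trained model into a static, fixed-shape graph.
# An ASIC goes further by baking the fixed dataflow into hardware.
import torch
from torchvision import models

model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.DEFAULT)
model.eval()

# Fix the input format once and for all: a batch of one 3x224x224 image.
example_input = torch.randn(1, 3, 224, 224)

# Export a static graph; training machinery and dynamic flexibility are dropped.
torch.onnx.export(
    model,
    example_input,
    "mobilenet_fixed.onnx",  # assumed output file name for this example
    input_names=["image"],
    output_names=["logits"],
)
```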
Cost Efficiency
Inference workloads are far more sensitive than training to energy efficiency (computing power per watt of power consumption) and cost, and ASICs have a crushing advantage in both areas.
In terms of energy efficiency, Google's TPU v5e is roughly three times more energy efficient than Nvidia's H100.
In terms of cost, AWS's Trainium 2 offers a 30%-40% better price-performance ratio than the H100 for inference tasks, and the unit computing-power cost of Google's TPU v5 and Amazon's Trainium 2 is only about 70% and 60% of the H100's, respectively.
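To make those ratios concrete, here is a back-of-the-envelope sketch using only the rough figures quoted above; all absolute numbers are normalized placeholders, and only the ratios carry meaning.

```python
# Back-of-the-envelope comparison using the rough figures quoted above (illustrative only).
# "Perf per watt" is useful compute per watt; "cost per compute" is dollars per unit of
# delivered compute. Values are normalized to the H100 = 1.0; only ratios matter.

h100_perf_per_watt = 1.0
tpu_v5e_perf_per_watt = 3.0       # ~3x the H100, per the figure above

tpu_v5_cost_per_compute = 0.7     # ~70% of the H100
trainium2_cost_per_compute = 0.6  # ~60% of the H100

# For a fixed inference workload, energy use scales inversely with perf/watt.
energy_saving_tpu_v5e = 1 - h100_perf_per_watt / tpu_v5e_perf_per_watt
print(f"TPU v5e energy saving vs H100: {energy_saving_tpu_v5e:.0%}")   # ~67%
print(f"TPU v5 compute cost vs H100: {tpu_v5_cost_per_compute:.0%}")
print(f"Trainium 2 compute cost vs H100: {trainium2_cost_per_compute:.0%}")
```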
A large model may require only dozens to hundreds of chips (e.g., GPUs) for training, but the inference phase may require tens or even hundreds of thousands of chips (ChatGPT's inference cluster, for example, is more than 10 times the size of its training cluster). At those volumes, the one-time design cost of a custom ASIC is amortized across an enormous fleet, so customized ASIC designs can reduce the cost per chip.
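The amortization argument is easiest to see with numbers. The sketch below is purely illustrative; every figure (NRE, unit costs, volumes) is an assumption chosen to show the mechanics, not an actual price from any vendor.

```python
# Illustrative amortization of ASIC non-recurring engineering (NRE) cost.
# All figures are made-up assumptions, not real vendor prices.

asic_nre = 300_000_000    # one-time design/tape-out cost, assumed ($)
asic_unit_cost = 3_000    # assumed manufacturing cost per ASIC ($)
gpu_unit_price = 30_000   # assumed purchase price per general-purpose GPU ($)

def asic_cost_per_chip(volume: int) -> float:
    """Effective per-chip cost once NRE is spread over the deployed volume."""
    return asic_nre / volume + asic_unit_cost

for volume in (1_000, 10_000, 100_000, 500_000):
    print(f"{volume:>7} chips: ASIC ${asic_cost_per_chip(volume):>10,.0f} vs GPU ${gpu_unit_price:,}")

# At small (training-scale) volumes the NRE dominates; at inference-scale volumes
# the effective ASIC cost falls far below the GPU price.
```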
Disadvantages of ASICs
Long time to market
An ASIC design cycle can take one to two years, while AI models iterate extremely rapidly (major large-model generations often arrive only about a year apart). If the model architecture an ASIC was designed around becomes outdated (for example, if Transformers displace CNNs), the chip can be rendered obsolete.
ASICs are less suitable for AI training
By the same token, ASICs are relatively weak for training tasks. Training algorithms evolve rapidly and demand flexibility, so an ASIC used for training risks becoming obsolete as soon as the algorithm changes, making it far less cost-effective.
Top Inference Chips on the Market
Famous inference chips
Almost every world-renowned tech giant you’re familiar with, including Apple, Amazon, Alphabet, Meta, Microsoft, Tencent, ByteDance, Alibaba, and OpenAI, has deployed, is in the process of deploying, or is commissioning chip designers to develop inference chips.
Mostly Outsourced Design
In the ASIC market, the major AI companies are mostly software companies that lack a deep bench of chip-design talent, so they must outsource the chip design work.
Currently, Broadcom holds the top spot with a 55%-60% market share, and Marvell is second with a 13%-15% share.
Notable Deployed Inference Chips
The following is a list of notable deployed inference chips, excluding those currently under design.
| Company | Product | Architecture | Application |
| --- | --- | --- | --- |
| Alphabet | TPU series | ASIC | Inference, training |
| Amazon | Inferentia, Trainium | ASIC | Inference (Inferentia); training (Trainium) |
| Microsoft | Maia 100 | ASIC | Inference, training |
| Meta | MTIA series | ASIC | Inference, training |
| Huawei HiSilicon | Ascend 910 series | ASIC | Inference, training |
| Cambricon Tech | MLU series | ASIC | Inference, training |
Other vendors
Note that AI chips from Nvidia, AMD, and Intel can also be used for inference, although their inference performance is not as impressive as their training performance.
In addition, several smaller startups, including SambaNova, Cerebras Systems, Graphcore, Groq, Tenstorrent, Hailo, Mythic, and KAIST's C-Transformer, have launched AI chips that can also be used for inference. However, their shipment volumes are small and cannot compare with the inference chips the tech giants design in-house.

Related articles
- "AI inference chips vs. training chips"
- "China ditches US AI chips and decides to go its own way"
- "A must-read for Nvidia investors: "The Thinking Machine""
- "How GPU farm CoreWeave makes money?"
- "Geoffrey Hinton, 2024 Nobel Physics winner, inadvertently helped Nvidia transform to AI overlord"
- "Chinese AI progress and top companies"
- "DeepSeek routed the global AI and stock"
- "Top vendors and uses of GPU"
- "How does CUDA strengthen the moat of Nvidia's monopoly?"
- "Comparison of AI chips GPU and ASIC"
- "ASIC market is getting bigger, and related listed companies in the US and Taiwan"
- "Significant changes in Broadcom's business approach"
- "The reasons behind Broadcom share price's consistent outperformance"
- "How low-key Marvell makes money?"
- "How does Nvidia make money? Nvidia is changing the gaming rules"
- "The reasons for Nvidia's monopoly and the challenges it faces"
- "Why Nvidia failed to acquire ARM?"
- "Revisiting Nvidia: The Absolute Leader in Artificial Intelligence, Data Center, and Graphics"
- "Data center, a rapidly growing semiconductor field"
Disclaimer
- The content of this site reflects the author's personal opinions and is for reference only. I am not responsible for the correctness, opinions, or timeliness of the articles and information; readers must make their own judgments.
- I shall not be liable for any damages or other legal liabilities arising from losses caused, directly or indirectly, by readers relying on or referring to the information on this site in any investment decision.