I suggest you read my previous post, “ASIC market is getting bigger, and related listed companies in the US and Taiwan”, as a companion article to this post.
Development Trends of AI Chips
Customized AI chip types
AI chips include GPUs and ASICs. Among customized AI chips, the ASIC category spans multiple chip types, such as the TPU (Tensor Processing Unit), LPU (Language Processing Unit), and NPU (Neural Processing Unit).
Inference is the next wave of demand
Today’s data is like fossil fuel for model training: it will eventually run out. This is an important reason why the large-model competition has shifted from pre-training to inference. Ilya Sutskever, co-founder and former chief scientist of OpenAI, recently discussed this in a public speech, predicting that the next generation of AI models will be true AI agents with reasoning capabilities.
A report from Barclays predicts that AI inference demand will account for more than 70% of total computing demand for general-purpose artificial intelligence, reaching as much as 4.5 times the computing demand of training.
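As a quick sanity check, those two Barclays figures are mutually consistent; a short calculation (my arithmetic, not Barclays’) shows why:

```python
# If inference compute reaches 4.5x training compute, inference's share
# of the combined total is 4.5 / (1 + 4.5) ~= 82%, comfortably above
# the "more than 70%" figure quoted above.
training = 1.0
inference = 4.5 * training
share = inference / (training + inference)
print(f"Inference share of total compute: {share:.0%}")  # -> 82%
```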
Special vs. general purpose
The current advantages and disadvantages of GPUs and ASICs are obvious. The GPU’s advantage is generality: it can run many algorithms, and NVIDIA’s CUDA ecosystem is mature and easy to use. Its disadvantage is that a general-purpose GPU wastes a certain amount of computing power and energy. An ASIC is comparatively specialized; because it is designed for specific algorithms, it can deliver better compute and power efficiency.
Take Groq’s LPU as an example. The company claims the LPU is ten times faster than Nvidia’s GPU while costing one-tenth the price and consuming one-tenth the power. However, the more specialized an ASIC is, the narrower the range of algorithms it can accommodate. Migrating a large model originally running on a GPU to an ASIC may not be easy, and an ASIC’s overall ease of use is lower than a GPU’s.
From the perspective of usage scenarios, an industry insider told reporters that GPUs will still be used in a large number of parallel, general-purpose use cases, while other needs can be served by lower-cost ASICs, such as low-power ASICs on the inference side. McKinsey’s research likewise holds that AI workloads will shift mainly to inference, and that by 2030 AI accelerators equipped with ASIC chips will handle most AI workloads.
However, how much of the AI chip market ASICs will ultimately capture remains uncertain, because GPUs are absorbing the advantages of ASIC chips. Bao Minqi, product director of Arm Technology, told reporters that GPUs will not necessarily be replaced by other chips.
GPUs are mainly used in the AI cloud. A GPU connects easily to software ecosystems such as OpenCL, CUDA, or SYCL, which is convenient. From an energy-efficiency perspective, however, GPUs incur non-negligible overhead from multi-threaded context switching. Seen this way, in future edge scenarios GPUs and other chips will gradually converge rather than replace one another. Just as the Tensor Cores in NVIDIA’s H100 have introduced more tensor-specific technology, chips keep borrowing each other’s strengths to shore up their own weaknesses.
From this perspective, ASICs must also weigh the risk of abandoning generality. GPU versatility matters precisely because when the Transformer architecture changes, the GPU retains an advantage. Take the NPU as an example: on the one hand, the original DSA (domain-specific architecture) may not cope with changes in algorithm pipelines, so designers must consider adding more general capability for certain vector operations.
On the other hand, a chip with general computing capability may not be optimized for a specific type of computation, creating performance bottlenecks. Designers therefore need to introduce enough general compute to adapt to algorithmic change while balancing that generality against performance on specific tasks.
Need both
The battle between GPUs and ASICs is really a battle between the general-purpose and special-purpose camps. Until AI technology settles into a final form, neither chip will completely replace the other, and the game will not necessarily end with a winner and a loser.
Especially after ChatGPT became popular, Nvidia’s GPU output could not keep up with the huge demand from customers, and many companies relied on server CPUs plus ASICs to meet users’ computing needs for AI training and inference.
This shows the ASIC’s role in the AI era. Mark Kuemerle, vice president of technology for Marvell’s Computing and Customization Group, observed: “The interesting thing about these data center customers is that if they have a minor bottleneck in their system, the problem is magnified 1,000 times or more, because they deploy at hyperscale. Such bottlenecks can stall the NIC. Off-the-shelf machine-learning hardware may not match the workload or meet the flexibility or programmability requirements.”
AI Chip Market
ASIC Market Size
Let’s look at the new market to be captured. The potential market for customized data center chips is not small: according to research firm 650 Group, it will grow to US$10 billion in 2024 and double by 2025.
Needham analyst Charles Shi said the broader customized chip market could be worth about $30 billion in 2023, or about 5% of annual global chip sales.
The market share of GPU and ASIC in AI is 3:1
After chipmakers Marvell and Broadcom recently reported earnings that far exceeded expectations, investors began to worry that Nvidia’s GPUs might be replaced by customized ASIC chips.
Citi reiterated that “the two chips will coexist” and predicted that by 2028 the total addressable market (TAM) for AI accelerators will reach US$380 billion, of which AI GPUs will dominate with a 75% share and ASICs will account for only 25%.
Nvidia’s biggest advantage is that its GPUs can be reprogrammed, with the support of CUDA software, to adapt to different workloads. Citi also said that although ASICs’ unit share may exceed 35% by 2028, their share of sales may be limited to about 25% because AI GPUs carry a higher average selling price (ASP).
In addition, Nvidia’s allocation of CoWoS packaging capacity is expected to increase from 56% in 2024 to 60% in 2025, indicating that GPUs will maintain strong growth momentum in 2025.
In-house-developed AI chips
Technology giants all have their own AI chips
An ASIC can embed functions that an individual customer wants but a GPU cannot provide, and this is the ASIC’s biggest advantage. Moreover, the largest GPU users are all technology giants: budget is not a problem for them, and they generally do not want Nvidia holding their GPU supply by the throat. Currently, Google, Amazon, Tesla, and Meta have all launched ASIC chips.
Among cloud vendors, Google deployed the TPU many years ago, and its sixth-generation TPU was officially opened to customers in December 2024; Meta launched MTIA v2, a custom chip designed for AI training and inference, in 2024; Amazon has Trainium 2.0 and plans to release Trainium 3.0 in 2025; and Microsoft has its own AI chip, Maia.
Because they do not sell chips externally, these cloud vendors’ AI chips have received less market attention. In fact, the cloud vendors have already deployed ASIC chips in their data centers and are working to expand their use.
Google released its first ASIC chip, TPU v1, in 2015, and upgraded the line to v5 in 2023. According to official data, each TPU v5 pod combines 8,960 chips in a three-dimensional torus topology, linked at 4,800 Gbps per chip through the highest-bandwidth inter-chip interconnect (ICI). Compared with TPU v4, TPU v5’s FLOPS and high-bandwidth memory (HBM) have increased 2x and 3x respectively.
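To put those pod figures in perspective, here is a naive back-of-envelope calculation (my arithmetic, not Google’s; it ignores the torus topology and traffic patterns, so treat it as an upper bound only):

```python
# Multiply the per-chip ICI speed by the chip count quoted above.
# Real usable bandwidth depends on the 3D-torus links and the traffic
# pattern, so this is a ceiling, not a deliverable figure.
chips_per_pod = 8_960        # TPU v5 chips per pod (from the text)
ici_gbps_per_chip = 4_800    # inter-chip interconnect speed per chip

aggregate_tbps = chips_per_pod * ici_gbps_per_chip / 1_000
print(f"Naive aggregate ICI bandwidth: {aggregate_tbps:,.0f} Tbps")
# -> Naive aggregate ICI bandwidth: 43,008 Tbps
```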
Google has the money, the technology, and the application scenarios; it is arguably the technology giant that has gone furthest in developing its own AI chips, while other manufacturers are still struggling. Google keeps paying into Nvidia’s coffers, but it has already prepared for both scenarios.
TechInsights data shows that in 2023 Google quietly became the world’s third-largest designer of data center processors, behind only CPU overlord Intel and GPU overlord Nvidia. Google runs TPUs for internal workloads and does not sell the chips externally.
Amazon
Amazon has invested several times in OpenAI competitor Anthropic, deepening its ties with the company. Anthropic uses Amazon’s Trainium. Amazon recently revealed that the Rainier supercomputer cluster it is building for Anthropic will be completed soon, and it is adding capacity to serve other Trainium customers.
The custom-chip orders won by Broadcom and Marvell come from these cloud vendors. Google’s and Meta’s ASIC chips are customized in cooperation with Broadcom; JPMorgan analysts predict that, after Google, Meta will become the next ASIC customer to generate $1 billion in revenue for Broadcom. Amazon works with chipmaker Marvell.
In early December 2024, Amazon AWS reached a five-year agreement with Marvell. The two parties intend to expand their cooperation in AI and data center connectivity products so that Amazon can deploy Marvell’s semiconductor portfolio and dedicated networking hardware.
Other Behemoths
Beyond cloud vendors such as Google, Meta, and Amazon, OpenAI and Apple have also reportedly been working with these custom ASIC manufacturers. Recent reports say Apple is developing an AI server chip and collaborating with Broadcom on the chip’s networking technology, while OpenAI was earlier reported to have worked with Broadcom for several months on an AI inference chip.
Practical considerations
Efficiency
From a performance perspective, ASIC chips designed for specific scenarios or applications hold advantages over the general-purpose GPUs Nvidia sells. GPUs used to be the default choice, which boosted the results and stock prices of manufacturers such as Nvidia. But as machine learning and edge computing matured, algorithms became more stable, and workloads grew large enough to amortize the cost of an ASIC.
Features
ASICs can fill roles where GPUs are absent or weak in artificial intelligence, such as inference, although GPUs are catching up rapidly there and the gap is narrowing. Another ASIC advantage is that functions an individual customer wants, but a GPU cannot provide, can be embedded directly into a self-designed ASIC chip.
Software
CUDA now enjoys a near monopoly, and ASICs lack its software-compatibility advantage. Because each customer’s ASIC implements different functions, with widely varying details, programs must be rewritten to port them to different customers’ ASIC chips, which is a huge undertaking.
For details about CUDA, please see my post “How does CUDA strengthen the moat of Nvidia’s monopoly?“
High-speed transmission
In addition to chips, the trump cards that let Nvidia dominate the AI training market include CUDA and NVLink. The former is a software development platform tightly matched to its chips; the latter is a proprietary interconnect protocol that provides a high-speed, low-latency working environment.
One key reason Broadcom and Marvell beat their competitors to become the top two AI ASIC design houses is that both have mastered high-speed transmission technology.
Deployment
The GPU has first-mover advantage, a long history of technical development, low usage costs, high compatibility, and market validation, so it holds the edge in large-scale deployment; its most obvious disadvantages are high power consumption and poor efficiency.
Because an ASIC is a customized chip designed for a specific user’s needs, it holds advantages in efficiency measures such as throughput, power consumption, and compute level.
Cost
Citi pointed out that ASICs on the market cost about US$5,000 per chip, while GPUs cost about US$20,000 to US$30,000 per chip. However, the report noted that GPUs carry more high-bandwidth memory (HBM) and can be reprogrammed to adapt to different workloads; the latter capability, aided by Nvidia’s CUDA software tools, is the biggest advantage of GPUs and of Nvidia.
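To see how a large unit share can still translate into a modest revenue share when ASPs differ this much, here is a toy calculation (my illustrative inputs, not Citi’s model; it will not reproduce Citi’s 25% figure, since their TAM blends many products and prices):

```python
# Plug the per-chip prices above into Citi's rough 35/65 unit split.
asic_units, gpu_units = 35, 65        # unit-share split (percent)
asic_asp, gpu_asp = 5_000, 25_000     # US$ per chip (GPU midpoint)

asic_rev = asic_units * asic_asp
gpu_rev = gpu_units * gpu_asp
share = asic_rev / (asic_rev + gpu_rev)
print(f"ASIC revenue share: {share:.0%}")  # -> 10% with these toy inputs
```

The mechanism is the point: a higher-priced competing product compresses the cheaper product’s revenue share well below its unit share.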
Key vendors
GPU
Nvidia has monopolized more than 90% of the GPUs used in artificial intelligence worldwide, a share that has climbed from 80% to 95% over the past two years (for details about Nvidia’s monopoly on AI GPUs, see my post “The reasons for Nvidia’s monopoly and the challenges it faces“) and may rise further. With the latest and most advanced GPUs in particular, Nvidia has no rivals at all.
For more information about GPUs, please refer to my post “Top vendors and uses of GPU“.
ASIC
JPMorgan Chase estimates that Broadcom currently holds 55% to 60% of the customized chip market. Marvell is considered the second largest player in the ASIC market, with an estimated share of 13% to 15%.
Needham’s 2023 statistics show two giants in the customized data center chip market: Broadcom and Marvell. In the high-end ASIC market, Broadcom leads with a 35% share, followed by Marvell with 12%. Both companies believe that as data center processors diversify, the customized chip model will be revitalized.
A report by the global consultancy McKinsey states that Broadcom is expected to dominate the market with a 55-60% share, and that by 2030 the majority of AI workloads will be handled by ASICs. Marvell is rapidly emerging as the second-largest player in the custom ASIC market, with a 13-15% share.
Taiwanese, Chinese, and Japanese manufacturers are taking up the smaller market space left by the two giants. Among them, Alchip has grown the fastest, with sales increasing about seven-fold over the same period. It has set records for six consecutive years, with revenue, profit, net profit, and earnings per share all reaching new highs. Although its stock has recently been in a correction phase, it remains one of the highest-priced stocks in Taiwan, and its share price is already 10 times what it was four years ago.
Broadcom’s fiscal 2024 results
Broadcom generated $12.2 billion in revenue in fiscal 2024 (ended November 3, 2024) from sales of customized AI chips and network processors, a staggering 220% increase from the $3.8 billion the company generated from AI chips in fiscal 2023.
The good news is that Broadcom expects its AI-related addressable market to grow to $60 billion to $90 billion by fiscal 2027. The company expects to win a substantial share of that opportunity, driving strong growth from its $12.2 billion fiscal 2024 AI revenue base.
In fiscal 2024, Broadcom’s revenue increased 44% year-over-year to a record US$51.6 billion, with AI revenue up 220% to $12.2 billion, driving the company’s semiconductor revenue to a record $30.1 billion. Broadcom also expects revenue to grow 22% year-over-year in the first quarter of fiscal 2025.
Marvell’s fiscal 2025 Q3 results
According to Marvell’s fiscal 2025 third-quarter report (quarter ended November 2), data center revenue increased 98% year-over-year to US$1.1 billion. This strong growth offset weak performance in the company’s other business units. Total revenue for the quarter was US$1.516 billion, up 7% year-over-year and 19% quarter-over-quarter.
The company said the sequential growth came in above the midpoint of its previous guidance, and it forecast next quarter’s revenue to grow another 19% sequentially. Marvell said the third-quarter results and the expectation of a strong fourth quarter were driven mainly by customized AI chip projects, which have entered mass production and are expected to sustain strong demand momentum into fiscal 2026.
For more information about the development of the global ASIC industry, please refer to my post “ASIC market is getting bigger, and related listed companies in the US and Taiwan“.
AI Chip Startups
Cloud vendors develop their own large models and have tied themselves to large-model startups through investment. The chips they develop in cooperation with custom ASIC manufacturers are used to train and serve these models, with no reliance on external sales. ASIC startups are different: they choose their own foundries and must find customers themselves.
Among them, Cerebras, maker of wafer-scale chips, outsources production to TSMC, and Etched’s Sohu chip uses TSMC’s 4nm process. Groq’s LPU chip, which uses a near-memory computing architecture, has less demanding process requirements and uses GlobalFoundries’ 14nm process.
These ASIC startups are recruiting customers all over the world, and courting Middle Eastern countries that are ramping up AI investment has become a common choice. According to Cerebras’ public data, its net sales were nearly US$79 million in 2023 and reached US$136.4 million in the first half of this year. In 2023, revenue from Abu Dhabi’s G42 accounted for 83% of the total, and G42 has also pledged to purchase $1.43 billion of Cerebras products and services next year.
Cerebras, Groq, and another AI chip startup, SambaNova, also appeared at the Saudi AI Summit. Cerebras signed a memorandum of understanding with Saudi Aramco there, and Saudi Aramco plans to use Cerebras products to train and deploy large models.
Groq is working with Saudi Aramco’s digital and technology subsidiary to build the world’s largest inference data center in Saudi Arabia, to be completed and put into operation by the end of 2024. It will initially include 19,000 Groq LPUs and is expected to expand to 200,000 LPUs. According to SambaNova’s official website, the company is also working with Dubai-based Solidus AI Tech to bring SambaNova Cloud to high-performance computing data centers in Europe, and with Canvass AI, which operates across the Middle East, South Asia, Europe, and Africa, to provide AI solutions to enterprises.
In addition, according to its official website, SambaNova cooperates with Argonne National Laboratory in the United States. Groq is working with Carahsoft, a provider of IT solutions to US and Canadian government departments, and with Earth Wind & Power in the energy sector, and plans to build an AI computing center in Norway.
Why does Broadcom dominate ASICs?
Broadcom is involved in many companies’ customized chips. Why do the major companies choose Broadcom? The answer is not just chip design.
Broadcom has its own “moat” in terms of chip-to-chip communication capabilities.
Broadcom is the undisputed giant in SerDes (serializer/deserializer) communication technology. A SerDes interface converts low-speed parallel data into high-speed serial data for transmission, then converts it back into parallel data at the receiving end; the point is to move data from one chip (a TPU, for example) to another at high speed, improving signal transmission efficiency. In the global 50 Gb/s SerDes market, Broadcom holds a 76% share.
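For readers unfamiliar with the term, here is a conceptual sketch (mine, not Broadcom’s implementation) of the data movement a SerDes performs. A real SerDes is mixed-signal hardware with clock recovery and line coding such as 64b/66b or PAM4 signaling; this only shows the parallel-to-serial round trip:

```python
# Toy serializer/deserializer: widen-in, narrow-out, MSB first.

def serialize(words: list[int], width: int = 8) -> list[int]:
    """Flatten parallel words into a serial bitstream, MSB first."""
    return [(w >> (width - 1 - i)) & 1 for w in words for i in range(width)]

def deserialize(bits: list[int], width: int = 8) -> list[int]:
    """Regroup the serial bitstream back into parallel words."""
    return [
        sum(bit << (width - 1 - i) for i, bit in enumerate(bits[j:j + width]))
        for j in range(0, len(bits), width)
    ]

data = [0xDE, 0xAD, 0xBE, 0xEF]     # "parallel" 8-bit words at the sender
stream = serialize(data)            # one bit per unit interval on the wire
assert deserialize(stream) == data  # receiver recovers the original words
print(stream[:8])                   # -> [1, 1, 0, 1, 1, 1, 1, 0]  (0xDE)
```

The engineering value lies in doing this conversion reliably at tens of gigabits per second per lane, which is exactly where Broadcom’s 76% share sits.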

Related articles
- “Geoffrey Hinton, 2024 Nobel Physics winner, inadvertently helped Nvidia transform to AI overlord“
- “Chinese AI progress and top companies“
- “DeepSeek routed the global AI and stock“
- “Top vendors and uses of GPU“
- “How does CUDA strengthen the moat of Nvidia’s monopoly?“
- “Comparison of AI chips GPU and ASIC“
- “ASIC market is getting bigger, and related listed companies in the US and Taiwan“
- “Significant changes in Broadcom’s business approach“
- “The reasons behind Broadcom share price’s consistent outperformance“
- “How low-key Marvell makes money?“
- “How does nVidia make money, Nvidia is changing the gaming rules“
- “The reasons for Nvidia’s monopoly and the challenges it faces“
- “Why nVidia failed to acquire ARM?“
- “Revisiting Nvidia: The Absolute Leader in Artificial Intelligence, Data Center, and Graphics“
- “Data center, a rapidly growing semiconductor field“
Disclaimer
- The content of this site represents the author’s personal opinions and is for reference only. I am not responsible for the accuracy, opinions, or timeliness of the articles’ content and information; readers must make their own judgments.
- I shall not be liable for any damages or other legal liability for direct or indirect losses caused by readers’ direct or indirect reliance on, or reference to, the information on this site, nor for any responsibilities arising from any resulting investment behavior.