DeepSeek routed the global AI industry and stock markets

DeepSeek’s independently developed MLA architecture and DeepSeek MOE architecture have played a key role in reducing its model training costs.


DeepSeek’s breakthrough has routed the global AI industry, venture capital, and stock markets.

This incident triggered violent fluctuations in the U.S. capital market and hit U.S. stocks hard: the S&P 500 index fell nearly 10% in 10 days, and the market value of technology giants such as Nvidia, Microsoft, Alphabet, Broadcom, Marvell, TSMC, ARM, and Oracle evaporated by more than one trillion U.S. dollars in total.

Impact on global stock market

During the Chinese New Year holiday, on January 27, 2025, stock markets around the world were routed:

  • The Philadelphia Semiconductor Index, which represents U.S. semiconductor stocks, fell 9.2%.
  • Nvidia, the world’s largest artificial intelligence chip maker, plunged more than 16.8%, and its market value evaporated by more than $589 billion, setting a record for the largest market value evaporation in the history of U.S. stocks.
  • Broadcom, the world’s largest designer of custom AI chips, plunged 17.3%
  • TSMC, the world’s largest foundry, plunged 13%
  • Marvell, the world’s second-largest custom AI chip designer, plunged 19.1%
  • Arista, a major manufacturer of network switches and routers, plunged 22.35%
  • ARM plunged 10.19%
  • OpenAI’s data center partner Oracle plunged 13.79%
  • Google parent Alphabet fell 4.2%

How popular is it?

100 million new users in 7 days

Sanyan Technology (sycaijing.com) reported on February 8, 2025 that, according to the AI Product Rankings, after the release of the DeepSeek R1 model on January 20, 2025, the number of DeepSeek users grew by 125 million in January (counting websites and applications, without deduplication), and more than 80% of those users arrived in the last week of January.

By this calculation, DeepSeek reached 100 million new users in 7 days without any advertising or marketing, breaking the record set by ChatGPT. To reach 100 million users, ChatGPT took 2 months (second only to DeepSeek), TikTok took 9 months (third), Pinduoduo took 10 months (fourth), and WeChat took 1 year and 2 months (fifth).

Number one download app

On January 27, 2025, DeepSeek surpassed ChatGPT to top Apple’s free application list in the United States. On the same day, the free list of Apple’s Chinese App Store also showed that DeepSeek ranked first.

The second most popular AI chatbot

DeepSeek’s website traffic has overtaken Gemini’s: its U.S. visits hit a record 49 million on January 28, 2025, a surge of 614% from the previous week, making it the second most popular AI chatbot in the world. On January 31, its daily U.S. visits were 2.4 million, 60% higher than the 1.5 million visits to Google’s chatbot Gemini, while OpenAI’s ChatGPT still drew 8 times DeepSeek’s traffic, with 19.3 million visits that day.

Outside the United States, the gap between DeepSeek and Gemini is even wider. According to SimilarWeb data, DeepSeek had 29.2 million visits worldwide excluding China on January 31, more than three times that of Gemini.

A month ago, DeepSeek.com had an average of 300,000 daily visits, but by January 27, that number had surged to 33.4 million, shaking U.S. technology stocks that day.

Tech behemoths rush to jump on the DeepSeek train

On February 1, DeepSeek’s AI assistant topped the most-downloaded mobile app lists in 140 markets. Technology giants such as Microsoft, Nvidia, and Amazon have all rolled out support for users to access the DeepSeek-R1 model on their platforms.

OpenAI is under tremendous pressure

The rise of DeepSeek has put OpenAI under real pressure. Shedding its former air of mystery and exclusivity, along with its charging policy, OpenAI was forced to announce the official launch of its latest lightweight model, o3-mini, on February 1, opening its reasoning capability to free users for the first time. Free users can now experience o3-mini’s reasoning directly in ChatGPT.

At the same time, to blunt DeepSeek’s impact and take a share of Google’s market, OpenAI went further and announced on February 6 that its newly launched ChatGPT online search would be free for all users, with no registration required, hoping to draw users back into the OpenAI fold.

Moreover, OpenAI had never advertised, believing it didn’t need to. Pressured by DeepSeek, however, it took the radical step of spending $14 million on advertising during the Super Bowl, the most expensive annual advertising event in the United States.

Finally, OpenAI CEO Sam Altman announced that the company would no longer release its large reasoning model o3 separately, but would instead launch the GPT-4.5 and GPT-5 models within a few months to integrate and streamline its AI lineup and untangle its overly complex product roadmap. And because DeepSeek is so shockingly cheap, the most advanced GPT-5, due in a few months, will be “available for free.”

What are the current rates? The standard plan costs $20 per user per month, and the enterprise plan $200 per user per month. As recently as September 2024, OpenAI was saying it would raise the standard price to US$40 and enterprise pricing to US$2,000 or several thousand dollars. Without DeepSeek, consumers around the world would have had no choice but to be fleeced by OpenAI.

Google surrenders too

Besides OpenAI, Demis Hassabis, CEO of Google subsidiary DeepMind, first called DeepSeek’s model “an impressive piece of work”, then changed his tone: “From a technical point of view, this is not a major change.” He stressed that “the hype is a bit exaggerated” and that “despite a lot of hype, there are actually no new scientific breakthroughs; it uses known AI techniques.”

But Hassabis also said that the Gemini 2.0 Flash model, which Google opened to everyone this week, is more efficient than DeepSeek’s model.

You should know, especially if you have used Gemini, that before DeepSeek appeared, Google’s latest and most powerful versions were all paid; there was no free tier. So why are they free now?

Seen in this light, Hassabis’s dismissiveness toward DeepSeek is not hard to understand: DeepSeek has in fact become a powerful rival to DeepMind.

Just relying on low cost?

Deny any of China’s achievements

Many white people in Europe and America who are used to demonizing China will definitely respond with a knee-jerk reaction, “It’s because China’s labor is cheap!”, “It’s because of the US embargo that they rely on low-cost chips!”, “It’s just luck, and there’s only this one!”

I advise these ignorant frogs in the well to check the annual salary of Chinese software engineers and the salary paid by DeepSeek before coming back to read this article.

Moreover, China already has many large AI companies whose achievements are no less than those of their American competitors; you just haven’t heard of them. For more information, please see my post “Chinese AI progress and top companies”.

Demonizing and smearing China

Whenever any Chinese achievement threatens Europe, America, or their dependent vassal states (such as Taiwan, Japan, and South Korea), these countries turn to the media machines they have built over decades of global dominance to brainwash people into believing that Chinese products are of poor quality (an obviously specious claim: first ask how much you actually paid for those Chinese products. Why didn’t you spend real money on your so-called high-quality goods instead?),

or that Chinese software sends data back to China and that China censors speech (the ignorant and ridiculous part being that most people do not realize their own countries, including the so-called democratic ones they are proud of, censor speech on every program they use to connect to the Internet).

Then why can Europe, the United States, and their vassal states (such as Taiwan, Japan, and South Korea) do the same things that China supposedly cannot? Aren’t the mobile phones, tablets, and computers you use produced in China? How come I haven’t heard you complain after decades of using them?

Is it fine for Google’s search engine, Apple, Facebook, and Microsoft software to send your data back to the United States? Are you sure they are saints? Then why are people in Europe and America crazy about TikTok? Why are Americans trying every possible means to buy the supposedly problematic TikTok?

Even Trump himself has changed his attitude, saying that TikTok is very useful and that DeepSeek is more beneficial than harmful to American companies. Or is it that I misunderstood his intention because of my poor English?

What is this if not “the magistrates may set fires while the common people may not light lamps”? Double standards, ideology, or plain ignorance.

Why can’t others do it?

What I want to say is this: if it only took “cheap labor” and “low-cost chips”, the entry threshold would be very low. Then why hadn’t small and medium-sized enterprises in the United States, or even in Taiwan (whose software capabilities rank near the bottom of the world), long since achieved what DeepSeek has announced? But it didn’t happen. This is logic even an elementary school student understands.

What about Taiwan’s AI model?

If that is the sour-grapes theory, then where is Taiwan’s own AI large language model? And if one exists, how many users does it have? Taiwan’s 2024 government technology development plan has a total budget of NT$132.8 billion (US$4.15 billion), covering the four departments most directly related to information technology development: the Ministry of Digital Affairs, the National Science Council, the National Communications Commission, and the Ministry of Economic Affairs. After spending NT$132.8 billion of taxpayers’ money, where is Taiwan’s own AI large language model?

Note: TAIDE, the AI model officially launched by Taiwan’s National Science Council in 2023, was functionally outdated despite its large taxpayer-funded budget; few people are interested in using it, it has long since stopped updating, and it dares not disclose how many people use it. Meanwhile, the self-made traditional-Chinese large language model launched in 2023 by Academia Sinica, Taiwan’s largest official research institution, was revealed on debut to have been trained mainly on simplified-Chinese data from China, with the output then converted to traditional Chinese. A typical copycat.

On November 11, 2024, riding the popularity of artificial intelligence, Taiwan’s Minister of Economic Affairs, at a meeting with the Chinese National Federation of Industries, bit off more than Taiwan can chew and publicly declared: “We will strive to achieve a 50% penetration rate of artificial intelligence in the manufacturing industry by 2028, and Taiwan’s software will be among the top three in the world.”

Where’s the bargain?

Another thing worth clarifying is DeepSeek’s so-called low-cost advantage: it does not refer only to the world-shaking feat of creating the DeepSeek-R1 model for approximately US$5.6 million (about 5% of the cost of OpenAI’s GPT-4); it refers to the combination of low cost, low price, and excellent performance.

More importantly, DeepSeek charges incredibly low fees for programmers to access its API services. If you are a customer and can achieve the same or even better results, is there any reason to use an API service that is so expensive and almost monopolized by a few American AI companies?

Breaking US monopoly

In fact, the United States is more afraid of the latter, because more manufacturers around the world will adopt DeepSeek and then push it to their customers, weakening the influence of the United States.

Why is DeepSeek successful?

Three main reasons for success

DeepSeek’s success is mainly due to three aspects below:

  • Technology: DeepSeek independently developed two core models, DeepSeek-V3 and DeepSeek-R1, whose performance is comparable to OpenAI’s 4o and o1 models.
  • Low cost: The two models cost only about one-tenth as much as OpenAI’s 4o and o1 models.
  • Open source: DeepSeek has open-sourced both of these powerful models, allowing a broad range of AI teams to use the most advanced, lowest-cost models available to build more innovative AI applications.

Note: China’s three best-performing AI models come from DeepSeek, Alibaba, and Baidu, and all are open source. By contrast, among the major American competitors, only Meta has adopted an open-source model.

Next Steps

According to statistics from mainland media, at least 20 mainland Chinese chip manufacturers have announced cooperation with DeepSeek, breaking out of the limits of Nvidia’s CUDA ecosystem by pairing “domestic computing power with domestic large models”. They include well-known mainland AI chip makers such as Huawei Ascend, Baidu Kunlun Core, Hygon, MetaX Tech, and Moore Threads.

Taking advantage of the Chinese language

The inherent advantages of Chinese

The Chinese language offers technical advantages for artificial intelligence, mainly stemming from three of its characteristics. First, Chinese characters are uniform in written size and pronunciation length. In speech recognition, each character’s syllable is composed of an initial consonant and a final vowel, making it relatively easy to distinguish each character’s pronunciation within a sentence. English words, by contrast, vary in length, and many English sentences are pronounced with connected speech, which demands more computing power; this problem exists in almost all alphabetic scripts.

Chinese is good for AI training

Chinese characters are ideographic, and their information density is generally higher than that of alphabetic scripts. They are rich in connotation: a handful of characters, as in phrases, idioms, or classical Chinese, can express a great deal of meaning. For artificial intelligence, this means that in most cases, training on the same content in Chinese requires less storage and computation. This is not absolute, however, because English has its own advantages, such as rigor: in precise papers or legal documents, the information density of English is not that different from Chinese.
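To make the density point concrete, here is a tiny, purely illustrative sketch (my own example, not from any tokenizer study): the English idiom “kill two birds with one stone” next to its four-character Chinese equivalent, compared by character and byte count.

```python
# Illustrative only: comparing the length of the same idea in English
# and Chinese. Fewer symbols per unit of meaning loosely suggests
# higher information density per character.
english = "Kill two birds with one stone."
chinese = "一石二鳥"  # four-character idiom with the same meaning

print(len(english))                  # → 30 characters of English
print(len(chinese))                  # → 4 Chinese characters
print(len(chinese.encode("utf-8")))  # → 12 UTF-8 bytes (3 per CJK character)
```

Real training costs depend on the model’s tokenizer, not raw character counts, so treat this only as a rough intuition for the density argument, not a measurement.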

Chinese word’s stability

The stability of Chinese means the characters used today remain recognizably similar to the oracle-bone inscriptions of thousands of years ago; new things are named by forming new phrases from commonly used characters. This stability means the training parameters of an artificial intelligence model can be reduced, and repeated training can be more frequent and more accurate. English, in contrast, adds a large number of new words every year, and even its commonly used words change.

What is DeepSeek’s breakthrough?

Two key technologies

DeepSeek’s low cost comes down to two key technologies: MoE and MLA. Its independently developed MLA and DeepSeekMoE architectures have played a key role in reducing its model training costs.

“DeepSeek’s strength lies in its ability to train MoEs. It has become the first company in public MoE model training to successfully train such a large MoE.”

MoE

DeepSeek solves the performance problem of “very large and very sparse MoE models”, which is also the most critical reason for DeepSeek’s low training cost.

The advantage of the MoE architecture is twofold: the model can embed data into a larger parameter space, and during training or inference it only needs to activate a fraction of those parameters, which greatly improves efficiency.

The DeepSeek model has more than 600 billion parameters, compared to Llama 405B’s 405 billion, so it has a larger information-compression space and can hold more world knowledge. At the same time, the DeepSeek model activates only about 37 billion parameters at a time; that is, during training or inference, only 37 billion parameters participate in each computation. By comparison, the Llama 405B model activates all 405 billion parameters for every inference.
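The sparse-activation idea described above can be sketched in a few lines. This is a toy, not DeepSeek’s actual architecture: the sizes, the plain top-k softmax router, and the single-matrix “experts” are all simplifications I chose for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 16, 8, 2   # toy sizes, far smaller than any real model

# Each "expert" here is just one weight matrix; real experts are FFN blocks.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_forward(x):
    """Route token x to its top-k experts; only those experts compute."""
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]                        # k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over them
    # Only TOP_K of N_EXPERTS matrices are used: ~TOP_K/N_EXPERTS of the FLOPs,
    # even though all N_EXPERTS parameter sets exist in memory.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.standard_normal(D)
y = moe_forward(x)
print(y.shape)  # (16,)
```

The point of the sketch is the ratio: the full parameter space is 8 expert matrices, but each token touches only 2 of them, which mirrors (in miniature) how a 600B-parameter MoE can run on ~37B active parameters per token.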

MLA

MLA is mainly used to reduce memory usage during inference, as well as during training. It relies on clever low-rank approximation techniques, which DeepSeek combines with rotary position embedding (RoPE). Successfully integrating these techniques with its MoE design puts DeepSeek ahead of its competitors in efficient language-model training.

“MLA compresses the size of the KV Cache by modifying the attention operator, so that more KV Cache entries fit in the same capacity. Combined with the FFN-layer modification in the DeepSeek-V3 model, this enables a very large, sparse MoE layer. This is the most critical reason for DeepSeek’s low training cost.”

The KV Cache is a common optimization technique: it stores the key-value pairs of tokens generated while a model runs in order to improve computing efficiency. During operation, the KV cache acts as a memory bank, holding the keys and values of tokens the model has already processed so that attention scores can be computed without reprocessing them. By “trading storage for computation”, it avoids the repeated work of starting from the first token on every step, improving the efficiency of the available computing power.
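As a rough illustration of how a low-rank latent can shrink that cache (a simplified sketch of the general idea, not DeepSeek’s actual MLA operator; every name and dimension here is invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

D_MODEL, N_HEADS, D_HEAD, D_LATENT = 64, 4, 16, 8
KV_DIM = N_HEADS * D_HEAD   # numbers per token for keys (and again for values)

# One shared down-projection to a small latent, plus up-projections
# used to reconstruct keys and values when attention actually runs.
w_down = rng.standard_normal((D_MODEL, D_LATENT)) / np.sqrt(D_MODEL)
w_up_k = rng.standard_normal((D_LATENT, KV_DIM)) / np.sqrt(D_LATENT)
w_up_v = rng.standard_normal((D_LATENT, KV_DIM)) / np.sqrt(D_LATENT)

latent_cache = []            # what actually gets stored per token

def cache_token(h):
    """Store only the D_LATENT-dim latent, not full keys and values."""
    latent_cache.append(h @ w_down)

def expand_cache():
    """Rebuild (approximate) keys and values from the latents at attention time."""
    c = np.stack(latent_cache)        # (tokens, D_LATENT)
    return c @ w_up_k, c @ w_up_v     # (tokens, KV_DIM) each

for _ in range(10):                   # simulate caching 10 decoded tokens
    cache_token(rng.standard_normal(D_MODEL))

keys, values = expand_cache()
stored = 10 * D_LATENT                # 80 numbers actually cached
plain = 10 * 2 * KV_DIM               # 1280 numbers a plain K+V cache would hold
print(stored, plain)                  # 16x smaller in this toy setup
```

The real MLA also treats RoPE specially and folds the up-projections into the attention computation; this sketch only shows the storage-for-computation trade-off the quoted expert is describing.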

Without Cuda

Instead of relying only on Nvidia’s CUDA libraries, DeepSeek directly adopted Parallel Thread Execution (PTX), writing large amounts of finely tuned parallel-thread-execution code to squeeze high AI-model performance out of lower-specification hardware.

The feedback from rivals

What’s OpenAI’s comments on DeepSeek?

OpenAI CEO Sam Altman, who regards DeepSeek as OpenAI’s biggest rival, admitted his mistake and said he would follow DeepSeek in making the reasoning model’s thinking process public. OpenAI’s closed-source strategy, he conceded, has put the company on the wrong side of history; he will rethink OpenAI’s open-source strategy and admitted that its lead is no longer as strong as it used to be.

Altman also praised DeepSeek as a very good model and said, “we will maintain a smaller lead than in previous years.” Reflecting on the closed-source strategy, he revealed that OpenAI is discussing releasing some model weights: “I personally think we are on the wrong side of history here and we have to figure out a different open source strategy, but not everyone at OpenAI agrees with that view, and it’s not our top priority right now.” Altman also said OpenAI will follow DeepSeek’s lead.

What do professionals say?

The CEOs of the world’s top artificial intelligence companies, including OpenAI, Apple, Microsoft, Amazon, Alphabet’s Google, Meta, and Nvidia, have all praised DeepSeek’s achievements. Apple noted in its most recent earnings call that it does not rule out using open-source models similar to DeepSeek in the future. I sat in on every one of these companies’ earnings calls around the time this article was published, and not a single CEO escaped being asked by a Wall Street analyst for his or her views on DeepSeek.

Please note: Each of the companies mentioned above is a direct competitor to DeepSeek, especially OpenAI, Google, and META.

The most insightful comments about a business come from its direct competitors.

Sweeping the world

Apple, Microsoft, Amazon, Alphabet’s Google, Nvidia, and China’s own major cloud computing companies have all, without exception, launched DeepSeek on their platforms.

European companies have even said that DeepSeek, with its high cost-effectiveness, gives them their best opportunity to break free from the grip of American vendors. DeepSeek’s latest AI product, R1, is indeed an impressive achievement, providing a cheaper and more efficient alternative to models developed by American companies such as OpenAI.

Schmidt retracts his opinion from six months ago

In a column published in The Washington Post on February 28, 2025, former Google CEO Eric Schmidt said that the rise of DeepSeek marked a “turning point” in the global artificial intelligence race, proving that China can compete with large technology companies using fewer resources and reflecting China’s growing strength in the field of artificial intelligence.

As recently as 2024, Schmidt asserted that the United States would be two to three years ahead of China in the development of artificial intelligence. However, the emergence of DeepSeek and its cost-effective and efficient AI model challenges this notion.

Schmidt’s recognition of DeepSeek’s impact highlights how dynamic and fast-moving the AI industry is: new players can disrupt established norms and drive innovation. His latest remarks amount to an indirect retraction of the comments he made on China’s AI development just six months earlier, and of his pessimism about the prospects of open-source code.

Note: Schmidt is the most representative figure across Silicon Valley, the US government, and the venture capital industry. His speech represents the direction of US technology industry policy, the views of the Silicon Valley technology community, and the direction of the venture capital industry.

For Schmidt’s views on the development of artificial intelligence, venture capital, U.S. technology industrial policy, and the technological strength of the world’s major countries, please see my detailed analysis in the post “Schmidt’s removed speech deserves investors to read. What did it talk about?”

A wake-up call for the United States

China has a way to do it

Talking about China’s AI in a recent interview, Ray Dalio said the Chinese “are a bit behind in the chips, but they’re ahead in the applications.” I personally agree, because many examples have proven this to be an indisputable fact; it is exactly as Dalio said.

This has put DeepSeek in the ranks of communications equipment giant Huawei, electric car maker BYD and e-commerce giant Alibaba, once again igniting Chinese people’s sense of national pride. What these companies have in common is that they are adept at taking existing technologies (usually developed in the United States and other Western countries) and rapidly scaling them up for mass production or consumption.

The United States can’t stop it

Bloomberg columnist Catherine Thorbecke wrote that “DeepSeek’s Breakthroughs Are Too Big for the US to Ban”: what sets DeepSeek apart from other Chinese technology products the United States has banned is that its development team open-sourced the large model and even published a paper sharing, in great detail and with great transparency, how they built it. This means that even if the United States blocks DeepSeek’s app and website, “it is almost impossible for Washington to eliminate the influence of DeepSeek.”

What she meant was that the United States couldn’t ban it even if it wanted to!

Blind spot of Silicon Valley and Wall Street

DeepSeek’s greatest contribution is to tear off the fig leaf covering the US’s technology embargo on China and its supposed lead in artificial intelligence! Over the past few years, everyone has been brainwashed by the following rhetoric from investment firms and Silicon Valley:

  • To create a breakthrough large AI model, bigger is better.
  • Only super large technology companies such as Apple, Microsoft, Amazon, Alphabet’s Google, and Nvidia have such resources, qualifications and capabilities.
  • In order to maintain the United States’ lead in artificial intelligence, huge capital expenditures will be necessary.
  • In order to curb China’s development in artificial intelligence, a chip embargo on China must be imposed.
  • In order to prevent business secrets from being stolen by competitors, closed source code must be used. Using open source code is foolish.

DeepSeek’s achievements prove that none of the above arguments has a leg to stand on.

Closing words

Marc Andreessen, founder of the top venture capital firm a16z and widely called the godfather of venture capital, posted on social media on January 27, 2025 that “DeepSeek is the Sputnik moment of AI.” The “Sputnik moment” refers to the Soviet Union’s successful launch of the first artificial satellite, Sputnik 1, in 1957. The metaphor says it all: the shock and impact DeepSeek will bring in the era of generative artificial intelligence will be unimaginable.

credit: DeepSeek


Disclaimer

  • The content of this site is the author’s personal opinions and is for reference only. I am not responsible for the correctness, opinions, and immediacy of the content and information of the article. Readers must make their own judgments.
  • I shall not be liable for any damages or other legal liabilities for the direct or indirect losses caused by the readers’ direct or indirect reliance on and reference to the information on this site, or all the responsibilities arising therefrom, as a result of any investment behavior.