Introduction to CUDA
What is CUDA?
CUDA is a parallel computing platform and programming model that allows general-purpose computing on Nvidia’s GPUs. The platform enables developers to harness the power of GPUs for parallel computing to accelerate computationally demanding applications.
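To make this concrete, here is a minimal sketch of the CUDA programming model (our illustration, not from the original article): a C++ function marked `__global__`, called a kernel, is executed by thousands of GPU threads in parallel, each handling one element of the data. Assuming the CUDA toolkit is installed, it would be compiled with something like `nvcc vec_add.cu` (the file name is hypothetical).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // one million elements
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory is accessible from both the CPU and the GPU.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough blocks of 256 threads to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();               // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);           // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```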
Release of CUDA
Nvidia released CUDA in 2007. It is the programming environment Nvidia provides to its customers: a software interface through which developers can call and access all the functions of Nvidia GPUs.
How was CUDA invented?
Jensen Huang has said: Because of the field we chose, video games, you not only want it to be beautiful, you also want it to be dynamic, to create a virtual world. We extended the technology step by step and introduced it to scientific computing. One of the first applications was molecular dynamics simulation; another was seismic processing, which is basically inverse physics. Seismic processing is very similar to CT reconstruction, another form of inverse physics. So we solved the problems step by step, expanded into adjacent industries, and eventually addressed them all.
Please note: Jensen Huang is not just talking about the computer or video game industry.
Use
General processing
CUDA is Nvidia's parallel computing platform and application programming interface. Because of CUDA, Nvidia's GPUs are not limited to driving computer displays; they can be put to other uses and to general-purpose processing.
GPUs and CUDA are mainly used for acceleration
A general-purpose computer simply works once the processor is in place. A GPU, however, is an accelerated computer, which means you have to ask yourself: what am I trying to accelerate? There is no such thing as a universal accelerator.
Different purposes require different algorithms. If you build a processor that specializes in those algorithms and complements the CPU by taking over the tasks it is good at, then in theory you can greatly speed up an application. The reason is that typically 5% to 10% of the code accounts for 99.99% of the running time.
If you run that 5% of the program code on the accelerator using Nvidia's GPUs and CUDA, you can, technically, make the application 100 times faster.
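As a rough sanity check of these figures (the arithmetic below is ours, applied to the numbers above), Amdahl's law gives the overall speedup $S$ when a fraction $p$ of the running time is accelerated by a factor $s$:

$$S = \frac{1}{(1 - p) + p/s}$$

With $p = 0.9999$ (the 99.99% of running time above) and $s = 100$, this gives $S = 1/(0.0001 + 0.009999) \approx 99$, consistent with the claimed 100-fold speedup. Even with an infinitely fast accelerator, $S$ cannot exceed $1/(1 - p) = 10{,}000$, because the code left on the CPU still has to run.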
Almost everything related to machine learning revolves around processing data. It can be SQL data processing, Spark-style data processing, or vector-database processing of structured or unstructured data, all of which are data frames. We accelerate all of these tremendously, but to do that you need a library layered on top, and that library is CUDA.
Biggest disadvantage
CUDA’s biggest drawback is its lack of portability, as it only runs on Nvidia’s chips.
Closed, not open
Just like Apple's approach, CUDA is Nvidia's proprietary software interface and can be used only on Nvidia hardware. It is a closed software interface, not open to the public.
Precisely because it is closed, CUDA reinforces its uniqueness, strengthens Nvidia's competitiveness, and deepens Nvidia's overall monopoly.
Why does CUDA form a moat?
Better execution efficiency
CUDA is generally considered faster, better supported through a wide range of libraries and software tools, and a more mature platform with a larger user base than OpenCL.
Covers a wide range of functions
For example, CUDA provides cuDNN (a library for neural network operations), cuOpt (a library for combinatorial optimization), cuQuantum (a library for quantum simulation), and many others, such as cuDF for data frame processing with SQL-like functionality. All of these libraries had to be invented to reorganize the algorithms in an application so that Nvidia's GPU accelerators can do the work. If you use these libraries, you can achieve 100-fold acceleration and beyond, which is amazing.
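As a hedged illustration of the pattern these libraries share (calling a library routine instead of hand-writing GPU code), the sketch below uses cuBLAS, Nvidia's CUDA-X linear algebra library; it is not one of the libraries named above, just the simplest to demonstrate. Assuming the CUDA toolkit is installed, it would be compiled with something like `nvcc saxpy.cu -lcublas` (the file name is hypothetical).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1 << 20;           // one million elements
    float *x, *y;
    // Unified memory is visible to both the CPU and the GPU.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    cublasHandle_t handle;
    cublasCreate(&handle);

    // y = alpha * x + y, executed on the GPU by the library;
    // no hand-written kernel is required.
    const float alpha = 3.0f;
    cublasSaxpy(handle, n, &alpha, x, 1, y, 1);
    cudaDeviceSynchronize();         // wait for the GPU to finish

    printf("y[0] = %f\n", y[0]);     // expect 5.0
    cublasDestroy(handle);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```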
Widely accepted
We noticed that CUDA seemed to have greater traction within the deep learning software community and was a more attractive skill for job seekers overall. It is the only standard supported by Google’s TensorFlow and Microsoft’s CNTK, and is the main standard for most other deep learning frameworks.
Today, many artificial intelligence deep learning frameworks (including Caffe2, Chainer, Databricks, H2O.ai, Keras, MATLAB, MXNet, PyTorch, Theano, and Torch) rely on CUDA to provide support for GPUs.
Broad usage base
Nvidia's biggest advantage is CUDA, which the company spent nearly 20 years developing to make its graphics chips accelerate AI applications. This forms a strong moat that is difficult for competitors to cross.
Since most AI systems and applications already run on Nvidia's CUDA, developers would have to rewrite those systems and applications for other processors (such as AMD's MI300, Intel's Gaudi 3, or Amazon's Trainium), which is time-consuming and risky.
In short, Nvidia's CUDA currently dominates the back-end architecture, and replacing the CUDA development environment is harder than replacing chips or sales channels.
How many customers use it?
At COMPUTEX 2023, Nvidia revealed that CUDA has more than 4 million developers and more than 3,000 applications, with 40 million cumulative downloads, 25 million of them in 2022 alone. In addition, 15,000 startups have been built on the Nvidia platform, and 40,000 large enterprises around the world use CUDA for accelerated computing.
Competitors
Closed solutions
AMD’s ROCm
Nvidia released CUDA in 2007; AMD did not release ROCm, its competing solution, until 2016. CUDA has supported both Linux and Windows since its inception, whereas ROCm for a long time supported only Linux (and not even every Linux distribution) and did not support Windows until April 2023.
Huawei’s CANN
CANN (Compute Architecture for Neural Networks) is a heterogeneous computing architecture launched by Huawei for AI scenarios. It supports multiple AI frameworks, serves as the programming layer for AI processors, and is the key platform for unlocking the computing efficiency of Huawei's Ascend AI processors.
Open solutions
OpenCL
OpenCL, launched in 2009, was CUDA's first well-known competitor. Although OpenCL's breadth looks attractive, it does not perform as well as CUDA on Nvidia GPUs, which made CUDA increasingly popular. Today, most deep learning frameworks either lack OpenCL support or ship a CUDA version first and an OpenCL version later.
OpenAI’s Triton
OpenAI also developed Triton, its open-source AI software, in 2019. Engineers from many companies, including Meta, Microsoft, and Google, are involved in developing it.
Triton was initially available only for Nvidia GPUs, but it now also supports Intel's Gaudi and AMD's MI300 GPUs. Meta's self-developed AI chip MTIA also uses Triton, making Triton a potential competitor in the market.
oneAPI
Intel, Alphabet, ARM and Qualcomm are all members of the UXL Foundation, which is developing a CUDA alternative based on Intel’s open source platform oneAPI.
oneAPI is designed to work across different compute-accelerator (coprocessor) architectures, including GPUs, AI accelerators, and field-programmable gate arrays (FPGAs). Its main purpose is to free developers from maintaining separate codebases, programming languages, tools, and workflows for each architecture: a program is written once and can run on different hardware architectures.
ZLUDA
ZLUDA is an open-source porting project that bridges Nvidia's CUDA and AMD's ROCm, allowing programs written for CUDA to run on top of ROCm and thereby supporting GPU computing across the two architectures.
Mojo
Chris Lattner, a well-known senior engineer who has worked at Apple, Tesla, and Alphabet, has launched Mojo, a programming language for AI developers. It focuses on letting developers write AI programs that run across hardware platforms without using CUDA, easing the pressure of programming compatibility.
Emulator
SCALE
A British startup has launched a compilation tool, free for commercial use, that lets AMD chips execute programs written specifically for CUDA; the original source code requires no modification or conversion. The toolset is called SCALE, and its developers position it as a GPGPU (general-purpose GPU) programming toolset.
Nvidia’s Countermeasures
No emulation or compatibility layers
Since 2021, Nvidia has prohibited other hardware platforms from running CUDA software through an emulation layer, though at first this was only a warning in the online EULA.
In March 2024, Nvidia updated the EULA for CUDA 11.6. One clause states: "You may not reverse engineer, decompile, or disassemble any results generated using this SDK and translate them to run on non-Nvidia platforms." There is speculation that this move targets third-party projects such as ZLUDA, in which Intel and AMD have participated, as well as compatibility solutions from Chinese manufacturers such as Denglin Technology's GPU+ and MetaX Technology.
Continuous enhancement of functions
Jensen Huang said on the second-quarter earnings call in August 2024: Accelerated computing starts with the CUDA-X libraries. New libraries open up new markets for Nvidia, and Nvidia has launched many of them, including CUDA-X accelerated Polars, Pandas, and Spark, the leading data science and data processing libraries, as well as cuVS for vector databases.