Introduction to CUDA
What is CUDA?
CUDA is a parallel computing platform and programming model that allows general-purpose computing on Nvidia’s GPUs. The platform enables developers to harness the power of GPUs for parallel computing to accelerate computationally demanding applications.
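To make this concrete, here is a minimal sketch of the CUDA programming model (our illustration, not from the original article): a C++ function marked `__global__`, called a kernel, is executed by thousands of GPU threads in parallel, each handling one element of the data. Assuming the CUDA toolkit is installed, it would be compiled with something like `nvcc vec_add.cu` (the file name is hypothetical).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // one million elements
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory is accessible from both the CPU and the GPU.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough blocks of 256 threads to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();               // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);           // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```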
Release of CUDA
Nvidia released CUDA in 2007. It is the programming environment Nvidia provides to its customers: a software interface through which developers can call and access all the functions of Nvidia GPUs.
How was CUDA invented?
Jensen Huang has said: Because of the field we chose, video games, you not only want it to be beautiful, you also want it to be dynamic, to create a virtual world. We extended the technology step by step and introduced it to scientific computing. One of the first applications was molecular dynamics simulation; another was seismic processing, which is basically inverse physics. Seismic processing is very similar to CT reconstruction, another form of inverse physics. So we solved the problems step by step, expanded into adjacent industries, and eventually addressed them all.
Please note: Jensen Huang is not just talking about the computer or video game industry.
Use
General processing
CUDA is Nvidia's parallel computing platform and application programming interface. Because of CUDA, Nvidia's GPUs are not limited to driving computer displays; they can be put to other uses and to general-purpose processing.
GPUs and CUDA are mainly used for acceleration
A general-purpose computer simply works once the processor is in place. A GPU, however, is an accelerated computer, which means you have to ask yourself: what am I trying to accelerate? There is no such thing as a universal accelerator.
Different purposes require different algorithms. If you build a processor that specializes in those algorithms and complements the CPU by taking over the tasks it is good at, then in theory you can greatly speed up an application. The reason is that typically 5% to 10% of the code accounts for 99.99% of the running time.
If you run that 5% of the program code on the accelerator using Nvidia's GPUs and CUDA, you can, technically, make the application 100 times faster.
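As a rough sanity check of these figures (the arithmetic below is ours, applied to the numbers above), Amdahl's law gives the overall speedup $S$ when a fraction $p$ of the running time is accelerated by a factor $s$:

$$S = \frac{1}{(1 - p) + p/s}$$

With $p = 0.9999$ (the 99.99% of running time above) and $s = 100$, this gives $S = 1/(0.0001 + 0.009999) \approx 99$, consistent with the claimed 100-fold speedup. Even with an infinitely fast accelerator, $S$ cannot exceed $1/(1 - p) = 10{,}000$, because the code left on the CPU still has to run.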
Almost everything related to machine learning revolves around processing data. It can be SQL data processing, Spark-style data processing, or vector-database processing of structured or unstructured data, all of which are data frames. We accelerate all of these tremendously, but to do that you need a library layered on top, and that library is CUDA.
Biggest disadvantage
CUDA’s biggest drawback is its lack of portability, as it only runs on Nvidia’s chips.
Closed, not open
Just like Apple's approach, CUDA is Nvidia's proprietary software interface and can be used only on Nvidia hardware. It is a closed software interface, not open to the public.
Precisely because it is closed, CUDA reinforces its uniqueness, strengthens Nvidia's competitiveness, and deepens Nvidia's overall monopoly.
Why does CUDA form a moat?
Better execution efficiency
CUDA is generally considered faster, better supported through a wide range of libraries and software tools, and a more mature platform with a larger user base than OpenCL.
Covers a wide range of functions
For example, CUDA provides cuDNN (a library for neural network operations), cuOpt (a library for combinatorial optimization), cuQuantum (a library for quantum simulation), and many others, such as cuDF for data frame processing with SQL-like functionality. All of these libraries had to be invented to reorganize the algorithms in an application so that Nvidia's GPU accelerators can do the work. If you use these libraries, you can achieve 100-fold acceleration and beyond, which is amazing.
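As a hedged illustration of the pattern these libraries share (calling a library routine instead of hand-writing GPU code), the sketch below uses cuBLAS, Nvidia's CUDA-X linear algebra library; it is not one of the libraries named above, just the simplest to demonstrate. Assuming the CUDA toolkit is installed, it would be compiled with something like `nvcc saxpy.cu -lcublas` (the file name is hypothetical).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1 << 20;           // one million elements
    float *x, *y;
    // Unified memory is visible to both the CPU and the GPU.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    cublasHandle_t handle;
    cublasCreate(&handle);

    // y = alpha * x + y, executed on the GPU by the library;
    // no hand-written kernel is required.
    const float alpha = 3.0f;
    cublasSaxpy(handle, n, &alpha, x, 1, y, 1);
    cudaDeviceSynchronize();         // wait for the GPU to finish

    printf("y[0] = %f\n", y[0]);     // expect 5.0
    cublasDestroy(handle);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```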
Widely accepted
We noticed that CUDA seemed to have greater traction within the deep learning software community and was a more attractive skill for job seekers overall. It is the only standard supported by Google’s TensorFlow and Microsoft’s CNTK, and is the main standard for most other deep learning frameworks.
Today, many artificial intelligence deep learning frameworks (including Caffe2, Chainer, Databricks, H2O.ai, Keras, MATLAB, MXNet, PyTorch, Theano, and Torch) rely on CUDA to provide support for GPUs.
Broad usage base
Nvidia's biggest advantage is CUDA, which the company spent nearly 20 years developing to make its graphics chips accelerate AI applications. This forms a strong moat that is difficult for competitors to cross.
Since most AI systems and applications already run on Nvidia's CUDA, developers would have to rewrite those systems and applications for other processors (such as AMD's MI300, Intel's Gaudi 3, or Amazon's Trainium), which is time-consuming and risky.
In short, Nvidia's CUDA currently dominates the back-end architecture, and replacing the CUDA development environment is harder than replacing chips or sales channels.
How many customers use it?
At COMPUTEX 2023, Nvidia revealed that CUDA has more than 4 million developers and more than 3,000 applications, with 40 million cumulative downloads, 25 million of them in 2022 alone. In addition, 15,000 startups have been built on the Nvidia platform, and 40,000 large enterprises around the world use CUDA for accelerated computing.
Competitors
Closed solutions
AMD’s ROCm
Nvidia released CUDA in 2007; AMD did not release ROCm, its competing solution, until 2016. CUDA has supported both Linux and Windows since its inception, whereas ROCm for a long time supported only Linux (and not even every Linux distribution) and did not support Windows until April 2023.
Huawei’s CANN
CANN (Compute Architecture for Neural Networks) is a heterogeneous computing architecture launched by Huawei for AI scenarios. It supports multiple AI frameworks, serves as the programming layer for AI processors, and is the key platform for unlocking the computing efficiency of Huawei's Ascend AI processors.
Open solutions
OpenCL
OpenCL, launched in 2009, was CUDA's first well-known competitor. Although OpenCL's breadth looks attractive, it does not perform as well as CUDA on Nvidia GPUs, which made CUDA increasingly popular. Today, most deep learning frameworks either lack OpenCL support or ship a CUDA version first and an OpenCL version later.
OpenAI’s Triton
OpenAI also developed Triton, its open-source AI software, in 2019. Engineers from many companies, including Meta, Microsoft, and Google, are involved in developing it.
Triton was initially available only for Nvidia GPUs, but it now also supports Intel's Gaudi and AMD's MI300 GPUs. Meta's self-developed AI chip MTIA also uses Triton, making Triton a potential competitor in the market.
oneAPI
Intel, Alphabet, ARM and Qualcomm are all members of the UXL Foundation, which is developing a CUDA alternative based on Intel’s open source platform oneAPI.
oneAPI is designed to work across different compute-accelerator (coprocessor) architectures, including GPUs, AI accelerators, and field-programmable gate arrays (FPGAs). Its main purpose is to free developers from maintaining separate codebases, programming languages, tools, and workflows for each architecture: a program is written once and can run on different hardware architectures.
ZLUDA
ZLUDA is an open-source porting project that bridges Nvidia's CUDA and AMD's ROCm, allowing programs written for CUDA to run on top of ROCm and thereby supporting GPU computing across the two architectures.
Mojo
Chris Lattner, a well-known senior engineer who has worked at Apple, Tesla, and Alphabet, has launched Mojo, a programming language for AI developers. It focuses on letting developers write AI programs that run across hardware platforms without using CUDA, easing the pressure of programming compatibility.
Emulator
SCALE
A British startup has launched a compilation tool, free for commercial use, that lets AMD chips execute programs written specifically for CUDA; the original source code requires no modification or conversion. The toolset is called SCALE, and its developers position it as a GPGPU (general-purpose GPU) programming toolset.
Nvidia’s Countermeasures
No emulation or compatibility layers
Since 2021, Nvidia has prohibited other hardware platforms from running CUDA software through an emulation layer, though at first this was only a warning in the online EULA.
In March 2024, Nvidia updated the EULA for CUDA 11.6. One clause states: "You may not reverse engineer, decompile, or disassemble any results generated using this SDK and translate them to run on non-Nvidia platforms." There is speculation that this move targets third-party projects such as ZLUDA, in which Intel and AMD have participated, as well as compatibility solutions from Chinese manufacturers such as Denglin Technology's GPU+ and MetaX Technology.
Continuous enhancement of functions
Jensen Huang said on the second-quarter earnings call in August 2024: Accelerated computing starts with the CUDA-X libraries. New libraries open up new markets for Nvidia, and Nvidia has launched many of them, including CUDA-X accelerated Polars, Pandas, and Spark, the leading data science and data processing libraries, as well as cuVS for vector databases.