NVIDIA’s Ultimate Chip Arrives, Huang’s Dream Realized, CUDA Rocks?


“Don’t Miss the Decisive Moment of AI!”

When Jensen Huang made this declaration at NVIDIA GTC 2023, many ordinary viewers like me paid little attention. At the time, AI applications were scarce: only a handful of enthusiasts were running local AI image generation at home, and the large language models available to the general public offered little more than momentary novelty.

Of course, in hindsight, that may indeed have been a decisive moment for AI.

Early this morning, NVIDIA's GTC 2024, billed as an AI trendsetter, opened at the SAP Center in San Jose, California. NVIDIA founder and CEO Jensen Huang took the stage to deliver the much-anticipated keynote, "Witnessing the Moment of AI Revolution." Old Huang did not disappoint: we did indeed witness a moment of the AI revolution.


(Image Source: techovedas)

During the two-hour keynote, Jensen Huang unveiled the company's most powerful AI accelerator to date, the Blackwell GB200 superchip, along with a fully configured GB200 server system and NVIDIA's latest work in AI software (NIM microservices), Omniverse Cloud (Earth simulation), embodied intelligence (robots), and other technologies.

How powerful is this wave of AI chip infrastructure Old Huang has brought, and what changes will it mean for the AI large-model industry? Let's find out.

Blackwell GB200: The Most Powerful AI Accelerator Card

In artificial intelligence, computing speed is crucial. Training a complex neural network means performing massive amounts of parallel computation on homogeneous data, fed by enormous volumes of input, in as little time as possible; this is exactly what GPUs are built for. The GPU is the cornerstone of AI large-model training platforms, even the decisive computing foundation.

Therefore, the protagonist of this speech was naturally NVIDIA’s core product “Blackwell B200” GPU chip.


(Image Source: NVIDIA; on-stage comparison of Blackwell-architecture and Hopper-architecture GPUs)

As the first product of NVIDIA's Blackwell architecture, the Blackwell B200 is built on TSMC's 4NP process and uses a dual-die design that connects two dies into a single GPU, for a total of 208 billion transistors per chip.

Compared to the 80 billion transistors of the previous-generation GH100 GPU, the Blackwell B200 represents a significant leap, roughly in line with Moore's Law, which states that "the number of transistors that can be integrated on an integrated circuit doubles approximately every 18 months."
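As a quick sanity check on that claim, we can back out the implied doubling period from the two transistor counts, assuming roughly two years between the Hopper and Blackwell launches (an approximation; exact dates vary):

```python
import math

# Published transistor counts; the ~2-year gap is an assumption.
hopper_transistors = 80e9      # GH100 (2022)
blackwell_transistors = 208e9  # B200, two dies combined (2024)
years_between = 2.0

# Moore's Law: transistor count doubles every ~18 months.
# Implied doubling period = elapsed time / log2(growth factor)
growth = blackwell_transistors / hopper_transistors
doubling_months = years_between * 12 / math.log2(growth)

print(f"growth factor: {growth:.2f}x")
print(f"implied doubling period: {doubling_months:.1f} months")
```

The growth factor works out to 2.6x, an implied doubling period of about 17.4 months — slightly ahead of the 18-month cadence quoted above.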


(Image Source: NVIDIA)

Jensen Huang stated that with these architectural upgrades, the AI performance of a Blackwell B200 reaches 20 PFLOPS (at FP4 precision), versus about 4 PFLOPS (FP8) for the H100. In theory, this could improve the efficiency of Large Language Model (LLM) inference by up to 30 times, enabling AI companies to train larger and more complex models.

What's even more remarkable is that, building on the B200, Old Huang also launched a complete AI superchip, the Blackwell GB200, which combines two Blackwell B200 GPUs with an Arm-based Grace CPU.

In a GPT-3 (175 billion parameters) LLM benchmark, NVIDIA claims the GB200 delivers 7 times the inference performance of the H100 and 4 times the training speed.


(Image Source: NVIDIA)

This level of AI performance is in a league of its own.

Of course, if the GB200 still doesn't meet your needs, NVIDIA has prepared a series of server racks built from it, topping out at the GB200 NVL72 system: 72 B200 GPUs (paired with 36 Grace CPUs) delivering up to 720 PFLOPS of FP8 training performance, on par with a previous-generation DGX SuperPOD supercomputing cluster.
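The headline figure is simply the per-GPU number scaled up. A back-of-the-envelope check, assuming each B200 contributes roughly 10 PFLOPS at FP8 for training (note the 20 PFLOPS figure quoted earlier is the FP4 number, so FP8 throughput is about half):

```python
# Back-of-the-envelope check of the NVL72 system figure.
# Assumption: ~10 PFLOPS FP8 per B200 (half the 20 PFLOPS FP4 figure).
gpus = 72
fp8_pflops_per_gpu = 10

total_fp8_pflops = gpus * fp8_pflops_per_gpu
print(total_fp8_pflops)  # 720, matching the quoted system total
```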

(Image Source: Nvidia)


Compared to the H100, cost and energy consumption can be cut by up to 25x

Earlier this year, the renowned American magazine The New Yorker reported that ChatGPT consumes over 500,000 kilowatt-hours per day, equivalent to about 17,000 times the daily electricity consumption of an average U.S. household. As Musk has said, electricity shortages may become a major constraint on AI development in the foreseeable future.
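That 17,000x figure checks out against typical U.S. household usage, taking roughly 29 kWh per day (about 10,500 kWh per year; an approximate average, not a figure from the article):

```python
# Rough check of the "17,000 households" comparison.
chatgpt_kwh_per_day = 500_000
household_kwh_per_day = 29  # approximate U.S. average (~10,500 kWh/year)

ratio = chatgpt_kwh_per_day / household_kwh_per_day
print(f"{ratio:,.0f} households")  # ~17,241
```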

(Image Source: Business Insider)

Huang stated that training a 1.8-trillion-parameter model previously required 8,000 H100 GPUs and about 15 megawatts of power. Now the same job can be done with just 2,000 B200 GPUs consuming only 4 megawatts.
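It is worth unpacking where the savings come from. A quick calculation on the quoted numbers (assuming comparable training time for both setups, which NVIDIA did not specify):

```python
# Comparing the two quoted training setups for a 1.8T-parameter model.
h100_gpus, h100_mw = 8000, 15
b200_gpus, b200_mw = 2000, 4

gpu_reduction = h100_gpus / b200_gpus     # 4x fewer GPUs
power_reduction = h100_mw / b200_mw       # 3.75x less total power
kw_per_h100 = h100_mw * 1000 / h100_gpus  # ~1.9 kW per H100
kw_per_b200 = b200_mw * 1000 / b200_gpus  # 2.0 kW per B200

print(gpu_reduction, power_reduction, kw_per_h100, kw_per_b200)
```

Notably, per-GPU power barely changes; the win comes almost entirely from needing 4x fewer GPUs to do the same work.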

Such remarkable numbers have led overseas netizens to exclaim, "Moore's Law has been rewritten!"

It is widely anticipated that, in order to keep serving customers in the Chinese market, Jensen Huang will introduce a special-edition AI accelerator, a Blackwell B20 GPU, based on the new-generation architecture.

However, given the U.S. Department of Commerce's explicit export restrictions on computing power, how much capacity this China-specific GPU can actually ship, and whether it can compete effectively with domestically produced alternative AI accelerators, remains unknown for now.

From simulating Earth to humanoid robots

Judging by the global enthusiasm, generative AI has clearly won broad consensus. The question now is: what exactly can we do with AIGC? Today, Huang offered some answers of his own.

Have any of you played a game called "SimEarth"? Developer Maxis built a miniature Earth on the comparatively underpowered computers of the day, letting players play god: managing the planet's terrain, atmosphere, life, and civilizations, and building a thriving world.

(Image Source: MAXIS Studio)

Now, NVIDIA is leveraging the capabilities of large models to create a digital twin of Earth: Earth-2.

Earth-2 is an AI-powered physics environment, built with NVIDIA Modulus and running in NVIDIA's Omniverse at up to a million-fold acceleration, aiming to achieve data-center-scale global simulation. Ultimately, it will use cloud computing and AI to simulate and visualize weather conditions.

(Image Source: Nvidia)

By combining traditional numerical weather models with NVIDIA's AI meteorological models, Earth-2 can explore forecasts over regions spanning hundreds to thousands of square kilometers, providing information such as a hurricane's path to minimize property damage. NVIDIA says the technology will be opened to more countries and regions in the future.

Yes, the old joke about "simulating the Earth on a PS3" seems to be coming true.

(Image Source: PS3)

Next, let’s talk about humanoid robots.

In recent years, humanoid robots have become a hot research direction in the tech industry. Beyond Musk's much-watched Tesla Optimus, companies at home and abroad, including Boston Dynamics, Agility Robotics, UBTECH, Xiaomi, iFlytek, Zhuyuan Robotics, and Science Robotics, are exploring this path.

With large models iterating rapidly and their generalization abilities improving fast, many in the industry see real prospects for humanoid robots. Instead of driving robots with task-specific data through repeated debugging, the idea is to use a large model as the brain and let the robot serve as its body: the model perceives, moves, interacts with the environment, gathers information, makes judgments, and takes action.

And this is considered one of the ultimate forms of artificial intelligence—embodied intelligence.

(Source: NVIDIA)

Today, NVIDIA launched what it calls the world's first general-purpose foundation model for humanoid robots: Project GR00T. Robots powered by this model will be able to understand natural language and imitate actions by observing human behavior. On this foundation, users can teach them a variety of skills, so they can quickly learn and adapt to interacting with the real world.

Jensen Huang firmly believes that embodied intelligence will lead the next wave of artificial intelligence.

Seeing this, Xiao Lei can't help but say: "UBTECH, hurry up and partner with NVIDIA. Pair your robot 'body' with a Project GR00T 'brain' and you would have a truly intelligent robot." With the advent of Project GR00T, the era of real robots may be approaching, representing the ultimate application of AI: making artificial intelligence "human-like."

After a decade of perseverance, NVIDIA's CUDA is truly "cool."

During the GTC 2024 opening keynote, Jensen Huang looked back on NVIDIA's history.

At GTC 2014, Jensen Huang first emphasized the importance of machine learning. While many still viewed NVIDIA merely as a maker of "gaming graphics cards," the company, armed with CUDA (Compute Unified Device Architecture, launched back in 2006), was already positioning itself at the forefront of the AI revolution.

Back then, however, CUDA's main applications were scientific computing fields such as climate simulation, physics modeling, and bioinformatics: valuable, but narrow in scope. As a result, CUDA had not yet won broad market adoption, and returns did not match the heavy R&D investment. Every year, Jensen Huang had to explain to the board why NVIDIA persisted with CUDA. Perhaps even he did not know then that, in the years that followed, CUDA would surge across computing scenarios from blockchain mining to AI large-model training, and become a remarkable success.

(Source: NVIDIA)

In a mere two years, NVIDIA built a trillion-dollar AI empire on the H100 and H200 chips, surpassing traditional giants like Amazon in market value. At this pace, it is entirely possible that NVIDIA will overtake Apple and Microsoft to become the world's most valuable company in the foreseeable future.

Demand for NVIDIA's "cards" is currently overwhelming. Chinese tech giants like ByteDance and Baidu are stockpiling GPUs as a hedge against worst-case scenarios, while Silicon Valley titans such as Microsoft and Meta are all lining up at Jensen Huang's door to buy.

Even as the AI and AI-chip markets grow increasingly crowded, and trade-policy conflicts somewhat tie Jensen's hands, his keynote made clear that he remains deeply confident in the newly released B200 and GB200, and his belief in a world empowered by AI remains unwavering.

In 2024, the so-called "Year of AI Applications," NVIDIA's CUDA (Compute Unified Device Architecture) is, as its name suggests, becoming ever more universal. From foundational technologies such as large language models, conversational AI, and edge computing, to applications like intelligent cockpits, autonomous driving, and humanoid robots, and on to AI smartphones, AI PCs, AI home appliances, AI search engines, AI art, and future workloads like climate prediction, computational lithography, and 6G networks: AI is omnipresent, and NVIDIA's computing power is ubiquitous, becoming "universal computing."

NVIDIA’s CUDA is truly “cool.”
