This year, Nvidia’s annual GTC event returned to an in-person format. It was undoubtedly the biggest one I have ever attended, and I’ve been to every GTC since 2009, when it started. One of the biggest reasons for the growth of GTC is that the company now has a presence in more markets than ever before.
The other reason is that Nvidia is the undisputed market leader in AI computing for the cloud, where most AI computation occurs, both in training and inference. That said, Nvidia does have more competition this year than it did a year ago, and I suspect that the competition will only grow over time. That made this year’s GTC an important one for the company to reassert itself as the market leader to the world and its partners.
Blackwell GPUs — B100, B200, GB200 And More
First and foremost, Nvidia needed to reestablish itself as the leader in GPU technology with market-leading hardware. That’s why Nvidia announced the Blackwell family of products that scale from a single GPU to an entire datacenter of GPUs interconnected with Nvidia’s Mellanox InfiniBand technology, which the company acquired back in 2019. That acquisition has been critical to Nvidia’s ability to build hyperscale and HPC-scale systems with low latency and high bandwidth.
The 208B transistor Blackwell GPU comes in two flavors, combining two 104B transistor GPU dies into a single chip. Both Blackwell’s B100 and B200 variants feature 8 Gbps HBM3E memory on a 4,096-bit bus with 8 TB/s of memory bandwidth derived from 192GB of VRAM. The differences come in performance, with Nvidia quoting a peak FP4 Dense Tensor performance of 7 PFLOPS for the B100 and 9 PFLOPS for the B200. Both GPUs support NVLink 5’s 1800 GB/s of interconnect bandwidth, as well as PCIe 6.0.
Both GPUs are manufactured using TSMC’s 4NP process node. The B100 consumes 700 watts of power and the B200 consumes a hefty 1 kilowatt. Nvidia designed the B100 to be a drop-in replacement for the H100, hence the same 700-watt TDP. The B100 is roughly 80% faster than the H100, which gives you a good idea of how much faster the Blackwell architecture is than Nvidia’s Hopper architecture. This is also Nvidia’s slowest Blackwell part; the B200 is more than 10% faster in most scenarios.
The GB200 “superchip” combines two B200 GPUs paired with a Grace Arm server CPU connected with NVLink interconnects that have 900 GB/s of bandwidth. The GB200 claims 20 PFLOPS of FP4 tensor performance (40 PFLOPS with sparsity), which is more than double that of a single B200. As a combined single unit, the GB200 also has 384GB of HBM3E memory. This combined “superchip” has more than 496 billion transistors; each of the Blackwell dies has 104 billion transistors, and there are four of them on each “superchip,” along with an 80-billion-transistor Grace server chip. The GB200 has a 2,700-watt TDP and comes in two flavors, one for rack mounting and another for more compact DGX/HGX systems.
These GB200s will be interconnected with one another using NVLink into a complete rack that Nvidia is calling the GB200 NVL36 and NVL72. The NVL36 uses 36 of the B200 chips in one rack and 18 single GB200 compute nodes, while the NVL72 uses 18 dual GB200s. This system uses fifth-generation NVLink and NVLink switching systems to interconnect the rack’s GPUs. Nvidia claims 1.33 exaFLOPs of FP4 inference with sparsity, which is incredible when you consider what it took to get an exaFLOP of any kind of performance only a few years ago.
While the Blackwell architecture seems power-hungry, it is huge on space savings and power savings at scale, which reflects the customers that these systems are targeted to serve. Nvidia talked about training a GPT-MoE 1.8-trillion-parameter model, which would require 8,000 GPUs and 15 megawatts in about 90 days using its last-generation Hopper GPUs. By comparison, a Blackwell GB200 NVL72 system training the same 1.8-trillion-parameter model would require only 2,000 GPUs and 4 megawatts of power with the new system, which is important because power is becoming such a serious sticking point for both AI computing and cloud computing in general. Nvidia is definitely telling the right power and performance story for its customers and partners.
NIM For Generative AI
As a leader in the AI market, Nvidia must make sure it can accelerate the adoption of AI services and applications as quickly as possible. One way to achieve that is to ensure that all the latest AI models are optimized for Nvidia hardware—or, in short, to make implementing those models as easy as possible for developers. This is why Nvidia just announced a new catalog of NIM microservices and cloud endpoints for pretrained AI models, all optimized for CUDA-capable Nvidia GPUs. Enterprises can use these NIM microservices for a host of tasks including LLM customization, inference, retrieval-augmented generation and guardrails.
Nvidia will make these microservices available on its website at no charge and integrate them with its AI Enterprise 5.0 software suite. I believe that Nvidia is taking this approach to ensure that generative AI is not slowed down by poorly optimized models or inaccurate results due to a lack of guardrails or RAGs. Nvidia is clearly focusing on improving both time-to-market and quality of results, and I think that’s a positive for the industry and its growth.
MediaTek Auto Cockpit Lineup
At GTC 2024, Nvidia and MediaTek announced the next phase of the two companies’ automotive partnership. Dimensity Auto Cockpit combines MediaTek’s SoC-building capabilities and Nvidia’s GPUs that run Drive OS. At GTC 2024, MediaTek announced four separate 3nm products ranging from mainstream to high-end cockpit solutions. The CX-1 and CY-1 are MediaTek’s premium products that are pin-compatible with one another, while the CM-1 and CV-1 are MediaTek’s more mainstream products that are also pin-compatible with one another. This is an excellent approach because it allows MediaTek’s OEM customers to mix and match designs with the appropriate SoC based on the vehicle’s needs and its price segment.
All of these Dimensity Auto Cockpit chips combine an Arm v9-A CPU with a licensed Nvidia “next-gen” GPU to enable AI and RTX graphics onboard the vehicle. There is also a multi-camera HDR ISP for the many camera features that vehicles will need in the future, as well as an integrated audio DSP for the latest voice assistants to ensure a smooth natural language processing experience. While it’s unclear exactly what kind of AI performance these chipsets will have, AI seems to be a major focus of this partnership. Also, MediaTek will support QNX, Linux and Android Automotive OS from the start. This seems to be a great platform for many of Nvidia’s existing customers such as BYD, which recently became the number-one EV manufacturer in the world. While I do believe that MediaTek’s initial designs will be picked up by Chinese OEMs such as Geely, NIO, SAIC Motor and XPeng, there is a good chance that momentum could bring MediaTek and Nvidia other OEM customers down the road.
Omniverse And Vision Pro
For Nvidia, the Omniverse platform is a powerful tool for many applications. At GTC, Nvidia briefly—and with very little detail—announced that it would support the Apple Vision Pro headset on Omniverse. We can only assume that this will utilize the USD 3-D framework capability of both Omniverse and Apple Vision Pro. Nvidia also announced a partnership with Siemens to power the industrial giant’s Xcelerator platform in the cloud through Nvidia’s Omniverse API, which builds on Siemens’ recent partnership with Sony to create a purpose-built headset for its many enterprise collaboration and creation tools. (My recent article about alliances in spatial computing goes into more detail on the Siemens-Sony partnership.) In fact, I recently got to try this new platform from Sony, which includes custom controllers and a new mixed-reality and VR headset. Considering how important Siemens’ software is to the enterprise space, this announcement is a huge vote of confidence from a very conservative company and an indication that Siemens is getting ready to make spatial computing a standard workflow for its customers.
In addition to the Siemens partnership, Nvidia also announced a partnership with Cadence for Cadence’s Reality Digital Twin tool to improve data center planning and the efficiency of new designs based on Blackwell. Nvidia spoke very highly of Cadence’s simulation tools; Nvidia found these to be critical for enabling new liquid-cooled data centers needed to cool the Blackwell GPUs, and for simulating how they might need to be set up to optimize cooling. Cadence also said that its Reality tool is integrated with Nvidia’s Omniverse platform, which Cadence claims to enable up to 30x faster design and simulation workflows. Cadence also claims that its Reality Digital Twin platform can improve data center energy efficiency by as much as 30%, which will be critical as more data centers consume increasingly more power in the AI boom.
Nvidia did not stop there with its Omniverse announcements—or digital twins. Nvidia also announced its Earth-2 climate digital twin cloud platform for simulating and visualizing weather and climate. This includes a new generative AI model called CorrDiff, which is supposed to generate images with 12.5x higher resolution at astonishing speed. It is already working with Taiwan’s National Science and Technology Center for Disaster Reduction to help mitigate the impact of typhoons that regularly hit the island. One key component of Earth-2’s cloud APIs is the use of Omniverse, which will enable The Weather Company to democratize customer access to its digital twins built on its rich weather data. In migrating to Omniverse, The Weather Company also plans to explore the use of score-based generative AI for its Weatherverse services and Weather Engine for enterprise-grade weather intelligence for its customers.
Last but certainly not least, Nvidia announced its 6G research cloud, powered by Nvidia Aerial Omniverse Digital Twin. Nvidia announced that industry leaders including Ansys, Keysight, Nokia and Samsung are among the first companies to take advantage of this future-looking capability. Nvidia is calling this solution its 6G Research Cloud platform, which is squarely aimed at using AI in the RAN to help plan and build 6G solutions. This effort includes companies such as Arm, Softbank and Rohde & Schwarz in addition to the earlier names.
The 6G Research Cloud has three components: the Aerial Omniverse Digital Twin for 6G, the Aerial CUDA-Accelerated RAN and the Sionna Neural Radio Framework. The first enables companies to build and test 6G radios in a fully virtual environment that considers all of the physics of the real world, then plan deployments and take advantage of Nvidia’s ray-tracing technology to simulate signals. The Aerial RAN is a software-defined full-RAN stack that Nvidia says offers significant flexibility for researchers to customize, program and test 6G networks in real time. Sionna is a framework that integrates with other frameworks including PyTorch and TensorFlow using Nvidia GPUs to generate and capture data and train AI and machine learning models at scale. This will become increasingly important with 6G as more networks become AI-accelerated.
Wrapping Up
Many people referred to GTC 2024 as the Woodstock of AI. Personally, I thought it might have been even bigger than that. Nvidia completely packed the SAP center (and had to turn people away). There were nearly 40 different announcements at GTC 2024, and I had space to cover just a few of them here. Nvidia successfully reasserted its dominance in the AI space; alongside the announcements across GPU, automotive and Omniverse, the company also had a ton of data center announcements, including a Who’s Who of cloud provider partnerships and a big AI Factory announcement with Dell.
I believe that Nvidia has done a great job of making itself the center of the AI universe, both for hardware and software, and it seems like it will sell every Blackwell GPU it can manufacture (with TSMC’s help). That said, it appears that Nvidia will be quite supply-constrained again this year, even though Blackwell is expected to reach GA only at the end of the year. Even Micron, which recently announced it will produce HBM3E for Nvidia and other partners, has already said it is sold out of all its HBM3E production for 2024, much like its competitor SK Hynix announced earlier this month.
The AI craze is just beginning, given that we’re only in the early stages of the implementation of AI and many companies are still figuring out how it works best for their applications. We are also starting to see the rise of the AI PC, which I believe will alleviate some of the demand for AI in the cloud but will also create more demand for other types of AI computing—including in the cloud—as some applications will require faster and faster compute that cannot be run on-device or at the edge. I believe we will see an AI compute flywheel that will continually increase the demand for computing until we have something that resembles AGI. There is a reason why Sam Altman believes we desperately need more AI computing—and Nvidia is leading the charge to supply it.