Nvidia’s growing impact on enterprise infrastructure took center stage at its recent GTC conference. GTC is the largest AI-focused event in the industry, bringing together nearly the entire AI ecosystem.
Applications and foundation models may deliver the enterprise value and drive the investment, but it’s the specialized infrastructure beneath them that makes modern AI practical. Nvidia sits at the center of it all, enabling cloud providers and on-prem solution providers alike.
Nvidia is a Platform Company
The big news from Nvidia is the launch of its next-generation Blackwell accelerators, which will bring new levels of capability to AI training and high-performance inference for generative AI. Nvidia’s new B200 …
While customers will likely have access to raw GPUs, Nvidia packages its accelerators as system-level offerings that deliver a turnkey, optimized, and efficient foundation for enterprise AI. This starts with the Nvidia GB200 NVL72, a rack-scale AI supercomputer designed for large-scale AI and HPC workloads.
It features the Grace Blackwell Superchip, which integrates high-performance Nvidia GPUs and CPUs with a 900 GB/s NVLink-C2C interface for seamless data access. The architecture delivers 80 petaflops of AI performance, 1.7 TB of fast memory, and support for up to 72 GPUs.
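That 900 GB/s NVLink-C2C figure is easier to appreciate with a quick back-of-envelope calculation. The sketch below estimates how quickly the weights of a few hypothetical model sizes could move across the link; the model sizes and FP16 precision are illustrative assumptions, not Nvidia figures:

```python
# Back-of-envelope: how long does it take to move a model's weights across
# the 900 GB/s NVLink-C2C link joining Grace CPUs and Blackwell GPUs?
# Model sizes and FP16 precision are illustrative assumptions.

NVLINK_C2C_GBPS = 900  # GB/s, per Nvidia's stated NVLink-C2C bandwidth

def transfer_time_seconds(params_billions: float, bytes_per_param: int = 2) -> float:
    """Time to move a model's weights over NVLink-C2C (FP16 = 2 bytes/param)."""
    size_gb = params_billions * 1e9 * bytes_per_param / 1e9
    return size_gb / NVLINK_C2C_GBPS

for params in (7, 70, 405):  # hypothetical model sizes, in billions of parameters
    print(f"{params}B params (FP16): {transfer_time_seconds(params):.3f} s")
```

Even a 405-billion-parameter model’s weights cross the link in under a second, which is what “seamless data access” between CPU and GPU memory means in practice.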
Scaling things up even further, Nvidia introduced its DGX SuperPOD built on DGX GB200 systems. The SuperPOD scales to tens of thousands of GPUs, using Nvidia GB200 Grace Blackwell Superchips to tackle trillion-parameter models.
This next-generation system ensures constant uptime with full-stack resilience and features an efficient, liquid-cooled design for extreme performance. It also integrates Nvidia AI Enterprise and Base Command software, streamlining AI development and deployment while maximizing developer productivity and system reliability.
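Some rough arithmetic shows why trillion-parameter training is a rack-scale problem and beyond. The sketch below uses the common rule of thumb of roughly 16 bytes per parameter for mixed-precision Adam training; the 192 GB per-GPU memory figure is an assumption for illustration, and the result is a floor based on capacity alone:

```python
# Rough sketch: why trillion-parameter training outgrows a single server.
# Uses the common mixed-precision + Adam rule of thumb of ~16 bytes per
# parameter (FP16 weights/grads plus FP32 master weights and optimizer
# moments); activations and overheads are ignored, so this is a floor.

import math

BYTES_PER_PARAM = 16  # rule of thumb for Adam mixed-precision training state

def min_gpus(params_trillions: float, gpu_mem_gb: float) -> int:
    total_gb = params_trillions * 1e12 * BYTES_PER_PARAM / 1e9
    return math.ceil(total_gb / gpu_mem_gb)

# 192 GB per GPU is an assumed HBM capacity for illustration.
print(min_gpus(1.0, gpu_mem_gb=192))  # -> 84 GPUs just to hold training state
```

Capacity is only the floor; reaching useful training throughput on a trillion-parameter model is what pushes real deployments toward the SuperPOD’s tens of thousands of GPUs.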
AI Continues to be Cloud First
Nvidia is laser-focused on breaking out of the GPU business and delivering system-level solutions to the market. This has caused some recent tension with cloud service providers, who prefer to build their own solutions, but that tension seems to be fading.
Nvidia and Amazon’s AWS, the last CSP to announce support for the current-generation DGX Cloud, jointly announced a strategic engagement that extends beyond DGX support to include joint development of a new AI supercomputer as part of their revamped Project Ceiba.
Oracle Cloud, one of Nvidia’s first DGX partners, also announced broad support for the GPU giant’s new systems. Taking things further, Oracle will offer Nvidia’s BlueField-3 DPUs as part of its networking stack, giving its customers a powerful new option for offloading data center tasks from CPUs.
Microsoft Azure announced support for Nvidia’s new Grace Blackwell GB200 platform and advanced Nvidia Quantum-X800 InfiniBand networking. Similarly, Google Cloud will support Nvidia’s GB200 NVL72 systems, which combine 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink.
OEMs are Ready for AI
Despite common belief, AI is not a cloud-only play. Dell Technologies, HPE, Supermicro, and Lenovo all have substantial AI-related businesses. In their latest earnings, Dell and HPE each reported a healthy AI-related server backlog of about $2 billion.
Nvidia lent its support to the on-prem story with a joint announcement that it will collaborate with Dell on a new AI Factory initiative. Dell’s AI Factory combines Dell’s robust portfolio of computing, storage, networking, and workstations with the Nvidia AI Enterprise software suite and the Nvidia Spectrum-X networking fabric, ensuring a seamless and robust AI infrastructure.
Dell also announced updates to its PowerEdge server line-up to support Nvidia’s next-generation accelerators, including a powerful new liquid-cooled eight-GPU server.
Lenovo introduced new ThinkSystem servers designed for AI. Its new liquid-cooled eight-GPU ThinkSystem SR780a V3 boasts a highly efficient power usage effectiveness (PUE), while the ThinkSystem SR680a V3 is an air-cooled server that pairs Intel processors with a range of Nvidia GPUs for AI acceleration. Finally, the Lenovo PG8A0N is a 1U node with open-loop liquid cooling for accelerators that supports the new Nvidia GB200 Grace Blackwell Superchip.
Hewlett Packard Enterprise didn’t introduce new servers but announced new capabilities for its targeted generative AI solutions. HPE and Nvidia are collaborating on new HPE Machine Learning Inference Software, allowing enterprises to rapidly and securely deploy ML models at scale. The latest offering will integrate with Nvidia NIM to deliver Nvidia-optimized foundation models using pre-built containers.
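NIM’s pre-built containers expose industry-standard APIs, which keeps application code simple. A minimal sketch, assuming a locally deployed NIM container serving an OpenAI-compatible chat endpoint; the host, port, and model id below are illustrative assumptions:

```python
# Hedged sketch: querying a locally deployed Nvidia NIM container through
# its OpenAI-compatible API. Host, port, and model id are assumptions.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-used",                   # local deployments may not require a key
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # hypothetical NIM model id
    messages=[{"role": "user", "content": "Summarize our Q3 support tickets."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```

The appeal of this pattern is that applications written against the widely used OpenAI client need only a changed base URL to run against on-prem, Nvidia-optimized models.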
Storage Adapts to AI
Storage for AI training is fundamentally different from traditional enterprise storage. AI places new demands on throughput, latency, and scalability. While conventional storage architectures can serve moderately sized AI deployments, large training clusters may require highly scalable parallel file systems. Both approaches were on full display at GTC.
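Checkpointing illustrates the throughput problem: writes must finish quickly, or thousands of expensive GPUs sit idle. A quick sketch, with the checkpoint size and time budget as illustrative assumptions:

```python
# Rough arithmetic: sustained write bandwidth needed to checkpoint a large
# training job without stalling the cluster. Sizes and budgets are
# illustrative assumptions, not vendor figures.

def required_write_gbps(checkpoint_tb: float, budget_seconds: float) -> float:
    """Sustained GB/s needed to land a checkpoint within the time budget."""
    return checkpoint_tb * 1000 / budget_seconds  # TB -> GB

# e.g., a 16 TB checkpoint (trillion-parameter-class training state)
# written within a 5-minute budget:
print(f"{required_write_gbps(16, 300):.1f} GB/s sustained")  # ~53 GB/s
```

Sustained tens of gigabytes per second of writes is exactly the regime where parallel file systems pull away from conventional enterprise arrays.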
Weka and VAST Data are engaged in a cut-throat battle to provide the data infrastructure for AI service providers, and each was hard to avoid at GTC. Weka announced that its software has achieved Nvidia DGX SuperPOD certification, while VAST Data showed off its recently released BlueField-3 solution for delivering scalable storage to large AI clusters.
Hammerspace is also in the mix, announcing that Meta is using its technology in Meta’s recently announced 48K-GPU cluster.
On-prem, it’s still about traditional approaches to storage. Pure Storage announced new support for AI workloads, including a RAG pipeline, an Nvidia OVX server storage reference architecture, new vertical-specific RAG models developed with Nvidia, and an expanded set of ISV partners, including Run.AI and Weights & Biases.
Similarly, NetApp announced new RAG-focused services based on Nvidia NeMo Retriever microservices technology.
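Both announcements center on retrieval-augmented generation, which grounds a model’s answers in documents retrieved from enterprise storage. A minimal sketch of the pattern, with stub embeddings standing in for a real embedding model and LLM endpoint:

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern these
# storage offerings accelerate. The embed() function is a stub; a real
# pipeline would call an embedding model and send the prompt to an LLM.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: hash-seeded pseudo-vector, for illustration only."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

documents = [
    "GB200 NVL72 is a rack-scale system with 72 Blackwell GPUs.",
    "Parallel file systems feed large AI training clusters.",
    "RAG grounds model answers in retrieved enterprise documents.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = doc_vectors @ embed(query)  # cosine similarity (unit vectors)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("What does RAG do?")
prompt = "Answer using this context:\n" + "\n".join(context)
print(prompt)  # a real pipeline would send this prompt to an LLM endpoint
```

The storage vendors’ pitch is that the retrieval half of this loop, embedding, indexing, and serving enterprise documents, lives on their platforms.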
Analyst’s Take
There’s still much to be said about GTC, including the clear trend towards liquid-cooled solutions, infrastructure for inference, the push of AI to the edge, and even AI for cybersecurity. All of these things, though, build atop the infrastructure that Nvidia is delivering through its cloud and OEM partners.
While AI remains at the center of the technology world, its impact is broadening. Cloud providers are deploying ever-richer solution stacks, but on-prem use is growing. Inference is increasingly important, driving the need for AI infrastructure both on-prem and at the edge.
Despite the broad impact of AI, the required infrastructure is increasingly defined by a single company. Nvidia continues to take a platform-centric approach, moving beyond GPUs to provide integrated, system-level AI solutions. Beyond its new Blackwell accelerators, the Nvidia GB200 NVL72 systems and corresponding SuperPOD solutions demonstrate this focus.
Nvidia drives the AI market, and its strategy is unfolding with precision and foresight. The company isn’t just selling chips; it’s crafting ecosystems that help propel enterprises into the AI age.
Disclosure: Steve McDowell is an industry analyst, and NAND Research is an industry analyst firm that engages in, or has engaged in, research, analysis and advisory services with many technology companies, including those mentioned in this article. Mr. McDowell does not hold any equity positions with any company mentioned in this article.