When I wrote about xAI’s Colossus project a few weeks ago, the scale of the initiative seemed, for lack of a better word, dizzying.
Look back at the video of Patrick from ServeTheHome leading viewers through the halls of a supercomputing cluster ostensibly created to power xAI’s Grok chatbot, and you’ll see row after row, rack after rack, of processors set in powerful motion, like a massive army marching in lock-step, fed with enormous amounts of electricity and water.
According to recent news, a major vendor of AI hardware now says some of its biggest customers are planning projects on the colossal scale of Colossus: Broadcom reports that no fewer than three of its clients may build data centers housing 1,000,000 processors by 2027.
The claim came on the company’s Q4 earnings call on December 12, where leadership cited a 220% jump in AI revenue for the year.
“As you know, we currently have three hyperscale customers who have developed their own multi-generational AI XPU roadmap to be deployed at varying rates over the next three years,” Broadcom President and CEO Hock Tan reportedly said on the call. “In 2027, we believe each of them plans to deploy 1,000,000 XPU clusters across a single fabric.”
For reference, Colossus started with a reported need for 100,000 GPUs. After quickly doubling that order, Musk came out with a franker update: the center would eventually require a cool million of the Nvidia GPUs that run there. Industry leaders like Jensen Huang marveled at the breakneck pace of the data center’s construction, as did notable journalists. It seemed like a unicorn. But is this kind of project going to become de rigueur within the next few years?
Who Could It Be Now?
The available reporting doesn’t specifically identify who is planning these enormous builds. Ask ChatGPT and it will confirm as much, though the model does offer a list of top companies it believes are most capable of scaling up to plans like these, including:
· Nvidia
· Microsoft
· Amazon Web Services
· OpenAI
· Meta
· Tesla
In addition, ChatGPT cites a collection of Chinese companies that have their own large data center capabilities, including Alibaba, Tencent and Baidu.
When you dig a little deeper, it looks like most of these companies are pretty far from that mark. For instance, Mark Zuckerberg has publicly said that Meta is aiming for about 350,000 data center GPUs by the end of this year, and Google has estimated a total of 2 million GPUs across all of its global operations, not in any one center.
As for major clusters at AWS, where the hardware powers B2B services, one official figure is a virtual supercomputer that AWS built to perform at 9.95 petaflops, a level that looks like it would require only a couple of hundred Nvidia H100s.
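To sanity-check that “couple of hundred” figure, here’s a rough back-of-envelope sketch in Python. It assumes the 9.95-petaflop number refers to FP64 throughput and uses Nvidia’s published peak of roughly 67 FP64 tensor-core teraflops per H100; the 70% efficiency factor is my own assumption, not anything AWS has published.

```python
# Back-of-envelope check on the H100 count (rough math, not an AWS figure).
# Assumes the 9.95 petaflops is FP64 throughput and that an H100 delivers
# roughly 67 teraflops of FP64 via its tensor cores at peak.
CLUSTER_PFLOPS = 9.95
H100_FP64_TFLOPS = 67   # peak FP64 tensor-core throughput, per Nvidia specs
EFFICIENCY = 0.7        # assumed fraction of peak achieved on real workloads

gpus_at_peak = CLUSTER_PFLOPS * 1000 / H100_FP64_TFLOPS
gpus_realistic = gpus_at_peak / EFFICIENCY

print(f"GPUs needed at theoretical peak: {gpus_at_peak:.0f}")               # ~149
print(f"GPUs needed at {EFFICIENCY:.0%} efficiency: {gpus_realistic:.0f}")  # ~212
```

Either way, the answer lands in the low hundreds, which is consistent with the claim above.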
So a million GPUs is still a really big deal.
Major Concerns
While the news is impressive, some are likely to be less than enthused about the idea of multiple data centers of this size.
If we go back to Colossus, we have critics on the record spelling out their fears about resource competition from this power-hungry behemoth.
For one thing, Colossus is estimated to need up to 1,000,000 gallons of water per day, in a country where per capita municipal usage often runs around 100 gallons per day.
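Using just the article’s own two numbers, the arithmetic is stark: the facility would consume as much water daily as a town of about 10,000 people.

```python
# Putting the water figure in perspective, using the two numbers cited above.
COLOSSUS_GALLONS_PER_DAY = 1_000_000   # upper estimate for Colossus
PER_CAPITA_GALLONS_PER_DAY = 100       # typical municipal usage per person

equivalent_population = COLOSSUS_GALLONS_PER_DAY / PER_CAPITA_GALLONS_PER_DAY
print(f"Equivalent to a town of ~{equivalent_population:,.0f} people")  # ~10,000
```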
There’s also the use of natural gas turbines to provide all of that energy…
Experts have been suggesting for a while now that the U.S. is poised to expand its nuclear power infrastructure to feed data centers, and specifically to site the reactors right next to the superclusters to make things more efficient.
But even if those co-located power sources were 100% efficient, the power needs would obviously be immense.
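How immense? Here’s a rough estimate under my own assumptions: an H100 SXM is rated at about 700 watts, and a PUE (power usage effectiveness) multiplier of 1.3 is a plausible allowance for cooling and other facility overhead. Neither number comes from any of these vendors, and this ignores CPUs and networking, which would push the total higher.

```python
# Rough power estimate for a million-GPU cluster (assumptions, not vendor figures).
GPU_COUNT = 1_000_000
WATTS_PER_GPU = 700   # H100 SXM thermal design power
PUE = 1.3             # assumed power usage effectiveness for the facility

it_load_mw = GPU_COUNT * WATTS_PER_GPU / 1e6
facility_mw = it_load_mw * PUE

print(f"GPU load alone: {it_load_mw:,.0f} MW")           # 700 MW
print(f"With facility overhead: {facility_mw:,.0f} MW")  # ~910 MW
```

That’s pushing a gigawatt per site, roughly the output of a full-scale nuclear reactor, for each of these planned clusters.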
As key journalists and others covering this issue have been pointing out, things are moving at an incredibly rapid pace.
We’re going to have to figure out what impact these supercomputers will have on our society, because even though Colossus is the only project of this scale publicly identified as in the works, others are likely to be here soon.