An innovator in silicon and systems technology, Cerebras made several key announcements last week: the next-generation wafer scale engine AI processor (WSE-3) and the server built around it (CS-3), the next Cerebras supercomputer, Condor Galaxy 3 (CG-3), based on the CS-3, and a partnership with Qualcomm to support inference processing.

Continuing The Momentum

The past year has been a significant one for Cerebras. Through a partnership with G42, an AI development holding company based in Abu Dhabi, Cerebras went from being a systems vendor to also being a service provider, with a plan to build three supercomputing centers in the US, later expanded to nine, based on its AI platforms. It also marked a transition from niche technology provider to competitor in the AI training space. This is significant because most AI startups had a simple business model – develop some intellectual property (IP) and then sell the company to a larger semiconductor vendor, systems OEM, or hyperscaler for a big payday – which is why most of the early AI startups failed. Few recent semiconductor startups have a business plan to be an ongoing entity. The two that come to mind are Ampere and Cerebras, both of which have become semiconductor success stories.

Cerebras has significant engineering prowess that differentiates it from the competition. With each new product generation, the company has overcome major engineering challenges. The first was the ability to design, manufacture, and operate a single chip the size of a 300mm (12-inch) silicon wafer, dubbed the “wafer scale engine” or WSE, capable of training some of the world’s largest language models efficiently, quickly, and with high accuracy. Early sales success came from working with government and commercial entities with large data sets and unique challenges, such as pharmaceutical research. The company now touts a wide variety of customers in healthcare, energy, and other industry segments, as well as hyperscalers.

The second major engineering challenge was scaling the platform across multiple systems for a data center-scale solution. Cerebras introduced the CS-2 in 2022. In partnership with G42, Cerebras built its first two supercomputers in 2023: Condor Galaxy 1 (CG-1) and Condor Galaxy 2 (CG-2), both located in California. Each achieved four exaFLOPS of AI compute performance at FP16 data precision using just 2.5MW of power, a fraction of the power consumed by a traditional data center.

Cerebras is continuing that engineering and market momentum with its third generation of solutions. This begins with the third-generation wafer scale engine, the WSE-3, which once again sets a record for the number of transistors in a single chip design. Built on the TSMC 5nm process, the WSE-3 features four trillion transistors implementing 900,000 processing cores optimized for sparse linear algebra, along with 44GB of on-chip memory. The result is 125 petaFLOPS (10¹⁵, or one thousand million million, floating point operations per second) of AI performance. As a result, there really is no fair comparison to any other semiconductor solution in terms of size or single-chip performance. However, Cerebras does not sell chips; it sells large, complex servers. The new server is called the CS-3, which features a new chassis design. According to the company, the CS-3 delivers twice the performance for the same power and price as the previous-generation CS-2. By that measure, Moore’s Law is very much alive! In addition, up to 2,048 CS-3 systems can be clustered together, a 10x increase over the CS-2, for a total of 256 exaFLOPS (an exaFLOPS being 10¹⁸ FLOPS) of AI performance.
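As a back-of-the-envelope check, those cluster-scale numbers follow directly from the per-system rating. The short Python sketch below works through the arithmetic; the figures come from the announcements above, while the variable names are ours:

```python
# Back-of-the-envelope check of the scaling figures cited above.
# All numbers come from the announcements; names are illustrative.

wse3_pflops = 125        # peak AI petaFLOPS per WSE-3 (one per CS-3 system)
max_systems = 2048       # maximum CS-3 systems in a single cluster

cluster_exaflops = wse3_pflops * max_systems / 1000
print(f"Max CS-3 cluster: {cluster_exaflops:.0f} exaFLOPS")  # 256 exaFLOPS

# For comparison, CG-1 and CG-2 each delivered 4 exaFLOPS (FP16) at 2.5 MW:
cg_flops_per_watt = 4e18 / 2.5e6
print(f"CG-1/CG-2 efficiency: {cg_flops_per_watt / 1e12:.1f} teraFLOPS per watt")
```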

A New Level Of AI Training

What this absurd level of performance enables is the efficient training of ever-larger Large Language Models (LLMs) for generative AI. It is particularly suited to LLMs with one trillion parameters or more. According to Cerebras, a single CS-3 can train an entire trillion-parameter model while dramatically reducing the time and code required, resulting in 10x better FLOPS per dollar and 3.6x better compute performance per watt than some of the currently deployed AI training platforms. Note that Tirias Research cannot verify these figures.

In addition to the CS-3 platform, Cerebras announced that construction of the Condor Galaxy 3 (CG-3) supercomputer is now underway in Dallas, Texas. CG-3 will provide 8 exaFLOPS of AI performance starting in Q2 2024. It is the third in what is now a plan to develop nine supercomputing data centers by the end of 2024, a very ambitious buildout plan.

Moving From Training To Inference

To offer customers a path from training complex traditional and generative AI models to efficiently running inference on those models, Cerebras is partnering with Qualcomm. Qualcomm announced the Cloud AI 100 for AI inference processing in 2020, and the Cloud AI 100 Ultra, optimized for generative AI inference processing, in November 2023. The Cloud AI 100 platform leverages Qualcomm’s expertise in power-efficient processing specifically for AI and generative AI neural network models. However, through the use of sparsity, speculative decoding, MX6 weight compression, and model optimization (all topics for deeper technical articles), Cerebras and Qualcomm believe they can further increase the efficiency of inference processing and are working together to do so. According to the companies, the use of these techniques can result in up to a 10x increase in tokens per dollar.
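To give a flavor of one of those techniques, the sketch below shows a minimal, textbook form of weight sparsity – magnitude pruning in NumPy – in which small-magnitude weights are zeroed so the corresponding multiply-accumulates can be skipped at inference time. This is a generic illustration, not the specific method Cerebras and Qualcomm are deploying, and all names in it are ours:

```python
import numpy as np

# Illustrative magnitude-pruning sketch (not Cerebras' or Qualcomm's
# actual method): zero the smallest weights so that hardware or a
# runtime that recognizes zeros can skip those operations.

rng = np.random.default_rng(0)
weights = rng.normal(size=(1024, 1024)).astype(np.float32)

def magnitude_prune(w, sparsity=0.5):
    """Zero the smallest-|w| entries until the target sparsity is reached."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0).astype(w.dtype)

pruned = magnitude_prune(weights, sparsity=0.5)
x = rng.normal(size=1024).astype(np.float32)

dense_out = weights @ x
sparse_out = pruned @ x   # half the multiply-accumulates are now by zero

print(f"achieved sparsity: {np.mean(pruned == 0):.0%}")
print(f"relative output error: "
      f"{np.linalg.norm(dense_out - sparse_out) / np.linalg.norm(dense_out):.3f}")
```

Note that pruning alone only changes the weights; the efficiency gain comes from hardware and software that can actually skip the zeroed operations, which is where the companies' co-optimization work comes in.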

The inference platforms will not be included in the Condor Galaxy data centers. However, because the Qualcomm Cloud AI 100 Ultra is a PCIe add-in card designed for traditional servers, it can easily be deployed in any private or public data center, including cloud services such as AWS.

A Competitive Future

Cerebras’ recent success is good for both the company and the industry. Tirias Research believes that we are still in the early days of the AI era, especially for generative AI. Because no single size or type of neural network model can meet the needs of every application or user, there will be a plethora of models. As a result, the electronics industry must adapt to the needs of each application and user, which can range from processing on a consumer or IoT device to massive cloud resources. This is driving the need for more innovative and differentiated solutions in the market. Seeing a startup like Cerebras join the ranks of the semiconductor and systems heavyweights brings the promise of more diversity and innovation to the industry to solve what promise to be increasingly complex problems.
