The arrival of generative AI, or GenAI, is a high-tech industry inflection point like few others. It will not only change how everyone uses technology; it will change society. In terms of impact, it is similar to the disruptive changes driven by the adoption of the PC, the Internet, and cellular connectivity plus the smartphone – perhaps more so. Each of these technologies changed how we learn, how we work, and how we play, essentially changing how we live. While speculating about the potential uses of a new technology is exciting, understanding how the industry needs to adapt is imperative if we are to turn that excitement into reality. Through an analysis of past technology inflections, projections of technology advancements, and detailed forecast models for both potential applications and users, Tirias Research has developed estimates for the roadmap of the GenAI inflection, how much it will cost to implement that roadmap, and what it means to the technology industry.
This Is Just The Beginning
To understand the impact, it is helpful to first understand where the industry currently is in the generative AI era. An applicable analogy is the evolution of the smartphone. Second-generation (2G) wireless technology was developed around the use of the phone as a voice and SMS text communication device, with some basic productivity apps of the kind found on a Palm Pilot digital day planner, plus a music player and handheld games. 2G smartphones had limited processing capabilities and lacked a high-resolution camera and broadband connectivity. In addition, applications like social networking did not exist. When third-generation (3G) smartphones arrived, not only did they bring more capable devices and broadband connectivity, they were also accompanied by applications like social networking and mobile content creation that drove new use models, the need for more local processing and cloud resources, and exponential growth in cellular data use and user-generated data that continues today.
In many respects, we are in the 2G phase of GenAI. GenAI is currently used by limited segments of the population and for basic applications like text-to-text responses, text-to-speech digital assistants, and text-to-image content creation because of its limited capabilities and unresolved ethical questions around its training and use. Additionally, GenAI is predominantly a cloud-centric solution because of the processing resources required to handle the training and inference processing of Large Language Models (LLMs). However, unlike the cellular industry, it will not take the industry 10+ years to reach the next transition.
In terms of semiconductors and systems, the transition has already begun, with many embedded, smartphone, PC, and server processors integrating AI-specific accelerators along with evolving memory architectures to support GenAI processing. Over the next five years, each will gain higher performance and efficiency through advanced semiconductor process nodes, architectural improvements, higher memory capacity and bandwidth, and the optimization of neural network models to better match application requirements to the processing resources available.
In terms of applications, the transition has begun with rapid advancements in the training, optimization, and adaptation of generative AI models for new high-performance GenAI applications, especially video. While video applications are essentially just the generation of hundreds, thousands, or millions of images in succession, they require far more hardware and software resources to generate in real time, and they allow for the inclusion of many other technologies, such as real-time avatars and backgrounds for video conferencing, interactive non-player characters (NPCs) in games, and unique, personalized interactions in spatial computing (formerly known as the metaverse). And, as with any major technology inflection point, there will be even more applications and usage models than we can imagine.
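To put the resource gap in perspective, a quick back-of-the-envelope calculation shows how video multiplies the image-generation workload; the frame rates and durations below are illustrative assumptions, not forecast data.

```python
# Back-of-the-envelope: a generated video is effectively a stream of
# generated images. Frame rates and durations here are illustrative
# assumptions, not figures from the Tirias Research forecast.

def frames_required(duration_s: float, fps: int = 30) -> int:
    """Number of images implied by a video of the given length."""
    return int(duration_s * fps)

print(frames_required(10))    # 10-second clip at 30 fps -> 300 images
print(frames_required(3600))  # 1 hour at 30 fps -> 108,000 images

# Real-time generation also imposes a deadline: each frame must be
# produced in under 1/fps seconds (~33 ms at 30 fps).
```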
According to Tirias Research Senior Analyst Simon Solotko, “the Tirias Research forecast begins with a forecast of the demand by several distinct types of users, including consumers, pro-consumers, enterprise users, and automated users, essentially machines using generative AI models. This demand is converted into the hardware, environmental, and human resources required to fulfill that demand using the technology available during each segment in time.” According to the Tirias Research demand forecast, text-based use of large language GenAI models like GPT-4, Llama 2, and PaLM 2 will increase 3x in 2024 and 151x by 2028. Similarly, use of images and video, which require much more data and processing resources, will increase 4x in 2024 and 167x by 2028. When you consider the cost of the servers, the power, and the human labor to perform all of this with cloud-centric processing on premises, in a private or co-located data center, or in a cloud data center, the total cost of operations (TCO) grows exponentially as well, from more than US$1.7 billion in 2024 to over $84 billion in 2028, a figure that may not be practical or economically feasible. Note that this includes the cost of operating live services (AI inference), not the cost of training models. Today, training is estimated to be the largest contributor to the nascent infrastructure, but it is likely to be rapidly surpassed by inference as the usage of services grows, as projected by the forecast.
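As a minimal sketch of the arithmetic such a demand-driven forecast implies (not the actual Tirias Research model), the projection below scales the cited 2024 operating cost by the cited demand multipliers; the efficiency term is a hypothetical knob for hardware and software improvements.

```python
# Toy demand-to-TCO projection. The multipliers and the 2024 TCO figure are
# taken from the article; everything else is an illustrative assumption and
# not the Tirias Research methodology.

TCO_2024_USD = 1.7e9  # cited cloud inference operating cost for 2024

# Demand growth multipliers relative to 2023 usage, per the forecast.
DEMAND = {
    "text (LLM queries)": {2024: 3, 2028: 151},
    "images and video":   {2024: 4, 2028: 167},
}

def scale_tco(tco_2024: float, mult_2024: float, mult_2028: float,
              efficiency: float = 1.0) -> float:
    """Scale operating cost by demand growth; efficiency < 1.0 would model
    cheaper compute per query over time (held at 1.0 here)."""
    return tco_2024 * (mult_2028 / mult_2024) * efficiency

m = DEMAND["images and video"]
print(f"${scale_tco(TCO_2024_USD, m[2024], m[2028]) / 1e9:.0f}B")
# ~ $71B from demand scaling alone; growing model sizes and workload mix
# push the article's full-model figure to over $84B.
```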
The Tradeoff
According to Mr. Solotko, the model assumes a continued increase in model sizes, partially mitigated by increasingly sophisticated techniques to improve model performance. As the example of OpenAI’s GPT shows, model sizes have grown approximately 10x from one generation to the next because of the growth in the data set used for each generation, with roughly two years between generations. Some would argue this growth cannot continue at this rate, and as the history of the technology industry indicates, that is correct. However, “with larger data sets and models comes greater knowledge and accuracy. And with generative AI still in its infancy, as seen by its current limitations, there is still room to grow,” according to Mr. Solotko, a sentiment echoed recently by OpenAI CEO Sam Altman at the Intel Direct Connect event. Even with efforts to reduce the size of trained models for inference processing through optimizations such as quantization and pruning, greater accuracy will still require larger models and/or the division of general models into domain-specific models.
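As a minimal sketch of what those two optimizations do in principle, assuming PyTorch and a toy model rather than a production LLM:

```python
# Minimal sketch of pruning and quantization on a toy model using PyTorch.
# Production LLMs use specialized toolchains, but the principle is the same:
# remove low-value weights, then store the rest at lower precision.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

# Pruning: zero out the 50% of weights with the smallest magnitude.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# Dynamic quantization: store Linear weights as 8-bit integers instead of
# 32-bit floats, roughly a 4x reduction in weight memory for those layers.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear},
                                                dtype=torch.qint8)
print(quantized)
```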
Additionally, there will continue to be exponential growth in the number of models as it becomes easier to access data sets and cloud resources to create new models. According to Mr. Solotko, “this is an innovation revolution that will continue to push the limits of technology over the next decade.”
Balancing At The Edge
The recent buzz in the technology industry is the potential for doing AI, or more accurately generative AI, at the edge. The term “edge” itself is nebulous. To some, it means the outer parts of a network, such as a base station, a router, or a remote server. To others, it means the point at which data is created and consumed, such as PCs, smartphones, automobiles, and other consumer devices and industrial machines. Regardless of where the boundary lies, more processing at the edge will be required for several reasons.
The first reason is to reduce the demands on the data center, as well as on the supporting power and communications infrastructure. Tirias Research forecasts that the cost of operating data centers to support the entire GenAI demand will reach $84 billion by 2028, plus the cost of building the data centers and supporting infrastructure. This will stress the industry’s ability to build and operate the data centers in the time required.
The second reason is the need for performance. Many AI applications can and will require real-time or near real-time processing and the ability to perform functions anywhere, even when wireless communication networks are not available, as with automotive functions.
The third, and most important, reason why edge GenAI will be required is personalization. Generalized AI functions can be useful, but for GenAI to reach its potential, it must be personalized to the application, the user, and/or the environmental conditions. In other words, it must be contextually aware through the use of local information. A true digital assistant must understand not only a user’s request, but also the user’s preferences, location, and limitations. Likewise, a machine must understand not only the limits of its functions, but also those limits in relation to its operating environment. Much of this data may be considered private or secure, further requiring localized processing.
As a result, the entire industry is looking for ways to do more AI processing at the edge to improve performance through reduced latency, provide increased security for local data, and provide a more customized or personalized experience based on models that take into account unique local, environmental, and/or personal data. Tirias Research believes that AI at the edge is required to create a complete AI infrastructure that can balance the costs and resource requirements of GenAI with the need to provide a personalized experience. If the industry is successful in bringing AI to the edge, Tirias Research forecasts a potential reduction of 20% of GenAI processing in the cloud, at a savings of $16 billion or more in data center operating costs by 2028, and this figure will grow as a percentage of total GenAI processing going forward. While this will not alleviate the demand for GenAI training and inference processing in the cloud, it will enable a more viable and sustainable growth rate for future GenAI data centers.
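One way to picture the split is a simple dispatch policy that keeps latency-sensitive or privacy-bound work local when it fits. The sketch below is a hypothetical illustration of such a hybrid policy, not a description of any vendor’s implementation; the request fields and the 40-TOPS edge budget are assumptions.

```python
# Hypothetical edge/cloud dispatch policy for GenAI requests. The routing
# criteria mirror the three reasons above (cost, performance, personalization);
# the class fields and the edge compute budget are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Request:
    latency_sensitive: bool  # e.g., real-time avatars, automotive functions
    uses_private_data: bool  # personal or otherwise secure local context
    est_compute_tops: float  # rough compute demand of the request

EDGE_BUDGET_TOPS = 40.0      # assumed capability of a local NPU

def route(req: Request) -> str:
    """Prefer the edge for latency- or privacy-bound work that fits locally;
    everything else falls back to the cloud."""
    needs_edge = req.latency_sensitive or req.uses_private_data
    if needs_edge and req.est_compute_tops <= EDGE_BUDGET_TOPS:
        return "edge"
    return "cloud"

# The savings arithmetic cited above: offloading ~20% of cloud GenAI
# processing against a projected $84B 2028 operating cost.
print(f"${0.20 * 84e9 / 1e9:.1f}B saved")  # ~$16.8B, i.e., $16B or more
```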
The Holistic View
Generative AI is driving a new wave of innovation and technology use. As a result, it will require a hybrid infrastructure topology beyond what was required before: one that allows for the use of all the resources available on device and in the cloud, as well as the entire communications and power infrastructure in place to support both. Tirias Research believes that there will be a balance between edge and cloud processing for GenAI. In many cases, it may involve using resources in the cloud, on device, or both, as Microsoft is proposing with future generations of Copilot. GenAI is more than just building new data centers with the latest and greatest discrete accelerators; it is about creating a hybrid architecture to support the varied needs of its various workloads and, ultimately, its various customers.
Final Thoughts
Generative AI is poised to change society more than any technology before it. However, to realize its full potential, it will have just as great an impact on the technology industry that is rapidly innovating to enable it. The hardware, software, and business models supporting GenAI are undergoing rapid evolution while demand is accelerating. Both cloud and edge AI processing will be required. It will be a challenge to meet this demand, but so far the industry is rising to that challenge and benefiting significantly from it in the process.