The networking infrastructure market is more dynamic and interesting than it has been in decades, and AI is going to make it even more so.

AI is a voracious consumer of data, whether that's in the hyperscaler cloud, fueling large language models (LLMs), or at the edge, where private infrastructure must gather and securely transmit data to destinations of all types for a variety of applications. This means more demand for network connections.

What's interesting about AI is that not only will it create new markets for networking infrastructure hardware and software, but it will also boost traditional networking markets such as datacenter and enterprise, because of new demands for data.

All of this has dozens of networking players positioning for growth in markets that have been relatively static for decades. Cisco has dominated the networking market since the Internet bubble days, with an estimated 50-60% share of the enterprise and datacenter networking markets. This lack of competitive dynamics has made the market a bit dull. But that has started changing in recent years, with competitors such as Arista Networks taking share in the cloud hyperscale markets. The looming merger of Juniper Networks and HPE also provides a twist, with that combination possibly taking the number-two spot in networking. With Juniper ramping up its AI networking roadmap, it will become a more strategic asset for HPE. At the same time, NVIDIA, the leader in chips for AI infrastructure, has built its own full networking stack optimized for AI, jumping in front of the networking incumbents for AI workloads for hyperscaler LLMs.

Networking innovation also abounds. Startups such as Arrcus and DriveNets are attacking AI with a disaggregated hardware and cloudscale network operating system (NOS) approach. Hedgehog and Aviz Networks are leveraging the open-source Software for Open Networking in the Cloud (SONiC) NOS as well as cloud tools such as Kubernetes. And because AI requires more connectivity to more data, you can expect it to give a boost to multicloud networking, which features hot startups such as Alkira, Aryaka, Aviatrix, Graphiant, Itential, and Prosimo, among others.

This will all be great for the market. Networking buyers have more options than ever before. And they’ll be able to choose between many approaches, whether that’s a full networking stack provided by AI infrastructure leader NVIDIA, best-of-breed networking with established companies such as Cisco and HPE/Juniper, or innovative startup solutions.

We'll add more on the competition later, but first let's look at why networking for AI has different requirements.

Why AI Networking Is a New Market

It looks like AI applications will take many forms, ranging from huge cloud LLMs to other use cases, including small language models (SLMs) used in private clouds for specific vertical applications. For example, AI can be used to train a generalized chatbot to help with chat and writing, but it can also be used to develop drugs using customized data—or optimize a manufacturing site.

The first thing to understand is that AI networking often has different requirements from traditional networking. The transition from general-purpose computing to accelerated computing requires new software and distributed networking architectures to connect, move, and process data at lightning-fast speeds, with very low latency and almost no tolerance for data loss. This isn’t networking in your local coffee shop.

The arms race to build huge LLM clouds has also spurred demand for specialized processors such as SmartNICs, IPUs, and DPUs to boost performance of networking, security, and storage functions of AI networks. But there are more areas to watch: Networking players will use a variety of architectures, software, and components to build more economical infrastructure to access AI models, whether those are at the edge or in the cloud. Whether connecting chips within supercomputers, interconnecting servers in AI clusters, or linking those clusters to the network edge, existing technologies must evolve to sustain the performance demanded by AI applications.

Futuriom recently spent months examining the end-user requirements for AI workloads in a detailed report on AI Networking. The market is already starting to segment, and it falls into two categories:

1) Training. This is the step in which LLMs such as ChatGPT, Llama, Claude AI, and Mistral are trained by repeatedly running massive datasets through neural networks with billions of parameters to forge a system that recognizes words, images, sounds, etc. These LLMs are fundamental to AI applications. SLMs will also require unique networking solutions.

2) Inference. This is the process of adapting an LLM or SLM to work with specific sets of data to create an AI application that delivers information, solves a specific problem, or completes a task. A bank, for instance, may adapt Claude AI to streamline customer service at ATMs by running the model against anonymized data from multiple transactions. This is often referred to as the “front end” of AI, and it also requires processing and networking capabilities closer to the customer.

Training and inference both call for features not present in traditional or general-purpose client-server networks or, for that matter, in high performance computing (HPC) networks based on that paradigm.

The new needs include the following: higher capacity (scaling to 400 Gb/s and 800 Gb/s), higher throughput, lower latency, high reliability, faster access to storage, optimized clustering, and high compute utilization, to name only a few.
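To put those capacity numbers in perspective, here is a back-of-the-envelope sizing sketch in Python. Every figure in it (GPUs per server, one 400 Gb/s NIC per GPU, a 128-server cluster) is an illustrative assumption, not a vendor specification, but the arithmetic shows why AI fabrics quickly outgrow traditional enterprise networks:

```python
# Back-of-the-envelope fabric sizing for a hypothetical AI training cluster.
# All figures are illustrative assumptions, not vendor specifications.

gpus_per_server = 8          # assumed GPUs per training server
nic_gbps_per_gpu = 400       # assumed one 400 Gb/s NIC per GPU
num_servers = 128            # assumed cluster size (1,024 GPUs total)

server_bandwidth = gpus_per_server * nic_gbps_per_gpu      # Gb/s per server
cluster_bandwidth = server_bandwidth * num_servers / 1000  # Tb/s cluster-wide

print(f"Per-server fabric capacity: {server_bandwidth} Gb/s")    # 3200 Gb/s
print(f"Cluster fabric capacity:   {cluster_bandwidth} Tb/s")    # 409.6 Tb/s
```

Even this modest hypothetical cluster demands hundreds of terabits per second of non-blocking capacity, which is the scale driving the 400 Gb/s and 800 Gb/s upgrade cycle.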

Let the Competition Begin!

With AI continuing to capture the business world's imagination with its potential for productivity gains and new digital products, it's understandable that there is excitement about the AI infrastructure buildout. However, with revenue and productivity gains still elusive, this is a multi-year cycle, if not a multi-decade one, in which changing business models and architectures are to be expected.

The AI networking market, which is estimated to be about 10-15% of the total AI infrastructure budget, will certainly be billions of dollars, but it is starting from a low level. Arista Networks CEO Jayshree Ullal is on record expecting $750 million in networking revenue directly connected to AI buildouts in the next year, and that number is expected to grow fast.

The market for AI networking has thus far been framed as InfiniBand vs. Ethernet, because NVIDIA's early lead in connecting GPUs was built on InfiniBand technology, which has special low-latency and lossless characteristics. However, Ethernet solutions are now coming to market, and NVIDIA also has Ethernet-based technology with its Spectrum-X platform. As more Ethernet-based solutions arrive, the AI networking market will broaden. SLMs can be run by a wide variety of vertical businesses and don't require the full horsepower of LLMs; they could even be implemented in private datacenters and infrastructure. Ethernet is broadly deployed, well understood, and benefits from the economies of scale of widely available components.

To this end, Ethernet is being adapted to handle AI networking needs for lower latency and lossless communications, in a sense making it more InfiniBand-like while retaining Ethernet economics. A raft of vendors has teamed up to form the Ultra Ethernet Consortium (UEC), which is tasked with introducing upgrades to the Ethernet standard to make it suitable for demanding AI environments, whether large or small. Ethernet has already been adapted with Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) to make it more AI-class, and it's going to evolve further. Most networking vendors support RoCEv2, which adds a variety of enhancements to RoCE, including DCQCN (Data Center Quantized Congestion Notification), a technique that combines PFC (Priority Flow Control) and ECN (Explicit Congestion Notification), along with smart queuing and buffer management. Some vendors have also added AI and ML on top of RoCEv2 to improve overall performance.
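To make the DCQCN interplay concrete, here is a minimal Python sketch of the sender-side (reaction point) rate logic: the receiver turns ECN marks into congestion notification packets (CNPs), and the sender cuts its rate multiplicatively, then recovers. This is a conceptual model only; real DCQCN runs in NIC hardware at microsecond granularity, and the class name, parameter values, and simplified recovery step below are assumptions for illustration, loosely following the published algorithm:

```python
# Conceptual sketch of DCQCN-style sender rate control. Illustrative only;
# real implementations run in NIC firmware. Parameter values are assumed.

class DcqcnSender:
    """Reaction-point logic: back off when congestion feedback (CNPs
    triggered by ECN marks) arrives, then recover toward a target rate."""

    def __init__(self, line_rate_gbps: float, g: float = 1 / 256):
        self.rate = line_rate_gbps    # current sending rate
        self.target = line_rate_gbps  # target rate for recovery
        self.alpha = 1.0              # congestion estimate in [0, 1]
        self.g = g                    # EWMA gain for alpha updates

    def on_cnp(self) -> None:
        """Receiver saw ECN-marked packets and sent a CNP: back off."""
        self.target = self.rate                    # remember pre-cut rate
        self.rate *= (1 - self.alpha / 2)          # multiplicative decrease
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_quiet_period(self) -> None:
        """No CNP in the last window: decay alpha and recover rate."""
        self.alpha = (1 - self.g) * self.alpha
        self.rate = (self.rate + self.target) / 2  # fast-recovery step


# Example: one congestion event, then recovery, on a 400 Gb/s port.
sender = DcqcnSender(line_rate_gbps=400.0)
sender.on_cnp()                   # rate drops to ~200 Gb/s
for _ in range(5):
    sender.on_quiet_period()      # rate climbs back toward 400 Gb/s
print(f"rate after recovery: {sender.rate:.1f} Gb/s")
```

The design point worth noting is that ECN-driven rate control like this does the steady-state work, while PFC acts as a last-resort backstop against packet loss; that division of labor is what lets Ethernet approach InfiniBand's lossless behavior.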

There's a lot to be said for open networking as well, whereby customers can opt to build their own networks by mixing and matching vendor NOSs and hardware. A strong portfolio of commercial silicon from chipmakers Broadcom, Marvell, and Intel enables networking experts to use off-the-shelf hardware and pair it with the NOS of their choice, including the open-source SONiC.

Large, established networking vendors such as Arista, Cisco, Broadcom, Juniper, HPE, and Nokia have joined the UEC to pursue these goals. In this group, the forthcoming merger of Juniper and HPE looms large, giving the combined networking company more scale that is expected to make it number two in market share behind Cisco.

AI networking also offers additional opportunities for startups. These include vendors with technology rooted in SONiC, such as Aviz Networks and Hedgehog, as well as startups focusing on scale-out, disaggregated systems based on their own NOS, such as Arrcus and Israel-based DriveNets, which already makes a hyperscale routing solution for the telecommunications market.

There are yet more vendors to watch in this explosive space. For example, startup Enfabrica offers a compute-to-compute interconnect switch for AI servers that acts as a high-bandwidth "crossbar of NICs," augmenting compute, network, and memory connections within a cluster. And multicloud networking and Network as a Service (NaaS) vendors such as Alkira, Aryaka, Aviatrix, Itential, and Prosimo are making it easier for organizations to build secure network connections to shuttle data to and from AI sources.

The AI networking boom will also fuel the optical market, in which high-speed optics are needed to support surging bandwidth demand. Here, optical equipment market leader Ciena's position in coherent optics provides an opportunity to speed up interconnections in datacenters. Thailand-based Fabrinet has become a darling of AI investors, as it is seeing strong growth in optical components for AI applications, as are rivals Coherent and Lumentum. Shares of optical fiber manufacturer Corning recently popped 10% after an earnings pre-announcement in which it raised its second-quarter sales expectations by around $200 million, due in large part to greater-than-expected demand for fiber connections inside datacenters running AI applications. This is also an area where Cisco is well positioned, with its own optical components that can be packaged with its Silicon One chip platform.

Put all this together and what you get is a huge and interesting scrum for AI infrastructure networking leadership, which is expected to provide plenty of twists and turns. Networking is cool again. Grab the popcorn!
