NVIDIA announced that it’s acquiring Run:ai, an Israeli startup that built a Kubernetes-based GPU orchestrator. While the price was not disclosed, reports value the deal at anywhere between $700 million and $1 billion.
The acquisition of Run:ai highlights Kubernetes’ growing importance in the generative AI era and underscores its position as the de facto standard for managing GPU-based accelerated computing infrastructure.
Run:ai is a Tel Aviv, Israel-based AI infrastructure startup founded in 2018 by Omri Geller (CEO) and Dr. Ronen Dar (CTO). It has created an orchestration and virtualization platform tailored to the specific requirements of AI workloads running on GPUs, which efficiently pools and shares resources. Tiger Global Management and Insight Partners led a $75 million Series C round in March 2022, bringing the company’s total funding to $118 million.
The Problem Run:ai Solves
Unlike CPUs, GPUs cannot be easily virtualized so that multiple workloads can use them at the same time. Hypervisors like VMware’s vSphere and KVM enabled the emulation of multiple virtual CPUs from a single physical processor, giving workloads the illusion of running on dedicated CPUs. GPUs, by contrast, cannot be effectively shared across multiple machine learning tasks, such as training and inference. For example, researchers cannot use half of a GPU for training and experimentation while devoting the other half to another machine learning task. Nor can they pool multiple GPUs to make better use of the available resources. This poses a huge challenge to enterprises running GPU-based workloads in the cloud or on-premises.
The problem extends to containers and Kubernetes. A container that requests a GPU is allocated the entire device, even if the workload uses only a fraction of its capacity. The ongoing shortage of AI chips and GPUs exacerbates the problem.
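To see this all-or-nothing allocation in practice, here is a minimal, illustrative Kubernetes pod spec that requests a GPU through the NVIDIA device plugin’s `nvidia.com/gpu` extended resource (the container image name is just an example). Kubernetes hands the whole device to this one container; the resource does not accept fractional values:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.03-py3   # example image
    resources:
      limits:
        nvidia.com/gpu: 1   # whole-GPU granularity; a value like 0.5 is rejected
```

Even if the training process inside this container keeps the GPU mostly idle, no other pod can be scheduled onto that device.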
Run:ai saw an opportunity to solve this problem. It built a layer on top of Kubernetes’ primitives and proven scheduling mechanisms that lets enterprises allocate a fraction of a GPU to a workload or pool multiple GPUs together. The result is higher GPU utilization and better economics.
Here are five key features of the Run:ai platform:
- Orchestration and virtualization software layer tailored to AI workloads running on GPUs and other chipsets. This allows efficient pooling and sharing of GPU compute resources.
- Integration with Kubernetes for container orchestration. Run:ai’s platform is built on Kubernetes and supports all Kubernetes variants. It also integrates with third-party AI tools and frameworks.
- Centralized interface for managing shared compute infrastructure. Users can manage clusters, pool GPUs and allocate computing power for various tasks through Run:ai’s interface.
- Dynamic scheduling, GPU pooling and GPU fractioning for maximum efficiency. Run:ai’s software enables splitting GPUs into fractions and allocating them dynamically to optimize utilization.
- Integration with NVIDIA’s AI stack, including DGX systems, Base Command, NGC containers and AI Enterprise software. Run:ai has partnered closely with NVIDIA to offer a full-stack solution.
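As a sketch of how fractional allocation and the custom scheduler surface to users, Run:ai’s public documentation describes requesting GPU fractions via a pod annotation and routing the pod to Run:ai’s scheduler. The exact annotation and scheduler names below are drawn from those docs and may vary across versions, so treat this as illustrative rather than definitive:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-job
  annotations:
    gpu-fraction: "0.5"   # request half a GPU (annotation name per Run:ai docs)
spec:
  schedulerName: runai-scheduler   # hand the pod to Run:ai's scheduler instead of the default
  containers:
  - name: server
    image: nvcr.io/nvidia/tritonserver:24.03-py3   # example image
```

Because the fraction is expressed at the pod level and honored by Run:ai’s scheduler, two such pods can share a single physical GPU, which is exactly what the stock `nvidia.com/gpu` resource model cannot do.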
Notably, Run:ai is not an open-source solution, even though it is based on Kubernetes. It provides customers with proprietary software that must be deployed in their Kubernetes clusters together with a SaaS-based management application.
Why did NVIDIA acquire Run:ai?
NVIDIA’s acquisition of Run:ai strategically positions the company to strengthen its leadership in the AI and machine learning sectors, especially in the context of optimizing GPU utilization for these technologies. Here are the primary reasons why NVIDIA pursued this acquisition:
Enhanced GPU Orchestration and Management: Run:ai’s advanced orchestration tools are pivotal for managing GPU resources more efficiently. This capability is critical as the demand for AI and machine learning solutions continues to rise, requiring more sophisticated management of hardware resources to ensure optimal performance and utilization.
Integration with NVIDIA’s Existing AI Ecosystem: By acquiring Run:ai, NVIDIA can integrate this technology into its existing suite of AI and machine learning products. This enhances NVIDIA’s overall product offerings, allowing for better service to customers who rely on NVIDIA’s ecosystem for their AI infrastructure needs. NVIDIA HGX, DGX and DGX Cloud customers will gain access to Run:ai’s capabilities for their AI workloads, particularly for generative AI workloads.
Expansion of Market Reach: Run:ai’s established relationships with key players in the AI space, including their prior integration with NVIDIA’s technologies, provide NVIDIA with an expanded market reach and the potential to serve a broader array of customers. This is particularly valuable in sectors that are rapidly adopting AI technologies but face challenges in resource management and scalability.
Innovation and Research Development: The acquisition enables NVIDIA to harness the innovative capabilities of Run:ai’s team, known for their pioneering work in GPU virtualization and management. This could lead to further advancements in GPU technology and orchestration, keeping NVIDIA at the forefront of technological developments in AI.
Competitive Advantage in a Growing Market: As enterprises increase their investment in AI and machine learning, effective GPU management becomes a competitive advantage. NVIDIA’s acquisition of Run:ai ensures it remains competitive against other tech giants venturing into the AI hardware and orchestration space.
By acquiring Run:ai, NVIDIA not only enhances its product capabilities but also solidifies its position as a leader in the AI infrastructure market, ensuring it stays ahead of the curve in technology innovations and market demands.
What does this mean for the Kubernetes and cloud-native ecosystem?
NVIDIA’s acquisition of Run:ai is significant for the Kubernetes and cloud-native ecosystems for several reasons:
Enhanced GPU Orchestration in Kubernetes: The integration of Run:ai’s advanced GPU management and virtualization capabilities into Kubernetes will allow for more dynamic allocation and efficient utilization of GPU resources across AI workloads. This aligns with Kubernetes’ capabilities in handling complex, resource-intensive applications, particularly in AI and machine learning, where efficient resource management is critical.
Advancements in Cloud-Native AI Infrastructure: By leveraging Run:ai’s technology, NVIDIA can further enhance the Kubernetes ecosystem’s ability to support high-performance computing (HPC) and AI workloads. This synergy between NVIDIA’s GPU technology and Kubernetes will likely lead to more robust solutions for deploying, managing and scaling AI applications in cloud-native environments.
Wider Adoption and Innovation: The acquisition could drive broader adoption of Kubernetes in sectors that are increasingly reliant on AI, such as healthcare, automotive and finance. The ability to efficiently manage GPU resources in these sectors can lead to faster innovation and deployment cycles for AI models.
Impact on Kubernetes Maturity: The integration of NVIDIA and Run:ai technologies with Kubernetes underlines the platform’s maturity and readiness to support advanced AI workloads, reinforcing Kubernetes as the de facto system for modern AI and ML deployments. This could also encourage more organizations to adopt Kubernetes for their AI infrastructure needs.
NVIDIA’s move to acquire Run:ai not only strengthens its position in the AI and cloud computing markets but also enhances the Kubernetes ecosystem’s capacity to support the next generation of AI applications, benefiting a wide range of industries.