Alpha Leaders

Nvidia Announces Rubin CPX GPU To Speed Long-Context AI

By Press Room, 9 September 2025, 4 Mins Read

In an industry first, Nvidia has announced a new GPU, the Rubin CPX, to offload compute-intensive “context processing” from another GPU. Yep: for some AI workloads, you will now need two different GPUs to maximize performance and profit. I would be surprised if the competition doesn’t follow suit; the benefits are tremendous. (Nvidia, like many other semiconductor firms, is a client of my company, Cambrian-AI Research.)

Rubin CPX is designed to handle very long LLM inputs, over 1 million tokens. Not many applications need such a long context to be encoded for AI processing, but those that do desperately need a better hardware platform for the job, because encoding the context (the “prefill” phase) is extremely compute-intensive. Modern GPUs are designed for the memory- and network-bound generation (decode) phase of LLMs, with expensive HBM memory that the compute-bound context phase doesn’t need. As Nvidia has explained the different needs of these two phases over the last couple of years, and highlighted the benefits of disaggregating inference across different GPUs in its MLPerf announcement, many of us began wondering when someone would build a solution tailored for the prefill job. Nvidia did just that with CPX, but you may have to wait a year to get it.
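
The compute-bound versus memory-bound distinction can be made concrete with a back-of-envelope arithmetic-intensity estimate. The sketch below is a simplified model, not Nvidia’s analysis: it assumes a dense transformer whose forward pass over T tokens costs about 2·P·T FLOPs while reading its P weights from memory once, and it ignores KV-cache and activation traffic. The model size, precision, and GPU figures (PEAK_FLOPS, MEM_BW) are hypothetical placeholders, not Rubin or Rubin CPX specifications.

```python
# Back-of-envelope arithmetic intensity for LLM inference phases.
# Simplified model (assumption): a dense transformer forward pass over
# `tokens` tokens costs ~2 * params * tokens FLOPs and reads every weight
# from memory once. KV-cache and activation traffic are ignored.

def arithmetic_intensity(params: float, tokens: int, bytes_per_param: float) -> float:
    """FLOPs performed per byte of weights read in one forward pass."""
    flops = 2.0 * params * tokens
    bytes_read = params * bytes_per_param
    return flops / bytes_read


def bottleneck(intensity: float, peak_flops: float, mem_bw: float) -> str:
    """Classify a workload against the roofline ridge point peak_flops / mem_bw."""
    ridge = peak_flops / mem_bw  # FLOPs per byte where compute and bandwidth limits meet
    return "compute-bound" if intensity > ridge else "memory-bound"


# Hypothetical figures for illustration only (not vendor specifications).
PARAMS = 70e9          # 70B-parameter model
BYTES_PER_PARAM = 1.0  # 8-bit weights
PEAK_FLOPS = 10e15     # 10 PFLOP/s
MEM_BW = 8e12          # 8 TB/s

if __name__ == "__main__":
    # Prefill: the whole 100,000-token prompt is processed in one pass.
    prefill = arithmetic_intensity(PARAMS, 100_000, BYTES_PER_PARAM)
    # Decode: each step produces one token per sequence; assume a batch of 32.
    decode = arithmetic_intensity(PARAMS, 32, BYTES_PER_PARAM)
    print(f"prefill: {prefill:,.0f} FLOP/byte -> {bottleneck(prefill, PEAK_FLOPS, MEM_BW)}")
    print(f"decode:  {decode:,.0f} FLOP/byte -> {bottleneck(decode, PEAK_FLOPS, MEM_BW)}")
```

In this toy model, the 100,000-token prefill sits far above the ridge point and is limited by raw compute, while batched decoding sits far below it and is limited by memory bandwidth, which mirrors the argument that a context-focused part can trade HBM for cheaper memory.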

The Rubin CPX Processor for Long-Context AI

Nvidia estimates that some 20% of AI applications spend their time waiting for the first token (Time to First Token, or TTFT) while the GPUs crunch through context processing. For 100,000 lines of code, that can take perhaps five to 10 minutes. For multi-frame, multi-second videos, pre-processing and per-frame embedding increase latency rapidly; 10–20 seconds or longer is common, depending on video length and LLM capabilities. That is why video LLMs are typically used today only to create short clips.
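
Why does TTFT balloon with long inputs? A rough estimate helps. The sketch below is an approximation, not Nvidia’s methodology: it counts about 2·P·T FLOPs for the weight matmuls over T prompt tokens plus about 2·L·T²·d FLOPs for causal self-attention (the Q·K^T scores and the weighted sum over V, roughly halved by the causal mask), then divides by an assumed sustained throughput. The model shape, peak FLOP/s, and utilization are hypothetical. The quadratic attention term is what dominates at very long contexts, consistent with the emphasis on attention acceleration.

```python
# Rough time-to-first-token (TTFT) estimate for the prefill phase.
# Approximation (assumption): dense transformer, causal attention;
# ignores embeddings, softmax, normalization, and communication overheads.

def prefill_flops(params: float, n_layers: int, d_model: int, tokens: int) -> tuple[float, float]:
    """Return (weight-matmul FLOPs, attention FLOPs) for one prefill pass."""
    # Each token passes through every weight once: ~2 FLOPs per parameter.
    linear = 2.0 * params * tokens
    # Q @ K^T and softmax(scores) @ V each cost ~2 * T^2 * d per layer with
    # full attention; a causal mask skips about half, giving ~2 * T^2 * d total.
    attention = 2.0 * n_layers * tokens**2 * d_model
    return linear, attention


def ttft_seconds(total_flops: float, peak_flops: float, utilization: float) -> float:
    """Seconds to finish prefill at a given fraction of peak throughput."""
    return total_flops / (peak_flops * utilization)


# Hypothetical model shape and hardware numbers, for illustration only.
PARAMS, N_LAYERS, D_MODEL = 70e9, 80, 8192
PEAK_FLOPS, UTILIZATION = 10e15, 0.4

if __name__ == "__main__":
    for tokens in (10_000, 100_000, 1_000_000):
        linear, attention = prefill_flops(PARAMS, N_LAYERS, D_MODEL, tokens)
        total = linear + attention
        t = ttft_seconds(total, PEAK_FLOPS, UTILIZATION)
        print(f"{tokens:>9,} tokens: ~{t:8.1f} s TTFT, attention share {attention / total:.0%}")
```

Under these made-up numbers, the weight matmuls dominate at 10,000 tokens, the two terms are roughly equal near 100,000 tokens, and attention accounts for about nine-tenths of the work at one million tokens.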

And as the chart above contends, an AI Factory’s profit increases with performance. Even if the competition were to give away their GPUs for free, today’s GB200 NVL72 can deliver nearly four times the token profit of the free competition. And one should expect an even better ROI with Blackwell Ultra and Rubin next year. Of course, it will be even better when you add the new CPX to a rack of Rubin GPUs.

If you use the Blackwell GPUs in today’s rack more intelligently, dividing context processing and generation across different GPUs, you can triple performance with the same cost and energy profile. Now, if you add a GPU optimized for long-context processing, lowering cost by using less expensive memory and increasing attention acceleration by another 3X, total inference performance can increase by another factor of three.
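
Disaggregated serving splits each request into a context (prefill) step and a generation (decode) step that may run on different GPU pools, with the key-value (KV) cache handed from one to the other. The sketch below is a hypothetical placement policy, not Nvidia’s software: the pool names, the place function, and the one-million-token cutoff (borrowed from the article’s long-context figure) are assumptions. It also estimates KV-cache size with the standard formula, 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens, to show how much data such a handoff could involve; the model dimensions are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical pool names; real disaggregated-serving stacks use their own.
CONTEXT_POOL = "context_pool"        # compute-heavy GPUs for prefill
GENERATION_POOL = "generation_pool"  # HBM-equipped GPUs for decode
LONG_CONTEXT_THRESHOLD = 1_000_000   # assumed cutoff, in prompt tokens


@dataclass(frozen=True)
class Request:
    req_id: int
    prompt_tokens: int
    max_new_tokens: int


@dataclass(frozen=True)
class Placement:
    req_id: int
    prefill_pool: str
    decode_pool: str
    kv_handoff: bool  # True when the KV cache must move between pools


def place(req: Request, threshold: int = LONG_CONTEXT_THRESHOLD) -> Placement:
    """Send the prefill of very long prompts to the context pool."""
    if req.prompt_tokens > threshold:
        return Placement(req.req_id, CONTEXT_POOL, GENERATION_POOL, kv_handoff=True)
    # Shorter prompts are prefilled where they will be decoded, so no handoff.
    return Placement(req.req_id, GENERATION_POOL, GENERATION_POOL, kv_handoff=False)


def kv_cache_bytes(tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    """Bytes of keys and values cached for `tokens` tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens


if __name__ == "__main__":
    requests = [Request(1, 4_000, 512), Request(2, 1_500_000, 1_024)]
    for req in requests:
        p = place(req)
        line = f"request {p.req_id}: prefill on {p.prefill_pool}, decode on {p.decode_pool}"
        if p.kv_handoff:
            # Hypothetical model: 80 layers, 8 KV heads of dim 128, 16-bit values.
            size = kv_cache_bytes(req.prompt_tokens, 80, 8, 128, 2)
            line += f", KV handoff ~{size / 1e9:,.0f} GB"
        print(line)
```

The policy looks only at prompt length; a real scheduler would likely also weigh queue depth, transfer bandwidth, and latency targets. Under the hypothetical model shape, each million tokens of context implies roughly 328 GB of KV cache to move between pools.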

Nvidia plans to make the Rubin CPX available in two forms. For new installations requiring long-context AI, the Vera Rubin NVL144 CPX adds the CPX chips to the compute tray housing the Vera CPU and the Rubin GPU, tripling the performance of next year’s Vera Rubin.

But hey! You just paid $3M for a shiny new NVL144! No worries. Nvidia will sell you a separate rack with the right number of CPX nodes to attach to your Rubin rack. This will increase the performance of the Vera Rubin rack from 5 exaflops to 8 EF and support up to 150TB of fast GDDR7 memory.

Nvidia presented the slide below to show that Rubin CPX delivers up to 6.5X the performance of the GB300 when handling large context windows.

Here’s Nvidia’s updated roadmap through Feynman in 2028. While Nvidia did not announce that the Rubin CPX would give rise to a Rubin Ultra CPX, one can probably assume it will. Nvidia announces products more than a year in advance these days, because data center operators need to plan for future upgrades and expansions. For example, planners can now make room for a CPX rack next to Rubin racks installed before CPX becomes available.

What’s next?

This announcement represents a major milestone in the software and hardware needed to process inference queries efficiently: for long-context windows greater than one million tokens, inference is disaggregated into two workloads, with a GPU tailored for each. Others, such as Google and AMD, will certainly evaluate this approach and decide whether their customers would benefit.

Disclosures: This article expresses the opinions of the author and is not to be taken as advice to purchase from or invest in the companies mentioned. My firm, Cambrian-AI Research, is fortunate to have many semiconductor firms as our clients, including Baya Systems, BrainChip, Cadence, Cerebras Systems, D-Matrix, Esperanto, Flex, Groq, IBM, Intel, Micron, NVIDIA, Qualcomm, Graphcore, SiMa.ai, Synopsys, Tenstorrent, Ventana Microsystems, and scores of investors. I have no investment positions in any of the companies mentioned in this article. For more information, please visit our website at https://cambrian-AI.com.