CES 2026: NVIDIA introduces Rubin AI platform, Microsoft ready to deploy

Following the success of its Blackwell platform, NVIDIA today announced Rubin at CES 2026, its next-generation platform for AI supercomputers. The Rubin platform comprises six chips, all co-designed for maximum training performance and inference efficiency:

NVIDIA Vera CPU
NVIDIA Rubin GPU
NVIDIA NVLink 6 Switch
NVIDIA ConnectX-9 SuperNIC
NVIDIA BlueField-4 DPU
NVIDIA Spectrum-6 Ethernet Switch

For hyperscalers, the Rubin platform includes the NVIDIA Vera Rubin NVL72 rack-scale solution and the NVIDIA HGX Rubin NVL8 system.

NVIDIA claims that Rubin will deliver AI inference at up to 10x lower cost per token than the NVIDIA Blackwell platform, and that mixture-of-experts (MoE) models can be trained with 4x fewer GPUs than on Blackwell. NVIDIA attributes these gains to the latest generation of its NVLink interconnect technology, Transformer Engine, Confidential Computing and RAS Engine, and the NVIDIA Vera CPU.

According to NVIDIA, Rubin is already in full production, and major AI companies and hyperscalers, including AWS, Google Cloud, Microsoft, and OCI, will adopt Rubin-based solutions in the second half of 2026.

Microsoft has confirmed that it will deploy NVIDIA Vera Rubin NVL72 rack-scale systems as part of its next-generation AI data centers. Microsoft has also published a detailed blog post explaining that its infrastructure is already prepared to deploy the Rubin platform when it becomes available later this year.

NVIDIA Vera Rubin chips are expected to deliver 50 PF of NVFP4 inference performance per chip and 3.6 EF of NVFP4 per rack, nearly five times that of NVIDIA GB200 NVL72 rack systems. To prepare for this jump, Microsoft has already built several core architectural assumptions into Azure:

NVIDIA NVLink evolution: Azure’s rack architecture has already been redesigned to work with the sixth-generation NVIDIA NVLink fabric in Vera Rubin NVL72 systems, which provides ~260 TB/s of scale-up bandwidth.
HBM4/HBM4e thermal and density planning: Azure’s cooling, power envelopes, and rack geometries have already been upgraded to handle the tighter thermal windows and higher rack densities of the Rubin memory stack.
SOCAMM2-driven memory expansion: Azure’s platform has already integrated and validated the memory extension behaviors of Rubin Superchips.
Reticle-sized GPU scaling and multi-die packaging: Azure’s supply chain, mechanical design, and orchestration layers have been pre-tuned for the much larger GPU footprints and multi-die layouts of Rubin products.
High-performance scale-out networking: Azure’s network infrastructure will support Rubin’s NVIDIA ConnectX-9 1,600 Gb/s networking for large-scale AI workloads.
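
The per-rack figures quoted above follow from simple multiplication across the 72 GPUs that the "NVL72" name denotes. The short Python sketch that follows reproduces that back-of-the-envelope arithmetic. It is illustrative only: the function names are hypothetical, and the 3.6 TB/s per-GPU NVLink 6 bandwidth is an assumption taken from NVIDIA's published specifications rather than from this article.

```python
# Back-of-the-envelope check of the Vera Rubin NVL72 rack figures (illustrative sketch).
# Function names are hypothetical; they do not correspond to any NVIDIA or Azure API.

GPUS_PER_RACK = 72           # "NVL72" denotes 72 Rubin GPUs per rack
NVFP4_PF_PER_GPU = 50        # 50 PF NVFP4 inference per GPU, as quoted in the article
NVLINK6_TBPS_PER_GPU = 3.6   # ASSUMPTION: per-GPU NVLink 6 bandwidth, not stated in the article
CONNECTX9_GBITS = 1600       # ConnectX-9 networking speed in Gb/s, as quoted in the article


def rack_nvfp4_exaflops(gpus: int = GPUS_PER_RACK,
                        pf_per_gpu: float = NVFP4_PF_PER_GPU) -> float:
    """Aggregate NVFP4 inference throughput per rack, in exaflops (1 EF = 1,000 PF)."""
    return gpus * pf_per_gpu / 1000


def rack_nvlink_tbps(gpus: int = GPUS_PER_RACK,
                     tbps_per_gpu: float = NVLINK6_TBPS_PER_GPU) -> float:
    """Aggregate scale-up NVLink bandwidth per rack, in TB/s."""
    return gpus * tbps_per_gpu


def nic_gbytes_per_s(gbits: float = CONNECTX9_GBITS) -> float:
    """Convert a NIC line rate from gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbits / 8


if __name__ == "__main__":
    print(f"NVFP4 per rack:   {rack_nvfp4_exaflops():.1f} EF")
    print(f"NVLink per rack:  {rack_nvlink_tbps():.1f} TB/s")
    print(f"ConnectX-9 NIC:   {nic_gbytes_per_s():.0f} GB/s")
```

Under these inputs, 72 × 50 PF gives 3.6 EF per rack, and 72 × 3.6 TB/s gives about 259 TB/s, consistent with the ~260 TB/s quoted for the NVLink fabric. The 1,600 Gb/s ConnectX-9 rate works out to 200 GB/s per NIC.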

Microsoft highlighted that years of co-design with NVIDIA across interconnects, memory systems, thermals, packaging, and rack-scale architecture have allowed the company to integrate Rubin directly into Azure’s infrastructure with minimal rework.
