
Amazon unveils the next generation of EC2 instances rocking the latest Nvidia A100 GPUs

Image via Nvidia

Today, Amazon unveiled its next generation of Amazon Elastic Compute Cloud (Amazon EC2) GPU-powered instances. Dubbed P4d, each EC2 instance houses eight of Nvidia's latest A100 Tensor Core GPUs built on the Ampere architecture, delivering 2.5 petaflops of mixed-precision performance and 320 GB of high-bandwidth GPU memory in a single machine. To complement this, the new P4d instances also feature 96 Intel Xeon Scalable (Cascade Lake) vCPUs, 1.1 TB of system memory, and 8 TB of local, fast NVMe storage.

In addition to the raw compute power provided by the CPUs and GPUs, total network bandwidth comes in at 400 Gbps, 16x that of the previous generation of P3 instances, which housed Nvidia V100 GPUs. Taken together, Amazon claims that the new P4d EC2 instances more than double the performance of the previous generation.

P4d instances deliver up to 60% lower cost to train and over 2.5x better deep learning performance with 2.5x the memory, twice the double precision floating point performance, 16x network bandwidth, and 4x local NVMe-based SSD storage compared to previous generation P3 instances.

By combining non-blocking, petabit-scale networking infrastructure with Amazon FSx for Lustre high-performance storage, AWS's Elastic Fabric Adapter (EFA), and Nvidia GPUDirect RDMA (remote direct memory access), Amazon is also making it possible to deploy P4d instances in EC2 UltraClusters. Tailored to use cases that require maximum compute power, these EC2 UltraClusters can scale to over 4,000 A100 GPUs, twice that of any other cloud provider's offering.
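For readers curious what requesting P4d capacity could look like in practice, here is a minimal, hypothetical boto3 sketch that launches p4d.24xlarge instances with an EFA-enabled network interface inside a cluster placement group. The AMI ID, subnet, security group, and placement-group name are placeholders, not values from AWS's announcement.

```python
# Hypothetical sketch: requesting p4d.24xlarge instances with an EFA-enabled
# network interface in a cluster placement group. All resource IDs below are
# placeholders and must be replaced with real ones from your AWS account.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder: e.g. a Deep Learning AMI
    InstanceType="p4d.24xlarge",                # 8x A100, 96 vCPUs, 1.1 TB RAM
    MinCount=1,
    MaxCount=2,
    Placement={"GroupName": "my-p4d-cluster"},  # placeholder cluster placement group
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "SubnetId": "subnet-0123456789abcdef0",  # placeholder subnet
        "Groups": ["sg-0123456789abcdef0"],      # placeholder security group
        "InterfaceType": "efa",                  # request an Elastic Fabric Adapter
    }],
)

# Print the IDs and states of the newly launched instances.
for instance in response["Instances"]:
    print(instance["InstanceId"], instance["State"]["Name"])
```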

Jargon aside, Amazon's new P4d instances will allow you to train the larger and more complex machine learning models that are becoming increasingly common in deep learning. Inference will be sped up significantly as well. Both improvements should lower initial and running costs for your use cases.

As far as pricing is concerned, AWS is only offering one configuration for the P4d instances, at least for now. The p4d.24xlarge, with eight Nvidia A100 GPUs, 96 vCPUs, 400 Gbps of network bandwidth, 8 TB of NVMe SSD storage, 19 Gbps of EBS bandwidth, and 600 GB/s of NVSwitch GPU interconnect bandwidth, will set you back $32.77 per hour. Reserving an instance for one or three years brings that hourly cost down to $19.22 and $11.57, respectively. Further details can be found here.
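To put those rates in perspective, here is a quick back-of-the-envelope calculation of the yearly bill for a single p4d.24xlarge under each plan, using the hourly prices quoted above and assuming continuous 24/7 usage.

```python
# Rough annual cost comparison based on the hourly rates quoted above.
# Assumes round-the-clock usage for a full 365-day year.
HOURS_PER_YEAR = 24 * 365

rates = {
    "on-demand": 32.77,          # USD per hour
    "1-year reserved": 19.22,
    "3-year reserved": 11.57,
}

for plan, hourly in rates.items():
    print(f"{plan}: ${hourly * HOURS_PER_YEAR:,.0f} per year")
```

Running this prints roughly $287,000 per year on demand, versus about $168,000 and $101,000 per year for the one-year and three-year reservations, respectively.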
