AWS and NVIDIA to Collaborate on Next-Gen EC2 P5 Instances for Accelerating Generative AI

By Daniel Dominguez

Article originally posted on InfoQ. Visit InfoQ

AWS and NVIDIA announced the development of a highly scalable, on-demand AI infrastructure that is specifically designed for training large language models and creating advanced generative AI applications. The collaboration aims to create the most optimized and efficient system of its kind, capable of meeting the demands of increasingly complex AI tasks.

The collaboration will make use of the most recent Amazon Elastic Compute Cloud (Amazon EC2) P5 instances powered by NVIDIA H100 Tensor Core GPUs as well as AWS’s cutting-edge networking and scalability, which will provide up to 20 exaFLOPS of compute performance for creating and training the largest deep learning models.

“Accelerated computing and AI have arrived, and just in time. Accelerated computing provides step-function speed-ups while driving down cost and power as enterprises strive to do more with less. Generative AI has awakened companies to reimagine their products and business models and to be the disruptor and not the disrupted,” said Jensen Huang, founder and CEO of NVIDIA.

P5 instances will be the first GPU-based instances to take advantage of AWS’s second-generation Elastic Fabric Adapter (EFA) networking, which delivers 3,200 Gbps of low-latency, high-bandwidth throughput. With them, customers can scale up to 20,000 H100 GPUs in EC2 UltraClusters for on-demand access to supercomputer-class AI performance.
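The headline 20-exaFLOPS figure follows from simple multiplication. As a rough sketch, assuming on the order of 1 petaFLOPS of dense FP16/BF16 Tensor Core throughput per H100 (an approximation of NVIDIA’s published spec):

```python
def aggregate_exaflops(num_gpus: int, pflops_per_gpu: float) -> float:
    """Aggregate cluster compute in exaFLOPS (1 exaFLOPS = 1,000 petaFLOPS)."""
    return num_gpus * pflops_per_gpu / 1000.0

# A full UltraCluster of 20,000 H100s at ~1 PFLOPS per GPU:
print(aggregate_exaflops(20_000, 1.0))  # → 20.0 exaFLOPS
```

The exact per-GPU number depends on precision (FP8 vs. FP16) and whether sparsity is counted, so the aggregate is an upper bound on marketing terms rather than sustained training throughput.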

AWS and NVIDIA have collaborated for over a decade to create AI and HPC infrastructure, resulting in the development of P2, P3, P3dn, and P4d(e) instances. The latest P5 instances are the fifth generation of NVIDIA GPU-powered AWS offerings and are optimized for training complex LLMs and computer vision models for demanding generative AI applications such as speech recognition, code generation, and video/image generation.

Amazon EC2 P5 instances are deployed in hyperscale clusters called EC2 UltraClusters, which combine high-performance compute, networking, and storage. These clusters rank among the most powerful supercomputers in the world and let customers run complex multi-node machine learning training and distributed HPC workloads. With petabit-scale non-blocking networking powered by AWS EFA, customers can run communication-intensive multi-node applications at scale on AWS. EFA’s operating-system-bypass hardware interface and its integration with NVIDIA GPUDirect RDMA improve inter-instance communication, reducing latency and increasing bandwidth utilization — essential for scaling deep learning model training across hundreds of P5 nodes.
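In practice, a multi-node NCCL training job on such a cluster is typically pointed at EFA through Libfabric environment variables and started with a per-GPU process launcher. The sketch below is illustrative only: `train.py`, the node count, and `$HEAD_NODE_IP` are placeholders, and the variable names should be checked against the installed Libfabric and aws-ofi-nccl versions.

```shell
# Route NCCL's Libfabric traffic over EFA
export FI_PROVIDER=efa
# Enable GPUDirect RDMA on supported GPU/instance combinations
export FI_EFA_USE_DEVICE_RDMA=1
# Log transport selection at startup to confirm NCCL picked up EFA
export NCCL_DEBUG=INFO

# Launch one process per GPU on each node of a 16-node cluster
# (train.py and $HEAD_NODE_IP are hypothetical placeholders):
torchrun --nnodes=16 --nproc_per_node=8 \
         --rdzv_backend=c10d --rdzv_endpoint="$HEAD_NODE_IP:29500" \
         train.py
```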

In addition, the two companies have begun collaborating on future server designs to improve scaling efficiency through next-generation system designs, cooling technologies, and network scalability.
