Renato Losio
Azure recently announced the preview of the ND H100 v5, a new series of virtual machines that integrates the latest Nvidia H100 Tensor Core GPUs and supports Quantum-2 InfiniBand networking. According to Microsoft, the new option will offer AI developers improved performance and scaling across thousands of GPUs.
With generative AI solutions like ChatGPT accelerating demand for cloud services that can handle large training sets, Matt Vegas, principal product manager at Azure, writes:
For Microsoft and organizations like Inflection, Nvidia, and OpenAI that have committed to large-scale deployments, this offering will enable a new class of large-scale AI models.
Designed for conversational AI projects and ranging in size from eight to thousands of Nvidia H100 Tensor Core GPUs, the new servers are powered by 4th Gen Intel Xeon Scalable processors and provide 400 Gb/s of Nvidia Quantum-2 CX7 InfiniBand interconnect per GPU. According to Nvidia, the H100 GPUs can speed up large language models (LLMs) by 30x over the previous-generation Ampere architecture. In a separate article, John Roach, features writer and content strategist, summarises how “Microsoft’s bet on Azure unlocked an AI revolution”:
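Workloads typically reach these per-GPU InfiniBand links through collective-communication libraries such as Nvidia's NCCL rather than directly. As a rough, illustrative sketch (not taken from the announcement; the model and sizes are placeholders), a minimal PyTorch distributed-data-parallel script that could run on a single eight-GPU VM might look like this:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL is the usual backend for multi-GPU training; it picks up
    # InfiniBand transports such as the Quantum-2 fabric when available.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # Toy layer standing in for a real LLM; DDP replicates it on each GPU
    # and all-reduces gradients over the interconnect.
    model = torch.nn.Linear(4096, 4096).to(f"cuda:{local_rank}")
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    inputs = torch.randn(8, 4096, device=f"cuda:{local_rank}")
    loss = ddp_model(inputs).square().mean()
    loss.backward()  # gradient all-reduce happens here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=8 train.py`, this starts one process per GPU; scaling out to multiple VMs changes only the launch arguments, not the training code.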
The system-level optimization includes software that enables effective utilization of the GPUs and networking equipment. Over the past several years, Microsoft has developed software techniques that have grown the ability to train models with tens of trillions of parameters, while simultaneously driving down the resource requirements and time to train and serve them in production.
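The article does not name the specific software, but Microsoft's open-source DeepSpeed library is a well-known example of this line of work: its ZeRO optimizer partitions parameters, gradients, and optimizer state across GPUs to cut per-device memory requirements. A minimal, illustrative sketch (the model and all numbers are placeholders, not Microsoft's production setup) could look like:

```python
import torch
import deepspeed

# Placeholder model; in practice this would be a multi-billion-parameter LLM.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
)

# ZeRO stage 3 shards parameters, gradients, and optimizer state across
# all participating GPUs instead of replicating them on each one.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)

inputs = torch.randn(4, 4096, device=engine.device, dtype=torch.bfloat16)
loss = engine(inputs).square().mean()
engine.backward(loss)  # DeepSpeed handles gradient partitioning/reduction
engine.step()
```

Run with the `deepspeed` launcher (for example `deepspeed --num_gpus=8 train.py`), which sets up the distributed environment before the script starts.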
The new instances, however, are not aimed only at the growing requirements of Microsoft and other enterprises running large-scale AI training deployments. Vegas adds:
We’re now bringing supercomputing capabilities to startups and companies of all sizes, without requiring the capital for massive physical hardware or software investments.
Azure is not the only cloud provider partnering with Nvidia to build highly scalable, on-demand AI infrastructure. As recently reported on InfoQ, AWS announced its forthcoming EC2 UltraClusters of P5 instances, which can scale up to 20,000 interconnected H100 GPUs. Nvidia itself recently announced the H100 NVL, a high-memory server card for large language models.
Due to the massive spike in demand for conversational AI, some analysts believe there is a significant supply shortage of Nvidia GPUs and suggest that some firms might turn to AMD GPUs and the Cerebras WSE to supplement the scarce hardware.
The preview (early access) of the Azure ND H100 v5 VMs is available only to approved participants, who must submit an access request.
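For approved participants, the VMs should then behave like any other Azure size. As an illustrative sketch only (the filter string, the visibility behavior of preview sizes, and the name match are assumptions, not taken from Microsoft's documentation), checking which H100 sizes a subscription can see with the Azure SDK for Python (the azure-identity and azure-mgmt-compute packages) might look like this:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Placeholders: substitute your own subscription ID and target region.
subscription_id = "<subscription-id>"
region = "eastus"

client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# resource_skus lists the VM sizes visible to the subscription in a region;
# preview sizes may be hidden or restricted until the request is approved.
for sku in client.resource_skus.list(filter=f"location eq '{region}'"):
    if sku.resource_type == "virtualMachines" and "H100" in sku.name:
        print(sku.name)
```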