Azure-based 'AI Supercomputer' To Use Nvidia GPUs And Software

Microsoft and Nvidia say they are teaming up to build an "AI supercomputer" using Azure infrastructure combined with Nvidia's GPU accelerators, network kit, and its software stack.

The target market will be enterprises looking to train and deploy large state-of-the-art AI models at scale.

According to Nvidia, the project will be a multi-year effort that will aim to deliver "one of the most powerful AI supercomputers in the world," and the move will also make Azure the first public cloud to incorporate Nvidia's AI software stack.

To build this system, the pair will use the Azure cloud platform's ND-series and NC-series virtual machines, which are GPU-based instances, but the project will involve bringing "tens of thousands" of Nvidia's A100 and the latest H100 GPUs to the platform.

Nvidia claims that Microsoft's Azure is the first public cloud to incorporate its Quantum-2 InfiniBand networking switches for its AI-optimized virtual machines. The current Azure instances feature 200Gb/s Quantum InfiniBand and A100 GPUs, but future ones will have 400Gbps Quantum-2 InfiniBand and the newer H100 GPUs.

The two companies offered no time frame for this project to become operational, but it appears that this so-called AI supercomputer may actually not exist as such.

Reading between the lines, it appears that Microsoft is simply adding to Azure a bunch of AI-optimized instances with Nvidia's latest hardware and software, which customers will string together as required for their specific project.

We asked Nvidia for clarification, and a spokesperson told us that: "All new capabilities are in Azure instances, but the setup is such that enterprises will be able to scale those instances all the way up to supercomputing status." So that's an AI cloud service, in that case.

Nvidia's spokesperson added: "Customers can acquire resources as they normally would with a real supercomputer. For both you have a software layer that reserves resources. Just now it is in the cloud and not on a dedicated supercomputer. The most important thing is scalability. The resource on Azure can be scaled up to supercomputer standards with the same AI software, network capability and compute nodes."

The partnership will also see Nvidia make use of Azure resources to conduct research into generative AI models, which the company describes as a "rapidly emerging area" of AI in which foundational models such as Megatron Turing NLG 530B are the basis for new algorithms capable of synthesizing text, computer code, images, and even video.

The model in question was developed by a team from Microsoft and Nvidia, using Nvidia's Megatron-LM transformer model and Microsoft's DeepSpeed deep learning optimization library, the latter of which the pair will also work on optimizing.

Scott Guthrie, Microsoft EVP for its Cloud + AI Group, said that AI will fuel the next wave of automation in enterprises, enabling organizations to do more with less.

"Our collaboration with Nvidia unlocks the world's most scalable supercomputer platform, which delivers state-of-the-art AI capabilities for every enterprise on Microsoft Azure."

Nvidia's VP of enterprise computing, Manuvir Das, said the partnership aimed to provide researchers and companies with infrastructure and software to exploit AI.

"The breakthrough of foundation models has triggered a tidal wave of research, fostered new startups and enabled new enterprise applications," he said.

To this end, the AI supercomputer/cloud service will support a broad range of applications and services, including Microsoft DeepSpeed and Nvidia's AI Enterprise software suite. The latter is already certified and supported on Azure instances with A100 GPUs, while support for Azure instances with the newer H100 GPUs will be added in a future software release, Nvidia said. ®

RECENT NEWS

Google Leverages AI To Automatically Lock Phones During Theft

Amid increasing incidents of mobile phone thefts, Google has launched an AI-based feature that automatically locks the s... Read more

Microsofts Emissions Surge Nearly 30% Amid AI Demand Growth

Microsoft has reported a nearly 30% increase in its emissions from 2020 to 2023, underscoring the challenges the tech gi... Read more

Impact Of AWS Leadership Change On The Global AI Race

The recent leadership transition at Amazon Web Services (AWS), with Adam Selipsky stepping down and Matt Garman taking t... Read more

The Global Impact Of App Stores On Technology And Economy

Since Apple launched its App Store in 2008, app stores have become a central feature of the digital landscape, reshaping... Read more

Alibaba's Cloud Investment Strategy: Fuelling AI Innovation And Growth

Alibaba Group's cloud business, Alibaba Cloud, has emerged as a powerhouse in the tech industry, spearheading innovation... Read more

Elon Musk Takes On Government 'Censorship': A Clash Of Titans In The Digital Arena

Elon Musk's recent endeavors to challenge government-led content takedowns mark a significant development in the ongoing... Read more