Tesla Hedges Dojo Supercomputer Bet With 10K Nvidia H100 GPU Cluster

Tesla still dreams of fueling its motors with actual full self-driving (FSD) capabilities, and it's blowing piles of cash on AI infrastructure to reach that milestone.

The American EV manufacturer's latest investment is a 10,000-GPU compute cluster, revealed in a post on X by Tesla AI engineer Tim Zaman over the weekend. The system, which came online Monday, will help crunch the data collected by Tesla's vehicles and accelerate development of the FSD functionality we've heard so much about. The automaker declined to comment further.

Tesla has been teasing fully autonomous driving capabilities since 2016. So far what's been delivered is essentially super-cruise-control: a driver assistance system that is not truly self-driving and requires a human to keep their hands on the wheel.

CEO Elon Musk has no problem throwing money at his goal of achieving FSD. Last month Tesla revealed it would invest $1 billion to build out its Dojo supercomputer between now and the end of 2024 to speed the development of its autonomous driving software.

That particular AI supercomputer uses the company's massive 15kW Dojo Training tiles, six of which make up a one-exaFLOPS (BF16) Dojo V1 system that we took a look at last year. Each tile is made up of a set of D1 chip dies, all designed by Tesla and fabbed by TSMC.

It's no secret that Tesla still employs thousands of GPUs in its infrastructure. In 2021 the automaker deployed a cluster of 720 GPU nodes, each equipped with eight of Nvidia's then-bleeding-edge A100 accelerators, for a total of 5,760 GPUs. Combined, the system offered up to 1.8 exaFLOPS of FP16 performance.
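For the curious, the arithmetic behind those 2021 figures can be sketched in a few lines of Python. The per-GPU number is an assumption on our part: it is Nvidia's published peak dense FP16 tensor throughput for the A100 (312 teraFLOPS), not something Tesla has confirmed, and real-world training throughput will land well below peak.

```python
# Back-of-the-envelope check of the 2021 A100 cluster figures.
NODES = 720
GPUS_PER_NODE = 8
# Assumption: Nvidia's published peak dense FP16 tensor throughput per A100.
A100_FP16_TFLOPS = 312

total_gpus = NODES * GPUS_PER_NODE
# 1 exaFLOPS = 1,000,000 teraFLOPS
total_exaflops = total_gpus * A100_FP16_TFLOPS / 1_000_000

print(f"{total_gpus:,} GPUs, about {total_exaflops:.1f} exaFLOPS FP16 (peak)")
```

Under that assumption the totals line up with the 5,760 GPUs and roughly 1.8 exaFLOPS quoted above.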

"We'll actually take the hardware as fast as Nvidia will deliver it to us," Musk previously said. "If they could deliver us enough GPUs, we might not need Dojo, but they can't because they've got so many customers."

This latest deployment is nearly twice as large and uses Nvidia's latest-generation H100 GPUs, which offer roughly three times the FP16 performance of their predecessor, the A100. The chip also adds support for FP8 math.

As you drop down in numerical precision, you give up some accuracy in exchange for greater performance. In the case of Nvidia's H100, FP8 nets you just shy of four petaFLOPS of peak performance per GPU with sparsity.

Assuming Tesla is using Nvidia's most powerful SXM5 H100 modules, which plug into the accelerator giant's HGX chassis, we're looking at 1,250 nodes each with eight GPUs. Combined, that works out to roughly 39.5 exaFLOPS of peak FP8 performance with sparsity.
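Here is a rough sketch of how that estimate falls out. It rests on assumptions rather than anything Tesla has disclosed: eight SXM5 H100s per HGX node, and Nvidia's published peak FP8 figure with sparsity (3,958 teraFLOPS per GPU). The dense figure is taken as half the sparse one, which is how Nvidia's peak numbers for structured sparsity are specified.

```python
# Back-of-the-envelope estimate for the 10,000-GPU H100 cluster.
TOTAL_GPUS = 10_000
GPUS_PER_NODE = 8  # assumption: eight-GPU HGX H100 nodes
# Assumption: Nvidia's published peak FP8 tensor throughput per
# SXM5 H100 with sparsity, in teraFLOPS.
H100_FP8_SPARSE_TFLOPS = 3_958

nodes = TOTAL_GPUS // GPUS_PER_NODE
# 1 exaFLOPS = 1,000,000 teraFLOPS
sparse_exaflops = TOTAL_GPUS * H100_FP8_SPARSE_TFLOPS / 1_000_000
dense_exaflops = sparse_exaflops / 2  # peak without sparsity

print(f"{nodes:,} nodes")
print(f"about {sparse_exaflops:.2f} exaFLOPS FP8 with sparsity")
print(f"about {dense_exaflops:.2f} exaFLOPS FP8 dense")
```

These are theoretical peaks. Sustained throughput on real training jobs depends on the model, the interconnect, and how well the storage tier keeps the GPUs fed.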

According to Zaman, the system is supported by a hot tier cache capacity of more than 200 petabytes.

We also know that Tesla isn't just renting a bunch of GPUs from cloud providers like Microsoft or Google. Zaman says the entire system is housed on-prem at Tesla's facilities.

"Many orgs say 'We have' which usually means 'We rented' few actually own, and therefore fully vertically integrate. This bothers me because owning and maintaining is hard. Renting is easy," he wrote.

Tesla may be looking to expand its datacenter footprint to accommodate additional capacity. Earlier this month the carmaker posted a job opening for senior engineering program manager for datacenters, who would "lead the end-to-end design and engineering of Tesla's first of its kind datacenters and will be one of the key members of its engineering team."

While we can only speculate what a first of its kind datacenter might involve, the opening suggests this individual could oversee construction of a new facility. ®
