Swiss Boffins Just Trained A 'Fully Open' LLM On The Alps Supercomputer
Supercomputers are usually associated with scientific exploration, research, and development, and ensuring our nuclear stockpiles actually work.
Typically, these workloads rely on highly precise calculations, with 64-bit floating point mathematics being the gold standard. But as support for lower-precision datatypes continues to find its way into the chips used to build these systems, supercomputers are increasingly being used to train AI models.
This is exactly what the boffins at ETH Zürich and EPFL, the Swiss Federal Institute of Technology in Lausanne, have done. At the International Open-Source LLM Builders Summit in Geneva this week, researchers teased a pair of open large language models (LLMs) trained using the nation's Alps supercomputer.
As supercomputers go, Alps is better suited than most for running AI workloads alongside more traditional high-performance computing (HPC) applications. The system is currently the third-most powerful supercomputer in Europe, and eighth worldwide, in the twice-yearly Top500 ranking. It's also among the first large-scale supercomputers built around Nvidia's Grace-Hopper Superchips.
Each of these GH200 Superchips features a custom Grace CPU powered by 72 Arm Neoverse V2 cores, connected via a 900GB/s NVLink-C2C fabric to a 96GB H100 GPU. Those GPUs account for the lion's share of Alps' total compute capacity, with up to 34 teraFLOPS of FP64 vector performance. However, if you're willing to turn down the resolution a bit to, say, FP8, the performance jumps to nearly four petaFLOPS of sparse compute.
Built by HPE's Cray division, Alps features a little over 10,000 of these chips across 2,688 compute blades, which have been stitched together using the OEM's custom Slingshot-11 interconnects. Combined, the system boasts 42 exaFLOPS of sparse FP8 performance, or roughly half that when using the more precise BF16 datatype.
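The system-wide figures follow from the per-chip numbers quoted above. A quick back-of-the-envelope check (using the article's approximate chip count and per-GPU throughputs, not official Nvidia spec sheets) lines up with the quoted totals:

```python
# Rough aggregate AI compute for Alps, from the article's per-chip figures.
# "A little over 10,000" GH200 Superchips is approximated here as 10,000.
GH200_COUNT = 10_000
FP8_SPARSE_PER_GPU = 4e15    # ~4 petaFLOPS FP8 (sparse) per H100
FP64_VECTOR_PER_GPU = 34e12  # 34 teraFLOPS FP64 vector per H100

fp8_total = GH200_COUNT * FP8_SPARSE_PER_GPU
bf16_total = fp8_total / 2  # BF16 runs at half the FP8 rate
fp64_total = GH200_COUNT * FP64_VECTOR_PER_GPU

print(f"FP8 sparse:  {fp8_total / 1e18:.0f} exaFLOPS")   # ~40, near the quoted 42
print(f"BF16 sparse: {bf16_total / 1e18:.0f} exaFLOPS")  # ~20, "roughly half"
print(f"FP64 vector: {fp64_total / 1e15:.0f} petaFLOPS")
```

The gap between the computed ~40 exaFLOPS and the quoted 42 is down to rounding: the machine has slightly more than 10,000 chips, and each H100's FP8 sparse peak is slightly under 4 petaFLOPS.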
While Nvidia's H100 accelerators have been widely employed for AI training for years now, the overwhelming majority of these Hopper clusters have employed Nvidia's 8-GPU HGX form factor rather than its Superchips.
With that said, Alps isn't the only supercomputer to use them. The Jupiter supercomputer in Germany and the UK's Isambard AI, both of which came online this spring, also use Nvidia's GH200 Superchips.
"Training this model is only possible because of our strategic investment in 'Alps', a supercomputer purpose-built for AI," Thomas Schulthess, Director of Swiss National Supercomputing Centre (CSCS) and professor at ETH Zurich, said in a blog post.
The researchers have yet to name the models, but we do know they'll be offered in both eight-billion and 70-billion parameter sizes, and have been trained on 15 trillion tokens of data. They're also expected to be fluent in more than 1,000 languages, with roughly 40 percent of the training data being in languages other than English.
More importantly, the researchers say, the models will be fully open. Instead of simply releasing the models and weights for the public to scrutinize and tweak, as we've seen with models from Microsoft, Google, Meta, and others, the researchers at ETH Zürich also intend to release the source code used to train the models, and claim that the "training data will be transparent and reproducible."
"By embracing full openness — unlike commercial models that are developed behind closed doors — we hope that our approach will drive innovation in Switzerland, across Europe, and through multinational collaborations," EPFL professor Martin Jaggi said in the post.
According to Imanol Schlag, a research scientist at the ETH AI Center, this transparency is essential to building high-trust applications and to advancing research into AI risks and opportunities.
What's more, the researchers contend that respecting websites' crawling opt-outs came at no real cost: for most tasks and general-knowledge questions, honoring those opt-outs showed no sign of degrading the models' performance.
The LLMs are expected to make their way into public hands later this summer under a highly permissive Apache 2.0 license. ®