Nvidia Is Building the ‘World’s Fastest AI Supercomputer’ for Meta

Image: Nvidia

No matter how you feel about the company formerly known as Facebook, you know it has an absolute tonne of data, and that data needs to be managed somehow. Enter Nvidia to save the day, working with Meta to deliver a massive beast of a supercomputer for the social media empire’s researchers.

The AI Research SuperCluster (RSC) is already training new models to advance artificial intelligence. Once fully deployed later this year, Meta’s RSC is expected to be the largest customer installation of Nvidia DGX A100 systems.

According to Nvidia, when RSC is fully built out, Meta will be using it to train AI models with more than a trillion parameters.

That could advance fields such as natural-language processing for jobs like identifying harmful content in real time.

“Developing the next generation of advanced AI will require powerful new computers capable of quintillions of operations per second,” Meta says.


RSC took just 18 months to go from an idea on paper to a working AI supercomputer, which is pretty impressive and shouldn’t be slept on. Meta says the RSC is among the fastest AI supercomputers running today and “will be the fastest AI supercomputer in the world when it’s fully built out in mid-2022”.

“RSC will help Meta’s AI researchers build new and better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyse text, images and video together; develop new augmented reality tools; and much more,” Meta explains.

“Our researchers will be able to train the largest models needed to develop advanced AI for computer vision, NLP, speech recognition and more.

“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together.”

Meta goes on to say that ultimately, the work done with RSC will pave the way toward building technologies for “the next major computing platform”. Yep. The metaverse (sorry for swearing), where AI-driven applications and products will “play an important role”.

What’s the Meta supercomputer packing?

I’m so glad you asked.

Image: Meta

The new AI supercomputer currently uses 760 Nvidia DGX A100 systems as its compute nodes. They pack a total of 6,080 Nvidia A100 GPUs linked on an Nvidia Quantum 200Gb/s InfiniBand network to deliver 1,895 petaflops of TF32 performance.
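Those headline figures hang together arithmetically. As a quick sanity check (my own back-of-the-envelope sketch, assuming the standard DGX A100 configuration of 8 GPUs per system and Nvidia's quoted A100 TF32 peak of ~312 TFLOPS with sparsity, neither of which is stated in the article):

```python
# Back-of-the-envelope check of the RSC numbers quoted above.
# Assumptions (mine, not from the article): 8 A100 GPUs per DGX A100 system,
# and ~312 TFLOPS peak TF32 throughput per A100 (with sparsity enabled).
dgx_systems = 760
gpus_per_dgx = 8            # standard DGX A100 configuration (assumption)
tf32_tflops_per_gpu = 312   # Nvidia's quoted peak with sparsity (assumption)

total_gpus = dgx_systems * gpus_per_dgx
peak_petaflops = total_gpus * tf32_tflops_per_gpu / 1000

print(total_gpus)      # 6080 — matches the GPU count in the article
print(peak_petaflops)  # ~1897 — in line with the quoted 1,895 petaflops
```

The small gap between ~1,897 and the quoted 1,895 petaflops suggests Nvidia's official figure uses a slightly more conservative per-GPU number, but the counts line up.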

In addition to the 760 DGX A100 systems and InfiniBand networking, Penguin Computing (Nvidia’s delivery partner) provided Meta with managed services and AI-optimised infrastructure comprising 46 petabytes of cache storage with its Altus systems.

When RSC is complete, the InfiniBand network fabric will connect 16,000 GPUs as endpoints, making it one of the largest such networks deployed to date. Additionally, Meta said it designed a caching and storage system that can serve 16 TB/s of training data, and it plans to scale it up to 1 exabyte.
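To put those storage figures in perspective, here is a rough, hedged bit of arithmetic (assuming decimal units, i.e. 1 TB = 10^12 bytes and 1 EB = 10^18 bytes) on how long it would take to stream the full planned exabyte at the quoted 16 TB/s:

```python
# Rough sense of scale for the storage figures above (my own arithmetic,
# assuming decimal units: 1 EB = 1e18 bytes, 1 TB = 1e12 bytes).
read_rate_tb_s = 16     # quoted training-data serving rate
exabyte_bytes = 1e18    # planned storage capacity

seconds = exabyte_bytes / (read_rate_tb_s * 1e12)
hours = seconds / 3600
print(round(hours, 1))  # ~17.4 hours to stream the full exabyte once
```

In other words, even at 16 TB/s, reading the entire planned store once would take the better part of a day.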

Meta goes on in its blog to explain how it’s safeguarding the RSC; the TL;DR is that it will use this Nvidia supercomputer beast for good, it promises.