Huang said it can make supercomputing tasks — which are vital in the fight against COVID-19 — much more cost-efficient and powerful than today’s more expensive systems.
The chip has a monstrous 54 billion transistors (the on-off switches that are the building blocks of all things electronic), and it can deliver 5 petaflops of performance, or about 20 times more than the previous-generation chip, Volta. Huang made the announcement during his keynote at the Nvidia GTC event, which was held online this year.
The launch was originally scheduled for March 24 but was delayed by the pandemic. Nvidia rescheduled the release for today, as the chips and the DGX A100 systems that use them are now available and shipping.
The Nvidia A100 chip uses the same Ampere architecture (named after French mathematician and physicist André-Marie Ampère) that could be used in consumer applications such as Nvidia’s GeForce graphics chips. In contrast to Advanced Micro Devices (AMD), Nvidia is focused on creating a single microarchitecture for its GPUs for both commercial AI and consumer graphics use. But Huang said mixing and matching the different elements on the chip will determine whether it is used for AI or graphics.
The DGX A100 is the third generation of Nvidia’s AI DGX platform, and Huang said it essentially puts the capabilities of an entire datacenter into a single rack. That is hyperbole, but Paresh Kharya, director of product management for datacenter and cloud platforms, said in a press briefing that the 7-nanometer chip, codenamed Ampere, can take the place of a lot of AI systems being used today.
“You get all of the overhead of additional memory, CPUs, and power supplies of 56 servers … collapsed into one,” Huang said. “The economic value proposition is really off the charts, and that’s the thing that is really exciting.”
For instance, to handle AI training tasks today, one customer needs 600 central processing unit (CPU) systems to handle millions of queries for datacenter applications. That costs $11 million, and it would require 25 racks of servers and 630 kilowatts of power. With Ampere, Nvidia can do the same amount of processing for $1 million, a single server rack, and 28 kilowatts of power.
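The ratios implied by those figures can be worked out in a few lines. The sketch below is only arithmetic on the numbers quoted above; the dictionary keys and the helper name `savings_ratio` are illustrative and do not come from Nvidia.

```python
# Figures quoted in the article for one customer's workload.
cpu_cluster = {"cost_usd": 11_000_000, "racks": 25, "power_kw": 630}
ampere_rack = {"cost_usd": 1_000_000, "racks": 1, "power_kw": 28}


def savings_ratio(before, after):
    """Return how many times smaller each figure is after the switch."""
    return {key: before[key] / after[key] for key in before}


ratios = savings_ratio(cpu_cluster, ampere_rack)
for key, value in ratios.items():
    print(f"{key}: {value:.1f}x lower")
```

By these numbers, cost falls 11-fold, rack count 25-fold, and power draw 22.5-fold.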
“That’s why you hear Jensen say, ‘The more you buy, the more you save,’” Kharya said.
Huang added, “It’s going to replace a whole bunch of inference servers. The throughput of training and inference is off the charts — 20 times is off the charts.”
The first order
The first order for the chips is going to the U.S. Department of Energy’s (DOE) Argonne National Laboratory, which will use the cluster’s AI and computing power to better understand and fight COVID-19.
DGX A100 systems use eight of the new Nvidia A100 Tensor Core GPUs, providing 320 gigabytes (GB) of memory for training the largest AI data sets, and the latest high-speed Nvidia Mellanox HDR 200Gbps interconnects.
Multiple smaller workloads can be accelerated by partitioning the DGX A100 into as many as 56 instances per system, using the A100 multi-instance GPU feature. Combining these capabilities enables enterprises to optimize computing power and resources on demand to accelerate diverse workloads — including data analytics, training, and inference — on a single fully integrated, software-defined platform.
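The per-GPU figures behind those system totals follow from simple division. This is a rough sketch that assumes the eight GPUs are identical and that instances and memory are spread evenly across them; the constant names are illustrative.

```python
# System-level figures quoted in the article.
GPUS_PER_SYSTEM = 8
MAX_INSTANCES_PER_SYSTEM = 56
TOTAL_GPU_MEMORY_GB = 320

# Assumption: every GPU contributes the same share of instances and memory.
instances_per_gpu = MAX_INSTANCES_PER_SYSTEM // GPUS_PER_SYSTEM
memory_per_gpu_gb = TOTAL_GPU_MEMORY_GB // GPUS_PER_SYSTEM

print(f"Instances per GPU: {instances_per_gpu}")
print(f"Memory per GPU: {memory_per_gpu_gb} GB")
```

Under that assumption, each A100 can be split into as many as seven instances and carries 40GB of memory.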
Immediate DGX A100 adoption and support
Nvidia said a number of the world’s largest companies, service providers, and government agencies have placed initial orders for the DGX A100, with the first systems delivered to Argonne earlier this month.
Rick Stevens, associate laboratory director for Computing, Environment, and Life Sciences at Argonne National Lab, said in a statement that the center’s supercomputers are being used to fight the coronavirus, with AI models and simulations running on the machines in hopes of finding treatments and a vaccine. The DGX A100 systems’ power will enable scientists to do a year’s worth of work in months or days.
The University of Florida will be the first U.S. institution of higher learning to receive DGX A100 systems, which it will deploy to infuse AI across its entire curriculum to foster an AI-enabled workforce.
Among other early adopters is the Center for Biomedical AI at the University Medical Center Hamburg-Eppendorf in Germany, which will use DGX A100 to advance clinical decision support and process optimization.
Thousands of previous-generation DGX systems are currently being used around the globe by a wide range of public and private organizations. Among these users are some of the world’s leading businesses, including automakers, health care providers, retailers, financial institutions, and logistics companies that are adopting AI across their industries.
Nvidia also revealed its next-generation DGX SuperPod, a cluster of 140 DGX A100 systems capable of achieving 700 petaflops of AI computing power. The company built the cluster, which links the systems with Nvidia Mellanox HDR 200Gbps InfiniBand interconnects, for its own internal research in areas such as conversational AI, genomics, and autonomous driving.
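The 700-petaflop figure is consistent with the 5 petaflops Nvidia quotes for each DGX A100 system. A quick check, assuming performance scales linearly with the number of systems (a simplification that ignores interconnect overhead):

```python
# Figures quoted in the article.
SYSTEMS_IN_SUPERPOD = 140
PETAFLOPS_PER_SYSTEM = 5

# Assumption: aggregate AI performance is a simple sum across systems.
superpod_petaflops = SYSTEMS_IN_SUPERPOD * PETAFLOPS_PER_SYSTEM
print(f"SuperPod peak: {superpod_petaflops} petaflops")
```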
It took only three weeks to build that SuperPod, Kharya said, and the cluster is one of the world’s fastest AI supercomputers — achieving a level of performance that previously required thousands of servers.
To help customers build their own A100-powered datacenters, Nvidia has released a new DGX SuperPod reference architecture. This gives customers a blueprint that follows the same design principles and best practices Nvidia used.
DGXpert program, DGX-ready software
Nvidia also launched the Nvidia DGXpert program, which brings DGX customers together with the company’s AI experts, and the Nvidia DGX-ready software program, which helps customers take advantage of certified, enterprise-grade software for AI workflows.
The company said that each DGX A100 system has eight Nvidia A100 Tensor Core graphics processing units (GPUs), delivering 5 petaflops of AI power, with 320GB in total GPU memory and 12.4TB per second in memory bandwidth.
The systems also have six Nvidia NVSwitch interconnect fabrics with third-generation Nvidia NVLink technology for 4.8 terabytes per second of bi-directional bandwidth. And they have nine Nvidia Mellanox ConnectX-6 HDR 200Gb per second network interfaces, offering a total of 3.6 terabits per second of bi-directional bandwidth.
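The network total can be reproduced from the per-interface rate. The sketch below assumes each of the nine interfaces runs at 200 gigabits per second in each direction, which is how the bi-directional figure appears to be counted; the variable names are illustrative.

```python
# Figures quoted in the article.
NETWORK_INTERFACES = 9
GBPS_PER_INTERFACE = 200  # per direction (assumption about how the total is counted)
DIRECTIONS = 2            # send and receive

total_gbps = NETWORK_INTERFACES * GBPS_PER_INTERFACE * DIRECTIONS
total_tbps = total_gbps / 1000  # 1 terabit = 1,000 gigabits

print(f"Total bi-directional bandwidth: {total_tbps} Tb/s")
```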
The chips are made by TSMC in a 7-nanometer process. Nvidia DGX A100 systems start at $199,000 and are shipping now through Nvidia Partner Network resellers worldwide.
Huang said the DGX A100 uses the HGX motherboard, which weighs about 50 pounds and is “the most complex motherboard in the world.” (This is the board he pulled out of his home oven in a teaser video.) It has 30,000 components and a kilometer of wire traces.
As for a consumer graphics chip, Nvidia would configure an Ampere-based chip in a very different way. The A100 uses high-bandwidth memory for datacenter applications, but that wouldn’t be used in consumer graphics. The cores would also be heavily biased for graphics instead of the double-precision floating point calculations datacenters need, Huang said.
“We’ll bias it differently, but every single workload runs on every single GPU,” Huang said.