Nvidia Sets New Conversational AI Records, Shares Code To Help Others
'We're trying to find ways for computers to actually understand the context of language and generate language to respond. The whole world is really racing to create conversational AI,' an Nvidia exec says of the chipmaker's new AI advances.
Nvidia has new reason to boast about its artificial intelligence prowess thanks to three major breakthroughs in the realm of conversational AI — and it wants to help other developers along the way.
The Santa Clara, Calif.-based chipmaker announced the three breakthroughs on Tuesday, saying it has set new records in training time, inference speed and model size for BERT, an advanced AI language model developed by Google that is dominant in the conversational AI space. Nvidia is also sharing the software it used to achieve these breakthroughs, which includes PyTorch code for training massive Transformer models as well as NGC model scripts and checkpoints for TensorFlow.
"We're trying to find ways of computers to actually understand the context of language and generate language to respond," said Bryan Catanzaro, Nvidia's vice president of applied deep learning research. "The whole world is really racing to create conversational AI."
For training, Nvidia said it has reduced the typical training time for the large version of BERT, also known as Bidirectional Encoder Representations from Transformers, from several days to just 53 minutes, thanks to an Nvidia DGX SuperPod cluster of 92 Nvidia DGX-2 servers running 1,472 Nvidia Tesla V100 GPUs. The company said it was also able to train BERT-Large on one DGX-2 server in 2.8 days.
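Nvidia's actual training stack isn't reproduced in the article, but the general technique behind that kind of scaling is data-parallel training, in which each GPU processes a different slice of every batch and gradients are averaged across the cluster. The sketch below illustrates the idea with PyTorch's DistributedDataParallel; the stand-in model, batch size and learning rate are illustrative assumptions, not Nvidia's BERT training code.

```python
# Minimal sketch of data-parallel training, the general technique behind scaling
# a training run across many GPUs. NOT Nvidia's actual BERT code: the model here
# is a stand-in, and the real run also relied on mixed precision and 1,472 V100s.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process (one per GPU).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model; the real workload would be BERT-Large (~340M parameters).
    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # Each rank sees a different shard of the data; DDP all-reduces the
        # gradients automatically, so adding GPUs shrinks wall-clock time per epoch.
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=<num_gpus> train.py`, each process drives one GPU and sees its own shard of the data.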
"We're really excited about reducing the time it takes because the research process is an exploration," where researchers are constantly iterating on their models," Catanzaro said. "The more iterations they can perform, the more accurate and useful their models become."
The company said it has also trained the largest language model based on Transformers, the building block used for BERT, measuring 8.3 billion parameters, or 24 times the size of the BERT-Large model.
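For context, the "24 times" figure follows from BERT-Large's published size of roughly 340 million parameters; a quick back-of-the-envelope check:

```python
# Rough check of the "24x" comparison, assuming BERT-Large's published size
# of about 340 million parameters.
bert_large_params = 340_000_000
megatron_params = 8_300_000_000
print(megatron_params / bert_large_params)  # ~24.4
```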
Catanzaro said one of the constraints in training larger models, which are key to more accurate conversational AI, is that traditional software is bounded by the amount of memory in the system. To get around this, the company developed a way for servers to perform model parallelism, in which each GPU in a system is assigned a different part of the model. Nvidia is open-sourcing its model parallelism code, Megatron-LM, as part of its overall software-sharing effort.
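Megatron-LM's internals aren't covered in the article, but the core idea of model parallelism, keeping different pieces of the model on different GPUs so no single device has to hold all of the weights, can be sketched in a few lines of PyTorch. The toy split below is illustrative only; Megatron-LM itself partitions individual Transformer layers across GPUs rather than assigning whole layers to separate devices.

```python
# Toy illustration of model parallelism: different parts of the model live on
# different GPUs, so no single GPU has to fit all of the parameters in memory.
# This is only a sketch of the idea, not Megatron-LM's implementation.
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the model on GPU 0, second half on GPU 1.
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.GELU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(4096, 1024)).to("cuda:1")

    def forward(self, x):
        # Activations, not parameters, move between devices.
        h = self.part1(x.to("cuda:0"))
        return self.part2(h.to("cuda:1"))

model = TwoGPUModel()
out = model(torch.randn(8, 1024))
print(out.shape, out.device)  # torch.Size([8, 1024]) cuda:1
```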
As for inference, Nvidia said it was able to run inference with BERT-Base on the SQuAD dataset in only 2.2 milliseconds, well below the 10-millisecond processing threshold required by many real-time applications. The company said this is also a major improvement over the 40 milliseconds measured with CPU-optimized code. The result was made possible by Nvidia T4 GPUs running optimized TensorRT code.
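The 2.2-millisecond result came from a TensorRT-optimized engine on T4 GPUs, which isn't reproduced here. As a rough illustration of how per-query latency is measured against a budget like 10 milliseconds, the sketch below times a BERT-Base-sized PyTorch encoder with CUDA events; the stand-in model, sequence length and timing loop are assumptions for illustration, not Nvidia's benchmark code.

```python
# Sketch of benchmarking per-query latency against a real-time budget.
# Uses a plain PyTorch encoder sized like BERT-Base as a stand-in; Nvidia's
# 2.2 ms figure came from a TensorRT-optimized engine on a T4, not this code.
import torch
import torch.nn as nn

device = "cuda"
# BERT-Base-like dimensions: 12 layers, hidden size 768, 12 attention heads.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, dim_feedforward=3072,
                                   batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=12).to(device).eval()

tokens = torch.randn(1, 128, 768, device=device)  # one 128-token query, pre-embedded

with torch.no_grad():
    # Warm up, then time with CUDA events so GPU work is measured accurately.
    for _ in range(10):
        model(tokens)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        model(tokens)
    end.record()
    torch.cuda.synchronize()

print(f"avg latency: {start.elapsed_time(end) / 100:.2f} ms")
```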
Dominic Daninger, vice president of engineering at Nor-Tech, a Burnsville, Minn.-based high-performance computing system builder, said one of his customers, a hard drive manufacturer, has invested much of its infrastructure in GPUs, which make up 80 percent of its systems versus CPUs.
"I think their code leads itself to parallelism, which is a good match for the capabilities of the GPU, so they're just getting a lot more compute for the dollar," he said.
While AMD is making a lot of noise in the data center with its new second-generation EPYC CPUs, Nvidia's GPUs still have some important advantages in the HPC and AI space, especially when it comes to core count, according to Daninger.
"I think Nvidia has a pretty strong play in the GPU market," he said. "AMD has not been able to make as strong a play there as they have with the CPU."