Oracle Cloud Preparing To Introduce Nvidia’s New A100s
The groundbreaking GPUs designed to accelerate machine learning training and inference will be available on Oracle Cloud Infrastructure instances by the end of summer.
Oracle is preparing to introduce Nvidia’s groundbreaking A100 Tensor Core GPUs to its cloud data centers by the end of summer.
With the Redwood Shores, Calif.-headquartered company’s traditional emphasis on data and analytics, Oracle Cloud looks to stay at the forefront of enabling advanced number-crunching workloads. That means making available to customers the latest-and-greatest hardware, Karan Batta, senior director of Oracle Cloud Infrastructure development and product management, told CRN.
But the recently unveiled A100s, based on Nvidia’s 7-nanometer Ampere GPU architecture, in themselves are not a differentiator—it’s the cloud environment in which they’re installed that really makes them shine, Batta said.
[Related: Oracle Plans To Ride ERP And Database To Greater Cloud Glory]
“At the end of the day, this is no special sauce for a cloud provider. Everybody eventually is going to have the same chip,” Batta said. “The real special sauce comes from platform and what the platform offers.”
Oracle Cloud Infrastructure’s emphasis on bare-metal computing, flexible instances and fast networking will set it apart in making Nvidia’s new GPUs excel for high-performance computing and artificial intelligence workloads, he told CRN.
“We have a set of building blocks that basically enhances the offering,” Batta said.
Oracle’s Gen 2 cloud has been gaining momentum over the last few months, with more customer wins and broader regional availability, Batta said.
It’s a cloud designed with a different philosophy from the hyperscale market leaders, purpose-built to deliver performance and reliability akin to an on-premises machine.
OCI distinguishes itself by securely isolating customer workloads from Oracle’s cloud controller software, delivering the superior performance of bare-metal servers, and operating a network that’s never oversubscribed, Batta said.
That network is “completely flat, predictable, and super cheap,” Batta said. “That’s a business decision and a technical decision.”
Market leader Amazon Web Services offers “a commodity cloud,” Batta said. But Oracle-based enterprises, most of which are still operating their IT on-premises, typically have different expectations.
“We have customers that run their entire data catalog on top of Oracle, so they expect a level of performance and flexibility and a price offering that we’ve built from the ground up,” he said.
The A100s will be the third generation of Nvidia GPUs offered by Oracle Cloud.
Three years ago, Oracle’s Gen 1 cloud introduced GPUs built on Nvidia’s Pascal architecture. Currently, Oracle offers GPUs based on Nvidia’s Volta architecture.
The A100s, featuring Ampere architecture, will be available to Oracle customers in Europe, North America and Asia by late summer.
The new chips unify the AI training and inference processes into one architecture that can outperform the previous generations, the V100 and T4, several times over. Each A100 can be partitioned into as many as seven distinct GPU instances, Nvidia revealed during its virtual GPU Technology Conference Thursday. Or they can link with seven other units to act as one giant GPU.
Oracle Cloud complements those capabilities by offering remote direct memory access (RDMA) networking to speed communication among the parallel compute clusters doing machine learning training, Batta told CRN.
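To illustrate the kind of workload that leans on that fast cluster network, the sketch below shows a generic multi-GPU training loop in PyTorch using the NCCL backend, whose gradient all-reduce traffic is what RDMA-capable interconnects accelerate. It is a minimal, hypothetical example, not Oracle’s or Nvidia’s actual configuration; the launcher, model and hyperparameters are placeholders.

```python
# Minimal multi-GPU training sketch with PyTorch DistributedDataParallel.
# Illustrative only: assumes the NCCL backend runs over the cluster's
# RDMA-capable network; model and data are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun (or a similar launcher) sets RANK, WORLD_SIZE and LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A placeholder model; any torch.nn.Module works here.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):
        inputs = torch.randn(64, 1024, device=local_rank)
        loss = model(inputs).sum()
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced across GPUs and nodes here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with a tool such as torchrun across several GPU nodes, every backward pass synchronizes gradients over the cluster fabric, which is why the latency and bandwidth of the underlying network matter so much for training throughput.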
“Nvidia is a longtime partner for us,” Batta said. “When we jumped into the cloud segment, we wanted to offer a complete portfolio of instances, which means you have to have a GPU offering for data scientists, engineers, and machine learning algorithms.”
Working with Nvidia, Oracle has optimized those instances for customers looking to match on-premises performance, offered Nvidia Quadro in its cloud to power workstations, and enabled running the Nvidia GPU Cloud (NGC) environment on top of those processors.
“Our target is legacy,” Batta said. “Big enterprise customers that still haven’t moved to cloud yet. We’ve built offerings for them specifically.”
Cloud is a multi-trillion-dollar market opportunity, and none of the Infrastructure-as-a-Service leaders are anywhere near that level of revenue, which suggests “most of the world is still on-prem,” Batta said.
Oracle looks to accelerate migrations by allowing enterprises to preserve their current infrastructure components, such as 10-year-old operating systems, and by giving them freedom over the timeline for upgrading and modernizing applications. Oracle doesn’t want to force customers to adopt technologies like Kubernetes, he said.
“We say lift and improve,” Batta said. “We want you to be able to move your entire stack on-prem into our Oracle cloud and then you can improve components of that.”
“Nvidia has done a great job in terms of hardware innovation, but it takes time for applications and workloads to really harness all that power,” Batta told CRN.
Oracle expects some customers to migrate quickly to the A100 instances once they’re available, while others will take more time.
Researchers typically adopt new hardware on a quicker timeline to realize a performance boost, which makes the latest GPUs especially useful right now in empowering scientists studying Covid-19. Enterprises running mission-critical systems that support their own customers are less likely to make a big change right away.
But with the latest frameworks now revealed by Nvidia, developers can begin preparing applications running on the current GPU generation to migrate once the A100s come online.
The OCI A100-powered instances will be sold at $3.05 per hour per GPU across all Oracle data centers that support them.
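At that rate, a hypothetical eight-GPU configuration, for example, would cost $24.40 per hour, or roughly $17,800 for a month of around-the-clock use at about 730 hours.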