Inside Intel, Cnvrg.io Wants To Be ‘Switzerland For AI Computing’
In an interview with CRN, cnvrg.io CEO Yochay Ettun talks about his ambitions of cnvrg.io becoming the ‘Switzerland for AI computing,’ how the new Metacloud service is helping AI developers fight against vendor lock-in and why Intel acquired his company last year.
Mixing And Matching Hardware To Optimize Machine Learning Workflows
Inside Intel, a little-known, independently operated company has ambitions of becoming the “Switzerland for AI computing” by making it effortless for enterprises to run and move workloads across a variety of infrastructure, whether it’s in the cloud, on-premises or at the edge.
That means making it as simple for an enterprise to spin up an AI compute instance on a Dell EMC server as it is on Amazon Web Services. But it also means creating seamless workflows between the different infrastructure types so that organizations can optimize for performance and cost without needing to spend months “re-instrumenting” software stacks for a new environment.
The Intel-owned company making these capabilities possible is called cnvrg.io, and its CEO and co-founder, Yochay Ettun, said the new offering, cnvrg.io Metacloud, is part of the company’s ambitions to build an AI infrastructure marketplace that serves as a “cloudless cloud.”
“Eventually what we see is Metacloud providing OEMs and other infrastructure providers an easier way to consume their resources,” he told CRN in an interview.
Metacloud is a managed version of cnvrg.io’s “operating system” for machine learning, and, in support of Intel CEO Pat Gelsinger’s vow to be “ecosystem-friendly,” it can juggle workloads on systems that are not just based on Intel chips but also based on those from its rivals, Nvidia and AMD.
“With AI, we have many customers doing their pre-processing on Xeon CPUs, and then they’re doing their training — some of it is done on CPU, some of it is on a GPU — and then they do inference on CPU. So even in a single pipeline, you need this type of heterogeneous compute support,” Ettun said.
In his interview with CRN, Ettun talked about how Metacloud is helping AI developers fight against vendor lock-in, how the new service will help OEMs, how it fits in with Intel CEO Pat Gelsinger’s new “software-first” strategy and why Intel acquired the company last year.
How does cnvrg.io the operating system work? And how does Metacloud work on top of that?
So cnvrg.io, we founded it a few years ago, Leah [Kolben] and myself — Leah is the CTO. And we were both data scientists ourselves, building machine learning models for a lot of different companies. And we built cnvrg.io basically to solve a pain [point] that we had when building models, and we saw we were wasting a lot of our time on stuff that doesn’t relate to data science like DevOps and infrastructure and spinning up resources and spinning them down and configuring networks and all those kinds of things. And we also saw that we need some sort of management platform for our AI development. So just like you have like GitHub and CI/CD [the combined practices of continuous integration and continuous delivery] for software developers, we also needed similar but different tools for AI.
So we built cnvrg.io. It was an internal product. It was very helpful for us, but we saw it can help many other customers as well, so we decided to take it out and build a company out of it. Now cnvrg.io today is basically an end-to-end machine learning platform. We help data scientists build and deploy machine learning models to production for research for any type of use case. We have customers that use it for computer vision, autonomous driving applications, many customers that use it for analytics and forecasting and text classification. The platform is basically an end-to-end AI development platform.
Now cnvrg.io Metacloud, it’s a pretty exciting announcement for us, because what we’re doing basically is that we have a managed instance of cnvrg.io. So today, every customer needs to install cnvrg.io on their own private environment. And with Metacloud, we have a single managed cnvrg.io instance that any customer can sign up as a [Software-as-a-Service] and bring their own compute and storage. Technically speaking, we’re allowing customers to connect their own cloud providers, on-prem resources or even both into the same cnvrg.io deployment. And this basically means that as a data scientist, I can now spin up resources on [Amazon Web Services], on [Microsoft] Azure, on my Dell on-prem resources, on my own private GPU cluster. Anywhere I want, I can do that. Also important is the work that we’re doing with Dell. I’m sure you know about Dell APEX and the as-a-Service type of things.
CRN's Mark Haranas covers Dell and its APEX offerings closely, so I'm definitely aware of the new consumption models that OEMs are starting to embrace.
And I think this is really amazing. I think Dell, Lenovo and Supermicro and many other partners of cnvrg.io are going after that direction, and cnvrg.io basically serves as the software layer that connects the data scientists to the OEM. So for cnvrg.io, our customers will be able to spin up Dell resources, Lenovo resources, Supermicro [resources]. Instead of going to Supermicro and purchasing a large server and then shipping and installing and configuring, everything is on demand — just like the AWS experience, [it’s] the same with cnvrg.io and the other OEMs.
For the enterprise customer, the end users for Metacloud are obviously data scientists, but who is managing or running Metacloud for them?
That’s a good question. So when we started cnvrg.io, it was only data scientists. The AI market has evolved pretty fast, and today we’re selling to IT executives and engineers, DevOps engineers, all of the people that basically manage that infrastructure. Think of it: You buy a large GPU or CPU cluster. The data scientists won’t be involved with the purchase. It’s the infrastructure administrator or something like that, so we work with them.
The press release for Metacloud lists all these different integrations with infrastructure providers, but then you also specifically list an integration with Intel. What does that mean?
I actually read your piece with Greg [Lavender, Intel’s new CTO and head of the Software and Advanced Technology Group], which was amazing. I even texted Greg that it was really good. And it’s exactly what Greg said: Intel wants to meet where the developers are, right? He was saying that, and cnvrg.io is basically at the top of the stack — I think one of the [highest] in the stack at Intel. And we interact daily with developers, with data scientists, with software developers that are not driver developers or something like that. We meet developers, and that was basically one of the motivations for Intel to invest in cnvrg.io and to basically grow the business, because [Intel CEO] Pat [Gelsinger] and Greg are looking to be more close with the developers, and we provide a platform for top-of-the-stack developers, helping them build and deploy AI. So that’s one. Second, we have product integration with different Intel products, like OpenVINO and other software toolkits, that make it easier for data scientists to run their models faster, so many integrations around that.
I know one crucial aspect of OpenVINO is that it works with pretty much any Intel hardware, like whether it's a CPU, FPGA or integrated graphics. To me, that sounds like you could use Metacloud on pretty much any system that is running Intel hardware. Is that a correct assumption?
Yeah, and we even have native integration in production today [for] Habana, [Intel’s standalone business that produces AI accelerator chips]. Habana just announced [on Oct. 26 that] they have their [Gaudi AI chip] offering on AWS in general availability, so with cnvrg.io, you will be able to spin up Habana resources and use those.
With respect to data scientists needing to run workloads across different kinds of infrastructure, is that something that data scientists actually want to be able to do?
Yeah, for sure. So with data science, I think with almost any type of application, you would want to move it to the cheapest place or to the most high-performance place. With AI, we have many customers doing their pre-processing on Xeon CPUs, and then they’re doing their training — some of it is done on CPU, some of it is on a GPU — and then they do inference on CPU. So even in a single pipeline, you need this type of heterogeneous compute support. Now, if you add that to price and then also performance, then eventually what we will see in the next few years, it will be a requirement for any type of development platform to support this type of hybrid, heterogeneous compute approach.
Today with Metacloud, we allow customers basically in a single pipeline to run workloads on different platforms on different architectures and also on different clusters. Like the training could run on Dell, the pre-processing could run on AWS, the inference could be at the edge. So yeah, we basically make it seamless to run these types of workloads that without cnvrg.io Metacloud would probably take a few months or even more to set up and configure.
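To make the idea of one pipeline spanning several environments concrete, here is a minimal Python sketch. It is not cnvrg.io's API: the class names, target names and hourly prices are all hypothetical, invented for illustration. It simply assigns each pipeline stage to the cheapest compute target that meets the stage's hardware and location constraints, mirroring the pre-processing, training and inference split Ettun describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ComputeTarget:
    # All values below are hypothetical; none come from cnvrg.io.
    name: str             # label for a connected cluster
    location: str         # "cloud", "on-prem" or "edge"
    accelerator: str      # "cpu" or "gpu"
    cost_per_hour: float  # made-up price used only for ranking


@dataclass(frozen=True)
class Stage:
    name: str
    accelerator: str                # hardware the stage needs
    location: Optional[str] = None  # optional placement constraint


def place(stage: Stage, targets: List[ComputeTarget]) -> ComputeTarget:
    """Return the cheapest target that satisfies the stage's constraints."""
    candidates = [
        t for t in targets
        if t.accelerator == stage.accelerator
        and (stage.location is None or t.location == stage.location)
    ]
    if not candidates:
        raise ValueError(f"no compute target fits stage {stage.name!r}")
    return min(candidates, key=lambda t: t.cost_per_hour)


def plan(pipeline: List[Stage], targets: List[ComputeTarget]) -> Dict[str, str]:
    """Map each stage name to the name of the target chosen for it."""
    return {stage.name: place(stage, targets).name for stage in pipeline}


TARGETS = [
    ComputeTarget("aws-cpu", "cloud", "cpu", 0.40),
    ComputeTarget("azure-cpu", "cloud", "cpu", 0.45),
    ComputeTarget("dell-gpu", "on-prem", "gpu", 1.20),
    ComputeTarget("aws-gpu", "cloud", "gpu", 3.00),
    ComputeTarget("edge-cpu", "edge", "cpu", 0.10),
]

PIPELINE = [
    Stage("preprocess", "cpu", location="cloud"),
    Stage("train", "gpu"),
    Stage("inference", "cpu", location="edge"),
]

if __name__ == "__main__":
    for stage_name, target_name in plan(PIPELINE, TARGETS).items():
        print(f"{stage_name} -> {target_name}")
```

With these made-up numbers, pre-processing lands on the cloud CPU pool, training on the on-prem GPU cluster and inference at the edge. A real scheduler would also have to weigh performance, data locality and availability, which this sketch leaves out.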
Cnvrg.io is pitching this as a way for data scientists to get past vendor lock-in. How big of a problem is vendor lock-in right now for AI developers?
It is a problem. We have customers that are looking to move between different cloud providers, between on-prem and cloud. Many of our customers are using cnvrg.io as a hybrid solution, so they basically bought an on-prem cluster because their cloud expenses got really high. And then they need to navigate: What type of workloads are they running on-prem, and what can they run on-prem? And what type of workloads will be running in the cloud? So this is indeed a problem.
Having the ability to run code with the data anywhere — some of it can run on Spot Instances on AWS, some of it can run on Azure depending on the price and performance — I think our customers are very happy, and we have early adopters of AI as customers. And we’re basically building Metacloud to make it seamless for any customer, for a small team, for a large enterprise, a startup, any type of customer.
Does Metacloud also work with AMD and Nvidia hardware?
Yep. Cnvrg.io is an independent company. I mean, we work closely with Intel, of course, and we have strategic projects with Intel. But, like I said in the beginning, cnvrg.io is a platform that was built by data scientists for data scientists, and we want to maintain this type of flexible core at cnvrg.io, open core type of things that we support, any type of hardware, any type of accelerator, any type of cloud, any partner. We’re sort of like Switzerland for AI computing.
With Metacloud being able to support Nvidia-based systems, does that mean that you can have a DGX system on there alongside other compute resources?
Yeah, so the product itself, again, it’s extremely flexible. There are no special requirements. You have a Kubernetes cluster. You can connect it to cnvrg.io Metacloud. It’s that simple. And we don’t care what type of hardware the user is running on and using. Just have it integrated with cnvrg.io Metacloud, and you’re good to go. So the CPU, GPU, accelerators, any type of hardware they want.
Is Metacloud supposed to serve as an open alternative to Nvidia's AI software stack?
So, yes and no. Basically no, because cnvrg.io supports any cloud, on-prem and hybrid type of deployments. A typical customer will sign up to cnvrg.io, connect their on-prem GPU cluster, connect their AWS account and run AI. Today, there is no real solution that provides this type of architecture.
But yes in the fact that Intel and cnvrg.io are going for software, and I think that’s the key with the piece that you wrote about with [Intel CTO] Greg [Lavender]: that software is where Intel needs to invest in, and I would definitely say having that type of vision supports cnvrg.io really well with this new release.
What are Intel's expectations for how big of a business cnvrg.io becomes?
So I can’t, of course, disclose any numbers, but I would say that cnvrg.io is something that is very important for Intel, and it fits well also with the narrative that you wrote about with [Intel CEO] Pat [Gelsinger] and Greg. And I think it’s one of many other solutions that will be launched and offered to customers and developers in the space, so yeah, it’s important.
When you were in acquisition talks with Intel last year, what were the reasons that Intel wanted to acquire your company?
So we were acquired about a year ago. And Intel basically sees cnvrg.io as — so you have the Intel hardware, and you have the developer, the data scientists. We’re in the middle. We’re interacting with both. And I think this type of holistic offering, an end-to-end offering, is something that was and still is very appealing to Intel.
What are the go-to-market plans for Metacloud and the operating system? Are you selling just direct, or do you have channel partners who can resell this?
So we have a direct sales approach, of course. We have an amazing sales team and marketing team. Today, we have a lot of customers in production, mainly thanks to the team.
With Metacloud, we’re basically doing that, plus we’re engaging with all the OEMs: Dell, Lenovo and everyone. Specifically with Dell now that we’re announcing [a partnership] at the Intel [Innovation event], they have a big motivation to sell cnvrg.io to their customers, because with cnvrg.io, they basically get a better-than-AWS experience for their customers. So they will sell cnvrg.io and promote cnvrg.io, and then they will have basically more customer reach, a better customer experience, so there is going to be a lot of work with Dell and the other [OEM] partners around that.
There’s obviously a lot of overlap between Intel’s channel partners and Dell’s channel partners, so you’re going to have coverage with OEM channel partners. But for those who are white-box builders — say they’re building systems using Supermicro parts or something like that — are you going to be working with those kinds of partners at some point?
So the new announcement is about the integration and the joint work we’re doing with Dell. Eventually what we see is Metacloud providing OEMs and other infrastructure providers an easier way to consume their resources, so we do plan to get more [partners on board].
Also, from the developer perspective, the user perspective, we want to build an infrastructure marketplace. Like, I want to use specialized compute for autonomous driving from Dell on demand, so it’s ready, seamlessly, like one click, and you have that. I want to use an edge type of architecture from another partner, so that would also be ready. And this is going to be big. With Metacloud, what we’re trying to say with this name is that we’re sort of building a “cloudless cloud.”