Amazon's AI chip executive tells BI why Nvidia is not a competitor, how Anthropic helps, and what AMD needs
Eugene Kim
2024-12-06T19:52:43Z
Gadi Hutt, AWS Annapurna Labs's senior director of customer and product engineering
AWS launched new AI chips that compete with Nvidia's GPUs.
AWS says its goal is to provide more customer choice, not to dethrone Nvidia in the AI chip market.
Gadi Hutt, a senior director at AWS, also talked about partnerships with Intel, Anthropic, and AMD.
Amazon Web Services launched an upgraded line of AI chips this week, putting the company squarely in competition with Nvidia.
Except AWS doesn't see it that way.
AWS's new AI chips are not meant to go after Nvidia's lunch, said Gadi Hutt, senior director of customer and product engineering at the company's chip-designing subsidiary, Annapurna Labs. The goal is to give customers a lower-cost option because the market is big enough for multiple vendors, Hutt told Business Insider in an interview at AWS's re:Invent conference.
"It's not about unseating Nvidia," Hutt said. "It's really about giving customers choices."
Hutt's comments come against the backdrop of AWS spending tens of billions of dollars on generative AI. This week, the company unveiled its most advanced AI chip, called Trainium 2, which can cost roughly 40% less than Nvidia's GPUs, and a new supercomputer cluster using the chips, called Project Rainier. Earlier versions of AWS's AI chips saw mixed results.
Hutt insists this is not a competition, but a joint effort to grow the overall size of the market. The customer profiles and AI workloads they target are also different. In the foreseeable future, Nvidia's GPUs will remain dominant, he added.
In the interview, Hutt discussed several topics, including AWS's partnership with Anthropic, which will be Project Rainier's first customer. The two companies have worked closely over the past year, as Amazon recently invested an additional $4 billion in the AI startup.
He also shared his thoughts on AWS's partnership with Intel, whose CEO Pat Gelsinger just retired. He said AWS will continue to work with the struggling chip giant because customer demand for Intel's server chips remains high.
Last year, AWS said it was considering selling AMD's new AI chips. But Hutt shared new insight, saying those chips are still not available on AWS because customers have not shown strong demand yet.
This Q&A has been edited for clarity and length.
Q: There have been a lot of headlines saying Amazon is out to get Nvidia with its new AI chips. Can you talk about that?
I usually look at these headlines, and I giggle a bit because, really, it's not about unseating Nvidia. Nvidia is a very important partner for us. It's really about giving customers choices.
We have a lot of work ahead of us to ensure that we continuously give more customers the ability to use these chips. And Nvidia is not going anywhere. They have a good solution and a solid roadmap. We just announced the P6 instances (AWS servers with Nvidia's latest Blackwell GPUs), so there's a continuous investment in the Nvidia product line as well. It's really to give customers options. Nothing more.
Nvidia is a great supplier of AWS and our customers love Nvidia. I would not discount Nvidia in any way, shape, or form.
Q: So, you want to see Nvidia's use case increase on AWS?
If customers believe that's the way they need to go, then they will do it. Of course, if it's good for customers, it's good for us.
The market is very big, so there's room for multiple vendors here. We're not forcing anybody to use those chips, but we are working very hard to ensure that our major tenants, which are high performance and lower cost, will materialize to benefit our customers.
Q: Does it mean AWS is OK being second place?
It's not a competition. There's no machine learning award ceremony every year.
In the case of a customer like Anthropic, there's very clear scientific evidence that a larger compute infrastructure allows you to build larger models with more data. And if you do that, you get higher accuracy and more performance.
Our ability to scale capacity to hundreds of thousands of Trainium 2 chips (Project Rainier) gives them the opportunity to innovate on something they couldn't have done before. They get a 5x boost in productivity.
Q: Is being number one important?
The market is big enough. Number two is a very good position to be in.
I'm not saying I'm number two or number one, by the way. But it's really not something I'm even thinking about. We are so early in our journey here in machine learning in general, the industry in general, and also on the chips specifically, we are just heads down serving customers like Anthropic, Apple, and all the others.
We're not even doing competitive analysis with Nvidia. I'm not running benchmarks against Nvidia. I don't need to.
For example, there's MLPerf, an industry performance benchmark. Companies that participate in MLPerf have performance engineers working just to improve MLPerf numbers.
That's completely a distraction for us. We are not participating in that because we don't want to waste time on a benchmark that isn't customer-focused.
Q: On the surface, it seems like helping companies grow on AWS is not always beneficial for AWS's own products because you're competing with them.
We are the same company that is the best place for Netflix to run on, and we also have Prime Video. It's part of our culture.
I will say that there are a lot of customers that are still on GPUs. A lot of customers love GPUs and they have no intention to move to Trainium anytime soon. And that's fine because, again, we are giving them the options and they decide what they want to do.
Q: Do you see these AI tools becoming more commoditized in the future?
I really hope so.
When we started this in 2016, the problem was that there was no operating system for machine learning. So we really had to invent all of the tools that go around these chips to make them work for our customers as seamlessly as possible.
If machine learning becomes commoditized on the software and hardware sides, it's a good thing for everybody. It means that it's easier to use those solutions. But running machine learning meaningfully is still an art.
Q: What are some of the different types of workloads customers might want to run on GPUs vs Trainium?
GPUs are more of a general-purpose processor for machine learning. All of the researchers and data scientists in the world know how to use Nvidia pretty well. If you invent something new and do it on a GPU, things will work.
If you invent something new on specialized chips, you will have to either ensure compiler technology understands what you just built or create your own compute kernel for that workload. We are focused mainly on use cases where our customers tell us, 'Hey, this is what we need'. Usually, the customers we get are the ones that are seeing increased costs as an issue and they are trying to look for alternatives.
Q: So the most advanced workloads are usually reserved for Nvidia chips?
Usually. If data science folks need to continuously run experiments, they will probably do that on a GPU cluster. When they know what they want to do, then that's where they have more options. That's where Trainium really shines because it gives high performance at a lower cost.
Q: AWS CEO Matt Garman previously said the vast majority of workloads will continue to be on Nvidia.
It makes sense. We give value to customers who have a large spend and they're trying to see how they can control the costs a bit better. When Matt says the majority of the workloads, it means medical imaging, speech recognition, weather forecasting, and all sorts of workloads that we are not really focused on right now because we have large customers who ask us to do bigger things. So, that statement is 100% correct.
In a nutshell, we want to continue to be the best place for GPUs and, of course, Trainium when customers need it.
Q: What has Anthropic done to help AWS in the AI space?
They have very strong opinions of what they need, and they come back to us and say, 'Hey, can we add feature A to your future chip?' It's a dialogue. Some ideas they came up with were not feasible to even implement in a piece of silicon. We actually implemented some ideas, and for others, we came back with a better solution.
Because they are such experts in building foundation models, this really helps us home in on building chips that are really good at what they do.
We just announced Project Rainier together. This is someone who wants to use a lot of those chips as fast as possible. It's not an idea—we're actually building it.
Q: Can you talk about Intel? AWS's Graviton chips are replacing a lot of Intel chips at AWS data centers.
I'll correct you here. Graviton is not replacing x86. It's not like we are yanking out x86 and putting Graviton in place. But again, following customer demand, more than 50% of our recent landings on CPUs were Graviton.
It means that the customer demand for Graviton is growing. But we are still selling a lot of x86 cores too for our customers and we think we're the best place to do that. We are not competing with these companies, but we're treating them as good suppliers and we have a lot of business to do together.
Q: How important is Intel going forward?
They will for sure continue to be a great partner for AWS. There are a lot of use cases that run really well on Intel cores. We're still deploying them. There is no intention to stop. It's really following customer demand.
Q: Is AWS still considering selling AMD's AI chips?
AMD is a great partner for AWS. We sell a lot of AMD CPUs to customers as instances.
The machine learning product line is always under consideration. If customers strongly indicate that they need it, then there's no reason not to deploy it.
Q: And you're not seeing that yet for AMD's AI chips?
Not yet.
Q: How supportive are Amazon CEO Andy Jassy and AWS CEO Matt Garman of the AI chip business?
They're very supportive. We meet them on a regular basis. There's a lot of focus across leadership in the company to make sure that the customers who need ML solutions get them.
There is also a lot of collaboration within the company with science and service teams that are building solutions on those chips. Other Amazon products, like Rufus, the AI assistant available to all Amazon customers, run entirely on Inferentia and Trainium chips.
Do you work at Amazon? Got a tip?
Contact the reporter, Eugene Kim, via the encrypted-messaging apps Signal or Telegram (+1-650-942-3061) or email (ekim@businessinsider.com). Reach out using a nonwork device. Check out Business Insider's source guide for other tips on sharing information securely.