
Microsoft CEO Deep Interview: Azure's profits largely come from supporting services, model developers will fall into the "winner's curse," and platform value will not disappear

Nadella said that AI workloads on Azure require not only AI accelerators but also a large amount of supporting services, and that a significant portion of Azure's profit margin comes from those supporting services; the essence of the hyperscale cloud business is to make Azure the ultimate platform for long-tail workloads. Nadella emphasized that Microsoft will reduce total cost of ownership through closed-loop optimization between its own models and custom chips, and he believes a fairly powerful open-source model will always be available to anyone with the supporting data resources and infrastructure.
On November 13th, the Dwarkesh Patel podcast released a new in-depth interview in which host Dwarkesh Patel and SemiAnalysis founder Dylan Patel spoke with Microsoft CEO Satya Nadella. In the interview, they discussed Microsoft's AI strategy, in-house chips, the Azure cloud business, business models for artificial general intelligence (AGI), industry profits, and more.

(Interview screenshot)
Regarding Azure and cloud strategy, Nadella said that AI workloads require not only AI accelerators but also a large amount of supporting services, and that a significant portion of Azure's profit margin comes from those services; the essence of the hyperscale cloud business is to make Azure the ultimate platform for long-tail workloads. On in-house chips, Nadella emphasized that Microsoft will reduce total cost of ownership through closed-loop optimization between its own models and custom silicon, a vertical-integration strategy aimed at providing cost advantages for large-scale AI workloads.
On the commercialization of models, Nadella believes a fairly strong open-source model will always be available to anyone with the supporting data resources and infrastructure. Model developers may fall into the "winner's curse": despite doing the arduous innovation work, their results can easily be replicated and commoditized, and companies with foundational data, context-engineering capabilities, and data liquidity can simply take those checkpoints and retrain them.
Nadella revealed that under the new agreement, Microsoft has full IP licensing for all system-level innovations from OpenAI (including chip and system design), except for consumer hardware. This means that Microsoft effectively possesses two top-tier AI system design capabilities: its own MAI (Microsoft AI) + Maia team, and the OpenAI team. Microsoft can draw the best technology from both sides, even directly using OpenAI's designs.
As part of this interview, Nadella gave Dwarkesh Patel and Dylan Patel an exclusive preview of Microsoft's brand new Fairwater 2 data center. During the tour, Microsoft Cloud and AI Executive Vice President Scott Guthrie disclosed that the company's goal is to increase training capacity tenfold every 18 to 24 months, and the new generation Fairwater 2 architecture will enhance training capacity by a full ten times compared to GPT-5.
Highlights summarized by Wall Street Insights are as follows:
Regarding IP sharing with OpenAI:
"In our case, the good news is that OpenAI has a project we can access." (Dylan asked: How much access do you have to that project?) "All of it." (Dylan confirmed: You have direct access to all the intellectual property? That's it... By the way, we also gave them a bunch of intellectual property to help them get started... We built all these supercomputers together.
About self-developed chip strategy:
The biggest competitor of any new accelerator is, you could even say, the previous generation of NVIDIA products. In a cluster, what I want to look at is overall total cost of ownership (TCO). The way we want to do it is to establish a closed loop between our own MAI models and our chips, because I feel that is what earns you the right to make your own chips: you really design the microarchitecture around what you are actually doing.
About Azure/cloud strategy:
We also deeply recognize that every AI workload not only requires AI accelerators but also a lot of supporting services. In fact, a large part of our profit margin comes from these supporting services. Therefore, we want to make Azure the ultimate platform for long-tail workloads—this is the essence of large-scale cloud business, and at the same time, we must maintain absolute competitiveness starting from the most basic high-end training hardware layer.
But this cannot crowd out other businesses, because we are not just signing five bare-metal service contracts with five customers. That is not Microsoft's business model. That may be the business direction of other companies, which is normal. We clearly state that we are engaged in large-scale cloud computing business, ultimately providing long-tail services for AI workloads. To this end, we will maintain leading bare-metal as a service capabilities for a range of models, including self-developed models. In my view, this is the balance you see.
About MAI (Microsoft AI):
Therefore, when I plan Microsoft's AI roadmap, we will form a top-notch super-intelligence team. We will gradually release some models—these models may be applied to products due to characteristics such as latency optimization and cost advantages, or they may play a role due to special capabilities. At the same time, we will conduct practical research to prepare for the breakthroughs needed to achieve super-intelligence in the next five to eight years, while fully leveraging our existing GPT model family as the foundation for research and development.
About Agent HQ strategy:
At GitHub Universe... we say Agent HQ is the conceptual thing we want to build. Sometimes I describe it as cable television for all these AI agents, because I would bundle Codex, Claude, Cognition's stuff, anyone's agents, Grok, all of these into one subscription.
If I need to build some kind of heads-up display that lets me quickly steer and triage the content generated by coding agents, then for me that is Mission Control: a control plane spanning VS Code, GitHub, and all these new foundational components we will build.
About industry profits:
From an industry-structure perspective, I believe there will always be a fairly strong open-source model available, as long as you have the supporting data resources and infrastructure. As a model developer, you may fall into the "winner's curse": although you have done the arduous innovation work, the results can easily be replicated and commoditized. And companies that master data foundations, context-engineering capabilities, and data liquidity can simply take those checkpoints and retrain them.
In one world, the company's future is a tools business, where I have a computer and I use Excel... The second world is one where the company literally provides computing resources for AI agents, which operate completely autonomously. Our business, which today is an end-user tools business, will essentially become the infrastructure business that supports agent work. You need somewhere to store things, somewhere to archive them, somewhere to discover them, and somewhere to manage all this activity, even if you are an AI agent.
About the Fairwater data centers:
Scott Guthrie, Executive Vice President of Microsoft Cloud and AI: We strive to increase training capacity tenfold every 18-24 months. This [Fairwater 2] architecture delivers a full tenfold increase in training capacity over GPT-5. The amount of optical fiber in this building is almost equivalent to the total across all our Azure data centers worldwide two years ago... We will aggregate computing resources across sites to perform large training tasks. Those resources will in turn be used for training, data generation, and inference, not just a single workload forever... Fairwater 4 will also connect to that one-petabit network for ultra-fast interconnect; the AI wide-area network runs straight to Milwaukee, where multiple Fairwater facilities are being built. The design of the campus clearly shows its optimization for model parallelism and data parallelism.
The following is the full transcript of the in-depth interview with Nadella, translated with AI assistance:
Interview Guests: Microsoft CEO Satya Nadella;
Podcast Host: Dwarkesh Patel;
Co-Interviewee: SemiAnalysis founder Dylan Patel;
Guest: Scott Guthrie, Executive Vice President of Microsoft Cloud and AI
Dwarkesh Patel:
Today we are interviewing Satya Nadella. "We" refers to me and Dylan Patel, the founder of SemiAnalysis. Satya, welcome.
Satya Nadella:
Thank you. It's great. Thank you for coming to Atlanta.
Dwarkesh Patel:
Thank you for showing us the new facility. It's really cool to see these.
Satya Nadella:
Of course.
Dwarkesh Patel:
Satya and Scott Guthrie—Executive Vice President of Microsoft Cloud and AI—showed us their brand new Fairwater 2 data center, which is currently the most powerful data center in the world.
Scott Guthrie:
We have been working hard to increase our training capacity by ten times every 18 to 24 months. So this is actually a tenfold increase over the training of GPT-5. In terms of fiber count, the network fiber in this building is almost equivalent to the total across all our Azure data centers from two and a half years ago.
Satya Nadella:
There are about 5 million network connections here.
Dwarkesh Patel:
You have such large bandwidth between different sites within a region and between two regions. So is this a big bet on future scalability? Do you anticipate that there will be some huge model in the future that requires two full regions to train?
Satya Nadella:
Our goal is to be able to aggregate this computing power for large training tasks and then integrate these resources across sites.
The reality is that you will use it for training, then use it to generate data, and use it for various inferences. It won't always be used for just one workload.
Scott Guthrie:
Fairwater 4, which is currently under construction nearby, will also connect to that 1 Petabit network, allowing us to connect the two at a very high rate.
We also have AI wide area network connections to Milwaukee, where we are building several other Fairwater data centers.
Satya Nadella:
You can really see model parallelism and data parallelism.
It is essentially built for the training tasks and superclusters of this campus. Then through the wide area network, you can connect to the data centers in Wisconsin.
You can really run a training task that aggregates all these resources together.
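As a rough, illustrative sketch of what aggregating a data-parallel training step across sites can look like—with a made-up topology and plain NumPy; nothing here reflects Microsoft's actual stack—gradients are averaged first inside each site over the fast local fabric, and only per-site summaries cross the slower wide-area network:

```python
import numpy as np

# Hypothetical topology: 2 sites, 4 "accelerators" per site. Real scale and
# interconnect details are not disclosed in the interview.
SITES, GPUS_PER_SITE, GRAD_DIM = 2, 4, 8

rng = np.random.default_rng(0)
# Each accelerator computes gradients on its own data shard (data parallelism).
grads = rng.normal(size=(SITES, GPUS_PER_SITE, GRAD_DIM))

# Step 1: fast intra-site reduction over the campus-local fabric.
site_means = grads.mean(axis=1)            # shape: (SITES, GRAD_DIM)

# Step 2: slower cross-site reduction over the AI WAN (e.g. Atlanta <-> Milwaukee).
global_mean = site_means.mean(axis=0)      # shape: (GRAD_DIM,)

# Every accelerator applies the same averaged gradient, so one training job
# spans both sites while most traffic stays inside each site.
assert np.allclose(global_mean, grads.reshape(-1, GRAD_DIM).mean(axis=0))
print("global gradient:", np.round(global_mean, 3))
```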
Scott Guthrie:
What we are seeing now is a unit that does not yet have servers or racks.
Dylan Patel:
How many racks are there in a unit?
Scott Guthrie:
We may not disclose that, but...
Dylan Patel:
That's why I asked.
Scott Guthrie:
You will see when you go upstairs.
Dylan Patel:
I'm going to start counting.
Scott Guthrie:
You can start counting. We let you start counting.
Dylan Patel:
How many units are there in this building?
Scott Guthrie:
I can't tell you this part either.
Dwarkesh Patel:
Well, division is pretty simple, right?
Satya Nadella:
Oh my, it's a bit noisy here.
Dwarkesh Patel:
When you look at these, do you think, "Now I know where my money is going."
Satya Nadella:
It's like, "I'm running a software company. Welcome to the software company."
Dwarkesh Patel:
Once you decide to use GB200 and NVLink, how large is the design space? How many other decisions need to be made?
Satya Nadella:
From model architecture to optimized physical solutions, the two are coupled.
In that sense, it's also a bit scary because new chips will be released. For example, Vera Rubin Ultra. Its power density will be very different, and the cooling requirements will also be very different.
So you don't want to build everything to just one specification.
This goes back to the topic we will discuss later, which is that you want to scale over time, rather than scaling all at once and then being stuck.
Business Model of AGI
Dylan Patel:
When you look at all the past technological transformations—whether it's railroads, the internet, interchangeable parts, industrialization, cloud computing, all of these—each revolution has taken less time from the discovery of the technology to its adoption and penetration in the economy.
Many people who have appeared on the Dwarkesh podcast believe this is the last technological revolution or transformation, and this time it is very, very different.
At least so far, the market has skyrocketed to the point where hyperscalers will spend $500 billion in capital expenditures next year, a pace that is unparalleled in past revolutions.
The end state seems quite different.
Your framework for understanding this seems very different from what I would call the "AI bros" who say "AGI (Artificial General Intelligence) is coming." I want to understand this more deeply.
Satya Nadella:
I first feel excited, and I also think this might be the most significant thing since the Industrial Revolution. I start from that premise, but at the same time, I am a bit grounded, thinking that this is still an early stage.
We have built some very useful things, and we have seen some good features that seem to be working with these scaling laws.
I am optimistically thinking they will continue to work.
Some of these do require real scientific breakthroughs, but there is also a lot of engineering work and so on.
That said, I also hold the view that even what has happened in the computer field over the past 70 years has been pushing us forward.
I like Raj Reddy's metaphor for AI.
He is a Turing Award winner from Carnegie Mellon University. Even before AGI, he had this metaphor about AI.
He said AI should be a guardian angel or a cognitive amplifier. I like this metaphor.
It's a simple way to think about the issue.
Ultimately, what is its human utility?
It will become a cognitive amplifier and a guardian angel.
If I look at it this way, I see it as a tool.
But you can also say it very mysteriously that it's not just a tool.
It does all these things that only humans have done so far.
But many technologies have been like this in the past.
Only humans did many things, and then we had tools that could do those things.
Dwarkesh Patel:
We don't have to get bogged down in definitions, but one way to think about it is that it might take five years, ten years, twenty years.
At some point, machines will ultimately produce "Satya tokens," and the Microsoft board will consider "Satya tokens" very valuable.
Dylan Patel:
How much economic value did you waste by interviewing Satya?
Dwarkesh Patel:
I can't afford the API cost of "Satya tokens."
No matter what you want to call it, whether "Satya tokens" are tools or agents, whatever.
Now, if your model's cost per million tokens is in the dollar or cent range, there is huge profit expansion potential because a million "Satya tokens" are very valuable.
My question is, where do those profits go, and what proportion can Microsoft capture?
Satya Nadella:
In a sense, this goes back to the question of what the economic growth picture will actually look like.
What will companies look like?
What will productivity look like?
For me, that's the crux of the matter, and to reiterate, if the Industrial Revolution created... you only started to see economic growth after 70 years of diffusion.
That's another thing to keep in mind.
Even if this technology diffuses quickly, for real economic growth to occur, it must diffuse to the extent that work, work outcomes, and workflows must change.
So this is where I think we shouldn't underestimate the change management required for a company to truly transform.
Looking ahead, will humans and the tokens they produce gain higher leverage, whether it's the future "Dwarkesh token" or "Dylan token"?
Think about the amount of technology you use now.
Could you operate SemiAnalysis or this podcast without technology?
Impossible, at the scale you can achieve, absolutely impossible.
So the question is, what is that scale?
Will it grow tenfold because of something? Absolutely.
Therefore, whether you reach a certain revenue number or a certain audience number or something else, I think that’s what’s going to happen.
The key is that what took the Industrial Revolution 70 years, maybe 150 years, could happen in 20 or 25 years.
If we are lucky, I would love to compress what happened in 200 years of the Industrial Revolution into 20 years.
Dylan Patel:
Microsoft can be said to be the greatest software company in history, the largest Software as a Service (SaaS) company.
You have gone through a transformation in the past; you used to sell Windows licenses and Windows disks or Microsoft products, and now you sell Office 365 subscription services.
When we transition from that transformation to your business today, there is another transformation underway.
The incremental cost per user for Software as a Service is very low.
There is a lot of R&D, a lot of customer acquisition costs.
This is somewhat why, not Microsoft, but SaaS companies are performing poorly in the market, because the COGS (Cost of Goods Sold) for AI is just too high, which completely breaks the way these business models operate.
As arguably the greatest Software as a Service company, how do you transition Microsoft to this new era where COGS is important and the incremental cost per user is different?
Because now your pricing is like, "Hey, Copilot is $20."
Satya Nadella:
That’s a great question, because in a sense, the leverage for the business model itself will remain similar.
If you look at the model menu from consumers to enterprises, there will be some advertising units, there will be some transactions, and there will be some gross margins for those building AI devices.
There will be subscriptions, for consumers and enterprises, and then there will be consumption-based billing.
So I still think these are all the measurement methods.
You are right, what is a subscription?
So far, people like subscriptions because they can budget for them.
They are essentially an authorization for some consumption rights, which are encapsulated in the subscription.
So I think in a sense this becomes a pricing decision.
How much consumption are you entitled to? If you look at all the coding subscriptions, that’s basically it, right?
Then you have professional, standard versions, and so on.
So I think that’s how the pricing and profit structure will layer.
Interestingly, at Microsoft, the good news for us is that we are involved in all these measurement methods.
At the portfolio level, we have almost all of them: consumption-based billing, subscriptions, and all the other consumer levers.
I believe time will tell us which of these models are meaningful in what categories.
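To make the "subscription as an entitlement to consumption" framing concrete, here is a minimal sketch with invented tier names, prices, and units (nothing here reflects actual Microsoft or Copilot pricing): a subscription pre-authorizes a budget of usage, and consumption beyond it falls through to metered billing.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    monthly_fee: float       # flat subscription price
    included_units: int      # consumption entitlement bundled into the fee
    overage_per_unit: float  # metered rate once the entitlement is exhausted

# Invented tiers for illustration only.
STANDARD = Tier("standard", 20.0, 1_000, 0.03)
PRO      = Tier("pro",      80.0, 5_000, 0.02)

def monthly_bill(tier: Tier, units_used: int) -> float:
    """Subscription covers the entitled units; the rest is consumption billing."""
    overage = max(0, units_used - tier.included_units)
    return tier.monthly_fee + overage * tier.overage_per_unit

print(monthly_bill(STANDARD, 800))    # within entitlement -> 20.0
print(monthly_bill(STANDARD, 2_000))  # 20.0 + 1000 * 0.03 = 50.0
```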
Regarding SaaS, since you mentioned it, I've thought a lot about it.
Take Office 365 or Microsoft 365 as an example.
A low ARPU (average revenue per user) is good because there’s something interesting about it.
During the transition from servers to the cloud, one question we often asked ourselves was, "Oh my, if all we’re doing is migrating the same users who use our Office licenses and the Office servers at that time to the cloud, and we have COGS, this will not only shrink our margins but we will basically become a lower-margin company."
What actually happened was that the migration to the cloud crazily expanded the market.
We sold a few servers in India, and we didn’t sell much.
But in the cloud, suddenly everyone in India could buy server capacity and IT in proportion to their needs.
In fact, one of the biggest things I didn’t realize was the amount of money people spent on storage under SharePoint.
In fact, EMC's largest division might be the storage servers for SharePoint.
All of this dropped in the cloud because no one had to go purchase it.
In fact, this is working capital, which basically means cash outflow.
So it massively expanded the market.
So this AI thing will be like that too.
If you look at coding, we spent decades building things with GitHub and VS Code, and suddenly coding assistants reached such a large scale in just a year.
I think this is also what is going to happen, which is massive market expansion.
Copilot
Dwarkesh Patel:
One question is, the market will expand, but will the part of the revenue related to Microsoft expand? Copilot is an example.
If you look earlier this year, according to Dylan's data, GitHub Copilot's revenue was around $500 million, and there were no close competitors.
And now you have Claude Code, Cursor, and Copilot, all with similar revenues, around $1 billion. Codex is catching up, around $700-800 million.
So the question is, in all the areas Microsoft can touch, what advantages does Microsoft’s Copilot have over its peers?
Satya Nadella:
By the way, I love this chart.
I love this chart for many reasons. One is that we are still at the top.
The second is that all these companies listed here are companies that have been born in the last four or five years.
To me, this is the best sign. You have new competitors and new survival pressures.
When you say, who is it now? Claude wants to take you down, Cursor wants to take you down, this is not Borland (an old software company). Thank goodness. This means our direction is right.
That's it. The fact that we have reached this scale from nothing is market expansion.
It's like something in cloud computing. Fundamentally, the category of coding and AI may become one of the largest categories.
This is the software factory category. In fact, it may be larger than knowledge work. I want to keep an open mind about this. We will face fierce competition.
That's your point, and it's a good point. But I'm glad we have transformed what we have into this, and now we have to compete.
In terms of competition, even in the last quarter we just finished, we made a quarterly announcement, and I think we grew from 20 million to 26 million subscribers.
I feel good about our subscription growth and development direction. But the more interesting thing is, guess where all those other codebases generating massive amounts of code went?
They all went to GitHub. GitHub is at an all-time high in terms of codebase creation, PRs (pull requests), and everything else.
In a sense, we want to maintain this openness, by the way. This means we want to own that. We don't want to confuse it with our own growth.
Interestingly, we have one developer joining GitHub every second, and I think that's the statistic.
80% of them just entered some GitHub Copilot workflow because they were there. By the way, many of these things even use some of our code review agents, which are on by default just because you can use them.
We will have many, many structural opportunities. We will also do things like we did with Git.
The core elements of GitHub, starting from Git, to Issues (issue tracking), to Actions (automation workflows), these are all powerful and beautiful things because they are all built around your codebase.
We want to expand that. Last week at GitHub Universe, that's what we did.
We said Agent HQ is the conceptual thing we want to build.
For example, here you have something called Mission Control. You go to Mission Control, and now I can launch.
Sometimes I describe it as cable TV for all these AI agents because I would bundle Codex, Claude, Cognition stuff, anyone's agents, Grok, all of these into one subscription, and they would all be there.
So I get a package, and then I can really issue a task and guide them, so they will all work in their respective independent branches, and I can monitor them. I think this will be one of the biggest areas of innovation, because now I want to be able to use multiple agents.
I want to be able to digest the outputs of multiple agents.
I want to be able to control my codebase. If I need to build some kind of heads-up display that lets me quickly steer and triage the content generated by coding agents, then for me that will be Mission Control: a control plane spanning VS Code, GitHub, and all these new foundational components we will build.
Observability... think about everyone who needs to deploy all of this. It will require a whole set of observability regarding which agent did what to which codebase at what time. I feel this is the opportunity.
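To make the Agent HQ / Mission Control shape concrete, here is a toy sketch—hypothetical agent names, an invented branch-naming scheme, not GitHub's actual API—of fanning one task out to several agents, each working in its own branch, with an audit log recording which agent did what, where, and when:

```python
import datetime as dt
from dataclasses import dataclass, field

@dataclass
class RunRecord:
    agent: str
    branch: str
    task: str
    started: dt.datetime
    status: str = "running"

@dataclass
class MissionControl:
    """Toy control plane: dispatch one task to many agents, audit everything."""
    log: list = field(default_factory=list)

    def dispatch(self, task: str, agents: list) -> list:
        runs = []
        for agent in agents:
            # Each agent gets its own isolated branch to work in.
            branch = f"agents/{agent}/{task.replace(' ', '-')[:24]}"
            run = RunRecord(agent, branch, task, dt.datetime.now())
            self.log.append(run)  # observability: who did what, where, when
            runs.append(run)
        return runs

hq = MissionControl()
hq.dispatch("fix flaky login test", ["copilot", "claude", "codex"])
for r in hq.log:
    print(f"{r.started:%H:%M:%S} {r.agent:>8} -> {r.branch} [{r.status}]")
```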
Ultimately, your point makes a lot of sense, which is that we better be competitive and innovate. If we don't, we will be overthrown.
But I like this chart, at least as long as we are at the top, even with competition.
Dylan Patel:
The key point here is that whichever coding agent wins, GitHub will continue to grow.
But that market is only growing 10%, 15%, 20% a year—much faster than GDP, a good compound rate. These AI coding agents, by contrast, have grown from about $500 million in annualized revenue at the end of last year—when there was only GitHub Copilot—
to a combined $5-6 billion in annualized revenue in the fourth quarter of this year across GitHub Copilot, Claude Code, Cursor, Cognition, Windsurf, Replit, and OpenAI Codex.
That's 10 times. When you look at the total addressable market (TAM) for software agents, is it the $2 trillion in wages you pay people, or does it go beyond that?
Because every company in the world can now develop more software?
There is no doubt that Microsoft is getting a piece of the pie.
But you have gone from nearly 100%, or certainly well above 50%, to below 25% market share in just one year.
How can people have confidence that Microsoft will continue to win?
Satya Nadella:
Dylan, this goes back to the point that there is no entitlement here, and we shouldn't have any confidence other than the confidence that we will innovate. In a sense, we are lucky that this category will be much larger than anything we currently have a high share in.
Let me put it this way. You could say we have a high share in VS Code, and we have a high share in GitHub's codebase; that's a good market.
But the key is that even having a decent share in a much broader market...
You could say we have a high share in client-server computing. Our share in hyperscale computing is much lower than that.
But is it a much larger business? An order of magnitude larger. So at least this is proof of existence, even if our share position is not as strong as it used to be, Microsoft has been doing quite well, as long as
the market we compete in is creating more value. And there are multiple winners. That's the key. But I accept your point that ultimately
all of this means you have to be competitive. I'm watching this every quarter. That's why I'm very optimistic about what we will do with Agent HQ, turning GitHub into a place where all these agents gather.
As I said, we will have multiple scoring opportunities there. It doesn't have to be... some of these people can succeed alongside us, so it doesn't necessarily have to be just one winner and one subscription service.
Whose margins will grow the most?
Dwarkesh Patel:
The reason I want to focus on this question is that it's not just about GitHub, but fundamentally about Office and all the other software Microsoft provides.
Regarding how AI evolves, you can have a vision where the models will continue to be limited, and you will always need this directly visible observability.
Another vision is that over time, these models that currently perform tasks that take two minutes will, in the future, perform tasks that take 10 minutes or 30 minutes.
In the future, perhaps they will autonomously complete workloads equivalent to several days. Then model companies might charge thousands of dollars for access, essentially being a colleague that can communicate with humans using any user interface and migrate between platforms.
If we are getting closer to that situation, why wouldn't those increasingly profitable model companies capture all the profits?
Why does the scaffolding matter so much, when it becomes increasingly irrelevant as AI gets stronger?
This bears on the relationship between today's Office and an AI that simply does knowledge work as a colleague.
Satya Nadella:
That's a great point. Will all value migrate to the models?
Or will it be distributed between the scaffolding and the models?
I think time will tell.
But my fundamental point is that the incentive structure has become clear.
Let's take information work as an example, or even coding as an example.
In fact, one of my favorite settings in GitHub Copilot is called auto, which automatically optimizes.
I actually purchased a subscription, and the auto mode starts selecting and optimizing what I ask it to do.
It can even be fully autonomous. It can arbitrage the available tokens between multiple models to complete tasks.
If you accept this argument, the commodity there will be the models.
Especially with open-source models, you can choose a checkpoint, take a batch of your data, and fine-tune it.
I think all of us will start to see some internal models, whether from Cursor or Microsoft.
Then you will offload most of the tasks to it.
So one argument is that if you win the scaffolding—which today means dealing with all the limitations and the unevenness of these models' intelligence; you have to do this—if you win it,
then you will vertically integrate yourself down into the model, simply because you will have the data liquidity and so on.
There will be enough checkpoints available. That’s another thing.
Structurally, I think there will always be a fairly powerful open-source model in the world, and then you can use it,
as long as you have something to work with, namely data and scaffolding.
I could argue that if you are a model company, you might have the winner's curse.
You might have done all the hard work, made incredible innovations, except it only needs to be copied once to be commoditized.
Then those who have the foundational data, the context engineering, and the data liquidity can take that checkpoint and retrain it.
So I think this argument can be viewed from two perspectives.
Dylan Patel:
To interpret what you said, there are two worldviews here.
One is that there are so many different models out there. Open source exists. There will be differences between models, which will drive to some extent who wins and who loses.
But "scaffolding" is the key to your victory.
The other perspective is that, in fact, the models are the key intellectual property.
Everyone is in fierce competition, kind of like "Hey, I can use Anthropic or OpenAI."
You can see this in the revenue charts. Once OpenAI eventually had a code model similar to Anthropic's capabilities, albeit in a different way, their revenue started to soar.
There is a viewpoint that model companies are the ones capturing all the profits.
Because if you look at this year, at least at Anthropic, their gross margin on inference has grown from well below 40% to over 60% by the end of the year.
Despite there being more Chinese open-source models than ever, those profit margins are still expanding.
OpenAI is competitive, Google is competitive, and X/Grok is now also competitive.
All these companies are competitive now, yet despite that, the profit margins at the model layer have significantly expanded.
How do you view this issue?
Satya Nadella:
That’s a great question. Maybe a few years ago, people said, "Oh, I can just package a model and build a successful company."
That may have been overturned, simply because of the model capabilities and especially the tools being used.
But interestingly, when I look at Office 365, let's take this little thing we built, the Excel Agent, as an example.
Very interesting. Excel Agent is not a UI-level wrapper.
It is actually a model located in the middle layer.
In this case, because we own all the intellectual property of the GPT series, we are leveraging it and embedding it into the core middle layer of the Office system, teaching it to understand the meaning of Excel and everything within it.
This is not just "Hey, I only have a pixel-level understanding."
I have a complete understanding of all the native components of Excel.
Because if you think about it, if I want to give it some reasoning tasks, I need to even fix the reasoning errors I make.
This means I need to not only see pixels, but I need to be able to see "Oh, I got that formula wrong," and I need to understand that.
In a way, all of this is not done at the UI wrapper layer with some prompt,
but is accomplished in the middle layer by teaching it all the tools of Excel.
I am essentially giving it a Markdown document to teach it the skills needed to become an advanced Excel user.
It's a bit strange; it goes back to the concept of the AI brain.
You are not just building Excel, traditional business logic in the conventional sense.
You are taking traditional Excel business logic and essentially wrapping it with a cognitive layer,
using this model that knows how to use the tools.
In a sense, Excel will come bundled with an analyst and all the tools used.
This is the type of thing everyone will build.
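A toy sketch of the distinction Nadella draws between a pixel-level wrapper and a middle-layer agent with Excel-native tools (the tool names and the workbook schema are invented for illustration): because the model sees formulas and structured objects rather than screenshots, it can notice and fix a wrong result.

```python
# Toy "workbook" exposed to the model as structured objects, not pixels.
workbook = {
    "Sheet1": {
        "A1": {"value": 100, "formula": None},
        "A2": {"value": 250, "formula": None},
        "A3": {"value": 999, "formula": "=SUM(A1:A2)"},  # stale/wrong result
    }
}

def read_cell(sheet: str, ref: str) -> dict:
    """Native tool: the agent sees value AND formula, not a screenshot."""
    return workbook[sheet][ref]

def recalc(sheet: str, ref: str) -> float:
    """Native tool: re-evaluate a cell's formula (only SUM ranges in this toy)."""
    cell = workbook[sheet][ref]
    if cell["formula"] and cell["formula"].startswith("=SUM("):
        start, end = cell["formula"][5:-1].split(":")
        col = start[0]
        lo, hi = int(start[1:]), int(end[1:])
        cell["value"] = sum(
            workbook[sheet][f"{col}{i}"]["value"] for i in range(lo, hi + 1)
        )
    return cell["value"]

# The agent inspects the formula, notices the value is inconsistent, and fixes it:
cell = read_cell("Sheet1", "A3")
stale = cell["value"]
fixed = recalc("Sheet1", "A3")
print(f"A3 showed {stale}; recalculated {cell['formula']} -> {fixed}")
```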
So even for model companies, they have to compete.
If they price high, guess what, if I am a builder of such a tool, I will replace you.
I might use you for a while.
So as long as there is competition... it is not always a winner-takes-all situation.
If there were one model significantly better than all the others, with a huge gap, then yes, that would be winner-takes-all.
But as long as there is competition among multiple models—like hyperscaler competition, plus the check of open source—there is enough space to build value on top of the models.
At Microsoft, my view is that we will engage in hyperscale business that will support multiple models.
We will have access to OpenAI models over the next seven years, and we will innovate on that basis.
Essentially, I believe we have a cutting-edge model that we can use and innovate on completely flexibly.
We will build our own models with MAI (Microsoft AI).
So we will always have a model layer.
Then we will build—whether in security, knowledge work, coding, or science—our own application scaffolding that will be model-oriented.
It will not be a wrapper on the model; rather, the model will be wrapped into the application.
Dwarkesh Patel:
I have many questions about the other things you mentioned.
But before we move on to those topics, I wonder whether this is really a forward-looking view of AI capabilities; the model you envision sounds like what exists today.
It takes a screenshot of your screen, but it cannot see inside each cell and what the formulas are.
I think a better mental model here is to imagine these models being able to use computers like humans do.
A human knowledge worker using Excel can look at formulas, can use alternative software, and if migration is necessary, can migrate data between Office 365 and another software, and so on.
Satya Nadella:
That's what I mean.
Dwarkesh Patel:
But if that's the case, then the integration with Excel becomes less important.
Satya Nadella:
No, no, don't worry about Excel integration. After all, Excel was built as a tool for analysts.
Good. So whoever this AI is, as an analyst, they should have tools they can use.
Dwarkesh Patel:
They have computers.
Just like humans can use computers.
That's their tool.
Satya Nadella:
The tool is the computer.
So what I'm saying is that I'm building an analyst that essentially acts as an AI agent,
which happens to come with prior knowledge of how to use all these analytical tools.
Dwarkesh Patel:
To make sure we're talking about the same thing, is this something like a human using Excel, like me...
Satya Nadella:
No, it's fully autonomous.
So we should perhaps articulate what I think the future of the company is.
The future of the company will be a tool business where I have a computer, and I use Excel.
In fact, in the future, I might even have a Copilot, and that Copilot will also have an agent.
But it will still be me guiding everything, and everything will feed back to me.
That's one world.
The second world is that the company literally just provides computing resources for AI agents,
which work completely autonomously.
That fully autonomous agent will essentially have an embodied set of the same tools available to it.
So this AI agent that comes in gets not just a raw computer but also these tools,
because using tools to get work done will be more token-efficient.
In fact, I look at it and say: our business,
which today is the end-user tools business,
will essentially become the infrastructure business that supports agent work.
This is another way of thinking.
In fact, everything we build under M365 will still be very relevant.
You need somewhere to store it, somewhere to archive it, somewhere to discover it,
somewhere to manage all these activities, even if you are an AI agent.
This is a new infrastructure.
Dwarkesh Patel:
To make sure I understand: you are saying that in theory a future AI with real computer-use capabilities—which all these model companies are working on now—could use Microsoft software even without partnering with Microsoft.
But you are saying that if they work with your infrastructure, you will give them lower-level access that lets them do the same things more effectively?
Satya Nadella:
100%.
What happens is we have servers,
then we have virtualization,
then we have more servers.
This is another way of thinking.
Don't think of tools as the end thing.
What is the entire foundation under the tools used by humans?
The entire foundation is also the bootstrapping for AI agents,
because AI agents need computers.
In fact, one of the most fascinating areas where we see a lot of growth is all these people working with these Office artifacts and so on who want to provision Windows 365 for autonomous agents.
They really want to be able to provide computers for these agents.
Absolutely.
That’s why we will essentially have the end-user computing infrastructure business,
which will continue to grow because it will grow faster than the number of users.
This is one of the other questions people ask me, "Hey, what will happen to the user-based business?"
At least early signs may suggest that thinking about the user-based business
is not just about users, but about agents.
If you say it’s about users and agents,
the key is what to provide for each agent?
A computer, a set of security measures around it, its identity.
All these things, observability, etc., are management.
All of this will be incorporated.
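To make the per-agent provisioning list above concrete, here is a minimal sketch with invented field names and SKUs (this is not a Windows 365, Entra, or Azure API): each agent gets a computer, an identity, a security boundary, and an observability hook.

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class AgentWorkspace:
    """Toy manifest for provisioning one autonomous agent, per the interview:
    a computer, an identity, security around it, and observability hooks."""
    agent_name: str
    identity: str = field(default_factory=lambda: f"agent-{uuid.uuid4().hex[:8]}")
    vm_size: str = "standard-2vcpu-8gb"            # its "computer" (invented SKU)
    network_policy: str = "egress-allowlist"       # security boundary
    audit_log_sink: str = "central-observability"  # who did what, when

def provision(agent_name: str) -> AgentWorkspace:
    # In a real system this would create the VM, register the identity,
    # and attach policies; here we just return the manifest.
    return AgentWorkspace(agent_name)

ws = provision("invoice-reconciler")
print(ws)
```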
Dylan Patel:
The framing—at least the way I'm currently thinking about it; I'd like to hear your perspective—is that
these model companies are building environments to train their models to use Excel or Amazon shopping or whatever, booking flights.
But at the same time, they are also training these models to do migrations.
Because that might be the most directly valuable thing: converting mainframe-based systems to standard cloud systems, converting Excel databases to real databases using SQL,
or transforming work done in Word and Excel into something more programmatic and
more efficient in the classical sense, which can also be done by humans.
It's just that it's not cost-effective for software developers to do this.
This seems to be something everyone will have to do with AI in the next few years to drive value at scale.
If models can leverage tools themselves to migrate to something, how does Microsoft fit in?
Yes, Microsoft leads in databases, storage, and all these other categories,
but the use of the Office ecosystem will significantly decline, just as the use of mainframe ecosystems may decline.
Mainframes have actually been growing over the past twenty years, even if no one talks about them anymore.
They are still growing.
Satya Nadella:
100%, I agree.
Dylan Patel:
What does this process look like?
Satya Nadella:
Ultimately, there will be a hybrid world for quite some time, as people will use tools that will work alongside agents that must use those tools,
and they must communicate with each other.
What are the artifacts I generate, and what does a human need to see?
All these things will be real considerations everywhere, outputs, inputs.
I don't think it's just about "oh, I migrated."
The bottom line is I have to live in this hybrid world.
But that doesn't fully answer your question, as there may be a truly new effective frontier where agents just work with agents and are fully optimized.
Even when agents work with agents,
what primitives are needed?
Do you need a storage system?
Does that storage system need to have e-discovery?
Do you need observability?
Do you need an identity system that will use multiple models with one identity system?
These are all core underlying infrastructures we have today for Office systems and so on.
This is also what we will have in the future.
You talked about databases.
I mean, man,
I want all Excel to have a database backend.
I want all of this to happen immediately.
That database is a good database.
The database will indeed be a big deal that will grow.
If I think about all Office artifacts being better structured, then the ability to connect structured and unstructured data in the agent world will grow the underlying infrastructure business.
Coincidentally, all this consumption is driven by agents.
You could say that all of this is just software generated on-demand by model companies.
That could also be true.
We will also be such a model company.
We will build... competition may be that we will build a model plus all the infrastructure and provide it, and then there will be competition among those who can do that.
MAI (Microsoft AI)
Dwarkesh Patel:
Speaking of model companies, you mentioned that you will not only have the infrastructure but also the models themselves.
Now, the model Microsoft AI released two months ago ranks 36th in Chatbot Arena.
You clearly have OpenAI's intellectual property under your agreement, yet this model seems to be lagging behind.
Why is that, especially considering that you theoretically have the right to fork OpenAI's single codebase (monorepo) or distill their models, particularly if having a leading model company is a crucial part of your strategy?
Satya Nadella:
First of all, we will absolutely maximize the use of OpenAI models across all our products.
This is the core thing we will continue to do for the next seven years, not just using it but adding value to it.
This is where analysts and this Excel agent come in; these are the things we will do, and we will conduct reinforcement learning fine-tuning.
We will do some mid-training runs based on the GPT family, where we have unique data assets, and build capabilities on top.
For the MAI models, the way we think about it is this: the good news about this new agreement is that we can very, very clearly state that we will build a world-class superintelligence team and pursue it with high ambition.
But at the same time, we will also use this time wisely to think about how to use both things simultaneously.
This means we will be very focused on products on one end and very focused on research on the other.
Because we can access the GPT family, the last thing I want to do is use my computing power in a way that just repeats without adding much value.
I want to be able to use the computing power we used to generate the GPT family and maximize its value, while my MAI computing power is used for... let's take the image model we launched as an example; I think it ranks ninth in the image arena.
We use it for cost optimization; it's in Copilot, it's in Bing, we will use it.
We have an audio model in Copilot.
It has personality and so on. We optimized it for our products.
So we will do these things. Even in LMArena, we started with text, and when it debuted, it ranked about 13th.
By the way, it was trained on only about 15,000 H100s.
This is a very small model.
So this is again to demonstrate core capabilities, instruction following, and everything else.
We want to ensure that we can match the state-of-the-art level at that time.
This shows us what we could do if we give it more computing power, given the scaling laws.
What we are going to do next is a unified model that combines the work we have done in audio, image, and text.
This will be the next milestone in MAI.
So when I think about the MAI roadmap, we will build a world-class superintelligence team.
We will continue to release some of these models, including in an open manner.
They will either be used in our products because they will be latency-friendly, cost-friendly, or something else, or they will have certain special capabilities.
We will conduct real research to prepare for the next five, six, seven, or eight breakthroughs needed on the path to superintelligence—while leveraging the advantages we have with the GPT family, which we can build upon.
Dylan Patel:
Suppose we fast forward seven years, and you can no longer access OpenAI models.
What will Microsoft do to ensure they are at the forefront or have a leading AI lab?
Today, OpenAI has developed many breakthroughs, whether in scaling or reasoning.
Or Google has developed all the breakthroughs, like transformers.
But this is also a huge talent game.
You see Meta spending over $20 billion on talent.
You see Anthropic poaching the entire Blueshift reasoning team from Google last year.
You see Meta recently poaching a large reasoning and post-training team from Google.
These talent wars require a lot of capital.
It can be said that if you spend $100 billion on infrastructure, you should also spend X amount on the people using that infrastructure so they can achieve these new breakthroughs more effectively.
How can people believe that Microsoft will have a world-class team to achieve these breakthroughs?
Once you decide to open the funding tap—you are doing well in terms of capital efficiency now, looking smart, not wasting money on redundant work—but once you decide you need to do this, how can people say, “Oh yes, now you can rush into the top five models”?
Satya Nadella:
Ultimately, we will build a world-class team, and we already have a world-class team being formed.
We have Mustafa joining, we have Karen.
We have Amar Subramanya, who did a lot of post-training work on Gemini 2.5, and he is now at Microsoft.
Nando has done a lot of multimodal work at DeepMind, and he is here too.
We will build a world-class team.
In fact, later this week, Mustafa will release something that more clearly outlines what our lab is going to do.
What I want the world to know is that, perhaps, we will build infrastructure that supports multiple models.
Because from a hyperscale perspective, we want to build the most scalable infrastructure fleet capable of supporting all the models the world needs, whether they come from open source or, obviously, from OpenAI and other companies.
This is a job.
Secondly, in terms of our own model capabilities, we will absolutely use OpenAI models in our products, and we will start building our own models.
We might even—just like using Anthropic in GitHub Copilot—include other cutting-edge models in our products.
I think that's what it comes down to every time... ultimately, how a product evaluates on a specific task or job is what matters most.
We will start from there and work back to the necessary vertical integration, knowing that as long as you serve the market well with the product, you can always optimize costs.
Dwarkesh Patel:
There’s a future question.
Right now, we have this distinction between training and inference in our models.
Some might say that the differences between different models are becoming smaller.
Looking ahead, if you really expect human-level intelligence, humans learn on the job.
If you think about the past 30 years, what has made Satya's token so valuable?
It’s the wisdom and experience you’ve gained at Microsoft over the past 30 years.
We will eventually have models that, if they reach human level, will have this ability to continuously learn on the job.
This will bring so much value to leading model companies, at least from my perspective, because you have a copy of a model widely deployed across the economy, learning how to do every job.
Unlike humans, they can aggregate learning into that model.
So there’s this continuous learning exponential feedback loop that almost looks like some kind of intelligence explosion.
If that happens, and Microsoft is not the leading model company by then...
You say we replace one model with another, and so on.
Wouldn’t that be less important by then?
Because it’s as if there’s one model that knows how to do every job in the economy, while the other long-tail models do not.
Satya Nadella:
Your point is that if there is one model that is the most widely deployed model in the world, and it sees all the data and learns continuously, then the game is over and you just stop operating.
At least the reality I see in today's world is that no single model dominates; that is simply not the case.
Take coding as an example, there are multiple models.
In fact, that kind of dominance is decreasing every day.
No single model is widely deployed.
There are multiple models being deployed.
It's like databases.
There is always this question, "Can one database become the only database used everywhere?" The fact is that it is not.
There are various types of databases being deployed for different use cases.
I think any single model will have some network effects of continuous learning—I call it data liquidity.
Will it happen across all domains? I don't think so.
Will it happen across all geographies? I don't think so.
Will it happen across all market segments? I don't think so.
Will it happen simultaneously across all categories? I don't think so.
Therefore, I feel that the design space is so large, with many opportunities.
But your basic point is to have capabilities at the infrastructure layer, model layer, and scaffolding layer, and then be able to not only combine these things into a vertical stack but also be able to combine everything according to its purpose.
You cannot build infrastructure optimized for a single model.
What if you do that, and you fall behind?
In fact, all the infrastructure you build will be a waste.
You need to build infrastructure that can support multiple families and model lineages.
Otherwise, the capital you invest is optimized for one model architecture, which means you are just one breakthrough away—say, some MoE-type breakthrough—from your entire network topology being obsolete.
That is a terrible thing.
Therefore, you need infrastructure to support anything that may arise in your own model family and other model families.
You must remain open.
If you are serious about hyperscale business, you must take this seriously.
If you are serious about becoming a model company, you basically have to say, "What are the ways people can do things on top of models so that I can have an ISV ecosystem?"
Unless you think you will own every single category—which is not possible—you need others building on top of you.
Otherwise you won't have an API business, which by definition means you will never become a platform company successfully deployed everywhere.
Therefore, the industry structure will truly force people to specialize.
Within that specialization, a company like Microsoft should compete at every layer on the merits, but don't assume the whole race is simply "I vertically stack all these layers and run to the finish line."
That kind of thing will not happen.
Hyperscale cloud business
Dylan Patel:
So as of last year, Microsoft was on the path to becoming the largest infrastructure provider of all.
You were the earliest in 2023, so you went out and acquired all the resources—leasing data centers, starting construction, ensuring power supply, everything.
At that time, you were expected to beat Amazon in 2026 or 2027.
Of course, by 2028, you would definitely beat them.
Since then, it can be said that in the second half of last year, Microsoft took a big pause, abandoning a bunch of sites that were originally planned to be leased, and then Google, Meta, Amazon in some cases, and Oracle took over those sites.
We are now sitting in one of the largest data centers in the world, so obviously this is not all; you are still expanding rapidly.
But there are some sites where you just stopped development.
Why did you do that?
Satya Nadella:
This goes back to a point: what exactly is a hyperscale cloud business?
One key decision we made is that if we want to build Azure to excel in all stages of AI—from training to mid-training to data generation to inference—we just need the interchangeability of clusters.
So the whole thing basically led us not to build a lot of capacity for specific generations.
Because another thing you have to realize is that so far, every 18 months, we have expanded the training capacity of various OpenAI models by ten times, and we realized that the key is to stay on that path.
But more importantly, there needs to be a balance, not just training, but the ability to serve these models around the world.
Because ultimately, monetization is what enables us to keep funding this.
Then the infrastructure needs to support multiple models.
So once we said this is the reality, we adjusted our direction to the path we are on now.
If I look at the path we are on now, we have launched more projects.
We are also purchasing as much hosting capacity as possible, whether it's building, leasing, or even GPU as a service.
But we are building based on the demand we see, service demand, and training demand.
We don't want to just be a hosting provider for one company with a large amount of business.
That is not a business; you should vertically integrate with that company.
Given that OpenAI will become a successful independent company, that's great. It makes sense. Even Meta may use third-party capacity, but ultimately they will all be first-party.
For anyone with scale, they will become hyperscale providers themselves.
For me, it's about building a hyperscale cluster and our own research computing.
That's what the adjustment is about. So I feel very, very good.
By the way, another thing is that I don't want to be trapped by a generation of scale.
We just saw GB200, and GB300 is coming soon.
By the time I get to Vera Rubin and Vera Rubin Ultra, the data centers will look very different, because the power per rack and power per row will be so different.
The cooling requirements will be so different.
This means I don't want to just build a lot of gigawatts, and these are only for one generation, one series.
So I think the pace is important, interchangeability and location are important, workload diversity is important, customer diversity is important, and that is the goal we are building towards.
Another thing we learned is that every AI workload not only needs AI accelerators but also a lot of other things.
In fact, a lot of our profit structure will be in those other things.
Therefore, we want to build Azure to be excellent for long-tail workloads because that is the hyperscale business, while knowing we have to be extremely competitive on bare metal for the highest-end training.
But this cannot crowd out the rest of the business because we are not a business that only does five contracts and provides bare metal services for five customers. That is not Microsoft's business.
That might be someone else's business, and that's a good thing.
We are talking about engaging in hyperscale business, and ultimately this is the long-tail business of AI workloads.
To do this, we will have some leading bare metal as a service capabilities for a set of models, including our own.
I think that is the balance you see.
Dylan Patel:
Another question around the whole interchangeability topic.
Okay, a site is not where you want it to be—you'd rather be near a good population center like Atlanta, where we are now. The other question is: as the scope of AI tasks expands, how much does location matter?
A reasoning prompt takes 30 seconds, deep research takes 30 minutes, and at some point a software agent will run for hours or days between human interactions.
Does it matter if it's in location A, B, or C?
Satya Nadella:
That's a great question. It is. In fact, this is also another reason we want to think about what Azure regions look like and what the network between Azure regions is.
As model capabilities evolve and token usage patterns evolve, whether synchronous or asynchronous, you don't want to be in a structurally disadvantaged position.
And on that basis, by the way, what are the data residency laws?
There is the whole EU thing, and we actually have to create an EU data boundary.
This basically means you can't just call back and forth to anywhere, even asynchronously.
So you may need high-density regional builds, and then there are power costs, and so on.
But you are 100% right to point out that the topology we are building will have to evolve.
First, what is the economics of tokens per dollar per watt?
Overlay that with usage patterns: what do the usage patterns look like in synchronous and asynchronous terms? But also, where do compute and storage sit?
Because latency can matter for certain things, storage is best located nearby. If I have a Cosmos DB close by for session data, or even for agentic workloads, it also has to be somewhere nearby, and so on.
All these considerations will shape the hyperscale cloud business.
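To make the "tokens per dollar per watt" framing concrete, here is an illustrative back-of-envelope sketch; every figure and site name in it is an invented assumption, not a Microsoft number.

```python
# Illustrative only: comparing hypothetical sites on Nadella's
# "tokens per dollar per watt" metric. Every number here is invented.

def tokens_per_dollar_per_watt(tokens_per_sec: float,
                               dollars_per_hour: float,
                               power_watts: float) -> float:
    """Token throughput normalized by hourly cost and power draw."""
    tokens_per_hour = tokens_per_sec * 3600
    return tokens_per_hour / dollars_per_hour / power_watts

# Same hardware, different power price and achievable utilization.
sites = {
    "remote_cheap_power": dict(tokens_per_sec=9_000, dollars_per_hour=95.0, power_watts=120_000),
    "metro_expensive_power": dict(tokens_per_sec=10_000, dollars_per_hour=130.0, power_watts=120_000),
}

for name, params in sites.items():
    score = tokens_per_dollar_per_watt(**params)
    print(f"{name}: {score:.4f} tokens per dollar per watt")
```

Overlaying latency requirements (synchronous chat versus asynchronous agents) and data-residency constraints on top of a score like this is, roughly, the topology problem Nadella describes.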
Dylan Patel:
Before we pause, your prediction is that by 2028 you will reach 12-13 gigawatts.
Now we are around 9.5.
But the more relevant thing—I just want you to specify more clearly that this is a business you don’t want to engage in—is that Oracle will grow from one-fifth of your scale to larger than you by the end of 2027.
While this is not Microsoft-level quality of return on invested capital, they still achieve a 35% gross margin.
So the question is, maybe engaging in this is not Microsoft’s business, but by rejecting this business, giving up the right of first refusal, etc., you are creating a hyperscale cloud provider.
Satya Nadella:
First of all, I don’t want to belittle Oracle’s success in building their business, I wish them good luck.
What I think I’m answering for you is that, for us, becoming a managed hosting provider for one model company with a finite-duration RPO (remaining performance obligations) doesn’t make sense.
Let’s put it this way.
What you have to consider is not what you are doing in the next five years, but what you are doing in the next 50 years.
We have made our series of decisions.
I feel very good about our partnership with OpenAI and what we are doing.
We have a nice book of business. We wish them success.
In fact, we are buyers of Oracle’s capacity. We wish them success.
But at this point, I think the industrial logic of what we are trying to do is very clear, and it is not about chasing... By the way, I track those numbers of yours, whether it’s AWS’s or Google’s or ours; I think that’s very useful.
But that doesn’t mean I have to chase those.
What I have to chase is not just the gross margins they may represent over a period of time.
What is this book of business, and which parts of it can Microsoft uniquely capture that make sense for us to capture? That’s what we are going to do.
Dwarkesh Patel:
I have a question, even stepping back from this perspective, I accept your point that if other conditions are the same, obtaining higher profits from long-tail customers is a better business than providing bare metal services for a few labs.
But then there’s a question, which direction is the industry heading?
If we believe we are on the path to increasingly intelligent AI, then why isn’t the shape of the industry such that OpenAI, Anthropic, and DeepMind are the platforms, and long-tail enterprises are actually doing business on them?
They need bare metal, but they are platforms.
What is the long tail of directly using Azure?
Because you want to use the general cognitive core.
Satya Nadella:
But these models will all be available on Azure. So for any workload that says, "Hey, I want to use some open-source models and an OpenAI model," if you go to Azure Foundry today, you have all these models: you configure them, purchase PTUs (provisioned throughput units), get Cosmos DB, get SQL DB, get some storage, get some compute.
This is what real workloads look like.
Real workloads are not just API calls to models.
Real workloads require all these things to build applications or instantiate applications.
In fact, model companies need that to build anything.
It's not just like, "I have a token factory."
I have to have all these things.
This is hyperscale business. And it's not on any one model, but on all these models.
So if you want Grok plus, say, OpenAI plus an open-source model, come to Azure Foundry, configure them, build your application.
Here’s a database. This is what business looks like.
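As an illustration of the "real workload" pattern Nadella describes, here is a minimal, hypothetical sketch: two model deployments called through the OpenAI SDK's AzureOpenAI client, with session state persisted via the azure-cosmos SDK. The endpoints, keys, deployment names, and container names are all invented placeholders.

```python
# Minimal sketch of a multi-model workload plus session state.
# All endpoints, keys, and deployment names below are hypothetical.
from openai import AzureOpenAI
from azure.cosmos import CosmosClient

llm = AzureOpenAI(
    azure_endpoint="https://example-foundry.openai.azure.com",  # placeholder
    api_key="<key>",
    api_version="2024-06-01",
)

def ask(deployment: str, prompt: str) -> str:
    # On Azure, `model` is the name of your provisioned deployment.
    resp = llm.chat.completions.create(
        model=deployment, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

draft = ask("oss-model-deployment", "Draft a summary of this ticket...")  # open-source model
final = ask("openai-model-deployment", f"Refine this draft: {draft}")     # OpenAI model

# Persist session state near the compute, per the latency point above.
cosmos = CosmosClient("https://example-db.documents.azure.com", credential="<key>")
container = cosmos.get_database_client("app").get_container_client("sessions")
container.upsert_item({"id": "session-1", "userId": "u1", "final": final})
```

The point of the sketch is the shape, not the specific services: the model call is one line, and everything around it (state, storage, identity, databases) is where the rest of the workload, and the supporting-services margin, lives.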
There is a separate business called selling raw bare metal services only to model companies.
This is the debate about how much of this business you want to engage in, how much you don't want to engage in, and what that is.
This is a very different segment of the business that we are in, and we also have constraints, limits on how much it will crowd out the rest of the business.
But at least this is how I see it.
Dylan Patel:
Here are two questions. One is, why can't you do both?
The other is, by our estimates of your capacity in 2028, you are short by 3.5 gigawatts.
Of course, you could have dedicated it to OpenAI training and inference capacity, but you could also have dedicated it to actually running Azure, running Microsoft 365, running GitHub Copilot.
You could have just built it and not given it to OpenAI.
Satya Nadella:
Or I might want to build it in different locations.
I might want to build it in the UAE, I might want to build it in India, I might want to build it in Europe.
One thing is, as I said, we are now genuinely facing capacity constraints, and given regulatory requirements and data sovereignty needs, we have to build around the world.
First of all, domestic capacity in the U.S. is very important, and we want to build everything.
But as I look to 2030, I have a global view of Microsoft's business, segmented into first-party and third-party.
Third-party breaks down by frontier labs and how much capacity they want, the inference capacity we want to build for multiple models, and our own research computing needs.
All of this goes into my calculations.
You correctly pointed out the pause, but the pause is not because we said, "Oh my, we don't want to build that."
We realized that what we want to build is slightly different in terms of workload types, geographic types, and timing.
We will continue to increase our gigawatts; the question is at what speed and in what locations.
How do I leverage Moore's Law? That is, do I really want to overbuild 3.5 gigawatts by 2027, or do I want to spread that out over 2027-28, knowing that... one of our biggest lessons with NVIDIA is how much they have accelerated their cadence.
This is an important factor. I don't want to be stuck with four or five years of depreciation on a single generation.
In fact, Jensen's advice to me was two things.
One is to execute at the speed of light.
That's why the execution at this Atlanta data center...
I mean, from the time we take delivery to handing it over to real workloads is 90 days.
That is real speed-of-light execution.
I want to do well in that regard.
And then, this way, I am building into every generation as I expand.
Then across any five years, you have something more balanced.
So it actually turns into a continuous flow of large-scale industrial operations, where you are no longer lopsided: you don't build a huge amount at once and then take a long break because you are stuck with all of it, to your point, in a location that might be good for training but not for inference, because I can't serve from there; even if all of it is asynchronous, Europe won't let me go back and forth to Texas.
So these are all things that need to be considered.
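Nadella's pacing argument can be illustrated with simple arithmetic. The sketch below assumes, purely hypothetically, that each hardware generation roughly doubles tokens per dollar per watt; under that assumption, staggering the same 3.5 GW across generations yields a markedly more efficient fleet than a one-shot overbuild.

```python
# Illustrative only: why pacing builds across hardware generations can beat
# a one-shot overbuild. Assumes (hypothetically) that each generation
# doubles tokens per dollar per watt.

GEN_EFFICIENCY = {2025: 1.0, 2026: 2.0, 2027: 4.0}  # relative tokens/$/W by build year

def fleet_efficiency(build_plan: dict) -> float:
    """Capacity-weighted average efficiency of a fleet,
    given build_plan = {year: gigawatts built that year}."""
    total_gw = sum(build_plan.values())
    return sum(GEN_EFFICIENCY[y] * gw for y, gw in build_plan.items()) / total_gw

one_shot  = {2025: 3.5}                          # everything on one generation
staggered = {2025: 1.0, 2026: 1.0, 2027: 1.5}    # same 3.5 GW, spread out

print(fleet_efficiency(one_shot))    # 1.0
print(fleet_efficiency(staggered))   # ~2.57
```

The staggered plan also avoids the stranded-depreciation problem Nadella raises: no single generation dominates the books for four or five years.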
Dylan Patel:
How do I reconcile this statement with what you've been doing over the past few weeks?
You announced deals with Iris Energy, Nebius, and Lambda Labs, along with some upcoming ones.
You are acquiring capacity there, renting capacity from "new clouds" instead of building it yourself.
Satya Nadella:
That's fine for us because, now that you have a clear understanding of demand, it’s good to serve it wherever people are building.
In fact, we will lease, we will custom build, and we will even adopt GPU as a service in places where we don't have capacity but need capacity while others have capacity.
By the way, I would even welcome every emerging cloud to become part of our marketplace.
Because guess what? If they bring their capacity into our marketplace, customers coming through Azure will use the emerging clouds, which is a huge win for them, and those workloads will use Azure's compute, storage, databases, and everything else.
So I don't think this is "Hey, I should swallow all of that myself."
Self-developed Chips and Collaboration with OpenAI
Dwarkesh Patel:
You mentioned that this depreciating asset accounts for 75% of the total cost of ownership (TCO) of a data center over five to six years, and Jensen Huang makes a 75% margin on that. So all the hyperscale cloud providers are trying to develop their own accelerators so they can reduce this overwhelming equipment cost and improve their margins.
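Taking Dwarkesh's two 75% figures at face value, a quick back-of-envelope (illustrative arithmetic only, not sourced financials) shows why the incentive is so strong:

```python
# Illustrative arithmetic on the two 75% figures cited above.
# If accelerators are ~75% of data-center TCO and the vendor's margin
# on them is ~75%, a large share of total TCO is vendor profit.

tco = 100.0                    # normalize total cost of ownership to 100
accelerator_share = 0.75       # accelerators as a fraction of TCO
vendor_margin = 0.75           # vendor's gross margin on the accelerator

accelerator_cost = tco * accelerator_share        # 75.0
vendor_profit = accelerator_cost * vendor_margin  # 56.25

print(f"Vendor profit embedded in TCO: {vendor_profit / tco:.0%}")  # ~56%
```

Under these assumptions, a custom chip that merely reaches cost parity on performance could, in principle, reclaim part of that roughly 56%, which is the incentive the question describes.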
Dylan Patel:
When you look at where they are now, Google is far ahead of everyone else. They have been doing this the longest.
They will produce about five to seven million of their own TPU chips.
Then you look at Amazon, they are trying to produce three to five million (lifetime shipment).
But when we look at the number of self-developed chips ordered by Microsoft, it is far below that number.
Your project has also been around for the same length of time. What’s going on with your internal chip project?
Satya Nadella:
That's a good question. There are a few points.
First, the biggest competitor to any new accelerator can even be said to be the previous generation of NVIDIA products.
In a cluster, what I want to look at is the overall total cost of ownership (TCO).
The standard I set, even for our own products... By the way, I just looked at the Maia 200 data, and it looks great, but one thing we've learned in computing is...
We used to have a lot of Intel chips, then we brought in AMD, and then we introduced Cobalt (Microsoft's self-developed CPU).
That's how we scaled. We have at least a solid track record in core compute of building our own chips and then managing the coexistence of all three, maintaining some balance across the fleet.
Because by the way, even Google is buying NVIDIA, and so is Amazon.
It makes sense because NVIDIA is innovating, and it is a general-purpose product.
All models run on it, and customer demand is there.
Because if you build your own vertical product, you had better have your own model to train or run inference on; you have to create your own demand for it or subsidize its demand.
Therefore, you need to ensure that you scale it appropriately.
The way we want to do it is to establish a closed loop between our own MAI models and our chips, because I think that gives you the "birthright" to make your own chips: you really design the microarchitecture around what you are doing, and then you keep it in sync with your own models.
In our case, the good news is that OpenAI has a project we can access.
So, the notion that Microsoft won't have some sort of...
Dylan Patel:
How much access do you have to that project?
Satya Nadella:
All of it.
Dylan Patel:
You have direct access to all the intellectual property? So the only intellectual property you don't have is consumer hardware?
Satya Nadella:
That's right.
Dylan Patel:
Oh, okay. Interesting.
Satya Nadella:
By the way, we also gave them a bunch of intellectual property to help them get started. That's one of the reasons for them... Because we built all these supercomputers together.
We built it for them, and they naturally benefit from it.
Now when they innovate, even at the system level, we get all of that.
We want to instantiate what they built first, but then we will scale it.
So, if there's anything, I think about your question as Microsoft wanting to be an excellent, what I call, light-speed execution partner for NVIDIA.
Because frankly, that cluster is life itself.
Obviously, Jensen Huang's margins are doing very well, but total cost of ownership (TCO) has many dimensions, and I want to do well on that TCO.
On top of that, I want to really lean into both the OpenAI lineage and the MAI lineage of system design, because we know we have the intellectual property on both ends.
Dwarkesh Patel:
Speaking of rights, you mentioned a few days ago in an interview that in your new agreement with OpenAI, you have exclusive rights to the stateless API calls made by OpenAI.
We're a bit confused about what happens if there is any state involved.
You just mentioned that all these upcoming complex workloads require memory, databases, storage, and so on.
If ChatGPT stores things in a conversation, then isn't that stateful now?
Satya Nadella:
That's the reason. The strategic decision we made was also to accommodate the flexibility that OpenAI needs for procuring computing resources...
Basically, you can think of OpenAI as having both PaaS (Platform as a Service) and SaaS (Software as a Service) businesses at the same time.
The SaaS business is ChatGPT.
Their PaaS business is their API. That API is exclusive to Azure.
The SaaS business, they can run anywhere.
Dylan Patel:
Can they build SaaS products with anyone they want to partner with?
Satya Nadella:
If they want to find a partner, and that partner wants to use stateless APIs, then Azure is where they can get stateless APIs.
Dylan Patel:
It sounds like they have a way to build products together, and it's something stateful...
Satya Nadella:
No, even then, they must come to Azure. Again, this is done in the spirit of "as part of our partnership, what we value."
We ensure that while providing OpenAI with all the flexibility they need, we are also their good partners.
Dylan Patel:
So for example, say Salesforce wants to integrate OpenAI, not through APIs.
They actually collaborate, train a model together, and then deploy it on, say, Amazon.
Is that allowed, or do they have to use your...
Satya Nadella:
For any customized agreements like that, they must run on Azure...
We have made a few exceptions, like for the U.S. government, etc., but other than that, they must come to Azure.
Explosive Growth in Capital Expenditures
Dwarkesh Patel:
Stepping back: as we walked back and forth through this factory, one thing you mentioned is that Microsoft, which you can think of as a software business, is now really turning into an industrial business.
There is all this capital expenditure, all this construction.
If you just look at the past two years, your capital expenditures have nearly tripled.
Maybe if you project this trend forward, it actually becomes a huge industrial explosion.
Dylan Patel:
Other hyperscale cloud providers are taking out loans. Meta borrowed $20 billion in Louisiana.
They also did corporate loans.
It seems obvious that everyone's free cash flow will go to zero, and I believe that if you dared to do that, Amy (Microsoft's CFO) would teach you a hard lesson. So what exactly is happening?
Satya Nadella:
I think the structural changes you mentioned are significant.
I describe it as we are now both a capital-intensive business and a knowledge-intensive business.
In fact, we must use our knowledge to improve the return on investment (ROIC) of capital expenditures.
Hardware manufacturers have done an excellent job marketing Moore's Law, and I think that's incredible and great.
But if you look at some of the statistics I mentioned during the earnings call, for a given GPT family, the software improvements in token throughput per dollar per watt, whether quarter-over-quarter or year-over-year, are significant.
In some cases, it could be 5 times, 10 times, or even 40 times, just based on how you optimize.
This is the capital efficiency brought by knowledge intensity.
To some extent, this is something we must master.
Someone asked me, what is the difference between a traditional hoster and a hyperscale cloud provider? It's software.
Yes, it is capital-intensive, but as long as you have system knowledge and software capabilities to optimize by workload and by cluster...
This is why we say there is so much software involved in interchangeability.
It's not just about the cluster itself.
It's the ability to evict one workload and then schedule another workload.
Can I manage that scheduling algorithm well? This is the kind of thing we must do at a world-class level.
So yes, I think we will still be a software company, but it's a different business that we will manage.
Ultimately, the cash flow that Microsoft has allows us to fire on both cylinders.
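The eviction-and-scheduling capability Nadella mentions can be sketched in miniature. The toy scheduler below is a hypothetical illustration of priority-based preemption, not a description of Azure's actual scheduler.

```python
# Hypothetical sketch of priority-based preemption: evict lower-priority
# jobs to make room for higher-priority ones. Not Azure's real scheduler.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    priority: int                      # lower value = evicted first
    name: str = field(compare=False)
    gpus: int = field(compare=False)

class Cluster:
    def __init__(self, total_gpus: int):
        self.free = total_gpus
        self.running = []              # min-heap ordered by priority

    def schedule(self, job: Job) -> None:
        # Evict lower-priority jobs until the new job fits (or nothing is left to evict).
        while self.free < job.gpus and self.running and self.running[0].priority < job.priority:
            evicted = heapq.heappop(self.running)
            self.free += evicted.gpus
            print(f"evict {evicted.name}")
        if self.free >= job.gpus:
            self.free -= job.gpus
            heapq.heappush(self.running, job)
            print(f"run {job.name}")
        else:
            print(f"queue {job.name}")

c = Cluster(total_gpus=8)
c.schedule(Job(priority=1, name="batch-inference", gpus=6))
c.schedule(Job(priority=5, name="frontier-training", gpus=8))  # preempts the batch job
```

Doing this well at fleet scale, across workloads, clusters, and generations, is the "knowledge-intensive" software layer Nadella argues separates a hyperscaler from a mere hoster.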
Dwarkesh Patel:
It seems that in the short term, you believe things will take time and will be more bumpy.
But perhaps in the long run, you think those who talk about AGI and ASI (superintelligence) are right.
Sam (Altman) will ultimately be correct.
I have a broader question about what makes sense for a hyperscale cloud service provider, considering you have to invest heavily in something that will depreciate within five years.
So if someone like Sam expects things to happen on a three-year timeline, but you are on a 2040 timeline, what makes sense in that world?
Satya Nadella:
There needs to be a portion of resource allocation for what I call research computing.
This needs to be done like you do R&D. Frankly, this is even the best accounting treatment.
We should view it as R&D expenses, and you should ask, "What is the scale of research computing, and how do you want to expand it?"
We can even say it has an order of magnitude expansion over a certain period. Choose your timeframe, is it two years? Is it 16 months? Whatever.
This is part of it, this is the basic input, it's R&D expenses.
The rest is demand-driven. Ultimately, you are allowed to build ahead of demand, but you'd better have a demand plan that doesn't go completely off the rails.
Dwarkesh Patel:
Do you believe that... these labs are now predicting revenues will reach $100 billion by 2027-28, and that revenues will keep growing at 2x, 3x a year?
Satya Nadella:
In the market, there are various incentives now, and rightly so.
What do you expect an independent lab trying to raise funds to do?
They have to publish some numbers so that they can really raise funds to pay for their computing costs and so on.
This is a good thing. There will always be someone willing to take some risks and invest in it, and they have already shown attractiveness.
It's not as if people are taking all this risk without seeing performance; both OpenAI and Anthropic have shown traction.
So I feel great about everything they are doing, and we have a lot of business dealings with these guys. So it's all good.
But overall, there are ultimately two simple things.
One is that you have to allocate resources for R&D. You mentioned talent.
AI talent comes at a premium. You have to spend money there. You have to spend money on computing.
So in a sense, the ratio of GPUs per researcher has to be high.
That's what it takes to be a leading R&D company in this world.
It requires scaling, and you have to have a balance sheet that allows you to scale it long before it becomes a common consensus.
That's one thing. But the other is entirely about how you forecast demand.
Will the world trust American companies to lead AI?
Dylan Patel:
Globally, the U.S. has dominated many tech stacks.
Through Microsoft, the U.S. has Windows, which is deployed even in China and is the primary operating system there.
Of course, there is open-source Linux, but Windows is ubiquitous on personal computers in China.
Look at Word; it's everywhere too.
Look at all these various technologies; they are deployed all over the world.
Microsoft and other companies are also developing elsewhere.
They are building data centers in Europe, India, and all these other places, in Southeast Asia, Latin America, and Africa.
In all these different places, you are building capacity.
But this seems very different.
Today, there is a political dimension to technology, to computing... The U.S. government didn't care about the internet bubble.
But it seems that the U.S. government, as well as all other governments in the world, are very concerned about AI.
The question is, we are somewhat in a bipolar world, at least between the U.S. and China, but Europe, India, and all other countries are saying, "No, we also want sovereign AI."
How does Microsoft navigate the difference from the 90s—when there was only one significant country in the world, which was the U.S., and our companies sold products worldwide, thus Microsoft reaped huge benefits—to a bipolar world?
In this world, Microsoft cannot take for granted the right to win over all of Europe, India, or Singapore.
In fact, there are efforts towards sovereign AI. What is your thought process on this, and how do you view this issue?
Satya Nadella:
This is an extremely critical part.
I believe that the key, key priority for the American tech industry and the U.S. government is to ensure that we not only lead in innovation but also build trust in our technology stack globally together.
Because I always say, America is an incredible place.
It is unique in history.
It has 4% of the world's population, 25% of GDP, and 50% of market capitalization.
I think you should reflect on these ratios.
The reason that 50% of market capitalization exists, frankly, is because the world trusts America, whether it’s in its capital markets, its technology, or its management of leading industries at any given time.
If that trust is broken, those will not be good days for America.
Starting from that point, I think President Trump, the White House, David Sacks, everyone, I really believe, understands this.
Therefore, I appreciate anything that the U.S. government and the tech industry do together, such as collectively as an industry putting our own capital out into the world to take risks.
I hope the U.S. government can take credit for the foreign direct investment of American companies around the world.
This is the least talked about but should be the best marketing for America, which is that not only does all foreign direct investment flow into America, but the leading industries, namely these AI factories, are being created around the world.
By whom? By America and American companies.
So you start from there, and then you can even build other agreements around it, concerning their continuity, their legal sovereignty concerns about data residency, etc., giving them real autonomy and guarantees in terms of privacy.
In fact, our commitments to Europe are worth reading.
We have made a series of commitments to Europe regarding how we will manage our massive investments there to ensure that the EU and European countries have sovereignty.
We are also building sovereign clouds in France and Germany.
We have something called "Azure Sovereign Services," which actually provides key management services and confidential computing, including confidential computing in GPUs, where we have done great innovative work with NVIDIA.
So I feel very good about being able to build this trust in the American technology stack through technology and policy.
Dwarkesh Patel:
How do you see things evolving with the emergence of continuous learning and network effects at the model level?
Perhaps there is something similar at the level of hyperscale cloud service providers.
Do you expect countries to say, "Look, it’s clear that one or a few models are the best, so we will use them, but we will make some laws that require the weights to be hosted in our country"?
Do you still expect there to be such a push, that it must be a model trained in our country?
Perhaps an analogy is that semiconductors are very important to the economy, and people want to have their own sovereign semiconductors, but TSMC is just better.
And semiconductors are so important to the economy that you would go to Taiwan to buy semiconductors.
You have to do that. Will AI be like this?
Satya Nadella:
Ultimately, what matters is using AI in their economies to create economic value.
This is the diffusion theory; ultimately, what matters is not leading industries, but the ability to leverage leading technologies to create one's own comparative advantages.
So I think this will fundamentally become the core driving force.
That said, they will want this capability to have continuity.
So in a sense, I believe this is why there will always be a counterbalancing force against "Hey, can this model have all the uncontrolled deployments?" This is why open source will always exist.
By definition, there will be multiple models.
This will be one way.
In other words, this is another way people demand continuity and avoid concentration risk.
So you say, "Hey, I want multiple models, and then I want one that is open source."
I think as long as there are these, every country will feel, "Well, I don't have to worry about deploying the best model and widely diffusing it, because I can always transfer my own data and liquidity to another model, whether it's open source or from another country, and so on."
Concentrated risks and sovereignty—true autonomy—these two things will drive market structure.
Dylan Patel:
On this point, there is no such situation in the semiconductor field. All refrigerators and cars use chips made in Taiwan.
Satya Nadella:
It hasn't been the case until now.
Dylan Patel:
Even so, if Taiwan is cut off, there will be no more cars or refrigerators.
TSMC's Arizona factory cannot replace any real proportion of production.
This sovereignty, if you will, is somewhat of a scam. It's worth having, it's important to have, but it's not true sovereignty. We are a global economy.
Satya Nadella:
I think this is a bit like saying that, until now, we had not really grappled with what resilience means and what we need to do about it. Any nation-state, including the United States, will at this point take the measures necessary to become more self-sufficient in certain critical supply chains.
So I, as a multinational company, must see this as a top priority.
If I don't do this, then I am not respecting that country's long-term policy interests. I'm not saying they won't make pragmatic decisions in the short term.
Absolutely, globalization cannot simply be rolled back like that. All this capital investment cannot be replicated at the speed of...
But at the same time, think about it, if someone shows up in Washington and says, "Hey, we don't plan to build any semiconductor factories," they would be kicked out of the United States.
The same thing would happen in every other country.
Therefore, as a company, we must respect the lessons we have learned, whether it was the pandemic that awakened us or something else.
But in any case, people are saying, "Look, globalization is great. It helps supply chains globalize and become super efficient. But there is something called resilience, and we want resilience." So this characteristic will get built.
At what speed, I think, is where your point lies.
You can't just snap your fingers and say all TSMC's factories are now in Arizona, with all their capabilities.
They won't be. But is there a plan? There will be a plan. Should we respect that plan? Absolutely.
So I feel that's how the world is.
I want to adapt to the world itself and what it wants to do in the future, rather than saying, "Hey, we have a viewpoint that doesn't respect yours."
Dwarkesh Patel:
Just to make sure I understand, the idea here is that every country wants some form of data residency, privacy, etc.
And Microsoft has a particular advantage here because you have relationships with these countries, and you have expertise in building such sovereign data centers.
Therefore, Microsoft is particularly suited for a world with more sovereign requirements.
Satya Nadella:
I don't want to describe it as we have some unique privilege. I just want to say, I think this is a business requirement we have been working on for decades, and we plan to continue doing so.
So my answer to Dylan's earlier question is, I take it seriously—whether in the U.S., or when the White House and the U.S. government say, "We want you to allocate more wafer capacity to U.S. fabs"—we take it seriously.
Or whether it's data centers and EU borders, we take it seriously.
So for me, respecting the legitimate reasons countries care about sovereignty and building software and physical facilities for that is what we need to do.
Dylan Patel:
As we move towards a bipolar world—U.S., China—the competition is not just about you versus Amazon, or you versus Anthropic, or you versus Google.
There is a whole bunch of competition. How does the U.S. rebuild trust? What do you do to rebuild trust? To say, "Actually, no, U.S. companies will be your primary providers."
How do you view competition with emerging Chinese companies, whether it's ByteDance and Alibaba, or DeepSeek and Moonshot?
Dwarkesh Patel:
To add to this question, one concern is that we are talking about how AI is turning into an industrial capital expenditure race, and you have to build quickly across all supply chains.
When you hear this, at least so far, you can only think of China. That is their comparative advantage.
Especially if we are not going to leap to ASI next year, but rather need decades of construction and infrastructure, how do you respond to competition from China? Do they have an advantage in that world?
Satya Nadella:
That’s a great question. In fact, you just pointed out why trust in American technology might be the most important characteristic. It may not even be the capability of the model, but rather: “Can I trust your company? Can I trust your country and its institutions to be a long-term supplier?” That might be the key to winning the world.
Dwarkesh Patel:
That’s a great closing remark. Satya, thank you for taking this interview.
Satya Nadella:
Thank you very much.
Dylan Patel:
Thank you.
Satya Nadella:
Awesome. You two are a great team.

