AI Dose Daily
#2

NVIDIA's Agentic Bet: Inside GTC 2026's Biggest Announcements

At GTC 2026, NVIDIA unveiled BlueField-4 STX, the Nemotron Coalition, and restarted H200 shipments to China. We unpack why Jensen Huang is framing agentic AI as an infrastructure problem — and what it means for the compute supply chain.

March 19, 2026·22:20·Episode 2

Transcript

Host

Jensen Huang has a message for the entire AI industry: stop obsessing over models. The real bottleneck? Plumbing. Your billion-dollar GPUs are sitting there *idle* — waiting on storage, waiting on networking, not waiting on smarter algorithms.

Co Host

And at GTC 2026 this week, he put three big bets behind that argument. First — a new storage architecture called BlueField-4 STX that NVIDIA claims delivers five times the token throughput. Five X.

Host

Second — something called the Nemotron Coalition. Eight AI labs, including Mira Murati's new company, joining forces to build open frontier models on NVIDIA's cloud.

Co Host

And third — the geopolitical one — Jensen confirmed NVIDIA is restarting H200 chip manufacturing for China. Purchase orders in hand. Supply chain, quote, "getting fired up."

Host

So three announcements. One thesis. And a question we're going to spend this episode pulling apart — [SFX: RISER]

Host

Is NVIDIA building the next great platform monopoly… by selling pipes?

Host

Selling pipes. That's the question, right? I'm Holden Carter.

Co Host

And I'm Naomi Zhao. Welcome to the show.

Host

So GTC 2026 just wrapped — March 16th through 19th, San Jose — and if you were expecting Jensen Huang to get on stage and talk about who has the smartest model or the best benchmark score, you were watching the wrong conference.

Co Host

Yeah, this was not a model event. This was an infrastructure event. NVIDIA positioned the entire four days around agentic systems, AI factories, and full-stack plumbing. The word "infrastructure" was doing more work than any GPU on that stage.

Host

And that framing is deliberate, which is what we're going to unpack today. Here's where we're headed. First, we're going to dig into Jensen's "five-layer cake" — his argument that AI is an infrastructure stack from energy all the way up to applications, and why that reframe matters strategically.

Co Host

Then we go deep on BlueField-4 STX — the new storage architecture NVIDIA says can deliver five-times throughput gains by rethinking how data moves during inference.

Host

After that, the Nemotron Coalition — eight AI labs, including Mira Murati's new company, teaming up to build open frontier models on NVIDIA's cloud. We'll talk about what that's really about.

Co Host

And then the geopolitical headline — H200 chips shipping to China again, what the new licensing regime actually requires, and what it means for an already strained supply chain.

Host

We'll close with analysis, practical takeaways, and rapid fire.

Co Host

So here's the thread running through all of it. The AI conversation is shifting. It's no longer just "who has the best model."

Host

It's "who controls the infrastructure stack." And after this week in San Jose, NVIDIA is making a very loud argument that the answer should be them. Let's get into it.

Host

Okay, but before we get into the announcements themselves, we need to talk about the setup. Because Jensen didn't just walk on stage at GTC and start dropping product names. He laid the groundwork six days earlier.

Co Host

Yeah, March 10th. He publishes this blog post — "AI Is a 5-Layer Cake" — and it's basically a manifesto.

Host

It really is. And the framework is deceptively simple. He lays out AI as five layers, bottom to top: energy, chips, infrastructure, models, applications. That's it. That's his whole argument. And the kicker is he says we're still early — that this buildout will drive multi-trillion-dollar capital expenditure.

Co Host

Multi-trillion with a T.

Host

With a T. And the thing that jumps out is where he puts the emphasis. Not on models. Not on applications. The three thickest layers of his cake are the bottom three — energy, chips, and infrastructure. He's telling the industry: stop obsessing over who has the best benchmark score and start thinking about plumbing.

Co Host

Which is a very convenient argument for the company that sells the plumbing.

Host

Fair! But here's why it's more than just salesmanship. The core reframe Jensen is making is this: agentic AI is not a better chatbot. These are long-running, tool-using agents that push bottlenecks down the stack. Away from the GPU itself and into networking, storage, memory tiering, and power.

Co Host

Okay, slow down for a second. When you say bottlenecks move down the stack — what actually stalls the GPU? Like, what is the GPU waiting on? [SFX: WOOSH]

Host

Great question. The answer is context. Specifically, something called the KV cache. So when you have a multi-turn agent — it's running for minutes, maybe hours, calling tools, retrieving documents, building up this massive conversational state — all of that context accumulates. And it gets big. Really big. Bigger than the GPU's own high-bandwidth memory can hold.
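[NOTE FOR SHOW NOTES] The "bigger than HBM" claim is easy to sanity-check with back-of-envelope math. The sketch below sizes a KV cache for a hypothetical 70B-class model; every parameter (layer count, KV heads, head dimension, fp16 precision) is an illustrative assumption, not a spec of any model discussed in the episode.

```python
# Back-of-envelope KV-cache sizing for a hypothetical 70B-class model.
# All parameters are illustrative assumptions, not published specs.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    # Factor of 2 covers the key tensor plus the value tensor, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

# Assumed shape: 80 layers, 8 KV heads (grouped-query attention),
# head_dim 128, fp16 (2 bytes per element).
per_token = kv_cache_bytes(1, 80, 8, 128)
ctx = 1_000_000  # a long-running agent session's accumulated context
total_gib = kv_cache_bytes(ctx, 80, 8, 128) / 2**30

print(f"{per_token} bytes/token, {total_gib:.0f} GiB at {ctx:,} tokens")
```

Under these assumptions the cache works out to roughly 320 KiB per token, or about 305 GiB at a million tokens of context, which is well past what a single GPU's high-bandwidth memory can hold.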

Co Host

So the context spills over.

Host

Exactly. It spills from HBM into DRAM, and then into NVMe storage. And here's the problem — the path to get that data back to the GPU runs through the CPU. It's a traditional storage I/O stack. The GPU is literally sitting there, one of the most expensive pieces of silicon on the planet, idling, while data crawls through a legacy pipeline that was never designed for this workload.
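[NOTE FOR SHOW NOTES] The spill path described here can be pictured as a toy three-tier lookup. This is a conceptual sketch only: the tier names and microsecond figures are rough orders of magnitude chosen for illustration, not measurements of any real system or NVIDIA product.

```python
# Toy model of a three-tier KV-cache hierarchy, illustrating why a
# CPU-mediated NVMe hop stalls the GPU. Latencies are assumed rough
# orders of magnitude, not measured numbers.

TIER_LATENCY_US = {"hbm": 1, "dram": 10, "nvme_via_cpu": 1000}

class TieredKVCache:
    def __init__(self):
        self.tiers = {"hbm": {}, "dram": {}, "nvme_via_cpu": {}}

    def put(self, tier, page_id, data):
        self.tiers[tier][page_id] = data

    def fetch(self, page_id):
        # Search hottest tier first; return the data and the latency paid.
        for tier in ("hbm", "dram", "nvme_via_cpu"):
            if page_id in self.tiers[tier]:
                return self.tiers[tier][page_id], TIER_LATENCY_US[tier]
        raise KeyError(page_id)

cache = TieredKVCache()
cache.put("hbm", "page-0", b"hot context")
cache.put("nvme_via_cpu", "page-9", b"cold context")

_, hot_us = cache.fetch("page-0")
_, cold_us = cache.fetch("page-9")
print(f"cold fetch is {cold_us // hot_us}x slower")
```

The point of the toy numbers: every page that has to come back through the slow CPU-mediated tier is time the GPU spends idle, which is the utilization argument the episode keeps returning to.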

Co Host

So it's not that we need faster GPUs. We need faster everything around the GPU.

Host

That's the whole thesis. This is a utilization fight, not a FLOPS fight. Jensen is saying: I can sell you the most powerful GPU in the world, but if your storage path, your network fabric, your memory hierarchy can't keep up, you're wasting half the capability you paid for.

Co Host

And the supply chain side of this is just as constrained, right? It's not like you can just snap your fingers and build these systems.

Host

Not even close. Think about everything that has to come together — high-bandwidth memory from SK Hynix and Samsung, advanced packaging from TSMC, substrates, network interface cards, switches, rack-level power and cooling, and then the systems integration to make it all work as what NVIDIA keeps calling an "AI factory." Every single one of those layers has its own supply constraints. And that's before you layer in geopolitical volatility — which, spoiler, we will absolutely get to.

Co Host

So Jensen publishes this five-layer cake framework on March 10th, basically priming the entire industry to think in infrastructure terms. And then six days later, GTC opens and every announcement maps perfectly onto that frame.

Host

Every single one. BlueField-4 STX targets the storage and networking layers. The Nemotron Coalition targets the models layer to pull demand into the infrastructure layer. And the H200 China restart? That's the chips layer colliding head-on with global trade policy. It's all one coherent argument.

Co Host

It's honestly kind of impressive as a piece of strategic communication, whether you buy the substance or not.

Host

And that's what we need to figure out — how much of this is real engineering insight and how much is platform strategy dressed up as thought leadership. So let's get into the announcements themselves.

Host

So the GPU is sitting there, waiting on storage. That's the bottleneck. And here's what NVIDIA built to fix it.

Co Host

BlueField-4 STX. Unveiled March 16th, opening day of GTC. And Holden, this is not just a new chip. It's a modular reference architecture — a whole blueprint for how storage should work in an agentic AI system.

Host

And it didn't come out of nowhere. NVIDIA actually pre-announced the concept back on January 5th as an "Inference Context Memory Storage Platform." Jensen even said at the time — and I'm quoting here — "AI is revolutionizing the entire computing stack — and now, storage." But the GTC reveal? That was the full STX reference design with real partner commitments behind it.

Co Host

So let's talk about what this thing actually does, because it's clever. The core problem, right, is that your agentic AI system is running long multi-turn conversations. The KV cache — that's the key-value cache, basically the running memory of the conversation — it grows and grows until it blows past what the GPU's high-bandwidth memory can hold. So the system has to offload that context to DRAM or NVMe storage.

Host

And in a traditional setup, that offload goes through the CPU. The data has to pass through a CPU-centric storage path. And while that's happening —

Co Host

The GPU is idle. It's just sitting there burning power, waiting.

Host

Exactly. So STX redesigns that entire data path. BlueField-4, which is NVIDIA's DPU — data processing unit — manages the NVMe storage directly. No CPU in the middle. And it uses RDMA over NVIDIA's Spectrum-X Ethernet fabric to move that KV cache data around. You can share context across nodes. You can do fast page ingestion. The storage and the network become, in NVIDIA's framing, a first-class accelerator domain.

Co Host

Which is a wild thing to say about storage, by the way. But the numbers back it up. NVIDIA and their partners are claiming that STX and these ICMS-class designs can deliver up to roughly five times the token throughput compared to traditional CPU-based storage architectures. Five X. Plus major power efficiency gains.

Host

Five X throughput is the kind of number that makes infrastructure buyers sit up. And NVIDIA is not trying to do this alone. The partner ecosystem they rolled out is massive. On the storage vendor side, you've got DDN, Dell, HPE, IBM, NetApp, VAST Data. Manufacturing partners include AIC, Supermicro, Quanta Cloud Technology. And they're saying partner platforms ship second half of 2026.

Co Host

So within six months, you could be buying systems built on this architecture. That's fast for an infrastructure reference design.

Host

It is fast. And that's the point — NVIDIA wants to set the standard before anyone else defines what agentic storage looks like. [SFX: WOOSH]

Co Host

Okay, so that's the plumbing. Now let's talk about the demand side, because that's where the Nemotron Coalition comes in.

Host

Also announced March 16th at GTC. This is eight AI labs and companies coming together to collaboratively build open frontier models. And they're all doing it on NVIDIA's DGX Cloud, with the outputs feeding into the upcoming Nemotron 4 model family.

Co Host

And the member list is genuinely interesting. You've got Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam — and Thinking Machines Lab, which is Mira Murati's company.

Host

Mira Murati, former CTO of OpenAI. That name on the list got a lot of attention.

Co Host

For good reason. But Holden, let's be clear about the strategic logic here. Because this looks like an open-source initiative, and it is in some ways, but it's also a very deliberate demand-generation lever.

Host

Walk me through that.

Co Host

It's simple. Open frontier models proliferate. They get good enough that more enterprises, more sovereign governments, more startups can build agentic systems on top of them. That drives sustained inference demand. And sustained inference demand means more AI factory buildouts — which means more chips, more networking gear, more storage, more power infrastructure. All of which flows through NVIDIA's stack.

Host

So you give away the models, or at least make them open, and you sell the infrastructure to run them. It's the classic razor-and-blades play, except the razors are frontier AI models and the blades are entire data centers.

Co Host

Pretty much. Jensen's five-layer cake in action — seed the top of the stack, harvest at every layer below it.

Host

Alright. So we've got the plumbing play with BlueField-4 STX. We've got the demand play with the Nemotron Coalition. And then, on day two of GTC — March 17th — Jensen dropped the geopolitical bomb.

Co Host

H200 chips. Shipping to China. Again.

Host

Here's what he said, and I'm quoting from his remarks reported by Axios: "We've been licensed for many customers in China for H200… we have received purchase orders… and we're in the process of restarting our manufacturing." [PAUSE: 2s] [SFX: IMPACT]

Co Host

Restarting manufacturing. Not "exploring options." Not "in discussions." Purchase orders received, supply chain spinning up. That is a concrete, material development with enormous implications for compute supply globally.

Host

And that's exactly what we need to unpack next — because adding China demand back into an already constrained supply chain? That changes the math for everyone.

Host

"Our supply chain is getting fired up." That's what Jensen said. But fired up into what, exactly? Because when you look at the policy mechanics and the competitive dynamics here, this gets really complicated really fast.

Co Host

Yeah, let's start with the China piece because we only scratched the surface. The reason NVIDIA can ship H200s to China at all is a specific policy change. January 13th, 2026 — the Bureau of Industry and Security revised its license review framework. H200-class chips are now considered case by case, not blanket-denied. But there are two hard conditions.

Host

And these conditions have teeth.

Co Host

Real teeth. Number one — independent third-party testing has to be conducted here in the United States before any shipment. Number two — NVIDIA has to demonstrate that these exports will not reduce the capacity available to American customers. So you can sell to China, but not at America's expense, and we're going to verify.

Host

Which means compute is now a regulated resource. Full stop. NVIDIA isn't just managing a supply chain anymore — they're managing a supply chain with a government compliance layer baked into manufacturing decisions, allocation decisions, even forecasting.

Co Host

And here's where it gets tense for the Western cloud customers. You're Microsoft, you're Google, you're Oracle — you've been fighting for every H200 allocation you can get. The pipeline is already constrained by HBM availability, advanced packaging, substrates. And now Jensen is saying, "By the way, we're also restarting a whole manufacturing line for China." You have to wonder — are those hyperscalers making phone calls right now?

Host

They're absolutely making phone calls. Because even if the BIS rule says U.S. supply can't be impacted, the practical reality is that you're adding significant new demand into a pipeline that was already oversubscribed. That creates allocation volatility. It creates uncertainty in delivery timelines. And for companies planning billion-dollar data center buildouts eighteen months in advance, uncertainty is poison. [SFX: IMPACT]

Co Host

Okay, so let's zoom out from China and talk about the bigger strategic picture. Because all three of these announcements — BlueField-4 STX, the Nemotron Coalition, the H200 restart — they point in the same direction.

Host

They do. And I want to put a frame on it. "Agentic AI as infrastructure" is not just a tagline Jensen came up with for a keynote. It is a strategy to control more of the stack. Think about what's happening. With STX, NVIDIA is defining the reference architecture for how storage talks to GPUs during inference. Dell, HPE, NetApp, VAST Data — they're all building to NVIDIA's spec. With the Nemotron Coalition, NVIDIA is funding open model development, which sounds great, sounds democratizing — but all of that training and inference runs on DGX Cloud, on NVIDIA's tooling, on the NIM stack.

Co Host

Right, and this is the tension I keep coming back to. The coalition includes Mistral, Perplexity, Mira Murati's Thinking Machines Lab — these are serious, independent-minded organizations. And yet every one of them is now building on NVIDIA-controlled infrastructure. The models are open. The platform underneath them is not.

Host

So here's where I want us to actually disagree for a second. Because I think there's a real debate here. Is NVIDIA solving a genuine bottleneck, or are they manufacturing lock-in?

Co Host

I'll take the "genuine bottleneck" side. Look — the KV cache problem is real. GPUs stalling while they wait on CPU-bound storage paths is a real performance killer. If you're an enterprise trying to run agentic systems at scale, you need something like STX. The five-x throughput improvement claim, even if you discount it by half, that's still transformative. NVIDIA is solving a problem that actually exists.

Host

I don't disagree that the problem is real. What I'm skeptical about is the solution being exclusively NVIDIA-defined. When you're the one defining the reference architecture, the DPU, the networking fabric, and the model stack — and your partners are building to your blueprints — that's not just solving a bottleneck. That's setting the terms for an entire industry. Every storage vendor, every OEM, every cloud provider is now orbiting NVIDIA's design choices.

Co Host

But isn't that what platforms do? Somebody has to define the interface. Somebody has to make the pieces fit together.

Host

Sure, and historically the company that defines the interface captures the most value. That's the playbook. And look, from the enterprise buyer's perspective — you're going to be judged on cost per task, latency, reliability. You need observability, governance, security offload for these agentic systems. That means you're buying the whole stack. Storage tiers, networking, DPUs — it's not optional spend anymore. It's table stakes.

Co Host

Which is exactly why Jensen framed it as infrastructure in the first place. If AI is a five-layer cake, NVIDIA just told you they want to bake every layer.

Host

And sell you the oven.

Host

Okay, but let's bring this down from the thirty-thousand-foot view. Whether NVIDIA's building lock-in or solving a real problem — what should people actually be doing with this information right now?

Co Host

Yeah, let's get practical. Number one — if you are an infrastructure buyer, circle the second half of 2026 on your calendar. That's when STX-compatible platforms start shipping from Dell, HPE, NetApp, and the rest of that partner list. The KV cache offload architecture NVIDIA just unveiled? That could become table stakes for running agentic inference at scale. So you need to budget for storage-as-accelerator spend — not just GPUs.

Host

And that's a real mindset shift. For years the conversation has been "how many GPUs can I get my hands on." Now it's "can my storage and networking keep those GPUs fed."

Co Host

Exactly. Number two — if you're evaluating open models for your enterprise stack, watch what comes out of the Nemotron Coalition closely. Mistral, Perplexity, LangChain, Mira Murati's Thinking Machines Lab — these are serious players. But go in with eyes open. The models are open, the training infrastructure is NVIDIA's DGX Cloud. Understand what dependency you're signing up for.

Host

And number three — supply chain planning just got harder. If you are in the queue for H200s or next-gen NVIDIA silicon, the China restart means there's a new, significant source of demand competing for the same constrained supply of HBM, advanced packaging, and chips. The BIS rules say U.S. customers can't be shorted, but allocation pressure is real.

Co Host

So talk to your vendors now. Don't wait for Q3 to find out your delivery window slipped.

Host

And honestly, the meta-takeaway here? Start thinking about AI infrastructure the way Jensen wants you to — as a full stack problem. Energy, chips, networking, storage, models, apps. If you're only planning at the GPU layer, you're already behind.

Co Host

The plumbing matters. It's not glamorous, but it's where the bottlenecks live.

Host

Alright, let's burn through some quick hits from the GTC firehose. BlueField-4 STX is claiming five-x token throughput over traditional CPU-based storage — if that holds up in production, storage vendors are about to have a very interesting year.

Co Host

Speaking of vendors, the partner list on STX is stacked — DDN, Dell, HPE, IBM, NetApp, VAST Data all building around it, with platforms shipping second half of this year.

Host

The Nemotron Coalition has eight founding members, and the roster reads like an AI startup all-star team — Mistral, Perplexity, Cursor, LangChain, and yes, Mira Murati's Thinking Machines Lab.

Co Host

Jensen's "five-layer cake" blog dropped March tenth — energy, chips, infrastructure, models, applications — and he's arguing we're still in the early innings of a multi-trillion-dollar buildout.

Host

BIS revised its China export rules back on January thirteenth — case-by-case licensing with mandatory third-party testing on U.S. soil.

Co Host

And manufacturing partners AIC, Supermicro, and Quanta Cloud Technology are already lined up to build STX-compatible hardware. The ecosystem is moving fast.

Host

That is going to do it for us today. Holden Carter here.

Co Host

And Naomi Zhao. Thanks so much for spending your morning with us.

Host

If you got something out of this one, share it with someone who's still thinking about AI as just a model race. The infrastructure story is the story now.

Co Host

We'll have links to everything we referenced — the BIS policy doc, Jensen's five-layer cake blog post, the STX spec sheets — all in the show notes.

Host

We'll be back tomorrow. Until then, have a great one.
