NVIDIA's Agentic Bet: Inside GTC 2026's Biggest Announcements
At GTC 2026, NVIDIA unveiled BlueField-4 STX and the Nemotron Coalition, and confirmed it is restarting H200 manufacturing for China. We unpack why Jensen Huang is framing agentic AI as an infrastructure problem — and what it means for the compute supply chain.
Transcript
Jensen Huang has a message for the entire AI industry: stop obsessing over models. The real bottleneck? Plumbing. Your billion-dollar GPUs are sitting there *idle* — waiting on storage, waiting on networking, not waiting on smarter algorithms.
And at GTC 2026 this week, he put three big bets behind that argument. First — a new storage architecture called BlueField-4 STX that NVIDIA claims delivers five times the token throughput. Five X.
Second — something called the Nemotron Coalition. Eight AI labs, including Mira Murati's new company, joining forces to build open frontier models on NVIDIA's cloud.
And third — the geopolitical one — Jensen confirmed NVIDIA is restarting H200 chip manufacturing for China. Purchase orders in hand. Supply chain, quote, "getting fired up."
So three announcements. One thesis. And a question we're going to spend this episode pulling apart — [SFX: RISER]
Is NVIDIA building the next great platform monopoly… by selling pipes?
Selling pipes. That's the question, right? I'm Holden Carter.
And I'm Naomi Zhao. Welcome to the show.
So GTC 2026 just wrapped — March 16th through 19th, San Jose — and if you were expecting Jensen Huang to get on stage and talk about who has the smartest model or the best benchmark score, you were watching the wrong conference.
Yeah, this was not a model event. This was an infrastructure event. NVIDIA positioned the entire four days around agentic systems, AI factories, and full-stack plumbing. The word "infrastructure" was doing more work than any GPU on that stage.
And that framing is deliberate, which is what we're going to unpack today. Here's where we're headed. First, we're going to dig into Jensen's "five-layer cake" — his argument that AI is an infrastructure stack from energy all the way up to applications, and why that reframe matters strategically.
Then we go deep on BlueField-4 STX — the new storage architecture NVIDIA says can deliver a five-x gain in token throughput by rethinking how data moves during inference.
After that, the Nemotron Coalition — eight AI labs, including Mira Murati's new company, teaming up to build open frontier models on NVIDIA's cloud. We'll talk about what that's really about.
And then the geopolitical headline — H200 chips shipping to China again, what the new licensing regime actually requires, and what it means for an already strained supply chain.
We'll close with analysis, practical takeaways, and rapid fire.
So here's the thread running through all of it. The AI conversation is shifting. It's no longer just "who has the best model."
It's "who controls the infrastructure stack." And after this week in San Jose, NVIDIA is making a very loud argument that the answer should be them. Let's get into it.
Okay, but before we get into the announcements themselves, we need to talk about the setup. Because Jensen didn't just walk on stage at GTC and start dropping product names. He laid the groundwork six days earlier.
Yeah, March 10th. He publishes this blog post — "AI Is a 5-Layer Cake" — and it's basically a manifesto.
It really is. And the framework is deceptively simple. He lays out AI as five layers, bottom to top: energy, chips, infrastructure, models, applications. That's it. That's his whole argument. And the kicker is he says we're still early — that this buildout will drive multi-trillion-dollar capital expenditure.
Multi-trillion with a T.
With a T. And the thing that jumps out is where he puts the emphasis. Not on models. Not on applications. The three thickest layers of his cake are the bottom three — energy, chips, and infrastructure. He's telling the industry: stop obsessing over who has the best benchmark score and start thinking about plumbing.
Which is a very convenient argument for the company that sells the plumbing.
Fair! But here's why it's more than just salesmanship. The core reframe Jensen is making is this: agentic AI is not a better chatbot. These are long-running, tool-using agents that push bottlenecks down the stack. Away from the GPU itself and into networking, storage, memory tiering, and power.
Okay, slow down for a second. When you say bottlenecks move down the stack — what actually stalls the GPU? Like, what is the GPU waiting on? [SFX: WOOSH]
Great question. The answer is context. Specifically, something called the KV cache. So when you have a multi-turn agent — it's running for minutes, maybe hours, calling tools, retrieving documents, building up this massive conversational state — all of that context accumulates. And it gets big. Really big. Bigger than the GPU's own high-bandwidth memory can hold.
So the context spills over.
Exactly. It spills from HBM into DRAM, and then into NVMe storage. And here's the problem — the path to get that data back to the GPU runs through the CPU. It's a traditional storage I/O stack. The GPU is literally sitting there, one of the most expensive pieces of silicon on the planet, idling, while data crawls through a legacy pipeline that was never designed for this workload.
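[Editor's note: a quick back-of-envelope sketch of the sizing Naomi is describing. The model dimensions below are our own illustrative assumptions, not figures from GTC.]

```python
# Back-of-envelope KV-cache sizing for a long-running agent session.
# Model dimensions are illustrative assumptions, not any model's real specs.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes of key + value state for one sequence (fp16/bf16 = 2 bytes/elem)."""
    # Two tensors (K and V) per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# A hypothetical 70B-class model using grouped-query attention.
per_token = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=1)
print(f"KV cache per token: {per_token / 1024:.0f} KiB")   # -> 320 KiB

# An agent that accumulates a 1M-token working context over hours of tool use.
session = kv_cache_bytes(80, 8, 128, seq_len=1_000_000)
print(f"KV cache per session: {session / 1e9:.0f} GB")     # -> ~328 GB

# One session alone exceeds the HBM on a single GPU, and a server runs many
# sessions at once, hence the spill to DRAM and then NVMe.
```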
So it's not that we need faster GPUs. We need faster everything around the GPU.
That's the whole thesis. This is a utilization fight, not a FLOPS fight. Jensen is saying: I can sell you the most powerful GPU in the world, but if your storage path, your network fabric, your memory hierarchy can't keep up, you're wasting half the capability you paid for.
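[Editor's note: the utilization point in one bit of arithmetic. The busy fractions here are assumptions for intuition, not NVIDIA's published methodology.]

```python
# Pure utilization arithmetic: how a storage fix alone can read as a "5x"
# token-throughput gain, with no faster silicon involved.

peak_tokens_per_s = 10_000  # what the GPU could emit if never starved (assumed)

def effective_throughput(peak, busy_fraction):
    # The GPU only produces tokens while it is actually computing.
    return peak * busy_fraction

before = effective_throughput(peak_tokens_per_s, 0.18)  # mostly stalled on storage I/O
after  = effective_throughput(peak_tokens_per_s, 0.90)  # storage path no longer the bottleneck

print(f"before: {before:,.0f} tok/s  after: {after:,.0f} tok/s  gain: {after / before:.1f}x")
```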
And the supply chain side of this is just as constrained, right? It's not like you can just snap your fingers and build these systems.
Not even close. Think about everything that has to come together — high-bandwidth memory from SK Hynix and Samsung, advanced packaging from TSMC, substrates, network interface cards, switches, rack-level power and cooling, and then the systems integration to make it all work as what NVIDIA keeps calling an "AI factory." Every single one of those layers has its own supply constraints. And that's before you layer in geopolitical volatility — which, spoiler, we will absolutely get to.
So Jensen publishes this five-layer cake framework on March 10th, basically priming the entire industry to think in infrastructure terms. And then six days later, GTC opens and every announcement maps perfectly onto that frame.
Every single one. BlueField-4 STX targets the storage and networking layers. The Nemotron Coalition targets the models layer to pull demand into the infrastructure layer. And the H200 China restart? That's the chips layer colliding head-on with global trade policy. It's all one coherent argument.
It's honestly kind of impressive as a piece of strategic communication, whether you buy the substance or not.
And that's what we need to figure out — how much of this is real engineering insight and how much is platform strategy dressed up as thought leadership. So let's get into the announcements themselves.
So the GPU is sitting there, waiting on storage. That's the bottleneck. And here's what NVIDIA built to fix it.
BlueField-4 STX. Unveiled March 16th, opening day of GTC. And Holden, this is not just a new chip. It's a modular reference architecture — a whole blueprint for how storage should work in an agentic AI system.
And it didn't come out of nowhere. NVIDIA actually pre-announced the concept back on January 5th as an "Inference Context Memory Storage Platform" — ICMS for short. Jensen even said at the time — and I'm quoting here — "AI is revolutionizing the entire computing stack — and now, storage." But the GTC reveal? That was the full STX reference design with real partner commitments behind it.
So let's talk about what this thing actually does, because it's clever. The core problem, right, is that your agentic AI system is running long multi-turn conversations. The KV cache — that's the key-value cache, basically the running memory of the conversation — it grows and grows until it blows past what the GPU's high-bandwidth memory can hold. So the system has to offload that context to DRAM or NVMe storage.
And in a traditional setup, that offload goes through the CPU. The data has to pass through a CPU-centric storage path. And while that's happening —
The GPU is idle. It's just sitting there burning power, waiting.
Exactly. So STX redesigns that entire data path. BlueField-4 — NVIDIA's DPU, or data processing unit — manages the NVMe storage directly. No CPU in the middle. And it uses RDMA over NVIDIA's Spectrum-X Ethernet fabric to move that KV cache data around. You can share context across nodes. You can do fast page ingestion. The storage and the network become, in NVIDIA's framing, a first-class accelerator domain.
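[Editor's note: a rough model of the two data paths under discussion. The bandwidth figures are coarse assumptions for intuition, not measured BlueField-4 numbers.]

```python
# Comparing the legacy path (NVMe -> CPU DRAM -> PCIe -> GPU) with a
# DPU-managed path that pushes spilled context over RDMA into GPU memory.

CPU_PATH_GB_S  = 25.0   # CPU-mediated storage stack, end to end (assumed)
RDMA_PATH_GB_S = 100.0  # DPU/RDMA path over the Ethernet fabric (assumed)

def stall_ms(kv_bytes, path_gb_s):
    """How long the GPU idles while a spilled KV-cache slab is fetched back."""
    return kv_bytes / (path_gb_s * 1e9) * 1e3

slab = 8e9  # an 8 GB chunk of spilled context for a resumed agent session

print(f"CPU-mediated refill: {stall_ms(slab, CPU_PATH_GB_S):.0f} ms of GPU idle")
print(f"DPU/RDMA refill:     {stall_ms(slab, RDMA_PATH_GB_S):.0f} ms of GPU idle")
```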
Which is a wild thing to say about storage, by the way. But the numbers back it up. NVIDIA and their partners are claiming that STX and these ICMS-class designs can deliver up to roughly five times the token throughput compared to traditional CPU-based storage architectures. Five X. Plus major power efficiency gains.
Five X throughput is the kind of number that makes infrastructure buyers sit up. And NVIDIA is not trying to do this alone. The partner ecosystem they rolled out is massive. On the storage vendor side, you've got DDN, Dell, HPE, IBM, NetApp, VAST Data. Manufacturing partners include AIC, Supermicro, Quanta Cloud Technology. And they're saying partner platforms ship second half of 2026.
So within six months, you could be buying systems built on this architecture. That's fast for an infrastructure reference design.
It is fast. And that's the point — NVIDIA wants to set the standard before anyone else defines what agentic storage looks like. [SFX: WOOSH]
Okay, so that's the plumbing. Now let's talk about the demand side, because that's where the Nemotron Coalition comes in.
Also announced March 16th at GTC. This is eight AI labs and companies coming together to collaboratively build open frontier models. And they're all doing it on NVIDIA's DGX Cloud, with the outputs feeding into the upcoming Nemotron 4 model family.
And the member list is genuinely interesting. You've got Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam — and Thinking Machines Lab, which is Mira Murati's company.
Mira Murati, former CTO of OpenAI. That name on the list got a lot of attention.
For good reason. But Holden, let's be clear about the strategic logic here. Because this looks like an open-source initiative, and it is in some ways, but it's also a very deliberate demand-generation lever.
Walk me through that.
It's simple. Open frontier models proliferate. They get good enough that more enterprises, more sovereign governments, more startups can build agentic systems on top of them. That drives sustained inference demand. And sustained inference demand means more AI factory buildouts — which means more chips, more networking gear, more storage, more power infrastructure. All of which flows through NVIDIA's stack.
So you give away the models, or at least make them open, and you sell the infrastructure to run them. It's the classic razor-and-blades play, except the razors are frontier AI models and the blades are entire data centers.
Pretty much. Jensen's five-layer cake in action — seed the top of the stack, harvest at every layer below it.
Alright. So we've got the plumbing play with BlueField-4 STX. We've got the demand play with the Nemotron Coalition. And then, on day two of GTC — March 17th — Jensen dropped the geopolitical bomb.
H200 chips. Shipping to China. Again.
Here's what he said, and I'm quoting from his remarks reported by Axios: "We've been licensed for many customers in China for H200… we have received purchase orders… and we're in the process of restarting our manufacturing." [PAUSE: 2s] [SFX: IMPACT]
Restarting manufacturing. Not "exploring options." Not "in discussions." Purchase orders received, supply chain spinning up. That is a concrete, material development with enormous implications for compute supply globally.
And that's exactly what we need to unpack next — because adding China demand back into an already constrained supply chain? That changes the math for everyone.
"Our supply chain is getting fired up." That's what Jensen said. But fired up into what, exactly? Because when you look at the policy mechanics and the competitive dynamics here, this gets really complicated really fast.
Yeah, let's start with the China piece because we only scratched the surface. The reason NVIDIA can ship H200s to China at all is a specific policy change. January 13th, 2026 — the Bureau of Industry and Security revised its license review framework. H200-class chips are now considered case by case, not blanket-denied. But there are two hard conditions.
And these conditions have teeth.
Real teeth. Number one — independent third-party testing has to be conducted here in the United States before any shipment. Number two — NVIDIA has to demonstrate that these exports will not reduce the capacity available to American customers. So you can sell to China, but not at America's expense, and we're going to verify.
Which means compute is now a regulated resource. Full stop. NVIDIA isn't just managing a supply chain anymore — they're managing a supply chain with a government compliance layer baked into manufacturing decisions, allocation decisions, even forecasting.
And here's where it gets tense for the Western cloud customers. You're Microsoft, you're Google, you're Oracle — you've been fighting for every H200 allocation you can get. The pipeline is already constrained by HBM availability, advanced packaging, substrates. And now Jensen is saying, "By the way, we're also restarting a whole manufacturing line for China." You have to wonder — are those hyperscalers making phone calls right now?
They're absolutely making phone calls. Because even if the BIS rule says U.S. supply can't be impacted, the practical reality is that you're adding significant new demand into a pipeline that was already oversubscribed. That creates allocation volatility. It creates uncertainty in delivery timelines. And for companies planning billion-dollar data center buildouts eighteen months in advance, uncertainty is poison. [SFX: IMPACT]
Okay, so let's zoom out from China and talk about the bigger strategic picture. Because all three of these announcements — BlueField-4 STX, the Nemotron Coalition, the H200 restart — they point in the same direction.
They do. And I want to put a frame on it. "Agentic AI as infrastructure" is not just a tagline Jensen came up with for a keynote. It is a strategy to control more of the stack. Think about what's happening. With STX, NVIDIA is defining the reference architecture for how storage talks to GPUs during inference. Dell, HPE, NetApp, VAST Data — they're all building to NVIDIA's spec. With the Nemotron Coalition, NVIDIA is funding open model development, which sounds great, sounds democratizing — but all of that training and inference runs on DGX Cloud, on NVIDIA's tooling, on the NIM stack.
Right, and this is the tension I keep coming back to. The coalition includes Mistral, Perplexity, Mira Murati's Thinking Machines Lab — these are serious, independent-minded organizations. And yet every one of them is now building on NVIDIA-controlled infrastructure. The models are open. The platform underneath them is not.
So here's where I want us to actually disagree for a second. Because I think there's a real debate here. Is NVIDIA solving a genuine bottleneck, or are they manufacturing lock-in?
I'll take the "genuine bottleneck" side. Look — the KV cache problem is real. GPUs stalling while they wait on CPU-bound storage paths is a real performance killer. If you're an enterprise trying to run agentic systems at scale, you need something like STX. The five-x throughput improvement claim, even if you discount it by half, that's still transformative. NVIDIA is solving a problem that actually exists.
I don't disagree that the problem is real. What I'm skeptical about is the solution being exclusively NVIDIA-defined. When you're the one defining the reference architecture, the DPU, the networking fabric, and the model stack — and your partners are building to your blueprints — that's not just solving a bottleneck. That's setting the terms for an entire industry. Every storage vendor, every OEM, every cloud provider is now orbiting NVIDIA's design choices.
But isn't that what platforms do? Somebody has to define the interface. Somebody has to make the pieces fit together.
Sure, and historically the company that defines the interface captures the most value. That's the playbook. And look, from the enterprise buyer's perspective — you're going to be judged on cost per task, latency, reliability. You need observability, governance, security offload for these agentic systems. That means you're buying the whole stack. Storage tiers, networking, DPUs — it's not optional spend anymore. It's table stakes.
Which is exactly why Jensen framed it as infrastructure in the first place. If AI is a five-layer cake, NVIDIA just told you they want to bake every layer.
And sell you the oven.
Okay, but let's bring this down from the thirty-thousand-foot view. Whether NVIDIA's building lock-in or solving a real problem — what should people actually be doing with this information right now?
Yeah, let's get practical. Number one — if you are an infrastructure buyer, circle the second half of 2026 on your calendar. That's when STX-compatible platforms start shipping from Dell, HPE, NetApp, and the rest of that partner list. The KV cache offload architecture NVIDIA just unveiled? That could become table stakes for running agentic inference at scale. So you need to budget for storage-as-accelerator spend — not just GPUs.
And that's a real mindset shift. For years the conversation has been "how many GPUs can I get my hands on." Now it's "can my storage and networking keep those GPUs fed."
Exactly. Number two — if you're evaluating open models for your enterprise stack, watch what comes out of the Nemotron Coalition closely. Mistral, Perplexity, LangChain, Mira Murati's Thinking Machines Lab — these are serious players. But go in with eyes open. The models are open, the training infrastructure is NVIDIA's DGX Cloud. Understand what dependency you're signing up for.
And number three — supply chain planning just got harder. If you are in the queue for H200s or next-gen NVIDIA silicon, the China restart means there's a new, significant source of demand competing for the same constrained supply of HBM, advanced packaging, and chips. The BIS rules say U.S. customers can't be shorted, but allocation pressure is real.
So talk to your vendors now. Don't wait for Q3 to find out your delivery window slipped.
And honestly, the meta-takeaway here? Start thinking about AI infrastructure the way Jensen wants you to — as a full stack problem. Energy, chips, networking, storage, models, apps. If you're only planning at the GPU layer, you're already behind.
The plumbing matters. It's not glamorous, but it's where the bottlenecks live.
Alright, let's burn through some quick hits from the GTC firehose. BlueField-4 STX is claiming five-x token throughput over traditional CPU-based storage — if that holds up in production, storage vendors are about to have a very interesting year.
Speaking of vendors, the partner list on STX is stacked — DDN, Dell, HPE, IBM, NetApp, VAST Data all building around it, with platforms shipping second half of this year.
The Nemotron Coalition has eight founding members, and the roster reads like an AI startup all-star team — Mistral, Perplexity, Cursor, LangChain, and yes, Mira Murati's Thinking Machines Lab.
Jensen's "five-layer cake" blog dropped March tenth — energy, chips, infrastructure, models, applications — and he's arguing we're still in the early innings of a multi-trillion-dollar buildout.
BIS revised its China export rules back on January thirteenth — case-by-case licensing with mandatory third-party testing on U.S. soil.
And manufacturing partners AIC, Supermicro, and Quanta Cloud Technology are already lined up to build STX-compatible hardware. The ecosystem is moving fast.
That is going to do it for us today. Holden Carter here.
And Naomi Zhao. Thanks so much for spending your morning with us.
If you got something out of this one, share it with someone who's still thinking about AI as just a model race. The infrastructure story is the story now.
We'll have links to everything we referenced — the BIS policy doc, Jensen's five-layer cake blog post, the STX spec sheets — all in the show notes.
We'll be back tomorrow. Until then, have a great one.
