The Inference Shift
Market
C-suite read on where AI economic value will land — and therefore where multi-year compute and platform commits should be placed.
Trend
Thompson splits "answer inference" (human-in-the-loop, latency-sensitive) from "agentic inference" (no human, runs for minutes to hours per task) and argues the agentic side will dwarf the answer side in total revenue. The corollary: companies that own the model-plus-harness stack, explicitly Anthropic and OpenAI, are positioned to be materially more profitable than the prevailing API-margin narrative implies, and the heterogeneous compute landscape (Cerebras, Groq, TPU/Trainium) gets re-rated as agent workloads dominate.
Tech Highlight
The actionable CTO primitive is a portfolio split inside the AI budget: answer-inference SLAs (interactive copilots) and agentic-inference SLAs (background workers) should be procured against different price curves, different compute substrates, and different vendor lock-in tolerances. Treating them as one line item underprices the agentic budget and overprices the answer budget.
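One way to see the mispricing is a small arithmetic sketch in Python. Everything in it is an assumption made for illustration: the `Workload` type, the task volumes, the tokens per task, and the per-token prices are hypothetical and do not come from Thompson or any vendor price list. The sketch budgets both workloads at a single blended cost per task, as a one-line-item budget implicitly would, and compares that with each workload's own cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """One inference workload. All figures are hypothetical planning inputs."""
    name: str
    tasks_per_month: int
    tokens_per_task: int
    usd_per_million_tokens: float

    @property
    def cost_per_task(self) -> float:
        return self.tokens_per_task / 1_000_000 * self.usd_per_million_tokens

    @property
    def monthly_cost(self) -> float:
        return self.tasks_per_month * self.cost_per_task


def blended_cost_per_task(workloads: list[Workload]) -> float:
    """Average cost per task when every workload is pooled into one line item."""
    total_cost = sum(w.monthly_cost for w in workloads)
    total_tasks = sum(w.tasks_per_month for w in workloads)
    return total_cost / total_tasks


def budget_gap(workloads: list[Workload]) -> dict[str, float]:
    """Blended-rate budget minus actual cost, per workload.

    Positive means the workload is over-budgeted; negative means under-budgeted.
    """
    blended = blended_cost_per_task(workloads)
    return {
        w.name: w.tasks_per_month * blended - w.monthly_cost
        for w in workloads
    }


# Hypothetical inputs: many short interactive requests vs. few long agent runs.
answer = Workload("answer", tasks_per_month=1_000_000,
                  tokens_per_task=2_000, usd_per_million_tokens=10.0)
agentic = Workload("agentic", tasks_per_month=10_000,
                   tokens_per_task=2_000_000, usd_per_million_tokens=4.0)

gaps = budget_gap([answer, agentic])
for name, gap in gaps.items():
    label = "over-budgeted" if gap > 0 else "under-budgeted"
    print(f"{name}: {label} by ${abs(gap):,.0f} per month")
```

Under these assumed inputs the agentic line comes out under-budgeted and the answer line over-budgeted by the same amount, even though the agentic per-token price is the lower of the two. The driver is tokens per task, not the rate card, which is why the two workloads are better tracked as separate line items with their own price curves.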
6-Month Outlook
Watch for the first hyperscaler or frontier lab to publish an explicit "agentic inference" SKU separate from chat APIs, and for at least one custom-silicon vendor to disclose a multi-billion-dollar agentic backlog. Confirming signal: an enterprise IT shop publicly disclosing distinct unit economics for interactive vs. background AI workloads.