The edition where agents stopped asking for permission
The week the inference economy broke free from hype, agents began provisioning infrastructure on their own — and the defender's window shrank to months.
Transcript
Zero human clicks.
That's all an agent needs now to open a Cloudflare account, buy a domain, and spin up an application in production. And it's Stripe that put its name on the protocol.
This is the ai|expert Edition. The week the inference economy broke free from hype, agents stopped asking permission — and the window for the defender became measurable in months. Q1 2026 closed with a verdict the market took less than 48 hours to price in. The question analysts had been asking for eighteen months — will AI capex convert to revenue, or become the biggest destruction of capital in the history of technology? — got a provisional but unequivocal answer: whoever produced billable tokens got rewarded. Whoever produced narrative did not.
Google Cloud and AWS numbers are the empirical foundation. Google Cloud: $20 billion in quarterly revenue, 63% growth — the highest rate among all major providers — with an annualized run rate above $80 billion. AWS: $37.6 billion in the quarter, 28% growth, the fastest in fifteen quarters, with an annualized run rate of $150 billion. AWS's acceleration from low double-digit growth to 28% indicates that infrastructure capacity is being filled by paying workloads — not speculative reserves.
Azure grew 40%, above the 36% consensus, with an annualized run rate between $90 and $95 billion. But the market didn't give full credit. The reason is structural: analysts can't separate OpenAI's demand from organic enterprise customer demand. It's a metric that looks strong — with enough opacity for capital to apply a governance discount. And the most revealing contrast is Meta. The company committed between $125 and $145 billion in data center capex — and operates no cloud business to monetize that build-out externally. Result: the stock fell 9.8% in the week. Alphabet rose 12%. That spread of almost 22 percentage points in a single week is the market saying, quite clearly, what it values.
Apple is the outlier worth studying. $13 billion in capex — a fraction of its peers' — and the stock gained 3.4% for the week. The model is ingenious: Apple rents inference from Google Gemini via a commercial agreement subsidized by Google's search placement payments in the Apple ecosystem, generating services revenue growing 16% at 77% gross margin on a base of 2.5 billion installed devices. It works — until it attracts regulatory scrutiny or Google recalibrates the terms.
What matters to me in this reading isn't the ranking of who grew more. It's the framework that the earnings cycle established: AI capex gets a multiple when it comes with visible monetization. Spending locked in loops of internal training, proprietary recommendation, or opaque partnerships is treated as cost — not investment with a priceable return. This changes how any tech board presents AI budget to the market from now on.

AMD closes the argument from the chip side. Total revenue of $10.25 billion in Q1 — above the $9.89 billion expected by consensus. Data center revenue: $5.8 billion, up 57% year-over-year from $3.67 billion in the same period of 2025. Adjusted EPS of $1.37 versus $1.29 estimated. Net income nearly doubled: $1.38 billion versus $709 million a year ago.
Lisa Su was direct: the data center segment is now the "primary driver of revenue and profit growth" at AMD. The stock rose 16% the next day. And Q2 guidance — $11.2 billion versus a $10.52 billion consensus — signals something more important than a strong quarter: procurement teams at hyperscalers are locked into multi-year chip commitments. Demand is not short-cycle. AMD's Helios system — a direct competitor to NVIDIA's Grace Blackwell and Vera Rubin platforms, which cost above $3 million per rack — starts delivering in H2 2026. OpenAI and Meta already have commitments locked. Meta's deal with AMD covers up to six gigawatts of GPU capacity for AI data centers across multiple years. That level of demand lock-in gives AMD visibility for years and reduces execution risk on the Helios ramp.
But there's a real ceiling here, and it's worth naming. The chip industry is navigating a global memory shortage, advanced packaging bottlenecks, and supply chain disruptions tied to the Iran conflict. Lisa Su used the phrase "scaling supply to meet demand" — which in earnings language means the near-term growth ceiling is manufacturing capacity, not customer appetite. NVIDIA reports on May 20 and will close the loop on whether silicon suppliers sustaining all these balance sheets are maintaining margin discipline. Now, as hyperscalers stack hundreds of billions in capex to run inference in the cloud, Lenovo published a TCO study that puts an uncomfortable argument on the table for them — and a very concrete one for any CTO going into a budget meeting.
The central number: $2.00 per million tokens in the cloud versus $0.11 on-premises under continuous load — an eighteenfold differential. For large models, the study points to $4.74 per million on owned infrastructure versus $29.09 in equivalent cloud — an 84% savings. The five-year model includes hardware acquisition, power, operations, and maintenance. And the break-even point: less than four months. Less than four months. Inside a single budget cycle. That transforms on-prem capex from a debate about multi-year depreciation into an ROI conversation in the same fiscal year.
The mechanism is utilization. GenAI applications in production run continuously. Cloud billing is linear per token — the millionth token costs the same as the first, no matter the volume. On-premises amortizes a fixed capital cost over growing token volume, collapsing unit cost over time. Newer GPU generations compound the advantage: performance per watt improves in owned hardware, while cloud providers pass their infrastructure costs through to customers. The study is from Lenovo, which sells servers. The commercial incentive is direct, and the report wasn't independently audited. The modeled scenarios — continuous inference at scale — naturally favor the infrastructure Lenovo sells. Smaller workloads, highly variable demand, or companies without specialized GPU operations staff will see a different break-even curve.
I agree with the caveat. But what the study delivers is a documented methodology: cost per token, a five-year TCO horizon, break-even keyed to utilization. Any team already measuring token throughput in production can plug in its own numbers and verify the conclusions in days — the sketch below shows the shape of that model. The value isn't in Lenovo's conclusion — it's in the question it forces on the CTO: do you know your cost per million tokens today? Most teams don't. The playbook emerging from these three sources is a two-tier framework: cloud for prototyping, fine-tuning, and workloads with variable or low-frequency demand; dedicated hardware when the workload crosses into continuous production — with break-even below four months as the quantitative trigger for the repatriation decision. The market is already pricing this on both sides: in the premium it gave Google Cloud and AWS, and in AMD's 57% growth driven by inference.
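A minimal version of that plug-in-your-numbers exercise, in Python. Every input is an illustrative assumption: the cloud price comes from the study's headline, while the capex, opex, and throughput figures are placeholders to replace with your own quotes and measurements.

```python
# Break-even sketch: linear cloud billing vs. amortized on-prem capex.
# All inputs are illustrative assumptions; replace with measured values.

CLOUD_PRICE_PER_M_TOK = 2.00      # $/1M tokens (the study's cloud figure)
ONPREM_CAPEX = 350_000.0          # hardware acquisition, $ (assumed)
ONPREM_OPEX_PER_MONTH = 5_000.0   # power + ops + maintenance, $/month (assumed)
TOKENS_PER_MONTH = 60e9           # sustained production throughput (assumed)

def monthly_cloud_cost(tokens: float) -> float:
    """Cloud billing is linear in tokens: no amortization as volume grows."""
    return tokens / 1e6 * CLOUD_PRICE_PER_M_TOK

def breakeven_months() -> float:
    """Months until cumulative cloud spend exceeds capex plus cumulative opex."""
    monthly_saving = monthly_cloud_cost(TOKENS_PER_MONTH) - ONPREM_OPEX_PER_MONTH
    return ONPREM_CAPEX / monthly_saving

onprem_unit = (ONPREM_CAPEX / 60 + ONPREM_OPEX_PER_MONTH) / (TOKENS_PER_MONTH / 1e6)
print(f"cloud spend/month:  ${monthly_cloud_cost(TOKENS_PER_MONTH):,.0f}")
print(f"break-even:         {breakeven_months():.1f} months")
print(f"on-prem $/M tokens: ${onprem_unit:.2f} (5-year amortization)")
```

With these placeholder numbers the break-even lands around three months, consistent with the study's sub-four-month claim. Cut the throughput by an order of magnitude and the same formula pushes break-even out past the point where the cloud tier wins — which is exactly the two-tier decision rule above.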
Now a level up, into the agency layer. If the previous block was about the economics of running tokens, this one is about what happens when those tokens start making autonomous decisions — and signing contracts. Three moves arrived in the same week, each a rung up the ladder of agent autonomy. That pace of escalation, in seven days, should compress whatever timeline any CTO has on the calendar for redesigning governance. Before diving into the events, it's worth naming the arc. Eighteen months ago, the debate was: will the agent suggest the next step, or execute the next step? Today, the question is different: will the agent provision the infrastructure it needs to execute — or wait for a human to do it? This week, the answer changed.

First move: Cloudflare and Stripe co-designed a provisioning protocol that lets an agent write code to create a Cloudflare account, get an API token, register a domain, and spin up an application in production — without human intervention. No dashboard login. No credit card. No click.
The protocol operates in three layers. Discovery: the agent calls `stripe projects catalog` and receives a JSON catalog of services available from providers. Authorization: Stripe attests user identity to Cloudflare, which provisions a new account or routes an existing user via OAuth, returning API credentials directly to the CLI. Payment: Stripe furnishes a payment token that the provider uses to charge for domains, subscriptions, or usage-based consumption. The only mandatory human actions are accepting Cloudflare's terms of service and granting the agent permission — both surfaced as explicit prompts. The deepest architectural shift here isn't the technical integration. It's the catalog model. By exposing the provisioning surface as machine-readable JSON instead of a human-facing dashboard, Cloudflare gives agents a set of capabilities to reason about. As that catalog grows and other providers publish equivalent endpoints, vendor selection stops being a procurement decision — and becomes a runtime decision. The agent evaluates price, latency, or compliance posture dynamically, with no pre-loaded human preference.
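The three layers are simple enough to sketch. To be clear about what's speculative here: the base URL, endpoint paths, and payload fields below are hypothetical stand-ins — the announcement describes the layers, and this is a summary of them in code, not a published spec.

```python
# Hypothetical shape of the three-layer provisioning flow described above.
# Base URL, endpoint paths, and field names are illustrative assumptions.
import requests

STRIPE_API = "https://api.stripe.example/v1"   # placeholder base URL

def discover_catalog(session: requests.Session) -> list[dict]:
    """Layer 1 - Discovery: a machine-readable JSON catalog of provider
    services, the equivalent of what `stripe projects catalog` returns."""
    return session.get(f"{STRIPE_API}/projects/catalog").json()["services"]

def authorize(session: requests.Session, service_id: str) -> dict:
    """Layer 2 - Authorization: Stripe attests user identity to the provider,
    which returns scoped API credentials (new account or OAuth-routed)."""
    resp = session.post(f"{STRIPE_API}/projects/authorize",
                        json={"service": service_id})
    return resp.json()  # e.g. {"api_token": "...", "account_id": "..."}

def payment_token(session: requests.Session, account_id: str) -> str:
    """Layer 3 - Payment: a token the provider charges for domains,
    subscriptions, or usage-based consumption."""
    resp = session.post(f"{STRIPE_API}/projects/payment_token",
                        json={"account": account_id})
    return resp.json()["token"]
```

The design choice that matters is layer one: because the catalog is data, provider choice can move into the agent's planning loop.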
And there's a risk still getting little attention: a compromised agent session now risks domain purchases and subscription activations — not just code execution. The protocol leans on Stripe identity attestation and the OAuth and OIDC standards for credential issuance, both mature. But strict token scoping on agent credentials and audit trails on provisioning invocations need to exist before this scales to enterprise environments. Neither is market standard yet.

Second move: Anthropic launched Auto Mode for Claude Code — and the data motivating the launch is more revealing than the product itself. The previous model required human approval for most operations: executing shell commands, modifying files, calling external tools. Anthropic's internal data shows users accepted 93% of those prompts anyway.
Ninety-three percent acceptance rate. That has another name: approval fatigue. The developer isn't reviewing — they're rubber-stamping. The only alternative was a `--dangerously-skip-permissions` flag that disabled all guardrails. Auto Mode fills that gap with a dual-classifier system. The first is a probe at the input layer: it scans all tool output — file reads, shell results, web fetches — before it reaches the agent context. When content appears to redirect the agent from the user's original instruction, a warning is injected so the agent treats that content as untrusted. The second is an output classifier running on Sonnet 4.6: it evaluates each proposed action before execution, in two stages — a quick single-token pass for clearly safe actions, and chain-of-thought reasoning only when the first stage flags risk. By design, the classifier doesn't see Claude's own messages or tool outputs — it stays blind to the agent's reasoning so the agent can't rationalize its way past a block.
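A conceptual sketch of that gate — not Anthropic's implementation: a cheap first pass clears obviously safe actions, an expensive reasoning pass handles only the flagged remainder, and neither stage reads the agent's own messages. The tool names and heuristics are invented for illustration.

```python
# Two-stage action gate, conceptual only. The classifier sees the proposed
# action, never the agent's reasoning, so the agent can't argue past a block.
from dataclasses import dataclass

@dataclass
class ProposedAction:
    tool: str        # e.g. "bash", "file_edit" (illustrative tool names)
    arguments: str   # the concrete command or diff

def quick_pass(action: ProposedAction) -> str:
    """Stage 1: cheap, single-token-style verdict for the common case."""
    SAFE_TOOLS = {"file_read", "grep", "web_fetch"}
    return "allow" if action.tool in SAFE_TOOLS else "escalate"

def deep_pass(action: ProposedAction) -> str:
    """Stage 2: stand-in for a model call that reasons step by step about
    blast radius and reversibility. Here, a crude marker list."""
    irreversible = ("rm -rf", "drop table", "push --force", "migrate")
    if any(m in action.arguments.lower() for m in irreversible):
        return "block"
    return "allow"

def gate(action: ProposedAction) -> str:
    verdict = quick_pass(action)
    return deep_pass(action) if verdict == "escalate" else verdict

print(gate(ProposedAction("grep", "-r TODO src/")))         # allow (stage 1)
print(gate(ProposedAction("bash", "./manage.py migrate")))  # block (stage 2)
```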
Anthropic's incident log makes clear why this is necessary. An agent that deleted remote git branches from a vague instruction to "clean up old branches". Another that uploaded an engineer's GitHub auth token to an internal compute cluster after hitting an auth error. A third that tried to run migrations against a production database. Each agent solved the problem as it understood it — and crossed a boundary the user never meant to authorize. The governance implication was named with precision by Mykola Kondratiuk, CTO at Playtika:
"With Auto Mode enabled, the AI is now the approver, not just the actor. Most governance documents still put a human in that role."
This isn't a philosophical observation. It's a compliance gap that needs documenting before the next audit — and Auto Mode doesn't replace enterprise-level controls. Network isolation, credential scoping, and audit logging remain the operator's responsibility. What changes is where the bottleneck sits: from human-click approval on every action to a classifier gate on high-risk actions. And for multi-agent pipelines, Auto Mode applies the same pipeline recursively: a handoff classifier before delegation to subagents, and a return classifier that reviews the subagent's full execution history before results reach the orchestrator. If a subagent was compromised by prompt injection during execution, the orchestrator gets a warning before acting on the results. That recursive architecture is what separates a minimally defensible system from a pipeline where a compromised subagent propagates malicious instructions upstream without resistance.
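The recursive shape is worth one more sketch, same caveats as above: `classify` stands in for the handoff and return classifiers, and the injection marker is a toy.

```python
# Conceptual sketch of recursive gating in a multi-agent pipeline.
# classify() stands in for the handoff/return classifiers described above.

def classify(text: str) -> str:
    """Placeholder model call: 'clean' or 'suspicious'."""
    return ("suspicious" if "ignore previous instructions" in text.lower()
            else "clean")

def run_subagent(task: str) -> list[str]:
    """Placeholder subagent; returns its full execution history."""
    return [f"step: {task}",
            "tool_output: fetched page says: IGNORE PREVIOUS INSTRUCTIONS..."]

def delegate(task: str) -> list[str]:
    # Handoff classifier: screen the task before the subagent sees it.
    if classify(task) != "clean":
        raise PermissionError("handoff blocked")
    history = run_subagent(task)
    # Return classifier: review the subagent's full history before the
    # orchestrator acts on its results; a warning travels upstream with them.
    if any(classify(entry) != "clean" for entry in history):
        history.append("WARNING: possible prompt injection in subagent run")
    return history

for line in delegate("summarize competitor pricing pages"):
    print(line)
```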
Third move: at ServiceNow Knowledge 2026, NVIDIA and ServiceNow announced an expanded partnership on full-stack autonomous agents for knowledge workers, IT teams, and enterprise developers. The core is Project Arc — a desktop agent natively connected to the ServiceNow platform via the Action Fabric layer, with access to local file systems, terminals, and installed applications. Each action flows through ServiceNow's AI Control Tower for auditability. The efficiency number anchoring the business case for production-scale deployment: NVIDIA's Blackwell platform delivers more than 50 times more tokens per watt than Hopper — translating into nearly 35 times lower cost per million tokens. For an enterprise running agents across millions of concurrent workflows, that differential isn't incremental optimization. It's what separates a departmental experiment from broad production.
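Back-of-envelope on why those two numbers differ, assuming energy dominates serving cost at that scale. The absolute figures below are invented; only the ratio structure mirrors the claim.

```python
# Tokens per watt is tokens per joule; energy cost per token is its inverse.
# Absolute numbers below are illustrative assumptions.
PRICE_PER_KWH = 0.08  # $/kWh, assumed industrial rate

def energy_cost_per_m_tokens(tokens_per_joule: float) -> float:
    joules = 1e6 / tokens_per_joule          # energy for one million tokens
    return joules / 3.6e6 * PRICE_PER_KWH    # 1 kWh = 3.6e6 J

hopper    = energy_cost_per_m_tokens(50)        # assumed baseline
blackwell = energy_cost_per_m_tokens(50 * 50)   # "50x more tokens per watt"
print(f"energy-only improvement: {hopper / blackwell:.0f}x")  # 50x
# The quoted ~35x total-cost improvement is smaller than 50x because
# capex, cooling, and operations don't shrink with silicon efficiency.
```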
The secure execution layer comes from NVIDIA OpenShell — an open-source sandboxed environment that defines what the agent can see, which tools it can invoke, and how actions are contained within policy bounds. Joint benchmarking happens via NOWAI-Bench, integrated with the NeMo Gym library, with the EnterpriseOps-Gym component focused on evaluating multi-step workflows — exactly the failure mode generic benchmarks miss. NVIDIA's Nemotron 3 Super leads among open-source models on that leaderboard today. But the vertical lock-in here comes from two directions, and it's worth naming before signing. ServiceNow's Action Fabric and AI Control Tower form the workflow orchestration layer. NVIDIA's Blackwell silicon, NeMo toolkit, and OpenShell runtime form the compute and execution substrate. The validated blueprint — the NVIDIA Enterprise AI Factory — rewards full-stack adoption. Teams evaluating this architecture need to price the cost of future portability before committing. The Project Arc availability timeline wasn't disclosed. Questions about OpenShell's multi-cloud portability stayed open.
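What a policy layer like that looks like in practice, sketched generically. The field names are illustrative; this is the concept, not OpenShell's actual configuration format, which wasn't published with the announcement.

```python
# Generic agent-sandbox policy: what the agent can see, which tools it can
# invoke, and containment bounds. Field names are illustrative assumptions.
POLICY = {
    "filesystem": {
        "read":  ["/workspace", "/etc/ssl/certs"],
        "write": ["/workspace/out"],   # nothing outside the workspace
    },
    "tools": {
        "allow": ["terminal", "browser", "file_read", "file_write"],
        "deny":  ["package_install", "credential_store"],
    },
    "network": {
        "egress_allowlist": ["api.servicenow.example"],  # placeholder host
    },
    "limits": {"max_runtime_s": 900, "max_spend_usd": 0.0},
}

def permitted(tool: str, path: str | None = None) -> bool:
    """Check a proposed action against the policy before execution."""
    if tool in POLICY["tools"]["deny"]:
        return False
    if tool == "file_write":
        return path is not None and any(
            path.startswith(root) for root in POLICY["filesystem"]["write"])
    return tool in POLICY["tools"]["allow"]
```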
The arc of the three moves this week is what matters. From copilot that suggests code. To executor that runs actions with classifier-mediated approval. To agent that provisions its own infrastructure — no human click in the loop. The question is no longer "when will agents do this?" It's: does your identity governance, billing, and shadow-IT framework already account for this as the normal case?

The third topic of this edition is the most uncomfortable. The same AI capability accelerating the defender — discovering vulnerabilities, automating reviews, tracking threats — is accelerating the attacker with identical efficiency. Three events this week show both vectors active at the same time.
The most direct alert came from Dario Amodei. On Tuesday, speaking alongside Jamie Dimon of JPMorgan Chase at an Anthropic financial services event, Amodei revealed that Mythos — the company's newest frontier model — has discovered tens of thousands of software vulnerabilities in critical systems. The comparative numbers make the scale concrete. An earlier Anthropic model found approximately 20 vulnerabilities in Firefox. Mythos found nearly 300. The aggregate count across all analyzed software reaches tens of thousands. Most still lack patches — and haven't been publicly disclosed, because naming them before fixes exist would hand adversaries an attack map. Anthropic restricted Mythos access to a limited set of partner companies for exactly that reason.
And the timeline Amodei put down is specific: Chinese AI models are "perhaps six to twelve months" behind Anthropic's capabilities — leaving "roughly that long" to close the exposure window. "The danger is a huge increase in the number of vulnerabilities, the number of breaches, the financial damage caused by ransomware in schools, hospitals — not to mention banks." The structural implication: AI-assisted vulnerability discovery now outpaces traditional red-teaming and static analysis pipelines. AI adoption stopped being just a productivity question — it became a cybersecurity posture question. CISOs who haven't integrated AI-assisted scanning into the software supply chain review cycle are already behind the curve. The models available to adversary state actors are approaching parity with the ones available to defenders. AI platform procurement decisions will increasingly depend on vendors demonstrating verified security practices and controlled model access.
And while Mythos operates in a controlled environment with restricted access, CISA was signaling, that same week, something no longer contained. On May 1, the agency added CVE-2026-31431 — nicknamed "Copy Fail" — to its Known Exploited Vulnerabilities (KEV) catalog, confirming live exploitation in the field. The mandate: U.S. federal agencies have two weeks to patch. The vulnerability lives in the Linux kernel's `algif_aead` cryptographic interface. An unprivileged local user can write controlled data into the kernel page cache and escalate to root. The security firm Theori discovered the flaw and released a functional proof-of-concept alongside the public disclosure. The team described the exploit as 100% reliable — no modifications needed.
The cross-distro blast radius is what makes the situation urgent. The exploit works without modification on Ubuntu 24.04 LTS, Amazon Linux 2023, RHEL 10.1, and SUSE 16. That portability eliminates almost all friction between vulnerability discovery and armed attack. An adversary with any foothold in a shared GPU cluster, container host, or CI pipeline gets root. A compromised developer account, a malicious container breakout, or lateral movement from an under-protected baseboard management controller — any of them satisfies the precondition. The disclosure was made without prior coordination with Linux distribution maintainers — giving vendors zero lead time to prepare patches. Older LTS branches had no backported patches when the exploit code appeared online. Maintainers were forced to disable the affected crypto modules while racing to backport fixes.
The two-week federal mandate aligns with Binding Operational Directive 22-01. Private organizations aren't legally bound by it — but the operational argument is independent of legal obligation. Patch management SLAs built around 30-day windows are structurally insufficient. Two weeks is the new floor. And the most critical point: Theori's uncoordinated disclosure could set a precedent. Security teams need workflows that detect KEV catalog additions in hours — not days.
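That workflow is small. CISA publishes the KEV catalog as a public JSON feed, so an hourly diff against it covers the hours-not-days requirement; the `alert` hook below is a placeholder to wire into whatever paging or ticketing you run.

```python
# Minimal KEV watcher: diff CISA's public JSON feed against CVEs already seen.
# Run it on a schedule; alert() is a placeholder for paging/ticketing.
import json, pathlib, requests

KEV_URL = ("https://www.cisa.gov/sites/default/files/feeds/"
           "known_exploited_vulnerabilities.json")
SEEN = pathlib.Path("kev_seen.json")

def alert(vuln: dict) -> None:
    print(f"NEW KEV: {vuln['cveID']} added {vuln['dateAdded']} "
          f"(due {vuln['dueDate']}) - {vuln['vulnerabilityName']}")

def check() -> None:
    feed = requests.get(KEV_URL, timeout=30).json()["vulnerabilities"]
    current = {v["cveID"]: v for v in feed}
    if SEEN.exists():  # first run only seeds the baseline, no alerts
        for cve_id in current.keys() - set(json.loads(SEEN.read_text())):
            alert(current[cve_id])
    SEEN.write_text(json.dumps(sorted(current)))

if __name__ == "__main__":
    check()
```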
The third data point this week connects technical vulnerability to patient data — and shows how agentic AI in production can fail in ways that aren't sophisticated. They're just careless.

A medical chatbot aimed at patients, built on RAG, exposed its complete system prompt, backend API schema, entire knowledge base content, and the most recent thousand patient conversations. All accessible via standard browser inspection tools. No authentication needed. The study was published in May 2026 by Alfredo Madrid-García and Miguel Rujas. The methodology was two-stage: first, Claude Opus 4.6 was used for exploratory prompt testing and structured vulnerability hypothesis generation — identifying that sensitive RAG configuration and the system prompt appeared to be transmitted in client-server traffic instead of staying server-side. Second, manual verification with Chrome Developer Tools, inspecting the network traffic visible in the browser: payloads, API schemas, and interaction data.
What the researchers collected: complete system prompt. Model configuration and embedding details. Retrieval parameters. Backend endpoint addresses. API schema definitions. Chunk and document metadata. Raw knowledge base content. And the most recent thousand patient conversations — including health-related queries. Directly contradicting the chatbot's own declared privacy guarantees. The structural failure isn't sophisticated. The deployment moved logic that should stay server-side to the client — and assumed no one would look. Chrome Developer Tools doesn't require specialized skills. The same techniques available to a security auditor are equally available to a motivated adversary.
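The fix is architectural and unglamorous, which is the point. A minimal sketch of the server-side pattern the authors call for, assuming a FastAPI backend; every name and stub here is illustrative.

```python
# Keep prompt, retrieval config, and raw chunks server-side; return only the
# answer. The failure above shipped all of this to the browser. Illustrative.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

SYSTEM_PROMPT = "..."  # loaded from server config; never serialized to clients

def retrieve(question: str) -> list[str]:
    """Vector search against the knowledge base; stays in this process."""
    return ["placeholder chunk"]

def generate(system: str, context: list[str], question: str) -> str:
    """Model call; prompt and retrieval parameters never leave the server."""
    return "placeholder answer"

class Query(BaseModel):
    question: str

@app.post("/chat")  # plus real authentication in front, which was also missing
def chat(query: Query) -> dict:
    context = retrieve(query.question)
    answer = generate(SYSTEM_PROMPT, context, query.question)
    # The response carries the answer only: no prompt, no chunks, no
    # embedding or retrieval config, no backend schema for DevTools to read.
    return {"answer": answer}
```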
The compliance implications are direct. Patient conversations with health-related queries, exposed without authentication, create direct liability under HIPAA and equivalent frameworks. The leak of the system prompt and embedding configuration also exposes proprietary IP — fine-tuning and retrieval logic — on top of the regulatory exposure. The authors conclude: independent security review must be a precondition for deployment, not a post-launch step. And the most uncomfortable point: LLM assistance accelerated the security evaluation — including while the researchers operated under a false developer persona. Assistance available to auditors is equally available to adversaries. Amodei said there's a limit to the number of bugs that exist. The vulnerability count is finite. But the horizon to hit that ceiling was never this short, and the discovery pace was never this high. The CISO still treating agents as a tool — not as an attack vector — is betting against their own CEO.
This was the week three curves crossed at the same point: the inference economy started requiring real billing as a capital criterion, agents stopped asking permission to provision the world, and the window between a vulnerability being discovered and being exploited became measured in months — not quarters. Wire on Monday — we open with the NVIDIA-Corning $3.2 billion agreement and what it reveals about the new optical bottleneck in AI data centers. Have a good week.