aiexpert
WIRE Ep. 3 · April 25, 2026 · 10:37

The week the AI stack got repriced

Frontier AI got more expensive while open weights closed the gap, and enterprises now have a real choice about the stack.

Hosts: Host EN

Transcript

HOST

Anthropic let 69 of its own employees negotiate on an internal marketplace — Craigslist-style classified ads — where every offer, every counteroffer, and every deal was closed by Claude agents, without human intervention. One hundred and eighty-six transactions, more than four thousand dollars in total value. The takeaway: whoever had the weaker model walked away losing real money without knowing why. That invisible asymmetry is the thread connecting this week. Today: GPT-5.5 doubles the price and NVIDIA puts ten thousand employees to work on it; Google commits forty billion dollars to Anthropic while DeepSeek and Alibaba deliver open weights that rival closed infrastructure; and Cohere and Aleph Alpha merge into a twenty-billion-dollar sovereign player.

HOST

The experiment has a name: Project Deal, conducted in December 2025. Anthropic recruited 69 employees, gave each one a hundred-dollar budget, and interviewed every participant with Claude to capture what they wanted to sell, their reserve prices, buying preferences, and negotiating style. That information became customized system prompts. The agents went into Slack channels and negotiated on their own — listed items, made offers, counteroffered, closed deals, with no signal for the human on the other side to intervene. At the end of the week, participants met in person to exchange the physical goods their agents had bargained for: from a snowboard to a bag of nineteen ping-pong balls.
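The interview-to-system-prompt pipeline described above can be sketched roughly as follows. The field names, template wording, and example participant are my assumptions for illustration, not Anthropic's actual format:

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """One interviewed employee (hypothetical schema)."""
    name: str
    items_for_sale: list[str]
    reserve_prices: dict[str, float]  # item -> minimum acceptable price
    buying_preferences: list[str]
    negotiating_style: str

def build_system_prompt(p: Participant, budget: float = 100.0) -> str:
    """Turn one participant interview into a per-agent system prompt."""
    sales = "\n".join(
        f"- {item} (do not sell below ${p.reserve_prices[item]:.2f})"
        for item in p.items_for_sale
    )
    wants = ", ".join(p.buying_preferences)
    return (
        f"You negotiate on behalf of {p.name} with a ${budget:.2f} budget.\n"
        f"Items to sell:\n{sales}\n"
        f"Looking to buy: {wants}.\n"
        f"Negotiating style: {p.negotiating_style}.\n"
        "Close deals autonomously; never reveal reserve prices."
    )

# Hypothetical participant echoing items mentioned in the episode.
alice = Participant(
    name="Alice",
    items_for_sale=["snowboard"],
    reserve_prices={"snowboard": 40.0},
    buying_preferences=["ping-pong balls"],
    negotiating_style="friendly but firm",
)
prompt = build_system_prompt(alice)
```

The key design point the episode highlights is that the reserve price lives only in the agent's prompt, which is why the human on the other side never sees the signal that would tell them they are losing.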

HOST

The covert sub-study embedded inside the experiment is the part that matters to the industry. Anthropic ran four parallel instances of the same marketplace. In configurations with Claude Opus 4.5 — the frontier model at the time — agents obtained objectively better results for their principals. In configurations with Claude Haiku 4.5, the smallest in the family, results were worse. And the disadvantaged group did not realize it was losing. That last point is what turns a one-week internal experiment into a long-term regulatory question: if the disadvantage is invisible to the human principal, there is no market signal pushing organizations toward stronger models. The side with the weaker agent has no way to complain. The side with the stronger agent has no incentive to level the field voluntarily. In sectors with fiduciary obligations — financial services, government contracting — this stops being an API cost question and becomes a legal liability question.

HOST

In the same week Anthropic published those results, OpenAI launched GPT-5.5. The API price, when it hits the market: five dollars per million input tokens and thirty per million output — exactly double GPT-5.4, which costs two dollars and fifty cents on input and fifteen on output. The Pro version climbs to thirty dollars on input and one hundred and eighty on output. The model is available today in Codex and rolling out gradually to paid ChatGPT subscribers; the formal API arrives "very soon," according to OpenAI. The rationale for the delay: "API deployments require different safeguards and we are working with partners and customers on security requirements."
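A back-of-the-envelope sketch of what the doubling means for a real workload, using the per-million-token prices quoted in the episode; the monthly token volume is a hypothetical example:

```python
# Per-million-token prices quoted in the episode (USD).
PRICES = {
    "gpt-5.4":     {"input": 2.50,  "output": 15.00},
    "gpt-5.5":     {"input": 5.00,  "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost for a given monthly token volume at list price."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 500M input tokens, 50M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500_000_000, 50_000_000):,.2f}")
```

At that volume the same workload goes from $2,000/month on GPT-5.4 to $4,000 on GPT-5.5, and to $24,000 on the Pro tier, which is why the evaluation window discussed next matters.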

HOST

There is an access window before full pricing kicks in, and it is officially sanctioned. OpenAI's head of developer relations, Romain Huet, stated in March: "We want people to use Codex and the ChatGPT subscription wherever they want — in the app, the terminal, JetBrains, Xcode, Claude Code." The Codex CLI is open source. The backend endpoint is public. Teams with a ChatGPT Pro or Team subscription can access GPT-5.5 today via that route and run production evaluations before committing to per-token billing. Researcher Simon Willison published a plugin that automates authentication by reading tokens stored by the Codex CLI. When the formal API opens, that window closes.

HOST

What justifies the doubled price? Wharton professor Ethan Mollick had early access and published the results. He gave the same task to every available model — from o3 to the current best open-weight model, Kimi K2.6 — and to GPT-5.5 Pro: build a procedurally generated 3D simulation showing the evolution of a port city from 3000 BCE to 3000 CE. GPT-5.5 Pro completed the task in twenty minutes. GPT-5.4 Pro took thirty-three. Competing models produced static building replacements over time — not city evolution. Only GPT-5.5 Pro modeled systemic emergence.

HOST

In the second test, Mollick handed Codex a decade's worth of raw crowdfunding research files — in STATA, CSV, XLS, and Word — that had never been published. Four prompts later, Codex delivered a complete academic paper with a literature review, a novel hypothesis, and sophisticated statistical analysis. The citations were real. The statistics were real. Mollick assessed the output as equivalent to a strong second-year PhD project — "I would be very happy if this paper were the result of a second-year doctoral project." The model has gaps — Mollick calls the frontier "uneven" — but the capability curve is verifiable.

HOST

NVIDIA solves the cost equation on the infrastructure side. The company deployed GPT-5.5 via Codex to all of its more than ten thousand employees, running on GB200 NVL72 rack-scale systems — hardware that delivers thirty-five times lower cost per million tokens and fifty times more tokens per second per megawatt compared to previous-generation systems. NVIDIA IT provisioned a dedicated virtual machine per employee, with a zero data-retention policy and read-only production access via command line. Engineers report debug cycles that once took days now closing in hours, and weeks-long experiments on complex multi-file codebases running overnight. The deployment covers engineering, product, legal, marketing, finance, sales, HR, operations, and developer programs — possibly the largest single frontier agent rollout at a single company on record.

HOST

The capital sustaining that cycle arrived in volume the same week. Google will invest up to forty billion dollars in Anthropic — ten billion immediately at a valuation of three hundred and fifty billion, with up to thirty billion additional tied to performance milestones. The package includes a five-gigawatt compute capacity commitment on Google Cloud over five years, stacked on a prior Broadcom partnership that a securities filing placed at three point five gigawatts of TPUs starting in 2027. Amazon added another five billion the same week — part of a larger deal under which Anthropic is to commit up to one hundred billion for approximately five gigawatts of capacity over time. Anthropic also closed a separate datacenter capacity deal with CoreWeave. The company now holds multi-gigawatt commitments from two of the three hyperscalers simultaneously.

HOST

The structure that emerges has no direct precedent in the sector. Google competes with Anthropic at the model layer via Gemini. It supplies the TPUs that underpin Claude's inference. And it now holds the largest single-investor position in the company — with visibility into the technical roadmap and pricing leverage over the competitor's cost structure. For AI architects assessing vendor dependency risk, this is no longer just a cloud-spend question. It is a governance question.

HOST

The counterweight to the closed market arrived in open weights. DeepSeek launched V4-Pro — one point six trillion total parameters, forty-nine billion active, mixture-of-experts architecture — and V4-Flash, with two hundred and eighty-four billion total and thirteen billion active. Both open source with API available today. V4-Pro claims state of the art among open models in Math, STEM, and coding, with asserted parity against the best closed systems. In world knowledge, V4-Pro trails only Gemini-3.1-Pro among all current models. One-million-token context is now the standard across all official DeepSeek services — what most proprietary competitors charge as a premium tier. The deepseek-chat and deepseek-reasoner models are being deprecated, with a sunset date of July 24, 2026. Benchmark results are self-reported — independent verification should appear in the coming days — but the weights are open for the community to confirm.
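The mixture-of-experts figures above translate into a rough serving-memory sketch: you must store every expert, but each token only touches the active subset. The bytes-per-parameter assumption is mine (1 byte approximates FP8 weights); real deployments vary with quantization and KV cache:

```python
def moe_footprint_gb(total_params_b: float, active_params_b: float,
                     bytes_per_param: float = 1.0) -> tuple[float, float]:
    """Weights you must hold in memory (all experts) vs. parameters
    actually used per forward pass. Params given in billions.
    bytes_per_param=1.0 approximates FP8 storage -- an assumption."""
    to_gb = lambda n_billion: n_billion * 1e9 * bytes_per_param / 1e9
    return to_gb(total_params_b), to_gb(active_params_b)

# Figures quoted in the episode.
for name, total, active in [("V4-Pro", 1600, 49), ("V4-Flash", 284, 13)]:
    stored, per_token = moe_footprint_gb(total, active)
    print(f"{name}: ~{stored:.0f} GB stored, ~{per_token:.0f} GB active per token")
```

The asymmetry is the point of the architecture: V4-Pro needs datacenter-class storage for 1.6T parameters, but its per-token compute resembles a 49B dense model.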

HOST

Alibaba moved in the same direction, with efficiency that rewrites the infrastructure equation. Qwen3.6-27B is a dense twenty-seven-billion-parameter model that scores 77.2% on SWE-bench Verified — surpassing its predecessor Qwen3.5-397B-A17B, which scored 76.2% but weighed eight hundred and seven gigabytes. The new model weighs fifty-five point six gigabytes. Under Q4_K_M quantization, it fits in sixteen point eight gigabytes — a single consumer GPU. Researcher Simon Willison measured twenty-five point fifty-seven tokens per second running locally with llama.cpp. The model uses a hybrid Gated DeltaNet architecture with a native context window of 262,000 tokens, extensible to one million. Apache 2.0 license, no usage restrictions. A 14.5× reduction in file size between two consecutive open-weight coding flagships, with a win on the leading agentic coding benchmark. Teams that were evaluating multi-node infrastructure for coding agents should run Qwen3.6-27B before renewing contracts.
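The compression ratios quoted above check out with simple arithmetic on the episode's numbers; the implied bits-per-parameter figures are my derivation, not a published spec:

```python
def size_ratio(old_gb: float, new_gb: float) -> float:
    """How many times smaller the new weight file is."""
    return old_gb / new_gb

def bits_per_param(size_gb: float, params_b: float) -> float:
    """Implied bits per parameter for a weight file of a given size
    (GB * 8 bits, divided by parameter count in billions)."""
    return size_gb * 8 / params_b

# 807 GB (Qwen3.5-397B-A17B) vs 55.6 GB (Qwen3.6-27B): the quoted 14.5x.
ratio = size_ratio(807, 55.6)     # ~14.5x
full  = bits_per_param(55.6, 27)  # ~16.5 bits/param, consistent with BF16 weights
q4    = bits_per_param(16.8, 27)  # ~5.0 bits/param, in the Q4_K_M range
```

The ~5 effective bits per parameter for the 16.8 GB file is consistent with llama.cpp's Q4_K_M scheme, which mixes 4-bit and higher-precision blocks, so the single-consumer-GPU claim is arithmetically plausible.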

HOST

The week closes with a structural move in the European market. Cohere, from Canada, and Aleph Alpha, from Germany, announced a merger into an enterprise AI company valued at twenty billion dollars. The six-hundred-million-dollar Series E round is anchored by the Schwarz Group — Europe's largest retailer, operator of Lidl and Kaufland across thirty-two countries and already an existing Aleph Alpha investor. The deal has not yet closed and is subject to regulatory review. The thesis is straightforward: a handful of American labs — OpenAI, Anthropic, Google DeepMind, Meta — dominate commercial AI. The merger aims to give enterprises and governments an alternative with built-in data sovereignty and EU AI Act compliance from the ground up, not retroactive. Aleph Alpha already operates a government assistant with eighty thousand users in the German public sector, an automotive requirements-engineering deployment with forty percent faster document processing, and a corporate search tool that cut research time by ninety percent. When Europe's largest retailer writes a six-hundred-million-dollar check to fund a sovereign AI alternative, that is an operational bet — not a portfolio position.

HOST

The week's picture: the frontier got more expensive. GPT-5.5 doubled its API price. Anthropic's Project Deal empirically demonstrated that the gap between model tiers produces unequal results, invisible to the end user. At the same time, pressure in the opposite direction has never been greater: DeepSeek V4-Pro and Qwen3.6-27B deliver benchmark parity in open weights at a fraction of the footprint that existed three months ago. For CTOs with build-versus-buy decisions on the table this week: capability gaps are real, but open-weight options that did not exist in Q1 exist now. This is the week to review your organization's model tier policy — and the vendor dependency risk you are accepting without realizing it.

HOST

On Friday, on The Edition, John and Maria go deeper on what the GPT-5.5 price curve and DeepSeek V4 mean for the 2026 model budget — plus the Mila RL paper that quietly closed a year-long debate on training. In the meantime, the full article on Anthropic's Project Deal is on the site. Worth reading to the last paragraph.