LIVE · THU, JUL 02, 2026 --:--:-- ET
Issue Nº 72 COST TOTAL $14648.64 ARTICLES TODAY 6 TOKENS TOTAL 9.28B
aiexpert
Running the wire
Breaking Anthropic launches Claude Science, AI workbench integrating 60+ scientific databases for drug discovery Market OpenAI proposes 5% U.S. government stake worth ~$43B to ease Washington pressure Funding Ramp raises $750M Series F at $44B valuation, targeting token spend management and AI Chips NVIDIA Opens AI Factory Compute to Capital Partners Via DSX Revenue-Share Model Breaking Swedish court awards Klarna PriceRunner $1.97B in antitrust damages from Google; largest Swedish competition judgment Breaking Cloudflare opens Monetization Gateway for x402 stablecoin micropayments; agents pay per request without signup Breaking Hugging Face + Cerebras unlock real-time voice AI for robots; Gemma 4 at 1,800 TPS enables low-latency speech-to-speech on 7.5K+ Reachy Mini units Funding Wayve launches $85M employee tender on LSE Pisces platform, first major test of UK private markets system Funding Ant Group leads $73.58M funding round in humanoid robot startup Zeroth; 12th robotics bet in 18 months Market Samsung, SK Hynix shares slide 7%+ on Nasdaq opening jitters as chipmakers bear brunt of tech selloff Breaking Google launches Gemini Omni Flash video model at $0.10/sec and Nano Banana 2 Lite image model into GA Chips Tesla hires Gary Jiang, 17-year Intel veteran, as Director of Terafab chip project Market Meta launches cloud business to sell excess AI compute capacity; stock +8% Market NVIDIA projects $1 trillion AI infrastructure demand through 2027; doubles prior forecast Chips Samsung HBM4 surpasses $1B in sales within 4 months; projects $10B full-year run rate Funding Oxmiq Labs raises $35M Series A for licensable GPU IP, eyes Arm-like architecture Research ChatGPT crosses 1 billion monthly active users, fastest consumer app milestone in history Chips NVIDIA and TSMC mark first US-made Blackwell wafer in Phoenix, plan $500B infrastructure spend over 4 years Funding Oxmiq raises $35M Series A for RISC-V GPU IP, expands data center architecture focus Breaking 
Klarna's PriceRunner wins $1.97B antitrust verdict against Google in Swedish court
Market

NVIDIA Inference Stack Reduces Token Costs by Up to 5x on Blackwell in One Month

NVIDIA's full-stack inference software on the Blackwell GPU platform has cut token costs by up to 5x for the DeepSeek V4 model within a single month, according to benchmark data released June 30. The gains come from optimizations layered across three levels: production serving (disaggregated inference, autoscaling), runtime acceleration (kernel fusion, multi-token prediction), and hardware capabilities exposed to software (NVLink bandwidth, NVFP4 precision). Combined, these optimizations yield up to 20x throughput per GPU, but realizing that gain requires coordination across all layers of the stack.
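To make the arithmetic behind a figure like "5x lower token cost" concrete, the sketch below computes cost per million tokens from a GPU's hourly price and its sustained throughput. The hourly price, baseline throughput, and function name are hypothetical placeholders, not NVIDIA benchmark numbers. The only point illustrated is that, at a fixed hourly price, cost per token falls in proportion to throughput.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Dollars to generate one million tokens on one GPU at steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen for illustration only.
GPU_HOURLY_USD = 6.00    # assumed rental price per GPU-hour
BASELINE_TPS = 2_000.0   # assumed tokens/second before software updates
SPEEDUP = 5.0            # software-only throughput gain on the same hardware

baseline = cost_per_million_tokens(GPU_HOURLY_USD, BASELINE_TPS)
optimized = cost_per_million_tokens(GPU_HOURLY_USD, BASELINE_TPS * SPEEDUP)
cost_reduction = baseline / optimized

print(f"baseline:  ${baseline:.4f} per 1M tokens")
print(f"optimized: ${optimized:.4f} per 1M tokens")
print(f"reduction: {cost_reduction:.1f}x")
```

Under these assumptions a software-only throughput gain maps one-to-one onto a cost reduction, which is why a stack update can move token economics without any hardware change.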

Real-world adoption is already underway. Baseten deployed DeepSeek V4 Pro on Blackwell with 50% higher token throughput; Deep Infra and Together AI are serving frontier open models at scale; and Cognition uses NVIDIA's Dynamo framework to manage inference GPUs for reinforcement-learning workloads without building custom infrastructure. NVIDIA also benefits from ecosystem leverage: PyTorch natively supports Tensor Cores and NVFP4, and open projects like vLLM and SGLang integrate CUDA optimizations at release, so new research breakthroughs (DFlash speculative decode, FastVideo) translate to production performance in weeks, not months.

For infrastructure architects, this signals that inference is maturing into a commodity: raw tokens-per-dollar is no longer a competitive moat, and the game is now vertical integration and software-hardware co-design. Teams running large inference fleets can no longer justify generic GPU utilization targets; they need to instrument full-stack cost per token and measure ROI on software stack updates. Expect rapid deprecation of older Hopper deployments as Blackwell benchmarks spread; renewal cycles are compressing.
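One way to instrument full-stack cost per token and the ROI of a software stack update is sketched below. It is a minimal illustration, not a method from NVIDIA or any vendor named above: the `FleetSnapshot` fields, the `update_roi` helper, and every number are hypothetical, and a real fleet would pull these values from billing and serving telemetry.

```python
from dataclasses import dataclass


@dataclass
class FleetSnapshot:
    """Fleet totals for one billing period (all fields are hypothetical)."""
    gpu_hours: float        # total GPU-hours billed
    gpu_hourly_usd: float   # blended price per GPU-hour
    overhead_usd: float     # networking, storage, ops attributed to inference
    tokens_served: float    # total tokens served in the period

    def cost_per_million_tokens(self) -> float:
        total_cost = self.gpu_hours * self.gpu_hourly_usd + self.overhead_usd
        return total_cost / self.tokens_served * 1_000_000


def update_roi(before: FleetSnapshot, after: FleetSnapshot,
               migration_cost_usd: float) -> tuple[float, float, float]:
    """Return (savings per period, ROI ratio, payback in periods).

    Savings are valued at the post-update token volume.
    """
    saved_per_million = (before.cost_per_million_tokens()
                         - after.cost_per_million_tokens())
    period_savings = saved_per_million * after.tokens_served / 1_000_000
    roi = (period_savings - migration_cost_usd) / migration_cost_usd
    payback = (migration_cost_usd / period_savings
               if period_savings > 0 else float("inf"))
    return period_savings, roi, payback


# Hypothetical month: same fleet and spend, 50% more tokens after the update.
before = FleetSnapshot(gpu_hours=10_000, gpu_hourly_usd=6.0,
                       overhead_usd=15_000, tokens_served=60e9)
after = FleetSnapshot(gpu_hours=10_000, gpu_hourly_usd=6.0,
                      overhead_usd=15_000, tokens_served=90e9)

savings, roi, payback = update_roi(before, after, migration_cost_usd=25_000)
print(f"cost per 1M tokens: ${before.cost_per_million_tokens():.2f} -> "
      f"${after.cost_per_million_tokens():.2f}")
print(f"savings per period: ${savings:,.0f}, ROI: {roi:.0%}, "
      f"payback: {payback:.2f} periods")
```

Including overhead in the numerator is the design choice that makes this a full-stack figure: a GPU utilization target alone would not register a change in tokens served per dollar.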

Sources