LIVE · THU, JUL 02, 2026 --:--:-- ET
Issue Nº 72
aiexpert
Research

DeepSeek's DSpark speculative decoding claims up to 85% lower V4 inference latency as V4 Pro lands on Together AI

DeepSeek released DSpark, a speculative decoding framework for V4-Pro and V4-Flash, on June 27, 2026, claiming up to an 85% reduction in inference latency without new hardware or model retraining. Speculative decoding uses a smaller model to draft several low-cost tokens ahead, then has the full model verify them; each accepted draft token replaces a sequential full-model decode step, so the technique trades extra draft and verification compute for lower overall latency. DeepSeek says the technique works on both its hosted API and self-hosted open weights, though no independent benchmarks had been published as of June 28. The speedup figures come from DeepSeek's own benchmarks, run on DeepSeek infrastructure against its own prior baseline (MTP-1), so they merit third-party verification before any production deployment planning.
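The draft-then-verify loop described above can be illustrated with a minimal greedy sketch. This is not DSpark's implementation, which the article does not detail: `target_model`, `draft_model`, and the lookahead `k` are hypothetical toy stand-ins, and a real system would verify all drafted positions in one batched forward pass rather than the per-position calls used here.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # greedy next-token function


def target_model(ctx: List[Token]) -> Token:
    """Hypothetical toy stand-in for the large, expensive model."""
    return (ctx[-1] * 3 + len(ctx)) % 11


def draft_model(ctx: List[Token]) -> Token:
    """Hypothetical toy draft model: matches the target except when
    the context length is a multiple of 4."""
    guess = target_model(ctx)
    return (guess + 1) % 11 if len(ctx) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, verification passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target checks every drafted position. Counted as one pass
        #    because a real system scores all positions in a single batch.
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token accepted
            else:
                accepted.append(expected)  # replace first mismatch, stop
                break
        else:
            # Every draft token matched: take one extra target token if
            # there is still room in the output budget.
            if len(out) + len(accepted) < goal:
                accepted.append(target(out + accepted))
        out.extend(accepted)
    return out, passes


if __name__ == "__main__":
    prompt = [1]
    reference = greedy_decode(target_model, prompt, 20)
    tokens, passes = speculative_decode(target_model, draft_model, prompt, 20)
    print(tokens == reference, passes)
```

Because every emitted token is either confirmed or supplied by the target model, the greedy output is identical to plain decoding; the saving comes only from needing fewer sequential target passes, which is why real-world gains depend heavily on how often the draft is accepted.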

Together AI launched DeepSeek V4 Pro on its Serverless Inference platform on June 27-28, 2026, with cached-input pricing aimed at cost-effective long-context reasoning. V4 Pro is a 1.6T-parameter MoE model (49B activated) that supports a 512K context on Together (expandable to 1M on dedicated deployments) and offers three reasoning modes (Non-Think, Think High, Think Max). It scores 90.1% on GPQA-Diamond and 95.2% on the HMMT-2026 math benchmark. The availability reflects a structural shift in open-source inference economics: models like V4-Pro now rival or exceed closed-source alternatives on agentic and coding tasks, at a cost per token competitive with smaller proprietary offerings once serving costs are optimized.

For teams evaluating open-source reasoning models for production agents and long-context work over documents and codebases, V4-Pro's availability on Together, plus the option to self-host, materially changes the build-vs-buy calculation. The combination of a hybrid attention architecture (which cuts KV cache by 90% versus V3.2 at 1M context), aggressive quantization (mixed FP4 and FP8), and DSpark speculative decoding suggests that V4's inference cost per token could undercut comparable closed-source workloads in 2027. Watch for third-party latency benchmarks: if independent testing confirms the 85% speedup claim on production inference patterns, it would reshape the ROI on both custom silicon (Jalapeño, B200) and inference infrastructure purchasing decisions.
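To put the percentages in perspective, a reduction of r in latency or memory corresponds to an improvement factor of 1 / (1 - r). The short sketch below applies that arithmetic to the two figures quoted in the article (85% latency, 90% KV cache); the 50% case is a purely hypothetical "what if independent tests come in lower" scenario, not a reported result.

```python
def improvement_factor(reduction: float) -> float:
    """Convert a fractional reduction (0.85 = 85%) into a speedup or
    shrink factor: what remains is (1 - reduction) of the original."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    cases = [
        ("claimed latency cut (85%)", 0.85),
        ("claimed KV cache cut (90%)", 0.90),
        ("hypothetical lower result (50%)", 0.50),
    ]
    for label, reduction in cases:
        print(f"{label}: {improvement_factor(reduction):.2f}x")
```

The relationship is steeply nonlinear near the top of the range: the claimed 85% cut is roughly a 6.7x speedup, while a 50% cut would be only 2x, so even a modest shortfall in independent benchmarks would change the purchasing math considerably.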

Sources