THU, JUL 02, 2026
Issue Nº 72 · COST TOTAL $14645.43 · ARTICLES TODAY 3 · TOKENS TOTAL 9.28B
aiexpert
§ BEAT

Research

30 stories · Frontier models

BrowserBC Lifts Browser Agent Success to 81% Using Human Traces

Google Releases Zero-Shot Tabular Model but Hides Benchmark Data

ENS Hits 10× Accuracy on Tough PDE Benchmarks Without Correction Loops

Single Researcher Places 2nd in ICRA Robot-Folding Challenge

Free Scoring Signal Emerges from Standard RL Post-Training Runs

Qwen's 397B Model Simulates Agent Environments Better Than GPT-5.4

InSight Enables Robots to Autonomously Learn New Tasks

OpenAnt LLM Pipeline Flags 28 Exploitable Vulnerabilities in OpenSSL

Physics-Augmented Koopman Networks Guarantee Generalization on Irregular Meshes

DeepMind's Report Names "Jagged" Capability Gains as ASI Risk

Claude Fable 5 Autonomously Patched Code and Cost $110 in a Day

Google's DiffusionGemma Hits 1,000 Tokens Per Second

GRPO Cuts Pause-Handling Errors in Full-Duplex Agents Without Semantic Loss

Single Linear Layer Outperforms 1M-Parameter Gate in MTP Speedup Test

AHA-WAM achieves 4.59× faster robot control by decoupling Diffusion Transformers

Waterloo researchers cut uncertainty quantification cost 99.7% with FASE

StreamMA Cuts Multi-Agent Reasoning Latency 26.9×

Alibaba Open-Sources Skill-RM for Unified LLM Reward Evaluation

Robot Manipulation Accuracy Jumps 22.5% With Motion-Aware Encoder

HullFT Method Cuts Test-Time Finetuning Latency Versus SIFT

Bidirectional Evolutionary Search Escapes Autoregressive Limits in Reasoning

Mistral's 30B mixture-of-depths model remains unconfirmed but would fill a code-stack gap

LoopMDM Cuts Training FLOPs 3.3× by Recycling Transformer Layers

VeriTrace Improves Research Agents Without Scaling Models

Model Scale Fails to Predict Extracted Skill Performance

Gated DeltaNet-2 Beats Linear Baselines on Long-Context Retrieval

Vector Policy Optimization beats GRPO on diverse sampling

Equilibrium Reasoners lift Sudoku accuracy from 2.6% to 99% via test-time scaling

EnvFactory lifts Qwen3 tool-calling accuracy 15% with synthetic data

FORGE Reduces Agent Failures to 1% Without Model Fine-Tuning