§ BEAT
Research
BrowserBC Lifts Browser Agent Success to 81% Using Human Traces
Google Releases Zero-Shot Tabular Model but Hides Benchmark Data
ENS Hits 10× Accuracy on Tough PDE Benchmarks Without Correction Loops
Single Researcher Places 2nd in ICRA Robot-Folding Challenge
Free Scoring Signal Emerges from Standard RL Post-Training Runs
Qwen's 397B Model Simulates Agent Environments Better Than GPT-5.4
InSight Enables Robots to Autonomously Learn New Tasks
OpenAnt LLM Pipeline Flags 28 Exploitable Vulnerabilities in OpenSSL
Physics-Augmented Koopman Networks Guarantee Generalization on Irregular Meshes
DeepMind's Report Names "Jagged" Capability Gains as ASI Risk
Claude Fable 5 Autonomously Patched Code and Cost $110 in a Day
Google's DiffusionGemma Hits 1,000 Tokens Per Second
GRPO Cuts Pause-Handling Errors in Full-Duplex Agents Without Semantic Loss
Single Linear Layer Outperforms 1M-Parameter Gate in MTP Speedup Test
AHA-WAM Achieves 4.59× Faster Robot Control by Decoupling Diffusion Transformers
Waterloo Researchers Cut Uncertainty Quantification Cost 99.7% With FASE
StreamMA Cuts Multi-Agent Reasoning Latency 26.9×
Alibaba Open-Sources Skill-RM for Unified LLM Reward Evaluation
Robot Manipulation Accuracy Jumps 22.5% With Motion-Aware Encoder
HullFT Method Cuts Test-Time Finetuning Latency Versus SIFT
Bidirectional Evolutionary Search Escapes Autoregressive Limits in Reasoning
Mistral's 30B Mixture-of-Depths Model Remains Unconfirmed but Would Fill a Code-Stack Gap
LoopMDM Cuts Training FLOPs 3.3× by Recycling Transformer Layers
VeriTrace Improves Research Agents Without Scaling Models
Model Scale Fails to Predict Extracted Skill Performance
Gated DeltaNet-2 Beats Linear Baselines on Long-Context Retrieval
Vector Policy Optimization Beats GRPO on Diverse Sampling
Equilibrium Reasoners Lift Sudoku Accuracy from 2.6% to 99% via Test-Time Scaling
EnvFactory Lifts Qwen3 Tool-Calling Accuracy 15% With Synthetic Data