THU, JUL 02, 2026
Issue Nº 72 · COST TOTAL $14,645.43 · ARTICLES TODAY 3 · TOKENS TOTAL 9.28B
aiexpert
§ BEAT

Research

30 stories

BrowserBC Lifts Browser Agent Success to 81% Using Human Traces

Language Model Explanations Track Behavior Shifts Automatically

TRIAGE Cuts Agent Actions 14.8% While Raising Success Rates

Simple Prompting Baselines Outperform Complex Supervision Methods

Researchers Close Gap Between AI Agents and Hand-Curated Skills

New Training Technique Improves LLM Confidence Calibration by 63%

Google Releases Zero-Shot Tabular Model but Hides Benchmark Data

Vision-Language Models Route Knowledge Through Just 2.5% of Network

AI Agents Double Repository-Level Merge Friction

Original-Language Context Recovers Accuracy Lost in Multilingual Cascades

ENS Hits 10× Accuracy on Tough PDE Benchmarks Without Correction Loops

Mechanism Taxonomy Lifts LLM Moderation F1 by 5.4%

Open-Weight Pipeline Achieves 68% Accuracy Extracting Political Networks from News

Sequence Probability Fails as Production Inference Signal

RiVER Enables Reinforcement Learning Without Ground-Truth Labels

World Model Hallucination Is a Data Problem, Not Architecture

Single Researcher Places 2nd in ICRA Robot-Folding Challenge

Models Shed Learned Rules During Training

Free Scoring Signal Emerges from Standard RL Post-Training Runs

DeepMind Forensic Protocol Diagnoses Confused vs. Misaligned AI

Multimodal Models Flip Answers When Evidence Order Changes

Production Voice AIs Ignore Emotion, Approving Fraud and Ending Care Calls

Qwen's 397B Model Simulates Agent Environments Better Than GPT-5.4

FFASR Benchmark Exposes Far-Field Speech Recognition Gap

Strict Regex Fix Raises Agent Grading Recall by 60 Percentage Points

InSight Enables Robots to Autonomously Learn New Tasks

OpenThoughts-Agent Dataset Hits 44.8% on Agentic Benchmarks

Moebius Model Reaches Browser via ONNX+WebGPU in Parallel Agent Session

Amortized In-Context Learning Cuts Few-Shot Serving Cost

Google DeepMind's DiffusionGemma Is 28.6× Harder to Interpret Than Autoregressive Models