SUN, MAY 17, 2026
Issue Nº 26
aiexpert
§ BEAT

Research

30 stories

Microsoft Finds GPT-5 Fails Against Implausible Attacks

Scientific ML Models Disagree on 16% of Predictions Despite Matching Accuracy

LLM Formalization Catches 18.8% Ambiguous Requirements in Safety Specs

TFlow Cuts Multi-Agent Inference Tokens 83% via Weight Injection

Negation Neglect Drives False Belief Rate to 88.6% in Fine-Tuned LLMs

Why Production Agents Fail Without Harness Infrastructure

Berkeley Framework Cuts Agent Latency 1.3–2.2×

KV-Fold Extends Transformer Context to 128K Without Retraining

IBM Boosts Zero-Shot Search Accuracy 25% With LLM Query Refinement

27M Attractor Model Beats GPT o3 on Logic Puzzles

Reward Hacking Undetected in Single-Verifier Training

Sparse-to-Dense RL Lifts MATH Scores to 78.5% on Small Models

Standard Load-Balancing Losses Degrade SMoE Expert Specialization by 3×

VECA Cuts Vision Transformer Inference Cost to Linear Time

MEME Benchmark Finds 97% Failure on Agent Memory Dependency Tasks

RuDE Predicts Fine-Tuning Success Without Training

Google's RubricEM Trains Research Agents Without Ground Truth

Every Guardrail Classifier Tested Fails Formal Safety Verification

Math Proof Shows Transformer Attention Stabilizes Predictably

AI Agents Bypass Software Engineering, Risk Production Failure

SLIM Improves LLM Agent Performance 7 Percentage Points

Shepherd Raises Agent Accuracy 90% With Forking Traces

WildClawBench: Claude Opus Clears 62% in Real-World Agent Evaluation

Sparse MoE Models Match Dense Transformers at 3× Faster Inference

Muon Optimizer Achieves 2× Speed Over AdamW in Production LLM Training

CIVeX Logs Zero False Executions in Confounded Workflows

Paper Dismantles Causal Discovery Claim in Prediction Models

Frozen Models Encode Semantic Roles Without Fine-Tuning

Flow-OPD Raises Stable Diffusion Accuracy to 92 From 63

Conformal Path Reasoning Cuts Knowledge Graph Answer Sets by 40%