News
AI, at newsroom pace.
RESEARCH
Microsoft Finds GPT-5 Fails Against Implausible Attacks
RESEARCH
LLM Formalization Flags 18.8% of Safety-Spec Requirements as Ambiguous
RESEARCH
Negation Neglect Drives False Belief Rate to 88.6% in Fine-Tuned LLMs
RESEARCH
Berkeley Framework Cuts Agent Latency 1.3–2.2×
RESEARCH
IBM Boosts Zero-Shot Search Accuracy 25% With LLM Query Refinement
RESEARCH
Reward Hacking Undetected in Single-Verifier Training
RESEARCH
Standard load-balancing losses degrade SMoE expert specialization by 3×
RESEARCH
MEME benchmark finds 97% failure rate on agent memory dependency tasks
RESEARCH
Google's RubricEM trains research agents without ground truth
RESEARCH
Scientific ML Models Disagree on 16% of Predictions Despite Matching Accuracy
RESEARCH
TFlow cuts multi-agent inference tokens 83% via weight injection
RESEARCH
Why Production Agents Fail Without Harness Infrastructure
RESEARCH
KV-Fold Extends Transformer Context to 128K Without Retraining
RESEARCH
27M-Parameter Attractor Model Beats OpenAI's o3 on Logic Puzzles
RESEARCH
Sparse-to-Dense RL Lifts MATH Scores to 78.5% on Small Models
RESEARCH
VECA Cuts Vision Transformer Inference Cost to Linear Time
RESEARCH
RuDE Predicts Fine-Tuning Success Without Training
RESEARCH
Every Guardrail Classifier Tested Fails Formal Safety Verification