§ BEAT
Research
Language Model Explanations Track Behavior Shifts Automatically
Vision-Language Models Route Knowledge Through Just 2.5% of Network
Models Shed Learned Rules During Training
Multimodal Models Flip Answers When Evidence Order Changes
Google DeepMind's DiffusionGemma 28.6× Harder to Interpret Than Autoregressive Models
MIT Extracts Attention Logic Into Swappable Python Code
Sparse Attention Heads Redirect Vision-Language Models With 83% Accuracy
Label-Free Test Catches LLM Reasoning Failures Better Than Self-Consistency
New Tool Finds 1,060 Hidden Training Dependencies Across Major LLMs
Kamai's Phase Diagram Predicts Multimodal Failure Before GPU Commit
Real EHR Benchmark Exposes Limits of LLMs in Clinical Action
Echo-Memory Shows World Models Fail the Revisit Test
64 Percent of Audio-Text Conflicts in AI Models Are Fixable
Stanford Framework Keeps AI Agents Within Violation Targets
Self-Generated Replay Cuts Catastrophic Forgetting in Fine-Tuned Models
Study: AI Narrative Explanations Boost User Trust, Not Accuracy
DelTA Framework Improves Reasoning by Fixing Token-Level Credit Assignment
RELEX Reconstructs RLVR Checkpoints From 15% of Training Data
SAEBench Metrics Rank SAEs Backwards, Audit Finds
Math Proof Shows Transformer Attention Stabilizes Predictably
SLIM Improves LLM Agent Performance by 7 Percentage Points
Shepherd Raises Agent Accuracy 90% With Forking Traces
Sparse MoE Models Match Dense Transformers at 3× Faster Inference
Frozen Models Encode Semantic Roles Without Fine-Tuning
Rice and Apple Researchers Cut Image-Generation FID 22% With Token Fix
First-Token Entropy Rivals Multi-Sample Hallucination Detection
Purdue and Georgia Tech Prove Transformers Extract Nonlinear Features in Context
Claude's Safety Tests Fail When Model Hides Suspicions Inside
Fixed-Threshold AI Detector Shows Cross-Domain Robustness