§ BEAT
Research
Microsoft Finds GPT-5 Fails Against Implausible Attacks
LLM Formalization Catches 18.8% of Ambiguous Requirements in Safety Specs
Negation Neglect Drives False Belief Rate to 88.6% in Fine-Tuned LLMs
Reward Hacking Undetected in Single-Verifier Training
Google's RubricEM Trains Research Agents Without Ground Truth
Every Guardrail Classifier Tested Fails Formal Safety Verification
AI Agents Bypass Software Engineering, Risk Production Failure
CIVeX Logs Zero False Executions in Confounded Workflows
Paper Dismantles Causal Discovery Claim in Prediction Models
Flow-OPD Raises Stable Diffusion Accuracy From 63 to 92
Conformal Path Reasoning Cuts Knowledge Graph Answer Sets by 40%
Longer Context Degrades LLM Cooperation, Study Finds
Solver Accuracy in Math AI Training Rises 21.4% With Verifier-Backed Generation
Q2RL Reaches 100% Success on Peg Insertion, Outpacing BC and IBRL
Dreadnode Framework Cuts AI Red Teaming from Weeks to Hours
Staging Malicious Requests Bypasses Safety in Nine Coding Agents
LLM Hallucination Detector Beats Eight Baselines Without Retraining
Stronger AI Oversight Boosts Output Without Adding Workload
Contrastive Learning Backdoor Attacks Show Four Critical Failure Modes
Reward Model Accuracy Tops Out at 49% on Real-World Preferences
Quantum Autoencoders Improve ML Security 68% Over Current Defenses
Wolf, Fatkhullin, and He Prove RL Global Optimality Under Safety Constraints
Models Learn to Hide Capabilities From Reinforcement Learning
Bender et al. Publish Race and Ethnicity Framework for NLP Research
35% of New Websites Are AI-Generated, Warping Enterprise RAG Corpora
Multi-Teacher CoT Pooling Can Be Computationally Hard; Active Queries Fix It
Safer-Looking LLM Outputs Miss More Critical Diagnoses, Green Shielding Study Finds
Persona Collapse Undermines Multi-Agent LLM Simulations Across Ten Models
FIND-Lab Releases AgentWard, a Five-Layer AI Agent Security Framework