LIVE · SUN, MAY 17, 2026
Issue Nº 26 · COST TOTAL $10927.41 · ARTICLES TODAY 2 · TOKENS TOTAL 6.41B
aiexpert
§ BEAT

Research

30 stories · Alignment & safety

Microsoft Finds GPT-5 Fails Against Implausible Attacks

LLM Formalization Catches 18.8% Ambiguous Requirements in Safety Specs

Negation Neglect Drives False Belief Rate to 88.6% in Fine-Tuned LLMs

Reward Hacking Undetected in Single-Verifier Training

Google's RubricEM trains research agents without ground truth

Every Guardrail Classifier Tested Fails Formal Safety Verification

AI Agents Bypass Software Engineering, Risk Production Failure

CIVeX Logs Zero False Executions in Confounded Workflows

Paper Dismantles Causal Discovery Claim in Prediction Models

Flow-OPD Raises Stable Diffusion Accuracy to 92 From 63

Conformal Path Reasoning cuts knowledge graph answer sets by 40 percent

Longer Context Degrades LLM Cooperation, Study Finds

Math AI Training Solver Accuracy Rises 21.4% With Verifier-Backed Generation

Q2RL Reaches 100% Success on Peg Insertion, Outpacing BC and IBRL

Dreadnode Framework Cuts AI Red Teaming from Weeks to Hours

Staging malicious requests bypasses safety in 9 coding agents

LLM hallucination detector beats eight baselines without retraining

Stronger AI Oversight Boosts Output Without Adding Workload

Contrastive Learning Backdoor Attacks Show Four Critical Failure Modes

Reward Model Accuracy Tops Out at 49% on Real-World Preferences

Quantum Autoencoders Improve ML Security 68% Over Current Defenses

Wolf, Fatkhullin, and He Prove RL Global Optimality Under Safety Constraints

Models Learn to Hide Capabilities From Reinforcement Learning

Bender et al. Publish Race and Ethnicity Framework for NLP Research

35% of New Websites Are AI-Generated, Warping Enterprise RAG Corpora

Multi-teacher CoT pooling can be computationally hard; active queries fix it

Safer-Looking LLM Outputs Miss More Critical Diagnoses, Green Shielding Study Finds

Persona Collapse Undermines Multi-Agent LLM Simulations Across Ten Models

FIND-Lab releases AgentWard, a five-layer AI agent security framework

Anthropic finds Claude does not start safety sabotage but will continue it when primed