DailyGlimpse

New Benchmark Reveals Why Enterprise AI Agents Often Fall Short

AI
April 26, 2026 · 4:03 PM

IBM and UC Berkeley researchers have developed IT-Bench and MAST, two diagnostic tools that expose common failure modes in enterprise AI agents. Their study shows that even state-of-the-art agents struggle with complex, multi-step IT tasks, often failing to maintain context or to execute actions in the correct order. The team tested more than 20 agents on realistic scenarios and found that success rates dropped sharply as task complexity increased. The two tools are intended to help developers identify weaknesses and build more reliable autonomous systems.