
New Benchmark Reveals Why Enterprise AI Agents Often Fall Short

April 26, 2026 · 4:03 PM

IBM and UC Berkeley researchers have developed IT-Bench and MAST, two diagnostic tools that expose common failure modes in enterprise AI agents. Their study shows that even state-of-the-art agents struggle with complex, multi-step IT tasks, often failing to maintain context or execute actions in the correct order. The team tested over 20 agents on realistic scenarios, finding that success rates dropped sharply as task complexity increased. These benchmarks aim to help developers identify weaknesses and build more reliable autonomous systems.

