
AI Bankers Flunk the Test: No Model Ready for Client Work

April 26, 2026 · 3:57 PM

A new benchmark from Handshake AI and McGill University finds that even the most advanced AI models fail to produce work fit for client delivery in investment banking tasks. The study, called BankerToolBench, tested nine top AI agents on 100 tasks that mimic junior banker workflows and drew on 500 current and former bankers from firms like Goldman Sachs and JPMorgan.

Not a single AI output was deemed suitable for client submission, with 41% requiring major rework and 27% completely unusable.

The benchmark goes beyond text, grading actual deliverables like Excel models with formulas, PowerPoint decks, PDFs, and Word memos. AI agents had to navigate data rooms, market data platforms, and SEC filings. GPT-5.4 led overall but still missed nearly half the criteria, with only 16% of outputs considered a useful starting point.

Common failures included fabricated data and hardcoded values in place of Excel formulas, a dealbreaker for scenario analysis. The authors note that agents often delete broken code lines instead of fixing them. Despite these shortcomings, over half the bankers said they'd use AI outputs as a starting point.

The benchmark aligns with other studies showing AI struggles in finance, suggesting a long road ahead for AI in high-stakes knowledge work.

