AI Outperforms Human Doctors in ER Diagnosis Accuracy, Harvard Study Finds

May 4, 2026 · 1:32 AM

A study from Harvard Medical School and Beth Israel Deaconess Medical Center found that OpenAI's large language models can diagnose emergency room cases more accurately than human physicians. Published this week in Science, the research compared OpenAI's o1 and 4o models against two internal medicine attending physicians in diagnosing 76 real ER patients.

In a blinded evaluation, two other attending physicians assessed the diagnoses without knowing whether they came from humans or AI. The results showed that the o1 model either matched or exceeded the accuracy of both human doctors at every diagnostic stage, while the 4o model performed comparably. This suggests that AI could serve as a powerful tool to reduce diagnostic errors and improve patient outcomes in high-pressure emergency settings.

The study underscores the growing potential of AI in healthcare, though researchers caution that further validation and integration with clinical workflows are needed. The findings mark a significant milestone in the application of large language models to medicine, raising both hope and questions about the future role of AI alongside human expertise.
