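Google's Med-Gemini achieved 91.1% accuracy on MedQA (USMLE-style exam questions) using an uncertainty-guided search strategy that lets the model check its own reasoning. It samples several reasoning paths, and when their answers disagree, it issues web search queries and uses the retrieved results to refine its answer. Its long-context results come from a variant built on Gemini 1.5 Pro and were obtained through in-context learning alone, without task-specific fine-tuning.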
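In direct comparisons, Med-Gemini outperformed GPT-4 models on the benchmarks where such a comparison was possible, and its long-context variant performed well on complex tasks such as finding relevant information in long de-identified health records. The key idea is not to abandon retrieval but to let the model control it. Standard retrieval-augmented generation (RAG) retrieves once before generating. Med-Gemini instead estimates its own uncertainty, searches only when needed, and iteratively refines its output, which led to more accurate answers on diagnostic and treatment questions.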
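The uncertainty-triggered loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Med-Gemini's actual implementation. `sample_answers` and `search` are hypothetical stand-ins for a model call and a web search. Uncertainty is measured as the Shannon entropy of the sampled final answers. The threshold and round limit are arbitrary. The raw question is reused as the search query, whereas a real system would have the model write its own queries.

```python
import math
from collections import Counter
from typing import Callable, List


def answer_entropy(answers: List[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution of sampled answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def uncertainty_guided_answer(
    question: str,
    sample_answers: Callable[[str, int], List[str]],  # hypothetical: (prompt, n) -> n final answers
    search: Callable[[str], str],  # hypothetical: query -> retrieved text
    n_samples: int = 5,
    threshold: float = 0.5,  # assumed cutoff in bits, not a published value
    max_rounds: int = 3,
) -> str:
    context = ""
    answers: List[str] = []
    for round_idx in range(max_rounds):
        prompt = f"{context}\n\nQuestion: {question}" if context else f"Question: {question}"
        answers = sample_answers(prompt, n_samples)
        # Stop when the samples largely agree, or when the round budget is spent.
        if answer_entropy(answers) <= threshold or round_idx == max_rounds - 1:
            break
        # Samples disagree: retrieve evidence and add it to the next round's prompt.
        context += "\n" + search(question)
    # Majority vote over the final round's samples.
    return Counter(answers).most_common(1)[0][0]


# Usage with toy stubs (hypothetical): the sampler disagrees until evidence is present.
def fake_sampler(prompt: str, n: int) -> List[str]:
    if "Evidence:" in prompt:
        return ["B"] * n
    return ["A", "B", "C", "A", "B"][:n]


def fake_search(query: str) -> str:
    return "Evidence: guideline text favouring option B"


print(uncertainty_guided_answer("Which drug?", fake_sampler, fake_search))  # → B
```

In the first round the three-way split has high entropy, so the loop searches once. With the evidence in context, the samples agree and the loop stops. Gating retrieval on disagreement is what distinguishes this from retrieving unconditionally, as standard RAG does.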
These results suggest that long-context reasoning combined with model-directed search can match, and in some cases exceed, bespoke task-specific methods, which could change how AI is applied in healthcare.