Researchers have developed a new method for training large language models (LLMs) to reason more effectively by using Jupyter notebooks as interactive training environments. The approach, called Jupyter Agents, leverages the sequential, execution-driven structure of notebooks to teach models to break complex problems into smaller steps, execute code, and iteratively refine their reasoning.
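The article stays at this high-level description, but the loop it sketches can be illustrated with a minimal, self-contained example. Here `propose_step` is a hypothetical stand-in for the model call, and all names are illustrative rather than taken from the Jupyter Agents codebase:

```python
import contextlib
import io
import traceback

def run_cell(code: str, namespace: dict) -> str:
    """Execute one code cell and return its output (stdout or a traceback),
    mimicking the feedback a notebook kernel gives the agent."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        buffer.write(traceback.format_exc())
    return buffer.getvalue()

def solve(task: str, propose_step, max_steps: int = 8) -> list[dict]:
    """Iterative reasoning loop: propose a code step, run it, feed the
    output back into the transcript, and stop when the model is done."""
    namespace: dict = {}
    transcript = [{"role": "task", "content": task}]
    for _ in range(max_steps):
        code = propose_step(transcript)   # LLM call (stubbed here)
        if code is None:                  # model signals it has the answer
            break
        output = run_cell(code, namespace)
        transcript.append({"role": "code", "content": code})
        transcript.append({"role": "output", "content": output})
    return transcript
```

Feeding the raw output back, tracebacks included, is what lets the model correct its own mistakes rather than commit to a single pass.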
Unlike conventional training corpora, Jupyter notebooks capture trial-and-error problem solving as it happens: code snippets, their outputs, the errors they raise, and the markdown explanations written alongside them. By training on this rich, interleaved data, LLMs learn to follow a similar reasoning process: hypothesize, test, observe the results, and adjust. Early experiments show significant improvements on tasks requiring multi-step reasoning, such as mathematical problem solving and data analysis.
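The article doesn't specify the data pipeline, but converting a notebook into such an interleaved trace is straightforward because the .ipynb format (nbformat v4) already alternates markdown, code, and outputs. A rough sketch, with assumed role labels that are not from the original work:

```python
import json

def output_text(out: dict) -> str:
    """Flatten one nbformat v4 output entry to plain text."""
    if out["output_type"] == "stream":
        return "".join(out.get("text", []))
    if out["output_type"] == "error":
        return "\n".join(out.get("traceback", []))
    # execute_result / display_data entries carry a MIME bundle
    return "".join(out.get("data", {}).get("text/plain", []))

def notebook_to_turns(path: str) -> list[dict]:
    """Linearize a .ipynb file into interleaved reasoning/code/output
    turns that can serve as one training example."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    turns = []
    for cell in nb["cells"]:
        source = "".join(cell["source"])
        if cell["cell_type"] == "markdown":
            turns.append({"role": "reasoning", "content": source})
        elif cell["cell_type"] == "code":
            turns.append({"role": "code", "content": source})
            for out in cell.get("outputs", []):
                turns.append({"role": "output", "content": output_text(out)})
    return turns
```

Because errors and their fixes are preserved in sequence, each notebook yields exactly the hypothesize-test-adjust trace the article describes.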
The team behind Jupyter Agents hopes this method will reduce hallucinations and improve reliability in LLMs, especially for scientific and analytical applications. The work highlights the potential of using existing interactive coding environments as a training resource.