DailyGlimpse

Vintage LLM Trained on Pre-1931 Texts Predicts Steamships and Peace in 2026

AI
April 29, 2026 · 1:31 AM

What would a large language model think of 2026 if it had no knowledge of anything after 1930? A new experiment from researchers Nick Levine, David Duvenaud, and Alec Radford explores this question with a 13-billion-parameter model called talkie, trained exclusively on texts published on or before December 31, 1930.

Talkie was trained on 260 billion tokens from books, newspapers, scientific journals, patents, and case law from the early 20th century. The result is a model that envisions a retro-futuristic world: Europe with a billion inhabitants, crisscrossed by iron railroads, steamships connecting London and New York in ten days, and a social elite wintering in Paris and summering in London.

When asked about the possibility of a second world war, talkie dismisses the idea, stating that 'the madness of 1914-1918 has passed away.' However, it hedges by noting 'smouldering animosities' and potential flashpoints between China and Japan, as well as Italy and Yugoslavia. 'The spark may be applied at any moment,' it warns.

The researchers also tested talkie's predictive accuracy by running nearly 5,000 historical events from the New York Times' 'On This Day' feature through the model. Surprise values climbed sharply for events after 1930, peaking in the 1950s and 1960s before leveling off.
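A rough sketch of how such a measurement can be implemented is below, assuming "surprise" means the average negative log-likelihood the model assigns to a short description of each event; the Hugging Face identifier is a placeholder, not talkie's actual repository name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifier; the real repository name is not given here.
model_id = "example-org/talkie-13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def surprisal(text: str) -> float:
    """Mean negative log-likelihood (nats per token) of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the inputs as labels makes the model return the average
        # cross-entropy over the predicted tokens.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return loss.item()

# A pre-cutoff event should score lower (less surprising) than a post-cutoff one.
print(surprisal("1912: The ocean liner Titanic sinks in the North Atlantic."))
print(surprisal("1969: Apollo 11 lands the first humans on the Moon."))
```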

Challenges in Building a Vintage Model

Building a model with a strict pre-1931 knowledge cutoff posed significant challenges. All training texts had to be transcribed from physical sources, and standard OCR transcriptions achieved only 30% of the performance of human transcriptions. Simple regex cleaning improved that to 70%, and a custom vintage OCR system is under development.
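The article does not describe the actual cleaning rules, but a minimal sketch of regex-based OCR cleanup might look like the following; the specific patterns (soft hyphens, line-break hyphenation, long-s characters, collapsed whitespace) are illustrative guesses at common scanning artifacts, not the team's pipeline.

```python
import re

def clean_ocr(text: str) -> str:
    """Apply a few rule-based fixes to raw OCR output."""
    text = text.replace("\u00ad", "")             # drop soft hyphens
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # re-join words hyphenated across lines
    text = text.replace("ſ", "s")                 # normalise surviving long-s characters
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)        # collapse excessive blank lines
    return text.strip()

sample = "The ſteam-ſhip croſ-\nsing   took  ten days.\n\n\n\n"
print(clean_ocr(sample))  # "The steam-ship crossing took ten days."
```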

Ensuring that no modern data contaminated the training set was another headache. Despite the use of a contamination classifier, references to Franklin Roosevelt's presidency, World War II, and the United Nations slipped through. The team plans to improve the classifiers for future versions.
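The report gives no detail on how that classifier works. As a crude illustration of the filtering problem, the sketch below screens documents for a few obviously post-1930 phrases; the term list is a hypothetical example, and the team's actual system is a trained classifier rather than a keyword match.

```python
import re

# Hypothetical screening terms; a real filter would be a trained classifier.
ANACHRONISMS = [
    r"\bworld war (ii|two|2)\b",
    r"\bsecond world war\b",
    r"\bunited nations\b",
    r"\bcold war\b",
    r"\bnato\b",
]
PATTERN = re.compile("|".join(ANACHRONISMS), re.IGNORECASE)

def looks_contaminated(document: str) -> bool:
    """Flag documents mentioning terms that should not exist before 1931."""
    return PATTERN.search(document) is not None

print(looks_contaminated("The League of Nations convened at Geneva in 1927."))      # False
print(looks_contaminated("After the Second World War, the United Nations formed.")) # True
```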

For post-training, the developers used historical reference works such as etiquette manuals, letter-writing guides, cookbooks, encyclopedias, and fable collections. Reinforcement learning with Claude Sonnet 4.6 as the judge helped improve instruction-following, though the researchers acknowledge this introduces some anachronistic behavior.
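The exact reward setup is not described. A minimal sketch of an LLM-as-judge reward, assuming the judge scores each reply for instruction-following on a 0-10 scale, might look like this; the prompt wording and the model string are illustrative assumptions, not the team's configuration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def judge_reward(instruction: str, reply: str) -> float:
    """Ask the judge model for a 0-10 score, usable as a scalar RL reward."""
    message = client.messages.create(
        model="claude-sonnet-4-6",  # placeholder model identifier
        max_tokens=8,
        messages=[{
            "role": "user",
            "content": (
                "Rate from 0 to 10 how well the reply follows the instruction "
                "while using only knowledge available before 1931. "
                "Answer with a single number.\n\n"
                f"Instruction: {instruction}\n\nReply: {reply}"
            ),
        }],
    )
    return float(message.content[0].text.strip())

print(judge_reward("Describe a transatlantic crossing.", "The steamship takes ten days."))
```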

Surprising Programming Skills

Despite having no knowledge of digital computers, talkie showed basic proficiency in Python programming. On the HumanEval benchmark, the vintage models performed worse than modern counterparts but improved with scale. Talkie correctly implemented a rotation-cipher decoding function by swapping the cipher's addition for a subtraction, suggesting a grasp of inverse functions.
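The exchange itself is not reproduced here, but the kind of task described looks roughly like the sketch below, in which the decoder inverts the encoder simply by turning the addition into a subtraction.

```python
def encode(text: str, shift: int) -> str:
    """Shift each lowercase letter forward by `shift` positions, wrapping past 'z'."""
    return "".join(
        chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

def decode(text: str, shift: int) -> str:
    """Invert encode(): the only change is that the addition becomes a subtraction."""
    return "".join(
        chr((ord(c) - ord("a") - shift) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

assert decode(encode("steamship", 7), 7) == "steamship"
```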

The researchers note that vintage models, free from data contamination, are ideal for generalization experiments. They could help reveal which language model traits are universal versus dependent on the training corpus.

Plans for a Larger Model

Talkie is available as both a base model and a chat version on Hugging Face, with the code on GitHub. A live demo is on the project website, where Claude Sonnet quizzes talkie about its knowledge 24/7.

The team plans to scale the model significantly, targeting a GPT-3-level model by summer 2026. Early estimates suggest the corpus can grow to over one trillion tokens of historical texts, enough to train a model on par with GPT-3.5. Multilingual expansion beyond English is also on the roadmap.
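As a rough sanity check on that estimate (our own arithmetic, not the team's), the common "Chinchilla" heuristic of about 20 training tokens per parameter puts a one-trillion-token corpus in compute-optimal range for a model of roughly 50 billion parameters.

```python
def compute_optimal_params(tokens: float, tokens_per_param: float = 20.0) -> float:
    """Rule-of-thumb compute-optimal parameter count for a given token budget."""
    return tokens / tokens_per_param

corpus_tokens = 1.0e12  # the projected one trillion historical tokens
print(f"~{compute_optimal_params(corpus_tokens) / 1e9:.0f}B parameters")  # ~50B
```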

The larger question: can a vintage model anticipate discoveries and inventions made after its cutoff? As DeepMind CEO Demis Hassabis has suggested, a model trained only on material through 1911 might independently derive general relativity. Larger vintage models could reveal whether that kind of extrapolation improves with scale.