DailyGlimpse

AI's Data Diet: Is Reddit the Secret Ingredient?

May 3, 2026 · 11:20 PM

In a recent episode of Building Great Tech, hosts Simon and Anastasija sat down with Dr. Travis Hoppe, Chief AI Officer at GroundVue and former White House Assistant Director of AI Research and Development. Dr. Hoppe, who co-developed The Pile (the first open-source dataset for large language models), shed light on the data sources powering modern AI.

The conversation tackled a provocative question: Is most of AI's training data actually from Reddit? Although the question was not explicitly answered, Dr. Hoppe discussed how platforms like Reddit contribute to the vast datasets used to train AI systems. He also explored AI's implications for democracy and the science behind large language model development.

Dr. Hoppe also served as the first Chief Artificial Intelligence Officer at the CDC, which gives him a distinctive perspective on AI's role in government and public health. The episode offers a deep dive into the often-opaque world of AI data sourcing and its societal impacts.