Google DeepMind has unveiled a new AI architecture called Omni that targets one of the field's most persistent challenges: limited context memory. Using a technique called context unrolling, Omni models can process an effectively unlimited stream of information while, according to DeepMind, cutting computational costs in half and processing inputs up to 90% faster.
The new approach treats memory like a video stream: information flows through the model continuously rather than being stored in a fixed-size buffer, letting the AI remember and reference an effectively unlimited history. This marks a significant departure from traditional models, whose fixed token windows limit long-form reasoning.
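DeepMind has not published the details of context unrolling, so the mechanics here are an assumption. One common way to get constant memory over an unbounded stream, and a plausible reading of the streaming analogy, is to keep a small window of recent tokens verbatim and fold everything older into a fixed-size running summary. A toy sketch:

```python
from collections import deque

class StreamingContext:
    """Toy illustration (not DeepMind's method): keep a short window of
    recent tokens verbatim and fold older tokens into a fixed-size
    summary, so memory stays constant however long the stream gets."""

    def __init__(self, window_size=4, state_dim=8):
        self.window = deque(maxlen=window_size)  # recent tokens, verbatim
        self.state = [0.0] * state_dim           # compressed history
        self.seen = 0                            # total tokens processed

    def push(self, token):
        # The token about to fall out of the window is folded into the state.
        if len(self.window) == self.window.maxlen:
            h = hash(self.window[0])
            for i in range(len(self.state)):
                # Exponential moving average over crude hash features:
                # a stand-in for a learned compression step.
                feature = ((h >> i) & 1) * 2 - 1
                self.state[i] = 0.9 * self.state[i] + 0.1 * feature
        self.window.append(token)
        self.seen += 1

    def context(self):
        # A model would attend over the summary state plus the recent window.
        return self.state, list(self.window)

ctx = StreamingContext()
for t in "the quick brown fox jumps over the lazy dog".split():
    ctx.push(t)

state, recent = ctx.context()
print(len(recent))  # window stays bounded at 4
print(ctx.seen)     # 9 tokens seen in total
```

Whatever the real mechanism, the key property the article describes is the one shown here: storage stays fixed while the processed history grows without bound.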