A new artificial intelligence framework called OneVL promises faster and more accurate reasoning for autonomous driving by processing vision and language in a single step.
Built as a Vision-Language-Action (VLA) framework, OneVL replaces explicit chain-of-thought reasoning with compressed latent states. The system learns causal relationships by supervising those latents with two decoders: a language decoder and a visual world-model decoder.
The key advantage is real-time performance: OneVL runs inference in a single forward pass, eliminating the separate decoding steps that add latency. Benchmark tests show it surpasses existing explicit chain-of-thought methods in both speed and accuracy.
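The latency argument can be sketched in plain Python: explicit chain-of-thought issues one decode call per reasoning step, while a latent approach folds the steps into a single pass. Everything below is illustrative structure under assumed names, not the OneVL implementation.

```python
# Illustrative sketch only: the function names and the toy "+1 per step"
# arithmetic are stand-ins for autoregressive decoding, not OneVL's code.

def explicit_cot(observation: int, steps: int = 4):
    """Explicit chain-of-thought: one decode call per reasoning step."""
    state = observation
    calls = 0
    for _ in range(steps):
        state = state + 1  # stand-in for one autoregressive decode step
        calls += 1
    return state, calls

def latent_single_pass(observation: int, steps: int = 4):
    """Compressed latent reasoning: the same computation folded into one pass."""
    # The multi-step reasoning is absorbed into a single transformation
    # of a compressed latent state, so only one model call is needed.
    return observation + steps, 1

obs = 10
cot_out, cot_calls = explicit_cot(obs)
latent_out, latent_calls = latent_single_pass(obs)
assert cot_out == latent_out      # same answer...
assert latent_calls < cot_calls   # ...with one pass instead of many
print(cot_calls, latent_calls)    # 4 1
```

The point of the toy is only the call count: accuracy aside, cutting per-step decoding is where the claimed latency win comes from.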
"OneVL processes inference using compressed latent states to address the latency of traditional Chain-of-Thought," the researchers explained.
The framework is tailored for autonomous driving, where split-second decisions are critical. As AI gains traction in self-driving cars, OneVL could enable smoother, safer navigation by shortening the gap between perception and action.