Nvidia and DeepSeek are stepping up their push into multimodal AI, unveiling models that combine vision, audio, and language in long-context agents. Nvidia's Nemotron 3 Nano Omni integrates these modalities for practical agent applications, while DeepSeek has released DeepSeek Vision, pursuing a similar universal-device ambition. Both companies are racing to lead the next generation of AI agents that can process multiple input types seamlessly.
Nvidia and DeepSeek Battle Over Multimodal AI Agents in 2026
AI
May 1, 2026 · 2:10 PM