May 3, 2026, marks a massive day in AI. Nvidia has dropped free API access to DeepSeek V3.2 and Llama 4 via NIM — a strategic gift or a Trojan horse? Meanwhile, MicroGPT on FPGA achieves 50,000 tokens per second with zero GPU, signaling a shift in inference hardware. PFlash processes 128K tokens for Qwen 3.6 in just 25 seconds, eliminating idle time. Voice-Pro emerges as a free, open-source ElevenLabs alternative, shaking up the audio space. A new Whisper model transcribes 150 minutes of audio in 98 seconds. A voice agent built with Gemini 3.1 Flash now handles insurance claims in real-time. Seven Claude agents are running an entire SEO agency autonomously — autonomous agents are here. Almanac turns Claude CLI into a research expert across hundreds of sources. Anthropic releases a free prompt engineering course and a guide to building custom skills. vLLM Studio update aims to become a true OS for local AI. The pace of innovation is breathtaking.
AI News Flash: Nvidia Opens DeepSeek V3.2 and Llama 4 for Free, MicroGPT Hits 50K Tokens on FPGA
AI
May 4, 2026 · 2:27 AM