Google has launched Gemma 4, a new family of multimodal AI models designed to run directly on mobile devices and edge hardware. Building on the open-weight Gemma series, these models integrate vision, language, and audio capabilities into a single compact architecture.
Unlike cloud-dependent AI, Gemma 4 processes data entirely on-device, reducing latency and improving privacy for real-time applications. The models are optimized for smartphones, tablets, and IoT devices, enabling tasks like image captioning, voice commands, and document analysis without internet connectivity.
Early benchmarks show Gemma 4 achieving accuracy competitive with larger models while using significantly less memory and power. Developers can access the model weights via Google's Model Garden, with support for TensorFlow Lite and Keras.
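For a sense of what the Keras path looks like, here is a minimal loading-and-generation sketch. It mirrors how earlier Gemma releases are exposed through KerasNLP's GemmaCausalLM class; the preset name gemma4_instruct_2b_en is a hypothetical placeholder, since the announcement does not list Gemma 4's actual preset identifiers.

```python
# Minimal sketch: load a Gemma checkpoint through KerasNLP and generate text.
# The preset name below is a hypothetical placeholder; substitute whatever
# identifier Model Garden lists for the variant you download.
import keras_nlp

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma4_instruct_2b_en")

# Run a short generation entirely on local hardware, no network call required.
output = gemma_lm.generate("Describe this receipt in one sentence:", max_length=64)
print(output)
```

From there, a checkpoint would presumably be exported through the TensorFlow Lite converter for mobile deployment, though the announcement does not document the exact export path for Gemma 4.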
"Gemma 4 brings frontier-level multimodal intelligence to edge devices, unlocking new possibilities for on-device AI assistants, accessibility tools, and industrial automation," said a Google spokesperson.
The release includes pre-trained variants specialized for text, vision, and audio, as well as a unified multimodal model. Google has also released fine-tuning scripts for customizing Gemma 4 on proprietary datasets.
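As an illustration of what that customization might look like, the sketch below follows the LoRA fine-tuning pattern used with earlier Gemma releases in KerasNLP; Google's actual Gemma 4 scripts may differ, and the preset name and the two toy training strings are placeholders for a real proprietary dataset.

```python
# Hedged sketch of LoRA fine-tuning in the KerasNLP style.
# The preset name and training strings below are hypothetical placeholders.
import keras
import keras_nlp

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma4_instruct_2b_en")

# Enable low-rank adapters so only a small fraction of weights are trained.
gemma_lm.backbone.enable_lora(rank=4)

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=5e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# A proprietary dataset would be loaded here; two toy prompt/response
# pairs stand in for it.
data = [
    "Instruction: Summarize the invoice.\nResponse: Total due is $120 on May 1.",
    "Instruction: Transcribe the voice memo.\nResponse: Call the supplier at 9am.",
]
gemma_lm.fit(data, epochs=1, batch_size=1)
```

LoRA keeps the base weights frozen and trains only small adapter matrices, a relevant choice here because fully fine-tuning even a compact multimodal model can exceed the memory of a single consumer GPU.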