Google Cloud has announced a significant performance and cost improvement for running open-source large language models (LLMs) on its C4 machine series, powered by 5th Gen Intel Xeon Scalable processors. In collaboration with Intel and Hugging Face, the company reports a 70% improvement in total cost of ownership (TCO) for open-source LLMs.
The C4 instances leverage Intel's Advanced Matrix Extensions (AMX) to accelerate AI inference workloads, enabling faster processing and lower operational costs. According to Google, customers running models on C4 can achieve throughput gains of up to 2.5x compared to previous generations.
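Because the same binaries run on both AMX-capable and older CPUs, it can be useful to confirm at runtime whether AMX is actually available before assuming the accelerated path is in use. A minimal sketch for a Linux guest, reading the standard CPU feature flags (`amx_tile`, `amx_int8`, `amx_bf16`) from `/proc/cpuinfo` (the helper name and path parameter are illustrative, not part of any announced API):

```python
def amx_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the set of AMX feature flags advertised by the CPU.

    On AMX-capable Xeon parts this typically includes 'amx_tile',
    'amx_int8', and 'amx_bf16'; on other hardware (or non-Linux
    systems) the returned set is empty.
    """
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return set()
    flags = set()
    for line in text.splitlines():
        # Each logical CPU has a "flags : ..." line listing features.
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return {flag for flag in flags if flag.startswith("amx")}

if __name__ == "__main__":
    found = amx_flags()
    print("AMX flags:", sorted(found) if found else "none detected")
```

On a C4 instance the printed set should be non-empty; on a laptop or older VM the same script simply reports that no AMX flags were detected.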
Hugging Face, a leading platform for open-source AI models, validated the performance gains across popular model families, including GPT-2, GPT-J, and BLOOM. Thanks to the collaboration, developers can deploy these models with minimal code changes while benefiting from the optimized hardware.
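As a concrete illustration of the "minimal code changes" claim, a standard Transformers pipeline can be pointed at one of the validated model families and loaded in bfloat16, which lets PyTorch's oneDNN backend route matrix multiplies through AMX on supporting Xeon CPUs. This is a sketch under assumptions (that `torch` and `transformers` are installed; GPT-2 is used here as the smallest of the models named above), not the specific code path benchmarked in the announcement:

```python
import torch
from transformers import pipeline

# Load an open-source model on CPU. torch_dtype=torch.bfloat16 enables
# the bf16 path that AMX accelerates on C4; on hardware without AMX the
# identical code still runs, just without the hardware speedup.
generator = pipeline(
    "text-generation",
    model="gpt2",
    torch_dtype=torch.bfloat16,
    device=-1,  # -1 selects CPU
)

out = generator("Open-source LLMs on Google Cloud C4", max_new_tokens=16)
print(out[0]["generated_text"])
```

The key point is that nothing here is AMX-specific: the same script deployed unchanged on a C4 instance picks up the accelerated kernels automatically.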
This advancement underscores the growing trend of specialized cloud infrastructure for AI workloads, making large-scale AI more accessible and cost-effective for enterprises.