Intel has announced that its Gaudi AI accelerators now support Text Generation Inference (TGI), an open-source toolkit for deploying and serving large language models (LLMs). This integration aims to improve inference performance for LLMs, enabling faster response times and higher throughput in production environments.
The TGI framework, originally developed by Hugging Face, is designed to optimize text generation for LLMs, including models like LLaMA, Falcon, and BLOOM. By leveraging Intel Gaudi's hardware architecture, users can achieve improved latency and efficiency when running inference workloads.
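Once a TGI server is running, clients talk to it over a simple HTTP API. The sketch below shows what a request might look like; the endpoint URL and port are assumptions for a local deployment, and the `generate` route with an `inputs`/`parameters` payload follows TGI's documented request shape.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical local TGI endpoint; adjust host/port for your deployment.
TGI_URL = "http://localhost:8080/generate"


def build_payload(prompt: str, max_new_tokens: int = 64) -> dict:
    """Build a TGI generation request body (inputs + sampling parameters)."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def generate(prompt: str, max_new_tokens: int = 64) -> dict:
    """POST a generation request to a running TGI server and return its JSON reply."""
    req = Request(
        TGI_URL,
        data=json.dumps(build_payload(prompt, max_new_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `generate("What is Gaudi?")` against a live server would return a JSON object containing the generated text; the payload builder is separated out so request construction can be reused or tested without a server.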
Benchmarks shared by Intel demonstrate significant speedups compared to CPU-based serving, making Gaudi a competitive option for AI inference. The company also highlighted that the solution supports popular model formats and can be deployed via Docker containers for ease of use.
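The Docker-based deployment mentioned above might look something like the following. This is a sketch only: the image name and tag, the Habana runtime flag, and the model id are illustrative assumptions, so consult the current TGI-on-Gaudi documentation for the exact values.

```shell
# Illustrative launch of TGI on Gaudi via Docker.
# Image tag, runtime flag, and model id are assumptions -- verify against
# the official tgi-gaudi documentation before use.
docker run -p 8080:80 \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/tgi-gaudi:latest \
  --model-id meta-llama/Llama-2-7b-hf
```

Mounting a host directory at `/data` lets downloaded model weights persist across container restarts, which matters for multi-gigabyte LLM checkpoints.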
This development underscores Intel's push into the AI hardware market, challenging established players like NVIDIA. With TGI support, Gaudi becomes a more attractive option for enterprises looking to deploy LLMs at scale without relying solely on GPUs.