Hugging Face and Intel have announced a partnership aimed at making machine learning hardware acceleration more accessible. Through the Optimum open-source library, the collaboration will focus on optimizing Transformer models for Intel platforms, including Xeon CPUs and Habana Gaudi accelerators.
The goal is to help developers achieve better performance with minimal code changes. Optimum Intel, built on Intel Neural Compressor, provides tools for quantization, pruning, and knowledge distillation. A case study demonstrates quantizing a DistilBERT model for text classification, reducing memory usage while maintaining accuracy.
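The general idea behind post-training quantization can be sketched with plain PyTorch's dynamic quantization. This is a conceptual stand-in, not Optimum Intel's actual API (which builds on Intel Neural Compressor and uses its own quantizer classes), and the small two-layer model here merely stands in for a DistilBERT classification head:

```python
# Conceptual sketch: post-training dynamic quantization with plain PyTorch.
# Optimum Intel exposes a higher-level API on top of Intel Neural Compressor;
# this example only illustrates converting float32 weights to int8
# to reduce memory use while keeping the model's interface unchanged.
import torch
import torch.nn as nn

# A stand-in for a Transformer text-classification head (not DistilBERT itself).
model = nn.Sequential(
    nn.Linear(768, 768),
    nn.ReLU(),
    nn.Linear(768, 2),  # two output classes
)

# Quantize the Linear layers' weights to int8; activations remain float
# and are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
with torch.no_grad():
    logits = quantized(x)
print(logits.shape)  # torch.Size([1, 2])
```

The quantized model keeps the same forward interface, which is what lets tools like Optimum Intel slot into existing inference code with minimal changes.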
“We’re excited to work with Hugging Face to bring the latest innovations of Intel Xeon hardware and Intel AI software to the Transformers community,” said Wei Li, Intel Vice President & General Manager of AI and Analytics.
Earlier collaborations have already yielded strong results, including single-digit millisecond inference latency for DistilBERT on Ice Lake CPUs and up to 40% better price-performance for training with Habana Gaudi. The joint effort aims to ease the adoption of Transformer models in production environments.