Intel and Hugging Face have announced a new collaboration that brings OpenVINO, Intel's toolkit for optimizing and deploying AI models, to the Optimum Intel library. This integration allows developers to easily run inference on Intel hardware and apply quantization to reduce model size and improve performance.
The first release, based on OpenVINO 2022.2, supports a wide range of PyTorch models through the new OVModel classes. Developers can apply post-training static quantization and quantization-aware training to encoder models such as BERT and DistilBERT, with support for more architectures planned in future updates.
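As a rough illustration of what inference through these classes looks like, the sketch below loads a DistilBERT sentiment checkpoint with OVModelForSequenceClassification and runs it through a standard Transformers pipeline. The model id is only an example, and the conversion argument has been renamed across optimum-intel versions (early releases used `from_transformers=True`, later ones `export=True`), so treat this as an approximation rather than the exact code from the announcement.

```python
# pip install optimum[openvino,nncf]   (extras may differ slightly by version)
from transformers import AutoTokenizer, pipeline
from optimum.intel.openvino import OVModelForSequenceClassification

# Any Hub checkpoint works here; this sentiment model is just an example.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# from_transformers=True converts the PyTorch checkpoint to the OpenVINO IR on the fly
# (renamed to export=True in later optimum-intel releases).
model = OVModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The OVModel drops straight into the usual Transformers pipeline API,
# so inference runs on OpenVINO without further code changes.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("OpenVINO makes Intel hardware easy to target."))
```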
To demonstrate the capabilities, the team walked through quantizing a Vision Transformer (ViT) model fine-tuned on the Food101 dataset. The workflow involves installing the dependencies, loading the model and its processor, building a calibration dataset, configuring quantization, and exporting the quantized model to the OpenVINO format. The quantized model can then be loaded and used with standard pipelines for inference, with no significant loss in accuracy.
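A minimal sketch of that workflow, assembled from the steps above around the OVQuantizer and OVConfig classes, is shown below. The Food101 model id is a placeholder for whichever ViT checkpoint was fine-tuned, and helper names, config defaults, and pipeline arguments have shifted between optimum-intel and transformers versions, so check them against the release you install.

```python
# pip install optimum[openvino,nncf] datasets   (extras may differ by version)
from functools import partial

from transformers import AutoFeatureExtractor, AutoModelForImageClassification, pipeline
from optimum.intel.openvino import OVConfig, OVQuantizer, OVModelForImageClassification

# Placeholder id: substitute the ViT checkpoint fine-tuned on Food101.
model_id = "your-username/vit-base-patch16-224-food101"
model = AutoModelForImageClassification.from_pretrained(model_id)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)

def preprocess(examples, feature_extractor):
    # Turn raw Food101 images into the pixel tensors the model expects.
    return feature_extractor(examples["image"], return_tensors="pt")

# Build a small calibration set from Food101 for post-training static quantization.
quantizer = OVQuantizer.from_pretrained(model)
calibration_dataset = quantizer.get_calibration_dataset(
    "food101",
    preprocess_function=partial(preprocess, feature_extractor=feature_extractor),
    num_samples=300,
    dataset_split="train",
)

# The default OVConfig applies 8-bit static quantization; the quantized model
# is exported to the OpenVINO IR format under save_dir.
save_dir = "vit-food101-int8-ov"
quantizer.quantize(
    quantization_config=OVConfig(),
    calibration_dataset=calibration_dataset,
    save_directory=save_dir,
)

# Reload the quantized model and run inference through a regular pipeline.
ov_model = OVModelForImageClassification.from_pretrained(save_dir)
classifier = pipeline("image-classification", model=ov_model, feature_extractor=feature_extractor)
```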
The integration simplifies hardware acceleration for Transformer models, making it accessible through the Hugging Face ecosystem.