DailyGlimpse

Graphcore and Hugging Face Expand IPU-Ready Transformer Models Across Vision, NLP, and Speech

AI
April 26, 2026 · 5:32 PM

Graphcore and Hugging Face have significantly extended the range of machine learning models available in Hugging Face Optimum, an open-source library for optimizing Transformer performance. Developers now have convenient access to a broad set of off-the-shelf Hugging Face Transformer models, each tuned to deliver peak performance on Graphcore's Intelligence Processing Units (IPUs).

The partnership initially brought BERT to the IPU platform; the lineup now spans ten models covering Natural Language Processing (NLP), computer vision, and speech. Each model ships with IPU configuration files and ready-to-use pre-trained and fine-tuned weights, so a checkpoint can be loaded and trained in just a few lines of code (see the sketch below).
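
As a rough sketch of that workflow, the snippet below loads one of the published IPU configurations and fine-tunes a checkpoint with Optimum Graphcore's IPUTrainer, which mirrors the familiar transformers Trainer API. The checkpoint names and the train_dataset variable are illustrative placeholders; consult the Hugging Face Hub for the exact model identifiers.

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from optimum.graphcore import IPUConfig, IPUTrainer, IPUTrainingArguments

    # Each IPU-ready model on the Hub ships with its own IPU configuration file.
    ipu_config = IPUConfig.from_pretrained("Graphcore/roberta-base-ipu")

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained("roberta-base")

    args = IPUTrainingArguments(
        output_dir="./outputs",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    )

    # IPUTrainer compiles and runs the model on IPUs, but otherwise
    # behaves like the standard transformers Trainer.
    trainer = IPUTrainer(
        model=model,
        ipu_config=ipu_config,
        args=args,
        train_dataset=train_dataset,  # placeholder: a tokenized datasets.Dataset
        tokenizer=tokenizer,
    )
    trainer.train()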

Computer Vision

  • ViT (Vision Transformer): A breakthrough in image recognition that applies the Transformer mechanism to images by dividing them into small patches, similar to how words are processed in language systems. Each patch is individually embedded as a token, and the full sequence of patch tokens is then processed jointly by self-attention (a toy sketch of the patch step follows).
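
To make the patch step concrete, here is a toy sketch in plain PyTorch (not Graphcore-specific): an image is cut into fixed-size patches, and each flattened patch is linearly projected into a token embedding, yielding a "sentence" of patches for the Transformer. The sizes are the standard ViT-Base defaults.

    import torch
    import torch.nn as nn

    image = torch.randn(1, 3, 224, 224)  # a batch of one RGB image
    patch_size, embed_dim = 16, 768

    # Unfold height and width into a 14x14 grid of 16x16 patches,
    # then flatten each patch into a single vector.
    patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch_size * patch_size)

    # Linearly project each flattened patch into a token embedding.
    project = nn.Linear(3 * patch_size * patch_size, embed_dim)
    tokens = project(patches)
    print(tokens.shape)  # (1, 196, 768): a sequence of 196 patch tokens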

NLP

  • GPT-2: A text generation model pre-trained on a large corpus of English data in a self-supervised manner. It generates text by predicting the next word in a sentence.
  • RoBERTa: An optimized BERT approach pre-trained with masked language modeling: roughly 15% of the words in a sentence are masked and the model learns to predict them, making it well suited to fine-tuning on downstream tasks (see the masked-prediction sketch after this list).
  • DeBERTa: Enhances BERT with a disentangled attention mechanism and an improved mask decoder, boosting pre-training efficiency and downstream performance.
  • BART: A sequence-to-sequence model with a bidirectional encoder and autoregressive decoder, effective for text generation (summarization, translation) and comprehension tasks.
  • LXMERT: A multimodal Transformer for learning vision and language representations, achieving state-of-the-art results on visual question answering datasets.
  • T5: A unified text-to-text framework that converts all text-based language problems into a text-to-text format, simplifying use across diverse NLP tasks.
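
As an illustration of the masked-prediction objective mentioned for RoBERTa above, the snippet below runs the stock roberta-base checkpoint through the standard transformers fill-mask pipeline on CPU; on IPUs the same checkpoint would be driven through Optimum Graphcore instead.

    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="roberta-base")

    # RoBERTa's mask token is "<mask>"; the model ranks candidate words
    # for the hidden position.
    for prediction in fill_mask("The goal of life is <mask>.")[:3]:
        print(f"{prediction['token_str'].strip()!r}  score={prediction['score']:.3f}")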

Speech

  • HuBERT: A self-supervised speech recognition model that learns combined acoustic and language representations from continuous audio. It matches or surpasses wav2vec 2.0 performance on LibriSpeech benchmarks.
  • Wav2Vec2: A self-supervised model for automatic speech recognition that learns powerful speech representations from unlabeled audio and can then be fine-tuned on small amounts of transcribed speech (a short usage sketch follows this list).
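
For a sense of how these speech checkpoints are used, here is a minimal sketch with the standard transformers automatic-speech-recognition pipeline and the publicly available facebook/wav2vec2-base-960h checkpoint; the audio file name is a placeholder for any 16 kHz mono recording.

    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

    # Transcribe a local audio file (placeholder path); the pipeline
    # decodes and resamples the audio before running the model.
    result = asr("sample.wav")
    print(result["text"])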

Building on a Solid Partnership

Graphcore joined the Hugging Face Hardware Partner Program as a founding member in 2021, aiming to lower barriers for machine intelligence innovators. Since then, the two companies have collaborated to make training Transformer models on IPUs fast and easy. The first Optimum Graphcore model, BERT, was released last year.

Transformers have proven highly effective for tasks such as feature extraction, text generation, sentiment analysis, and translation. Models like BERT are widely used by Graphcore customers in cybersecurity, voice call automation, drug discovery, and translation. Optimizing these models for real-world performance demands significant time and expertise, a burden the expanded, ready-tuned lineup aims to reduce.