SetFit is a new framework for few-shot fine-tuning of Sentence Transformers, developed by Hugging Face in collaboration with Intel Labs and the UKP Lab. Unlike traditional methods that rely on handcrafted prompts or verbalisers, SetFit generates rich embeddings directly from labeled text examples, eliminating the need for prompts entirely.
## How It Works
SetFit operates in two stages. First, it fine-tunes a Sentence Transformer model on a small number of labeled examples (typically 8 or 16 per class) with a contrastive objective: positive pairs are formed from examples of the same class and negative pairs from examples of different classes, teaching the encoder to produce dense embeddings that separate the classes. Second, a classification head (a logistic regression model by default) is trained on these embeddings to predict class labels. At inference time, new examples are passed through the fine-tuned Sentence Transformer to generate embeddings, which the head then classifies.
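To make the pair-generation step concrete, here is a minimal sketch of how positive and negative pairs can be drawn from a handful of labeled examples. This is an illustration, not the library's internal implementation; the function name and sampling details are our own:

```python
import random

def make_contrastive_pairs(texts, labels, num_iterations=20, seed=42):
    """Build (text_a, text_b, score) triples for contrastive fine-tuning:
    score 1.0 for same-class pairs, 0.0 for different-class pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for i, (text, label) in enumerate(zip(texts, labels)):
            positives = [t for j, (t, l) in enumerate(zip(texts, labels))
                         if l == label and j != i]
            negatives = [t for t, l in zip(texts, labels) if l != label]
            if positives:
                pairs.append((text, rng.choice(positives), 1.0))
            if negatives:
                pairs.append((text, rng.choice(negatives), 0.0))
    return pairs

# Even a tiny labeled set yields many training pairs: roughly
# 2 * num_iterations * len(texts) pairs for the contrastive stage.
pairs = make_contrastive_pairs(
    ["great phone", "loved it", "battery died fast", "total waste"],
    [1, 1, 0, 0],
)
```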
SetFit also supports multilingual classification by simply switching to a multilingual Sentence Transformer checkpoint. Experiments show promising results in German, Japanese, Mandarin, French, and Spanish, both in-language and in cross-lingual transfer.
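In practice, this is a one-line change. As a sketch, the checkpoint below is an existing multilingual Sentence Transformers model, but any multilingual encoder on the Hub would work:

```python
from setfit import SetFitModel

# Swap the English backbone for a multilingual one; the rest of the
# training pipeline stays unchanged.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)
```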
## Benchmark Performance
Despite using much smaller models, SetFit matches or exceeds state-of-the-art few-shot methods. On the RAFT benchmark, SetFit with a RoBERTa-Large backbone (355M parameters) outperforms both PET and GPT-3, and is competitive with T-Few, an 11B-parameter model roughly 30 times larger. SetFit even surpasses the human baseline on 7 of the 11 RAFT tasks.
| Rank | Method | Accuracy (%) | Model Size |
|---|---|---|---|
| 2 | T-Few | 75.8 | 11B |
| 4 | Human Baseline | 73.5 | N/A |
| 6 | SetFit (RoBERTa Large) | 71.3 | 355M |
| 9 | PET | 69.6 | 235M |
| 11 | SetFit (MPNet) | 66.9 | 110M |
| 12 | GPT-3 | 62.7 | 175B |
On other datasets, SetFit with only 8 examples per class typically outperforms PERFECT, ADAPET, and vanilla fine-tuned transformers, while achieving comparable results to T-Few 3B despite being 27 times smaller.
## Speed and Cost Efficiency
SetFit is significantly faster and cheaper to train than competing methods. Training SetFit on an NVIDIA V100 with 8 labeled examples takes only 30 seconds and costs $0.025. In contrast, training T-Few 3B on an A100 takes 11 minutes and costs about $0.70, roughly 28 times more. SetFit can even be trained on a single GPU, like those available on Google Colab, or on a CPU in just a few minutes. Inference speed-ups of up to 123x are possible through distillation into a smaller student model.
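As a sketch of what distillation looks like, the snippet below follows the pattern of the `setfit` library's `DistillationSetFitTrainer` at the time of writing (argument names may differ across versions). The teacher checkpoint id is a hypothetical placeholder for a SetFit model you have already fine-tuned, and the unlabeled texts are illustrative; the student is a real, much smaller Sentence Transformers checkpoint:

```python
from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, DistillationSetFitTrainer

# Teacher: a SetFit model you have already fine-tuned
# (the checkpoint id below is a hypothetical placeholder).
teacher_model = SetFitModel.from_pretrained("your-username/your-setfit-teacher")

# Student: a much smaller Sentence Transformer to distill into.
student_model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-MiniLM-L3-v2"
)

# Distillation only needs unlabeled, in-domain text.
unlabeled_ds = Dataset.from_dict(
    {"text": ["an unlabeled sentence", "another unlabeled sentence"]}
)

trainer = DistillationSetFitTrainer(
    teacher_model=teacher_model,
    student_model=student_model,
    train_dataset=unlabeled_ds,
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,
)
trainer.train()
```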
## Getting Started
To train your own SetFit model, first install the `setfit` library:

```bash
pip install setfit
```

Then fine-tune a model with just a few lines of code. The example below trains on 16 examples from the SentEval-CR customer-review dataset:

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# Load the SentEval-CR dataset from the Hugging Face Hub
dataset = load_dataset("SetFit/SentEval-CR")

# Simulate the few-shot regime: 16 examples (~8 per class for this binary task)
train_ds = dataset["train"].shuffle(seed=42).select(range(16))
test_ds = dataset["test"]

# Load a pretrained Sentence Transformer as the SetFit backbone
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    loss_class=CosineSimilarityLoss,  # loss for the contrastive fine-tuning stage
    batch_size=16,
    num_iterations=20,  # text pairs generated per example for contrastive learning
)

# Fine-tune the embeddings and train the classification head
trainer.train()
```
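Once training completes, you can evaluate on the held-out test set and classify new examples by calling the model directly (the sample inputs below are illustrative):

```python
# Evaluate on the test split
metrics = trainer.evaluate()
print(metrics)

# Run inference: the model returns predicted class labels
preds = model(["i loved the spiderman movie!", "pesky pixel problems"])
```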
For more details, see the paper, code, and models on the Hub.