The Hugging Face Transformers library has become a cornerstone of modern NLP, but tuning hyperparameters remains a challenge for many practitioners. A new integration with Ray Tune simplifies this process, offering advanced optimization algorithms that can significantly improve model accuracy while reducing costs.
In a recent benchmark, fine-tuning a BERT model on the RTE dataset showed that Population-Based Training (PBT) achieved 70.5% test accuracy in just 48 GPU minutes ($2.45), outperforming grid search (65.4% accuracy, 45 min, $2.30) and Bayesian optimization with early stopping (66.9% accuracy, 104 min, $5.30).
Getting Started
To use Ray Tune with Transformers, install the required packages:
pip install "ray[tune]" transformers datasets scipy sklearn torch
Here’s a minimal example fine-tuning DistilBERT on the GLUE MRPC dataset:
from datasets import load_dataset, load_metric
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
dataset = load_dataset('glue', 'mrpc')
metric = load_metric('glue', 'mrpc')

def encode(examples):
    outputs = tokenizer(
        examples['sentence1'], examples['sentence2'], truncation=True)
    return outputs

encoded_dataset = dataset.map(encode, batched=True)

def model_init():
    return AutoModelForSequenceClassification.from_pretrained(
        'distilbert-base-uncased', return_dict=True)

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(axis=-1)
    return metric.compute(predictions=predictions, references=labels)

training_args = TrainingArguments(
    "test", evaluation_strategy="steps", eval_steps=500, disable_tqdm=True)

trainer = Trainer(
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    model_init=model_init,
    compute_metrics=compute_metrics,
)

# Default objective is the sum of all metrics when metrics are provided
trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=10,
)
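You can also override the default objective and search space. Here is a minimal sketch, assuming you want to maximize validation accuracy over a hand-defined search space; the parameter ranges and the my_hp_space/my_objective names are illustrative, not part of the library:

from ray import tune

# Illustrative search space; keys must match TrainingArguments fields.
def my_hp_space(trial):
    return {
        "learning_rate": tune.loguniform(1e-5, 5e-5),
        "num_train_epochs": tune.choice([2, 3, 4]),
        "per_device_train_batch_size": tune.choice([8, 16, 32]),
    }

# Maximize validation accuracy instead of the default sum of metrics
# (assumes compute_metrics reports an "eval_accuracy" key, as above).
def my_objective(metrics):
    return metrics["eval_accuracy"]

trainer.hyperparameter_search(
    hp_space=my_hp_space,
    compute_objective=my_objective,
    direction="maximize",
    backend="ray",
    n_trials=10,
)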
By default, each trial uses 1 CPU (and optionally 1 GPU). You can allocate more resources via resources_per_trial.
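For example, to give each trial a full GPU and more CPUs (a sketch; with the "ray" backend, extra keyword arguments are forwarded to Ray Tune):

# Allocate 4 CPUs and 1 GPU to every trial.
trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=10,
    resources_per_trial={"cpu": 4, "gpu": 1},
)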
Advanced Algorithms
You can easily swap in other search algorithms and schedulers, such as HyperBand, Bayesian optimization, or Population-Based Training. For example, to use HyperOpt with the ASHA scheduler:
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.schedulers import ASHAScheduler
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    search_alg=HyperOptSearch(metric="objective", mode="max"),
    scheduler=ASHAScheduler(metric="objective", mode="max"),
)
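Population-Based Training, the algorithm behind the benchmark numbers above, is used the same way via a scheduler. A rough sketch follows; the mutation candidates and perturbation interval are illustrative assumptions, not the exact benchmark settings:

from ray.tune.schedulers import PopulationBasedTraining

pbt_scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="objective",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        # Illustrative mutation candidates, not the benchmark's settings.
        "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
        "per_device_train_batch_size": [16, 32, 64],
    },
)

trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=8,
    scheduler=pbt_scheduler,
)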
Ray Tune also integrates seamlessly with Weights and Biases for experiment tracking.
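For example, a sketch assuming a Ray version that ships the Weights & Biases callback and that wandb is installed and authenticated; the project name is illustrative:

from ray.tune.integration.wandb import WandbLoggerCallback

trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=10,
    # Logs each trial's config and metrics to the given W&B project.
    callbacks=[WandbLoggerCallback(project="transformers-ray-tune")],
)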
Try It Today
- Install the latest releases: pip install -U ray and pip install -U transformers datasets
- Check the Hugging Face documentation and discussion thread
- Explore the end-to-end example notebook
For more, see the Transformers + GLUE + Ray Tune example, the Weights and Biases report, and the simplest way to serve your NLP model.