The Hugging Face Transformers library has become a cornerstone of modern NLP, but tuning hyperparameters remains a challenge for many practitioners. A new integration with Ray Tune simplifies this process, offering advanced optimization algorithms that can significantly improve model accuracy while reducing costs.
In a recent benchmark, fine-tuning a BERT model on the RTE dataset showed that Population-Based Training (PBT) achieved 70.5% test accuracy in just 48 GPU minutes ($2.45), outperforming grid search (65.4% accuracy, 45 min, $2.30) and Bayesian optimization with early stopping (66.9% accuracy, 104 min, $5.30).
Getting Started
To use Ray Tune with Transformers, install the required packages:
pip install "ray[tune]" transformers datasets scipy sklearn torch
Here’s a minimal example fine-tuning DistilBERT on the GLUE MRPC dataset:
from datasets import load_dataset, load_metric
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
dataset = load_dataset('glue', 'mrpc')
metric = load_metric('glue', 'mrpc')

def encode(examples):
    outputs = tokenizer(
        examples['sentence1'], examples['sentence2'], truncation=True)
    return outputs

encoded_dataset = dataset.map(encode, batched=True)

def model_init():
    return AutoModelForSequenceClassification.from_pretrained(
        'distilbert-base-uncased', return_dict=True)

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(axis=-1)
    return metric.compute(predictions=predictions, references=labels)

training_args = TrainingArguments(
    "test", evaluation_strategy="steps", eval_steps=500, disable_tqdm=True)

trainer = Trainer(
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    model_init=model_init,
    compute_metrics=compute_metrics,
)

# Default objective is the sum of all metrics when metrics are provided
trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=10,
)
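You can also override the default objective and search space. Here is a minimal sketch, assuming you want to maximize validation accuracy over a hand-defined search space; the parameter ranges and the my_hp_space/my_objective names are illustrative, not part of the library:

from ray import tune

# Illustrative search space; keys must match TrainingArguments fields.
def my_hp_space(trial):
    return {
        "learning_rate": tune.loguniform(1e-5, 5e-5),
        "num_train_epochs": tune.choice([2, 3, 4]),
        "per_device_train_batch_size": tune.choice([8, 16, 32]),
    }

# Maximize validation accuracy instead of the default sum of metrics
# (assumes compute_metrics reports an "eval_accuracy" key, as above).
def my_objective(metrics):
    return metrics["eval_accuracy"]

trainer.hyperparameter_search(
    hp_space=my_hp_space,
    compute_objective=my_objective,
    direction="maximize",
    backend="ray",
    n_trials=10,
)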
By default, each trial uses 1 CPU (and optionally 1 GPU). You can allocate more resources via resources_per_trial.
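For example, to give each trial a full GPU and more CPUs (a sketch; with the "ray" backend, extra keyword arguments are forwarded to Ray Tune):

# Allocate 4 CPUs and 1 GPU to every trial.
trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=10,
    resources_per_trial={"cpu": 4, "gpu": 1},
)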
Advanced Algorithms
You can easily swap in other search algorithms and schedulers, such as HyperBand, Bayesian optimization, or Population-Based Training. For example, to use HyperOpt with the ASHA scheduler:
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.schedulers import ASHAScheduler
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    search_alg=HyperOptSearch(metric="objective", mode="max"),
    scheduler=ASHAScheduler(metric="objective", mode="max"),
)
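Population-Based Training, the algorithm behind the benchmark numbers above, is used the same way via a scheduler. A rough sketch follows; the mutation candidates and perturbation interval are illustrative assumptions, not the exact benchmark settings:

from ray.tune.schedulers import PopulationBasedTraining

pbt_scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="objective",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        # Illustrative mutation candidates, not the benchmark's settings.
        "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
        "per_device_train_batch_size": [16, 32, 64],
    },
)

trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=8,
    scheduler=pbt_scheduler,
)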
Ray Tune also integrates seamlessly with Weights and Biases for experiment tracking.
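For example, a sketch assuming a Ray version that ships the Weights & Biases callback and that wandb is installed and authenticated; the project name is illustrative:

from ray.tune.integration.wandb import WandbLoggerCallback

trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=10,
    # Logs each trial's config and metrics to the given W&B project.
    callbacks=[WandbLoggerCallback(project="transformers-ray-tune")],
)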
Try It Today
- Install the latest releases: pip install -U ray and pip install -U transformers datasets
- Check the Hugging Face documentation and discussion thread
- Explore the end-to-end example notebook
For more, see the Transformers + GLUE + Ray Tune example, the Weights and Biases report, and the simplest way to serve your NLP model.