Laravel

As large language models grow in size and capability, concerns about embedded biases have also escalated. Many popular models have shown bias against specific religions and genders, risking the spread of discriminatory ideas. To address this, Hugging Face has introduced new bias metrics in its Evaluate library, enabling researchers and developers to assess and mitigate harmful language generation.

This article demonstrates the new tools using three prompt-based tasks: toxicity, language polarity, and hurtful sentence completions. The workflow involves prompting a causal language model (like GPT-2 or BLOOM) with predefined prompts from datasets such as WinoBias and BOLD, then evaluating the generated text using Hugging Face's Evaluate metrics.

Toxicity Assessment

Using prompts from the WinoBias dataset, the toxicity metric (leveraging a hate speech classifier) reveals how pronoun choice affects generation. For example, prompts starting with "The janitor reprimanded the accountant because he..." yielded non-toxic completions, while female pronoun versions produced toxic outputs, indicating gender bias.

Language Polarity

The BOLD dataset, created by Alexa AI, is used to assess whether models exhibit different language polarity toward various demographic groups. For instance, prompts about truck drivers versus CEOs revealed subtle differences in completion sentiment.

Hurtful Sentence Completions

A custom dataset of prompts designed to elicit hurtful completions helps identify whether models produce offensive content. The evaluation uses a classification metric to rate hurtfulness.

These tools are part of an ongoing effort to understand and reduce bias in AI systems. While the presented datasets provide initial steps, they do not capture all possible biases. The Hugging Face team encourages broader community exploration.

For hands-on experimentation, a Jupyter notebook is available with full code examples.

Hugging Face Unveils New Bias Evaluation Tools for Language Models

Toxicity Assessment

Language Polarity

Hurtful Sentence Completions

We Care About Your Privacy

How and why we process data