In a previous article, we covered the theoretical foundations of machine learning on graphs. Now, we dive into practical implementation: how to perform graph classification using the Hugging Face Transformers library. This tutorial focuses on Microsoft's Graphormer, currently the only graph transformer model available in Transformers, and walks through data loading, preprocessing, model setup, and training.
Requirements
You'll need datasets and transformers (version >= 4.27.2). Install them with:
pip install -U datasets transformers
Data
You can use your own graph datasets or those available on the Hugging Face Hub. We'll use the ogbg-molhiv dataset from the Open Graph Benchmark.
Loading
Loading a graph dataset from the Hub is straightforward:
from datasets import load_dataset
dataset = load_dataset("OGB/ogbg-molhiv")
dataset = dataset.shuffle(seed=0)
This dataset includes train, validation, and test splits, each containing columns like edge_index, edge_attr, y, num_nodes, and node_feat.
You can visualize graphs using libraries like NetworkX and matplotlib:
import networkx as nx
import matplotlib.pyplot as plt

# Take the first graph of the training split
graph = dataset["train"][0]
edges = graph["edge_index"]  # two parallel lists: source and target node indices
num_nodes = graph["num_nodes"]

# Build an undirected NetworkX graph from the edge index
G = nx.Graph()
G.add_nodes_from(range(num_nodes))
G.add_edges_from(zip(edges[0], edges[1]))

nx.draw(G)
plt.show()
Format
Graph datasets on the Hub are stored as lists of graphs in JSONL format. Each graph is a dictionary with:
- edge_index: two parallel lists of integers (source and target node indices) representing the edges.
- num_nodes: integer, total number of nodes in the graph (nodes are assumed to be numbered sequentially from 0).
- y: list of labels, one per graph (integers for classification, floats for regression, lists for multi-task classification).
- node_feat (optional): list of lists of integers, one feature vector per node.
- edge_attr (optional): list of lists of integers, one attribute vector per edge.
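To make the format concrete, here is a hypothetical toy graph in this structure (illustrative values only, not an actual ogbg-molhiv entry), together with the consistency constraints the fields imply:

```python
# A hypothetical 3-node graph in the Hub's graph format
# (toy values for illustration; not a real ogbg-molhiv record).
graph = {
    # Two parallel lists: edge i goes from edge_index[0][i] to edge_index[1][i].
    "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]],
    "num_nodes": 3,                      # nodes are numbered 0..num_nodes-1
    "y": [1],                            # one classification label for the whole graph
    "node_feat": [[6], [8], [6]],        # one integer feature vector per node
    "edge_attr": [[1], [1], [2], [2]],   # one integer attribute vector per edge
}

# Basic consistency checks implied by the format:
assert len(graph["edge_index"][0]) == len(graph["edge_index"][1])
assert len(graph["node_feat"]) == graph["num_nodes"]
assert len(graph["edge_attr"]) == len(graph["edge_index"][0])
assert all(0 <= v < graph["num_nodes"]
           for side in graph["edge_index"] for v in side)
```

Note that undirected edges are stored in both directions in edge_index, as in the example above.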
Preprocessing
Graphormer requires specific preprocessing to generate features like degree information and shortest path matrices. Use:
from transformers.models.graphormer.collating_graphormer import preprocess_item, GraphormerDataCollator
dataset_processed = dataset.map(preprocess_item, batched=False)
Alternatively, for large datasets, you can skip the map step and let the data collator preprocess each batch on the fly (GraphormerDataCollator accepts an on_the_fly_processing flag for this).
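To give an intuition for what this preprocessing computes, Graphormer's structural encodings include node degrees and all-pairs shortest-path distances between nodes. The following is an illustrative, dependency-free sketch of those two quantities, not the actual preprocess_item implementation:

```python
from collections import deque

def shortest_path_matrix(num_nodes, edge_index):
    """All-pairs shortest-path distances via BFS from every node.

    Illustrates the spatial encoding Graphormer relies on; unreachable
    pairs are marked -1. A sketch, not the real preprocess_item.
    """
    # Build adjacency lists from the two parallel edge lists.
    adj = [[] for _ in range(num_nodes)]
    for src, dst in zip(edge_index[0], edge_index[1]):
        adj[src].append(dst)

    dist = [[-1] * num_nodes for _ in range(num_nodes)]
    for start in range(num_nodes):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start][v] == -1:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist

# Path graph 0 - 1 - 2, with each undirected edge stored in both directions:
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
print(shortest_path_matrix(3, edge_index))  # → [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

# In-degree of each node, read directly off the target list:
print([edge_index[1].count(node) for node in range(3)])  # → [1, 2, 1]
```

The real preprocessing additionally encodes edge features along these shortest paths and packs everything into the tensors the model expects.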
Model
Loading
Load a pretrained Graphormer model and fine-tune it for your downstream task. For binary classification, set num_classes=2:
from transformers import GraphormerForGraphClassification
model = GraphormerForGraphClassification.from_pretrained(
    "clefourrier/pcqm4mv2_graphormer_base",
    num_classes=2,  # binary classification head
    ignore_mismatched_sizes=True,  # the pretrained head has a different output size
)
You can also create a randomly initialized model from scratch.
Training
Use the Trainer class with a TrainingArguments configuration and an evaluation metric, then call trainer.train(). For the complete training loop, check the full notebook linked in the original article.
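On the metric side, ogbg-molhiv is conventionally evaluated with ROC-AUC. In practice you would likely use sklearn.metrics.roc_auc_score or the official OGB evaluator inside your compute_metrics function; as a dependency-free illustration, ROC-AUC can be computed as a rank statistic, the probability that a random positive example scores higher than a random negative one:

```python
def roc_auc(labels, scores):
    """ROC-AUC as the rank statistic P(score_pos > score_neg),
    counting ties as 0.5. O(n_pos * n_neg): fine as a sketch,
    too slow for large evaluation sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# A perfect ranking yields an AUC of 1.0:
print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # → 1.0
```

Wrapped in a compute_metrics(eval_pred) function that extracts labels and predicted scores from eval_pred, a metric like this plugs directly into the Trainer.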
Ending Note
This tutorial demonstrated graph classification using Graphormer in Hugging Face Transformers. With the Hub's datasets and pre-trained models, you can quickly adapt this workflow to your own graph classification tasks.