In this tutorial, we explore the lambda/hermes-agent-reasoning-traces dataset from Hugging Face to understand how agent-based models think, use tools, and generate responses across multi-turn conversations. We start by loading and inspecting the dataset, examining its structure, categories, and conversational format to get a clear idea of the available information. We then build simple parsers to extract key components such as reasoning traces, tool calls, and tool responses, allowing us to separate internal thinking from external actions. We also analyze patterns such as tool usage frequency, conversation length, and error rates to better understand agent behavior. We create visualizations to highlight these trends and make the analysis more intuitive. Finally, we prepare the dataset for training by converting it into a model-friendly format, making it suitable for tasks like supervised fine-tuning.
First, we install all required libraries and import the necessary modules to set up our environment. We then load the dataset and inspect its structure, fields, and categories. We also optionally combine multiple dataset configurations and examine a sample to understand the conversational format.
# Install and import
!pip -q install -U datasets pandas matplotlib seaborn transformers accelerate trl
import json, re, random, textwrap
from collections import Counter, defaultdict
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datasets import load_dataset, concatenate_datasets
random.seed(0)
CONFIG = "kimi"
ds = load_dataset("lambda/hermes-agent-reasoning-traces", CONFIG, split="train")
print(ds)
print("Config:", CONFIG, "| Fields:", ds.column_names)
print("Categories:", sorted(set(ds["category"])))
We then define regex-based parsers to extract reasoning traces, tool calls, and tool responses from the dataset. We process assistant messages to separate thoughts, actions, and final outputs in a structured way. Testing the parser on a sample conversation verifies that the extraction works correctly.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
TOOL_RESP_RE = re.compile(r"<tool_response>\s*(.*?)\s*</tool_response>", re.DOTALL)
We perform dataset-wide analytics to measure tool usage, conversation lengths, and error patterns. We aggregate statistics across multiple samples to understand overall agent behavior. We also create visualizations to highlight trends such as tool frequency, parallel calls, and category distribution.
N = 3000
sub = ds.select(range(min(N, len(ds))))  # sample a subset for faster analytics
# Assumes ShareGPT-style records with a "conversations" list of {"from", "value"} turns
turns_per_traj = [len(ex["conversations"]) for ex in sub]
tool_calls_per_traj = [sum(m["value"].count("<tool_call>") for m in ex["conversations"]) for ex in sub]
print("median turns:", int(np.median(turns_per_traj)), "| mean tool calls/traj:", round(float(np.mean(tool_calls_per_traj)), 2))
After analyzing the data, we render a sample trace to visualize the reasoning process step by step. This helps in understanding how the agent thinks and uses tools.
def render_trace(ex, max_chars=350):
    # Assumes ShareGPT-style records: {"conversations": [{"from": ..., "value": ...}]}
    for msg in ex["conversations"]:
        print(f"\n=== {msg['from'].upper()} ===")
        print(textwrap.shorten(msg["value"], width=max_chars, placeholder=" ..."))

turns_per_traj = [len(ex["conversations"]) for ex in sub]    # messages per trajectory
idx = int(np.argmin(np.abs(np.array(turns_per_traj) - 10)))  # pick the trace closest to 10 turns
render_trace(sub[idx])
Finally, we prepare the dataset for supervised fine-tuning by converting conversations into a format suitable for training a language model. This involves creating input-output pairs that map user queries to assistant responses, including reasoning traces and tool calls.
def format_for_sft(ex):
    # ShareGPT-style turns assumed; the "<|role|>" tags are a simple convention for illustration
    return {"text": "\n".join(f"<|{m['from']}|>\n{m['value']}" for m in ex["conversations"])}
sft_ds = ds.map(format_for_sft, remove_columns=ds.column_names)
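An alternative, if training with TRL's SFTTrainer, is to convert each trajectory into OpenAI-style `{"role", "content"}` messages so the model's own chat template handles the formatting. The sketch below is one way to do that; the `role_map` and the assumption of ShareGPT-style `from`/`value` keys should be verified against an actual sample row.

```python
def to_chat_messages(ex):
    # Hypothetical mapping from ShareGPT-style speaker tags to OpenAI-style roles.
    role_map = {"system": "system", "human": "user", "gpt": "assistant", "tool": "tool"}
    return {"messages": [{"role": role_map.get(m["from"], m["from"]), "content": m["value"]}
                         for m in ex["conversations"]]}

# Toy record, invented for the demo.
example = {"conversations": [{"from": "human", "value": "Hi"},
                             {"from": "gpt", "value": "Hello!"}]}
print(to_chat_messages(example)["messages"][0])  # {'role': 'user', 'content': 'Hi'}
```

Applied with `ds.map(to_chat_messages, remove_columns=ds.column_names)`, the resulting `messages` column matches the conversational format SFTTrainer accepts.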
This tutorial provides a complete pipeline for parsing, analyzing, and visualizing agent reasoning traces, and for preparing them for supervised fine-tuning, enabling researchers and developers to build better AI agents.