What You'll Learn
- The basic idea of dynamic adversarial data collection and why it's important.
- How to collect adversarial data dynamically and train your model on them, using an MNIST digit recognition task as an example.
Dynamic Adversarial Data Collection (DADC)
Static benchmarks are widely used to evaluate model performance but come with major drawbacks: they saturate, contain biases, and often lead researchers to chase incremental metric gains instead of building trustworthy models. Dynamic Adversarial Data Collection (DADC) offers a promising alternative. In DADC, humans create examples specifically designed to fool state-of-the-art (SOTA) models. This approach provides two key benefits:
- It allows you to gauge your model's true robustness.
- It produces valuable data that can be used to train even stronger models.
The cycle of fooling and retraining is repeated over multiple rounds, yielding a more robust model that aligns with human expectations.
Training Your Model Dynamically Using Adversarial Data
Here, we'll walk through dynamically collecting adversarial data from users and training your model on them—using the classic MNIST handwritten digit recognition task.
In MNIST, the model predicts the digit (0–9) shown in a 28x28 grayscale image. While it's easy to achieve high accuracy on the standard test set, SOTA models often fail on handwritten digits that fall outside the static test set. Human-in-the-loop adversarial samples can help the model generalize better.
The process is divided into four steps:
- Configuring your model
- Interacting with your model
- Flagging your model
- Putting it all together
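Before diving into each step, the overall flow can be sketched as a loop. The snippet below is purely illustrative: the stand-in `predict` deterministically maps an example to a digit in place of a real model, and the "drawings" are just strings.

```python
def predict(model, example):
    # Stand-in for a trained model's prediction (illustrative only):
    # deterministically maps an example to a digit.
    return len(example) % 10

def collect_round(model, attempts, true_labels):
    """One DADC round: keep only the examples that fool the current model."""
    flagged = []
    for example, label in zip(attempts, true_labels):
        if predict(model, example) != label:
            flagged.append((example, label))  # model was fooled
    return flagged

# Simulated user attempts for one round
attempts = [f"drawing_{i}" for i in range(20)]
labels = [i % 10 for i in range(20)]
flagged = collect_round(model=None, attempts=attempts, true_labels=labels)
# `flagged` would then be used to retrain the model before the next round.
```

Each round retrains on the newly flagged examples, so later rounds face a model that has already seen the earlier adversarial data.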
Configuring Your Model
Define your model architecture. The example below uses two convolutional layers, two fully connected layers, and a log-softmax output:
```python
import torch.nn as nn
import torch.nn.functional as F

class MNIST_Model(nn.Module):
    def __init__(self):
        super(MNIST_Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)  # flatten: 20 channels x 4 x 4 = 320 features
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)  # log-probabilities over the 10 digits
```
Then train the model on the standard MNIST dataset.
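A minimal training loop might look like the following. For brevity it trains on random tensors shaped like MNIST batches rather than the real dataset; swap in `torchvision.datasets.MNIST` with a `DataLoader` for actual training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Same architecture as defined above
class MNIST_Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        return F.log_softmax(self.fc2(x), dim=1)

torch.manual_seed(0)
model = MNIST_Model()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

model.train()
for step in range(3):  # a few steps on random stand-in batches
    images = torch.randn(8, 1, 28, 28)    # fake batch of 28x28 images
    targets = torch.randint(0, 10, (8,))  # fake digit labels
    optimizer.zero_grad()
    loss = F.nll_loss(model(images), targets)  # NLL pairs with log_softmax
    loss.backward()
    optimizer.step()
```

The hyperparameters (learning rate, momentum, batch size) here are illustrative defaults, not tuned values.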
Interacting with Your Model
With your trained model, you need a way for users to provide adversarial examples. Using Gradio and Hugging Face Spaces, you can build an interactive demo where users draw digits on a canvas and the model attempts to classify them. Users can then try to fool the model with unusual handwriting or digit placement.
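One way to wire this up is sketched below. The prediction function is real PyTorch; the Gradio wiring at the bottom is an assumption, since component names and the exact array the sketchpad returns vary across Gradio versions (you may need to resize the drawing to 28x28). A tiny untrained linear model stands in for the trained `MNIST_Model`.

```python
import numpy as np
import torch

# Stand-in for the trained MNIST_Model from the previous section.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 10),
    torch.nn.LogSoftmax(dim=1),
)

def predict(sketch):
    """Turn a 28x28 grayscale drawing (values 0-255) into model input and
    return per-digit confidences, as expected by a Gradio Label output."""
    img = np.asarray(sketch, dtype=np.float32) / 255.0
    x = torch.from_numpy(img).reshape(1, 1, 28, 28)
    with torch.no_grad():
        log_probs = model(x)
    probs = log_probs.exp().squeeze(0)
    return {str(d): float(probs[d]) for d in range(10)}

if __name__ == "__main__":
    import gradio as gr
    # Component spec is version-dependent; this assumes a sketchpad input
    # that yields a 28x28 grayscale array.
    demo = gr.Interface(fn=predict, inputs="sketchpad", outputs="label")
    demo.launch()
```

Hosting the demo on a Hugging Face Space makes it easy for many users to attempt adversarial drawings.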
Flagging Your Model
When a user successfully fools the model, you can flag that example. Flagging involves:
- Saving the adversarial example to a dataset.
- Retraining the model after collecting a threshold number of samples.
- Repeating the process multiple times.
Gradio provides built-in flagging callbacks, and you can also write a custom flagging function. Over successive rounds, the flagged data becomes increasingly challenging, driving further model improvement.
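A custom flagging helper might look like the following sketch. The directory name, file layout, and threshold are assumptions for illustration, not part of Gradio's API; for plain logging, Gradio's built-in `CSVLogger` callback covers the simple case.

```python
import csv
import os

FLAG_DIR = "flagged"      # illustrative location for adversarial examples
RETRAIN_THRESHOLD = 100   # retrain once this many samples are collected

def flag_example(image_path, true_label, predicted_label, log_file=None):
    """Append a fooling example to a CSV log; return True once enough
    samples have accumulated to trigger retraining."""
    log_file = log_file or os.path.join(FLAG_DIR, "adversarial.csv")
    os.makedirs(os.path.dirname(log_file), exist_ok=True)
    is_new = not os.path.exists(log_file)
    with open(log_file, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["image_path", "true_label", "predicted_label"])
        writer.writerow([image_path, true_label, predicted_label])
    with open(log_file) as f:
        count = sum(1 for _ in f) - 1  # subtract the header row
    return count >= RETRAIN_THRESHOLD
```

When `flag_example` returns `True`, you would kick off a retraining job on the accumulated CSV and then reset or archive the log for the next round.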
Putting It All Together
Combine all components into a single demo Space. The MNIST Adversarial Space on Hugging Face enables real-time DADC: users draw digits, the model predicts, and when wrong, the example is flagged and later used for retraining.
Conclusion
Dynamic Adversarial Data Collection is gaining traction as a method to gather diverse, non-saturating datasets that improve model evaluation and performance. By involving humans in the loop, you create models that generalize better and remain robust against real-world variations.