Despite increasing competition from PyTorch and JAX, TensorFlow remains the most-used deep learning framework, tightly integrated with Keras and tf.data. At Hugging Face, our philosophy is to embrace these tools rather than fight them. This article explores the principles behind our TensorFlow support.
Introduction
TensorFlow differs from other libraries in key ways, especially its tight integration with Keras and tf.data. Engineers arriving from PyTorch sometimes try to bypass these APIs, but that's a mistake: Keras is a powerful high-level API, and pushing it aside means reinventing the wheel. If you're a TensorFlow engineer, you should be able to use cutting-edge models with the tools you already know. Here's our approach.
Interlude: 30 Seconds to Hugging Face
If you're new to Hugging Face Transformers, here's the core idea: load a pretrained model by name in one line. For example, TFAutoModel.from_pretrained("bert-base-cased") gives you BERT's architecture and pretrained weights. To add a task-specific head on top, use a class like TFAutoModelForSequenceClassification. This enables transfer learning: a model pretrained on vast amounts of data can solve new problems with far fewer labeled examples.
Preprocessing must match the original training: use AutoTokenizer to load the exact tokenizer a checkpoint was trained with, so new inputs are tokenized the same way the pretraining data was.
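Putting those two pieces together, a minimal sketch looks like this (the two-label head and the example sentence are illustrative, not from a real task):

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# The tokenizer must come from the same checkpoint as the model, so text
# is split into tokens exactly the way it was during pretraining.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Same checkpoint, with a freshly initialized 2-class head on top.
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

inputs = tokenizer(["Hugging Face is great!"], return_tensors="tf")
logits = model(inputs).logits
print(logits.shape)  # one row per input, one column per label
```

From here, the untrained head can be fine-tuned on a small labeled dataset while the pretrained body does most of the work.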
Philosophy #1: All TensorFlow models should be Keras Model objects
Every Hugging Face TensorFlow model subclasses tf.keras.Model, ensuring compatibility with Keras APIs like fit(), save(), and the Model class utilities. This allows seamless integration with the broader Keras ecosystem.
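Concretely, a loaded model behaves like any other Keras model, so the standard introspection and training utilities apply directly — a quick sketch:

```python
import tensorflow as tf
from transformers import TFAutoModel

model = TFAutoModel.from_pretrained("bert-base-cased")

# Because it subclasses tf.keras.Model, the usual Keras API just works:
assert isinstance(model, tf.keras.Model)
model.summary()                  # inspect layers and parameter counts
model.compile(optimizer="adam")  # ready for model.fit(...)
```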
Philosophy #2: Loss functions are provided by default, but can be easily changed
Models come with default loss functions appropriate for their task, but you can override them by passing a custom loss to compile(). This flexibility ensures you can adapt models to your exact needs.
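In practice that means compile() can be called with no loss at all, falling back to the model's built-in task loss, or with any Keras loss you prefer — a sketch (the 3-label setup and learning rate are illustrative):

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=3
)

# Option 1: omit the loss and use the model's built-in task loss.
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5))

# Option 2: override it with any standard Keras loss.
model.compile(
    optimizer=tf.keras.optimizers.Adam(3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```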
Philosophy #3: Labels are flexible
Our models accept labels in multiple formats (e.g., integers for classification, token IDs for language modeling). We handle the conversion internally, so you can focus on your data.
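For example, a sequence classification model accepts plain integer class labels passed alongside the tokenized inputs, and returns the matching loss together with the logits — a sketch (the 2-class setup and example sentences are illustrative):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

batch = tokenizer(["great movie", "terrible movie"],
                  padding=True, return_tensors="tf")

# Plain integer class labels are accepted directly; the model computes
# the appropriate loss internally and returns it alongside the logits.
outputs = model(batch, labels=tf.constant([1, 0]))
print(float(outputs.loss))
```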
Philosophy #4: You shouldn’t have to write your own data pipeline, especially for common tasks
For inference, the pipeline() API wraps preprocessing and model inference behind a single call. For training, TensorFlow models expose prepare_tf_dataset(), which converts a Hugging Face dataset into a batched, correctly padded tf.data.Dataset ready to pass to fit(); our example scripts show how to load and preprocess datasets efficiently with tf.data.
Philosophy #5: XLA is great!
We encourage using XLA (Accelerated Linear Algebra) via tf.function(jit_compile=True) to speed up training and inference. Our models are XLA-compatible, and we provide guidance on enabling it.
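Enabling XLA is a one-line change on any TensorFlow function — a minimal, self-contained sketch with a toy dense layer (the shapes are arbitrary):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # ask TensorFlow to compile this with XLA
def dense_forward(x, w, b):
    # A simple fused computation: matmul + bias + ReLU.
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal((8, 128))
w = tf.random.normal((128, 64))
b = tf.zeros((64,))

# The first call triggers XLA compilation; later calls with the same
# input shapes reuse the compiled program.
y = dense_forward(x, w, b)
print(y.shape)  # (8, 64)
```

The same decorator pattern applies to whole training steps or generation loops; note that XLA recompiles when input shapes change, so fixed or padded shapes benefit most.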
Philosophy #6: Deployment is just as important as training
Hugging Face models can be exported to TensorFlow SavedModel format for deployment with TensorFlow Serving, TFLite, or TF.js. We also support conversion to ONNX.
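Exporting is a single call — a sketch, assuming you want a SavedModel on disk (the output directory name is illustrative):

```python
from transformers import TFAutoModel

model = TFAutoModel.from_pretrained("bert-base-cased")

# Write a TensorFlow SavedModel that TensorFlow Serving and the
# TFLite / TF.js converters can consume, in addition to the usual
# Hugging Face checkpoint files.
model.save_pretrained("bert_export", saved_model=True)
```

The resulting directory can then be pointed at directly by TensorFlow Serving, or fed to a converter for on-device deployment.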
Conclusion: Community is everything
As an open-source project, we rely on community contributions. We welcome feedback and pull requests to improve TensorFlow support. By following these philosophies, we aim to make TensorFlow development as smooth and powerful as possible.