DailyGlimpse

Deploying TensorFlow Vision Models via Hugging Face with TF Serving

AI
April 26, 2026 · 5:27 PM

The Hugging Face team and external contributors have recently expanded the Transformers library with a diverse set of TensorFlow vision models, including Vision Transformer (ViT), Masked Autoencoders, RegNet, and ConvNeXt. This article demonstrates how to deploy a Vision Transformer model for image classification locally using TensorFlow Serving (TF Serving), which provides both REST and gRPC endpoints.

Saving the Model

All TensorFlow models in Transformers include a save_pretrained() method that can serialize weights in the SavedModel format required by TF Serving. For example, loading and saving a ViT model:

from transformers import TFViTForImageClassification

temp_model_dir = "vit"
ckpt = "google/vit-base-patch16-224"
model = TFViTForImageClassification.from_pretrained(ckpt)
# Writes the SavedModel to vit/saved_model/1 alongside the usual weight files.
model.save_pretrained(temp_model_dir, saved_model=True)

Inspecting the serving signature (for example with saved_model_cli show --dir vit/saved_model/1 --tag_set serve --signature_def serving_default) reveals that the model expects a 4-d pixel_values input of shape (batch_size, num_channels, height, width) and outputs logits of shape (-1, 1000).

Model Surgery

To reduce cognitive load and training-serving skew, it's beneficial to embed preprocessing and postprocessing steps directly into the model graph.

Preprocessing

Essential preprocessing for ViT includes resizing to 224x224, scaling pixel values to [0, 1], normalizing to [-1, 1], and transposing to channel-first (NCHW) format. The following functions handle these operations:

import tensorflow as tf
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained(ckpt)
SIZE = processor.size["height"]  # 224 for this checkpoint
CONCRETE_INPUT = "pixel_values"  # input name expected by the model's signature

def normalize_img(img, mean=processor.image_mean, std=processor.image_std):
    # Scale to [0, 1], then normalize with the processor's mean and std.
    img = img / 255.0
    mean = tf.constant(mean)
    std = tf.constant(std)
    return (img - mean) / std

def preprocess(string_input):
    # The input string must be base64-encoded with the web-safe alphabet.
    decoded_input = tf.io.decode_base64(string_input)
    decoded = tf.io.decode_jpeg(decoded_input, channels=3)
    resized = tf.image.resize(decoded, size=(SIZE, SIZE))
    normalized = normalize_img(resized)
    # Transpose to channel-first (C, H, W), as the model expects.
    normalized = tf.transpose(normalized, (2, 0, 1))
    return normalized

@tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
def preprocess_fn(string_input):
    decoded_images = tf.map_fn(
        preprocess, string_input, fn_output_signature=tf.float32
    )
    return {CONCRETE_INPUT: decoded_images}
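As a quick sanity check, the preprocessing pipeline can be exercised end-to-end on a synthetic image. This is a condensed, self-contained sketch: the hard-coded SIZE and normalization constants stand in for the processor's values (ViT's processor uses mean = std = 0.5 per channel), and the synthetic JPEG stands in for a real client image.

```python
import tensorflow as tf

SIZE = 224
# Stand-ins for processor.image_mean / processor.image_std (0.5 for ViT).
MEAN = tf.constant([0.5, 0.5, 0.5])
STD = tf.constant([0.5, 0.5, 0.5])

def preprocess(string_input):
    # Decode web-safe base64, decode JPEG, resize, normalize, go channel-first.
    img = tf.io.decode_jpeg(tf.io.decode_base64(string_input), channels=3)
    img = tf.image.resize(img, (SIZE, SIZE)) / 255.0
    img = (img - MEAN) / STD
    return tf.transpose(img, (2, 0, 1))

# Build a tiny synthetic JPEG and base64-encode it with the web-safe alphabet.
jpeg = tf.io.encode_jpeg(tf.zeros([8, 8, 3], dtype=tf.uint8))
b64 = tf.io.encode_base64(jpeg)

batch = tf.map_fn(preprocess, tf.expand_dims(b64, 0), fn_output_signature=tf.float32)
print(batch.shape)  # (1, 3, 224, 224)
```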

Accepting base64-encoded compressed images keeps request payloads far smaller than sending raw float tensors over REST or gRPC. Note that tf.io.decode_base64 expects the web-safe base64 alphabet, so clients should encode images with, for example, Python's base64.urlsafe_b64encode.

Postprocessing and Model Export

Postprocessing maps logits to class labels (e.g., ImageNet-1k). To embed both preprocessing and postprocessing, you can create a new model that wraps the original:

class TFServingViT(tf.keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model
        # Map ImageNet-1k class indices to human-readable label strings.
        self.labels = tf.constant(
            [model.config.id2label[i] for i in range(len(model.config.id2label))]
        )

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def call(self, string_input):
        # Preprocess the base64 string input to pixel values
        pixel_values = preprocess_fn(string_input)[CONCRETE_INPUT]
        # Run the model
        logits = self.model(pixel_values).logits
        # Postprocess: apply softmax and take the top-5 labels and scores
        probabilities = tf.nn.softmax(logits, axis=-1)
        values, indices = tf.math.top_k(probabilities, k=5)
        return {"labels": tf.gather(self.labels, indices), "scores": values}

Then, export this wrapper model under a numbered version subdirectory, which TF Serving uses for model versioning:

serving_model = TFServingViT(model)
tf.saved_model.save(serving_model, "tf_serving_vit/1")

Deployment with TensorFlow Serving

Pull the TensorFlow Serving Docker image and start a server that points at the exported model directory (port 8501 serves REST, port 8500 serves gRPC):

docker run -p 8500:8500 -p 8501:8501 --name tf_serving_vit \
  --mount type=bind,source=$(pwd)/tf_serving_vit,target=/models/vit \
  -e MODEL_NAME=vit -t tensorflow/serving

Querying the REST Endpoint

Send a POST request containing a web-safe base64-encoded image as a plain string instance; the exported signature decodes it in-graph, so it must not be wrapped in the {"b64": ...} binary convention:

curl -X POST http://localhost:8501/v1/models/vit:predict \
  -d '{"instances": ["<base64_image>"]}'
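The same request can be sent from Python using only the standard library. This is a sketch that assumes the server above is running locally; the image path beagle.jpg is a stand-in for a real file.

```python
import base64
import json
from urllib import request as urllib_request

REST_URL = "http://localhost:8501/v1/models/vit:predict"

def build_payload(image_bytes: bytes) -> str:
    # tf.io.decode_base64 in the serving graph expects the web-safe alphabet.
    b64str = base64.urlsafe_b64encode(image_bytes).decode("utf-8")
    return json.dumps({"signature_name": "serving_default", "instances": [b64str]})

def predict(image_path: str) -> dict:
    with open(image_path, "rb") as f:
        payload = build_payload(f.read())
    req = urllib_request.Request(
        REST_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib_request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (assumes the server above is running and beagle.jpg exists):
# predict("beagle.jpg")
```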

Querying the gRPC Endpoint

Using the TensorFlow Serving gRPC API, you can send prediction requests in a similar manner.

Wrapping Up

This approach allows you to deploy Hugging Face TensorFlow vision models with all necessary processing baked in, making them easy to consume via standard endpoints. For the complete code, refer to the accompanying Colab notebook.

Additional References