Training machine learning models has become quite simple, especially with the rise of pre-trained models and transfer learning. Deploying models in production, however, often requires jumping through hoops: packaging in a container, provisioning infrastructure, creating a prediction API, securing, scaling, and monitoring—all taking valuable time away from actual machine learning work.
Hugging Face Inference Endpoints aim to fix this problem. They let you deploy models directly from the Hugging Face hub to managed infrastructure on your favorite cloud in just a few clicks—simple, secure, and scalable.
Deploying a Model on Inference Endpoints
In this example, we deploy a Swin image classification model fine-tuned on the food101 dataset using AutoTrain. Starting from the model page, click Deploy and select Inference Endpoints. This takes you to the endpoint creation page, where you can choose the model revision, a GPU instance, the cloud region, autoscaling settings, and even a custom container.
Next, choose who can access your endpoint from three options:
- Public: Accessible to anyone on the internet without authentication.
- Protected: Requires an organization token for access.
- Private: Only accessible within your AWS account via a VPC Endpoint using AWS PrivateLink.
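These steps use the web UI, but if you prefer to script endpoint creation, recent versions of the huggingface_hub library also expose a create_inference_endpoint helper. The sketch below is a minimal example rather than the exact configuration from this walkthrough: the endpoint name, model repository, and instance values are illustrative assumptions, so check the huggingface_hub documentation for the options available in your version.

# Minimal sketch: create a protected endpoint programmatically.
# Assumes huggingface_hub is installed and you are authenticated
# (e.g. via `huggingface-cli login`). The name, repository, and
# instance values below are illustrative placeholders.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "swin-food101-demo",               # hypothetical endpoint name
    repository="my-org/swin-food101",  # hypothetical model repo
    framework="pytorch",
    task="image-classification",
    accelerator="gpu",
    vendor="aws",
    region="eu-west-1",
    type="protected",                  # "public", "protected", or "private"
    instance_size="x1",
    instance_type="nvidia-a10g",
)

# Block until the endpoint is deployed, then print its URL.
endpoint.wait()
print(endpoint.url)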
Deploying a Protected Inference Endpoint
Select Protected and click Create Endpoint. After a few minutes, the endpoint is running and its URL is visible. You can test it immediately via the inference widget by uploading an image, or invoke it with Python code using your Hugging Face API token.
import requests

API_URL = "https://oncm9ojdmjwesag2.eu-west-1.aws.endpoints.huggingface.cloud"
headers = {
    "Authorization": "Bearer MY_API_TOKEN",
    # "image/jpeg" is the standard MIME type for JPEG images
    "Content-Type": "image/jpeg",
}

def query(filename):
    # Send the raw image bytes to the endpoint and parse the JSON response
    with open(filename, "rb") as f:
        data = f.read()
    response = requests.post(API_URL, headers=headers, data=data)
    return response.json()

output = query("food.jpg")
print(output)
The predicted result matches the widget output. The Analytics and Logs tabs provide metrics and detailed logs for debugging.
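You can also check on an endpoint from code rather than the UI: huggingface_hub includes a get_inference_endpoint helper. The short sketch below assumes the hypothetical endpoint name from the earlier creation sketch.

# Sketch: inspect a running endpoint programmatically.
# "swin-food101-demo" is the hypothetical name from the creation sketch.
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("swin-food101-demo")
print(endpoint.status)  # e.g. "running"
print(endpoint.url)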
Deploying a Private Inference Endpoint
Select Private and enter your AWS account ID. After creation, the UI displays a VPC service name (e.g., com.amazonaws.vpce.eu-west-1.vpce-svc-07a49a19a427abad7). Then, in the AWS console, create a VPC endpoint by entering the service name, selecting the VPC and subnets, and attaching a Security Group—following the Inference Endpoints documentation.
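If you manage AWS resources with code, the same VPC endpoint can be created with boto3 instead of the console. This is a minimal sketch under assumed IDs: the VPC, subnet, and security group values are placeholders you would replace with your own, while the service name is the one shown in the Inference Endpoints UI.

# Sketch: create the interface VPC endpoint for AWS PrivateLink with boto3.
# All resource IDs below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",             # your VPC
    ServiceName="com.amazonaws.vpce.eu-west-1.vpce-svc-07a49a19a427abad7",
    SubnetIds=["subnet-0123456789abcdef0"],    # your subnets
    SecurityGroupIds=["sg-0123456789abcdef0"], # your security group
)
print(response["VpcEndpoint"]["VpcEndpointId"])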
Once set up, your private endpoint is secure and accessible only from your specified VPC.
Now It's Your Turn!
Inference Endpoints make model deployment straightforward. Try it yourself on the Hugging Face hub and let us know what you build.