DailyGlimpse

Fetch Slashes ML Inference Latency by 50% with Amazon SageMaker and Hugging Face

AI
April 26, 2026 · 4:43 PM
Fetch, a consumer engagement and rewards app, has cut its machine learning inference latency by 50% after migrating its ML pipeline to Amazon SageMaker and adopting Hugging Face models. The company processes more than 80 million receipts a week and needed to improve the speed and accuracy of its receipt-processing models to keep up.

By using Amazon SageMaker for training and inference, along with Hugging Face AWS Deep Learning Containers, Fetch reduced latency by half and improved document understanding accuracy by 200%. The project took 12 months and involved deploying more than five ML models.

"Amazon SageMaker is a game changer for Fetch. We use almost every feature extensively. As new features come out, they are immediately valuable." – Sam Corzine, ML Engineer at Fetch

The company now plans to expand ML to new use cases, leveraging SageMaker's scalability and ease of deployment.