The Hugging Face datasets library, traditionally used for text data, now offers robust support for images. This blog post demonstrates how to combine datasets with Faiss and sentence-transformers to create a powerful image search application.
We start with the "Digitised Books - Images identified as Embellishments" dataset from the British Library, which contains historical images extracted from OCR output. Using datasets, we load and process these images efficiently, then compute embeddings with a sentence-transformers model (e.g., CLIP). These embeddings are indexed with Faiss for fast similarity search.
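To make the role of the index concrete, here is a toy sketch of what a similarity search over embeddings computes. It is not the post's code: the three-dimensional vectors and file names are made up for illustration, and the brute-force pure-Python ranking stands in for Faiss, which does this kind of nearest-neighbour lookup efficiently at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings, k=2):
    """Return the names of the k embeddings most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Made-up 3-dimensional "embeddings"; real CLIP vectors have hundreds of dimensions.
toy_embeddings = {
    "floral_border.jpg": [0.9, 0.1, 0.0],
    "ship_engraving.jpg": [0.1, 0.9, 0.2],
    "decorated_initial.jpg": [0.8, 0.3, 0.1],
}
toy_query = [1.0, 0.2, 0.0]  # hypothetical embedding of a text query

print(nearest(toy_query, toy_embeddings))
```

Because CLIP-style models place text and images in the same vector space, the query vector can come from a sentence while the stored vectors come from images.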
The workflow includes:
- Installing required libraries: datasets, pillow, sentence-transformers, faiss, and rich.
- Creating a dataset with the Image feature, which accepts file paths, byte data, or PIL images.
- Adding a Faiss index to the dataset for efficient retrieval.
- Running text queries (e.g., "a cat sitting on a chair") to find the images in the collection that best match the description.
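The steps above can be sketched roughly as follows. Treat this as a minimal sketch under stated assumptions, not the post's exact code: the folder-based loading, the helper names (`list_image_paths`, `build_image_search`, `search`), and the `clip-ViT-B-32` model name are choices made for this example, and the install line assumes the CPU build of Faiss.

```python
# Assumed install command (faiss is typically published on PyPI as faiss-cpu or faiss-gpu):
#   pip install datasets pillow sentence-transformers faiss-cpu rich
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_image_paths(folder):
    """Return sorted paths of image files under `folder` (hypothetical helper)."""
    return sorted(
        str(p) for p in Path(folder).rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES
    )


def build_image_search(folder, model_name="clip-ViT-B-32"):
    """Embed every image under `folder` and attach a Faiss index to the dataset."""
    # Heavy dependencies are imported lazily so this module loads without them.
    from datasets import Dataset, Image
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Start from file paths; casting to Image() makes rows decode to PIL images.
    ds = Dataset.from_dict({"image": list_image_paths(folder)})
    ds = ds.cast_column("image", Image())
    # In batched mode, batch["image"] is a list of PIL images.
    ds = ds.map(
        lambda batch: {"embeddings": model.encode(batch["image"])},
        batched=True,
        batch_size=32,
    )
    ds.add_faiss_index(column="embeddings")
    return ds, model


def search(ds, model, query, k=5):
    """Return (scores, examples) for the k indexed images nearest to a text query."""
    query_embedding = model.encode(query)  # vector in the same space as the images
    return ds.get_nearest_examples("embeddings", query_embedding, k=k)


# Example usage (requires the libraries above and a local folder of images):
#   ds, model = build_image_search("images/")
#   scores, examples = search(ds, model, "a cat sitting on a chair")
#   examples["image"] then holds the retrieved PIL images.
```

Storing the embeddings as a dataset column keeps each vector next to its image, so a search returns the matching rows directly rather than bare index positions.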
The final result is a functional image search demo that can be deployed as a Hugging Face Space. This approach showcases the versatility of datasets beyond text and opens up possibilities for other multimedia search applications.