Meta AI's fastText library is now officially integrated with the Hugging Face Hub, making word vectors for 157 languages and a language identification model easily accessible. fastText is an open-source library known for efficient text representation and classification using techniques like bag of words, subword information, and hierarchical softmax, and its pre-trained models can now be downloaded with a few lines of code.
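To make "subword information" concrete, here is a minimal sketch of character n-gram extraction, the idea behind fastText's subword units. The function name char_ngrams and the 3-to-6 length range are assumptions made for illustration; the library's own implementation differs in detail (for example, it also keeps the whole word as a unit and hashes n-grams into buckets).

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in boundary markers.

    Illustrative only: '<' and '>' mark the start and end of the word so
    that prefixes and suffixes get n-grams distinct from word-internal ones.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


trigrams = char_ngrams("where", min_n=3, max_n=3)
print(trigrams)
```

Because a word is represented through its n-grams, rare or unseen words can still share units with words seen during training.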
Users can find pre-trained English word vectors and the language identification model under the Meta AI organization on Hugging Face. The integration also includes widgets for text classification and feature extraction, allowing hands-on testing directly on the model pages.
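To load a pre-trained word-vector model (with the fasttext and huggingface_hub packages installed):
import fasttext
from huggingface_hub import hf_hub_download

# Download the model file from the Hub (cached locally after the first call)
model_path = hf_hub_download(repo_id="facebook/fasttext-en-vectors", filename="model.bin")
# Load the binary model into memory
model = fasttext.load_model(model_path)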
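For nearest-neighbor queries, reusing the imports above:
# Download and load the nearest-neighbor model
model_path = hf_hub_download(repo_id="facebook/fasttext-en-nearest-neighbors", filename="model.bin")
model = fasttext.load_model(model_path)
# Returns the k closest words as (similarity, word) pairs
model.get_nearest_neighbors("bread", k=5)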
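Conceptually, a nearest-neighbor query ranks every other word by the cosine similarity of its vector to the query word's vector. The sketch below shows that idea on a tiny hand-made vocabulary so it runs without downloading a model; the names cosine, nearest_neighbors, and toy_vectors, and the three-dimensional vectors themselves, are made up for illustration and are not part of fastText.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(vectors, query, k=5):
    """Return up to k (similarity, word) pairs, most similar first."""
    query_vec = vectors[query]
    scored = [
        (cosine(query_vec, vec), word)
        for word, vec in vectors.items()
        if word != query  # skip the query word itself
    ]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real fastText vectors have hundreds of dimensions
toy_vectors = {
    "bread": [0.9, 0.1, 0.0],
    "butter": [0.8, 0.2, 0.1],
    "flour": [0.7, 0.3, 0.0],
    "car": [0.0, 0.1, 0.9],
}

neighbors = nearest_neighbors(toy_vectors, "bread", k=2)
print(neighbors)
```

The (similarity, word) ordering of each pair mirrors the shape of the list returned by get_nearest_neighbors.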
Language detection is also straightforward:
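# Download and load the language identification model
model_path = hf_hub_download(repo_id="facebook/fasttext-language-identification", filename="model.bin")
model = fasttext.load_model(model_path)
# Returns a tuple of (labels, probabilities) for the most likely language
model.predict("Hello, world!")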
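The predicted labels come back as prefixed strings, so a small helper is useful for turning them into language and script codes. The sketch below assumes labels of the form __label__eng_Latn (fastText's default label prefix followed by a language code and a script code); the helper name parse_label and the sample labels and scores are hypothetical, standing in for real model output.

```python
def parse_label(label, prefix="__label__"):
    """Split a label such as '__label__eng_Latn' into (language, script)."""
    code = label[len(prefix):] if label.startswith(prefix) else label
    language, _, script = code.partition("_")
    return language, script


# Hypothetical values shaped like the (labels, probabilities) tuple
# returned by model.predict(text, k=2); not real model output
labels = ("__label__eng_Latn", "__label__fra_Latn")
scores = [0.81, 0.05]

results = [(parse_label(label), score) for label, score in zip(labels, scores)]
for (language, script), score in results:
    print(language, script, score)
```

Passing k to predict requests more than one candidate language, which helps when the input is short or ambiguous.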
This integration, built on the huggingface_hub library, opens the door for other libraries to easily join the Hub ecosystem.