MusicGen, a powerful music generation model from Meta, can be deployed as an API using Hugging Face's Inference Endpoints. This guide walks through creating a custom handler to serve the model, enabling text-to-music generation in just a few steps.
MusicGen takes a text prompt and optionally a melody to generate audio. While transformers pipelines handle many models out of the box, MusicGen requires a custom inference function. Hugging Face Inference Endpoints support such custom handlers, allowing deployment of any model.
Step-by-Step Deployment
1. Duplicate the MusicGen repository – use the repository duplicator to copy facebook/musicgen-large to your account.
2. Add custom files – upload handler.py and requirements.txt to the duplicated repo.
   - handler.py defines an EndpointHandler class that loads the model and processor, then processes incoming requests. It uses half-precision (FP16) for efficiency.
   - requirements.txt lists dependencies such as transformers==4.31.0 and accelerate>=0.20.3.
3. Create an Inference Endpoint – on the Inference Endpoints page, select your repo, choose a GPU instance with at least 16 GB of RAM, and deploy. The endpoint will then be ready to accept requests.
Querying the Endpoint
Use curl or the huggingface-hub library to send prompts:
curl URL_OF_ENDPOINT \
-X POST \
-d '{"inputs":"happy folk song, cheerful and lively"}' \
-H "Authorization: Bearer {YOUR_TOKEN}" \
-H "Content-Type: application/json"
The response contains generated audio as a list of floats.
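For Python clients, the same request can be sketched with the requests library, and the returned floats written out as a playable file. The endpoint URL, token, and response shape below are placeholders; the 32 kHz sampling rate matches MusicGen's audio codec, but verify it against your handler's actual output.

```python
import numpy as np
import requests
import scipy.io.wavfile

ENDPOINT_URL = "URL_OF_ENDPOINT"  # placeholder: your endpoint's URL
API_TOKEN = "YOUR_TOKEN"          # placeholder: your Hugging Face access token


def generate_music(prompt: str):
    """Send a text prompt to the endpoint and return the parsed JSON response."""
    response = requests.post(
        ENDPOINT_URL,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        json={"inputs": prompt},
    )
    response.raise_for_status()
    return response.json()


def save_wav(floats, path="musicgen_out.wav", rate=32000):
    """Write a list of floats to a WAV file; MusicGen generates 32 kHz audio."""
    audio = np.asarray(floats, dtype=np.float32).squeeze()
    scipy.io.wavfile.write(path, rate=rate, data=audio)


# Example usage (requires a live endpoint):
# floats = generate_music("happy folk song, cheerful and lively")
# save_wav(floats)
```

Calling response.raise_for_status() surfaces authentication or cold-start errors immediately instead of failing later when the float list is parsed.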
Conclusion
With Inference Endpoints, deploying custom models like MusicGen is straightforward. This approach can be extended to any model without a built-in pipeline. MusicGen's capabilities open doors for creative applications in music generation.
For more details, explore the full guide on the Hugging Face blog.