Hugging Face has released a suite of tools enabling developers to run large language models (LLMs) like Llama 2 directly on Apple devices using Core ML. The new open-source package, swift-transformers, provides a transformers-like API in Swift, along with tokenization, model wrappers, and generation algorithms. A companion app, swift-chat, demonstrates the integration.
Alongside the Swift package, Hugging Face has updated its Core ML conversion tools, including exporters and a no-code converter (transformers-to-coreml), making it easier to convert models from PyTorch to Core ML. Pre-converted models for Llama 2 and Falcon are available on the Hugging Face Hub.
The initiative aims to lower barriers for iOS and macOS developers who want to incorporate AI capabilities into their apps. By providing building blocks for common tasks such as tokenization and autoregressive generation, it lets the community focus on user experience rather than the underlying machine learning complexity.
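To make "autoregressive generation" concrete, here is a minimal sketch of greedy decoding, the simplest such algorithm: the model scores every candidate next token, the highest-scoring one is appended, and the loop repeats. It is written in Python for brevity and does not reflect the swift-transformers API; `greedy_generate` and `toy_model` are hypothetical names, and the toy model is a stand-in for a real language model.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Extend prompt_ids one token at a time, always picking the top-scoring token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Score every vocabulary entry given everything generated so far.
        logits = next_token_logits(ids)
        # Greedy choice: index of the largest logit.
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        # Stop early once the end-of-sequence token is produced.
        if next_id == eos_id:
            break
    return ids


def toy_model(ids: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: always favours (last token + 1) mod 5."""
    vocab_size = 5
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    # Starting from token 0 with token 4 as end-of-sequence.
    print(greedy_generate(toy_model, [0], max_new_tokens=10, eos_id=4))
    # prints [0, 1, 2, 3, 4]
```

In a real app, the prompt would first be converted to token IDs by a tokenizer, and the generated IDs decoded back to text. Sampling strategies such as top-k replace the greedy choice with a random draw from the highest-scoring tokens.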
Running models through Core ML lets them execute efficiently on Apple Silicon, where the framework can use the Neural Engine for faster inference. Because processing happens on the device, user data can stay local, which benefits privacy. The release is in alpha, and Hugging Face invites Swift developers to contribute feedback and improvements.