The Technology Innovation Institute (TII) has released Falcon 180B, the largest openly available language model to date. With 180 billion parameters and trained on 3.5 trillion tokens, it outperforms previous open models like Llama 2 and rivals proprietary systems such as PaLM-2.
Falcon 180B builds on the architecture of its predecessor, Falcon 40B, including innovations such as multi-query attention, which shares a single key/value head across all query heads to improve inference scalability. The model was trained on up to 4,096 GPUs simultaneously, for a total of around 7 million GPU hours. Its training data is primarily web data from the RefinedWeb dataset, with curated sources such as technical papers and code making up the remainder.
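To see why multi-query attention helps at scale, consider this minimal PyTorch sketch (an illustration, not Falcon's actual implementation): every query head attends over a single shared key/value head, so the key/value cache during decoding is `num_heads` times smaller than in standard multi-head attention.

```python
import torch
import torch.nn.functional as F

# Minimal multi-query attention sketch (illustrative only).
# A single K/V head is shared across all query heads, shrinking the
# KV cache by a factor of num_heads vs. multi-head attention.
# The causal mask is omitted for brevity.
def multi_query_attention(x, wq, wk, wv, num_heads):
    batch, seq, dim = x.shape
    head_dim = dim // num_heads
    q = (x @ wq).view(batch, seq, num_heads, head_dim).transpose(1, 2)  # (B, H, S, D)
    k = (x @ wk).view(batch, seq, 1, head_dim).transpose(1, 2)          # (B, 1, S, D)
    v = (x @ wv).view(batch, seq, 1, head_dim).transpose(1, 2)          # (B, 1, S, D)
    # Broadcasting shares the single K/V head across all H query heads.
    scores = q @ k.transpose(-2, -1) / head_dim**0.5                    # (B, H, S, S)
    out = F.softmax(scores, dim=-1) @ v                                 # (B, H, S, D)
    return out.transpose(1, 2).reshape(batch, seq, dim)

# Shape check: Q gets a full projection, but K and V project to one head each.
dim, num_heads = 64, 8
x = torch.randn(2, 10, dim)
wq = torch.randn(dim, dim)
wk = torch.randn(dim, dim // num_heads)
wv = torch.randn(dim, dim // num_heads)
print(multi_query_attention(x, wq, wk, wv, num_heads).shape)  # torch.Size([2, 10, 64])
```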
On benchmarks, Falcon 180B scores 67.85 on the Open LLM Leaderboard, tying Llama 2 70B and surpassing GPT-3.5 on MMLU. It remains behind GPT-4 but sets a new standard for open models. Community fine-tuning is expected to further boost performance.
Falcon 180B is available in base and chat versions on Hugging Face. However, its license permits commercial use only under restrictive conditions and excludes "hosting use" (offering the model to third parties, e.g. through a managed API), so users should review the license carefully before deploying it.
Hardware requirements are steep: FP16 inference demands roughly 640 GB of GPU memory (e.g., 8× A100 80 GB), while full fine-tuning requires about 5 TB. Loading the model with 4-bit quantization cuts inference memory to roughly 320 GB.
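Assuming a multi-GPU node with enough aggregate memory, a 4-bit load might look like the following sketch using Transformers with bitsandbytes (the `tiiuae/falcon-180B` repo is gated, so you must accept the license on the Hub first):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-180B"  # gated repo: accept the license on Hugging Face first

# 4-bit quantization via bitsandbytes roughly halves memory vs. FP16
# (about 320 GB of weights, sharded across all visible GPUs).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # requires accelerate; spreads layers across GPUs
)
```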
To use Falcon 180B with Transformers, install version 4.33 or later. The chat model follows a simple prompt format:
```
System: Add an optional system prompt here
User: This is the user input
Falcon: This is what the model generates
```
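As a sketch, this format can be wired into a standard `generate` call. It assumes `model` and `tokenizer` were loaded as in the quantization snippet above, but pointing at the chat variant `tiiuae/falcon-180B-chat`; the prompt contents and sampling parameters are illustrative.

```python
# Build a prompt in the chat format shown above and generate a reply.
prompt = (
    "System: You are a helpful assistant.\n"
    "User: Explain multi-query attention in one sentence.\n"
    "Falcon:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```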
You can also interact with Falcon 180B right away in the official demo Space on Hugging Face.