MMS • Anthony Alford
The BigScience research workshop released the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM), an autoregressive language model based on the GPT-3 architecture. BLOOM was trained on data from 46 natural languages and 13 programming languages and is the largest publicly available open multilingual model.
The release was announced on the BigScience blog. The model was trained for nearly four months on a cluster of 416 A100 80GB GPUs. The training process was live-tweeted, and the training logs were publicly viewable via TensorBoard throughout. The model was trained on a 1.6TB multilingual dataset containing 350B tokens; for almost all of the languages in the dataset, BLOOM is the first AI language model with more than 100B parameters. BigScience is still running evaluation experiments on the model, but preliminary results show that BLOOM's zero-shot performance on a wide range of natural language processing (NLP) tasks is comparable to that of similar models. According to the BigScience team:
This is only the beginning. BLOOM’s capabilities will continue to improve as the workshop continues to experiment and tinker with the model…. All of the experiments researchers and practitioners have always wanted to run, starting with the power of a 100+ billion parameter model, are now possible. BLOOM is the seed of a living family of models that we intend to grow, not just a one-and-done model, and we’re ready to support community efforts to expand it.
Large language models (LLMs), especially autoregressive decoder-only models such as GPT-3 and PaLM, have been shown to perform as well as the average human on many NLP benchmarks. Although some research organizations, such as EleutherAI, have made their trained model weights available, most commercial models are either completely inaccessible to the public or gated behind an API. This lack of access makes it difficult for researchers to investigate the causes of known problem areas in model behavior, such as toxicity and bias.
The BigScience workshop began in May 2021, with over 1,000 researchers collaborating to build a large, multilingual deep-learning model. The collaboration included members of two key organizations: the Institute for Development and Resources in Intensive Scientific Computing (IDRIS) and Grand Equipement National De Calcul Intensif (GENCI). These organizations gave the workshop access to the 28 PFLOPS Jean Zay supercomputer. To train the model, the team created a fork of the Megatron-DeepSpeed codebase, which combines three dimensions of parallelism (data, tensor, and pipeline) to achieve a training throughput of up to 150 TFLOPs per GPU. According to NVIDIA, this is “the highest throughput one can achieve with A100 80GB GPUs.” Training the final BLOOM model took 117 days.
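To put these training figures in perspective, the following is a rough back-of-the-envelope sketch, not a calculation published by BigScience. It assumes that the 150 TFLOPs figure is per GPU, that the A100's peak dense BF16 throughput is 312 TFLOPS (NVIDIA's published specification, not a number from this article), and that the 350B-token dataset was processed once over the full 117 days. The function names are illustrative only.

```python
# Back-of-the-envelope arithmetic on the training figures quoted above.
# Assumptions (not stated in the article): 150 TFLOPs is per GPU, the
# A100 peak is 312 TFLOPS for dense BF16, and training made a single
# pass over the 350B tokens during the 117 days.

ACHIEVED_TFLOPS_PER_GPU = 150.0  # from the article
A100_PEAK_BF16_TFLOPS = 312.0    # assumption: NVIDIA's published peak
DATASET_TOKENS = 350e9           # from the article
TRAINING_DAYS = 117              # from the article


def hardware_utilization(achieved_tflops, peak_tflops):
    """Fraction of the theoretical peak that training actually reached."""
    return achieved_tflops / peak_tflops


def tokens_per_second(total_tokens, days):
    """Average token throughput if the tokens were spread evenly over the run."""
    seconds = days * 24 * 60 * 60
    return total_tokens / seconds


if __name__ == "__main__":
    util = hardware_utilization(ACHIEVED_TFLOPS_PER_GPU, A100_PEAK_BF16_TFLOPS)
    rate = tokens_per_second(DATASET_TOKENS, TRAINING_DAYS)
    print(f"Approximate hardware utilization: {util:.0%}")
    print(f"Approximate average throughput: {rate:,.0f} tokens/s")
```

Under these assumptions, the run reached a little under half of the hardware's theoretical peak and averaged tens of thousands of tokens per second across the cluster.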
Thomas Wolf, co-founder and CSO of Hugging Face, joined a Twitter thread discussing BLOOM and answered several users’ questions. When asked what compute resources were needed to run the model locally, Wolf replied:
Right now, 8*80GB A100 or 16*40GB A100 [GPUs]. With the “accelerate” library you have offloading though so as long as you have enough RAM or even just disk for 300GB you’re good to go (but slower).
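The two GPU configurations Wolf mentions provide the same total memory, which a short sketch can make explicit. The parameter count (176B) and the 16-bit weight format used below are assumptions that do not appear in this article, so the weight-size estimate is only a rough cross-check against the roughly 300GB he cites for offloading; the function names are illustrative.

```python
# Rough memory arithmetic for the hardware configurations quoted above.
# Assumptions (not stated in the article): BLOOM has about 176B
# parameters and its weights are stored in a 16-bit format (2 bytes each).

ASSUMED_PARAMS = 176e9        # assumption: commonly reported BLOOM size
ASSUMED_BYTES_PER_PARAM = 2   # assumption: 16-bit weights
BYTES_PER_GB = 1e9


def total_gpu_memory_gb(num_gpus, gb_per_gpu):
    """Aggregate GPU memory across identical devices."""
    return num_gpus * gb_per_gpu


def weight_footprint_gb(num_params, bytes_per_param):
    """Storage needed for the weights alone (no activations or caches)."""
    return num_params * bytes_per_param / BYTES_PER_GB


if __name__ == "__main__":
    print("8 x 80GB A100:", total_gpu_memory_gb(8, 80), "GB")
    print("16 x 40GB A100:", total_gpu_memory_gb(16, 40), "GB")
    weights = weight_footprint_gb(ASSUMED_PARAMS, ASSUMED_BYTES_PER_PARAM)
    print(f"Estimated weights: {weights:.0f} GB")
```

Both configurations total 640GB of GPU memory. The weights-only estimate is well below that, and the remainder would be available for activations and other inference state.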
Although BLOOM is currently the largest open multilingual model, other research groups have released similar LLMs. InfoQ recently reported on Meta’s OPT-175B, a 175B-parameter AI language model also trained using Megatron-LM. Earlier this year, EleutherAI open-sourced its 20B-parameter model, GPT-NeoX-20B. Last year, InfoQ also reported on BigScience’s 11B-parameter T0 model.