Anthony Alford
Article originally posted on InfoQ.
Stability AI released two sets of pre-trained model weights for StableLM, a suite of large language models (LLMs). The models are trained on 1.5 trillion text tokens and are licensed for commercial use under CC BY-SA-4.0.
The released models contain 3B and 7B parameters respectively, with larger models soon to come. The new suite of models grew out of Stability’s previous collaboration with EleutherAI; the training dataset is an updated version of EleutherAI’s The Pile, with three times the data used to train EleutherAI’s models. The release also includes versions of the StableLM models that have been fine-tuned on instruction-following and chat datasets, including Stanford’s Alpaca dataset. The fine-tuned models are licensed for non-commercial use only, due to Alpaca’s licensing requirements. According to Stability AI,
With the launch of the StableLM suite of models, Stability AI is continuing to make foundational AI technology accessible to all. Our StableLM models can generate text and code and will power a range of downstream applications. They demonstrate how small and efficient models can deliver high performance with appropriate training…Language models will form the backbone of our digital economy, and we want everyone to have a voice in their design. Models like StableLM demonstrate our commitment to AI technology that is transparent, accessible, and supportive.
The success of generative LLMs such as OpenAI’s GPT-3 spurred the development of smaller open-source models with similar capabilities. In 2022, InfoQ covered the release of EleutherAI’s GPT-NeoX-20B, an open-source 20B parameter LLM; more recently, InfoQ covered Meta’s 7B parameter LLaMA LLM. OpenAI’s release of ChatGPT showed that LLM performance could be improved by fine-tuning models on “instruction-following” datasets, which led to the release of similar models such as Stanford’s Alpaca, a fine-tuned version of LLaMA.
Although only the 3B and 7B parameter StableLM models have been released, Stability AI says that models with 15B, 30B, and 65B parameters are in progress, and a 175B parameter model is planned. The company also says they will be crowd-sourcing an open-source dataset for fine-tuning chatbot assistants, to further the efforts of projects such as OpenAssistant. While Stability AI did not announce any benchmark performance data for the models, they claim “surprisingly high performance in conversational and coding tasks.”
In a Hacker News discussion about the release, one user said:
Selling access to LLMs via remote APIs is the “stage plays on the radio” stage of technological development. It makes no actual sense; it’s just what the business people are accustomed to. It’s not going to last very long. So much more value will be unlocked by running them on device. People are going to look back at this stage and laugh, like paying $5/month to a cell phone carrier for Snake on a feature phone.
Stability’s CEO Emad Mostaque replied to questions about StableLM in an “ask me anything” thread on Twitter. When asked about the hardware used to train the models, he said that they were using “3,000 A100s and 512 TPU v4s.”
Stanislav Fort, LLM lead at Stability, posted a helpful tip on Twitter:
For the early StableLM models, try adding “User: ” to the prompt. Because of the way these models were trained, prepending your evals with “User: ” should make things *much* better.
The code for the StableLM models is available on GitHub. The model weights and a demo chat interface are available on HuggingFace.
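As a rough illustration, the sketch below shows how one of the released checkpoints might be loaded through the HuggingFace transformers library and prompted with the “User: ” prefix suggested above. The model id, dtype, and generation settings are assumptions for illustration, not details from the announcement.

```python
# Minimal sketch: load a StableLM checkpoint from HuggingFace and generate text.
# The model id and generation parameters below are assumptions, not from the release notes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-base-alpha-7b"  # assumed HuggingFace model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 7B model on a single GPU
    device_map="auto",          # requires the accelerate package
)

# Prepend "User: " to the prompt, per the tip for the early StableLM models.
prompt = "User: Write a haiku about open-source language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```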