Generative AI systems have demonstrated remarkable capabilities, from generating nuanced, human-like text and engaging in complex problem-solving to assisting with code development and creative tasks. Their impact spans industries, reshaping content creation, customer service, and data analysis. The power of these models stems from their sheer size and complexity. LLMs are built on neural networks with an enormous number of parameters – the adjustable elements that allow the model to learn and make predictions. The scale of these models has grown exponentially in recent years, driven by the empirical observation that larger models, trained on more data, generally perform better across a wide range of tasks. This relationship, often referred to as the “scaling laws” of machine learning, has led to the development of increasingly massive models.
For context, OpenAI’s GPT-3, introduced in 2020, has 175 billion parameters. More recent models have pushed these boundaries even further, with Google’s PaLM featuring 540 billion parameters and DeepMind’s Gopher reaching 280 billion. Some models, like Wu Dao 2.0 developed by the Beijing Academy of Artificial Intelligence, have even surpassed the trillion-parameter milestone. However, as these models grow increasingly powerful and complex, a significant concern has surfaced: the energy consumed by model training and the resulting environmental impact. The immense computational resources required to train and operate these massive models translate into considerable energy consumption and, consequently, a larger carbon footprint. Training a model with hundreds of billions of parameters requires vast amounts of electricity, often equivalent to the annual energy consumption of hundreds of households. This growing environmental cost has sparked a crucial debate within the AI community and beyond. Researchers, industry experts, and environmentalists are now grappling with a pressing question: Is it possible to develop more compact, energy-efficient models that can rival the performance of their larger counterparts?
The Giants of AI: A Growing Concern
To understand the significance of this question, we must first grasp the scale of energy consumption associated with today’s leading LLMs. Take GPT-3, for instance, one of the most well-known language models. According to a study [1], its training process alone consumed an estimated 1,287 MWh of electricity – equivalent to the annual energy use of 120 average American households.
More recent models like Meta’s Llama 3.2 have shown improvements in efficiency. According to Meta’s model card, its 1-billion- and 3-billion-parameter variants together consumed just over 581 MWh to train – about half the energy required for GPT-3 [2]. To put it another way, training Llama 3.2 used roughly as much energy as a large airliner consumes during a 7-hour flight. The estimated total greenhouse-gas emissions from Llama 3.2’s training were 240 tons CO2eq; however, because nearly all of the electricity came from renewable sources, the training was largely carbon neutral. Even so, as AI continues to advance, some projections are alarming: one study estimates that by 2027 the AI sector’s annual energy consumption could rival that of an entire country such as the Netherlands [1].
While much attention is paid to the energy required to train these massive models, the ongoing energy cost of inference – using the model to generate outputs – is also significant. Research done at Northeastern University reports that the 65-billion-parameter version of the LLaMA model consumes about 4 joules per output token [2]. At roughly 1.3 tokens per word, generating a passage of about 250 words therefore requires approximately 0.00037 kilowatt-hours of energy.
Note: Only model output tokens were considered for this statistical illustration.
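As a back-of-the-envelope check, the figure above follows directly from the per-token estimate. The short Python sketch below reproduces the arithmetic; the 1.33-tokens-per-word ratio and the function name passage_energy_kwh are illustrative assumptions, not values from the cited study.

```python
# Back-of-the-envelope inference energy estimate (illustrative assumptions only).
JOULES_PER_TOKEN = 4.0    # reported for the 65B LLaMA model [2]
TOKENS_PER_WORD = 1.33    # assumed average ratio for English text
JOULES_PER_KWH = 3.6e6    # 1 kWh = 3.6 million joules

def passage_energy_kwh(words: int) -> float:
    """Estimate the inference energy (kWh) to generate a passage of `words` words."""
    tokens = words * TOKENS_PER_WORD
    return tokens * JOULES_PER_TOKEN / JOULES_PER_KWH

print(f"{passage_energy_kwh(250):.5f} kWh")  # ~0.00037 kWh for a 250-word passage
```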
The Promise of Smaller Models
In response to these challenges, researchers are exploring ways to create smaller, more efficient models that can deliver comparable performance to their larger counterparts. This shift towards “small models with big impact” is driven by both environmental concerns and practical considerations.
One promising approach comes from researchers at UC Santa Cruz, who developed a method to run a billion-parameter-scale language model on just 13 watts of power – about the same energy required to power a light bulb [3]. This represents a more than 50-fold improvement in efficiency compared to typical hardware. In this study, the researchers focused on eliminating computationally expensive operations, such as matrix multiplication, from the model while maintaining performance. Other groups are exploring efficient parameterizations and meta-learning techniques to reduce the need for ever-larger models.
Another, relatively smaller model named Orca was trained with a student-teacher approach, in which it learned to imitate the reasoning processes of the stronger GPT-4 model acting as a teacher [4]. Orca has 13 billion parameters and was trained on synthetic data generated by gpt-3.5-turbo and GPT-4, using 20 NVIDIA A100 chips for a total of 200 hours. Although it is significantly smaller than the GPT-3.5 and GPT-4 models, it performed competitively with the gpt-3.5-turbo (ChatGPT) model in GPT-4-assisted evaluations, while consuming only approximately 1,600 kWh of energy during training.
Yet another example is the Phi-1.5 model, which has merely 1.3 billion parameters yet performs comparably to models five times its size on natural language tasks [5]. The Phi-1.5 model, trained on an A100 chip, consumed approximately 600 kWh of energy. While Phi-1.5’s performance relative to larger models remains disputed, it is clear that a small model trained on a carefully curated, domain-specific dataset can offer a significant and tremendously cost-effective value proposition.
Creating Better-Performing Smaller Models
The pursuit of more efficient models isn’t just about reducing size – it’s about maintaining or even improving performance. A number of techniques now make it practical to create application-specific smaller models that can rival, and sometimes outperform, their larger counterparts. Key approaches include:
- Fine-tuning: By taking a pre-trained model and further training it on domain-specific data, researchers can create smaller models that excel in particular applications.
- Knowledge distillation: This technique trains a smaller “student” model to mimic the behavior of a larger, more complex “teacher” model, effectively distilling its knowledge into a more compact form (a minimal sketch follows this list).
- Pruning and quantization: These methods remove unnecessary parameters and reduce the numerical precision of the remaining ones, yielding smaller models with minimal performance loss (see the quantization example after this list).
- Neural architecture search: This approach uses machine learning techniques to automatically design optimal neural network architectures for specific tasks, potentially leading to more efficient, task-specific models.
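To make the distillation idea concrete, the sketch below shows a minimal PyTorch-style training step in which a small “student” network is trained to match the temperature-softened output distribution of a larger “teacher”. The model interfaces, temperature, and loss weighting are illustrative assumptions, not a recipe from any of the cited papers.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, labels, optimizer,
                      temperature=2.0, alpha=0.5):
    """One knowledge-distillation training step (illustrative sketch)."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(batch)      # soft targets from the large model

    student_logits = student(batch)

    # Soft loss: match the teacher's temperature-softened distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard loss: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```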
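Similarly, post-training quantization can often be applied with off-the-shelf tooling. The snippet below uses PyTorch’s dynamic quantization to convert the linear layers of a placeholder model to 8-bit integer weights; the toy model is an assumption for illustration, and a real deployment would quantize an actual pretrained network and re-validate its accuracy.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a much larger pretrained network (assumption).
model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

# Dynamic quantization: weights of nn.Linear layers are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement for CPU inference.
with torch.no_grad():
    out = quantized(torch.randn(1, 768))
print(out.shape)  # torch.Size([1, 768])
```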
Conclusion
As we look to the future of AI language models, it’s clear that finding the right balance between performance and sustainability will be crucial. While large models have demonstrated impressive capabilities, the environmental and economic costs associated with training and running them are becoming increasingly difficult to ignore. The development of smaller, more efficient models offers a promising path forward. By leveraging advanced techniques in model compression, knowledge distillation, and hardware optimization, AI practitioners aim to create models that can deliver comparable results to their larger counterparts while consuming significantly less energy.
This shift towards more sustainable AI doesn’t just benefit the environment – it also has the potential to democratize access to powerful language models. Smaller, more efficient models could be run on less powerful hardware, making advanced AI capabilities accessible to a wider range of organizations and individuals.
As we stand at the crossroads of AI innovation and environmental responsibility, the pursuit of smaller, more efficient language models represents a crucial step towards a more sustainable future. By focusing on creating “small models with big impact,” the AI community can continue to push the boundaries of what’s possible while also addressing the pressing need for more environmentally friendly technologies.
The challenge ahead is clear: to develop models that can match or exceed the performance of today’s largest language models while dramatically reducing their energy footprint. As research continues to make strides in this direction, we move closer to a future where powerful AI tools can be deployed widely and responsibly, without compromising on performance or placing undue strain on our planet’s resources.
In the end, the true measure of AI’s success may not just be its ability to process language or generate human-like text, but its capacity to do so in a way that’s sustainable for both our digital and natural environments. The ongoing research into smaller, more efficient models promises to play a crucial role in shaping this sustainable AI future.
References
- de Vries, Alex. The growing energy footprint of artificial intelligence. Joule, Volume 7, Issue 10, October 2023. Cell Press. https://doi.org/10.1016/j.joule.2023.09.004
- Samsi, Siddharth, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, Jeremy Kepner, Devesh Tiwari, and Vijay Gadepally. From words to watts: Benchmarking the energy costs of large language model inference. In 2023 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-9. IEEE, 2023. https://doi.org/10.1109/HPEC58863.2023.10363447
- Zhu, Rui-Jie, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, Peng Zhou, and Jason K. Eshraghian. Scalable MatMul-free Language Modeling. arXiv preprint arXiv:2406.02528 (2024). https://doi.org/10.48550/arXiv.2406.02528
- Mukherjee, Subhabrata, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, and Ahmed Awadallah. Orca: Progressive learning from complex explanation traces of gpt-4. arXiv preprint arXiv:2306.02707 (2023). https://doi.org/10.48550/arXiv.2306.02707
- Li, Yuanzhi, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, and Yin Tat Lee. Textbooks are all you need ii: phi-1.5 technical report. arXiv preprint arXiv:2309.05463 (2023). https://doi.org/10.48550/arXiv.2309.05463
Anjanava Biswas is an award-winning Senior AI Specialist Solutions Architect at Amazon Web Services (AWS), a public speaker, and author with more than 16 years of experience in enterprise architecture, cloud systems, and transformation strategy. He has been dedicated to artificial intelligence, machine learning, and generative AI research, development, and innovation projects for the past seven years, working closely with organizations in the healthcare, financial services, technology startup, and public sector industries. Biswas holds a Bachelor of Technology degree in Information Technology and Computer Science and is a TOGAF-certified enterprise architect; he also holds 7 AWS Certifications. Biswas is a Senior Member of IEEE and a Fellow at IET (UK), BCS (UK), and IETE (India). Connect with Anjanava Biswas at anjan.biswas@IEEE.org or on LinkedIn.
Disclaimer: The author is completely responsible for the content of this article. The opinions expressed are their own and do not represent IEEE’s position nor that of the Computer Society nor its Leadership.