UW News

July 27, 2023

Q&A: UW researcher discusses just how much energy ChatGPT uses



Training a large language model such as ChatGPT uses, on average, roughly as much electricity as over 1,000 U.S. households consume in a year, according to Sajjad Moazeni, UW assistant professor of electrical and computer engineering, who studies networking for AI and machine learning supercomputing. Photo: Sanket Mishra/Unsplash

UPDATE Aug. 2, 2023: This story has been updated to correct the total number of U.S. households whose energy consumption is equivalent to daily ChatGPT queries.

ChatGPT and other large language models learn to mimic humans by analyzing huge amounts of data. Behind any chatbot’s text box is a large network of computer processing units that support training and running these models.

How much energy do networks running large language models consume? A lot, according to Sajjad Moazeni, a University of Washington assistant professor of electrical and computer engineering, who studies networking for AI and machine learning supercomputing. Just training a chatbot can use as much electricity as a neighborhood consumes in a year.

UW News sat down with Moazeni to learn more.

How do large language models, such as ChatGPT, compare to cloud computing energy-wise?


Sajjad Moazeni: These models have become so large that you need thousands of processors to both train the models and then support the billions of daily queries by users. All this computing can only take place in a data center.

In comparison, conventional cloud computing workloads, such as online services, databases and video streaming, are far less computationally intensive and require orders of magnitude less memory.

Can you describe these data centers?

SM: In today’s data centers, there are hundreds of thousands of processing units that talk to each other over a large number of optical fibers and network switches. These processors (along with memory and storage devices) are housed in server racks. There is also internal infrastructure for cooling the servers (with water and air) and units to generate and distribute power.

There are hundreds of such data centers across the world, and they are mainly managed by big tech companies like Amazon, Microsoft and Google.

How much energy do these large data centers use to run these large language models?

SM: In terms of training a large language model, each processing unit can consume over 400 watts of power while operating. Typically, a similar amount of power is needed for cooling and power management as well. Overall, training a single large language model like GPT-3 can consume up to 10 gigawatt-hours (GWh) of energy. That is, on average, roughly equivalent to the yearly electricity consumption of over 1,000 U.S. households.
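As a rough illustration of the arithmetic behind these figures, the sketch below multiplies a per-processor power draw by an overhead factor for cooling and power management and by a training duration, then converts the total into household-years. The 400-watt draw and the roughly doubled overhead come from the answer above; the processor count, the 50-day run and the 10,000 kWh annual household figure are hypothetical assumptions chosen for illustration, not numbers reported for any real model.

```python
# Back-of-envelope estimate of LLM training energy.
# Every input in the example run is an illustrative assumption.

def training_energy_gwh(num_processors: int, watts_per_processor: float,
                        overhead_factor: float, hours: float) -> float:
    """Total energy in GWh, including cooling/power-management overhead.

    overhead_factor = 2.0 models "a similar amount of power" spent on
    cooling and power management on top of the processors themselves.
    """
    total_watts = num_processors * watts_per_processor * overhead_factor
    return total_watts * hours / 1e9  # Wh -> GWh


def household_years(energy_gwh: float, household_kwh_per_year: float) -> float:
    """Number of households whose annual use equals energy_gwh."""
    return energy_gwh * 1e6 / household_kwh_per_year  # GWh -> kWh


if __name__ == "__main__":
    # Hypothetical run: 10,000 processors at 400 W for 50 days.
    energy = training_energy_gwh(10_000, 400.0, 2.0, 50 * 24)
    print(f"{energy:.1f} GWh")  # 9.6 GWh
    # Assumed average household use of 10,000 kWh per year.
    print(f"{household_years(energy, 10_000):.0f} households")  # 960 households
```

Under these assumptions the total lands just under 10 GWh, and at the assumed household consumption the full 10 GWh upper bound works out to 1,000 household-years.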

Today there are hundreds of millions of daily queries on ChatGPT, though that number may be declining. This many queries can consume around 1 GWh each day, the equivalent of the daily energy consumption of about 33,000 U.S. households.
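The daily figure can be sanity-checked the same way. This sketch assumes a hypothetical 200 million queries per day (the answer says only "hundreds of millions") and an assumed household consumption of 30 kWh per day; neither is a reported figure.

```python
# Rough per-query and household-equivalent arithmetic for daily inference.
# Query count and household usage are illustrative assumptions.

def wh_per_query(daily_energy_gwh: float, daily_queries: float) -> float:
    """Average energy per query in watt-hours."""
    return daily_energy_gwh * 1e9 / daily_queries  # GWh -> Wh


def households_per_day(daily_energy_gwh: float, household_kwh_per_day: float) -> float:
    """Number of households whose daily use equals daily_energy_gwh."""
    return daily_energy_gwh * 1e6 / household_kwh_per_day  # GWh -> kWh


if __name__ == "__main__":
    daily_gwh = 1.0           # figure quoted in the interview
    queries = 200_000_000     # hypothetical: "hundreds of millions"
    household_kwh = 30.0      # assumed average U.S. household use per day

    print(f"{wh_per_query(daily_gwh, queries):.1f} Wh per query")            # 5.0 Wh per query
    print(f"{households_per_day(daily_gwh, household_kwh):,.0f} households")  # 33,333 households
```

At an assumed 30 kWh per household per day, 1 GWh lines up with the roughly 33,000 households cited above.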

While these numbers might seem OK for now, this is only the beginning of a wide development and adoption of these models. We are expecting that soon many different services will be using this technology daily.

Also, as models become more sophisticated, they get larger and larger, which means the data center energy needed to train and use these models could become unsustainable. Each big technology company is now trying to develop its own model, and this can lead to a huge training load on data centers.

What are some potential solutions to this issue?

SM: Researchers have been trying to optimize the data center hardware and processors to become more energy efficient for these types of computation.

My group specifically focuses on the networking aspect. In data centers today, processors send electrical signals to bring in or send out the data for computing. But these electrical signals can get distorted. In order to send a lot of data quickly, we need to use a lot of power to make sure the signals can be received correctly.

Moazeni recently won a Google Faculty Award in networking for this research.

We are building the next generation of optical interconnect solutions, which convert these electrical signals to optical signals. Optical signals have significantly lower loss, which minimizes energy consumption.
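One common way to reason about this networking cost is energy per bit: a link's power is roughly the energy it spends per transmitted bit multiplied by its data rate, so moving more data faster costs proportionally more power unless the energy per bit falls. The sketch below compares a hypothetical electrical link with a hypothetical optical one; the picojoule-per-bit values, the 800 Gb/s rate and the 10,000-link fabric are placeholders for illustration, not measurements from Moazeni's group.

```python
# Link power = energy per bit x data rate.
# All pJ/bit values, rates and link counts are hypothetical placeholders.

PJ = 1e-12   # joules per picojoule
GBPS = 1e9   # bits per second per Gb/s


def link_power_watts(pj_per_bit: float, rate_gbps: float) -> float:
    """Average power (W) drawn by one link moving rate_gbps at pj_per_bit."""
    return pj_per_bit * PJ * rate_gbps * GBPS


def fabric_energy_kwh_per_day(pj_per_bit: float, rate_gbps: float,
                              num_links: int) -> float:
    """Daily energy (kWh) for num_links identical links running flat out."""
    watts = link_power_watts(pj_per_bit, rate_gbps) * num_links
    return watts * 24 / 1000  # W x 24 h = Wh, then Wh -> kWh


if __name__ == "__main__":
    rate = 800.0     # hypothetical per-link rate, Gb/s
    links = 10_000   # hypothetical number of links in the fabric
    for name, pj in [("electrical", 5.0), ("optical", 1.0)]:  # hypothetical pJ/bit
        watts = link_power_watts(pj, rate)
        kwh = fabric_energy_kwh_per_day(pj, rate, links)
        print(f"{name}: {watts:.1f} W per link, {kwh:.0f} kWh per day")
    # electrical: 4.0 W per link, 960 kWh per day
    # optical: 0.8 W per link, 192 kWh per day
```

Because power scales linearly with both data rate and energy per bit, lowering the energy per bit is what lets bandwidth grow without a matching rise in power.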

Because we are just in the beginning phases for this new technology, it’s really important for people to be transparent about their results and to create open-source models. This will also help us reach advanced and sustainable solutions.

For more information, contact Moazeni at smoazeni@uw.edu.