November 14, 2024
AI headphones create a ‘sound bubble,’ quieting all sounds more than a few feet away
Imagine this: You’re at an office job, wearing noise-canceling headphones to dampen the ambient chatter. A co-worker arrives at your desk and asks a question, but rather than needing to remove the headphones and say, “What?”, you hear the question clearly. Meanwhile the water-cooler chat across the room remains muted. Or imagine being in a busy restaurant and hearing everyone at your table, but reducing the other speakers and noise in the restaurant.
A team led by researchers at the University of Washington has created a headphone prototype that allows listeners to create just such a “sound bubble.” The team’s artificial intelligence algorithms combined with a headphone prototype allow the wearer to hear people speaking within a bubble with a programmable radius of 3 to 6 feet. Voices and sounds outside the bubble are quieted an average of 49 decibels (approximately the difference between a vacuum and rustling leaves), even if the distant sounds are louder than those inside the bubble.
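To put the 49-decibel figure in perspective, the short sketch below converts a decibel reduction into plain ratios using the standard decibel definitions. The function names are illustrative and do not come from the team's code.

```python
def db_to_power_ratio(db: float) -> float:
    """Power ratio corresponding to a level difference in decibels."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db: float) -> float:
    """Sound-pressure (amplitude) ratio for the same level difference."""
    return 10 ** (db / 20)


reduction_db = 49  # average reduction reported for sounds outside the bubble
print(f"power ratio:     about {db_to_power_ratio(reduction_db):,.0f} to 1")
print(f"amplitude ratio: about {db_to_amplitude_ratio(reduction_db):,.0f} to 1")
```

In other words, a 49-decibel reduction leaves sounds outside the bubble with only a tiny fraction of their original acoustic power.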
The team published its findings Nov. 14 in Nature Electronics. The code for the proof-of-concept device is available for others to build on. The researchers are creating a startup to commercialize this technology.
“Humans aren’t great at perceiving distances through sound, particularly when there are multiple sound sources around them,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “Our abilities to focus on the people in our vicinity can be limited in places like loud restaurants, so creating sound bubbles on a hearable has not been possible so far. Our AI system can actually learn the distance for each sound source in a room, and process this in real time, within 8 milliseconds, on the hearing device itself.”
Researchers created the prototype with commercially available noise-canceling headphones. They affixed six small microphones across the headband. The team’s neural network — running on a small onboard embedded computer attached to the headphones — tracks when different sounds reach each microphone. The system then suppresses the sounds coming from outside the bubble, while playing back and slightly amplifying the sounds inside the bubble (because noise-canceling headphones physically let some sound through).
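As a rough illustration of the timing cue described above, the sketch below computes when a sound from a given position would reach each of two microphones. This is simple geometry, not the team's neural network. The microphone spacing, the source positions and the speed of sound are all assumed values chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, approximate for room-temperature air


def arrival_delays(source, mics):
    """Arrival delay at each microphone, in seconds, relative to the first one."""
    times = [math.dist(source, mic) / SPEED_OF_SOUND for mic in mics]
    return [t - times[0] for t in times]


# Hypothetical layout: two microphones 0.15 m apart (coordinates in meters).
mics = [(-0.075, 0.0), (0.075, 0.0)]

# Two sources in the same direction, one near and one far.
near = arrival_delays((0.5, 0.5), mics)
far = arrival_delays((3.0, 3.0), mics)

print(f"near source delay: {near[1] * 1e6:.1f} microseconds")
print(f"far source delay:  {far[1] * 1e6:.1f} microseconds")
```

With microphones this close together, the two delays differ only slightly, which helps explain why extracting distance from headphone-mounted microphones is a hard problem.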
“We’d worked on a previous smart-speaker system where we spread the microphones across a table because we thought we needed significant distances between microphones to extract distance information about sounds,” Gollakota said. “But then we started questioning our assumption. Do we need a big separation to create this ‘sound bubble’? What we showed here is that we don’t. We were able to do it with just the microphones on the headphones, and in real time, which was quite surprising.”
To train the system to create sound bubbles in different environments, researchers needed a distance-based sound dataset collected in the real world, and no such dataset existed. To gather one, they put the headphones on a mannequin head. A robotic platform rotated the head while a moving speaker played noises coming from different distances. The team collected data with the mannequin system as well as with human users in 22 different indoor environments, including offices and living spaces.
 
The team created a prototype using off-the-shelf headphones fitted with microphones, pictured here. Credit: Chen et al./Nature Electronics
Researchers have determined that the system works for a couple of reasons. First, the wearer’s head reflects sounds, which helps the neural net distinguish sounds from various distances. Second, sounds (like human speech) have multiple frequencies, each of which goes through different phases as it travels from its source. The team’s AI algorithm, the researchers believe, is comparing the phases of each of these frequencies to determine the distance of any sound source (a person talking, for instance).
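The phase idea can be illustrated with basic wave arithmetic. The sketch below counts how many cycles a pure tone completes over a given distance, assuming a speed of sound of about 343 meters per second. It is not the team's algorithm, and the function name and the chosen frequencies are illustrative only.

```python
SPEED_OF_SOUND = 343.0  # meters per second, approximate for room-temperature air


def phase_cycles(frequency_hz: float, distance_m: float) -> float:
    """Number of wave cycles a pure tone completes over the given distance."""
    return frequency_hz * distance_m / SPEED_OF_SOUND


for frequency in (250.0, 1000.0, 4000.0):
    for distance in (1.0, 2.0):
        cycles = phase_cycles(frequency, distance)
        phase_deg = (cycles % 1.0) * 360.0  # phase within the current cycle
        print(f"{frequency:6.0f} Hz over {distance} m: "
              f"{cycles:6.2f} cycles, phase {phase_deg:5.1f} degrees")
```

Because each frequency accumulates phase at a different rate, the pattern of phases across many frequencies changes with distance, which is the kind of cue the researchers believe the neural network is picking up.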
Headphones like Apple’s AirPods Pro 2 can amplify the voice of the person in front of the wearer while reducing some background noise. But these features work by tracking head position and amplifying the sound coming from a specific direction, rather than gauging distance. This means the headphones can’t amplify multiple speakers at once, lose functionality if the wearer turns their head away from the target speaker, and aren’t as effective at reducing loud sounds from the speaker’s direction.
The system has been trained to work only indoors, because getting clean training audio is more difficult outdoors. Next, the team is working to make the technology function on hearing aids and noise-canceling earbuds, which requires a new strategy for positioning the microphones.
Additional co-authors are Malek Itani and Tuochao Chen, UW doctoral students in the Allen School; Sefik Emre Eskimez, a senior researcher at Microsoft; and Takuya Yoshioka, director of research at AssemblyAI. This research was funded by a Moore Inventor Fellow award, a UW CoMotion Innovation Gap Fund and the National Science Foundation.
For more information, contact soundbubble@cs.washington.edu.