AI Technology Empowers Users to Choose Their Preferred Sounds in Noise-Canceling Headphones


Most people familiar with noise-canceling headphones understand the value of filtering out specific sounds at specific moments. For instance, someone may want to eliminate car honks while working indoors but not while strolling along busy streets. Yet users typically cannot choose which sounds their headphones cancel.


A team of researchers at the University of Washington has developed deep-learning algorithms that let users select, in real time, which sounds pass through their headphones. The researchers call this system “semantic hearing.” The headphones stream captured audio to a connected smartphone, which cancels all environmental sounds.


Through voice commands or a smartphone app, headphone users can pick the sounds they want to hear from a list of 20 classes, such as sirens, baby cries, speech, vacuum cleaners, and bird chirps. Only the selected sounds play through the headphones.
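Conceptually, the selection step can be sketched as frame-level classification followed by a pass/cancel decision. The sketch below is a minimal illustration, not the paper's actual network: the classifier, the (shortened) class list, and the threshold are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical subset of the 20 sound classes described in the article
SOUND_CLASSES = ["siren", "baby_cry", "speech", "vacuum_cleaner", "bird_chirp"]

def dummy_classifier(frame: np.ndarray) -> dict:
    """Stand-in for the neural network: returns a score per class.
    Here scores are faked from frame energy so the sketch runs."""
    energy = float(np.mean(frame ** 2))
    return {c: (energy if i == 0 else 0.0) for i, c in enumerate(SOUND_CLASSES)}

def semantic_filter(frames, selected, classifier=dummy_classifier, threshold=0.01):
    """Relay only frames whose dominant class was selected by the user."""
    out = []
    for frame in frames:
        scores = classifier(frame)
        top = max(scores, key=scores.get)
        if top in selected and scores[top] >= threshold:
            out.append(frame)                  # pass the chosen sound through
        else:
            out.append(np.zeros_like(frame))   # cancel everything else
    return out
```

In the real system a neural model scores each class per frame and separates the target sound from the mixture rather than zeroing whole frames; this sketch only shows the pass/cancel logic.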


The team presented its findings on November 1 at UIST ’23 in San Francisco and plans to release a commercial version of the system in the future.


Senior author Shyam Gollakota, a professor in the Paul G. Allen School of Computer Science & Engineering at UW, emphasized the real-time intelligence needed to recognize and extract specific sounds from the environment. Because the sounds headphone wearers hear must stay in sync with what they see, the neural algorithms must process sounds in under a hundredth of a second.
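To make the hundredth-of-a-second figure concrete: at a typical audio sample rate, that budget corresponds to only a few hundred samples per processing block. The sample rate below is an assumed value, not one reported by the authors.

```python
SAMPLE_RATE_HZ = 44_100   # assumed capture rate; the actual system may differ
LATENCY_BUDGET_S = 0.01   # "within a hundredth of a second"

# Samples the network can buffer per block and still meet the deadline
block_size = int(SAMPLE_RATE_HZ * LATENCY_BUDGET_S)
print(block_size)  # 441 samples, roughly 10 ms of audio
```

The model must therefore classify and separate sounds on blocks of only a few hundred samples, which rules out round trips to cloud servers.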


Because of this time constraint, the semantic hearing system processes sounds on a device such as a connected smartphone rather than on more powerful cloud servers. The system must also preserve the delays and other spatial cues of sounds arriving from different directions, so that users can still meaningfully perceive their surroundings.
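One way to preserve those spatial cues, sketched here under the assumed simplification of a single time-domain gain mask, is to apply the identical mask to both ear channels, so the interaural time and level differences survive the filtering. This is an illustration of the principle, not the authors' implementation.

```python
import numpy as np

def apply_mask_binaural(left: np.ndarray, right: np.ndarray, mask: np.ndarray):
    """Apply one shared pass/cancel mask to both ear channels.

    Using the same mask for left and right leaves the interaural time
    and level differences intact, so a filtered sound still appears to
    come from its original direction."""
    return left * mask, right * mask
```

For example, if a siren reaches the right ear three samples after the left, that three-sample offset is unchanged after masking, so the brain can still localize the sound.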


Tested in settings such as offices, streets, and parks, the system successfully isolated target sounds like sirens, bird chirps, and alarms while removing all other background noise. Participants rated the system's audio output for the target sounds as higher in quality, on average, than the original recordings.


The system faced challenges in distinguishing between sounds sharing similar properties, such as vocal music and human speech. The researchers suggest that refining the models with additional real-world data could improve these outcomes.

Co-authors on the paper include Bandhav Veluri and Malek Itani, both doctoral students at UW; Justin Chan, a former doctoral student at the Allen School now at Carnegie Mellon University; and Takuya Yoshioka, research director at AssemblyAI.