Google today detailed SoundStream, an end-to-end “neural” audio codec that can provide higher-quality audio while encoding different sound types including clean speech, noisy and reverberant speech, music, and environmental sounds. The company claims that it’s the first AI-powered codec to work on speech and music, while at the same time being able to run in real time on a smartphone processor.
Audio codecs compress audio to reduce storage and bandwidth requirements. Ideally, the decoded audio should be perceptually indistinguishable from the original, and the codec should introduce little latency. While most codecs rely on domain expertise and carefully engineered signal processing pipelines, there’s been interest in replacing these handcrafted designs with AI models that learn how to encode audio.
Earlier this year, Google released Lyra, a neural audio codec trained to compress speech at low bitrates. SoundStream extends this work with a system consisting of an encoder, a quantizer, and a decoder. The encoder converts audio into a coded signal, the quantizer compresses that signal, and the decoder converts it back to audio. Once trained, the encoder and decoder can be run on separate clients to transmit audio over the internet, and the decoder can operate at any bitrate.
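To make the encoder, quantizer, and decoder roles concrete, here is a minimal toy sketch in Python. It is not SoundStream's learned model: the frame-averaging encoder, the uniform scalar quantizer, and every name in it (`FRAME`, `encode`, `quantize`, `dequantize`, `decode`, `roundtrip`) are hypothetical stand-ins chosen for illustration. It only shows how a signal flows through the three stages and how spending fewer bits per code trades away reconstruction accuracy.

```python
import math

FRAME = 4  # hypothetical: number of samples summarized by one latent value


def encode(samples):
    """Toy encoder: average each frame of FRAME samples into one latent value."""
    latents = []
    for i in range(0, len(samples), FRAME):
        frame = samples[i:i + FRAME]
        latents.append(sum(frame) / len(frame))
    return latents


def quantize(latents, bits):
    """Toy quantizer: map each latent in [-1, 1] to an integer in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    return [round((min(1.0, max(-1.0, x)) + 1.0) / 2.0 * levels) for x in latents]


def dequantize(codes, bits):
    """Invert quantize(): map integer codes back to values in [-1, 1]."""
    levels = 2 ** bits - 1
    return [c / levels * 2.0 - 1.0 for c in codes]


def decode(latents, n_samples):
    """Toy decoder: hold each latent value for FRAME samples."""
    out = []
    for x in latents:
        out.extend([x] * FRAME)
    return out[:n_samples]


def roundtrip(samples, bits):
    """Sender encodes and quantizes; receiver dequantizes and decodes."""
    codes = quantize(encode(samples), bits)               # what gets transmitted
    restored = decode(dequantize(codes, bits), len(samples))
    return codes, restored


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * t / 64) for t in range(256)]
    for bits in (2, 4, 8):
        codes, restored = roundtrip(signal, bits)
        # Columns: bits per code, total bits sent, reconstruction error
        print(bits, len(codes) * bits, round(mse(signal, restored), 5))
```

In this sketch only the integer codes cross the network, which is why the encoder and decoder can live on separate clients. A real neural codec would replace the averaging and sample-holding steps with trained networks.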
Compressing audio
In traditional audio processing pipelines, compression and enhancement (i.e., the removal of background noise) are typically performed by different modules. SoundStream, by contrast, is designed to carry out both at the same time. Google claims that SoundStream at 3 kbps outperforms the popular Opus codec at 12 kbps and approaches the quality of EVS at 9.6 kbps, while using 3.2 to 4 times fewer bits. SoundStream also performs better than the current version of Lyra when the two are compared at the same bitrate.
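The "3.2 to 4 times fewer bits" figure follows directly from the bitrates quoted above, as this short Python check shows. The helper names `bit_savings` and `kilobytes_per_minute` are hypothetical, and the per-minute sizes are simple arithmetic on the quoted bitrates (taking 1 kbps as 1,000 bits per second), not measurements from Google.

```python
def bit_savings(baseline_kbps, candidate_kbps):
    """How many times fewer bits the candidate uses than the baseline."""
    return baseline_kbps / candidate_kbps


def kilobytes_per_minute(kbps):
    """Payload size of one minute of audio at a constant bitrate."""
    bits = kbps * 1000 * 60   # bits in one minute
    return bits / 8 / 1000    # bits -> bytes -> kilobytes


SOUNDSTREAM_KBPS = 3.0  # bitrate quoted for SoundStream in the comparison

for name, kbps in (("Opus", 12.0), ("EVS", 9.6)):
    ratio = bit_savings(kbps, SOUNDSTREAM_KBPS)
    print(f"{name} at {kbps} kbps uses {ratio:.1f}x the bits of SoundStream "
          f"({kilobytes_per_minute(kbps):.1f} kB vs "
          f"{kilobytes_per_minute(SOUNDSTREAM_KBPS):.1f} kB per minute)")
```

The ratios come out to 4.0 for Opus and 3.2 for EVS, matching the range in the claim.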
Google cautions that SoundStream is still in the experimental stages. However, the company plans to release an updated version of Lyra that incorporates its components to deliver both higher audio quality and “reduced complexity.”
“Efficient compression is necessary whenever one needs to transmit audio, whether when streaming a video, or during a conference call. SoundStream is an important step towards improving machine learning-driven audio codecs. It outperforms state-of-the-art codecs, such as Opus and EVS, can enhance audio on demand, and requires deployment of only a single scalable model, rather than many,” Google research scientist Neil Zeghidour and staff research scientist Marco Tagliasacchi wrote in a blog post. “By integrating SoundStream with Lyra, developers can leverage the existing Lyra APIs and tools for their work, providing both flexibility and better sound quality.”