Immersion Networks: the future of how audio will be experienced
ANOUK DYUSSEMBAYEVA | FEBRUARY 23, 2023
Photos provided by Paul Hubert
On their website and social media, Immersion Networks explains what they do in one sentence: they create software and hardware to reframe the human listening experience.

Crunchbase goes further: Immersion Networks is a Washington-based R&D laboratory, created in 2014 by industry-leading audio experts and engineers who hold hundreds of patents and have played an instrumental role in the evolution of audio technology.

But before my interview I had no idea what I was about to hear – this, my friends, is the future of audio.
Paul Hubert landed his first contract with Prince when he was 20. At the time, the renowned artist was building a new studio in Minneapolis and wanted to bring the most technically advanced equipment there. To do so, he hired Paul, who had just left Apple and was experimenting with computers and their applications in music. "I came up with a way to integrate computers, drum machines, and synthesizers so he could get access to them from his favorite spot to write in the studio," he explains. "I helped him discover which synthesizers did what sounds and start building out his studio of really amazing [and] innovative things."

With that, Paul was also busy creating his own studio, which soon turned into a recording, mixing, and mastering business. Although it was successful and he enjoyed the technical process behind it, the entrepreneur soon realized that the service wasn't scalable – there was only so much time he could devote to mixing and mastering.

As with everything he does, Hubert found the solution in tech: he started using computers to develop his own, better equipment. At the time, his company had just been tasked with restoring 400,000 recordings of Russian classical music from analog master tapes and mastering them onto CDs.

Looking for a fix that would save him time, Paul turned to neural networks to remove the background noise from the audio. "We built the first back-propagating neural network for noise removal for tape," he tells me, adding that back then removing tape hiss was a big deal. "We were lucky to have some really powerful signal processing on our hands … that had 56 DSP[s] running in parallel, and we could do our modeling with it. We kept evolving this so it could run many processes at the same time."
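The core idea of a back-propagating denoiser can be sketched in a few lines: train a small network to map noisy frames of a signal back to clean samples. The sketch below is purely illustrative – the signal, window size, layer sizes, and learning rate are all assumptions, not Neural Audio's actual architecture.

```python
import numpy as np

# Toy back-propagating denoiser: learn to predict the clean sample
# at the centre of each noisy frame. Illustrative only.
rng = np.random.default_rng(0)

t = np.linspace(0, 1, 512, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)                    # the "recording"
noisy = clean + 0.3 * rng.standard_normal(t.shape)   # add "tape hiss"

# Overlapping 16-sample frames as inputs; the clean centre sample
# of each frame is the training target.
win = 16
X = np.stack([noisy[i:i + win] for i in range(len(t) - win)])
y = clean[win // 2: len(t) - win // 2]

# One hidden layer, trained with plain backpropagation on MSE loss.
W1 = 0.1 * rng.standard_normal((win, 32))
W2 = 0.1 * rng.standard_normal((32, 1))
lr = 0.02

def forward(frames):
    h = np.tanh(frames @ W1)
    return h, (h @ W2).ravel()

_, pred0 = forward(X)
mse_init = np.mean((pred0 - y) ** 2)   # error before training

for _ in range(2000):
    h, pred = forward(X)
    err = pred - y
    dW2 = h.T @ err[:, None] / len(y)                       # backprop
    dW1 = X.T @ (err[:, None] @ W2.T * (1 - h * h)) / len(y)
    W2 -= lr * dW2
    W1 -= lr * dW1

_, denoised = forward(X)
mse_final = np.mean((denoised - y) ** 2)
print(mse_init, mse_final)   # training reduces the error
```

The same principle – learn a mapping from degraded audio to clean audio from examples – scales up to the multi-DSP systems Hubert describes; only the model size and the signal processing around it change.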

Before he knew it, Hubert was building his next company, Neural Audio, founded in 2000 on that architecture. On top of removing tape hiss, the team trained the neural networks to maintain dynamic range and spectrum, which allowed Paul to automate some of the most time-consuming studio processes.
"It was … the first use of AI in professional audio."
At the turn of the 21st century, MP3 had taken off and was the next big thing. The only dilemma was that internet connections were still slow, which limited its applications and possibilities. Paul put an audio codec – a program or device that encodes and decodes a digital audio stream – inside the neural network's loop. Where the network had previously just analyzed the audio and corrected it, it could now compare the codec's input and output in each domain, find cross-correlations between the passes, and learn a pre-bias: a way of pre-processing the signal so that the codec's output sounded the same as its input. That allowed it to adapt to the varying challenges of content in real time.
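The idea of pre-biasing can be shown with a deliberately simplified model. Below, the "codec" is just a fixed linear roll-off of high frequencies, so its loss can be cancelled exactly by boosting the input with the inverse response. A real perceptual codec is nonlinear and signal-dependent, which is precisely why Neural Audio used a trained network to learn the correction rather than inverting a filter; the numbers here are illustrative assumptions.

```python
import numpy as np

# Toy "pre-bias" demo: shape the input so a lossy channel's output
# matches the original. The codec is modelled as a known linear
# frequency roll-off; real codecs are nonlinear.
rng = np.random.default_rng(1)
x = rng.standard_normal(1024)            # stand-in for an audio frame

# "Codec": attenuate each frequency bin by a fixed response H
# that rolls off toward the top of the spectrum.
H = 1.0 / (1.0 + np.linspace(0, 3, 513))

def codec(signal):
    return np.fft.irfft(np.fft.rfft(signal) * H)

# Pre-bias: boost the input by 1/H so the codec's loss cancels out.
x_pre = np.fft.irfft(np.fft.rfft(x) / H)

plain = codec(x)        # ordinary encode/decode: spectrum is damaged
biased = codec(x_pre)   # pre-biased: output matches the input

plain_err = np.max(np.abs(plain - x))    # large error
biased_err = np.max(np.abs(biased - x))  # essentially zero
print(plain_err, biased_err)
```

In the real system the "inverse" had to be learned from data, per frame and per signal, but the goal is the same: make `codec(pre_bias(x))` sound like `x`.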

"That ended up being a genius signal processing thing that saved bits: We could train it, put it on a broadcast and … if you were using compressed audio, it either [became] more efficient, or sounded a lot better at the same data rate," the founder says. That was revolutionary, since it allowed companies to run higher-quality audio over limited bandwidth, and the music industry quickly grabbed hold of the technology. Radio Central bought an exclusive license for the internet.
"That was my first eureka: we could build a box that could do all of this process on its own."
Later, two satellite radio companies decided to launch satellites to distribute compressed audio. Hubert and his team knew that these gigantic space objects have a fixed bandwidth and only so much spectrum – four megabits, to be precise. How many channels can you fit in that? By pushing codec efficiency up, Neural Audio ended up providing the signal processing for XM. "Our initial proposal was: 'what if you launch with a hundred channels and still sound better than 70?'" Paul shares. "We gave XM a huge advantage at the beginning by just making them sound much better."
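The arithmetic behind that proposal is stark. Splitting a fixed four-megabit pipe across the channel counts mentioned above gives each channel only a few tens of kilobits per second – far below what codecs of that era normally needed for good-sounding stereo – which is why codec efficiency was the whole game. The helper below is a back-of-the-envelope illustration, not XM's actual channel plan.

```python
# Back-of-the-envelope: bits per channel on a fixed satellite link.
# Figures from the article: ~4 Mbit/s total, 70 vs. 100 channels.
def kbps_per_channel(total_mbps: float, channels: int) -> float:
    """Average bitrate per channel, in kbit/s, for an evenly split link."""
    return total_mbps * 1000 / channels

print(kbps_per_channel(4, 100))  # 40.0 kbit/s per channel
print(kbps_per_channel(4, 70))   # ~57.1 kbit/s per channel
```

Either way, every channel had to survive on a fraction of the ~128 kbit/s then considered a baseline for decent compressed stereo – hence the value of a codec that sounded better at the same data rate.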

He and his co-founder, Robert Reams, ended up selling Neural Audio to DTS Inc. in 2009 for $15 million. By that time, their company had worked with the likes of ESPN, Sony, Universal, Warner Bros, Yamaha, Ford, Honda, Nissan, Vivendi, and SiriusXM. Even though the team came up with other ways to save bits, the industry simply wanted more and more channels, to the point where, in Paul's words, the sound quality was getting questionable – variety had become more important than fidelity. That went against Paul's beliefs and principles, which is why he decided to go down a different path.

Returning to the audio industry in 2014, Paul was full of ideas. He built microphones and experimented with capturing the space in which a recording took place. That is how Immersion Networks was born. "Designing rooms that have a lot more space and openness is not something that's prevalent even in today's studios, so having a mic that does this is better," he says. From there, the team has designed rooms, acoustics, and other equipment to make it easier to capture the emotion and feeling of a particular space.

One of the main issues is that consumer technology, social media, and audio platforms are still stuck in the 1990s when it comes to sound quality. James "JJ" Johnston, one of Immersion's founding engineers, invented the modern codec (AAC) in 1993 as a researcher at AT&T Bell Laboratories. This is still the technology everyone uses, and it hasn't evolved to provide higher audio quality. Due to shortcomings in the standard, immersive audio doesn't work well with the codec architecture of the '90s.

To support its mission of capturing audio and delivering it in a convenient file size, one of the first things the startup did was develop its own next-generation codec. "We found that we needed to run things at a higher clock rate," Paul says. "You don't need to run audio at 96 kHz because you can't hear that, but as you run with a higher clock, you end up with less latency in the system. [It] ended up being more efficient than any other codec."
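Paul's point about clock rate and latency follows from simple arithmetic: a frame-based codec cannot emit a sample until a frame is full, so its minimum latency is the frame length divided by the sample rate. Doubling the clock halves the time a fixed-size frame takes to fill. The frame size below is an illustrative assumption, not Immersion's actual parameter.

```python
# Frame-fill latency of a frame-based codec: a frame of N samples
# at sample rate f takes N / f seconds to accumulate.
def frame_latency_ms(frame_samples: int, sample_rate_hz: int) -> float:
    """Minimum buffering latency, in milliseconds, for one codec frame."""
    return 1000.0 * frame_samples / sample_rate_hz

print(frame_latency_ms(1024, 48_000))  # ~21.3 ms at a 48 kHz clock
print(frame_latency_ms(1024, 96_000))  # ~10.7 ms: same frame, double the clock
```

This is why running the pipeline at a higher internal clock can cut latency even when the extra audible bandwidth itself is irrelevant.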
At the moment, the music industry is just getting the first taste of immersive spatial audio. The dominant players like Spotify are still providing heavily compressed flat stereo audio, and only Apple has embraced spatial. While AirPods have immersive audio and capture your head position, as Paul says, they have way too much latency to give the user a sense of immersion. "It's tough being years ahead of everybody understanding," he smiles, and I hear the melancholy in his voice. "We have dozens of patents, but we're taking the long road."

Because compression is a palpable impediment in spatial audio – it takes away half of the experience, since the current codec assumes the sound it tosses out is unnecessary – Immersion Networks built its own pipeline. "We put all the pieces of the puzzle [together] to complete the experience," Hubert continues. With an entire spatial toolbox, the company can create better Dolby Atmos mixes from mono, stereo, or multitrack sources. Immersion Networks has also worked with large catalogs, helping companies "further their catalogs into the future." For instance, it recently remixed Johnny Cash's iconic debut album with its process.

The founder and his team are particularly excited about the metaverse, since it lets one capture an experience and transmit it, or have it bottled forever. Even though the metaverse is still somewhere in the near future, Immersion Networks already has the audio infrastructure needed to have it up and running. "We have these technologies and [now] it's just finding licensing opportunities for strategic partnerships," he says.

Concluding our almost two-hour conversation, we talk about the future of audio and the mission of improving the music listening experience. It's a long road, but a rewarding one. "We are here to create experiences, whether it's providing tools or a system for delivery … and this is just the tip of the iceberg," Paul concludes, smiling.