Dolby Demos an "Ear Opening" Technology for Audio Conferencing
What will sell this is the recognition that better sound means better and more productive conferences.
January 25, 2012
Much has been written about the wonders of high-definition telepresence systems for video teleconferencing, but the vast majority of the working world spends far more time on audio conferences than they do on video conferences. Unfortunately, with the exception of marginally better conference phones and wideband audio, no one has done much to inch the sound quality for audio conferencing above what we’ve had on the public telephone network for the past hundred years.
Wideband codecs like G.722 can increase the audio bandwidth (i.e., the range of frequencies reproduced), but that does nothing to address problems like noisy drops, uneven sound levels, or background noise that do so much to make many audio conferences an exercise in frustration. For organizations that increasingly depend on audio conferences as a key part of doing business, you're not boosting productivity when users are so disengaged that they turn on the speakerphone, mute the microphone, and pick up the Wall Street Journal.
Enter Dolby Laboratories, which is now using its pioneering audio technology to deliver a type of audio conferencing the likes of which you have never heard. At IBM’s Lotusphere in Orlando, Dolby offered analysts a demonstration of an as yet unnamed technology called simply the "Dolby audio conferencing solution", and the difference was striking.
While virtually everyone has heard of Dolby and knows they have something to do with high quality sound, understanding is a little cloudy beyond that. The Dolby logo appears at the end of every movie you see and on the bottom of every home stereo, game player, and mobile electronic device you buy that is capable of producing sound.
The key to Dolby's business is that they do not make consumer products, but rather they license their technology to the people who capture and reproduce the sound (that's why there’s a Dolby logo on your iPad). That technology was first adopted in the recording and motion picture industries, but has now been adopted in the gaming industry as well. Dolby's first product was Type A Dolby Noise Reduction, a compandor designed to eliminate the "hiss" in audio tape recording. Their real growth came with technology to improve the audio quality of motion pictures; Stanley Kubrick’s "A Clockwork Orange" was the first movie produced with Dolby sound.
According to Dr. Mike Hollier, Vice President-Voice Platforms for Dolby, the Dolby audio conferencing solution is made up of a conferencing server and a softphone client that can run on either a Windows PC or a Mac. While Dolby doesn’t describe it this way, its four major enhancements can be categorized as:
* Wideband, Natural-Sounding Codec
* Ambient/Channel Induced Noise Cancellation
* Automatic Level-Adjusting Full Duplex Audio Bridging
* Spatial Sound Environment
Let's take a look at what's involved in each of these.
Wideband Natural-Sounding Codec
Anyone who has studied the physics of the telephone network will know that its design was based on compromise. The human ear can detect frequencies in the range of 20 Hz-20,000 Hz, yet to maximize the number of "acceptable quality" channels the network could carry, a traditional G.711 telephone channel filters out the frequencies above about 3,400 Hz. Newer wideband codecs like G.722 capture and reproduce frequencies up to about 7,000 Hertz. Both of these codec standards require a 64 Kbps digital channel, though the wideband G.722.2 Adaptive Multi-Rate Wideband (AMR-WB) standard can get that bit rate down to 16 Kbps or lower.
Dolby's codec reproduces natural-sounding human speech at frequencies up to around 8,000 Hertz at a bit rate of around 40 Kbps, with the ability to go higher if needed.
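To put those bit rates in perspective, here is a small back-of-the-envelope sketch in Python. The sampling rates used (8 kHz for G.711, 16 kHz for G.722) are the standard values for those codecs rather than figures from the demo, and the function names are made up for illustration. Nothing here describes how Dolby's codec works internally; it only compares the quoted numbers.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of a single audio channel: samples per second x bits per sample."""
    return sample_rate_hz * bits_per_sample / 1000


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2


# G.711: 8,000 samples per second, 8 bits per (companded) sample.
g711_kbps = pcm_bit_rate_kbps(8000, 8)

# G.722: 16,000 samples per second, compressed to an effective 4 bits per sample.
g722_kbps = pcm_bit_rate_kbps(16000, 4)

# The roughly 40 Kbps quoted for the Dolby codec, as a fraction of a 64 Kbps channel.
dolby_fraction_of_g711 = 40 / g711_kbps

print(f"G.711: {g711_kbps} Kbps, ceiling {nyquist_limit_hz(8000)} Hz")
print(f"G.722: {g722_kbps} Kbps, ceiling {nyquist_limit_hz(16000)} Hz")
print(f"40 Kbps is {dolby_fraction_of_g711:.1%} of a 64 Kbps channel")
```

The takeaway is that doubling the sampling rate doubles the theoretical frequency ceiling, and compression is what keeps the wideband channel from costing twice the bandwidth.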
Ambient/Channel Induced Noise Cancellation
Nothing is more tedious on an audio conference than a noisy drop or a participant who is calling in from a noisy location. In the Dolby solution, the bridge that mixes the audio streams together can detect and cancel either channel noise or ambient noise at any location. In the demo, one of the participants was seated next to a TV set that was blaring the Giants-Packers football game, and none of that background noise could be heard by any of the other users.
Automatic Level-Adjusting Full Duplex Audio Bridging
Alongside noise, the other annoying artifact of conference bridging is uneven sound levels. The Dolby solution uses some cool technology to ensure that all speakers come out not only noise free, but at equal volume regardless of the input.
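Dolby did not say how its bridge equalizes levels, so the following is only a generic sketch of the idea, assuming a simple RMS-based gain applied to each talker's audio. The function names, the target level, and the gain cap are all hypothetical choices for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples in the range -1.0..1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_level(samples, target_rms=0.1, max_gain=10.0):
    """Scale a talker's audio toward a common RMS level.

    target_rms and max_gain are illustrative values, not Dolby's settings.
    The gain cap keeps near-silent input from being amplified without limit.
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)
    gain = min(target_rms / level, max_gain)
    # Clamp to the valid sample range in case the gain pushes peaks past it.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# Two talkers saying "the same thing" at very different volumes.
tone = [math.sin(2 * math.pi * 5 * n / 800) for n in range(800)]
quiet_talker = [0.02 * s for s in tone]
loud_talker = [0.5 * s for s in tone]

print(round(rms(normalize_level(quiet_talker)), 3))
print(round(rms(normalize_level(loud_talker)), 3))
```

A production system would adjust gain gradually over time and only while someone is actually speaking, but the principle of pulling every input toward one target level is the same.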
The other feature that the bridge adds is the ability for multiple speakers to talk at the same time. In the traditional audio conference, when one party speaks, other talkers are effectively locked out. Participants often try multiple times to say their piece and then simply give up trying. Being able to break in at any time or to hear everyone laughing at a joke adds tremendously to the life-like experience the solution delivers.
Spatial Sound
The element that most impressed me was the spatial sound effect. By manipulating the stereo capability, the bridge can make each speaker's voice sound as if it’s coming at you from a specific direction (front-left, front-right, directly ahead, etc.). That directional information, along with the natural clarity of the sound, makes it especially easy to distinguish one speaker from another. To hear what spatial sound sounds like, listen to Virtual Barber Shop on YouTube. You have to use headphones to get the full effect, but it is awesome.
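For readers curious how a bridge could place voices in a stereo field, here is a minimal sketch using constant-power panning, one of the simplest ways to give a mono voice a left-to-right position. This is an assumption for illustration, not Dolby's method: convincing spatial audio of the Virtual Barber Shop kind also relies on timing and filtering differences between the two ears, which this sketch ignores. The function names and the angle convention are hypothetical.

```python
import math


def pan_gains(azimuth_deg):
    """Left/right gains for a voice placed between -90 (hard left) and +90 (hard right).

    Constant-power panning keeps left^2 + right^2 equal to 1, so a voice
    sounds equally loud wherever it is placed.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)


def mix_to_stereo(talkers):
    """Mix (mono_samples, azimuth_deg) pairs into left and right channels."""
    length = max((len(samples) for samples, _ in talkers), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, azimuth_deg in talkers:
        gain_left, gain_right = pan_gains(azimuth_deg)
        for i, s in enumerate(samples):
            left[i] += gain_left * s
            right[i] += gain_right * s
    return left, right


# Three talkers: front-left, directly ahead, front-right.
alice = [0.2, 0.2, 0.2]
bob = [0.1, 0.1, 0.1]
carol = [0.3, 0.3, 0.3]
left, right = mix_to_stereo([(alice, -45.0), (bob, 0.0), (carol, 45.0)])
print(left, right)
```

Because each talker gets a different pair of gains, the listener's headphones carry a different balance for each voice, which is what makes the speakers easier to tell apart.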
While the Dolby solution packs a lot of high-end technology, the most impressive thing about it is that after a few moments you forget about it and start conversing naturally as if you were all sitting around a table. Essentially all of the stuff that makes audio conferencing so unnatural (and unproductive) seems to disappear and you just start talking (and laughing, and joking, and interjecting, etc.) just like you would in person.
There are two main obstacles Dolby will have to overcome. The first is that all of the users must wear stereo headsets to get the full benefit. For those of us who routinely use a speakerphone to participate in conference calls this will mean a change in behavior. We will have to see if people are willing to make that concession in order to have this radically better sound quality. For a cubicle worker who’s been holding a handset to their ear all of these years, this will be a godsend.
The other obstacle is that this isn't a product. Dolby doesn't sell "products", they sell technology. According to Chris Bennett, Dolby's Vice President, Voice Technology, Research & Engineering, the company is actively looking for partners, whether hardware, software, or service providers, who are looking to productize this idea. The technology is indeed impressive and the solution can scale to thousands of participants if need be. According to Mr. Bennett, Dolby got hold of that scaling technology with the 2009 acquisition of a company called Spatial Voice that made conferencing solutions for massively multiplayer network gaming systems.
So while there's a long road to travel, we might finally get some relief from the slow torture that is audio conferencing. Of course, this technology could be coupled with a video conference, with the spatial positioning arranged so that the sound comes from the direction of each speaker's image. What will sell this, however, is not the audio trick but the recognition that better sound means better and more productive conferences. So the biggest part of the task will be to convince business users that better sound leads to better meetings.