Expanding HD Voice to Audio Quality
"It is quite surprising that we still have to endure phone calls today at a level of quality that is rooted in the technical limitations of the early last century."
November 15, 2012
I have used HD voice (150 Hz to 7,000 Hz), and I like it. I have also used full-range audio (50 Hz to 20,000 Hz), and it is even better. Why can't we have audio-quality phone calls instead of HD voice calls?
I spoke with H.P. Baumeister, Director, Mobile and Communications Markets, Fraunhofer USA Digital Media Technologies, about the future of HD voice. He refers to audio-quality voice as "Full HD Voice." Fraunhofer is Europe's largest application-oriented research organization; its research focuses on health, security, communications, energy, and the environment.
1. We have had POTS (plain old telephone service) voice for the last 100 years. Why expand the analog voice bandwidth?
All our audio experiences today are digital, full-bandwidth, "CD-quality" audio, whether from broadcast sources (ATSC, cable, IPTV, FiOS, FTA/DVB, DAB or ISDB outside the U.S., even AM radio's "HD Radio/IBOC"), physical media (CD, DVD, BD), or Internet delivery. No one would even consider anything less than full audio bandwidth when delivering a rich media service, nor would the market accept it. This even applies to the increasingly capable camcorders built into smartphones.
The only glaring exception to this is classical telephony. It is quite surprising that we still have to endure phone calls today at a level of quality that is rooted in the technical limitations of the early last century. Moreover, quality did not get any better with the introduction of the mobile phone.
2. How does expanded bandwidth help conversations (understanding, accent reduction, productivity, error reduction, customer retention)?
If we had full audio bandwidth in telephony, not only would a phone call be much easier on the ears, it would also be much less stressful to follow a conversation, especially with higher-pitched voices, speech accents, and foreign languages in the mix. We would not need to spell out words, and for that matter we would not be limited to speech in the first place: we could play music, sing, or whistle to illustrate something. Would it not be nice to communicate not only words but also an ambience, for example crashing waves when calling from the beach, or church bells in the background when visiting a Bavarian village, all in Full HD quality?
Many studies describe a new realism, a new "closeness," when making calls with higher audio bandwidth, resulting in longer calls, or at least in callers staying on the mobile call. No more "Can I call you back on the landline?"
3. Your website references three voice bandwidths: POTS, HD Voice, and Full HD Voice. What are the differences?
"POTS" is limited to 300 Hz to 3.4 kHz, so-called "HD Voice" to 150 Hz to 7 kHz, and Full HD Voice has an audio bandwidth of at least 50 Hz to 14 kHz. For comparison, AM radio stations in the U.S. modulate up to about 10 kHz of audio bandwidth.
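To put the three bands in perspective, the short Python sketch below computes each band's width, its span in octaves (pitch is perceived logarithmically), and the minimum sampling rate the Nyquist criterion requires, which is twice the upper band edge. The band edges are the ones quoted above; the "typical" sampling rates of 8, 16, and 48 kHz are common practice for narrowband, wideband, and full-band codecs and are added here as an assumption for comparison, not taken from the interview.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    name: str
    low_hz: float
    high_hz: float
    typical_rate_hz: int  # assumed common sampling rate, for comparison only

    @property
    def width_hz(self) -> float:
        return self.high_hz - self.low_hz

    @property
    def octaves(self) -> float:
        # Number of frequency doublings between the band edges.
        return math.log2(self.high_hz / self.low_hz)

    @property
    def nyquist_min_rate_hz(self) -> float:
        # Sampling must run faster than twice the highest frequency to capture it.
        return 2 * self.high_hz


BANDS = [
    Band("POTS", 300, 3_400, 8_000),
    Band("HD Voice", 150, 7_000, 16_000),
    Band("Full HD Voice", 50, 14_000, 48_000),
]

if __name__ == "__main__":
    for b in BANDS:
        print(f"{b.name:14s} width={b.width_hz / 1000:5.2f} kHz  "
              f"octaves={b.octaves:4.2f}  "
              f"Nyquist min={b.nyquist_min_rate_hz / 1000:4.1f} kHz  "
              f"(typical {b.typical_rate_hz / 1000:g} kHz)")
```

Counting octaves rather than hertz shows why the low end matters: extending the bottom edge from 300 Hz to 50 Hz adds more than two octaves on its own.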
4. How is the Full HD Voice codec technologically different from the other, narrower-band voice codecs?
Most narrowband codecs today are voice codecs, which means they are optimized for "mainstream" voice. They are based on a model of a single human vocal tract. Everything other than the voice is distorted, often to the point that the original sound is no longer recognizable. Voice codecs inherently have problems with multiple voices, music, and ambient sounds, so noise cancellation, with all its drawbacks, is an absolute must.
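In CELP-family speech codecs such as AMR, the "single vocal tract" model is a linear-prediction (LPC) filter: each sample is predicted as a weighted sum of the previous few samples, and the weights approximate the resonances of one vocal tract. The pure-Python sketch below illustrates LPC analysis with the autocorrelation method and the Levinson-Durbin recursion. It is not code from any actual codec; the predictor order, the one-formant test signal, and the helper names are hypothetical choices for illustration. It measures prediction gain, the energy the predictor removes: a signal that matches the model is predicted well, while unstructured white noise is barely predicted at all.

```python
import math
import random


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Solve for a[1..order] so that x[n] is approximated by sum(a[k] * x[n - k]).

    Returns the predictor coefficients and the final prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def prediction_gain_db(x: list[float], order: int = 10) -> float:
    """Energy removed by an LPC predictor of the given order, in dB."""
    coeffs, _ = levinson_durbin(autocorrelation(x, order), order)
    residual = [
        x[n] - sum(c * x[n - k - 1] for k, c in enumerate(coeffs))
        for n in range(order, len(x))
    ]
    signal_energy = sum(v * v for v in x[order:])
    residual_energy = sum(e * e for e in residual)
    return 10.0 * math.log10(signal_energy / residual_energy)


def resonant_signal(n: int, fs: float = 8000.0, f0: float = 500.0,
                    radius: float = 0.95, seed: int = 1) -> list[float]:
    """White noise through a two-pole resonator: a crude one-formant 'vocal tract'."""
    rng = random.Random(seed)
    a1 = 2.0 * radius * math.cos(2.0 * math.pi * f0 / fs)
    a2 = -radius * radius
    y = [0.0, 0.0]
    for _ in range(n):
        y.append(a1 * y[-1] + a2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[2:]


def white_noise(n: int, seed: int = 2) -> list[float]:
    """Unstructured noise: nothing for a vocal-tract predictor to latch onto."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    print(f"one-formant signal: {prediction_gain_db(resonant_signal(2000)):.1f} dB")
    print(f"white noise:        {prediction_gain_db(white_noise(2000)):.1f} dB")
```

Real speech codecs go further (pitch prediction, codebook search, quantization of the residual), but the same principle applies: whatever the vocal-tract filter cannot explain has to be coded or discarded.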
Full HD Voice implies that audio codecs are used. Audio codecs are not limited to voice signals and, at least fundamentally, don't need noise cancellation. Audio codecs are the only codecs used in today's digital rich media world; the leading, most popular examples are MP3 and AAC. Full HD Voice can operate at bit rates from as low as 24 kbps up to 64 kbps.
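Some arithmetic shows what that bit-rate range means. Uncompressed 16-bit mono PCM at 48 kHz needs 768 kbps, while G.711, the classic narrowband telephony codec, already uses 64 kbps for 8 kHz, 8-bit samples. The Python sketch below compares the quoted 24 to 64 kbps range against both. The 48 kHz/16-bit reference format and the 20 ms packet interval are assumptions chosen for illustration, and packet sizes exclude RTP/UDP/IP header overhead.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bit rate of uncompressed PCM in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def payload_bytes(bitrate_kbps: float, frame_ms: float = 20.0) -> float:
    """Codec payload per packet at a given packet interval (headers excluded)."""
    # 1 kbit/s is exactly 1 bit per millisecond, so kbit/s * ms = bits.
    return bitrate_kbps * frame_ms / 8


RAW_FULLBAND = pcm_bitrate_kbps(48_000, 16)  # assumed full-band reference: 768 kbit/s
G711 = pcm_bitrate_kbps(8_000, 8)            # narrowband G.711: 64 kbit/s

if __name__ == "__main__":
    for rate in (24, 32, 48, 64):
        print(f"{rate:2d} kbit/s: {RAW_FULLBAND / rate:4.1f}:1 vs raw 48 kHz PCM, "
              f"{rate / G711:.2f}x G.711, {payload_bytes(rate):.0f} bytes per 20 ms")
```

At the top of the range, a full-band call uses the same bit rate as a narrowband G.711 call; at the bottom, it uses well under half.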
5. Is an audio codec different from a Full HD Voice codec?
The only major difference, really, is the requirement for low latency in real-time communications applications. AAC and MP3 are not tuned for low latency and are not really suitable. We have developed special low-latency communications versions of AAC, called AAC-LD (Low Delay) and AAC-ELD (Enhanced Low Delay).
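Frame size is one reason the general-purpose codecs are slow. A transform codec cannot emit a frame until it has buffered the whole frame, and with the 50% overlapping windows of a standard MDCT it needs roughly one more frame before the output can be fully reconstructed, so its algorithmic delay is about two frame lengths. The Python sketch below applies that rule of thumb to the frame lengths of AAC-LC (1024 samples) and AAC-LD (480 or 512 samples) at an assumed 48 kHz sampling rate. It is a simplification, not a model of either codec: AAC-LC adds look-ahead for block switching, and AAC-ELD uses a low-delay filterbank that cuts the overlap delay, so real figures differ.

```python
FS_HZ = 48_000  # sampling rate assumed for all examples


def frame_ms(frame_samples: int, fs_hz: int = FS_HZ) -> float:
    """Duration of one codec frame in milliseconds."""
    return 1000.0 * frame_samples / fs_hz


def approx_mdct_delay_ms(frame_samples: int, fs_hz: int = FS_HZ) -> float:
    """Rule-of-thumb algorithmic delay of an MDCT codec with 50% window overlap:
    one frame of input buffering plus one frame of overlap. Ignores look-ahead,
    bit reservoir, and network or jitter-buffer delay."""
    return 2 * frame_ms(frame_samples, fs_hz)


FRAMES = {"AAC-LC": 1024, "AAC-LD (480)": 480, "AAC-LD (512)": 512}

if __name__ == "__main__":
    for name, n in FRAMES.items():
        print(f"{name:13s} frame={frame_ms(n):5.2f} ms  "
              f"~algorithmic delay={approx_mdct_delay_ms(n):5.2f} ms")
```

Halving the frame length halves this part of the delay, which is why the low-delay variants trade some coding efficiency for much shorter frames.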
6. Are there any spoken languages that are not well supported by voice codecs today, and if so, why?
There is anecdotal evidence, but I cannot point to any published studies on the subject. It would be great if anyone out there could investigate this.
7. If I am a vendor of voice products or a service provider, how can Full HD Voice benefit me in my market?
Adding a Full HD Voice capability to a service today will immediately allow a service provider to offer a highly differentiated service and show market leadership. It will be a great match with LTE.
Generally, phone calls sound the same even when a consumer has purchased an LTE phone. A Full HD Voice capability would immediately convey the benefits of LTE and give early adopters recognition by providing them with a much better service.
No one wants to be left behind. Very soon everybody will want an LTE phone with that great CD-like audio quality. Without Full HD Voice capability, a service provider will be increasingly on the defensive because end users will be more and more exposed to Full HD Voice (through other media) and will demand premium audio quality. The result will be quicker adoption of LTE.
8. Do any vendors or service providers support Full HD voice?
Full HD Voice is already widespread in video conferencing, telepresence systems, and clients such as PC software or smartphone/tablet apps. Our first AAC-based communications codec, AAC-LD, is the only mandatory codec in the Telepresence Interoperability Protocol (TIP), the interoperability standard adopted by all the leading manufacturers, such as Cisco, Tandberg, Polycom, Logitech, and others. Moreover, AAC-ELD is used in Apple's FaceTime, which is deployed in well over 250 million devices today. It is a native codec in iOS and is native and mandatory in Android 4.1.
Full HD Voice is not some esoteric capability or the result of futuristic, wishful thinking; it is already very much a reality today. That is another reason why the "clock is ticking" for legacy voice. This is similar to the transition from NTSC to digital, "Full HD" television: once consumers experience it, they will not want to go back.
9. Who would buy the licenses for the expanded bandwidth voice?
Here is some great news that many, even in the industry, don't know. There is no smartphone today that does not already support AAC. The AAC license, however, actually includes five "flavors": AAC-LC, HE-AAC, HE-AACv2, AAC-LD, and AAC-ELD. This means that the vast majority of phones and tablets, as well as many if not most CE devices such as TVs, set-top boxes (STBs), and DVD/BD players, already have an AAC license. The royalty has been paid by the device manufacturer, whether or not AAC-LD/ELD is actually used. Visit www.vialicensing.com for more information.
10. How is the problem of interoperability among different voice codecs solved?
Well, I don't want to answer this question in a flippant way, but the easiest solution is to settle on one codec that can do it all, such as AAC-ELD.
This is not without precedent. Billions of devices use MPEG AAC today as their only, or de facto only, audio codec. The same is true for video: MPEG-2 and especially MPEG-4/H.264 are the de facto video standards. Both the MPEG audio and the MPEG video standards cover well over 90% of the multimedia market today. There is no reason why we should not see the same widespread adoption of AAC in telecommunications.
The transition to audio-quality phone calls will be gradual. We will need to support G.711 for some time to come, and AMR in mobile, because of legacy issues and the large installed base.
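In practice, that coexistence is handled by negotiation: each endpoint lists the codecs it supports, and the call uses a codec both sides share, falling back to AMR or G.711 when the far end is a legacy device. The Python sketch below illustrates the idea on the `m=audio` and `a=rtpmap` lines of an SDP offer, loosely in the spirit of the SDP offer/answer model (RFC 3264). It is deliberately simplified: the preference list and helper names are hypothetical, only encoding names are compared (clock rates, channels, and fmtp parameters are ignored), and labeling an AAC-ELD stream as "mpeg4-generic" (the RFC 3640 payload format name) is an assumption for illustration.

```python
import re
from typing import Optional

# Hypothetical local preference list, best first. "mpeg4-generic" is ASSUMED to
# carry AAC-ELD here; AMR-WB/AMR are the mobile codecs; PCMU/PCMA are G.711.
LOCAL_PREFERENCE = ["mpeg4-generic", "AMR-WB", "AMR", "PCMU", "PCMA"]

STATIC_PAYLOADS = {0: "pcmu", 8: "pcma"}  # static RTP payload types for G.711
MLINE = re.compile(r"^m=audio\s+\d+\s+\S+\s+([\d ]+)", re.MULTILINE)
RTPMAP = re.compile(r"^a=rtpmap:(\d+)\s+([^/\s]+)/(\d+)", re.MULTILINE)


def offered_codecs(sdp: str) -> dict[str, int]:
    """Map lower-cased encoding name -> payload type for the offered audio codecs."""
    codecs: dict[str, int] = {}
    m = MLINE.search(sdp)
    if m:
        # Static payload types may be offered without an a=rtpmap line.
        for pt in m.group(1).split():
            if int(pt) in STATIC_PAYLOADS:
                codecs.setdefault(STATIC_PAYLOADS[int(pt)], int(pt))
    for pt, name, _clock in RTPMAP.findall(sdp):
        codecs.setdefault(name.lower(), int(pt))
    return codecs


def choose_codec(sdp_offer: str,
                 preference: list[str] = LOCAL_PREFERENCE) -> Optional[tuple[str, int]]:
    """Return (encoding name, payload type) of the best mutually supported codec."""
    offered = offered_codecs(sdp_offer)
    for name in preference:
        pt = offered.get(name.lower())
        if pt is not None:
            return name, pt
    return None  # no common codec: the call cannot be set up as offered


OFFER = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 96 97 0
a=rtpmap:96 AMR-WB/16000
a=rtpmap:97 AMR/8000
a=rtpmap:0 PCMU/8000
"""

if __name__ == "__main__":
    print(choose_codec(OFFER))
```

With this offer from a mobile endpoint, the sketch settles on AMR-WB; an offer that includes the full-band codec would select it instead, and a legacy endpoint offering only payload type 0 would still get G.711.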