A Practical Guide to Audio Codecs

Andrew Prokop

September 3, 2014


One of the problems with most technologies is that they are jargon-filled, and if you don’t know the secret handshakes, you won’t be welcomed into the club.

The field of communications is certainly like that. I am sure we've all used terms like TDM, network region, signaling group, and gateway with our friends and significant others only to be met with blank stares. It gets even worse with IP communications, where you combine old-school telephony terms with networking phraseology.

Codecs (coder/decoders) are a case in point. Some people bandy about G.711, iLBC, RTAudio, and Opus as if everyone in the world should know what they are talking about.

I've determined that the best way to think about codecs is to understand what you are trying to accomplish rather than immediately going to their technical specifications. Are you looking for great sounding audio or are you willing to sacrifice voice quality for a reduced impact on your network? Is compatibility your prime concern? Do you need to settle for a limited codec set because of legacy devices or applications? Perhaps you are the cutting edge type and want the latest and greatest to play with.

Over the next several paragraphs, I am going to dig into each of these questions and concerns with the goal of helping you decide where the various codecs fit in.

Looking Back
From the late 1970s until the end of the '90s, digital telephony was all the rage. Compared to analog 2500 sets, digital telephones supported hundreds of features in user-friendly form factors. You had buttons, lamps, a display, a headset jack, and quite often, a hands-free speaker and microphone. All of this came with the audio quality we grew up with. A call made from a digital telephone had the sound we had come to expect.

There were, of course, significant problems that the users of those phones may not have been aware of. Digital telephones required line cards in the PBX. Depending on the manufacturer, you needed a board for every 16 to 24 users. A major company could easily fill a large room with PBX cabinets jam packed with little more than these cards.

Every telephone also required its own wire back to the PBX. While this might not have been a problem in 1977 when data networks were uncommon, it became a huge issue as companies rolled out IP to the desktop. The data network required a completely separate wiring system, and equipping every desk with two separate cables was expensive.

IP telephony solved both those problems. You no longer needed to fill cabinets with line cards for the new IP telephones. Even better, those telephones ran on the same network as your PC.

However, with the introduction of those telephones came a new question that IT directors needed to ask themselves: "How do I want to encode the audio that my users send and receive?" It wasn't like the old days of digital phones where you had no choice. IP telephones had access to a wide range of codecs that ultimately defined the audio experience.

Keeping the Status Quo
Some enterprises opt for voice quality that is exactly the same as what they had with their older digital telephones. That's not a bad thing, though, since those old telephones sounded pretty good. Why fix something that isn't broken?

For these folks, I recommend G.711, which gives you the exact same voice quality as digital telephones. It's what we insiders call "toll quality audio."

G.711 also has the advantage of supporting touch tones (DTMF) and under the right conditions, fax.

Skinny it Down
Voice quality is important, but the cost savings from IP telephony often come when you increase the number of calls that can be sent on your network. In this case, you want to consider codecs that minimize the amount of bandwidth they consume.

In this class are the compressed codecs such as G.729, G.728, G.726, iLBC, and G.723.1. Of that crew, G.729 is the most common. Personally, I like G.726 since it uses less bandwidth than G.711 but sounds nearly as good. Unfortunately, most carriers don't currently support it.

It's important to know that these codecs do not reliably carry in-band touch tones or fax signals, because compression tuned for speech distorts them. To transport them, you would need to configure an out-of-band mechanism such as the one outlined in RFC 2833 (since superseded by RFC 4733).
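To make the out-of-band idea concrete, here is a minimal sketch of the four-byte "telephone-event" payload that RFC 2833 describes for a single touch tone. The function name, the default volume, and the sample duration are my own illustrative choices, and a real endpoint would also need the RTP header, retransmission of the end packet, and a negotiated payload type, none of which is shown here.

```python
import struct

# Event codes for DTMF digits as listed in RFC 2833: 0-9, then *, #, A-D.
DTMF_EVENTS = {ch: code for code, ch in enumerate("0123456789*#ABCD")}


def telephone_event_payload(digit, duration, volume=10, end=False):
    """Build the 4-byte telephone-event payload for one DTMF digit.

    duration is counted in RTP timestamp units; volume is 0-63.
    Hypothetical helper for illustration only.
    """
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    event = DTMF_EVENTS[digit.upper()]
    # Second byte: E (end) bit, a reserved bit left at 0, then 6-bit volume.
    flags_and_volume = (int(end) << 7) | volume
    # Network byte order: event (1 byte), flags/volume (1 byte), duration (2 bytes).
    return struct.pack("!BBH", event, flags_and_volume, duration)


payload = telephone_event_payload("5", duration=800, end=True)
print(payload.hex())  # 058a0320
```

The point of the sketch is that the digit travels as a small labeled event alongside the audio instead of as a tone inside it, so it survives no matter how aggressively the voice itself is compressed.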

Why Settle for Less?
G.711 may be considered toll quality audio, but it only captures speech within a very narrow frequency range. The newer wideband audio codecs sound much more realistic, yet their bandwidth consumption is not much greater than that of G.711.

The most common codec in this class is G.722. While relatively unheard of a few years ago, it is now supported by the most important IP telephony vendors. G.722 sounds great over a handset and absolutely shines when played in hands-free mode.

On the Cutting Edge
These are the codecs for folks like me who are always willing to try the latest and greatest. The goal is to produce higher quality audio using state of the art compression techniques.

The one that jumps out for me is Opus. Opus sounds amazing and can be used for both speech and high fidelity audio. Although it is most closely associated with WebRTC, I've already seen it being used outside of browser-based telephony. For instance, AudioCodes added support for Opus into their SIP telephones.

It may take some time for Opus to gain widespread adoption, but I expect big things from this codec and would not be surprised if one day it becomes as common as G.711.

Proprietary Approaches
With the exception of Opus, all the codecs I've mentioned are fairly common with the major providers of telephony such as Avaya, Cisco, and Unify. For example, I recently configured a slew of Avaya IP telephones and provisioned them with G.711, G.729, G.726, and G.722.
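Provisioning a phone with several codecs matters because a call can only use one that both ends understand. The sketch below is a deliberately simplified picture of that idea, not how any vendor or the SIP offer/answer exchange implements it. The function name, the preference order, and the far-end codec list are all hypothetical.

```python
def pick_codec(local_preferences, remote_supported):
    """Return the first locally preferred codec the far end also supports.

    Hypothetical helper: real endpoints negotiate this through signaling.
    """
    remote = {name.upper() for name in remote_supported}
    for codec in local_preferences:
        if codec.upper() in remote:
            return codec
    return None  # nothing in common, so no audio path can be set up


# Assumed preference order for a phone and an assumed legacy far end.
phone = ["G.722", "G.711", "G.729", "G.726"]
legacy_gateway = ["G.711", "G.729"]

print(pick_codec(phone, legacy_gateway))  # G.711
```

Listing the codec you would most like to hear first and the most widely supported one as a fallback is what lets the same phone sound great on internal calls and still reach older equipment.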

However, there are those that don't follow the trend. Microsoft chose a different, non-standard route and based their Lync product on their own codec, RTAudio. RTAudio is an adaptive bandwidth codec that provides both wideband and narrowband audio. Additionally, it can adapt to network conditions and consume more or less bandwidth as needed. While it may not be a standard, RTAudio does a great job in terms of voice quality and network usage.

Microsoft does give you the choice of using G.711, but that is typically reserved for connecting Lync to the outside world. Inside the bounds of Lync, users will use RTAudio exclusively.

Wrapping Up
I hope you've noticed that nowhere in this article did I mention the bit rates, frequency ranges, and various other parameters of these commonly available codecs. Instead, I presented a high-level look at what they are and where you want to use them. There are plenty of other sources out there that dig deeply into those nitty-gritty aspects, and if you are curious about them, I suggest you seek them out. Until then, this should be enough for most people to know which codecs to provision as they build and roll out an IP (H.323 or SIP) telephony system.
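For readers who prefer a cheat sheet, the goal-first approach from the sections above can be condensed into a small lookup. This is only a sketch of the article's suggestions: the `Goal` names and the `suggest` helper are hypothetical, and the lists are starting points to adjust for your own vendors and carriers.

```python
from enum import Enum


class Goal(Enum):
    STATUS_QUO = "same sound as the old digital phones"
    LOW_BANDWIDTH = "more calls on the same network"
    WIDEBAND = "more realistic audio"
    CUTTING_EDGE = "latest and greatest"


# Starting points drawn from the sections above, most common choice first.
SUGGESTED_CODECS = {
    Goal.STATUS_QUO: ["G.711"],
    Goal.LOW_BANDWIDTH: ["G.729", "G.726"],
    Goal.WIDEBAND: ["G.722"],
    Goal.CUTTING_EDGE: ["Opus"],
}


def suggest(goal):
    """Return the suggested codecs for a goal (hypothetical helper)."""
    return SUGGESTED_CODECS[goal]


for goal in Goal:
    print(f"{goal.value}: {', '.join(suggest(goal))}")
```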

About the Author

Andrew Prokop

Andrew Prokop has been involved in the world of communications since the early 1980s. He holds six United States patents in SIP technologies and was on the teams that developed Nortel's carrier-grade SIP soft switch and SIP-based contact center.

Through customer engagements, user groups, podcasts, proof-of-concept software development, trade shows, and webinars, Andrew has been an evangelist for digital transformation technologies for enterprises and their customers. Andrew understands the needs of the enterprise and has the background and skills necessary to assist companies as they drive towards a world of dynamic and immersive communications.

Andrew is an active blogger, and his widely read blog, Tao, Zen, and Tomorrow (formerly SIP Adventures), discusses every imaginable topic in the world of unified communications. He is just as comfortable writing at the 50,000-foot level as he is discussing natural language processing or the subtle nuances of a particular SIP header.

 
