Why Use WebRTC?
Here's my story on choosing a VoIP protocol stack.
April 5, 2016
At a healthcare communications company I've been working with, we recently had a debate over the best way to build a mobile voice/video communications app. The app is aimed at a closed community of users, with support from a contact center and a gateway into the public telephone network.
Of central importance to the app is the quality of the user experience. Our CEO posed the question, "The audio quality matters more than anything else; can we use VoLTE, the telephone system itself, to make this work well? After all, the phone companies have clearly worked out audio quality."
Now, we've already decided that we're not going to build it all ourselves. Rather, we're going to use a platform-as-a-service (PaaS) provider -- such as Twilio, Sinch, or Kandy -- and wrap our service around it. We have also closely followed the HTML5 vs. native app debate, and are leaning towards building native apps, so that we can exert maximum control over the UI.
But what about ensuring a good voice and video experience? We considered three approaches: VoLTE, a VoIP software stack, and WebRTC.
VoLTE and RCS: Nothing to Offer the App Developer
VoLTE, or Voice over LTE, is the next step in the evolution of the mobile telephone network. It features high-definition voice and, for the first time in a carrier offering, video calling interoperable with other carriers. When complemented by multimedia messaging and content sharing, it forms the basis of the carrier Holy Grail, Rich Communication Services (RCS).
With standards defined by the GSMA, RCS has promised ubiquitous, interoperable communications to seize back mindshare -- and revenue -- from over-the-top (OTT) applications like Skype and WhatsApp. RCS has been branded as "joyn" by the GSMA and most participating carriers, and is now beginning to see commercial service, starting in the U.S. with T-Mobile.
Carriers participating in joyn are expected to offer APIs into the service, allowing external applications to interact directly with users. Imagine, for example, Facebook using joyn instead of having to create the Facebook Messenger application. On the face of it, having a ready-made voice and video client would seem like a quick way to bootstrap our service. All we would need is a SIP trunk to pipe the voice and video calls into our contact center.
Sadly, this approach would not work for us. Only one active joyn client can run on a device at a time, so if we wanted to build a customized version, we would have to disable the standard one. And since joyn endpoints are identified by their phone numbers, they are globally addressable, which is clearly incompatible with having a secure, spam-free closed user community. As things stand, it's simply not possible for us to control user identity without running our app over the top.
You would have thought that the phone companies, with the billions they have plowed into building infrastructure engineered for quality of service, would be ideally positioned to offer the building blocks of a suitable service. But with that very scale, aimed at a mass market, comes a lack of flexibility. Concerns of extreme reliability, device compatibility, and the sheer scale of rolling out new services mean long planning horizons and long testing cycles. It's not like some product manager at Verizon or AT&T can come up with some neat idea, have an engineering team code it up in the next biweekly sprint, do a bit of A/B testing, and roll it out.
It's Down to WebRTC Vs. a SIP-based VoIP Stack
What we're left with is either adopting WebRTC for our app, or using a commercial or open-source VoIP software stack.
"But wait a minute," you say, "doesn't WebRTC mean you're running your app in a browser? And didn't you say you were most likely not going down the HTML5 road?" Well, here's the dirty little not-so-secret about WebRTC -- at its heart is an open-source VoIP stack, maintained as part of Google's Chromium project. Not only is this code used in Chrome and other Chromium-based browsers, but Google also supplies it as a software development kit (SDK), ready for developers to incorporate into their own iOS and Android applications. In fact, most WebRTC usage happens within native mobile apps like Facebook Messenger and Amazon Mayday.
So, our decision came down to the choice of which VoIP stack to use, and this brings us to the issue of signaling.
Call signaling: It's so unsexy, but so essential. And yet the WebRTC standards left it out entirely! Google in its wisdom decided that folks building apps and services using WebRTC would be free to choose whatever they thought most suitable. They could use SIP or the Jingle XMPP extension, or use a modern, browser-friendly approach, such as carrying the signaling as JSON objects over a WebSocket transport. On the other hand, VoIP stacks -- for the last decade, at least -- have typically used SIP for signaling.
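To make the "JSON objects over a WebSocket" option concrete, here is a minimal sketch of what such signaling messages might look like. Everything in it is an assumption for illustration: the message kinds, the field names, and the SignalMessage type are hypothetical and not taken from any standard or vendor SDK. The WebSocket transport itself is left out; the strings produced by encode() are what would travel over it.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical message kinds for a bare-bones offer/answer exchange.
KINDS = {"offer", "answer", "candidate", "bye"}


@dataclass
class SignalMessage:
    kind: str      # one of KINDS
    sender: str    # app-level user id (hypothetical naming)
    to: str        # app-level id of the peer being called
    payload: dict  # e.g. {"sdp": "..."} or an ICE candidate description


def encode(msg: SignalMessage) -> str:
    """Serialize a message to the JSON text sent over the transport."""
    if msg.kind not in KINDS:
        raise ValueError(f"unknown message kind: {msg.kind}")
    return json.dumps(asdict(msg))


def decode(text: str) -> SignalMessage:
    """Parse JSON text from the transport and validate its shape."""
    data = json.loads(text)
    try:
        msg = SignalMessage(**data)
    except TypeError as exc:  # missing or unexpected fields, or not an object
        raise ValueError(f"malformed signaling message: {exc}") from exc
    if msg.kind not in KINDS:
        raise ValueError(f"unknown message kind: {msg.kind}")
    return msg


offer = SignalMessage("offer", "alice", "agent-7", {"sdp": "v=0 ..."})
wire_text = encode(offer)
```

The point of the sketch is how little is involved: because WebRTC leaves signaling open, an app or a PaaS vendor can define an envelope like this instead of carrying a full SIP stack.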
The Winner: WebRTC
Three factors really guided our choice. One is that PaaS vendors are embracing WebRTC and provide SDKs customized to their services, more often than not based on the Chromium code. The second is that SIP would complicate the business of identity management and authenticating our users; we want to maintain control over this, and don't want to build out a SIP-based infrastructure to do so. A simple Web service on our part and a token-based access mechanism offered by the PaaS vendor would do just nicely.
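As a rough illustration of the "simple Web service plus token" idea, the sketch below shows a service that, after authenticating a user by its own means, hands out a short-lived signed token that a media platform could verify. This is an assumption-laden toy, not any vendor's actual scheme: the token layout, the SECRET value, and the issue_token and verify_token names are all hypothetical, and a real PaaS vendor defines its own token format and SDK calls.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical secret shared between our Web service and the platform.
SECRET = b"demo-secret-do-not-use"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def _sign(body: str) -> str:
    return _b64(hmac.new(SECRET, body.encode("ascii"), hashlib.sha256).digest())


def issue_token(user_id: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Issue a signed token for a user our own service has authenticated."""
    issued = time.time() if now is None else now
    claims = {"sub": user_id, "exp": int(issued) + ttl_seconds}
    body = _b64(json.dumps(claims, sort_keys=True).encode("utf-8"))
    return f"{body}.{_sign(body)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is authentic and unexpired, else None."""
    current = time.time() if now is None else now
    try:
        body, sig = token.split(".")
    except ValueError:  # not exactly two dot-separated parts
        return None
    expected = _sign(body)
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < current:
        return None
    return claims["sub"]


token = issue_token("alice", ttl_seconds=60, now=1000.0)
```

The design point is that user identity stays with our service: the platform only needs to trust the signature, and never has to hold our user database or a SIP registrar's credentials.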
The third factor is that we could have our contact center agents use WebRTC from their browsers, using the same high-quality voice and video codecs. They would not need to download a separate soft client or resort to using the PSTN. When making calls out to the telephone network, we could always fall back to G.711, the baseline for traditional PSTN quality. (Sadly, though, WebRTC and VoLTE don't use the same HD voice codecs, so G.711 is the lowest common denominator.)
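The fallback logic can be pictured as picking the first codec, in order of preference, that both ends support. The sketch below is only an illustration of that idea: the preference order and the codec lists for each endpoint are assumptions made up for the example, and in a real call this selection happens inside the SDP offer/answer exchange rather than in application code. PCMU and PCMA are the two variants of G.711.

```python
from typing import Iterable, Optional, Sequence

# Assumed preference order, best quality first; G.711 (PCMU) is the floor.
PREFERENCE: Sequence[str] = ("opus", "G722", "PCMU")


def pick_codec(local: Iterable[str], remote: Iterable[str],
               preference: Sequence[str] = PREFERENCE) -> Optional[str]:
    """Return the most preferred codec both sides support, or None."""
    local_set = {name.lower() for name in local}
    remote_set = {name.lower() for name in remote}
    for codec in preference:
        if codec.lower() in local_set and codec.lower() in remote_set:
            return codec
    return None


# Illustrative capability lists, not taken from any real device or gateway.
mobile_app = ["opus", "G722", "PCMU", "PCMA"]
browser_agent = ["opus", "G722", "PCMU", "PCMA"]
pstn_gateway = ["PCMU", "PCMA"]

app_to_agent = pick_codec(mobile_app, browser_agent)
app_to_pstn = pick_codec(mobile_app, pstn_gateway)
```

Under these assumed lists, an app-to-agent call settles on the wideband codec, while a call out through the gateway drops to G.711.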
Final Words of Caution
A counter-argument is that going with a PaaS vendor's SDK constitutes vendor lock-in, while SIP is an open and widely implemented standard. This is actually the most serious knock against WebRTC -- you're stuck with whatever your PaaS or equipment vendor supports. You will have to act with due diligence, and pay close attention to the details of the signaling protocol on offer.
You should also architect your own service so that it is coupled with the PaaS vendor's service as loosely as possible. This way, if you move between vendors you won't need to rewrite your apps. And if a vendor says, "We support standard SIP-over-WebSockets," and leaves you to figure out the protocol stack yourself without providing a nicely packaged SDK, then keep looking!
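One common way to get that loose coupling is to have the app talk to a small interface of your own and hide each vendor's SDK behind an adapter. The sketch below assumes this approach; the CallProvider interface, the two fake vendor adapters, and the CallService class are all hypothetical stand-ins, and a real adapter would wrap the vendor's actual SDK calls.

```python
from abc import ABC, abstractmethod


class CallProvider(ABC):
    """The only surface our app code is allowed to depend on."""

    @abstractmethod
    def start_call(self, caller: str, callee: str) -> str:
        """Start a call and return a provider-specific call id."""

    @abstractmethod
    def end_call(self, call_id: str) -> None:
        """Tear down a call previously started with start_call."""


class FakeVendorA(CallProvider):
    """Stand-in for an adapter around one vendor's SDK."""

    def __init__(self) -> None:
        self.active = {}
        self._next = 0

    def start_call(self, caller: str, callee: str) -> str:
        self._next += 1
        call_id = f"a-{self._next}"
        self.active[call_id] = (caller, callee)
        return call_id

    def end_call(self, call_id: str) -> None:
        self.active.pop(call_id, None)


class FakeVendorB(FakeVendorA):
    """A second stand-in vendor with a different call id format."""

    def start_call(self, caller: str, callee: str) -> str:
        self._next += 1
        call_id = f"b/{caller}/{self._next}"
        self.active[call_id] = (caller, callee)
        return call_id


class CallService:
    """App-level logic, written once against CallProvider."""

    def __init__(self, provider: CallProvider) -> None:
        self.provider = provider

    def call_contact_center(self, user_id: str) -> str:
        return self.provider.start_call(user_id, "contact-center")


vendor_a, vendor_b = FakeVendorA(), FakeVendorB()
id_a = CallService(vendor_a).call_contact_center("alice")
id_b = CallService(vendor_b).call_contact_center("alice")
```

Switching vendors then means writing one new adapter rather than touching every screen of the app, although differences in signaling behavior between vendors can still leak through and need testing.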
Editor's Note: This is a slightly revised version of the original, changed to clarify one of the author's points.