Making Every Desktop a Phone
It has taken more than a decade, but native VoIP calling is headed to the desktop as well as to the tablet and smartphone.
July 6, 2011
VoIP endpoints started gaining traction in the early 2000s. Both Windows XP and Mac OS X 10.0 launched in 2001, yet neither included native voice capabilities. Nor have any of their subsequent releases (Windows Vista and Windows 7; Mac OS X 10.1 through 10.7). To this day, a softphone requires installing local software (an application, applet, or plugin), hardware, or both.
There is no shortage of options for turning a desktop computer into a voice endpoint. Nearly every enterprise solution has its own softphone client. Many phone systems use a SIP-based client produced by CounterPath or others. Skype has its own stand-alone application; Google and Yahoo use a browser plugin. Facebook is experimenting with several plugins, and MagicJack uses a hardware device.
It has taken more than a decade, but native VoIP calling is headed to the desktop as well as to the tablet and smartphone. It is reasonably likely that Microsoft will embed Skype's core technology into a future release of either Windows or Internet Explorer. Perhaps that is why Google, just a few weeks after Microsoft announced its intent to acquire Skype, released "WebRTC" for Chrome. But if you think you know what WebRTC is, think twice. In addition to Google's release, two efforts are underway in standards groups: RTCWeb within the IETF and WebRTC within the W3C. Yes, that is correct: three different initiatives, RTCWeb, WebRTC, and WebRTC. Oh, and don't forget Jingle.
Jingle (XEP-0166 and XEP-0167) is the primary signaling protocol behind Google Talk's VoIP capabilities. Jingle is based on XMPP, not SIP. It was jointly developed by Google, Collabora, Yate, Tandberg, and Jabber (the last two are now part of Cisco). Google has lobbied the IETF RTCWeb working group to adopt Jingle as a standard, but the choice between XMPP and SIP remains under debate.
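To make "signaling" concrete, here is a minimal sketch, in Python, of the kind of stanza a Jingle client sends to propose an audio call: an XMPP `<iq>` carrying a `session-initiate` action. The namespaces come from XEP-0166 (Jingle), XEP-0167 (RTP sessions), and XEP-0176 (ICE-UDP transport); the JIDs, session ID, stanza ID, and the single PCMU codec are illustrative assumptions, and a real client would also advertise ICE candidates and wait for the peer's `session-accept`.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by XEP-0166 (Jingle), XEP-0167 (RTP sessions)
# and XEP-0176 (ICE-UDP transport).
NS_JINGLE = "urn:xmpp:jingle:1"
NS_RTP = "urn:xmpp:jingle:apps:rtp:1"
NS_ICE = "urn:xmpp:jingle:transports:ice-udp:1"


def session_initiate(initiator: str, responder: str, sid: str) -> str:
    """Build a minimal Jingle session-initiate offer for an audio call.

    The stanza id "jingle1" and the single codec are hypothetical
    choices for illustration; no ICE candidates are included.
    """
    iq = ET.Element("iq", {"from": initiator, "to": responder,
                           "id": "jingle1", "type": "set"})
    jingle = ET.SubElement(iq, "jingle", {"xmlns": NS_JINGLE,
                                          "action": "session-initiate",
                                          "initiator": initiator,
                                          "sid": sid})
    # One content element per media stream; this offer has audio only.
    content = ET.SubElement(jingle, "content",
                            {"creator": "initiator", "name": "voice"})
    desc = ET.SubElement(content, "description",
                         {"xmlns": NS_RTP, "media": "audio"})
    # Static RTP payload type 0 is G.711 mu-law (PCMU) at 8000 Hz.
    ET.SubElement(desc, "payload-type",
                  {"id": "0", "name": "PCMU", "clockrate": "8000"})
    # Empty transport element: candidates would be added here or sent later.
    ET.SubElement(content, "transport", {"xmlns": NS_ICE})
    return ET.tostring(iq, encoding="unicode")


print(session_initiate("romeo@example.net/phone",
                       "juliet@example.com/laptop", "a73sjjvkla37jfea"))
```

Because the offer is an ordinary XMPP stanza, it travels over the same connection, and to the same address (JID), that a user already uses for chat.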
Rather than wait while the standards groups take potentially years to resolve the matter, Google released its voice and video components as open source, under the unfortunate name of WebRTC for Chrome. It gives developers browser-based, royalty-free signal processing for voice and video chat through HTML and JavaScript APIs; HTML5 alone provides no control over the microphone or camera. WebRTC is available to developers now and will be built into future releases of Chrome and Chrome OS, as well as browsers from Mozilla and Opera (no word yet on IE or Safari).
By open-sourcing WebRTC, Google gave the open source community a significant portion of what it obtained in its $68 million acquisition of GIPS in May 2010. In effect, open source got a $68 million gift from Google to take on an $8.5 billion juggernaut (Microsoft's acquisition of Skype).
Did Google just take the first step toward killing Skype? Far from it. The code Google released provides the tools to make and receive voice and video calls, but that’s about it. It does not support IM, presence, buddy lists, or signaling. WebRTC is a barebones framework that fits nicely together with Jingle. What Google really did was give the IETF and W3C a swift kick in the derriere.
Without signaling support for XMPP and/or SIP, third-party software will still be required alongside WebRTC to complete a call. Google favors XMPP, but a full-blown SIP user agent could offer integration with enterprise voice systems. The problem is SIP's limitation around peer-to-peer communication: although SIP endpoints can communicate peer-to-peer, in practice they rely on a server (a registrar and proxy) for call setup. None of the major browser or OS makers ships a solution that can leverage a SIP user agent on the desktop.
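The registrar dependency shows up in SIP's very first step. Before a SIP user agent can be reached by name, it typically sends a REGISTER request so the registrar can map the user's address-of-record to the IP address and port where the phone actually is; proxies then consult that binding to route an incoming INVITE. The Python sketch below builds such a request along the lines of RFC 3261; the user, domain, and host address are hypothetical, and the digest authentication that most registrars require is omitted.

```python
import uuid


def build_register(user: str, domain: str, contact_host: str,
                   contact_port: int = 5060, cseq: int = 1) -> str:
    """Build a minimal SIP REGISTER request (after RFC 3261, section 10).

    Asks the registrar for `domain` to bind the address-of-record
    sip:user@domain to the address where this user agent listens.
    Authentication is deliberately left out of this sketch.
    """
    aor = f"sip:{user}@{domain}"
    # RFC 3261 branch parameters must begin with the magic cookie "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    tag = uuid.uuid4().hex[:8]
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {contact_host}:{contact_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{aor}>",
        f"From: <{aor}>;tag={tag}",
        f"Call-ID: {uuid.uuid4().hex}@{contact_host}",
        f"CSeq: {cseq} REGISTER",
        # Contact is the binding the registrar stores for this AOR.
        f"Contact: <sip:{user}@{contact_host}:{contact_port}>",
        "Expires: 3600",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_register("alice", "example.com", "192.0.2.10"))
```

A browser-based SIP client would have to perform this registration, keep it refreshed before `Expires` runs out, and keep its contact address reachable through NATs and firewalls, which is part of why SIP sits less comfortably in the browser than an always-connected XMPP session.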
XMPP is mostly associated with Google Talk, but others use it too. Voxeo uses Jingle in its Phono SDK, which gives developers voice and IM tools that leverage the web browser. Phono relies on several technologies, such as Flash, Java, and Objective-C (Apple iOS), to complete the call. All of these pieces are necessary because there is no consistent API across desktops and tablets. Voxeo moved away from SIP specifically because XMPP-Jingle offered a simpler, faster (lower-latency), firewall-friendly solution. Whereas SIP is session-based, XMPP is IM-focused, so the same user ID serves for both chat and calls. Voxeo reports that it is carefully reviewing Google's WebRTC and intends to integrate it now that it is open.
Applications and infrastructure services will flourish with a more capable browser (at Skype's expense). Of course, what Microsoft does with Skype, IE, and Windows, and what Apple does with Safari and its mobile devices, remain to be seen. HTML5 is not sufficient by itself because it cannot control a microphone or capture video, and a standard for these controls is not expected soon.
Most likely, none of these efforts will eliminate softphone applications; rather, they will provide hooks that simplify them. The market need is clearly growing, compounded by the decline of Adobe's Flash, on which browser-based calling tools such as Phono partly rely. Flash is losing popularity because of both HTML5's capabilities and Apple's refusal to support it on portable devices. But considering the popularity of VoIP on desktop and portable devices, it is surprising that these devices have no built-in calling capabilities.
Google's WebRTC is an interesting turn of events. Initiating a call natively has been difficult, and standardization has stalled in committee. WebRTC could change the desktop calling experience, but Google envisions calls that are web- or cloud-based rather than routed through traditional telephony systems. However, the nature of open source means WebRTC could eventually take a path that diverges from Google's vision.