Interoperability in IP Telephony and Unified Communications

The obstacles to true multi-vendor interoperability are numerous and large. It's not at all certain they can be overcome.

May 27, 2008


As enterprise real-time communications moves from a hardware to a software architecture model, the issue of interoperability has come to the fore. Enterprise telecom departments traditionally have had to support multiple vendors’ PBXs within their networks, and have been frustrated by these systems’ limited ability to support common features and functionality across different vendor platforms. One of the key benefits of moving to a software model is supposed to be that this new model could more easily and effectively support a multi-vendor environment.

Mark Straton, senior VP at Siemens Communications, explains that customers have a “huge amount of leverage when [interoperable software] systems are deployed on common off-the-shelf (COTS) hardware,” and he adds that there are “huge economies in getting [vendors] to specialize on components.”

The most important driver of the issue now is the entry of Microsoft and IBM into the communications space, and the role that these two companies’ systems are expected to play in this new software-based communications architecture. At VoiceCon Orlando in March, representatives of the two vendors shook hands and agreed to work on aspects of the interoperability challenge (more about that later in this article). However, the attitude held by several of Microsoft’s (and IBM’s) competitors on that VoiceCon stage can be summed up in a No Jitter comment posted by a commenter with the handle “Bithead”:

They [Microsoft] have come to the market touting a unique software-based architecture, and in doing so they should be able to claim an openness that far surpasses that of traditional vendors. What I see is the opposite…. Ultimately we should be looking at architecture and how they drive the ultimate value for customers.

So what’s really involved in getting to a real-time architecture in which multi-vendor interoperability is possible and can support meaningful enterprise features and functions? And then what needs to happen to actually bring about such interoperability?

A big picture view would place communications systems within the larger context of the entire enterprise IT infrastructure. It envisions a world where there’s interoperability not only among different pieces of communications gear, but between communications gear and other business process applications. I’ve tried to give a general sense of these different levels of interoperability in a recent No Jitter blog post.

However, this Feature article will be limited to three major interoperability challenges within the communications level of the architecture, and won’t touch on the “levels of interoperability” challenge in the blog post cited above. We’ll look at the multi-layered challenge in future coverage, but there’s way more just at the communications level than you can deal with in a single article.

The three areas of focus for this article are:

1.) The effort to use SIP to standardize legacy PBX functions in new IP-based PBXs.
2.) The issue of codecs, and their implications for interoperability.
3.) Presence federation.

These are not identical kinds of challenges: Number 1 is a pretty straightforward problem of writing standards that will cover implementation scenarios such that a vendor’s assurance of standards compliance will, in the real world, mean that the vendor’s product actually does “plug and play” with another vendor who claims their product also meets the standard. Actually, this is a straightforward statement of the problem, but the problem itself is anything but straightforward, as you’ll see.

Number 2 has to do with how interoperability is playing out in the real world, in one specific area where clear standards exist and are, in fact, widely implemented in an interoperable fashion--and where there is, nevertheless, debate over whether one vendor (Microsoft) is violating some tenets of interoperability.

Number 3 represents an area where there is an ongoing effort to attain interoperability, but where there are also serious underlying technical/conceptual hurdles to overcome before a framework for multi-vendor interoperability can even be realized.

[You can discuss/comment on this article in the Comments section here.]


SIP

Almost every enterprise owns voice communications platforms from more than one major vendor, and even if you standardize on a single IP-PBX vendor today, your company could acquire another firm that uses a different vendor, and you’ll be called upon to try and integrate the two systems. Or your vendor could merge with another vendor and product lines could be rationalized in ways that threaten to strand your investment.

The acceptance of the Session Initiation Protocol (SIP) as the universally-acknowledged standard for call/communications setup has led to expectations, or at least hopes, that multiple vendors’ PBX systems will be able to work together while preserving extensive functionality. This goal proved elusive in the TDM world, where the QSIG standard aimed to provide this ability but was rarely implemented to its full extent.

Today, standards-compliant SIP phones can connect, in an interoperable way, to just about all major IP-PBX platforms—with limited functionality. That’s because the Internet Engineering Task Force (IETF), which created and governs the SIP standards, consciously decided to build SIP around the idea of “primitives,” i.e., basic functions that can be combined to create more advanced features like call park and pickup. Cullen Jennings, distinguished engineer at Cisco and IETF Real Time Applications Area Director, explained: “It’s not that SIP vendors don’t have call park and pickup, it’s that there’s a ton of different ways that you might be able to do call park and pickup in SIP, and no specific one was really specified or worked out [within the standard]. Different vendors went off and did different things in different directions.”

To get a sense of the scale of the problem, consider one of the newer IETF-sanctioned efforts around SIP interoperability: A working group called BLISS (Basic Level of Interoperability for SIP Services). The name says it all, and the working group’s charter spells out the problem:

SIP's approach to supporting more advanced features and applications has been to specify a number of primitive operations, including refer, dialog replacement and joining, and event packages, and then to allow those primitives to be combined in many ways to realize different features. This approach avoids the need for standardized definitions of a feature, which can severely limit innovation and broad applicability.

While this approach brings great flexibility and generality, it complicates interoperability. Without any kind of standardized definition of a particular feature, each implementation creates its own definition and corresponding set of call flows and primitives used to realize this feature. In practice, this has resulted in a poor track record for interoperability for more advanced features which make assumptions on supported SIP extensions and behaviors from other elements.

The problem is exacerbated by the desire for these features to work across many types of SIP endpoints, including SIP hardphones, softphones, and gateways to the PSTN and other VoIP networks including non-centralized environments, and for the desire to work across domain boundaries and to interwork with the PSTN, when applicable.

The BLISS working group narrowed its initial focus to four features, whose implementation the group plans to have submitted as IETF Best Current Practices (BCPs) by the end of this year:

  • Line Sharing

  • Parking

  • Automated handling

  • Call queuing

    From the BLISS website, here’s an example of the “Problem Statement” for interoperability in the feature of Call Park:

    • Service/Feature may be implemented/provided

      • Partially on UA [user agent, i.e., the end station, typically the phone]

      • Partially on server

      • Fully on UA

      • Fully on server

    • Service/Feature may be implemented/provided

      • Using non-signalling channel (DTMF etc.)

      • Using signalling channel (SIP methods)

      • Using specific call-flows

    • Service/Feature may have various interpretations.

    • When service/feature is provided by a server, UA might assume one approach when the server is providing a service in another.

    By way of illustration:

    Call park requires some kind of 'indication' to be emitted by the phone that signals a park request. There are many ways to do this:

    1. …the phone emits a DTMF feature code (rfc2833) on the call; this is captured by the PBX, interpreted as an invocation request, mapped to the park feature. The call is then parked at the PBX, and can be picked up elsewhere.

    2. Similar to 1, except invocation is done via some kind of signaling channel mechanism (INFO, proprietary methods, etc.)

    3. Phone is a bit more intelligent, and knows the user wants to invoke Park. So, it REFERs to the park server, having it replace its call with the participant

    4. conference mechanism; park is implemented by viewing the park server as a conference bridge. So, the invoking UA [user agent] creates an ad-hoc conference bridge, transfers the correspondent into it.

    and there are more. Interop failures are anticipated when a UA assumes one mechanism, and the PBX and other phones assume different ones, we have no hope of interop.
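The mismatch described in the four options above can be sketched in a few lines of Python. This is only an illustration: the enum, the function names, and the three vendors are hypothetical, and real products negotiate (or fail to negotiate) features in far messier ways. The point it tries to show is that two products can each truthfully support call park over SIP and still share no mechanism.

```python
from enum import Enum


class ParkMechanism(Enum):
    # Labels for the four approaches listed above (names are illustrative).
    DTMF_FEATURE_CODE = "dtmf"        # option 1: feature code sent on the call
    SIGNALING_CHANNEL = "signaling"   # option 2: INFO or a proprietary method
    REFER_TO_PARK_SERVER = "refer"    # option 3: phone REFERs to a park server
    CONFERENCE_BRIDGE = "conference"  # option 4: park server acts as a bridge


def shared_park_mechanisms(phone_supports, pbx_supports):
    """Return the park mechanisms that both sides implement (may be empty)."""
    return set(phone_supports) & set(pbx_supports)


def can_park(phone_supports, pbx_supports):
    """Park can only work if at least one mechanism is common to both sides."""
    return bool(shared_park_mechanisms(phone_supports, pbx_supports))


# Hypothetical products: each one "supports call park" in some form.
vendor_a_phone = {ParkMechanism.REFER_TO_PARK_SERVER}
vendor_b_pbx = {ParkMechanism.DTMF_FEATURE_CODE, ParkMechanism.CONFERENCE_BRIDGE}
vendor_c_pbx = {ParkMechanism.REFER_TO_PARK_SERVER, ParkMechanism.DTMF_FEATURE_CODE}

print(can_park(vendor_a_phone, vendor_b_pbx))  # False: no mechanism in common
print(can_park(vendor_a_phone, vendor_c_pbx))  # True: both implement REFER-based park
```

A best-practice document of the kind BLISS is aiming for would, in effect, guarantee that the intersection is never empty for compliant products.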

    That’s the theoretical problem, and the Call Park example illustrates just one feature on one leg of the network. SIP is also the core of the standards that will connect call control servers to other servers for applications such as messaging and contact centers; and the associated SIMPLE (SIP for Instant Messaging and Presence-Leveraging Extensions) will be the key in any attempts to standardize presence federation (which we’ll discuss later).

    Cullen Jennings explains why, ultimately, standards don’t necessarily equal interoperability: “Fundamentally we’ve got really a large number of RFCs around SIP, of doing different features and extensions to it, and vendors are going to pick and choose some set of those that they implement. Certainly nobody’s going to implement all of them. And nobody should implement all of them. [They’ll be] trying to pick the ones that are relevant and meet their customer needs.

    “So it’s really hard from an IETF point of view to say, The following things work or don’t work. What we can say is, Well, if you want to do X, we have a standard to do it. And what your vendor may or may not implement, your mileage may vary.”

    So, bottom line: A vendor’s claim to be “SIP-compliant” for a given feature probably will never represent a guarantee that this particular feature will be able to be invoked from a business desk phone built by that vendor, talking to a platform built by another vendor, even if that other vendor also claims “SIP compliance” for the same feature. Note that this doesn’t mean they won’t interoperate; it means they won’t necessarily interoperate.

    On the other hand, if the efforts of the BLISS working group were to gain momentum and general acceptance, and two vendors both claimed that their implementation of a given feature complied with the BLISS Best Current Practice for that feature, then in theory you could trust that they’d interoperate. Comparable “best practices” efforts from groups like the SIP Forum could provide similar assurance. The question is whether we’ll ever get to that point. We’re definitely not there yet.



    CODECS

    There is one area of IP telephony implementation where standards compliance is pretty much a check-box item today. Generally speaking, if two vendors’ phones both implement the same standardized codec—the most commonly chosen are G.711 and G.729—the two phones should be able to communicate without any intervening transcoding function needing to be employed. And, indeed, most vendors do use standard codecs in their phones. With one exception.

    That exception is Microsoft, which uses only its own proprietary codec, called the Real-Time Audio codec, in all of its endpoints, whether softphones or hard phones. The reason Microsoft gives for making this choice is that the whole point of moving to IP-based communications is that endpoints should be able to be located anywhere, attached to any network, and still be able to register on their enterprise system and thus appear as if they’re within the corporate network, even if the user is at home or in a hotel on the other side of the world. The implication of this state of affairs is that the end user, and the enterprise IT manager, will not always be able to control the quality of the network across which the real-time traffic is traveling. If it’s a high-quality network, as the internal enterprise is presumed to have, you’re OK; but if it’s the Internet or a low-bandwidth/quality wireless network somewhere, there are lots of impairments that will degrade voice quality.

    One way to improve voice quality in the face of network impairments is to use a wideband codec—essentially, to dedicate more bandwidth to each voice stream. The RT Audio codec requires 80 kbps per voice channel for optimal performance, versus 64 kbps for G.711 and 8 kbps for G.729. But the Microsoft codec is proprietary, and the only way for other vendors’ phones to talk to Microsoft endpoints is via Microsoft’s Mediation Server, which transcodes between standard codecs and Microsoft’s RT Audio.
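To make the codec argument concrete, here is a rough Python sketch of the decision being described: use a codec both endpoints share if there is one, otherwise fall back to a transcoding gateway. The function names are hypothetical, and the gateway's codec set is an assumption made for illustration, not a statement of what Microsoft's Mediation Server actually supports.

```python
def pick_common_codec(offered, supported):
    """Return the first offered codec the other side also supports, or None."""
    for codec in offered:
        if codec in supported:
            return codec
    return None


def plan_media_path(caller_codecs, callee_codecs, gateway_codecs=None):
    """Describe how media could flow: directly, via a transcoder, or not at all.

    Returns (path, codec_on_caller_leg, codec_on_callee_leg).
    """
    common = pick_common_codec(caller_codecs, callee_codecs)
    if common is not None:
        return ("direct", common, common)
    if gateway_codecs is not None:
        caller_leg = pick_common_codec(caller_codecs, gateway_codecs)
        callee_leg = pick_common_codec(callee_codecs, gateway_codecs)
        if caller_leg is not None and callee_leg is not None:
            return ("transcoded", caller_leg, callee_leg)
    return ("no media path", None, None)


standard_phone = ["G.711", "G.729"]
other_standard_phone = ["G.729", "G.711"]
rt_audio_endpoint = ["RTAudio"]
gateway = {"RTAudio", "G.711"}  # assumed codec set, for illustration only

print(plan_media_path(standard_phone, other_standard_phone))        # ('direct', 'G.711', 'G.711')
print(plan_media_path(standard_phone, rt_audio_endpoint))           # ('no media path', None, None)
print(plan_media_path(standard_phone, rt_audio_endpoint, gateway))  # ('transcoded', 'G.711', 'RTAudio')
```

The debate that follows is essentially about whether the third case counts as interoperability.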

    Mark Straton, senior VP at Siemens Communications, has been one of the most persistent critics of Microsoft’s approach to interoperability, and he calls their implementation of the proprietary codec a form of lock-in. “People don’t want a closed model; they really do want an open model,” Straton said. Of big Microsoft customers like Siemens’ own parent company, the German industrial conglomerate, Straton said, “I don’t think they’re going to be forced into a codec that Microsoft demands.”

    Microsoft senior director Eric Swift’s response is that this is a non-issue. “Can I make a phone call from OCS to a user who’s not on OCS?” he asks. “The answer is yes, of course. And the Mediation Server does that. Well, why do you use a Mediation Server? The same reason anybody else uses [any sort of] mediation server. It’s to get from one codec and one set of signaling to another. Whether that’s a PRI gateway or whether it’s mediating from different codecs.”

    Swift said Microsoft’s implementation of the wideband codec reflects a conscious decision to offer a value proposition that’s different from other players in the market. “One of our primary value propositions is that we don’t expect our endpoints to be stationary. We don’t assign a phone to an office and expect it to sit there. I can pick up this phone and drive it to my house, plug it into my DSL connection at my house, I can make the same phone call that we’re having right now without any involvement from IT.” Of course, all vendors’ IP telephony gives you that portability in terms of call routing--but Swift is arguing that you can’t automatically bring the quality of experience with you wherever you go, because that remote connection might be happening over a poor-quality network that requires you to “fix” the impairments at the endpoint itself, via the wideband (proprietary) Microsoft codec.

    Swift plays down the burden that the requirement for Microsoft’s Mediation Server places on adopters. “We don’t charge an extra cost for it; it’s free if you’ve got the [Office Communications Server, OCS] server up and running. We did that because we didn’t want to limit the way people deploy it.

    “I think sometimes our competitors put a little false information out there about what a big deal this is,” Swift said. “It’s really a pretty straightforward little piece of software.”

    This particular interoperability debate has little to do with technology and everything to do with the company at its center. Microsoft is the booster that rocketed first-generation IP telephony into its second-generation orbit as Unified Communications. It’s also, well, Microsoft: the company that competitors and some customers love to fear and loathe, and that, more often than not, they eventually must embrace.

    But the debate illustrates some things about what various people mean when they talk about “interoperability.” Microsoft points out that the SIP implementation in OCS follows all the RFCs, which in some sense makes them more interoperable than vendors whose original IP-PBXs relied on proprietary call control protocols. But as we’ve seen, even a fully SIP-compliant implementation only gets you so far on the road to interoperability. And then, does the fact that Microsoft used a proprietary codec mean they’re not, in fact, open and interoperable? Or does the fact that they provide a gateway at no cost mean that you really can interoperate?



    FEDERATION

    The biggest interoperability challenge of all is probably not an immediate concern for most enterprises, but it’s critical to the future architecture of Unified Communications. Today, companies worry about different vendors’ PBXs talking to phones and to each other, as well as to applications such as contact centers. But the major bone of contention going forward into UC is presence and the need to “federate” presence engines.

    Why is presence so important? As explained by Akiba Saeedi, Program Director, Unified Communications and Collaboration Software for IBM’s Software Group, “Presence is the heart and soul of the [UC] system.”

    But while the presence engine may sit at the heart of the new architecture, it will be receiving information about users from multiple systems—calendar applications, workflow apps, the cellular network, and of course the device formerly known as the PBX. And several of those sources may be willing and able to offer presence information on a given user at the same time. And while the presence function may be the heart of the system, the enterprise may actually have multiple boxes maintaining presence state information, and these boxes need to be in synch. Oh, and the enterprise will want to share that presence information with other enterprises or service providers.

    Actually, that last point is the (relatively) easy part of presence federation. Inter-domain presence uses SIP/SIMPLE to reconcile a single user’s multiple identities in separate domains, say an IM service like Yahoo and your corporate messaging system. What’s more difficult is when everything is happening within the same domain where you have multiple different servers maintaining presence state, and each one thinks it holds the definitive information about the current availability state of, say, [email protected]. Explains Microsoft’s Eric Swift:

    It’s difficult to have 2 and synchronize, because you end up with 2 versions of the truth. They get out of synch, updates collide and that’s a hard problem that’s been looked at across many different technical areas, and you run into the same problems; one of them is, which version of the truth are you going to believe? And what happens when they get out of synch? And with something like realtime presence, boy, getting out of synch can be critical.

    And then you have the developer standpoint. Say I’m a corporate developer and I want to build a new application, and I want to get presence, who do I get presence from? Is there a single repository I can go to, or do I have to go to 3 different places because I need to get the PC presence from this system, and I need to get the phone presence from that system, and I need to get the login state of that application from another system? And that becomes a burden for the corporate developer. So you have that issue: If I’m a corporate information architect, if I’m going to create a presence architecture, do I want to have 2 versions of the truth? Do I want to have 1 and tell everybody to hang off that? That’s a hard conceptual thing to work through.
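Swift's "two versions of the truth" problem can be sketched as follows. The class, the source names, and the resolution rule are all assumptions made for illustration. In particular, "last write wins" is just one simple policy a designer might try; nobody quoted here proposes it, and it quietly depends on every source having a trustworthy, synchronized clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresenceReport:
    source: str        # which system reported the state (hypothetical names below)
    state: str         # e.g. "available", "on-the-phone"
    updated_at: float  # seconds on a clock all sources are assumed to share


def conflicting_states(reports):
    """Return the distinct states reported; more than one means the sources disagree."""
    return {r.state for r in reports}


def resolve_last_write_wins(reports):
    """Pick the most recently updated report; ties go to the earliest in the list."""
    if not reports:
        return None
    return max(reports, key=lambda r: r.updated_at)


reports = [
    PresenceReport("pc-presence", "available", 100.0),
    PresenceReport("phone-system", "on-the-phone", 130.0),
    PresenceReport("calendar", "in-a-meeting", 90.0),
]

print(len(conflicting_states(reports)))        # 3 different answers for one user
print(resolve_last_write_wins(reports).state)  # on-the-phone
```

Even this toy version shows the developer's burden: someone has to collect the three reports and decide which one to believe.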

    An IETF effort is under way to try and work out this problem, with the technical challenges laid out in detail in an Internet Draft authored by Jonathan Rosenberg of Cisco and Avshalom Houri of IBM. Among the issues they address is the problem of whether to have a central hub of presence or allow multiple instances to be distributed in different places.

    And then there’s an additional level of detail to the whole issue of presence federation/interoperability, one that roughly parallels the SIP/PBX-feature challenge. It’d be a whole lot easier for multiple vendors’ systems to exchange presence information if there were only a few basic presence states, like you probably have now with your public network IM service: On the basic configuration of my Yahoo IM, I can be Available, Busy, Stepped Out, Be Right Back, Not at My Desk, or On the Phone. And there’s probably even one too many there: Stepped Out is pretty similar to Be Right Back.

    However, just as PBX feature/functionality gets hairy when the features get more complicated and task-specific, the same is true (though for different technical reasons) when you get to Presence. And, as with PBX features, the equipment vendors are going to want to use fine-grained functionality (i.e., detailed presence states) as one product differentiator. Eric Swift says the market wants such granularity:

    “I was in Wall Street a couple weeks back, they have a turret phone system there, [with] several different presence states beyond what a normal phone has. It has not only on hold or off hook, or what have you; it has the on-intercom, or on a private call or in a public call. Got all those weird states. You probably don’t want to push all those states into your central presence engine.”

    Likewise, manufacturing or retail customers might want presence states that say, On The Factory Floor or At Cash Register 15, for example, Swift adds. “Does that make interop harder? Of course, because then you don’t have a pre-defined [rule]: There are 7 different presence states that are acceptable across the world. You have to say: There are N number of presence states, and here’s how you figure out what they are, and here’s how you deal with them once you figure out what they are, and you then relate to them in a standard way as opposed to having them pre-defined in a cookie-cutter fashion. And that makes it a little more difficult to interoperate, even with a standard. But I think it’s a necessary thing if you want to have practical and relevant flexibility as a platform.”
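One way to picture the "N number of presence states" approach Swift describes is a small shared set of base states plus a per-source mapping for everything else. The sketch below is hypothetical throughout: the base set, the state names, and the mapping are invented for illustration and are not taken from any presence standard or product.

```python
# A small base vocabulary that every system is assumed to understand.
BASE_STATES = {"available", "busy", "away", "offline", "unknown"}

# Hypothetical mapping a turret system might publish for its extra states.
TURRET_STATE_MAP = {
    "on-intercom": "busy",
    "private-call": "busy",
    "public-call": "busy",
    "on-hold": "busy",
}


def to_base_state(vendor_state, state_map):
    """Translate a vendor-specific state into the shared base vocabulary.

    Base states pass through unchanged; unmapped states become "unknown"
    rather than being guessed at.
    """
    if vendor_state in BASE_STATES:
        return vendor_state
    return state_map.get(vendor_state, "unknown")


print(to_base_state("on-intercom", TURRET_STATE_MAP))          # busy
print(to_base_state("available", TURRET_STATE_MAP))            # available
print(to_base_state("at-cash-register-15", TURRET_STATE_MAP))  # unknown
```

The last line is the interoperability cost in miniature: a state the receiving system has never heard of carries no meaning until someone agrees on how to describe it.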

    And although this article isn’t going into detail on interoperability between communications and business applications, it’s worth noting here that such multi-layer integration will further complicate presence federation. “When people want to deploy presence, they’re not just looking at, are you offline and I’m online,” noted Anwar Siddiqui, manager of Avaya Labs’ chief technology office. “Companies want to pull that [presence information] together and expose it upwards to the business applications.”

    At VoiceCon Orlando 2008, Avaya announced its approach for doing this: Its Intelligent Presence software which, according to the vendor, “aggregates telephony, desktop and application presence information from Avaya and third party sources, such as those from Microsoft, IBM and others, and bridges industry standard protocols including SIP/SIMPLE and XMPP.”

    Which brings us to the really tough issue: The major vendors all want to own presence. Christian Szpilfogel, who’s in the office of the CTO at Mitel, echoes Eric Swift’s comments about the application environment. Szpilfogel explained that the advantage of owning the “call” (in whatever medium) is that the vendor who owns the call will likely own all of the interworking behavior, and effectively owns the addressing space of the customer, which in turn is going to be what’s used to tie collaboration more deeply into the rest of the network.

    All of which brings us to the Interoperability discussion that took place at VoiceCon Orlando, and the commitment that Microsoft and IBM made to work toward interoperability of OCS and Sametime, respectively. Since the March on-stage handshake between Eric Swift and IBM’s Pat Galvin, the two companies have been talking, and Akiba Saeedi said recently that, “I absolutely remain hopeful that we’re going to make significant progress here.”

    Saeedi didn’t want to go into the details of what’s been discussed for a planned interop demo at VoiceCon San Francisco in November. Eric Swift of Microsoft said, “What we’re following up on is discussing how we can demonstrate between Sametime and Office Communications Server, inter-domain federation. We have a lot of clients who want to do that, who either work with other companies or work with other business units on different domains, that they want to be able to use our product next to Sametime.” So, in other words, the interoperability work between Microsoft and IBM will only address the (relatively) easier issue around presence federation, that of inter-domain federation.

    What’s at issue for Microsoft-IBM integration at this level? According to Eric Swift, “What it comes down to basically is when they send an instant message, they send the message along with the initial request, whereas we send the SIP message which says, Hey, we want to have a conversation; once we get that acknowledgement, then we send the instant message along with it. Those two options are both supported within the standard, and so we’ve just got to decide, do we each do both, do we move to one, or does one of us do both and one of us stick with our status quo?
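The two behaviors Swift describes can be modeled as a toy receiver that either tolerates both flows or insists on a session first. The event labels below are hypothetical and are deliberately not SIP method names; this is a sketch of the decision the two vendors face, not of either product's implementation.

```python
class ImReceiver:
    """Toy receiver for the two instant-message flows described above."""

    def __init__(self, accept_message_in_request=True):
        # When False, the receiver insists on a session being set up first.
        self.accept_message_in_request = accept_message_in_request
        self.open_sessions = set()
        self.inbox = []

    def handle(self, kind, sender, body=None):
        if kind == "MESSAGE_IN_REQUEST":
            # Flow A: the text rides along with the initial request.
            if not self.accept_message_in_request:
                return "rejected-session-required"
            self.inbox.append((sender, body))
            return "ok"
        if kind == "SESSION_REQUEST":
            # Flow B, step 1: the peer asks to start a conversation.
            self.open_sessions.add(sender)
            return "session-accepted"
        if kind == "SESSION_MESSAGE":
            # Flow B, step 2: text is sent only after the acknowledgement.
            if sender not in self.open_sessions:
                return "rejected-no-session"
            self.inbox.append((sender, body))
            return "ok"
        return "rejected-unknown"


tolerant = ImReceiver()  # "do both"
strict = ImReceiver(accept_message_in_request=False)  # session-first only

print(tolerant.handle("MESSAGE_IN_REQUEST", "alice", "hi"))  # ok
print(strict.handle("MESSAGE_IN_REQUEST", "alice", "hi"))    # rejected-session-required
print(strict.handle("SESSION_REQUEST", "alice"))             # session-accepted
print(strict.handle("SESSION_MESSAGE", "alice", "hi"))       # ok
```

"Do we each do both" corresponds to both sides behaving like the tolerant receiver; the other options amount to one or both sides staying strict and the sender adapting.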

    “This is not rocket science,” Swift concluded. “This spec is pretty well understood. It’s just a matter of deciding on the approach and then testing it and agreeing to support it, that’s all.”

    It may not be rocket science, but any interoperability work like this is going to involve a lot of diplomacy, which is often much harder than rocket science.

    As for intra-domain federation, while it may be a technical problem that’s still unsolved, it’s clearly an important one for users who want their internal systems to be able to talk to each other. To the extent that either Microsoft or IBM believes they’ll gain market share at the other’s expense as enterprises move more deeply into UC, it could be in their interest to have an interoperability standard. Assuming that enterprise migrations will take years, not weeks or months, it’s almost certain that enterprises will own multiple vendors’ systems during the course of their migration, and less interoperability means less opportunity to capture the benefits the enterprise is seeking by going to UC in the first place.

    CONCLUSION

    So interoperability challenges come with different levels of urgency, technical difficulty, and likelihood of vendor intransigence. When it comes to basic communications systems, it’s worthwhile to keep in mind something Mark Straton of Siemens noted, namely that large voice systems typically have a 10-15-year life cycle: “Customers don’t replace their fundamental communications systems until they wear out,” he said.

    This isn’t just because those systems are expensive (though of course they are). It’s because they work, and making things work in real time is another degree of difficulty from the integration and interoperability challenges that have occurred in other parts of the network. It’s a challenge that the industry is beginning to explore and attempt to define; but we can’t expect solutions any time soon.

    Eric Krapf is editor and lead blogger for No Jitter, and is also program co-chairman of VoiceCon.
