It Takes More Than SIP to Get the Job Done
Signaling, media, and control work together to create a cohesive conversation, but each stands on its own.
December 7, 2015
We shall not cease from exploration. And the end of all our exploring will be to arrive where we started and know the place for the first time.
-- T. S. Eliot
As a Minnesotan on the cusp of another long and cold winter, I find that March feels like a lifetime away. While I have the holidays to lift my spirits here in the month of December, January and February are nothing more than snow, ice, and sub-freezing temperatures. It gets so bad that there will come a time when 33 degrees Fahrenheit feels like a warm, tropical day.
Besides the start of the spring thaw, what's so special about March? Enterprise Connect 2016 in sunny Orlando, Florida, of course. Not only will I get a break from winter's icy grip, but I have been asked to host the very popular tutorial, Understanding and Leveraging SIP for Your Enterprise. For an hour and 45 minutes, I will be talking about everything from INVITE requests to SIP trunks to the differences between SIP and WebRTC. While some material will undoubtedly be on the geeky side, it's my intent to both educate and entertain without losing too many people in the process. I want folks to understand the technical aspects of SIP along with the reasons why it's important and how it can be successfully implemented.
In the meantime, I plan on writing a series of teaser articles that will help set the stage for what I will cover during the session. Like the tutorial itself, they will vary between lowdown technical and high-level positioning.
Today, I am going to start with lowdown technical and describe some of the most important protocols that work hand-in-hand with SIP to establish voice, video, chat, and whatever calls.
It goes without saying that SIP is a protocol. After all, SIP does stand for Session Initiation Protocol. It couldn't be more obvious than that.
SIP is a signaling protocol. It creates, manipulates, moves, and releases sessions. The INVITE request creates new sessions, and the BYE request tears down an existing session. REFER moves (e.g. transfers) an active session from one endpoint to another. UPDATE modifies sessions in progress. The point is that SIP works at the signaling level and is present at every moment in a session's lifecycle.
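As a rough illustration of that lifecycle, the sketch below maps the four requests named above to their effect on a session. The Session class and its state names are hypothetical and invented for this example. It also glosses over a great deal: a real SIP stack deals with responses, dialogs, and transactions, and a REFER actually asks the recipient to place a new call rather than moving anything by itself.

```python
# Toy model of a session's lifecycle driven by SIP request methods.
# The Session class and its state names are illustrative only; they do not
# come from any SIP library, and responses (180, 200, ACK...) are ignored.

class Session:
    def __init__(self):
        self.state = "idle"
        self.peer = None
        self.log = []

    def handle(self, method, target=None):
        if method == "INVITE" and self.state == "idle":
            self.state = "active"        # a new session is created
            self.peer = target
        elif method == "UPDATE" and self.state == "active":
            pass                         # parameters change, session stays up
        elif method == "REFER" and self.state == "active":
            self.peer = target           # simplified view of a transfer
        elif method == "BYE" and self.state == "active":
            self.state = "terminated"    # the session is torn down
            self.peer = None
        else:
            raise ValueError(f"{method} not valid in state {self.state}")
        self.log.append(method)
        return self.state


session = Session()
session.handle("INVITE", target="sip:bob@example.com")
session.handle("UPDATE")
session.handle("REFER", target="sip:carol@example.com")
final_state = session.handle("BYE")
print(final_state, session.log)
```

The point of the toy is only that every transition is driven by signaling. Nothing in it knows or cares what media the session carries.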
SIP is not a media protocol, and other than a few headers that allow it to advertise that media will be present, SIP has nothing to do with codecs, frame rates, IP addresses, ports, or anything else related to a media stream. SIP establishes communication sessions, but it doesn't have a clue as to what kind of conversation will ultimately occur.
In order to add media to a session, SIP uses another protocol. As the name implies, Session Description Protocol (SDP) describes the media that will be used within a session. It lists the codecs that an endpoint supports, along with network parameters of the media stream that will be established. For instance, the SDP for a SIP telephone may advertise that it supports G.711, G.729, G.722, and Opus on IP address 10.100.10.23 using port 5000.
SDP is a fairly cryptic protocol with a number of parameters that can be a bit daunting to newcomers. Thankfully, there are really only a few things that need to be understood to be SDP-literate, chiefly the connection (c=) line, the media (m=) line, and the attribute (a=) lines.
For example, the following SDP snippet allows an endpoint to tell another endpoint that it supports G.711 -- designated as PCMU (Pulse Code Modulation U-Law) and PCMA (Pulse Code Modulation A-Law) -- and iLBC (Internet Low Bandwidth Codec) on IP address 10.100.10.23 using port 49170:
c=IN IP4 10.100.10.23
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
The c= line provides the IP address.
The m= line informs you that RTP (I'll get to that in a bit) will be used to transmit media to and from port 49170. This endpoint supports three codecs designated by 0, 8, and 97. Each of these codecs has its own attribute line to further define the codec. In this case, they will all be encoded at 8000 Hz.
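To make the c=, m=, and attribute lines a little more concrete, here is a minimal Python sketch that reads them. The SDP_TEXT string is written to match the snippet described above, and parse_sdp is a hypothetical helper of my own, not part of any library. A complete SDP body also carries v=, o=, s=, and t= lines, which this sketch simply ignores.

```python
# Minimal SDP reader: pulls out the connection address, media port, and codecs.
# SDP_TEXT is an assumption, written to match the description in the text.

SDP_TEXT = """\
c=IN IP4 10.100.10.23
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
"""


def parse_sdp(text):
    info = {"address": None, "port": None, "transport": None, "codecs": {}}
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        kind, value = line[0], line[2:]
        if kind == "c":
            # c=<network type> <address type> <connection address>
            info["address"] = value.split()[2]
        elif kind == "m":
            # m=<media> <port> <transport> <payload type> ...
            fields = value.split()
            info["port"] = int(fields[1])
            info["transport"] = fields[2]
            for fmt in fields[3:]:
                info["codecs"].setdefault(int(fmt), None)
        elif kind == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload, encoding = value[len("rtpmap:"):].split(None, 1)
            name, rate = encoding.split("/")[:2]
            info["codecs"][int(payload)] = (name, int(rate))
    return info


parsed = parse_sdp(SDP_TEXT)
print(parsed)
```

Notice that the parser only reports what the endpoint advertises. It makes no decision about which codec will actually be used.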
SDP is carried within the body of a SIP message, but the two protocols are completely independent of one another. They even have their own specifications – RFC 3261 for SIP and RFC 4566 for SDP. In fact, SDP is also used by WebRTC, and there are plenty of people who will tell you that SIP and WebRTC have very little in common.
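The separation between the two protocols is easy to see in a sketch. The Python below assembles a SIP INVITE whose body is SDP and then splits the two apart again at the blank line. Every address, tag, branch value, and Call-ID here is made up for illustration, and this is a bare-bones message rather than something a production stack would send.

```python
# Sketch: a SIP INVITE carrying an SDP body. All addresses, tags, and IDs
# are invented for this example.

sdp_body = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 10.100.10.23",
    "s=-",
    "c=IN IP4 10.100.10.23",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 97",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 iLBC/8000",
]) + "\r\n"

headers = [
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/UDP 10.100.10.23:5060;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "From: <sip:alice@example.com>;tag=1928301774",
    "To: <sip:bob@example.com>",
    "Call-ID: a84b4c76e66710@10.100.10.23",
    "CSeq: 1 INVITE",
    "Contact: <sip:alice@10.100.10.23>",
    "Content-Type: application/sdp",  # tells the receiver the body is SDP
    f"Content-Length: {len(sdp_body.encode('utf-8'))}",
]

message = "\r\n".join(headers) + "\r\n\r\n" + sdp_body

# A receiver separates SIP from SDP at the first blank line.
head, _, body = message.partition("\r\n\r\n")
request_line = head.split("\r\n")[0]
header_map = {}
for line in head.split("\r\n")[1:]:
    name, _, value = line.partition(":")
    header_map[name.strip().lower()] = value.strip()

print(request_line)
print(header_map["content-type"], len(body))
```

As far as SIP is concerned, the body is an opaque payload labeled by Content-Type. Swap in a different body type and the signaling would not change.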
SDP is a description protocol. It does not negotiate media. Negotiation is left up to the applications that send and receive SDP. It's useful to understand how this occurs, but I'll save the explanation for another article.
SDP describes the media, and Real-time Transport Protocol (RTP) is used to transport the media encoded with the chosen codec. Each RTP stream flows in one direction, so a two-way conversation has each endpoint sending its own media stream. Using a tool such as Wireshark, you can capture a call flow and independently play each side of the conversation.
RTP is typically carried over UDP, a datagram protocol, which means there are no retransmissions. The hope is that each packet arrives in order and uncorrupted, but there is nothing in the protocol to request retransmissions if a problem is discovered. The real-time nature of voice and video communications requires that each endpoint renders what it receives and deals with errors as best as it can. Newer codecs, such as Opus, employ a number of sophisticated techniques to minimize the damage due to network problems, but even they never request the retransmission of lost packets.
RTP is defined by RFC 3550.
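For the curious, the fixed RTP header from RFC 3550 is only 12 bytes long, and a short Python sketch can unpack it. The parse_rtp_header function and the sample packet are my own illustration. The sketch stops at the fixed header and does not handle CSRC lists, header extensions, or padding.

```python
import struct

# Sketch of reading the 12-byte fixed RTP header laid out in RFC 3550.
# parse_rtp_header is a hypothetical helper, not a library function.


def parse_rtp_header(packet):
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC).
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Made-up sample: version 2, payload type 0 (PCMU), 160 bytes of silence.
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160, 0x12345678) + bytes(160)
header = parse_rtp_header(sample)
print(header)
```

The sequence number is what lets a receiver notice that a packet went missing or arrived out of order. What it does about the gap is left entirely to the endpoint.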
Nothing in life is perfect, and real-time communications provides a way to measure just how bad things can get. RTP Control Protocol (RTCP) is the sister protocol of RTP and is used to convey quality of service information. Sent periodically throughout an RTP conversation, RTCP informs the recipient of things such as packet loss, jitter, and round-trip delay. This allows software to measure and report on the quality of the RTP it is receiving. Tools such as Prognosis, Prism One View, and Empirix can examine RTCP streams to create a holistic view of all unified communications traffic.
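Jitter is a good example of what gets reported. RFC 3550 describes a running estimate of interarrival jitter in which each new packet nudges the estimate by one sixteenth of the difference. The Python sketch below follows that formula. The interarrival_jitter function and both sample traces are invented for illustration, with timestamps and arrival times expressed in the same clock units (8000 per second for the codecs discussed here).

```python
# Sketch of the interarrival jitter estimate described in RFC 3550:
#   D = (arrival_j - arrival_i) - (timestamp_j - timestamp_i)
#   J = J + (|D| - J) / 16
# The function name and sample data are assumptions made for this example.


def interarrival_jitter(packets):
    """packets: iterable of (rtp_timestamp, arrival_time) in the same units."""
    jitter = 0.0
    previous = None
    for rtp_ts, arrival in packets:
        if previous is not None:
            prev_ts, prev_arrival = previous
            d = (arrival - prev_arrival) - (rtp_ts - prev_ts)
            jitter += (abs(d) - jitter) / 16.0
        previous = (rtp_ts, arrival)
    return jitter


# 20 ms packets at 8000 Hz are 160 timestamp units apart.
steady = [(i * 160, 5000 + i * 160) for i in range(50)]
# Every other packet arrives 40 units (5 ms) late.
bursty = [(i * 160, 5000 + i * 160 + (40 if i % 2 else 0)) for i in range(50)]

steady_jitter = interarrival_jitter(steady)
bursty_jitter = interarrival_jitter(bursty)
print(steady_jitter, bursty_jitter)
```

A perfectly paced stream keeps the estimate at zero, while the uneven one climbs toward the size of the delay variation. The division by 16 smooths out single outliers.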
By convention, RTP uses an even-numbered port, while RTCP uses the subsequent odd-numbered port. In my previous example, 49170 was used for RTP. This means that 49171 will be used for the corresponding RTCP.
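The port pairing is simple enough to express in a few lines of Python. Treat the even/odd rule as the default rather than a guarantee: an endpoint can name a different RTCP port with the a=rtcp attribute in SDP (RFC 3605) or multiplex RTP and RTCP on a single port (RFC 5761). The rtcp_port_for helper is hypothetical.

```python
# Sketch of the default RTP/RTCP port pairing. rtcp_port_for is a
# hypothetical helper; it ignores a=rtcp and RTP/RTCP multiplexing.


def rtcp_port_for(rtp_port):
    """Return the conventional RTCP port for an even-numbered RTP port."""
    if not 0 < rtp_port < 65535:
        raise ValueError("RTP port out of range")
    if rtp_port % 2:
        raise ValueError("RTP port is conventionally even")
    return rtp_port + 1


pair = (49170, rtcp_port_for(49170))
print(pair)
```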
RTCP is also defined by RFC 3550.
In every SIP telephone or video call, there will be three distinct information streams: signaling, media, and control. They work together to create a cohesive conversation, but each stands on its own. As with SDP, RTP and RTCP are used by protocols and solutions outside of SIP.
Well, there you have it. A bit of the geekiness you will learn by attending Understanding and Leveraging SIP for Your Enterprise this March. Of course, a protocol without solutions is even too nerdy for me. That's why I intend to take the session (pardon the pun) well beyond the bits and bytes of SIP to help you understand why you need to be seriously interested in one of the biggest transformational changes to come to communications in years.
Andrew Prokop writes about all things unified communications on his popular blog, SIP Adventures.
Register now for Enterprise Connect 2016, taking place March 7 – 10 at the Gaylord Palms in Orlando, Fla., to take advantage of reduced rates. Use the code NJPOST to receive $200 off the current conference price.