Scalable Video Coding: Another Kneecap Blown Away
Scalable video coding, or SVC, may be just the latest example of the video conferencing industry's affinity for never missing an opportunity to miss an opportunity. In this case we're talking about interoperability.
February 13, 2014
H.264 SVC, a scalable extension of the H.264 standard for encoding and decoding video streams, became an internationally approved ITU standard (formally, a Recommendation) in 2007: ancient history in today's UC world.
The details of SVC are outside the scope of this article. Suffice it to note that a video bitstream is called scalable if part of the stream can be removed and the remaining bitstream is still decodable, and that H.264 SVC standardizes the encoding of a video bitstream into a base layer and one or more enhancement layers.
The enhancement layers add frame rate (temporal enhancements), image resolution (spatial enhancements), or image quality (signal-to-noise enhancements) to the base layer. The base layer itself complies with the older H.264 baseline profile, commonly (but inaccurately) referred to as H.264 AVC. This is an important detail: a standard H.264 decoder can decode the base layer even if it cannot interpret the enhancement layers.
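To make the "remove part of the stream and it still decodes" idea concrete, here is a minimal Python sketch. It assumes each unit of the bitstream is tagged with the three layer identifiers carried in SVC NAL unit headers (dependency_id for spatial, temporal_id for temporal, quality_id for SNR). The `NalUnit` class and `extract` function are hypothetical illustrations, and the filtering rule is a simplification of the real extraction process in the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified stand-in for an H.264 SVC NAL unit.

    dependency_id (D) selects the spatial layer, temporal_id (T) the
    frame-rate layer, and quality_id (Q) the SNR layer. D = T = Q = 0
    marks the AVC-compatible base layer.
    """
    dependency_id: int
    temporal_id: int
    quality_id: int
    payload: bytes = b""


def extract(stream, max_d, max_t, max_q):
    """Drop every unit above the target (D, T, Q) operating point.

    Simplified: the standard's extraction process has additional rules,
    but the principle is the same: units are removed, never re-encoded.
    """
    return [
        n for n in stream
        if n.dependency_id <= max_d
        and n.temporal_id <= max_t
        and n.quality_id <= max_q
    ]


# A toy stream: 2 spatial x 3 temporal x 2 quality layers, one unit each.
stream = [NalUnit(d, t, q) for d in range(2) for t in range(3) for q in range(2)]

base_only = extract(stream, 0, 0, 0)   # roughly what a plain AVC decoder consumes
mid_point = extract(stream, 1, 1, 0)   # both resolutions, reduced frame rate
print(len(stream), len(base_only), len(mid_point))  # 12 1 4
```

The point of the sketch is that producing a lower-quality version of the stream is pure selection, which is why the device in the middle can be cheap.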
SVC facilitates videoconferencing over best-effort IP networks such as the Internet because, when network conditions deteriorate, SVC can temporarily send just the base layer packets. More to the point, a video switch/router in the middle of the connection (the video infrastructure so often overlooked in today's SVC debates) decides dynamically which layers to send to which endpoints, based on each endpoint's processing power, network bandwidth, and traffic conditions. Because the switch performs no transcoding, its processing requirements are minimal, and this fundamentally changes the economics of multipoint video calls.
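The switch's per-endpoint decision can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's algorithm: it assumes made-up per-layer bit rates, treats each endpoint as a pair of (available bandwidth, highest layer it can decode), and represents packets as `(layer, payload)` tuples. The key property it shows is that the switch only chooses which packets to relay; it never touches the video itself.

```python
from itertools import accumulate

# Hypothetical incremental bit rates (kbps) for the base layer and two
# enhancement layers; real values depend on the encoder and content.
LAYER_KBPS = [300, 500, 1200]


def pick_top_layer(available_kbps, max_decodable_layer, layer_kbps=LAYER_KBPS):
    """Return the highest layer index an endpoint should receive.

    A layer is only useful together with all layers below it, so the check
    uses cumulative rates. The base layer (index 0) is always sent.
    """
    top = 0
    for i, total in enumerate(accumulate(layer_kbps)):
        if i > max_decodable_layer or total > available_kbps:
            break
        top = i
    return top


def forward(packets, top_layer):
    """Select (layer, payload) packets to relay; nothing is decoded or re-encoded."""
    return [p for p in packets if p[0] <= top_layer]


# Hypothetical endpoints: (available bandwidth in kbps, highest decodable layer).
endpoints = {"room": (5000, 2), "laptop": (1000, 2), "phone": (5000, 0)}
plan = {name: pick_top_layer(bw, cap) for name, (bw, cap) in endpoints.items()}
print(plan)  # {'room': 2, 'laptop': 1, 'phone': 0}

packets = [(0, b"base"), (1, b"enh1"), (2, b"enh2")]
print(forward(packets, plan["laptop"]))  # [(0, b'base'), (1, b'enh1')]
```

A production switch would re-run this choice continuously as bandwidth estimates change; the sketch evaluates it once.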
This type of architecture is a natural fit for calls that connect high-performance room systems with desktops on a LAN and/or mobile devices on a cellular network or slow Internet pipe.
Traditional multipoint video conferencing relies on expensive, centrally deployed multipoint control units (a.k.a. MCUs or video bridges). An MCU decodes each incoming video stream, mixes the streams as required, and then re-encodes the result to produce the appropriate stream for each endpoint. This transcoding architecture is processor-intensive and very expensive, and it also adds significant latency to the call.
In short, the benefits of SVC include:
* Low Cost Endpoints: The solution runs on industry-standard endpoints (PC, Macintosh, iOS, Android) and x86 servers
* Low Cost, Scalable Infrastructure: The router/switch can be an industry-standard server; in at least one implementation, a single 1U device can support up to 100 HD streams
* Virtual Server Ready: The infrastructure software can be virtualized and run in a cloud
* Low Latency in Multipoint Calls: The infrastructure device performs packet switching and NOT video encoding/compositing/decoding
So, What's the Problem?
To start with, several vendors have decided to offer an alternative to SVC. Their answer is simulcasting. Here, a device encodes two or three versions of the video input, for example 1080p, 720p, and 360p. Each of these is H.264 AVC-compliant. A switch in the middle decides which streams to send to which participants.
In theory, the switch can go even further by implementing simple SVC-style temporal scaling: it could send 60 frames per second to some endpoints and 30 or 15 fps to others, as conditions warrant. So, while SVC encodes one stream with multiple layers, simulcasting encodes multiple streams with one layer each.
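Temporal scaling works because frames can be arranged in a hierarchy where dropping the top layer halves the frame rate. The Python sketch below assumes a dyadic hierarchy with three temporal layers at 60 fps and that the encoder only lets a frame reference frames in the same or lower layers; `temporal_id` and `frames_per_second` are hypothetical helpers, not part of any codec API.

```python
def temporal_id(frame_index, num_layers):
    """Temporal layer of a frame in a dyadic hierarchy (hypothetical helper).

    With 3 layers, every 4th frame is T0, the other even frames are T1,
    and the odd frames are T2. In such a structure (assumed here), frames
    in layer T reference only frames in layers <= T, so higher layers can
    be dropped without breaking decoding of the rest.
    """
    period = 1 << (num_layers - 1)          # 4 frames for 3 layers
    pos = frame_index % period
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_layers - 1 - trailing_zeros


def frames_per_second(full_fps, num_layers, max_tid):
    """Count frames in one second of video that survive dropping layers above max_tid."""
    return sum(1 for n in range(full_fps) if temporal_id(n, num_layers) <= max_tid)


for max_tid in (2, 1, 0):
    print(max_tid, frames_per_second(60, 3, max_tid))
# 2 60
# 1 30
# 0 15
```

This reproduces the 60/30/15 fps choices mentioned above; it assumes the full frame rate is a multiple of the hierarchy period.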
Simulcasting has the advantage that each stream is standards-compliant and, hence, interoperable with "legacy" systems. The major disadvantage is that the endpoint needs enough horsepower to encode multiple streams and enough bandwidth to send them all to the infrastructure device in the middle. If bit rate scales roughly with pixel count and it takes XYZ kbps to send 1080p (1920x1080), then adding 720p (1280x720) adds roughly 4/9 XYZ, and adding 360p (640x360) adds roughly another 1/9 XYZ, for a total uplink of about 1.56 XYZ.
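The simulcast overhead can be estimated directly from frame sizes. This Python sketch rests on one loud assumption: that bit rate is proportional to pixel count. Real encoder bit rates do not scale exactly that way, so treat the result as a rough estimate only. `RESOLUTIONS` and `pixel_ratio` are names introduced here for illustration.

```python
from fractions import Fraction

# Frame sizes for the three simulcast versions mentioned in the text.
RESOLUTIONS = {"1080p": (1920, 1080), "720p": (1280, 720), "360p": (640, 360)}


def pixel_ratio(name, reference="1080p"):
    """Pixel count of `name` relative to `reference`, as an exact fraction."""
    w, h = RESOLUTIONS[name]
    rw, rh = RESOLUTIONS[reference]
    return Fraction(w * h, rw * rh)


# Extra uplink for each added stream, as a fraction of the 1080p stream,
# assuming (hypothetically) that bit rate is proportional to pixel count.
extra = {name: pixel_ratio(name) for name in ("720p", "360p")}
total = 1 + sum(extra.values())  # uplink as a multiple of the 1080p stream alone

print(extra["720p"], extra["360p"], total)  # 4/9 1/9 14/9
```

Under this assumption, a three-stream simulcast costs about 14/9, or roughly 1.56 times, the uplink of sending 1080p alone.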
Because SVC has so many permutations and combinations, and because the H.264 Recommendation doesn't cover signaling (a key ingredient handled by those devices in the middle), an industry group known as the Unified Communications Interoperability Forum (UCIF) is trying to get vendors to agree on a subset of the possibilities they will all support.
Unfortunately, the UCIF appears to have descended into a nasty civil war, with "government" video engineers favoring a purebred SVC approach that takes advantage of both temporal and spatial enhancements, while the "rebels" have opted for the simpler approach of simulcasting. The bottom line: when you put Microsoft, Google, Cisco, Polycom, Avaya, and Vidyo in a room, then throw in some uncertainty about patent infringement and IP licensing, it's not surprising that interoperability remains a distant dream.
These and other issues will be explored in one of our Enterprise Connect video sessions entitled "SVC in 2014: Assessing the Implications." Moderated by Wainhouse Research, the panel will include representatives from both sides of the battle, including Vidyo, Polycom, and Avaya.
The eight-session video track at Enterprise Connect offers a peek into the future of enterprise video and an exploration of the critical issues facing decision makers today. Register for the conference and join us this March!