Implementing SVC in WebRTC
A new technical approach can provide large-scale group WebRTC video conferencing despite WebRTC's lack of native support for Scalable Video Coding.
June 2, 2014
One of the main issues with WebRTC for group video conferences is its lack of Scalable Video Coding (SVC) support. The WebRTC standard does not include SVC, and without it, a session with multiple participants (especially on mobile platforms) requires the same conference to be re-encoded in several formats. This drastically reduces the capacity of a WebRTC server.
Let's look at the problems group video conferencing faces in real Internet situations, along with the possible solutions for these problems.
One-on-One Conferencing Structure and the Challenges of Group Video Conferences
The optimal way to achieve one-on-one video communication is to connect the two clients directly. The clients just need to agree on a bit rate that both channels can carry. For example, if one user has a 300 kbps channel and the other has a 200 kbps channel, they choose the lower of the two: 200 kbps. Each user then sends and receives 200 kbps for the duration of the call, and there are no problems.
The situation changes when more than two people want to connect. WebRTC allows them to connect directly with each other, but here we have a couple of issues.
Let's look at what happens with the channels in a group video conference. Each client tries to send its best resolution at the highest possible bit rate. When clients connect directly, however, each one must upload a separate copy of its stream to every other participant, so its upstream channel is shared among them. A client whose upstream channel is only 200 kbps in total cannot send 200 kbps to each of two users: it can send 100 kbps to each, or 200 kbps to one and 0 kbps to the other. Neither option is acceptable for a full-scale group video conference.
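The arithmetic behind this limit is simple: in a full mesh, a client's uplink is divided among all the other participants. A minimal sketch, assuming the uplink is split evenly and ignoring protocol overhead (the function name is illustrative, not part of any WebRTC API):

```python
def mesh_per_peer_kbps(uplink_kbps: float, participants: int) -> float:
    """Bit rate a client can send to each peer in a full-mesh call.

    Assumes the uplink is split evenly across the other participants
    and ignores RTP/UDP/IP overhead (a simplifying assumption).
    """
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    peers = participants - 1  # one separate upload per remote peer
    return uplink_kbps / peers


# The example from the text: a 200 kbps uplink in a three-way call.
print(mesh_per_peer_kbps(200, 3))  # 100.0
# In a one-on-one call the whole uplink goes to the single peer.
print(mesh_per_peer_kbps(200, 2))  # 200.0
```

Every additional participant lowers the per-peer rate further, which is why mesh conferencing degrades quickly as the group grows.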
The optimal solution is to add an intermediary: a video conferencing server or MCU (Multipoint Control Unit) in a central location with good network connectivity. However, basic WebRTC has no concept of such a server, nor does it define a model for group conferencing beyond peers connecting directly to each other. Therefore, to support full-fledged group video conferencing in WebRTC, we need a WebRTC server that can receive multiple video streams from the clients and distribute them to the other participants.
A server can receive 200 kbps and distribute 200 kbps to each client. It seems the problem is solved, but, in fact, it is not.
The following example shows where the problem lies. Client #1 has an outgoing channel of 200 kbps and can send 200 kbps. Client #2, with a 500 kbps channel, can easily receive those 200 kbps. But client #3 can receive at most 100 kbps. If client #1 sends only 100 kbps to accommodate client #3, client #2 also ends up receiving only 100 kbps, even though its channel could handle more than twice that!
How can we give each client the best possible quality? The answer is: The server must be able to control each data stream.
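The example can be made concrete with a small model. This is a minimal sketch, assuming the only constraints are the sender's uplink and each receiver's downlink; the function and client names are hypothetical. A plain relay must choose one rate that every receiver can take, while a server that controls each stream can give every receiver the minimum of the sender's uplink and that receiver's downlink:

```python
def forward_single_rate(sender_uplink: int, downlinks: dict[str, int]) -> dict[str, int]:
    """Relay without per-receiver control: one rate for everybody.

    The sender must fit the slowest receiver, so every receiver gets
    min(sender uplink, slowest downlink).
    """
    rate = min(sender_uplink, min(downlinks.values()))
    return {client: rate for client in downlinks}


def forward_per_receiver(sender_uplink: int, downlinks: dict[str, int]) -> dict[str, int]:
    """Server that controls each outgoing stream independently."""
    return {client: min(sender_uplink, down) for client, down in downlinks.items()}


# Client #1 sends up to 200 kbps; client #2 can receive 500, client #3 only 100.
downlinks = {"client2": 500, "client3": 100}
print(forward_single_rate(200, downlinks))   # {'client2': 100, 'client3': 100}
print(forward_per_receiver(200, downlinks))  # {'client2': 200, 'client3': 100}
```

The model says nothing about how the server produces a 100 kbps version of a 200 kbps stream; that is exactly the question of transcoding versus SVC.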
MCU Transcoding vs SVC Technology
The classical approach to coping with variable communication channels is to transcode video streams on the server.
The most common example of this approach is an MCU, which transcodes the video stream for each layout and bit rate. Under real network conditions, this means practically every endpoint needs its own transcoder, which in turn requires a lot of processing power. This makes the MCU-server option costly, and in the era of cloud services, unreasonably expensive.
The modern approach is to use Scalable Video Coding. SVC encodes video in layers so that an already encoded stream can be adapted flexibly without re-encoding. In other words, the server drops parts of the encoded stream, lowering bandwidth consumption while preserving the highest quality each user can receive.
SVC comprises three forms of scalability: spatial, temporal and quality (also called SNR or fidelity) scalability. Spatial scalability provides different video resolutions, temporal scalability provides different frame rates, and quality scalability provides different levels of image fidelity. Like a classic MCU, SVC can therefore change video characteristics such as frame rate (fps) and resolution. Unlike an MCU, however, it does so without any video transcoding on the server.
This property of SVC makes it possible to host a large number of group conferences on an ordinary server, whereas video encoding on an MCU requires considerable computing power, which directly drives up its cost. Because of this remarkable flexibility, SVC has changed the world of video conferencing over the past couple of years.
With SVC on a server, you can receive data at the best possible rate and then adjust bandwidth individually for each conference participant, eliminating the problem in which one channel affects the video quality for other users.
For SVC to work, the party that sends the video stream must use a video codec that supports this technology, and the video conferencing server must be able to handle such video streams. The problem with WebRTC is that it has neither: there is no concept of a server, as mentioned above, and the VP8 video codec used in WebRTC does not have a full SVC extension.
The task is further complicated by the fact that we cannot fully control how clients (i.e. browsers) encode the video stream; therefore, any SVC implementation for WebRTC today must be carried out entirely on the server.
Despite the complexity of this situation, there is a solution that many people are not aware of. It turns out that the VP8 video codec supports the temporal scalability part of SVC: in other words, a VP8 stream can be encoded so that its frame rate can later be reduced without re-encoding.
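To illustrate how a frame rate can be reduced without re-encoding, consider a three-layer temporal structure, assumed here to follow the common repeating 0, 2, 1, 2 pattern, where each frame references only frames in its own or a lower layer. Dropping every frame above a target layer then leaves a decodable stream at a lower frame rate. This sketch only models frame indices; it does not handle real VP8 data, and the pattern is an assumption rather than something VP8 mandates:

```python
# Assumed three-layer temporal pattern, repeating every four frames:
# frame i belongs to temporal layer PATTERN[i % 4]. Each frame is assumed
# to reference only frames in its own or a lower layer.
PATTERN = (0, 2, 1, 2)


def thinned_fps(input_fps: float, max_layer: int,
                pattern: tuple[int, ...] = PATTERN) -> float:
    """Frame rate that remains after dropping every frame above max_layer."""
    kept = sum(1 for tid in pattern if tid <= max_layer)
    return input_fps * kept / len(pattern)


def thin(frame_ids: list[int], max_layer: int,
         pattern: tuple[int, ...] = PATTERN) -> list[int]:
    """Indices of the frames a server would forward; nothing is re-encoded."""
    return [i for i in frame_ids if pattern[i % len(pattern)] <= max_layer]


for layer in range(3):
    print(layer, thinned_fps(30, layer))
# 0 7.5
# 1 15.0
# 2 30.0
print(thin(list(range(8)), 1))  # [0, 2, 4, 6]
```

A single 30 fps encode thus yields three frame rates, 7.5, 15 and 30 fps, by simply forwarding fewer frames.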
How can this be used? If a VP8 stream is encoded with temporal scalability, it remains a normal VP8 stream both before and after "thinning" is applied, so it can be decoded by clients that are not aware of SVC, such as WebRTC browsers. But how can the server exploit this feature of VP8 when web browsers cannot produce such streams themselves? To do so, the video conferencing server has to re-encode each stream it receives from a client browser.
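On the wire, a temporally scalable VP8 stream can carry each packet's temporal layer index (TID) in the VP8 RTP payload descriptor defined in RFC 7741, so a server can decide what to drop by inspecting that small header rather than decoding video. A minimal sketch of that decision, assuming the RTP header has already been stripped so `payload` starts at the VP8 payload descriptor; the function names are hypothetical, and a real forwarder would also have to rewrite RTP sequence numbers and related fields after dropping packets, which is omitted here:

```python
from typing import Optional


def vp8_temporal_id(payload: bytes) -> Optional[int]:
    """Return the TID from a VP8 RTP payload descriptor (RFC 7741), or None.

    None means the packet carries no temporal-layer information.
    Truncated descriptors raise IndexError.
    """
    if not payload:
        raise ValueError("empty payload")
    first = payload[0]
    if not first & 0x80:          # X bit clear: no extension byte, so no TID
        return None
    ext = payload[1]
    has_picture_id = ext & 0x80   # I bit
    has_tl0picidx = ext & 0x40    # L bit
    has_tid = ext & 0x20          # T bit
    if not has_tid:
        return None
    pos = 2
    if has_picture_id:
        # M bit set: 15-bit PictureID (2 bytes); otherwise 7-bit (1 byte).
        pos += 2 if payload[pos] & 0x80 else 1
    if has_tl0picidx:
        pos += 1                  # skip TL0PICIDX
    return payload[pos] >> 6      # TID is the top two bits of the T/K byte


def should_forward(payload: bytes, max_tid: int) -> bool:
    """Forward packets at or below the receiver's target temporal layer."""
    tid = vp8_temporal_id(payload)
    return tid is None or tid <= max_tid


# X=1; I, L and T set; 7-bit PictureID 0x05; TL0PICIDX 0x01; T/K byte with TID=2.
pkt = bytes([0x90, 0xE0, 0x05, 0x01, 0x80]) + b"vp8 frame data"
print(vp8_temporal_id(pkt))    # 2
print(should_forward(pkt, 1))  # False
print(should_forward(pkt, 2))  # True
```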
Another very important feature, which is present in the SVC specification for H.264 but absent from the VP8 codec, is spatial scalability. It allows the video resolution, and not just the frame rate, to be changed without transcoding. This capability is essential for using SVC effectively: without spatial scalability, temporal scalability alone can derive only a narrow range of bit rates from a single stream.
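A rough way to see why spatial layers widen the range: every combination of a spatial and a temporal layer is one operating point the server can extract from a single encoded stream. The per-layer bit-rate shares below are hypothetical, chosen only for illustration, and so is the simplifying assumption that an operating point's bit rate is the product of its spatial and temporal shares:

```python
from itertools import product

# Hypothetical share of the full bit rate kept at each layer (illustrative
# only; real values depend on the encoder and the content).
TEMPORAL_SHARE = [0.5, 0.75, 1.0]  # T0 (lowest fps) .. T2 (full fps)
SPATIAL_SHARE = [0.25, 0.5, 1.0]   # S0 (lowest resolution) .. S2 (full)


def operating_points(full_kbps: float, spatial: list[float],
                     temporal: list[float]) -> list[float]:
    """Bit rates of every (spatial, temporal) combination, sorted."""
    return sorted(full_kbps * s * t for s, t in product(spatial, temporal))


temporal_only = operating_points(1000, [1.0], TEMPORAL_SHARE)
combined = operating_points(1000, SPATIAL_SHARE, TEMPORAL_SHARE)
print(temporal_only)                                # [500.0, 750.0, 1000.0]
print(len(combined), min(combined), max(combined))  # 9 125.0 1000.0
```

Under these assumed shares, temporal layers alone cannot serve a receiver below 500 kbps from a 1000 kbps stream, while adding spatial layers lets the same stream reach 125 kbps.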
Although this option is not part of the VP8 video codec standard, VP8's open architecture makes it possible to develop and implement it independently.
The Solution
To sum up, here is an approach for carriers and cloud service providers that want to offer full-scale group WebRTC video conferencing to their customers without going bust buying expensive infrastructure and MCUs.
It requires a video conferencing server that performs the following tasks:
● Re-encodes regular incoming VP8 video streams from the browser into VP8 SVC
● Reduces the bandwidth of the video streams using SVC
In addition, browsers currently have trouble receiving more than four to six video streams at once and correctly estimating their downstream bandwidth, so the video server should be able to:
● Create a group video conferencing canvas by mixing and regulating video streams.
● Monitor each client's parameters, such as its channel and screen resolution (desktop or mobile), and automatically "thin out" the mixed group video conference before sending it to the WebRTC browser, so that the browser receives a normal VP8 stream with the desired characteristics. This can be done efficiently by encoding the server's outgoing stream as VP8 SVC and then applying SVC thinning to it for each client.
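The per-client "thinning" step can be sketched as choosing, for each receiver, the richest operating point that fits both its estimated downstream bandwidth and its screen. This is a minimal sketch under stated assumptions: the layer table (resolutions, frame rates, bit rates), the `Client` fields, and the selection rule (prefer higher resolution, then higher frame rate) are all hypothetical and are not necessarily what the server described here does:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    height: int   # output resolution in lines (spatial layer)
    fps: float    # frame rate (temporal layer)
    kbps: int     # bit rate of this spatial/temporal combination


@dataclass(frozen=True)
class Client:
    downlink_kbps: int   # estimated downstream bandwidth
    screen_height: int   # display height, e.g. smaller on a phone


# Hypothetical operating points of the scalable-encoded conference canvas.
POINTS = (
    OperatingPoint(180, 7.5, 120),
    OperatingPoint(180, 15.0, 180),
    OperatingPoint(360, 15.0, 400),
    OperatingPoint(360, 30.0, 600),
    OperatingPoint(720, 15.0, 900),
    OperatingPoint(720, 30.0, 1400),
)


def choose_point(client: Client,
                 points: tuple[OperatingPoint, ...] = POINTS) -> OperatingPoint:
    """Pick the richest operating point that fits the client's channel and screen."""
    fitting = [p for p in points
               if p.kbps <= client.downlink_kbps and p.height <= client.screen_height]
    if not fitting:
        # Nothing fits: fall back to the cheapest point instead of sending nothing.
        return min(points, key=lambda p: p.kbps)
    # Prefer higher resolution first, then higher frame rate.
    return max(fitting, key=lambda p: (p.height, p.fps))


desktop = choose_point(Client(downlink_kbps=2000, screen_height=1080))
mobile = choose_point(Client(downlink_kbps=2000, screen_height=360))
slow_link = choose_point(Client(downlink_kbps=500, screen_height=720))
print(desktop.height, desktop.fps, desktop.kbps)        # 720 30.0 1400
print(mobile.height, mobile.fps, mobile.kbps)           # 360 30.0 600
print(slow_link.height, slow_link.fps, slow_link.kbps)  # 360 15.0 400
```

In this model every choice is a subset of the same encoded stream, so adding receivers adds only this cheap selection step, not another encode.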
As a result, each participant in a group WebRTC video conference sends the best stream its channel allows and, in turn, receives the best stream it is able to receive. At the same time, the video conferencing server encodes the outgoing conference exactly once, regardless of how many clients are connected. This approach makes it possible to use SVC and WebRTC together and to increase the capacity of the video conferencing server.
In the long term, we believe that Google's cooperation with Vidyo to develop SVC for VP9 is a positive development for the industry. This agreement could significantly change the future of WebRTC technology and cut down SVC overhead costs.
Stass Soldatov is CTO of TrueConf.