The Twisted Path of Multimedia VDI
Monitoring and troubleshooting multimedia communications in a VDI environment is much more complex than in a peer-to-peer network model.
January 1, 2013
There is a big push in many organizations to move to Virtual Desktop Infrastructure (VDI). This brings a whole new level of complexity to deploying, managing, and troubleshooting multimedia applications. An audio and/or video application may be operating over multiple paths, each with different types of encoding and operational characteristics, as seen in Figure 1.
Figure 1: Multimedia VDI Communications Path
A multimedia session between two clients depends on three major flows:
* Client A to VDI Server 1, using Client A's VDI protocol.
* VDI Server 1 to VDI Server 2, using the native multimedia streaming protocol.
* Client B to VDI Server 2, which may use a different VDI protocol than Client A's.
For a bi-directional multimedia session, each of these flows also runs in the reverse direction.
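One way to keep these flows straight while troubleshooting is to treat each direction of each path as its own segment. The minimal Python sketch below enumerates the six directional segments of the two-party session in Figure 1. The `Segment` type and the protocol labels are hypothetical placeholders, not names taken from any VDI product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    src: str
    dst: str
    protocol: str  # placeholder label for the protocol carried on this hop


def session_segments(client_a_vdi="VDI protocol A",
                     client_b_vdi="VDI protocol B",
                     media="native multimedia streaming"):
    """Return the forward hops of a two-party VDI session plus their reverse directions."""
    forward = [
        Segment("Client A", "VDI Server 1", client_a_vdi),
        Segment("VDI Server 1", "VDI Server 2", media),
        Segment("VDI Server 2", "Client B", client_b_vdi),
    ]
    # The reverse path visits the same hops in the opposite order.
    reverse = [Segment(s.dst, s.src, s.protocol) for s in reversed(forward)]
    return forward + reverse
```

Listing the segments this way makes it explicit that a two-party call has six places where something can go wrong, each of which may carry a different protocol.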
Note that the VDI protocol can differ for each client, depending on the VDI server in use (e.g., Citrix or VMware) or on the client itself (e.g., computer, thin client, or tablet). The network infrastructure between each client and its VDI server may also be substantially different. For example, one client may be remote, such as at a satellite medical facility that is connected by a T1 access line.
Conferencing between three or more endpoints requires a Multipoint Control Unit (MCU). Looking only at the VDI Server 1 and VDI Server 2 conferencing paths, two additional paths are added to the infrastructure for communication with the MCU. The implementation becomes more complex still if the media encoding used by each VDI server is different, such as when one client is running a low-bandwidth codec while another is running a high-bandwidth codec.
The quest for optimum performance and cost savings may further complicate the system. For example, if the organization is doing traffic engineering, asymmetric paths may result. In this type of environment, diagnosing connectivity or performance problems can take considerably more time and effort.
The good news is that there are ways to tackle the complex task of diagnosing any potential problems. The bad news is that more advanced diagnostic tools will be needed to perform the troubleshooting within reasonable time frames. Simple packet capture tools can be used to look for specific symptoms, but the time required to perform the necessary analysis may make their use uneconomical.
Identify the Problem
You may learn that there is a multimedia problem because the people using the systems are reporting audio dropouts or jerky video. Proactive monitoring of the systems along the path can reveal problems before users do, potentially heading off poor performance before it has a big negative impact on the multimedia sessions.
For proactive systems monitoring, it is useful to configure the multimedia endpoints to report call statistics to a central call controller for further analysis. Note that the multimedia endpoints are VDI Server 1 and VDI Server 2, not the clients. Ideally, collect RTCP reports every 5 seconds, which lets you determine how frequently the problem occurs. If RTCP is not available, obtain peak jitter and total packet loss data from the server's multimedia client application.
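RTCP receiver reports carry an interarrival jitter value and a cumulative packet loss count. The Python sketch below shows how those two figures can be derived from RTP timestamps, arrival times, and sequence numbers, using the interarrival jitter estimator defined in RFC 3550. It is a simplified sketch: the function names are hypothetical, and sequence-number wraparound and the RFC's exact duplicate handling are not implemented.

```python
def interarrival_jitter(packets):
    """Smoothed interarrival jitter in the style of RFC 3550.

    packets: list of (rtp_timestamp, arrival_time) pairs in arrival order,
    with arrival_time already converted to RTP timestamp units.
    """
    jitter = 0.0
    for (s_prev, r_prev), (s_cur, r_cur) in zip(packets, packets[1:]):
        # D = difference in relative transit time of two consecutive packets.
        d = (r_cur - r_prev) - (s_cur - s_prev)
        # Exponential smoothing with gain 1/16, as in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def cumulative_loss(seqs):
    """Packets expected minus distinct packets received (no 16-bit wraparound handling)."""
    expected = max(seqs) - min(seqs) + 1
    return expected - len(set(seqs))
```

For 20 ms packets of 8 kHz audio, consecutive RTP timestamps differ by 160, so a perfectly paced stream yields a jitter of 0, while a single packet arriving 80 timestamp units late raises the estimate to 80 / 16 = 5.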
For problems between a VDI client and its server, you'll need to monitor whatever performance measurement points the VDI software provides. It also makes sense to monitor network traffic statistics on both the VDI client and the VDI server, looking for TCP retransmission counts and duplicate ACKs. A good network management product can help with basic network statistics monitoring.
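As one illustration of tracking TCP retransmissions, the sketch below assumes a Linux host, where the kernel exposes cumulative TCP counters (including `OutSegs` and `RetransSegs`) on the `Tcp:` lines of `/proc/net/snmp`. VDI servers and clients running other operating systems expose similar counters through different interfaces, which are not covered here. The function names are hypothetical.

```python
from pathlib import Path


def parse_tcp_counters(snmp_text):
    """Return the Tcp: counters from /proc/net/snmp-formatted text as a dict."""
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    # The first Tcp: line holds field names, the second holds their values.
    header, values = tcp_lines[0], tcp_lines[1]
    return dict(zip(header, map(int, values)))


def read_snapshot(path="/proc/net/snmp"):
    """Read one snapshot of the TCP counters (Linux only)."""
    return parse_tcp_counters(Path(path).read_text())


def retransmit_ratio(before, after):
    """Retransmitted segments relative to segments sent between two snapshots."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent else 0.0
```

In use, you would take one snapshot with `read_snapshot()`, wait for a polling interval, take another, and trend `retransmit_ratio(before, after)` over time. A rising ratio on a VDI client or server points at loss somewhere on that client-server path.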
Finally, the new generation of Application Performance Management (APM) products may be able to provide insight into a problem. These systems capture and analyze packet flows to identify application performance problems. They can typically measure jitter, packet loss, and server turnaround time to isolate the problem to the network, the server, or the application.
Isolate the Path Components
Once you know that a problem exists, you can begin troubleshooting. You will need to isolate the source of the problem to one of the path components: 1) Client A to Server 1; 2) Server 1 to Server 2; or 3) Server 2 to Client B. Remember that the problem can be with the data flowing in one direction and not the other, so treat each direction separately for each path.
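Isolation can be sketched as checking each directional segment against thresholds. In the Python sketch below, the loss threshold reuses the one-in-a-million figure given under the packet loss discussion; the 30 ms jitter budget is a hypothetical value chosen only for illustration, and the input format is assumed.

```python
# Hypothetical default thresholds for illustration only.
DEFAULT_MAX_LOSS = 1e-6       # one packet in a million
DEFAULT_MAX_JITTER_MS = 30.0  # assumed jitter budget; tune for your codecs


def suspect_segments(stats, max_loss=DEFAULT_MAX_LOSS,
                     max_jitter_ms=DEFAULT_MAX_JITTER_MS):
    """Return the directional segments whose measurements exceed either threshold.

    stats maps (src, dst) tuples to {"loss_ratio": float, "jitter_ms": float}.
    Each direction of a path is a separate key, so A->B and B->A are judged
    independently.
    """
    return sorted(
        segment for segment, m in stats.items()
        if m["loss_ratio"] > max_loss or m["jitter_ms"] > max_jitter_ms
    )
```

Keying the measurements by (source, destination) rather than by path is the point of the design: a segment can be clean in one direction and badly impaired in the other.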
Once you think that you know the paths involved, check them thoroughly to confirm. I've seen a case where the video traffic was going to an MCU located on the Internet instead of to the corporate internal MCU. Traceroute between the two video systems took an entirely different internal path. The path via the Internet experienced a lot of jitter and loss, while the internal path was clean. It took weeks of work before someone spotted the discrepancy in the configuration. Packet captures with a network analyzer or with an APM can verify that the endpoint addresses are correct. You may also need to look for network traffic engineering that may route the video traffic via a path that is different from the one traceroute reports.
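A check like the one in that story can be sketched in Python with the standard `ipaddress` module: given source/destination pairs exported from a capture, flag any media flow whose destination falls outside the internal address space. The prefixes below are hypothetical (192.0.2.0/24 stands in for an internal MCU subnet), and how the flow list is exported is left to your capture tool; for example, `tshark -r capture.pcap -T fields -e ip.src -e ip.dst` prints one pair per packet.

```python
import ipaddress

# Hypothetical internal prefixes; replace with your own addressing plan.
INTERNAL_PREFIXES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),  # stand-in for the internal MCU subnet
]


def unexpected_destinations(flows, prefixes=INTERNAL_PREFIXES):
    """Return distinct (src, dst) pairs whose destination is outside every prefix.

    flows: iterable of (src_ip, dst_ip) strings, e.g. exported from a capture.
    """
    bad = set()
    for src, dst in flows:
        addr = ipaddress.ip_address(dst)
        if not any(addr in net for net in prefixes):
            bad.add((src, dst))
    return sorted(bad)
```

Any pair this returns is a media flow leaving the expected internal path and is worth comparing against the endpoint and MCU configuration.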
The benefit of VDI is that all communications in the above scenario are routed through the data center. Installing packet capture probes (e.g., Gigamon or Anue) in the data center can significantly aid in the capture and analysis of multimedia and VDI flows.
Troubleshooting Tips
The exact troubleshooting process depends on where you suspect the problem exists and the symptoms and data that you have been able to collect. Let's walk through a few possibilities.
High Packet Loss
Packet loss is generally due to two things:
* Interface congestion. Congestion on an interface can cause high jitter as packets are queued for transmission. If the buffers overflow, packets are lost and the interface "drop" counter is incremented. Even without packet loss, jitter that is too high produces symptoms identical to packet loss, because the packets arrive too late to be used. Large buffers are therefore not advantageous: it is better to drop packets than to transport them all the way across the network, only to have them arrive too late to be used.
* Link errors. A modern network should experience very little packet loss. Packet loss of more than one packet in 1 million (1x10^-6, or 0.0001%) should be investigated and corrected. Packet loss has a particularly bad impact on TCP performance, as documented in one of my blog posts, TCP Performance and the Mathis Equation.
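Both causes above lend themselves to simple arithmetic. The Python sketch below computes a loss ratio from two snapshots of interface counters and compares it with the one-in-a-million threshold, and it also counts packets that arrive after the receiver's playout deadline as effectively lost. The `(errors, packets)` tuple layout, the playout budget, and the function names are assumptions for illustration; polling the counters (for example via SNMP) is not shown, and counter wraparound is ignored.

```python
# One packet in a million (0.0001 %), the investigation threshold for link errors.
LINK_LOSS_THRESHOLD = 1e-6


def counter_loss_ratio(before, after):
    """Loss ratio between two (errors, total_packets) counter snapshots."""
    errors = after[0] - before[0]
    total = after[1] - before[1]
    return errors / total if total > 0 else 0.0


def needs_investigation(before, after, threshold=LINK_LOSS_THRESHOLD):
    """True when the interval's loss ratio exceeds the threshold."""
    return counter_loss_ratio(before, after) > threshold


def effective_loss(sent_count, delays_ms, playout_budget_ms):
    """Share of packets unusable at the receiver: never arrived, or arrived too late.

    delays_ms holds one-way delays of the packets that did arrive;
    playout_budget_ms is an assumed jitter-buffer limit.
    """
    dropped = sent_count - len(delays_ms)
    late = sum(1 for d in delays_ms if d > playout_budget_ms)
    return (dropped + late) / sent_count
```

The `effective_loss` function captures the congestion argument: a packet delayed past the playout budget counts against call quality exactly like a dropped one, which is why oversized buffers do not help.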
Interestingly, high packet loss may have less of an impact on a multimedia UDP session than on a VDI session that is running over TCP. The reason is that packet loss can push TCP throughput below the rate the multimedia session requires (more true for video than for audio). As a result, the client-server paths (in each direction) can be sources of very poor performance when there is significant packet loss.
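The Mathis equation gives an approximate upper bound on TCP throughput: (MSS / RTT) x C / sqrt(p), where p is the loss probability and C is about 1.22 (the square root of 3/2). The Python sketch below applies it to test whether a lossy client-server path can still carry a stream of a given bitrate. The 4 Mbps video rate and 50 ms RTT in the usage note are hypothetical, and the equation is only an estimate that ignores timeouts and receive-window limits.

```python
import math

MATHIS_C = math.sqrt(3 / 2)  # Mathis model constant, about 1.22


def mathis_throughput_bps(mss_bytes, rtt_s, loss_ratio):
    """Approximate upper bound on TCP throughput: (MSS / RTT) * C / sqrt(p)."""
    if loss_ratio <= 0:
        return math.inf  # the model places no loss-based limit at zero loss
    return (mss_bytes * 8 / rtt_s) * MATHIS_C / math.sqrt(loss_ratio)


def tcp_can_carry(required_bps, mss_bytes, rtt_s, loss_ratio):
    """True if the Mathis bound meets or exceeds the required bitrate."""
    return mathis_throughput_bps(mss_bytes, rtt_s, loss_ratio) >= required_bps
```

With a 1460-byte MSS and a 50 ms RTT, 1% loss caps TCP at roughly 2.9 Mbps, below a hypothetical 4 Mbps video stream carried inside the VDI session, while a loss ratio of 1x10^-6 raises the bound to roughly 286 Mbps.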
For the server-to-client paths, a good test is to play a long video and look for dropouts. You may need to load the video onto the server itself, so that any dropouts can be attributed to the server-to-client path rather than to the time it takes data to reach the server from another location before it's sent on to the client.
High Jitter
High jitter is likely due to large buffers or an improper or inconsistent QoS configuration. You will likely see this when the multimedia path between the VDI servers is long, for example if the servers are in separate data centers. In this case, your network management system may not report interface drops (an indication of congestion), because packets are being queued rather than discarded. You may need to modify the number of buffers allocated to each QoS queue to minimize jitter and packet loss. An example of buffer tuning is in another of my blog posts: Diagnosing a QoS Deployment.
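The link between buffer size and jitter is simple arithmetic: a packet that arrives behind a full queue waits for every packet ahead of it to be serialized onto the link. The Python sketch below computes that worst-case queuing delay and, conversely, the largest queue depth that stays within a delay target. The 64-packet queue, 1500-byte packets, and 50 ms target are hypothetical values; 1.544 Mbps is the T1 line rate.

```python
import math


def max_queue_delay_s(queue_packets, packet_bytes, link_bps):
    """Delay seen by a packet arriving behind a full queue (serialization only)."""
    return queue_packets * packet_bytes * 8 / link_bps


def queue_depth_for_delay(target_delay_s, packet_bytes, link_bps):
    """Largest whole number of packets that drains within the target delay."""
    return math.floor(target_delay_s * link_bps / (packet_bytes * 8))
```

A 64-packet queue of 1500-byte packets adds up to about 0.5 s of delay on a T1 but well under 1 ms on a 1 Gbps link, and staying under 50 ms on the T1 allows only 6 such packets. The same buffer setting can therefore be harmless in the data center and a major jitter source on a slow access line.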
Active Troubleshooting
In some cases, you'll want to use active troubleshooting. Cisco's IP SLA and AppNeta's PathView are good tools for generating active path testing data flows. Cisco's IP SLA can only be run from a Cisco router or switch, so you'll need to select the appropriate location from which to test. AppNeta's product relies on probes to generate packet flows. With two probes, it can perform more detailed analysis using UDP packets than a single probe, which has to use ICMP packets. Either tool can report packet loss and high jitter.
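To show the idea behind two-probe UDP testing (not how IP SLA or PathView are actually implemented), here is a minimal Python sketch: a far-end reflector echoes numbered UDP packets, and the near-end probe measures loss and round-trip variation. Because the sender compares its own timestamps, no clock synchronization between probes is needed. The packet format, function names, and the mean-absolute-difference jitter proxy are all assumptions; `loopback_demo` runs both ends on 127.0.0.1 purely for demonstration.

```python
import socket
import struct
import threading
import time

# Probe payload: 32-bit sequence number plus the sender's monotonic timestamp.
PROBE = struct.Struct("!Id")


def reflect(sock, count):
    """Far-end probe: echo up to `count` datagrams back to their sender."""
    for _ in range(count):
        try:
            data, addr = sock.recvfrom(2048)
        except socket.timeout:
            return
        sock.sendto(data, addr)


def probe(target, count=20, timeout=0.5):
    """Near-end probe: send numbered UDP packets and measure loss and RTT variation."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for seq in range(count):
            sock.sendto(PROBE.pack(seq, time.monotonic()), target)
            try:
                data, _ = sock.recvfrom(2048)
            except socket.timeout:
                continue  # no reply in time: count as lost
            rseq, sent_at = PROBE.unpack(data)
            if rseq == seq:  # ignore stale replies to earlier packets
                rtts.append(time.monotonic() - sent_at)
    lost = count - len(rtts)
    # Mean absolute difference between consecutive RTTs, a simple jitter proxy.
    diffs = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0
    return {"sent": count, "lost": lost, "loss_ratio": lost / count,
            "jitter_ms": jitter * 1000}


def loopback_demo(count=20):
    """Run both probes in one process over 127.0.0.1 (demo only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as reflector:
        reflector.bind(("127.0.0.1", 0))
        reflector.settimeout(2.0)
        worker = threading.Thread(target=reflect, args=(reflector, count), daemon=True)
        worker.start()
        result = probe(reflector.getsockname(), count=count)
        worker.join()
    return result
```

Calling `loopback_demo()` should report zero loss on a healthy host. On a real path, the reflector would run on a second machine and `probe` would target its address. A production tool would also handle late replies more carefully than this sketch, which simply counts a packet as lost when its reply misses the timeout.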
These tools can be deployed and left in place for ongoing active path testing. I like this approach enough that I've included the functionality in my network management architecture. Because the testing runs continuously, it provides proactive notification of problems.
Summary
Monitoring and troubleshooting multimedia communications in a VDI environment is much more complex than in a peer-to-peer network model. But by breaking down the communications path to individual components and verifying the operation of each component, you can quickly determine the source of multimedia performance problems.