Real-Time Monitoring Tools
Back on February 19th I posted a note about real-time monitoring and why it is so necessary for supporting voice and video on the enterprise network. Today I want to look at some of the tools and vendors that provide this monitoring, and look at the tradeoffs for using the different methodologies they employ.
March 11, 2008
Agent-Based Tools: These tools place either a hardware or a software agent at distributed points around the enterprise network. A centralized controller then scripts these agents to run tests that simulate real-time traffic flows, such as multiple voice or video conferencing calls. The receiving agent can determine whether the network caused any packet loss or jitter. Some agents are time-synchronized so that one-way latency can also be measured. By correlating the quality experienced on these test calls with the portions of the network topology tested, the system can identify the links that are causing poor performance. Storing the information in a database then allows the tools to see how performance changes over time and to raise appropriate alarms that create trouble tickets where needed.
Viola Networks, NetIQ and Brix all have versions of this approach. The advantage of an active agent is that the network is tested all the time, data is collected at regular intervals, and trending and predictive algorithms can use that data to identify problems before they become bad enough to affect the user experience. The downside of these tools is that the agents have to be deployed around the enterprise. For businesses that have a very large number of small offices, this could create a high cost in capital or logistics.
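To make the receiving agent's arithmetic concrete, here is a minimal sketch of how loss, jitter and one-way latency might be derived from a batch of test packets. It is not taken from any of the vendors above. The `Probe` record and `summarize` function are hypothetical names, the jitter smoothing follows the interarrival jitter estimator in RFC 3550, and the latency figure is only meaningful if the two agents' clocks are synchronized.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    """One test packet (hypothetical record layout)."""
    seq: int                    # sequence number set by the sending agent
    sent_ms: float              # send time on the sender's clock
    recv_ms: Optional[float]    # receive time on the receiver's clock, None if lost


def summarize(probes: List[Probe]) -> dict:
    """Return loss percentage, smoothed jitter and mean one-way latency."""
    received = [p for p in probes if p.recv_ms is not None]
    lost = len(probes) - len(received)
    loss_pct = 100.0 * lost / len(probes) if probes else 0.0

    # RFC 3550 style estimator: J += (|D| - J) / 16, where D is the change
    # in transit time between consecutive received packets.
    jitter = 0.0
    prev_transit = None
    for p in sorted(received, key=lambda p: p.seq):
        transit = p.recv_ms - p.sent_ms
        if prev_transit is not None:
            jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit

    # Assumes time-synchronized agents; otherwise this includes clock offset.
    latency = None
    if received:
        latency = sum(p.recv_ms - p.sent_ms for p in received) / len(received)

    return {"loss_pct": loss_pct, "jitter_ms": jitter, "latency_ms": latency}
```

A central controller could store one such summary per agent pair per test interval, which is the raw material for the trending described above.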
ICMP Based Tools: In a category by itself is Apparent Networks with their AppCritical tool. This tool sits in a central location tied to the core of the network and sends bursts of ICMP packets towards each endpoint being monitored. The tool then detects the returning packets (or their absence) as well as their timing. From this information AppCritical calculates packet loss, jitter, apparent bandwidth and latency. The tool also has sophisticated heuristics to identify specific network problems like duplex mismatches and routers with high CPU utilization.
This approach has two wonderful advantages. First is a very simple deployment, since only a single server at the network core is required. Second is that the tool can not only test to any endpoint but can also test to each intermediate router hop along a path. Thus if packet loss is occurring, for example, AppCritical can indicate at which router hop the packet loss begins. This shortens the diagnostics cycle considerably.
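The idea of finding where loss begins can be sketched in a few lines. This is an illustration only, not AppCritical's algorithm. The function name `first_lossy_hop` and the 1% threshold are assumptions. It takes the loss percentage measured to each hop, in path order, and returns the first hop from which loss persists all the way to the endpoint. Loss seen at one hop that disappears further downstream is ignored, since routers often rate-limit their own ICMP replies.

```python
from typing import List, Optional


def first_lossy_hop(loss_by_hop: List[float],
                    threshold_pct: float = 1.0) -> Optional[int]:
    """Return the 0-based index of the hop where persistent loss begins.

    loss_by_hop[i] is the loss percentage measured to hop i, ordered from
    the monitoring server outward. Returns None if the final hop is clean.
    """
    onset = None
    for i, loss in enumerate(loss_by_hop):
        if loss >= threshold_pct:
            if onset is None:
                onset = i          # possible start of a lossy stretch
        else:
            onset = None           # loss did not persist; treat as an artifact
    return onset


if __name__ == "__main__":
    # Hop 2 drops some replies but hop 3 is clean, so the real onset is hop 4.
    print(first_lossy_hop([0.0, 0.0, 5.0, 0.0, 3.0, 4.0]))
```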
The downside to this approach is that ICMP packets are not usually given QoS treatment in the network, so testing QoS is difficult. Some networks may ban ICMP altogether, or the bursts from AppCritical may be viewed as an attack by a network intrusion detection system.
Active monitoring of endpoints & routers: This might be considered the old-fashioned approach: tools that monitor the devices along the path. If endpoint devices are measuring network statistics (which they often do), then this information can be collected either dynamically (during the call) or after the fact (from call data records, or CDRs). The endpoint is located in exactly the right place to report on the network statistics of interest, but it won't provide details of where in the network the problem was occurring. Collecting statistics from lots of endpoints often lets the network team isolate problems based on who is having issues and who is not.
Some tools will also watch the router queues to see where drops are occurring. This can help find congestion problems or Layer 2 errors such as those caused by duplex mismatches. But unless these tools are looking at every device along all the paths, they can't find all the problems the way the endpoint can. Many traditional network measurement vendors are providing this kind of tool, including NetScout and SolarWinds.
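As a rough sketch of what "watching for drops" involves, the snippet below compares two polls of per-interface drop counters and ranks the interfaces whose counters moved. It does not reflect any particular vendor's product. It assumes the counters are 32-bit values that wrap around, as SNMP Counter32 objects do. The interface labels and the `drop_deltas` function are hypothetical, and actually collecting the counters from the routers is left out.

```python
from typing import Dict, List, Tuple

COUNTER32_MOD = 2 ** 32  # 32-bit counters wrap back to zero at this value


def drop_deltas(prev: Dict[str, int], cur: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (interface, new drops) pairs between two polls, worst first."""
    deltas = []
    for ifname, now in cur.items():
        if ifname not in prev:
            continue  # first sighting of this interface; no baseline yet
        # Modulo arithmetic gives the right answer across a counter wrap.
        delta = (now - prev[ifname]) % COUNTER32_MOD
        if delta > 0:
            deltas.append((ifname, delta))
    return sorted(deltas, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    earlier = {"r1:Gi0/1": 100, "r2:Gi0/2": 4294967290, "r3:Gi0/0": 7}
    later = {"r1:Gi0/1": 100, "r2:Gi0/2": 4, "r3:Gi0/0": 9}
    for ifname, drops in drop_deltas(earlier, later):
        print(ifname, drops)
```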
CDR Analysis: I mentioned CDRs in the last section. Collecting CDR information will not help solve the problem a user is having right now, but it will provide information on how well the network is behaving overall, and allow trends to be determined. Isolation can be done by noting which endpoints are having problems and which are not. This information can then trigger a trouble ticket and more detailed tools can be focused on a specific area of the network to isolate the problem further.
NetQoS and Prognosis both have tools that capture this kind of information in VoIP networks and collect it for analysis. Video conferencing gatekeepers or management systems also collect CDR information, but they don't yet have the sophistication necessary to analyze it for network faults.
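The isolation step, noting which endpoints are having problems and which are not, might look something like the following sketch. It is not based on either vendor's product. Each CDR is reduced to a hypothetical (site, packet loss percent) pair, and the thresholds are arbitrary placeholders. A site is flagged when enough of its calls exceeded the loss limit.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def sites_needing_tickets(cdrs: Iterable[Tuple[str, float]],
                          loss_limit_pct: float = 1.0,
                          bad_call_ratio: float = 0.2,
                          min_calls: int = 5) -> List[str]:
    """Return sites where the share of lossy calls meets bad_call_ratio."""
    totals = defaultdict(int)   # calls seen per site
    bad = defaultdict(int)      # calls per site above the loss limit
    for site, loss_pct in cdrs:
        totals[site] += 1
        if loss_pct > loss_limit_pct:
            bad[site] += 1

    # Skip sites with too few calls to say anything meaningful.
    return sorted(site for site, n in totals.items()
                  if n >= min_calls and bad[site] / n >= bad_call_ratio)


if __name__ == "__main__":
    records = [("HQ", 0.1)] * 5 + [("Branch-12", 4.0)] * 3 + [("Branch-12", 0.0)] * 2
    print(sites_needing_tickets(records))
```

The flagged sites are where a trouble ticket would be opened and the more detailed tools pointed.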
Quality of Experience: There are some new, innovative vendors now pushing tools that will not only look at network faults, but will also analyze the signal waveforms in a VoIP data stream to predict the user experience. I am doing a session on this at VoiceCon in Orlando (Tuesday morning at 8:00); come on by and say hello if you are there. I'll write more about this kind of measurement in a future posting.