Network Telemetry: A New Network Management Model
Streaming network telemetry provides several key network management benefits, but network admins need to consider the trade-offs.
November 26, 2021
Enterprises looking to replace the Simple Network Management Protocol (SNMP) might consider streaming network telemetry as the data collection mechanism for network management systems. But how does it work, and what are the trade-offs?
What Is Streaming Network Telemetry?
Streaming network telemetry is an alternative method for collecting network device statistics, theoretically replacing SNMP. Of course, there are trade-offs to consider, and the answer to which is better is, as usual, that it depends.
Streaming telemetry sends data as it becomes available, providing a more real-time view of the network. SNMP’s periodic polling may miss critical data between polling intervals, such as an interface flap (down and back up) or micro-bursts. Techniques for detecting some events exist, but they require polling for additional data that must also be stored and analyzed.
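To make the polling gap concrete, here is a minimal sketch in Python. The timeline, the 60-second polling interval, and the function names are all assumptions made up for illustration. They are not taken from any particular NMS.

```python
# Hypothetical timeline of state changes on one interface: (seconds, status).
# The interface flaps: it goes down at t=70 and comes back at t=95.
events = [(0, "up"), (70, "down"), (95, "up")]

def status_at(t, events):
    """Return the interface status in effect at time t."""
    current = events[0][1]
    for ts, status in events:
        if ts <= t:
            current = status
    return current

def poll(events, interval, end):
    """What a poller sampling every `interval` seconds would record."""
    return [(t, status_at(t, events)) for t in range(0, end + 1, interval)]

def stream(events):
    """What an on-change stream would deliver: every state transition."""
    return list(events)

polled = poll(events, interval=60, end=180)
streamed = stream(events)

print("polled:  ", polled)    # samples taken at t=0, 60, 120, 180
print("streamed:", streamed)  # includes the transition to "down" at t=70
```

In this made-up timeline every polled sample reads "up", because the flap starts and ends between two polls, while the streamed view contains the "down" transition.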
Telemetry allows devices to report any meaningful change in a device statistic without the overhead of periodic polling. Simply configure the device to watch a statistic, such as interface utilization or packet loss, and send the data when a threshold is crossed. While some data analysis must occur on the device, it is frequently less demanding than handling periodic polling requests. A brief summary of the benefits of streaming telemetry is in the article: Streaming Telemetry: View from the Trenches.
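The threshold behavior can be sketched as follows. This is only an illustration of the idea, not how any vendor implements it. The sample values, the 80 percent threshold, and the function name are assumptions.

```python
def threshold_events(samples, threshold):
    """Return (index, value, state) only for samples where the value
    crosses the threshold, in either direction."""
    events = []
    above = False  # assume the statistic starts below the threshold
    for i, value in enumerate(samples):
        now_above = value >= threshold
        if now_above != above:
            events.append((i, value, "above" if now_above else "below"))
            above = now_above
    return events

# Hypothetical interface utilization readings, in percent.
samples = [12, 15, 40, 85, 91, 88, 60, 30, 95]
events = threshold_events(samples, threshold=80)

for index, value, state in events:
    print(f"sample {index}: {value}% is now {state} threshold")
```

With these made-up readings the device would send three updates instead of nine, at the cost of doing the comparison locally.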
On the other hand, SNMP relies on a history of industry-defined management information bases (MIBs) that define the device data model. Vendor-specific MIBs are used to provide extensions to the industry-standard MIBs. Telemetry definitions, in contrast, rely on YANG-based device models, some of which are being standardized but are frequently vendor-specific.
How Streaming Telemetry Works
Telemetry data streams can be established by the device or by the telemetry server. You can think of the difference between the two connection methods as dial-out versus dial-in. Devices that establish the telemetry stream (dial-out) must be configured with the addresses of the telemetry servers (there may be more than one), through the command line interface or via an API. Devices that accept the connection from the telemetry server (dial-in) may need a simpler configuration if the system supports an API through which the server can subscribe to the desired data. The subscription model makes it easier to apply centralized changes to the streamed data across the entire network, provided the system design supports it.
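A rough sketch of the server-initiated (dial-in) case shows why centralized changes are easier there: the collector holds the subscription list, so one change at the collector fans out to every device. The class names, device names, sensor path, and mode strings are hypothetical, and the subscribe request is stubbed as a tuple rather than a real network call.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subscription:
    path: str            # hypothetical sensor path
    mode: str            # "sample" or "on_change" (illustrative labels)
    interval_s: int = 0  # only meaningful for "sample"

class Collector:
    """Dial-in sketch: the collector owns the subscription list."""

    def __init__(self, devices):
        self.devices = list(devices)
        self.subscriptions = []

    def subscribe_all(self, sub):
        # One change at the collector...
        self.subscriptions.append(sub)
        # ...becomes one subscribe request per device (stubbed here).
        return [(device, sub) for device in self.devices]

collector = Collector(["edge-1", "edge-2", "core-1"])
counters = Subscription("/interfaces/interface/state/counters", "sample", 10)
requests = collector.subscribe_all(counters)

for device, sub in requests:
    print(f"subscribe {device}: {sub.path} every {sub.interval_s}s")
```

In the device-initiated (dial-out) case, the equivalent change would mean touching the configuration of each device.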
You should investigate the details of the streaming telemetry system operation and how you can detect when it isn’t working properly. For example, when the telemetry data connection fails and needs to be re-established, is it a manual process, or is there a retry timer? What mechanism exists to report when a device is no longer sending data so that you can take the appropriate action?
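If the platform does not answer these questions for you, the two checks are simple to reason about. The sketch below assumes a collector that records the time of the last message from each device, plus a capped exponential retry schedule. The class, function names, and timing values are assumptions for illustration only.

```python
class StalenessMonitor:
    """Track when each device last sent telemetry and flag silent ones."""

    def __init__(self, max_silence_s):
        self.max_silence_s = max_silence_s
        self.last_seen = {}

    def record(self, device, now):
        """Call whenever a telemetry message arrives from `device`."""
        self.last_seen[device] = now

    def stale(self, now):
        """Devices silent for longer than max_silence_s, sorted by name."""
        return sorted(
            device
            for device, ts in self.last_seen.items()
            if now - ts > self.max_silence_s
        )

def backoff_delays(base_s, cap_s, attempts):
    """Retry delays that double on each attempt, up to a cap."""
    return [min(cap_s, base_s * 2 ** n) for n in range(attempts)]

monitor = StalenessMonitor(max_silence_s=30)
monitor.record("r1", now=0)
monitor.record("r2", now=0)
monitor.record("r1", now=50)   # r1 keeps sending, r2 goes quiet

print(monitor.stale(now=60))   # r2 has been silent for 60 seconds
print(backoff_delays(base_s=1, cap_s=30, attempts=6))
```

One caveat: a device that only reports on change can be silent for legitimate reasons, so a check like this works best alongside some periodic message, such as a sampled counter or a heartbeat.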
In today’s security-centric networks, it will be important for both the device and the telemetry server to incorporate authentication and encryption of the data stream. It’s a good idea to learn from your vendors about any other security concerns they have identified.
Streaming telemetry is encoded as JSON, XML, or GPB (Google Protocol Buffers, an efficient binary encoding used by gRPC, Google’s remote procedure call framework). Sending a large volume of data via a text-based encoding like JSON or XML may impose a significant load on the sending device, so pay attention to these details. The transport can be HTTP/HTTPS over TCP, UDP, or gRPC. Transport over UDP may experience packet loss but creates less load on the device, so there’s another trade-off.
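The size difference between text and binary encodings is easy to see with a small experiment. The sketch below uses Python's standard struct module as a stand-in for a binary encoding. It is not Protocol Buffers, which uses its own wire format, and the field names and values are made up.

```python
import json
import struct

# Hypothetical interface counter sample.
sample = {
    "if_index": 12,
    "in_octets": 9876543210,
    "out_octets": 1234567890,
    "timestamp": 1637884800,
}

# Text encoding: field names and digits are sent as characters.
text = json.dumps(sample).encode("utf-8")

# Fixed binary layout in network byte order: one unsigned 32-bit index
# followed by three unsigned 64-bit values. Field names are not sent.
LAYOUT = "!IQQQ"
binary = struct.pack(
    LAYOUT,
    sample["if_index"],
    sample["in_octets"],
    sample["out_octets"],
    sample["timestamp"],
)

print(len(text), "bytes as JSON")
print(len(binary), "bytes as packed binary")

# The receiver must know the layout in advance to decode the message.
decoded = struct.unpack(LAYOUT, binary)
```

The binary form is smaller because both sides agree on the layout ahead of time, which is the same basic bargain a schema-based encoding such as GPB makes.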
The higher-level device data uses YANG data models to provide a level of device independence, so a router from one vendor uses a model similar to that used by another vendor’s router. A common YANG model greatly simplifies the NMS that ingests the telemetry.
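To see what a common model saves the NMS, consider what it has to do without one. The sketch below maps vendor-specific field names onto shared names before the data can be compared. The vendor labels and field names are invented for illustration and do not correspond to any real device model.

```python
# Hypothetical per-vendor field names mapped to a shared vocabulary.
VENDOR_MAPS = {
    "vendor_a": {"ifInBytes": "in-octets", "ifOutBytes": "out-octets"},
    "vendor_b": {"rx_bytes": "in-octets", "tx_bytes": "out-octets"},
}

def normalize(vendor, record):
    """Translate one vendor-specific record into the shared vocabulary,
    dropping any fields that have no mapping."""
    mapping = VENDOR_MAPS[vendor]
    return {mapping[key]: value for key, value in record.items() if key in mapping}

from_a = normalize("vendor_a", {"ifInBytes": 10, "ifOutBytes": 20})
from_b = normalize("vendor_b", {"rx_bytes": 10, "tx_bytes": 20, "extra": 1})

print(from_a)
print(from_b)
```

When devices already stream data in a common model, a translation table like this, and the work of maintaining it per vendor and software release, is no longer needed.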
One of the downsides to streaming telemetry is that it is a relatively new technology, and each vendor uses a different encoding and transport mechanism. The same variation applies to configuration, with differences in both CLI and APIs. The OpenNMS platform includes some streaming telemetry support, and there are various other smaller open-source projects building telemetry servers. Vendor-specific NMS platforms will clearly be the easiest path to supporting a particular vendor’s devices. You should carefully examine the types of data that can be collected and what functionality the NMS offers for data retention and alerting. For example, you may find that vendors don’t support monitoring MP-BGP and VXLAN EVPN using telemetry.
Summary
Streaming telemetry offers many advantages over traditional SNMP monitoring but with some tradeoffs that are currently limiting general acceptance. I expect the YANG model standardization effort to make it easier to adopt telemetry. Networks will gradually become more capable of supporting telemetry as more products incorporate it.
Until then, SNMP is still a valuable network monitoring protocol because of its standardization and widespread deployment. It can perform acceptably well in large networks. Several NMS products can poll interface performance data from over a million interfaces every minute from a single engine, so scaling isn’t the problem that some detractors imply.
A more important impediment to the adoption of streaming telemetry is the lack of network management platforms that support both streaming telemetry and efficient SNMP, combining the data from both in a single platform.