Monitoring a Software Defined Network, Part 3

Should the SDN monitoring system be integrated with the controller, or should it be a separate system?

Terry Slattery

February 11, 2014

Note: My discussion of SDN monitoring covers several topics. Here are two prior posts:
1. Monitoring the SDN data plane
2. What parameters to monitor in an SDN control system

Monitoring Requirements
Defining the requirements for monitoring a software-defined network (SDN) is an interesting challenge, considering the fact that the definition of an SDN itself is still under debate. However, there are several things that we know about SDNs, and we can use them to help us begin to construct a management architecture.

We know that an SDN will contain a control system. The control system could be a single controller or it could be a multi-controller system, depending on the complexity and resilience requirements. A larger network may need multiple controllers in order to distribute the load. A highly resilient system will likewise need multiple controllers in case one controller fails or loses connectivity. So let's start with a requirement to monitor multiple controllers.

SDNs are very agile: one of their benefits is the ability to adapt quickly to an application's needs. The monitoring system will need to be just as adaptable. It will need to track which network interfaces are active, and it should be able to display a map of how an application uses the network. If overlays are in use, it should show each overlay and how it maps to the physical underlay network.
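One way to picture the overlay-to-underlay mapping is as a lookup that can be inverted: given an underlay link, which overlay tunnels ride on it? The sketch below is only an illustration. The tunnel names, switch names, and the `OVERLAY_PATHS` structure are hypothetical, and a real monitoring system would learn the paths from the controller rather than from a hard-coded table.

```python
from collections import defaultdict

# Hypothetical topology: each overlay tunnel is listed with the underlay
# links (switch-to-switch hops) it currently traverses.
OVERLAY_PATHS = {
    "tunnel-web-db": [("leaf1", "spine1"), ("spine1", "leaf3")],
    "tunnel-web-cache": [("leaf1", "spine2"), ("spine2", "leaf2")],
    "tunnel-app-db": [("leaf2", "spine1"), ("spine1", "leaf3")],
}


def build_underlay_index(overlay_paths):
    """Invert the overlay-to-underlay map: underlay link -> overlay tunnels."""
    index = defaultdict(set)
    for tunnel, hops in overlay_paths.items():
        for a, b in hops:
            # Store links direction-independently so (a, b) matches (b, a).
            index[frozenset((a, b))].add(tunnel)
    return index


def tunnels_on_link(index, a, b):
    """Return the overlay tunnels riding the underlay link between a and b."""
    return sorted(index.get(frozenset((a, b)), set()))


index = build_underlay_index(OVERLAY_PATHS)
# Which overlays are affected if the spine1-leaf3 link degrades?
affected = tunnels_on_link(index, "leaf3", "spine1")
print(affected)
```

Because the index has to be rebuilt whenever the controller reroutes a tunnel, the mapping is only as current as the last update the monitoring system received.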

We explored monitoring of the data plane components in the first post of this series; the monitoring requirements there are no different from those of a traditional network management system. Since much of the data plane monitoring is based on periodic performance measurements, it makes sense to use a time-series database to store the data. Note: I've never seen a relational database handle the volumes of data produced by a network performance monitoring system; a time-series database is the proper tool for this job.
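To make "periodic performance measurements" concrete, here is a minimal sketch of the kind of record a time-series store holds: timestamped readings of a cumulative interface counter, converted to rates. The `Sample` type and the `rates_bps` function are hypothetical names chosen for this illustration, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: int  # seconds since some epoch
    octets: int     # cumulative byte counter read from the interface


def rates_bps(samples, counter_bits=64):
    """Turn consecutive cumulative counter samples into bits per second."""
    modulus = 1 << counter_bits
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        elapsed = cur.timestamp - prev.timestamp
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        # Modular subtraction tolerates a single counter wrap between polls.
        delta = (cur.octets - prev.octets) % modulus
        rates.append((cur.timestamp, delta * 8 / elapsed))
    return rates


series = [Sample(0, 1_000), Sample(60, 751_000), Sample(120, 751_000)]
print(rates_bps(series))
```

Each interface and counter produces one such series, appended to at every poll, which is the access pattern time-series databases are built around.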

Data will need to be collected from all the interfaces in the data plane (that's the only way to detect certain classes of errors) and stored. This requirement dictates the use of high-throughput disk I/O. We will also need to gather the data from physical switches, so the monitoring system will need network connectivity to those switches.
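A rough back-of-the-envelope calculation shows why collecting from every interface drives the I/O requirement. Every number below (interface count, counters per interface, polling interval, bytes per stored sample) is an assumption picked for illustration, not a measurement from any real network.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(interfaces, counters_per_interface, poll_interval_s):
    """Number of counter samples collected in one day."""
    polls_per_day = SECONDS_PER_DAY // poll_interval_s
    return interfaces * counters_per_interface * polls_per_day


# Assumed figures: 50,000 interfaces, 10 counters each, polled every 60 s,
# with 16 bytes stored per sample (timestamp plus value, before indexing).
daily_samples = samples_per_day(50_000, 10, 60)
daily_bytes = daily_samples * 16
avg_writes_per_second = daily_samples / SECONDS_PER_DAY

print(f"{daily_samples:,} samples/day")
print(f"{daily_bytes / 1e9:.2f} GB/day before indexing and replication")
print(f"{avg_writes_per_second:,.0f} sample writes/second on average")
```

Under these assumptions the system ingests 720 million samples a day, and shortening the polling interval or adding counters scales the load linearly.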

Non-performance data, such as hardware and software inventory, device configurations, policy definitions, and functional data, are much lower in volume and change much less frequently, making them suitable for storage in a relational database.

The network administrators will need to use the monitoring system to turn the raw data into actionable information that can be presented through a graphical user interface. Most network management systems now use a web-based GUI that is accessible from almost anywhere. Data warehouse technology and big-data technology can be used to transform raw data into actionable information, reducing the volume of information and making the GUI very responsive.

The Architecture
What should the architecture of the monitoring system be? Should it be based on a distributed system, a centralized system, or something else? Should it be built into the SDN control system? Or should it be a separate system?

There is a temptation to embed the monitoring system within the controller, since that is where all the data about the state of the system resides. This approach will work as long as the monitoring system can merge data from multiple controllers. However, in the past, the network monitoring development staff has been separate from the network equipment development staff within vendor organizations. Decoupling the two groups allows each group to ship products independently of each other, which is an advantage to companies that are attempting to get to market quickly.

Separating the controller and the monitoring system also means that a different company can develop each product. Each company can then focus on what it does best. I also expect to see existing network management products incorporate an SDN monitoring and management function, alongside their existing functionality.

Another argument in favor of building the monitoring system into the SDN controller is that the architecture could result in lower disk I/O requirements on each controller's monitoring system. But a distributed monitoring system architecture could provide the same benefit without being integrated into the SDN controller.

A distributed monitoring system architecture has a lot of benefits. Since the system is distributed, this architecture could result in lower disk I/O requirements on each monitoring component, which is an advantage. Local processing of the collected data may also be an advantage in this architecture, although moving the data to a data warehouse solution may be a better alternative. A monitoring system using this approach will need to make sure that it provides a view of the overall network.

Now let's examine the contrary point of view. Integrating an SDN management system within the controller creates a software update problem. As we've historically seen, network management software gets released on a different schedule than network control software. Decoupling the SDN controller from the management system makes it easy to perform software upgrades on both components.

It also makes a lot of sense to take advantage of x86-based server virtualization to run the management software, just as other network functions are being virtualized. This makes it much easier to perform software maintenance and upgrades. And when the hardware needs to be upgraded, simply migrate the management VM to a new platform.

The only problem with this approach is that network management systems tend to need high-performance I/O subsystems for storing and processing the collected data. However, by creating a distributed architecture that partitions the data-handling task into enough pieces, no single component has to sustain the full I/O load, and the performance problem disappears.
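Partitioning the data-handling task can be as simple as assigning each monitored device to exactly one collector. The sketch below uses a stable hash of the device name to do that. The device and collector names are hypothetical, and hashing is just one possible partitioning scheme, not something the architecture requires.

```python
import hashlib


def assign_collector(device_id, collectors):
    """Pick a collector for a device using a hash that is stable across runs."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[bucket]


def partition(devices, collectors):
    """Group devices by the collector responsible for polling and storing them."""
    shards = {collector: [] for collector in collectors}
    for device in devices:
        shards[assign_collector(device, collectors)].append(device)
    return shards


devices = [f"switch-{n:03d}" for n in range(12)]
collectors = ["collector-a", "collector-b", "collector-c"]
shards = partition(devices, collectors)
for collector, assigned in shards.items():
    print(collector, len(assigned), assigned)
```

A plain modulo like this reassigns many devices whenever the number of collectors changes; a consistent-hashing scheme would reduce that churn if collectors are started and stopped often.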

Summary
I predict that future network management systems (NMSs) will start to look like many of the big data applications that we see enterprises using today. There is no compelling reason to integrate the SDN NMS into the controller. While we need to collect data from the SDN controllers, that can be done through northbound APIs that provide a clean interface. One SDN NMS collector can then monitor multiple SDN controllers.
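As a sketch of one collector monitoring several controllers, the code below treats each controller's northbound API as a plain callable that returns a list of switch records. The callables, controller names, and record fields are all hypothetical stand-ins; a real collector would call whatever northbound interface each controller actually exposes.

```python
from typing import Callable, Dict, List, Tuple

# A fetcher stands in for one northbound API call to one controller.
Fetcher = Callable[[], List[dict]]


def collect(controllers: Dict[str, Fetcher]) -> Tuple[Dict[str, dict], List[str]]:
    """Poll every controller and merge the results into a single network view."""
    view: Dict[str, dict] = {}
    unreachable: List[str] = []
    for name, fetch in controllers.items():
        try:
            switches = fetch()
        except ConnectionError:
            # A failed controller must not hide the rest of the network.
            unreachable.append(name)
            continue
        for switch in switches:
            view[switch["id"]] = {**switch, "controller": name}
    return view, unreachable


def east() -> List[dict]:
    return [{"id": "sw-1", "ports_up": 48}, {"id": "sw-2", "ports_up": 24}]


def west() -> List[dict]:
    return [{"id": "sw-3", "ports_up": 12}]


def lab() -> List[dict]:
    raise ConnectionError("controller did not respond")


view, unreachable = collect({"ctrl-east": east, "ctrl-west": west, "ctrl-lab": lab})
print(sorted(view))
print(unreachable)
```

Keeping the controller behind a narrow interface like `Fetcher` is what lets the NMS and the controller be upgraded on separate schedules.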

With a distributed architecture, the NMS can handle large-scale SDN monitoring, multiple SDN domains, and non-SDN networking infrastructure, providing a view of the entire network. With the right internal architecture and components, the NMS can adapt to changes in the network, starting and stopping monitoring VMs as needed. In this way the NMS becomes as flexible and as agile as the underlying network.

Want a deep dive into SDN and its impact on communications? Attend Terry Slattery's workshop on the topic at Enterprise Connect Orlando 2014!

About the Author

Terry Slattery


Terry Slattery is a Principal Architect at NetCraftsmen, an advanced network consulting firm that specializes in high-profile and challenging network consulting jobs.  Terry works on network management, SDN, network automation, business strategy consulting, and network technology legal cases. He is the founder of Netcordia, inventor of NetMRI, has been a successful technology innovator in networking during the past 20 years, and is co-inventor on two patents. He has a long history of network consulting and design work, including some of the first Cisco consulting and training. As a consultant to Cisco, he led the development of the current Cisco IOS command line interface. Prior to Netcordia, Terry founded Chesapeake Computer Consultants, a Cisco premier training and consulting partner.  Terry co-authored the successful McGraw-Hill text "Advanced IP Routing in Cisco Networks," is the second CCIE (1026) awarded, and is a regular speaker at Enterprise Connect. He blogs at nojitter.com and netcraftsmen.com.