4 Tips to Reduce Network MTTR

Automation and collaborative troubleshooting round out three basics of network operations, opening a new world of quickly diagnosing problems.

Terry Slattery

January 5, 2021

We all wish we could reduce the mean time to repair (MTTR) of network outages. Here are four tips for doing so; each optimizes part of the diagnostic process, and together they yield big benefits.

The People, Process, and Technology Framework

IT operations revolves around a triad of people, process, and technology. Sharp people, working with known and well-rehearsed processes and using current technology, can produce remarkable results. We’ll see how the tips relate to the triad.

Figure: The people, process, and technology triad

Tip 1: Employ Trusted Network Experts

Your network experts, whether employees or contractors, are the most important element. If you use contractors, it is best if they work on your network regularly so that they learn your business and how the network supports it.

In addition to learning how the network functions, they will learn its idiosyncrasies, which are where failures and slowdowns are most likely to occur. It is this knowledge that enables your sharp people to make intuitive leaps about the potential causes of problems.

Tip 2: Create Good Network Documentation

You’ll need good system documentation and network baseline data to validate what the network should look like and how it should function. If the necessary documentation doesn’t exist, take the time to create it. This is a critical process. Good documentation comprises:

  1. Network diagrams that show both the physical and the logical connectivity that are so important to troubleshooting. Creating a single diagram that shows both can be challenging, so you may need multiple diagrams. You should be able to follow a network path between any two points and identify places to gather data or test hypotheses.

  2. Written policies that describe the network’s design, operation, and future growth. Policies should describe things like the network segmentation paradigms, addressing plan, site interconnectivity mechanisms, network management goals, and routing/switching policies.

  3. Documentation for network equipment refresh planning, upgrades to new technologies, and growth plans. Make sure to include diagnostic tools that are specific to any new technologies.

  4. Runbooks that describe typical problems and the techniques that have worked in the past to diagnose them. A well-written runbook for a single scenario should allow a more junior network engineer to diagnose and remediate common problems successfully.
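
Diagrams serve troubleshooting best when the same connectivity is also kept in a machine-readable form that a script can walk. The sketch below is illustrative only: it assumes a hypothetical topology recorded as an adjacency list (`TOPOLOGY` and device names such as `core1` are made up for this example) and uses breadth-first search to list the hops between two points, each of which is a candidate place to gather data or test a hypothesis.

```python
from __future__ import annotations

from collections import deque

# Hypothetical documented topology: device -> directly connected neighbors.
# Names are illustrative; a real inventory would come from your own documentation.
TOPOLOGY = {
    "branch-rtr1": ["wan-hub1"],
    "wan-hub1": ["branch-rtr1", "core1"],
    "core1": ["wan-hub1", "core2", "dist1"],
    "core2": ["core1", "dist2"],
    "dist1": ["core1", "access1"],
    "dist2": ["core2", "access2"],
    "access1": ["dist1"],
    "access2": ["dist2"],
}


def find_path(topology: dict[str, list[str]], src: str, dst: str) -> list[str] | None:
    """Return a fewest-hop path from src to dst, or None if unreachable.

    Breadth-first search visits devices in order of hop count, so the first
    time dst is dequeued the recorded path is a shortest one (by hops, not by
    routing metric).
    """
    if src not in topology or dst not in topology:
        return None
    previous: dict[str, str | None] = {src: None}  # device -> device we came from
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor links back to src, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in topology.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    print(find_path(TOPOLOGY, "branch-rtr1", "access2"))
```

A fewest-hop path is only an approximation of the real forwarding path; where routing policy, ECMP, or failover matters, the logical diagrams and the devices' routing tables remain the authority.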

Tip 3: Develop Consistent Network Building Block Designs

Another process element is the use of consistent network building block designs, which yields significant gains in simplification, documentation, monitoring, and troubleshooting. You should tie the building block designs to equipment refresh cycles. Each cycle may, but doesn’t have to, result in a slightly different design and new equipment with new configurations. Occasionally, a significant change will drive an entirely new design paradigm, such as the switch from MPLS to SD-WAN (and the emerging shift to secure access service edge, or SASE). That may be the opportunity to implement a more widespread change, if the savings offset any residual value or cost of the old implementation. Note that you’ll need new design and troubleshooting documentation to go with any changes you adopt in the building block designs.

Don’t fall for enticements to adopt shiny new products and features or to switch vendors. Instead, implement changes only when there is a sound reason for them. Standardization means that you will sometimes give up some of these things to make the network easier to monitor, manage, and troubleshoot. The place for new technology is in the lab, during the process of creating new building block designs.

Tip 4: Accelerate Diagnosis with Automation

Gone are the days of manually logging into network equipment and collecting troubleshooting information from the command line interface. Network automation (the technology component in the diagram above) is not just for deploying new configurations. Using automation to rapidly collect and correlate the same data that engineers would otherwise gather by hand accelerates the diagnostic process. And because collecting diagnostic data is a read-only operation, automating it poses no risk to the network, which answers a common objection to automation.
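
As a minimal sketch of the read-only collection step, the Python below assumes a hypothetical `run_command(device, command)` callable that you would back with your own SSH or API library; `fake_runner` stands in for it so the example runs offline. It sends the same `show` commands to several devices in parallel and then does a simple correlation pass that flags interfaces with nonzero error counters. The command strings, device and interface names, and the two-column output format are assumptions, not any particular vendor's CLI output.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Read-only commands only: this step gathers data and never changes configuration.
# The command strings are illustrative; use the equivalents for your platforms.
DIAGNOSTIC_COMMANDS = ["show interfaces counters errors", "show ip route summary"]

# Hypothetical transport signature: (device, command) -> raw text output.
CommandRunner = Callable[[str, str], str]


def collect(device: str, run_command: CommandRunner) -> dict[str, str]:
    """Run every diagnostic command on one device; return {command: output}."""
    return {cmd: run_command(device, cmd) for cmd in DIAGNOSTIC_COMMANDS}


def collect_all(devices: list[str], run_command: CommandRunner,
                workers: int = 8) -> dict[str, dict[str, str]]:
    """Collect from many devices concurrently, keyed by device name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input devices.
        results = pool.map(lambda dev: collect(dev, run_command), devices)
        return dict(zip(devices, results))


def interfaces_with_errors(results: dict[str, dict[str, str]]) -> list[tuple[str, str, int]]:
    """Correlate across devices: (device, interface, errors) for nonzero counters.

    Assumes a simplified "<interface> <count>" line format; real output
    needs a parser matched to your platform.
    """
    flagged = []
    for device, outputs in results.items():
        for line in outputs.get("show interfaces counters errors", "").splitlines():
            parts = line.split()
            if len(parts) == 2 and parts[1].isdigit() and int(parts[1]) > 0:
                flagged.append((device, parts[0], int(parts[1])))
    return flagged


def fake_runner(device: str, command: str) -> str:
    """Offline stand-in for a real SSH/API call, used only for this example."""
    if command == "show interfaces counters errors":
        return "Gi0/1 0\nGi0/2 17" if device == "dist1" else "Gi0/1 0"
    return "ok"


if __name__ == "__main__":
    data = collect_all(["core1", "dist1"], fake_runner)
    print(interfaces_with_errors(data))
```

The command list is fixed and contains only `show` commands. Keeping the transport behind a single callable means the same collector can be reused with a different SSH or API library without touching the correlation logic.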

Coupling automation with trouble-ticketing systems and unified communications (UC) collaboration tools yields a powerful system that can collect diagnostic data quickly and push the results into a chat space, where the network team, regardless of location, can view them and collaborate on troubleshooting. This mode of operation has a name: ChatOps. It is a true paradigm shift in network troubleshooting, one that promises to reduce MTTR.
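
The ChatOps hand-off can be sketched as formatting the findings and posting them to a chat channel. This example assumes a chat platform that accepts an incoming-webhook POST with a JSON body containing a `text` field (Slack's incoming webhooks work this way; check your own platform's documentation). `WEBHOOK_URL` and the ticket IDs are placeholders, and findings are assumed to be hypothetical (device, interface, error count) tuples.

```python
from __future__ import annotations

import json
import urllib.request

# Placeholder: replace with the incoming-webhook URL your chat platform issues.
WEBHOOK_URL = "https://chat.example.com/hooks/PLACEHOLDER"


def format_findings(ticket_id: str, flagged: list[tuple[str, str, int]]) -> str:
    """Turn collected findings into a short message tied to the trouble ticket."""
    if not flagged:
        return f"[{ticket_id}] Diagnostics collected: no interface errors found."
    lines = [f"[{ticket_id}] Diagnostics collected: {len(flagged)} interface(s) with errors"]
    lines += [f"- {dev} {intf}: {count} errors" for dev, intf, count in flagged]
    return "\n".join(lines)


def build_chat_request(message: str, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the message in a 'text' field."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def post_to_chat(message: str, url: str = WEBHOOK_URL) -> int:
    """Send the message to the webhook and return the HTTP status code."""
    with urllib.request.urlopen(build_chat_request(message, url), timeout=10) as resp:
        return resp.status
```

`build_chat_request` is kept separate from `post_to_chat` so the message and payload can be checked without sending anything; a ticketing-system integration could call `post_to_chat` with the ticket's ID once diagnostic collection finishes.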

Summary

Excellence in operations begins with the basic framework: people, process, and technology. The integration of these elements, and the depth of their use, is what produces gains in troubleshooting. The first three tips have been around for as long as we’ve been doing networking and should be part of basic network operations. The last tip, the use of automation, saw only sporadic use until recently, when the scale of networks made automation a necessity. Automating troubleshooting data collection and pairing it with network team collaboration tools has opened a whole new world for reducing the time to diagnose network problems.

About the Author

Terry Slattery


Terry Slattery is a Principal Architect at NetCraftsmen, an advanced network consulting firm that specializes in high-profile and challenging network consulting jobs. Terry works on network management, SDN, network automation, business strategy consulting, and network technology legal cases. He founded Netcordia, invented NetMRI, has been a successful technology innovator in networking for the past 20 years, and is co-inventor on two patents. He has a long history of network consulting and design work, including some of the first Cisco consulting and training. As a consultant to Cisco, he led the development of the current Cisco IOS command line interface. Prior to Netcordia, Terry founded Chesapeake Computer Consultants, a Cisco premier training and consulting partner. Terry co-authored the successful McGraw-Hill text "Advanced IP Routing in Cisco Networks," holds CCIE #1026, the second CCIE ever awarded, and is a regular speaker at Enterprise Connect. He blogs at nojitter.com and netcraftsmen.com.