One Beer Too Many

The "One Beer Too Many" problem occurs in IT systems when an event occurs and the system cannot recover without manual intervention.

Sorell Slaymaker

March 10, 2009

One challenge to the availability of centralized IP telephony (IPT) systems is recovering from network outages. The "One Beer Too Many" problem occurs in IT systems when an event occurs and the system cannot recover without manual intervention: the demand on system resources to recover from the event is greater than what the system can provide. In college, when one drank more beer than the body could handle, bad things would happen. The mantra was to avoid the "One Beer Too Many" problem.

When voice travels over an IP network and the network stops passing traffic for more than 5 seconds, users hang up and try to redial. If the outage lasts long enough, the IPT phones also try to re-register. This sudden burst of registration and call setup requests can overwhelm an IPT system. Once overwhelmed, the system exhausts its resources trying to handle the incoming requests, while the endpoints keep sending new ones, especially when they get intermittent responses. As a result, a network outage of a few seconds can leave an IPT system unavailable for hours.
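
Why the system may never recover on its own can be seen in a toy Python simulation. Everything in it is an illustrative assumption rather than how any IPT product behaves: every phone loses its registration at once, each phone abandons an unanswered request after a fixed timeout and sends a new one, and the server answers requests first-in, first-out at a fixed rate. Answers to requests a phone has already abandoned are wasted work. The function `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(phones, capacity, timeout, seconds, max_queue=None):
    """Toy re-registration storm; returns how many phones end up registered.

    Assumptions (illustrative only): every phone lost its registration in
    the outage and retries at t=0; a phone abandons an unanswered request
    after `timeout` seconds and sends a new one; the server answers at most
    `capacity` requests per second in FIFO order. If `max_queue` is set,
    requests that arrive when the queue is full are rejected at once.
    """
    queue = deque()      # pending requests as (phone, time_sent)
    last_sent = {}       # phone -> time of its most recent attempt
    registered = set()
    for t in range(seconds):
        # Phones that are unregistered and have waited `timeout` (re)try.
        for p in range(phones):
            if p in registered or t - last_sent.get(p, -timeout) < timeout:
                continue
            if max_queue is None or len(queue) < max_queue:
                queue.append((p, t))
            last_sent[p] = t  # a rejected attempt also restarts the wait
        # The server spends its capacity on the oldest requests first.
        for _ in range(min(capacity, len(queue))):
            p, sent = queue.popleft()
            if sent == last_sent[p] and t - sent < timeout:
                registered.add(p)  # the phone was still waiting: success
            # otherwise the phone already gave up: wasted work


    return len(registered)


if __name__ == "__main__":
    # Unbounded queue: stuck at 500 even after five simulated minutes.
    print(simulate(2000, 100, 5, 300))                 # 500
    # Queue capped at one timeout's worth of work: everyone in 20 s.
    print(simulate(2000, 100, 5, 20, max_queue=500))   # 2000
```

With 2,000 phones, 100 registrations per second, and a 5-second timeout, the unbounded queue registers the first 500 phones and then only ever reaches requests their senders have abandoned, so 1,500 phones stay unregistered indefinitely. Capping the queue at what the server can answer within one timeout (100 × 5 = 500) and rejecting the excess immediately lets every phone register in 20 seconds, because rejected phones retry later instead of piling stale work onto the server.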

Traditional phone systems were distributed, with a PBX at every business site. With IPT, call control functions are being centralized to lower costs (through economies of scale and support) and to enable the integration of voice with other services.

Skype experienced a similar problem in August 2007, when a Windows update caused a massive number of Skype clients to re-register within a short period of time. I have seen large IPT deployments suffer the same problem. For example, in a virtual call center model, if the entire WAN stops passing traffic for a few seconds, all the callers and agents hang up and immediately try to redial or reconnect.

To avoid the "One Beer Too Many" problem in centralized IPT systems, one should:

* Throttle--Limit the number of call setup and registration requests per second. For example, Session Border Controllers (SBCs) can limit the number of new inbound calls from a carrier.

* Headroom--Design server clusters with enough capacity to handle peak demand, defined as the number of calls or registrations handled per second.

* Sub-Second Network Rerouting--Design the core IP network to reroute in under a second, and design dual-carrier MPLS WAN sites to reroute in under 5 seconds.

* Test--Have a capacity testing plan and run it after major changes. Empirix and other vendors provide on-site and hosted voice capacity testing solutions.

* Monitor--Track 5 to 7 key performance indicators (KPIs) on all network and IPT equipment. Set green, yellow, and red thresholds, raise alarms when they are crossed, and create monthly reports from this data. Core routing topology changes should also be monitored.

* Definition of an Outage--IP network managers should define a network incident as any event in which IP traffic flow is impeded for more than 5 seconds. Service Level Agreements (SLAs) should be defined not only in terms of latency, jitter, dropped packets, and overall availability, but also in terms of the number of incidents.
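
A token bucket is one common way to enforce the per-second limit described in the Throttle item. The Python sketch below is generic and hypothetical; it is not how any particular SBC implements call admission control. Tokens refill at `rate` per second up to `burst`, each admitted request spends one token, and a request that finds the bucket empty is rejected rather than queued. The current time is passed in explicitly (in practice it might come from `time.monotonic()`) so the behavior is easy to test.

```python
class TokenBucket:
    """Admit at most `rate` requests per second, with bursts up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate      # tokens added per second
        self.burst = burst    # bucket size: the largest burst admitted
        self.tokens = burst   # start with a full bucket
        self.last = now       # time of the previous refill, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        # Refill for the time elapsed since the last call, capped at `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # reject (e.g., answer "busy") instead of queueing


if __name__ == "__main__":
    # Hypothetical limits: 50 new requests/s, bursts of up to 100.
    bucket = TokenBucket(rate=50, burst=100)
    # 1,000 phones retry at the same instant; only the burst gets through.
    print(sum(bucket.allow(0.0) for _ in range(1000)))  # 100
    # One second later, 50 more tokens have accumulated.
    print(sum(bucket.allow(1.0) for _ in range(1000)))  # 50
```

Rejecting excess requests immediately, rather than holding them, keeps the server from spending its capacity on requests whose senders will time out anyway.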
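
The Headroom item comes down to simple arithmetic. Assuming requests are throttled to the cluster's rate, so none are wasted on timeouts, the time to recover is the number of endpoints divided by the registrations the cluster can handle per second, and the same division run backward gives the capacity needed for a target recovery time. The phone counts and rates in this Python sketch are hypothetical.

```python
import math


def recovery_seconds(endpoints, capacity_per_sec):
    """Time for every endpoint to re-register, assuming requests are
    throttled to the cluster's rate so none are wasted on timeouts."""
    return math.ceil(endpoints / capacity_per_sec)


def required_capacity(endpoints, target_seconds):
    """Registrations per second needed to recover within target_seconds."""
    return math.ceil(endpoints / target_seconds)


if __name__ == "__main__":
    # Hypothetical site: 20,000 phones, cluster rated for 250 registrations/s.
    print(recovery_seconds(20_000, 250))  # 80
    # Capacity needed to bring all 20,000 back within 30 seconds.
    print(required_capacity(20_000, 30))  # 667
```

Without throttling, this division understates recovery time, because retries from timed-out requests add load the formula does not count.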
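
The green, yellow, and red thresholds from the Monitor item can be evaluated with a few lines of Python. The KPI names and threshold values below are hypothetical placeholders; real values would come from the equipment's own monitoring data and capacity tests. Some KPIs get worse as they rise (CPU load) and others as they fall (registration success rate), so each threshold records its direction.

```python
# Hypothetical KPIs: name -> (yellow threshold, red threshold, higher_is_worse)
THRESHOLDS = {
    "cpu_percent": (70, 90, True),
    "registrations_per_sec": (150, 200, True),
    "registration_success_percent": (99, 95, False),
}


def kpi_status(value, yellow, red, higher_is_worse=True):
    """Classify one KPI sample as 'green', 'yellow', or 'red'."""
    if not higher_is_worse:
        # Negate everything so a single comparison path handles both directions.
        value, yellow, red = -value, -yellow, -red
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "green"


def alarms(samples):
    """Return {kpi: status} for every sample that is not green."""
    out = {}
    for name, value in samples.items():
        yellow, red, higher_is_worse = THRESHOLDS[name]
        status = kpi_status(value, yellow, red, higher_is_worse)
        if status != "green":
            out[name] = status
    return out


if __name__ == "__main__":
    print(alarms({
        "cpu_percent": 85,
        "registrations_per_sec": 40,
        "registration_success_percent": 93,
    }))  # {'cpu_percent': 'yellow', 'registration_success_percent': 'red'}
```

The same per-sample statuses, collected over a month, would feed the monthly reports.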
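
Under the Definition of an Outage item, incidents can be counted from the timestamps of packets observed on a path, for example from probes sent once per second (an assumed measurement setup, not one the article prescribes). Any gap longer than 5 seconds counts as an incident; the probe interval limits how precisely each gap is measured. The function name `find_incidents` is hypothetical.

```python
def find_incidents(arrivals, threshold=5.0):
    """Return (start, duration) for each gap between consecutive packet
    arrivals (sorted timestamps, in seconds) longer than `threshold`."""
    return [
        (a, b - a)
        for a, b in zip(arrivals, arrivals[1:])
        if b - a > threshold
    ]


if __name__ == "__main__":
    # Hypothetical probe log: a 3 s gap (not an incident) and an 8 s gap (one).
    arrivals = [0, 1, 2, 5, 6, 14, 15]
    print(find_incidents(arrivals))       # [(6, 8)]
    print(len(find_incidents(arrivals)))  # 1
```

Summing these counts per month gives the incident figure an SLA could be written against, alongside latency, jitter, loss, and availability.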

IP networks will have outages. The goal is not only to prevent outages, but also to recover quickly when they do occur. Thanks to redundancy, most network outages are caused not by a device failing outright, but by a device that is sick and cannot keep up with demand. The same is true of IPT. As they say in the corporate world: One "aw sh*t" negates a hundred "attaboys."

About the Author

Sorell Slaymaker

Sorell Slaymaker has 25 years of experience designing, building, securing, and operating IP networks and the communication services that run across them. His mission is to help make communication easier and cheaper, since he believes that the more we all communicate, the better we are. Prior to joining 128 Technology as an Evangelist in 2016, Sorell was a Gartner analyst covering networking and communications. Sorell graduated from Texas A&M with a B.S. in Telecom Engineering, and went through the M.E. Telecom program at the University of Colorado.

On the weekends, Sorell enjoys being outside gardening, hiking, biking, or X-skiing. He resides in St. Paul, Minn., where he has grown to appreciate all four seasons of the year, including camping in January.