According to Forbes contributor Micheline Maynard’s July 8, 2015 article, “United’s Global Computer Outage Was A Big Communications Fail,” United said it suffered a “connectivity issue.” In IT, that’s also known as a network issue.
What could have caused this? What could have prevented this, and why did it take so long to diagnose and repair?
If a company does not have up-to-date and complete visibility into 100% of its computer network topology, it is, like United, at risk of major network outages. Without 100% coverage of its computer networks, finding and fixing the cause of an outage or slowdown is time-consuming. The man-hours and SWAT-team approach dedicated to finding and fixing the problem, as well as lost business opportunities, can be extremely costly. It’s well documented that in today’s world of e-commerce, the competition is one click away.
Additionally, Twitter, Facebook, Snapchat and other social media enable bad service experiences to trend at lightning speed across the globe, thereby damaging a company’s reputation and impacting future business. For example, the news channels were full of customers vowing never to fly United again. The impact of this outage may be felt for years to come.
How can network outages like the one at United be avoided? Early detection of network issues is the key, and network monitoring is the answer.
Companies like United all monitor their networks. The greatest challenge is monitoring the entire network, that is to say, 100% coverage of all network infrastructure. The fact is, not all network monitoring solutions are up to the task.
Many commercially available solutions do not automatically detect changes in network architecture. If networking equipment is added and goes undetected, it cannot be monitored. Unmonitored network equipment creates exposure to security breaches, to faults and slowdowns that impact the end-user experience, and to outages. Furthermore, because the network equipment is unknown and undocumented, it’s often difficult to physically locate, thereby increasing mean time to recovery and repair.
What’s needed is a monitoring solution that continuously searches for changes in network topology and alerts network managers when there’s a change, specifically if that change is an unmonitored piece of network equipment. Then, and only then, can one ensure that 100% of a company’s critical IT network is protected from faults and failures like the one United experienced in July of 2015.
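To make the idea concrete, here is a minimal Python sketch of the comparison step such a solution might run after each discovery pass. It is an illustration under stated assumptions, not a description of any particular product: the function names, the `TopologyChange` type, and the device names are all hypothetical, and the discovery scan itself (however the list of devices on the network is obtained) is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyChange:
    """Result of comparing two discovery snapshots (hypothetical type)."""
    added: frozenset        # devices seen now but not in the previous scan
    removed: frozenset      # devices seen previously but missing now
    unmonitored: frozenset  # devices on the network that nobody is monitoring


def detect_changes(previous, current, monitored):
    """Compare the latest scan with the prior scan and the monitoring inventory."""
    previous, current, monitored = set(previous), set(current), set(monitored)
    return TopologyChange(
        added=frozenset(current - previous),
        removed=frozenset(previous - current),
        unmonitored=frozenset(current - monitored),
    )


def build_alerts(change):
    """Turn a TopologyChange into messages for network managers."""
    alerts = []
    # Unmonitored equipment is the highest-priority finding, so list it first.
    for device in sorted(change.unmonitored):
        alerts.append(f"UNMONITORED: {device} is on the network but not monitored")
    # Newly added devices that are already monitored are informational only.
    for device in sorted(change.added - change.unmonitored):
        alerts.append(f"ADDED: {device} appeared since the last scan")
    for device in sorted(change.removed):
        alerts.append(f"REMOVED: {device} disappeared since the last scan")
    return alerts


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    last_scan = {"core-sw-1", "edge-rtr-1"}
    this_scan = {"core-sw-1", "edge-rtr-1", "access-sw-9"}
    inventory = {"core-sw-1", "edge-rtr-1"}
    for line in build_alerts(detect_changes(last_scan, this_scan, inventory)):
        print(line)
```

In practice this comparison would run on a schedule, so that a newly connected switch or router is flagged soon after it appears rather than being discovered during an outage.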