NMS Misconceptions: “Our Network Won’t Crash. We Have Redundancy.”

The Redundancy Myth

Most IT personnel rightly associate “stability” with “not crashing.” The industry reflects this way of thinking, standardizing on mean time between failures (MTBF) as an accepted measure of reliability. But with the advances in hardware and communications protocols, and natural disasters aside, you may be hard pressed to remember the last time you had a complete network outage. In fact, modern network architectures build in redundancy so deliberately that complete outages are largely a thing of the past. That can create a false sense of security, however, and cause you to miss instability issues that are more pervasive and increasingly frequent.

Financial Services Company Lacks Link-Level Visibility

That was precisely the experience of a major financial services company before using Entuity. They had already invested millions of dollars in a network management framework and spent thousands of labor months deploying the application. Unfortunately, the framework lacked the link-level visibility to indicate that the primary fiber connection between the company’s two main campuses had been down for some time.

The cause? A construction crew installing new utility poles alongside the parking lot severed the backup fiber link. Ordinarily, losing one link would not have been a problem, provided the IT staff had known that the primary link had already failed and had been able to fix it right away. But because their framework solution couldn’t see that failure, the staff didn’t know that their redundancy had been compromised and that they had been running over the backup link for weeks before it was severed. Beyond the panic, the overtime costs to recover, and the emergency service call to the fiber repair contractor, the outage affected millions in financial transactions for the company. And the framework solution, despite being a huge investment, had failed to alert them to the risk to their redundancy.

Had the company been using Entuity, they would have had the benefit of several features that would have quickly warned the IT staff of trouble ahead. These features include a continually updated inventory; topology maps that provide clear visual indicators of the status of devices, ports, links, and services; and an embedded Event Management System (EMS) that eliminates event storms and lets staff focus on the most important events. Entuity also enables service-level monitoring, which would have allowed the team to define a service consisting of the primary and backup links.

By monitoring their redundancy as a service, the team would have had port-level visibility. They would have known that the port for the backup connection, which was normally down, had come up (meaning they were now running on an expensive link) and that the primary link had failed. They would also have known about the subsequent failure of the backup.
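The idea of treating a primary/backup pair as a single monitored service can be sketched in a few lines. This is a minimal illustration only, not Entuity's implementation or API: the `RedundancyState` names and the `redundancy_state` function are hypothetical, and the sketch assumes the up/down status of each port has already been collected (for example, by polling).

```python
from enum import Enum


class RedundancyState(Enum):
    """Hypothetical health states for a primary/backup link pair."""
    PROTECTED = "protected"      # primary carries traffic, backup port idle
    UNPROTECTED = "unprotected"  # running on the backup; one failure from an outage
    UNEXPECTED = "unexpected"    # both ports up; the backup is normally down
    OUTAGE = "outage"            # neither link is carrying traffic


def redundancy_state(primary_up: bool, backup_up: bool) -> RedundancyState:
    """Derive the state of the service from the two port states.

    Assumes, as in the scenario above, that the backup port is normally
    down and only comes up when traffic fails over to it. Port status
    alone cannot tell an idle backup from a severed one.
    """
    if primary_up and not backup_up:
        return RedundancyState.PROTECTED
    if primary_up and backup_up:
        return RedundancyState.UNEXPECTED
    if backup_up:
        return RedundancyState.UNPROTECTED
    return RedundancyState.OUTAGE
```

In the scenario described, `redundancy_state(False, True)` would have returned `RedundancyState.UNPROTECTED` for weeks before the backup was cut, which is exactly the condition a simple "is the site reachable?" check cannot see.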

Entuity Helps Financial Trading Company Avoid Disaster

Another financial trading company discovered how little they knew about their network’s health during an evaluation of Entuity. Shortly after the software was installed at their facility, Entuity alerted an operations manager to a fan failure in one of their core switches. A secondary fan was still in operation, but the temperature data being collected by Entuity clearly showed a rise that would reach a critical level within a few days.
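To illustrate how collected temperature data can reveal a problem days before it becomes critical, the sketch below fits a straight line to recent samples and projects when a threshold would be crossed. It is an assumption-laden example rather than a description of how Entuity computes its alerts: the function name `hours_until_critical`, the sample format, and any threshold passed in are hypothetical, and a linear trend is only a rough model of real thermal behavior.

```python
from typing import List, Optional, Tuple


def hours_until_critical(
    samples: List[Tuple[float, float]], critical_c: float
) -> Optional[float]:
    """Project when a rising temperature would reach `critical_c`.

    `samples` is a list of (hours_since_start, temperature_celsius) pairs,
    oldest first. Returns the projected hours remaining after the last
    sample, or None if there is too little data or no upward trend.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n

    # Ordinary least-squares slope (degrees per hour) and intercept.
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return None
    slope = sum((t - mean_t) * (c - mean_c) for t, c in samples) / spread
    if slope <= 0:
        return None  # flat or cooling: nothing to project
    intercept = mean_c - slope * mean_t

    # Time at which the fitted line crosses the threshold.
    crossing = (critical_c - intercept) / slope
    last_t = samples[-1][0]
    return max(0.0, crossing - last_t)
```

With made-up readings such as `[(0, 40.0), (12, 43.0), (24, 46.0), (36, 49.0)]` and a hypothetical 65 °C threshold, the fitted rise is 0.25 °C per hour and the projection is 64 hours, that is, a few days of warning while the device still reports as "up."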

The software evaluation team took a brief intermission to fix the situation immediately. When the evaluation resumed, investigation of a core router revealed a failed secondary power supply, a risky situation. The team took another, more urgent intermission. In both cases, their existing “red-light, green-light” monitoring tools reported that the devices were “up” but left them blind to the impending failures. A failure in either case would have left business users without access to the trading applications they need to do their jobs. With hundreds of users idle and customers unable to make trades, the result would certainly have been a disaster.

Entuity Event Management System revealing a power supply in a critical state. Such real-time alerts about devices and their sub-components give IT staff the visibility they need to significantly reduce the risk of catastrophic failures.

Summary

The consequences of lacking visibility into the true state of the network can be severe. Traditional framework network monitoring solutions and lower-end tools simply don’t go deep enough to catch most of the issues that may pose a threat to the network. Entuity contrasts starkly with these products by providing detailed, easily accessible network information that enables businesses to catch potential problems before they cause outages. Features such as continuous auto-discovery, up-to-date topology maps, an advanced EMS, service-, device-, and port-level monitoring, and industry-leading dashboards and reports enable companies around the world to dramatically reduce the risk factors that may be lurking just below the surface of their networks.