Network Management: Covering Your Assets while Maximizing Operational Efficiency
The Ultimate Goal
The ultimate goal of Network Management is to lower risk, eliminate downtime and slowdowns, minimize variability, and actively support business expansion.
Today’s networks are irrevocably intertwined with the daily operations of most organizations. The amount of business conducted that doesn’t rely on available and performing networks is small and shrinking.
Even though networks are an integral part of business operations, in most cases the network is not the business. Good network management is an insurance policy that protects the company from a catastrophic outage of vital networks that would have a major business impact.
Network management operational tools have two mandates:
- Get the job done in accordance with business needs
- Do it at a reasonable cost
This is operational efficiency, which has two variables: Coverage and Cost.
The rest of this paper discusses the Entuity network management solution and how it incorporates each of these elements to meet the demands of today’s networks. It also includes a brief discussion of Entuity’s integration approach.
One of network management’s major challenges is operational efficiency. Even if your existing network management solution provides the coverage your company needs, you need to understand the cost components, not just the purchase price of technology. Moore’s Law and steadily falling technology costs have shifted the IT cost balance. Operational cost in the form of personnel costs is often many times greater than the purchase price of the technology. Even “free” software isn’t free, and in many cases could turn out to be the most expensive option.
Companies need to focus on reducing the amount of personnel effort needed to manage the network through better architecture, automation, simplicity and business focus. This will ultimately require fewer resources to keep the “lights on” and enable more progress toward innovation. It’s all about automating the technology to do more, increasing the span of control and increasing personnel efficiency. Getting more done with less was one of the overwhelming factors that drove virtualization’s success.
Another cost often overlooked is the amount of time and personnel needed to resolve a problem when it’s detected. This is where integrated functionality and deep data/history (rather than non-integrated pieces) can save valuable time to provide root cause analysis quickly and avoid an all-hands-on-deck, prolonged and costly scramble for a solution.
This is one of the most fundamental, but often overlooked, questions regarding network management. Some believe that network management isn’t innovative and that the technology has been around for years. They think they’re “covered.” However, networks are not static. New technologies are introduced continually: the Internet, virtualization, wireless networks and devices, newer and more powerful switches and routers, and so on. The question is: Is the network management technology you use today keeping up with the network it’s managing? If you don’t regularly evaluate your network management solution, you either settle for a coverage gap or restrain the use of advanced technology for your business.
Coverage has two dimensions: depth and breadth. Does your network management technology cover the breadth of the network devices or are there gaps? And is the data collected deep enough to understand potential risks? Network management technology has to evolve along with the network or coverage is affected.
There’s a lot more happening on your network than you think, and it could be costing you real money without your knowledge.
Just as with high blood pressure or high cholesterol (the human “silent killers”), operating your network beyond threshold, at capacity, or with unaddressed issues could be the silent killer for your enterprise. Modern, advanced network management tools allow you to monitor and control the health of your network 24×7, giving you the proactive diagnostic capability to keep your business in good health and prevent catastrophe.
“Our Network Won’t Crash. We Have Redundancy”
Most IT personnel rightfully associate “stability” with “crashing”—more accurately, the length of time between crashes. The industry has standardized on Mean Time Between Failures (MTBF) as an accepted measure of reliability. But with the advances in hardware and communications protocols, natural disasters aside, you’re probably hard pressed to remember the last time you had a complete network outage. In fact, modern network architectures purposefully build in redundancy to ensure complete outages are a thing of the past. A false sense of security, however, can cause you to miss more pervasive and increasingly frequent instability issues.
Case Study: No Circuit Level Visibility
That was precisely the experience of a major financial services company before using Entuity. They had already invested millions of dollars in a network management framework and spent thousands of labor months deploying the application. Unfortunately, the framework lacked the circuit level visibility to indicate the primary link between two main campuses had been down for some time. A construction crew installing new utility poles alongside the parking lot severed the backup fiber link. Ordinarily this would not have been a problem. Unfortunately for them, no one knew that the primary fiber link had failed some time ago and they’d been running over the backup for weeks. In addition to the panic, overtime costs to retrench, and an emergency service call from the fiber repair contractor, this outage impacted millions of financial transactions for this company.
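The gap in this case study is that service stayed up while its redundancy quietly disappeared. A minimal Python sketch of a per-circuit check follows. It assumes a hypothetical `LinkStatus` record fed by whatever polling a monitoring tool already performs; the names and alert wording are illustrative and are not Entuity's implementation.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    name: str  # hypothetical circuit label
    role: str  # "primary" or "backup"
    up: bool   # operational state from the latest poll


def redundancy_alerts(links):
    """Return alerts for a redundant link group, even if service is still up."""
    down = [link for link in links if not link.up]
    if not down:
        return []
    if len(down) == len(links):
        return ["CRITICAL: all links down, service is out"]
    # Traffic still flows, but the safety net is gone: report each failed circuit.
    return [f"WARNING: {link.role} link {link.name} is down, redundancy lost"
            for link in down]
```

The point of the design is that a failed primary raises a warning on its own, rather than waiting for the backup to fail as well.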
Another financial trading company discovered how little they knew of their network’s health during an evaluation of Entuity. Shortly after installing the software at their facility, Entuity alerted an operations manager to a fan failure in one of their core switches. There was a secondary fan still in operation, but the temperature trend data being collected by Entuity clearly showed an increase that would go critical in a few days. The software evaluation team took a brief intermission to immediately fix the situation.
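To illustrate how trend data can predict a threshold crossing before it happens, here is a minimal Python sketch that fits a straight line to temperature samples and estimates the time remaining until a critical value. The sample format, the function name, and the linear-trend assumption are all hypothetical; the source does not describe how Entuity computes its trends.

```python
def hours_until_threshold(samples, threshold):
    """samples: list of (hour, temperature) pairs, oldest first.

    Fit a least-squares line and return the estimated hours from the last
    sample until the line reaches the threshold, or None if not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # flat or cooling: no crossing predicted
    intercept = mean_v - slope * mean_t
    crossing_hour = (threshold - intercept) / slope
    return max(0.0, crossing_hour - samples[-1][0])


# Hypothetical readings: 2 degrees of rise per day, critical at 50 degrees.
readings = [(0, 40.0), (24, 42.0), (48, 44.0)]
remaining = hours_until_threshold(readings, threshold=50.0)
```

With these illustrative readings the fitted line reaches the threshold roughly three days after the last sample, which is the kind of lead time the case study describes.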
“The Devices Are Up. Service Will Be Fine”
Businesses today rely on bandwidth-intensive and latency-sensitive enterprise applications such as VoIP, streaming media, CRM, ERP, PDM, or a multitude of SaaS applications like Salesforce.com. They build Internet-connected trading or partner networks, where any delay in transactions can ripple outward as work stoppages and cost overruns across numerous organizations. Whether a network serves one building, multiple campuses, or divisions across the globe, instability is commonly caused by misconfiguration or unknown change. The dynamic nature of mobility technologies and network access technologies today makes this issue less obvious and more insidious.
Identifying and correcting duplex mismatches is not a one-time, “set and forget” task. It’s an ongoing issue in today’s fluid, constantly evolving enterprise infrastructure.
Take, for example, a high-tech manufacturing company that was performing maintenance on their corporate application servers. Their Microsoft Exchange server needed requisite updates of security patches, so the operations team scheduled down time for 3:00 AM on a Saturday to minimize any impact on the user community. Patching and updating went without a hitch, the server rebooted successfully, and everyone went home happy.
Come Monday morning, however, the moods were not as light. As the wave of business users sat down at 8:00 AM with their cup of coffee to check their email, the operations call center was flooded with the dreaded complaint, “the network is slow and Outlook says the connection to the Exchange server is lost!” Realizing that most problems come as a result of change and that the security patches were the last change made to the server, operations staff quickly uninstalled them, rebooted and breathed a sigh of relief, but the calls kept coming.
Before the day was out, five IT technicians had spent half a day troubleshooting the problem while the user community waited. The last attempted fix was to swap out the server with a spare and relegate the original server to off-hour file backup alone. That also didn’t work. What was wrong? It wasn’t until weeks later, when they replaced their open source network monitoring tools with Entuity, that they were able to identify the true cause of the problem immediately. Uniquely sensitive to duplex mismatch issues, Entuity alerted the network manager that the NIC had reverted to a half-duplex state during the server reboot, leaving it mismatched with the switch. It wasn’t the security patch at all.
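A duplex mismatch is detectable by comparing the duplex state reported at both ends of a link. The Python sketch below assumes the two values have already been collected (one possible source is the duplex status object in the standard EtherLike-MIB, though the source does not say how Entuity gathers it). The `Link` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    server: str       # hypothetical host name
    nic_duplex: str   # "full" or "half", as reported by the server NIC
    switch_port: str  # hypothetical switch port label
    port_duplex: str  # "full" or "half", as reported by the switch


def duplex_mismatches(links):
    """Return every link whose two ends disagree about duplex mode."""
    return [link for link in links if link.nic_duplex != link.port_duplex]


links = [
    Link("exchange01", "half", "core-sw1:Gi1/0/12", "full"),  # after the reboot
    Link("files01", "full", "core-sw1:Gi1/0/13", "full"),
]
flagged = [link.server for link in duplex_mismatches(links)]
```

Running the comparison on every poll, rather than once at install time, is what catches a NIC that silently changes state during a reboot.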
“Our SLAs Guarantee High Levels of Service, Don’t They?”
Another common practice in the IT industry that is directly affected by network instability is found in Service Level Agreements (SLAs) for both the provider and the recipient of the service. An agreed upon level of service is determined, contracted, measured, and billed. Deliver as contracted and you will get paid and there will be no penalties. Better yet, deliver above agreed levels and there could be performance bonuses. Either approach—carrot or stick—is common with Managed Service Providers (MSPs). As with network crashes, the likelihood of a service being completely down is relatively small but even intermittent interruptions can cause huge penalties come billing time.
Case Study: SLAs and Operational Expenses
Both SLA penalties and operational expenses were the concern for a national MSP as they were growing and winning new customers globally. With their current management toolset and methods of work, one network engineer would be required for every 10 new customers added. Increased costs for staffing, training, integration, operations, and maintenance also resulted. Although they used several management tools, silo-based management was increasingly ineffective so they sought a solution to simplify and unify infrastructure monitoring with a central database provided by a single network monitoring, reporting, and capacity planning tool.
“Adding this Hardware Will Fix Your Capacity Issues. Trust Me”
When running into capacity issues, it’s common for people, including in the IT world, to rationalize the need for more. If you’ve used a business laptop for the past several years, you’ve likely had your hard disk size at least double every two years but are still running out of room. The generous 500 GB drives of a few years ago have given way to configurations commonly including 2, 3, or even 4 TB drives.
And so it also goes with network capacity: devices are getting faster and able to support more bandwidth. If your company has been through a network upgrade, you have firsthand knowledge of how Moore’s Law applies to network devices and the dizzying array of new equipment that makes it to market each year.
For any network of more than just a single segment, the choices can be so plentiful that the hardware vendors will usually offer assessments to help you choose the correct equipment. Without a doubt, they’re experts at their product’s capability and the optimum component for a node, but without circuit level visibility to your current capacity, you could be buying more than you need.
A healthcare provider with a network covering their regional campus was caught unaware when a seemingly small fan module failure necessitated a $10,000 rush order switch purchase.
No stranger to network management solutions, the healthcare provider had several staff network engineers who were familiar with a low cost, red light / green light type network management tool-set. Based on their success with that tool-set, they even upgraded to the vendor’s “small to medium enterprise” solution. The distinction of “enterprise” referred only to scalability, however, not enterprise class functionality. In conjunction with well-defined operations processes, their network generated very few trouble tickets. Unfortunately, without the visibility into spare capacity that higher order management systems provide, they were gradually losing headroom with the addition of each new end user or server.
The final “straw that broke the network’s back” and brought their capacity problem to the fore was a ball bearing failure in a switch fan. Their network management solution dutifully alerted them to the failure and they dispatched a network engineer to address the issue. Their plan was to reroute traffic from the other ports on that device to another switch in the same rack until the new fan module could be installed after it arrived in 3 to 5 business days. But when she reached the network closet across campus, the engineer noticed that there were only two available ports in the second switch, far fewer than the 24 that were required.
Case Study: Automated Inventory Reduces Maintenance Costs
Implementing ITIL best practices to reduce costs and achieve proactive network management was the goal when a large consumer co-operative evaluated and chose Entuity as their core management solution. They had been using Entuity for several months improving operations and business service delivery when it came time for their annual network hardware maintenance review. As they had done in the preceding years, one hardware vendor delivered a proposal, including a list of the equipment with serial numbers tallying devices and costs.
Rather than blindly accepting the vendor list as accurate, operations personnel at the co-operative now had the unique historical context of Entuity’s automated and continual discovery to provide independent confirmation. With Entuity, as devices are added, removed, or changed on the network, details are captured and stored in the Entuity CMDB. Operators can also note details of any changes to devices, keeping accurate documentation for compliance and validation, even for devices removed from active use. Armed with fine-grained visibility of their network, the co-operative was able to discontinue annual maintenance on devices no longer in service, saving $50,000 on annual maintenance costs and nearly eliminating the operational burden of preparing for the meeting.
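The reconciliation described here amounts to comparing two sets of serial numbers: what the vendor proposes to bill and what discovery shows in service. A minimal Python sketch, assuming both lists are already available as plain strings (the function name, result keys, and serial numbers are hypothetical):

```python
def reconcile(vendor_serials, in_service_serials):
    """Compare a vendor maintenance list against discovered inventory."""
    vendor = set(vendor_serials)
    in_service = set(in_service_serials)
    return {
        # Billed for maintenance but no longer found on the network.
        "billed_not_in_service": sorted(vendor - in_service),
        # On the network but missing from the vendor's proposal.
        "in_service_not_billed": sorted(in_service - vendor),
    }


result = reconcile(
    vendor_serials=["SN100", "SN101", "SN102"],
    in_service_serials=["SN101", "SN102", "SN200"],
)
```

The first list is the candidate set for discontinued maintenance; the second is worth reviewing in case a device is running without coverage.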
Supply Chain Domino Effect
Internet connectivity has not only made the world smaller by enabling data and voice to traverse the globe nearly instantly, it has provided a ubiquitous framework to tightly link supply chains of retailers and manufacturers worldwide, enabling efficiencies previously believed unattainable.
For all its benefits, the integrated supply chain is not without risk. While riding the “cutting edge” of technology, a disruption, outage or slowdown in any service along the chain can have very real and costly effects.
Case Study: Supply Chain at Risk
A retail distributor implemented many of the same supply chain initiatives when it began providing perishable food products for its customers. Network connectivity between its suppliers and its warehouses was provided by a top-tier ISP that guaranteed a 5-hour turnaround time on all IT issues. At $20,000 per month, the service was expensive, but around-the-clock connectivity was required to keep the supply chain working efficiently.
Although the ISP had provisioned a network monitoring tool in each of the warehouses, it lacked visibility into the routing protocols used between the multihomed networks and lacked the diagnostic functionality to determine the true cause of abnormalities. So when the network went down, the control systems for picking, packaging and shipping went down. Perishables expired and store shelves went empty, resulting in lost sales and revenue. Over the course of several months, the distributor tallied losses from repeated service outages at more than $500,000.
One global enterprise software manufacturer Entuity worked with initiated a network upgrade program to account for their explosive growth. The program was divided into two components: one to select hardware, the other to select a network management solution capable of handling continued growth. After an exhaustive evaluation, the company chose Entuity as best suited to their environment and began to implement it in a controlled environment. During the same period, they selected a hardware provider who surveyed their network and recommended $1.5 million in network equipment.
Before placing the final hardware order, network engineers used Entuity to analyze their current production network, finding excess capacity previously unknown. Entuity’s unique Spare Ports Report clearly identified ports that may be physically filled, but have not had any utilization for a selected period of time. With a few alterations to their existing circuits, the company was able to eliminate half of the proposed hardware devices and still provide the scalability required for years to come. Their use of Entuity not only gave them the visibility to save $750,000, but also raised their NOC’s credibility with senior management and their investors.
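The idea behind a spare ports report can be sketched in a few lines of Python: a port counts as spare if it carried no traffic at all during the selected period, even if a cable is plugged in. The input format and names below are hypothetical and are not taken from Entuity's report.

```python
def spare_ports(traffic_by_port):
    """traffic_by_port: dict of port name -> list of per-interval byte counts
    covering the selected period. Return ports that carried no traffic.

    Ports with no samples are skipped, since their usage is unknown.
    """
    return sorted(port for port, counts in traffic_by_port.items()
                  if counts and sum(counts) == 0)


# Hypothetical 3-interval history for four ports on one switch.
history = {
    "Gi1/0/1": [1200, 0, 900],
    "Gi1/0/2": [0, 0, 0],   # connected but idle for the whole period
    "Gi1/0/3": [0, 0, 0],
    "Gi1/0/4": [],          # never polled: excluded from the report
}
idle = spare_ports(history)
```

The length of the period matters: too short a window would flag ports that are merely quiet this week, so the window should cover the longest usage cycle the business cares about.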
The Flip Side of Capacity
While excess capacity can certainly be inefficient and reduce the return on hardware investment, running with little or no capacity can be downright dangerous and can lead to sizable unplanned expenses.
Maintaining What You No Longer Use
Most businesses understand that the additional cost of a support contract is well worth the investment, particularly for business-critical infrastructure resources. Maintenance payments keep the hardware and software resources up to date on the latest additions and security patches while also ensuring quick service should any issues arise. But even with a structured inventory process, the dynamic nature of today’s networks makes it easy to lose sight of exactly how much equipment has been deployed throughout the organization, where it is located, or whether it has been retired. The hardware vendors will certainly provide customers with a list of what they’ve purchased when it comes time for maintenance renewal, but without an automated inventory accurate to the minute, those customers could easily be wasting their maintenance monies.
It is a well-documented chestnut in IT management that over 80 percent of all IT problems are a result of change. As networks have grown in complexity and in strategic importance to businesses, visibility into the change alone is no longer sufficient, since the downstream effect of the change can be far more detrimental. Sometimes even the most insignificant changes (done by technically competent individuals) can have far-reaching and costly consequences.
Case Study: Spanning Tree Visibility Prevents Disaster
To avoid the arduous trip back and forth between his office and the staging room for the deployment of updated laptops, a junior systems engineer at one financial services company installed an inexpensive un-configured switch in his office. With a few more ports, he could install all the laptops with the requisite business applications, OS patches, and security fixes, suffering through all the inevitable reboots that go along with the process. At the same time, he could keep up with the never-ending stream of support calls. Unknown to him, however, the intended spanning tree root bridge in the data center where he was situated had never been explicitly configured as root, and the smaller, slower, inexpensive switch was elected root instead, causing a massive performance problem felt throughout the company. Fortunately his peers in the data center were using Entuity to ensure optimum performance of their network.
Where other commercially available tools don’t even recognize spanning tree status, Entuity proactively gathers spanning tree details including root status. Entuity detected and alerted operations personnel by email of the change, and they were able to correct it immediately. Entuity’s unique web-based user interface graphically shows topology and connectivity status and can filter to isolate spanning tree status to speed troubleshooting and repair. Without Entuity, a seemingly innocuous, undocumented change flying in under the radar could impair company productivity for days or weeks.
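Why an unconfigured desk switch can take over as root follows from the standard election rule: the bridge with the lowest bridge ID wins, comparing priority first and MAC address second. The Python sketch below models that rule and a simple change alert. The switch names, MAC addresses, and alert text are hypothetical, and this is not a description of how Entuity detects the change.

```python
DEFAULT_PRIORITY = 32768  # common default bridge priority


def bridge_id(priority, mac):
    """Bridge ID as a comparable tuple: priority first, then MAC as an integer."""
    return (priority, int(mac.replace(":", ""), 16))


def elect_root(bridges):
    """bridges: dict of name -> (priority, mac). The lowest bridge ID wins."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


def root_change_alert(previous_root, bridges):
    """Return an alert string if the elected root differs from the last poll."""
    current = elect_root(bridges)
    if current != previous_root:
        return f"ALERT: spanning tree root changed from {previous_root} to {current}"
    return None


# Both switches at default priority: the lower MAC address decides.
bridges = {
    "core-sw": (DEFAULT_PRIORITY, "00:50:56:aa:bb:cc"),
    "desk-sw": (DEFAULT_PRIORITY, "00:0c:29:11:22:33"),
}
alert = root_change_alert("core-sw", bridges)
```

Giving the intended root an explicitly lower priority (for example 4096) removes the dependence on MAC addresses, which is the configuration step that was missing in the case study.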
What is the value of your company’s good name? What if your company ended up on a blacklist without your knowledge? These days it’s hard to decide who’s out in front between the antivirus and security solution providers and the hackers. Companies are under constant threat, and sometimes even an up-to-date defense can fall short. As much as business users complain when email is down for a few minutes, the CEO is bound to be more than a little bothered if the company’s domain name ends up on a blacklist and the appeal process takes weeks to reach a conclusion.
Case Study: Network Under Attack
Who would ever want to attack a small Midwest sheet metal manufacturing company? That’s exactly what the network operations folks at this company thought before they found out the painful truth. The company had been using open source monitoring tools to manage its network but was looking for an automated and integrated solution to eliminate many of the manual steps still required by that tool-set.
During the first night of their evaluation, Entuity alerted them that their Exchange server was operating at 100% CPU utilization between 1:00 and 2:00 AM, bringing immediate attention to a potentially disruptive situation. Since no personnel were typically in the data center at this time, their existing red light / green light monitoring would never have caught this issue. With Entuity’s historical database and Report Server, a graphical trend report clearly identified the issue and sped the way to isolating a Trojan program using their Exchange server as a spam-bot. The malware was set to run only after hours to avoid detection. With Entuity keeping watch 24×7, there are no after hours.
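Catching this kind of problem depends on keeping history and examining it outside staffed hours. A minimal Python sketch that flags high CPU readings outside business hours follows; the threshold, the business-hours window, and the sample data are assumptions for illustration only.

```python
from datetime import datetime


def after_hours_spikes(samples, threshold=95.0, open_hour=8, close_hour=18):
    """samples: list of (datetime, cpu_percent) pairs.

    Return the readings at or above the threshold that fall outside
    the open_hour..close_hour business window.
    """
    return [(ts, cpu) for ts, cpu in samples
            if cpu >= threshold and not (open_hour <= ts.hour < close_hour)]


# Hypothetical overnight history for one server.
history = [
    (datetime(2024, 1, 8, 1, 15), 100.0),  # busy while nobody is watching
    (datetime(2024, 1, 8, 10, 0), 97.0),   # busy during the day: expected
    (datetime(2024, 1, 8, 23, 0), 20.0),   # quiet at night: expected
]
suspicious = after_hours_spikes(history)
```

A live status light would show the server as healthy by morning; only the stored samples reveal the overnight pattern.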
Management You Can Trust
Sound business decisions must be made with accurate data delivered reliably. Incorrect management information is worse than no information at all.
Case Study: We’re Being Robbed!
A network manager at an insurance company was working after hours one evening when he suddenly started receiving a number of events from Entuity telling him that his fiber cards were failing and his MSFCs weren’t responding. First there was one, then another, and another, and another. Within minutes, a whole series was down.
Since Entuity had never steered him wrong, he ran down to the communications room to find that someone had broken in, was cutting the cables, ripping the cards out of the running racks, and stuffing them into his backpack. They were being robbed! Reducing the risk of theft is not typically a reason why customers implement a network management solution, but this customer was certainly pleased that it worked out this way.
Regardless of the industry segment or the size of the company, the network is a critical strategic resource delivering business services to power profitability. From trading floors where millions of dollars per minute electronically flow with each transaction, to SMBs where Internet orders drive profitability, a network outage or performance slowdown could be extremely costly, if not deadly. Many organizations have realized (sometimes the hard way) that network management is a “must have,” not merely a “nice to have,” technology, and one that needs to be continually reevaluated in light of new advancements in network technology.
In each case presented here, the exclusive combination of advanced network management capability (coverage) in Entuity enabled network professionals to monitor and control the health of their networks 24×7, giving them the proactive diagnostic capability to keep their business in good health and prevent catastrophe.
With new insight provided by Entuity, companies reduce the risk factors lurking just below the surface of their networks, giving them the remedy to keep their network from becoming toxic to their business. By reducing the ever present risk of network failures, the opportunity for increased operational efficiency can be realized.
Entuity takes the work out of network management. Our highly automated, unified enterprise-class solution puts deep network insight at your fingertips, frees IT staff to focus on strategic projects and easily integrates with major frameworks and networking environments. Entuity’s support and services teams are frequently praised for their rapid response, networking expertise and involvement in special engagements. Founded in 1997 by two senior-level IT executives from the financial industry, Entuity is headquartered in London with US operations in Boston.