Engineering Uptime to Keep You Running at Peak

7 Ways to Prevent Outages Based on Findings from the Uptime Institute Survey

Written by Stephen Vechy | Tue, Nov 24, 2020

While outage information is notoriously difficult to obtain because companies keep failures close to the vest, the recently released 2020 Uptime Institute Global Data Center Survey reveals some alarming statistics. According to its findings, not only do outages occur frequently, but serious outages are also having more damaging and costly consequences.

While the survey shows that data center processes have improved, gains in efficiency have been offset by the complexity of maintaining more sophisticated systems. In short, avoiding downtime is still a major challenge for owners and operators.

Uptime’s latest survey found that 78% of companies had experienced an outage in the past three years. Most strikingly, about 20% of organizations had a “serious or severe outage,” defined as one that was costly or damaged the business’s reputation. Sixteen percent of outages cost more than $1 million, up from 10% in the previous year, which indicates that outage costs are climbing sharply. And because the survey also showed that only half of companies even attempt to quantify the cost of an outage, millions of dollars likely went uncounted.

What may be most surprising is that 75% of managers and operators said their most recent outage was preventable. That means most of those outages, along with their costs and the associated reputational damage, didn’t need to happen.

So why do these preventable outages happen? Uptime didn’t explore that question in this survey, but it suggested that upfront investment in management, process, and training would reduce outages significantly. Given how many outages were considered preventable, it’s reasonable to infer that operators are stretched too thin or are prioritizing the wrong problems.

Uptime did, however, look at the primary causes of major outages and found that on-site power problems were the single biggest cause. (See figure below.)

These findings suggest that power may be the smartest place to start when you build a plan to reduce outages at your data center.

What might an outage reduction plan entail? Here are seven steps we’d recommend:

1. Quantify the cost of your outages - If your data center is among the roughly 50% that don’t assign costs to outages, it’s time to mend your ways. If you can show how much outages are costing your company, you’ll have an easier time getting senior management’s attention at budgeting season. After all, no data means no budget to fix those outages.

2. Invest in management, process, and training - As the Uptime Institute suggests in its survey analysis, the fact that 75% of respondents called their most recent outage preventable indicates that many companies are falling short in these areas. While many companies are reluctant to take on upfront costs, allocating budget to these areas is likely to prevent much more expensive losses.

3. Prioritize your plan by outage cause - If you aren’t sure where to begin, create an “outage attack” plan ranked by how likely each type of issue is to occur. For example, on-site power is the #1 cause of major outages in the Uptime survey, which makes it a logical starting point. First, examine your critical power system and find out what’s lacking, if anything. Once you’ve corrected the most glaring deficiency, address the second most frequent cause, and so on.

4. Design a critical power system based on data - As part of your “outage attack” plan, we suggest rigorously assessing the effectiveness of your critical power system. The first step should be a power assessment that measures your actual usage and uses that data to size the system. Anything else is a gamble, and possibly a losing one.

5. Ensure your resilience plan is up-to-date - We get it. You’re busy. However, you’ll be busier if your resilience plan didn’t cover a particular scenario and you’re scrambling to get back online. A resilience plan not only builds in the necessary redundancies, such as in critical power, so that a single failure doesn’t cause downtime; it also lets you plan upgrades, migrations, and maintenance, which brings us to our next item.

6. Keep up with preventative maintenance - This is not the place to skimp on time, effort, or money. Too many data centers put off preventative maintenance, which results in more downtime and higher costs in the long run. We’ve heard many stories of this happening more often during the pandemic, with deferred maintenance causing cascading failures.

7. Work with trusted partners - Running a data center requires an increasingly varied range of technical skills. If you don’t have all the skills or resources you need in-house, outsourcing may be the answer: it lets you partner with specialists in any area where your team needs support.
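To make steps 1 and 3 more concrete, here is a minimal Python sketch of an outage-cost worksheet. It assumes a simple linear cost model (downtime hours multiplied by a blended hourly cost covering lost revenue, labor, and penalties). The `OutageCause` class, the `rank_causes` helper, and every cause name, frequency, and dollar figure below are hypothetical placeholders, not Uptime survey data; you would replace them with your own incident history. By default it ranks causes by likelihood, as step 3 recommends. It also offers ranking by expected annual cost (frequency times cost per incident), a variation we add here to show how a less frequent but expensive failure can move up the list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OutageCause:
    """One category of outage in a hypothetical cost worksheet."""
    name: str
    incidents_per_year: float  # expected frequency, from your own incident history
    hours_per_incident: float  # mean downtime per incident
    cost_per_hour: float       # assumed blended cost: lost revenue, labor, SLA penalties

    def cost_per_incident(self) -> float:
        # Simple linear model: downtime hours times hourly cost.
        return self.hours_per_incident * self.cost_per_hour

    def expected_annual_cost(self) -> float:
        # Likelihood-weighted cost: what this cause costs in a typical year.
        return self.incidents_per_year * self.cost_per_incident()


def rank_causes(causes: list[OutageCause], by: str = "frequency") -> list[OutageCause]:
    """Return causes in the order they should be tackled, highest priority first."""
    if by == "frequency":
        # Step 3 as written: most likely cause first; expected cost breaks ties.
        key = lambda c: (c.incidents_per_year, c.expected_annual_cost())
    elif by == "expected_cost":
        # Variation: rank by frequency x cost per incident instead.
        key = lambda c: (c.expected_annual_cost(), c.incidents_per_year)
    else:
        raise ValueError(f"unknown ranking {by!r}; use 'frequency' or 'expected_cost'")
    return sorted(causes, key=key, reverse=True)


def example_causes() -> list[OutageCause]:
    # Hypothetical placeholder figures for illustration only; not survey data.
    return [
        OutageCause("On-site power", incidents_per_year=1.0, hours_per_incident=4, cost_per_hour=250_000),
        OutageCause("Network", incidents_per_year=0.75, hours_per_incident=1, cost_per_hour=100_000),
        OutageCause("Cooling", incidents_per_year=0.5, hours_per_incident=6, cost_per_hour=150_000),
    ]


if __name__ == "__main__":
    for by in ("frequency", "expected_cost"):
        print(f"Ranked by {by}:")
        for cause in rank_causes(example_causes(), by=by):
            print(
                f"  {cause.name}: {cause.incidents_per_year}/yr, "
                f"${cause.cost_per_incident():,.0f} per incident, "
                f"${cause.expected_annual_cost():,.0f}/yr expected"
            )
```

With these placeholder figures, both rankings put on-site power first, but cooling overtakes network once cost is factored in. A worksheet like this also gives you the per-incident dollar figures that step 1 says senior management will want to see.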

While some outages may be inevitable, most downtime and major incidents are not. Remember, 75% of operators said their most recent outage was preventable, so make sure you’ve taken the necessary steps to avoid costly mistakes and reputational damage. Both your success and your data center’s success depend on it.