
Best Practices for Data Center Disaster Recovery

By Mark Rentzke

Today’s mission-critical data centers are designed and built with the mindset that failure is not an option. Data center operators spend their days (and often sleepless nights) worrying about the availability of their facilities and work around the clock to ensure 100 percent uptime. They build in redundancy to minimize risk, rigorously plan and test to ensure continuity, and take precautions to physically protect their infrastructure from environmental threats. However, as we’ve seen over the last several weeks with some of the most severe weather events in recent history, even the best-laid plans and preparedness measures can go awry when pitted against the wrath of Mother Nature.

Truth be told, in some extreme situations advance planning and preparation can only do so much, and disaster recovery becomes an integral part of keeping businesses functional while communities rebuild.

In any disaster situation, time is of the essence, so data center staff need to know the appropriate actions to take in the minutes, hours and days immediately following an event.

Staff safety is the most important factor, so once it has been established that all personnel are safe and accounted for, it’s time to begin the following recovery activities:

• Proactively monitor all critical equipment to search for hidden issues – Inspect equipment directly affected by or exposed to the event, such as generators, oil and filters. In situations where recovery efforts may take days or possibly weeks, it will be critical to monitor fuel levels and understand how long generators can run without refueling. In a natural disaster emergency, hospitals, first responders and other public-safety organizations will be first in line to receive fuel supplies, regardless of any fuel delivery contract a data center already has in place.
• Increase site walk-throughs for a specified period of time – This will help staff discover any ongoing deficiencies such as water leaks or wind damage. It’s also important to ensure each employee has a partner during walk-throughs to provide backup and assistance.
• Ensure staff rotation to relieve stress on onsite staff – To provide sufficient rest and recuperation, businesses can recruit staff from other sites that have not been impacted or entrust a third-party vendor to manage part of the recovery efforts.
• Establish a communications protocol to provide critical updates and onsite feedback – Host regular briefings to keep staff and management informed of any data center or site impact after verifying all areas and systems. In a disaster recovery situation, everyone from the CIO down the reporting chain will have a role to play, so it will be essential to keep all involved parties up to date on what’s happening in real time.
• Monitor incoming power – Electrical power distribution and quality can deviate significantly during a storm or outage as nearby systems are cycled on or off, so it’s important to contact the utility company to determine when it will be safe to transfer off the generator and back to utility power.
• Apply lessons learned straightaway – As soon as time permits, begin compiling and documenting lessons learned and best practices to prepare for the next event.
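Estimating how long generators can run without refueling comes down to simple arithmetic: usable fuel divided by the rate at which it is burned. The sketch below is illustrative only and assumes a constant burn rate at the current load and a fixed fraction of fuel held back as a reserve; the function names, the 10 percent default reserve, and the example figures are hypothetical, not values from any manufacturer or from the author. Comparing the result with the expected delivery time shows whether a gap exists if priority fuel allocation delays a scheduled delivery.

```python
def hours_of_runtime(fuel_on_hand_gal: float,
                     burn_rate_gal_per_hr: float,
                     reserve_fraction: float = 0.1) -> float:
    """Hypothetical helper: estimate generator runtime before refueling.

    Assumes a constant burn rate at the current load and holds back
    `reserve_fraction` of the fuel on hand as an unusable/safety reserve.
    """
    if burn_rate_gal_per_hr <= 0:
        raise ValueError("burn rate must be positive")
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_gal = fuel_on_hand_gal * (1 - reserve_fraction)
    return usable_gal / burn_rate_gal_per_hr


def refuel_margin_hours(runtime_hr: float, expected_delivery_hr: float) -> float:
    """Hypothetical helper: hours of slack before the next fuel delivery.

    A negative result means fuel would run out before the delivery arrives.
    """
    return runtime_hr - expected_delivery_hr


if __name__ == "__main__":
    # Illustrative figures only: 2,000 gal on hand, 50 gal/hr burn rate.
    runtime = hours_of_runtime(2000, 50)
    margin = refuel_margin_hours(runtime, expected_delivery_hr=48)
    print(f"runtime: {runtime:.1f} h, margin vs. delivery: {margin:.1f} h")
```

Real burn rates vary with load, so the estimate should be recomputed as load changes and checked against the generator manufacturer’s fuel consumption data.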

While natural disasters are an unavoidable risk, with the right disaster recovery plan, data center operators can minimize the impact of these catastrophic events to quickly and safely restore operations.

Mark Rentzke is a senior manager for the global data center operation services team at Schneider Electric. He has more than 25 years of experience in the data center industry.


posted on 10/4/2017