Recovering from Sandy
Part 1: Q&A: Data Center Expert Chris Wade on Disaster Preparedness, Planning
By Casey Laughman - November 2012 - Emergency Preparedness
In the wake of Hurricane Sandy, those facility managers fortunate enough to not be affected are checking emergency preparedness plans, discussing contingency options for similar situations and testing their facilities and staffs to see what works in reality and what only sounds good in theory.
In this Q&A, Chris Wade, principal, Resilient Critical Facility Solutions, offers advice on how to prepare a data center for an event like Sandy and how to make sure you can keep operating when it happens.
Q: What sort of preparation can you do in a data center for an event like Sandy?
A: You can’t wait ’til the sky is falling to start planning for man-made or natural disaster events. From a data center perspective, you must have a pre-defined disaster preparedness plan of action for events such as tornadoes, hurricanes, loss of power, fire, loss of communications, etc. Your business continuity plan should include emergency action plans and disaster recovery plans for the data center.
It is important that the proper steps are taken to ensure that the data center infrastructure support systems will stay up and running during an event such as Hurricane Sandy. It is crucial to ensure that your disaster recovery and secondary sites are available and ready to be on-line when needed. If you are using a hosting provider for your secondary or disaster recovery site, it is also important to validate that they have an adequate emergency action plan in place for disaster recovery. Many times, these types of locations are dark sites with minimal staffing. Make sure your disaster recovery plan includes a plan for ramping up the staffing at these sites to support your operations.
Here are some essential steps required to ensure a data center remains on-line and operational:
1. Ensure you can manage the load.
• Test all backup power systems (simulate a total power outage) and ensure the generators and the transfer switchgear operate properly. I have found that for most events, testing emergency systems is the first priority. A best practice is to start with a power failure checklist. Always make sure the power system has been tested and that you can fail over to generators (or an alternate feed) for continuous operations at the facility.
• Contact your generator maintenance and service vendor to have a generator technician on site.
• Make sure fuel tanks are full, levels verified, and backup fuel vendors have been placed on standby in the event of extended power interruption. Generators eat up a lot of fuel when they are operating, and you need to know the generator burn rate (gallons per hour of fuel burned) to determine generator runtime so you will know when to call in fuel reserves.
• Notify your emergency fuel provider immediately that you may require additional fuel, and confirm that it can be delivered within 24 hours of a call for service. It is a good practice to have a pre-defined agreement for emergency service with your fuel provider. (You won’t be the only one looking for fuel in an event of this magnitude.)
• Many data centers have on-site fuel storage tanks that can provide fuel for an extended period (some can run on generator for 72 hours or more without refueling).
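The burn-rate arithmetic described above can be sketched in a few lines. This is only a minimal illustration: the function names and the example tank size and burn rate are hypothetical, and the 24-hour delivery lead time is taken from the fuel provider guidance above. Real burn rates vary with generator load, so treat the result as a planning estimate.

```python
def generator_runtime_hours(usable_fuel_gallons: float, burn_rate_gph: float) -> float:
    """Estimated runtime: usable fuel divided by burn rate (gallons per hour)."""
    if burn_rate_gph <= 0:
        raise ValueError("burn rate must be greater than zero")
    return usable_fuel_gallons / burn_rate_gph


def hours_until_fuel_call(usable_fuel_gallons: float,
                          burn_rate_gph: float,
                          delivery_lead_hours: float = 24.0) -> float:
    """Latest point (hours after switching to generator) to call for fuel.

    Assumes the vendor needs `delivery_lead_hours` to deliver; returns 0.0
    when the call should already have been made.
    """
    runtime = generator_runtime_hours(usable_fuel_gallons, burn_rate_gph)
    return max(0.0, runtime - delivery_lead_hours)


# Hypothetical example: 10,000 usable gallons burned at 100 gallons per hour.
runtime = generator_runtime_hours(10_000, 100)
call_by = hours_until_fuel_call(10_000, 100)
print(f"Estimated runtime: {runtime:.0f} hours")
print(f"Call for fuel no later than hour {call_by:.0f}")
```

In practice you would call well before that deadline, since delivery times are likely to stretch when many facilities are requesting fuel at once.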
2. Have sufficient resources onsite and staged to operate without utility power for weeks if needed.
• Additional data center engineers and technical support staff should be on-site throughout the duration of the storm to monitor operations and ensure infrastructure availability.
• Ensure your list of employee phone numbers is regularly updated and that each required leader or department head has a copy.
• Make sure there is adequate food and water available for onsite staff in case the storm’s duration is extended or additional supplies are unavailable.
• The data center should have emergency or disaster supply kits ready, containing items such as: food (canned goods, non-perishables, meals ready to eat (MREs)); water (one gallon per person per day); a manual can opener and other eating utensils; personal hygiene items such as soap, deodorant, shampoo, toothbrush and toothpaste, and toilet paper; a first aid kit and manual; fire protection equipment or a fire extinguisher; rainwear; gloves; inflatable mattresses or cots; blankets; towels; etc. These items should always be stocked and regularly audited.
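The water guideline above (one gallon per person per day) translates directly into a stocking quantity. A minimal sketch, assuming a hypothetical function name and illustrative staffing numbers that are not from the article:

```python
def water_gallons_needed(staff_count: int, days: int,
                         gallons_per_person_per_day: float = 1.0) -> float:
    """Water to stock for onsite staff, using the one-gallon-per-person-per-day rule."""
    if staff_count < 0 or days < 0:
        raise ValueError("staff_count and days must be non-negative")
    return staff_count * days * gallons_per_person_per_day


# Hypothetical example: 12 onsite staff for a two-week event.
print(water_gallons_needed(12, 14))
```

The same multiplication works for any per-person supply, which makes it a simple check to run during the regular kit audits.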
3. Cancel all scheduled maintenance prior to and for the duration of the storm.
4. Review and confirm emergency action plans.
• Review procedures with the staff on how and where to turn off the electrical power, water, gas, and other utility services within your facility at main switches.
• It is a good practice to be aware of your facility’s Federal Emergency Management Agency (FEMA) 500-year flood plain level.
5. Ensure your emergency communications plan is up to date.
• Clear, accurate communication in any crisis is critical to mitigating impending risks.
• Monitor the National Weather Service so you receive reports of significant changes in weather conditions and can track the storm’s development.
• Make sure monitoring is set and ready to send alerts on any degradation or failures.
• Have a dedicated support services group ready to respond to any questions or concerns.
• Have several battery-operated radios, spare batteries and a weather radio available to ensure you can receive emergency information.
• Executive leadership and critical staff must remain in constant contact with the teams operating the data center throughout the event.