Training Helps Mitigate Risk During Critical Facilities Failures
3. Comprehensive training mitigates risk during a failure.
Use EOPs as the basis for equipment-specific operator training before occupancy. Initial training should be completed during construction commissioning, videotaped, witnessed, and verified by a third party (usually the commissioning agent).
Don't stop there: continuous training helps mitigate the 75 percent of failures that are attributable to human error. Drills, continuous operator training, and simulations within a live operational environment are critical to ensuring that operators respond correctly in the event of a failure.
One global financial institution conducts an annual black-start drill in every one of its data centers, with detailed test scripts, training, monitoring, and restart back-up support.
Understandably, not every owner has the appetite for the operational risks involved in this process, even a well-managed one with expert back-up support. An alternative for the risk-averse is the use of computerized simulation programs to train operators on EOPs for the electrical systems and BAS.
4. Understand and include post-failure work process authorization.
Although many failures can be avoided using well-documented SOPs, MOPs, and EOPs coupled with continuous operator training, the reality is that no data center is immune to failure. Operations staff should be aware that restoring normal operations after an emergency requires a rigorous work approval and scheduling process, one that is typically more demanding than the process for routine work orders.
In fact, most critical facilities are required to remain on the back-up system while operations engineers conduct an investigation to identify the nature of the failure, ascertain its root cause, and develop mitigation and repair plans and schedules. Post-failure work process authorization typically occurs at a much higher organizational level and receives more scrutiny than a normal repair. For example, a critical facility may remain on emergency generator power for several weeks before the root cause of the failure is determined and the work authorization process is complete.
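For teams that track restoration in software, the sequence described above can be sketched as a simple ordered checklist. The following Python example is illustrative only: the stage names, the `RestorationTracker` class, and the `approved_by_senior_management` flag are hypothetical and do not come from any particular facility's procedures or product. It shows one way to enforce that a facility stays on the back-up system until each step, including higher-level work authorization, has been completed in order.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical restoration stages, in the order described in the text."""
    ON_BACKUP = 0            # facility is running on the back-up system
    FAILURE_IDENTIFIED = 1   # nature of the failure is understood
    ROOT_CAUSE_FOUND = 2     # root cause has been ascertained
    REPAIR_PLAN_READY = 3    # mitigation and repair plans and schedules exist
    WORK_AUTHORIZED = 4      # higher-level authorization has been granted
    NORMAL_OPERATIONS = 5    # facility is back on its normal systems


class RestorationTracker:
    """Tracks progress from failure to restored operations, one stage at a time."""

    def __init__(self):
        self.stage = Stage.ON_BACKUP

    def advance(self, approved_by_senior_management=False):
        """Move to the next stage; stages cannot be skipped."""
        if self.stage is Stage.NORMAL_OPERATIONS:
            raise RuntimeError("Normal operations are already restored.")
        next_stage = Stage(self.stage + 1)
        # Assumption: authorization needs explicit sign-off above the
        # level that approves routine work orders.
        if next_stage is Stage.WORK_AUTHORIZED and not approved_by_senior_management:
            raise PermissionError("Post-failure work requires senior approval.")
        self.stage = next_stage
        return self.stage

    def on_backup_power(self):
        """The facility stays on back-up until the final stage is reached."""
        return self.stage < Stage.NORMAL_OPERATIONS


if __name__ == "__main__":
    tracker = RestorationTracker()
    for _ in range(3):
        tracker.advance()
    tracker.advance(approved_by_senior_management=True)
    tracker.advance()
    print(tracker.stage.name, tracker.on_backup_power())
```

In this sketch the approval check is the only gate that needs extra input, which mirrors the point that post-failure authorization sits at a higher organizational level than a normal repair. A real system would also record who approved each step and when.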
By far the most challenging operational period in the life of a data center is the transition from construction to operations, typically the first year after beneficial occupancy. The more O&M planning and documentation that can be completed before beneficial occupancy, the greater the value to the owner in terms of cost and risk reduction. At a minimum, the owner and operations staff should have a "starter kit" of the "Top 10" (or 15, or 20, as appropriate to the facility) electrical and mechanical EOPs at turnover so they can do everything by the book in the event of a failure. (See sidebar.)
Owners who take a comprehensive approach to sustainable operations require that their project team complete O&M planning during design, validate EOPs (along with all other operating procedures) during commissioning, implement operating procedures during transitional operations, and then sustain operations with continuous improvement. This approach reduces the opportunity for both equipment failure and human error, and it makes the response more effective when a failure does occur.
Terrence J. Gillick is president of Primary Integration Solutions, Inc., an international mission critical commissioning firm headquartered in Charlotte, N.C., with offices across the U.S. The firm also develops and validates EOPs as part of construction commissioning, develops the outline for O&M manuals, and verifies that manuals meet the specification. He can be reached at firstname.lastname@example.org.