By Terrence J. Gillick
August 2014
Human error is the root cause of 75 percent of all critical facility failures. As critical facilities have grown more complex to provide fault tolerance and concurrent maintainability, it is essential to develop equally robust operations and maintenance (O&M) practices.
Potential emergency scenarios are virtually limitless, each reflecting some combination of human error, utility failure, or automation failure. Failures in the building automation system (BAS) are among the most common automation failures, and they can have disastrous consequences. Consider the cascading effect of a failure in the BAS that operates the main mechanical plant: losing the mechanical plant that serves a densely loaded data center can overheat the raised floor space within minutes.
Utility failures are another common cause of critical facility "meltdowns." For example, the main utility feeds may fail individually, sequentially, or simultaneously. A utility failure can then cascade if the uninterruptible power supply (UPS) system fails to accept the critical load. If operators have been inadequately trained or lack well-defined procedures for intervening, human error will typically make the problem worse.
To reduce risk and ensure sustainable operations in critical facilities, owners and operations staff should have well-documented emergency operating procedures (EOPs) in hand, and initial operator training completed, at turnover. Unfortunately, these steps are often deferred until the occupancy phase of a project, leaving operations staff ill-prepared for an emergency, especially during the transition from construction to operations and the first year of operations.
In fact, comprehensive O&M planning should begin during the schematic design phase. This ensures that all procedures, standards, staffing, and training are established and documented before the transition and occupancy phase, including: processes, procedures, and operational planning; equipment maintenance and standards development; operations staff evaluation and training development; and standard O&M manuals that include EOPs. A commissioning agent can develop and validate EOPs as part of construction commissioning at minimal additional cost, provided the process begins during schematic design, when it can be integrated with the other commissioning activities normally included in the contract.
In the design and engineering community, it is common practice to leave the content and format of O&M manuals unspecified. What's more, equipment manufacturers often provide operating manuals that cover a range of equipment models rather than the specific unit installed. As a result, O&M manuals are typically delivered as boilerplate documents rather than as references for the owner's site-specific equipment.
For the same cost and level of effort, an overall outline can be developed for the O&M manuals, with steps taken to ensure that vendors meet this specification. As a result, every manual will be equipment- and site-specific, and each will contain the associated safety procedures, training requirements, spare parts list, warranty information, and all operating procedures: standard operating procedures (SOPs), maintenance operating procedures (MOPs), and EOPs.
As manuals are delivered, they should be verified against the bid specifications. That way, every O&M manual will be consistent in content and format, right down to the numbering system, section titles, and type of ring binder, at every location across the data center portfolio. This consistency enhances usability and saves time, especially in an emergency.
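Where manual deliveries are tracked electronically, part of this verification step can be automated. The following is a minimal Python sketch, not a standard tool: the section names simply mirror the manual contents listed in this article, and the data format and all names (`REQUIRED_SECTIONS`, `ManualReview`, `review_manual`) are hypothetical. It flags missing sections, sections out of outline order, and manuals that cover a range of models rather than one specific unit.

```python
"""Hypothetical sketch: check a delivered O&M manual against an owner-defined outline.

Section names mirror the manual contents described in the article; the data
format and all function/field names are assumptions for illustration.
"""
from dataclasses import dataclass, field

# Owner-defined outline, in the required order (assumed for illustration).
REQUIRED_SECTIONS = [
    "Safety Procedures",
    "Training Requirements",
    "Spare Parts List",
    "Warranty",
    "Standard Operating Procedures (SOPs)",
    "Maintenance Operating Procedures (MOPs)",
    "Emergency Operating Procedures (EOPs)",
]


@dataclass
class ManualReview:
    """Result of reviewing one vendor manual against the outline."""
    equipment_id: str
    missing: list = field(default_factory=list)
    out_of_order: bool = False
    generic_model_range: bool = False

    @property
    def accepted(self) -> bool:
        # Accept only a complete, correctly ordered, single-model manual.
        return not (self.missing or self.out_of_order or self.generic_model_range)


def review_manual(equipment_id, sections, models_covered):
    """Compare a manual's section titles and covered models to the outline."""
    review = ManualReview(equipment_id)
    # Every required section the vendor left out.
    review.missing = [s for s in REQUIRED_SECTIONS if s not in sections]
    # Required sections that are present must follow the outline order,
    # so numbering stays consistent across the portfolio.
    present = [s for s in sections if s in REQUIRED_SECTIONS]
    expected = [s for s in REQUIRED_SECTIONS if s in present]
    review.out_of_order = present != expected
    # A manual written for several models is likely boilerplate,
    # not documentation of the specific unit installed.
    review.generic_model_range = len(models_covered) != 1
    return review


if __name__ == "__main__":
    chiller = review_manual(
        "CHILLER-01",  # hypothetical equipment tag
        sections=["Safety Procedures", "Standard Operating Procedures (SOPs)", "Warranty"],
        models_covered=["Model A", "Model B", "Model C"],
    )
    print(chiller.accepted)
    print(chiller.missing)
```

A rejected review gives the owner a concrete list to return to the vendor before the manual is accepted into the portfolio.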
At a minimum, the owner and operations staff should have a "starter kit" of the "Top 10 EOPs" (or 15, or 20, as appropriate to the facility) at turnover. Certain EOPs are often overlooked when developing the list (designated by asterisks below). Why? Unless one has experienced more unusual failures, it is difficult to imagine the unimaginable, for example, the need for a black start (restoring the facility from a complete loss of power without relying on the utility).
Developing, Validating Emergency Operational Procedures Reduces Risk In Critical Facilities