Part 1: How to Minimize Human Error, Prevent Data Center Downtime
By David Boston
August 2013
Over the last 15 years, most building operators have come to recognize that people account for the majority of interruptions to critical operations. Human error is identified as the root cause in 60 percent to 80 percent of data center downtime events, year after year. Infrastructure systems and component failures still merit attention, but today's rigorous design, construction, and commissioning practices generally provide an expectation of smooth equipment operation for 10 years or more. Assuming your facility has adequate systems, redundancy, and capacity, more attention should be focused on operating practices that keep the potential for human error to a minimum.
A majority of building owners fail to develop and implement effective operating strategies. This is alarming, given the industry's awareness that people present the greatest risk. People are critical to successful building operations: they ensure regular maintenance is performed, fulfill customer requests, and respond to unexpected system incidents. It is the facility manager's job to provide them with the tools to be successful.
As a facility manager begins to implement (or enhance) the optimal facilities operations strategy, the first step is precise delineation of responsibilities between departments. The next is developing work rules unique to the facility and securing the required executive endorsement. Once staff size and structure effectively match operations goals, annual objectives can be set and ownership of systems, tasks, and processes can be assigned. With assigned owners and dedicated time provided for procedures and training programs, multi-month projects may be conducted to complete these objectives. Staff retention incentive plans may be developed simultaneously with the procedures and training program efforts.
Here's how facility managers can incorporate each of these components into their critical facility operations strategy.
1. Clarity of task and process ownership. In most facilities, multiple departments are involved in delivering services to the organization's end customers. Those who operate and install computer hardware, those who manage networks, the security team, and the facilities group are all present in a typical data center. These groups often occupy their own designated spaces where some of their tasks and processes are performed. When these areas are physically separate and secured, it is generally understood which department is responsible for functions performed within, making written processes less critical.
Written processes are much more important when addressing areas where personnel from multiple departments have access. In the case of a data center facility, the computer room is most critical. Tasks performed there present the greatest risk of error, because multiple departments are involved and a higher frequency of human activity occurs within the room.
To reduce the high potential for error when multiple groups work together in one space, it is necessary to develop written mutual expectations between the departments involved. Some organizations refer to these as internal service level agreements. The documents can be as short as one page, but they must be endorsed by each department head and consistently enforced. (See "Example of Internal Service Level Agreement," below.)
A significant level of detail is needed in establishing "ownership" of key functions such as power distribution and master planning (the location of computer hardware devices for optimal cooling and performance). Without it, interruptions to the operation may become common. Interruptions most often occur when someone who lacks knowledge of, training in, and experience with the proper procedure attempts to install or remove a computer device.
Facilities Operations Commitments:
Information Technology Commitments:
Information Technology Manager Date
Facilities Operations Manager Date
Critical Facilities: 7 Steps to Minimize Human Error
Part 2: Operations Objectives Should Drive Data Center Staffing Decisions
Part 3: Develop Comprehensive Work Rules, Procedures To Minimize Human Error In Data Centers
Part 4: Site-Specific Infrastructure Training Can Help Limit Data Center Human Error
Part 5: How To Use Incentives To Improve Data Center Staff Retention