
How to Minimize Human Error, Prevent Data Center Downtime

By David Boston - August 2013 - Data Centers


Over the last 15 years, most building operators have come to recognize that people account for the majority of interruptions to critical operations. Year after year, human error is identified as the root cause of 60 percent to 80 percent of data center downtime events. Infrastructure systems and component failures still merit attention, but today's rigorous design, construction, and commissioning practices generally allow operators to expect smooth equipment operation for 10 years or more. Assuming your facility has adequate systems, redundancy, and capacity, more attention should go to sound operating practices that minimize the potential for human error.

Yet a majority of building owners fail to develop and implement effective operating strategies. This is alarming, given the industry's awareness that people present the greatest risk. People are critical to successful building operations: they ensure regular maintenance is performed, fulfill customer requests, and respond to unexpected system incidents. It is the facility manager's job to give them the tools they need to succeed.

As a facility manager begins to implement (or enhance) an optimal facilities operations strategy, the first step is to precisely delineate responsibilities between departments. The next step is to develop work rules unique to the facility and secure the required executive endorsement. Once staff size and structure effectively match operations goals, annual objectives can be set and ownership of systems, tasks, and processes assigned. With owners assigned and time dedicated to procedures and training programs, multi-month projects can then be undertaken to meet these objectives. Staff retention incentive plans may be developed in parallel with the procedures and training program efforts.

Here's how facility managers can incorporate each of these components into their critical facility operations strategy.

1. Clarity of task and process ownership. In most facilities, multiple departments are involved in delivering services to the organization's end customers. Those who operate and install computer hardware, those who manage networks, the security team, and the facilities group are all present in a typical data center. These groups often occupy their own designated spaces where some of their tasks and processes are performed. When these areas are physically separate and secured, it is generally understood which department is responsible for functions performed within, making written processes less critical.

Written processes are much more important in areas where personnel from multiple departments have access. In a data center facility, the computer room is the most critical of these areas. Tasks performed there present the greatest risk of error, because multiple departments are involved and human activity within the room is more frequent.

To reduce the high potential for error when multiple groups work together in one space, it is necessary to develop written mutual expectations between the departments involved. Some organizations refer to these as internal service level agreements. The documents can be as simple as one page, but must be endorsed by each department head and be consistently enforced. (See "Example of Internal Service Level Agreement," below.)

Ownership of key functions such as power distribution and master planning (the placement of computer hardware devices for optimal cooling and performance) must be established in significant detail. Without it, interruptions to the operation may become common. Interruptions most often occur when someone who lacks the knowledge, training, and experience with the proper procedure attempts to install or remove a computer device.

Internal Service Level Agreement

Facilities Operations Commitments:

  • Ownership of electrical power path up to the remote power panel connections (to whips) - only three individuals designated for this work
  • Single designee to share computer room master plan ownership with IT counterpart
  • Escorts provided for any facilities systems contractors working in building
  • Monthly updates to IT on load vs. capacity for each infrastructure system
  • Shared expense and capital budget planning
  • Thorough methods of procedure prepared, approved, and rehearsed in advance of scheduled maintenance activities that will involve risk or reduced redundancy
  • Incident reports provided to IT contacts within 4 hours of any near miss or downtime event, utilizing a consistent format (follow-up report issued as root cause is identified)
  • Updates every 30 minutes when an unexpected facilities event is in progress

Information Technology Commitments:

  • Ownership of all network connections and all power connections within server cabinets - only five individuals designated for this work
  • Single designee to share master plan ownership with Facilities counterpart
  • Escorts provided for any computer hardware and network contractors working in building
  • Weekly updates to Facilities Operations on contemplated computer hardware additions
  • Annual updates to Facilities Operations on long-term computer hardware strategy
  • Shared expense and capital budget planning

_____________________________     __/__/__

Information Technology Manager     Date

_____________________________     __/__/__

Facilities Operations Manager     Date
