How to Manage Critical Data Spaces
The three major facets of managing critical spaces: Clarify responsibilities, establish policy and protocol, and understand and manage critical systems.
The first step is to set expectations for uptime based on the tier levels developed by the Uptime Institute. Once expectations have been established, it is imperative to define the roles of those who will be ensuring the agreed-to uptime. The delineation of responsibility between IT and facilities is all too often blurred, allowing maintenance, replacement, and compliance requirements to fall by the wayside.
Dividing accountability at the rack appears to return the most success. In this approach, the facilities department is responsible for providing the needed data space, delivering sufficient power and cooling, maintaining the fire suppression system, and managing access (assuming access control resides in facilities). IT owns everything inside the rack. The racks themselves and power distribution units (PDUs or plug strips) are a toss-up that usually lands on IT. In-row cooling and rack-mounted UPS should fall to facilities, as the facilities department has the computerized maintenance management system in place to manage the assets, schedule maintenance, and plan for component replacement (e.g., batteries).
Responsibilities should be documented, perhaps in the form of a service level agreement that clearly defines boundaries, access, design and construction practices, and operating requirements. The intent is to establish a clear, symbiotic relationship that promotes and achieves the organization’s expectations.
Establish policy and protocol
After responsibilities have been identified and agreed upon, policy and protocols must be drafted that govern access control, change management, and general operating practices related to the critical environment. Access control simply states who is allowed in the space, when, and for what reason; this includes rules regarding escorting contractors. Access control feeds into change management, which provides a clear communication channel for informing affected parties of planned (and unplanned) maintenance, upgrades, moves, and alterations. For example, facilities would submit a change request to perform upcoming annual UPS preventive maintenance; IT would use a change request to inform the stakeholders of a server replacement. The change management process identifies potential risks and related mitigation strategies, coordinates access, and provides an opportunity for stakeholders to ask questions to ensure bases are covered (e.g., placing the fire suppression system in bypass during an underfloor cable install). This level of communication helps to prevent mishaps, overbooking activity in a space, and stepping on each other’s toes. While this adds an administrative burden, any pain it creates is considerably less than the pain that would be felt from an incident that could have been avoided if protocol and process were in place.
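One part of the change management process described above, avoiding overbooked activity in a space, can be illustrated with a minimal Python sketch. The `ChangeRequest` fields, the room identifier, and the `conflicts` helper are hypothetical names chosen for illustration; a real change management system would track far more (risks, mitigations, approvals).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChangeRequest:
    requester: str      # e.g. "facilities" or "IT"
    space: str          # hypothetical room identifier
    description: str
    start: datetime     # start of the planned work window
    end: datetime       # end of the planned work window


def conflicts(a: ChangeRequest, b: ChangeRequest) -> bool:
    """Two requests conflict when they target the same space
    and their work windows overlap."""
    return a.space == b.space and a.start < b.end and b.start < a.end


ups_pm = ChangeRequest(
    "facilities", "DC-1", "Annual UPS preventive maintenance",
    datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 12, 0),
)
server_swap = ChangeRequest(
    "IT", "DC-1", "Server replacement",
    datetime(2024, 5, 6, 11, 0), datetime(2024, 5, 6, 13, 0),
)

if conflicts(ups_pm, server_swap):
    print("Overlapping work in", ups_pm.space, "- coordinate before approving")
```

Flagging the overlap does not necessarily mean one request is rejected; it simply prompts the two parties to coordinate access and risk mitigation before either job starts.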
Critical environments and their associated systems should be included in the emergency response management program.
Understand critical systems
Now that the facilities department knows what they own and the requisite processes, it is crucial that staff understand the critical nature of these environments and how their actions can impact them. This includes educating the team on the established processes and three primary systems that support these spaces: electrical, cooling, and fire detection/suppression. Below, the components of each system and maintenance practices are briefly reviewed.
Electrical system: Working incoming utility power downstream toward the load, components include utility feed(s), transformers, main distribution, generators, automatic transfer switches, UPS, and PDUs.
Of these components, generators typically require the most maintenance, including weekly to monthly runs. Automatic transfer switches are generally maintained at the same frequency as the associated generator and by the same service provider. Ideally, automatic transfer switches are exercised with the generator runs; at a minimum, load should be transferred to the generator quarterly. UPS units usually necessitate quarterly service visits, primarily to inspect batteries and replace them if needed. Other components require annual maintenance and inspection. Most prescribed service is performed by a qualified contractor.
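The maintenance frequencies above lend themselves to a simple due-date check, of the kind a computerized maintenance management system performs. The following Python sketch is illustrative only: the task names and the day counts (30 days for "monthly," 91 for "quarterly," 365 for "annual") are assumptions, and actual intervals should follow manufacturer and contractor recommendations.

```python
from datetime import date, timedelta

# Assumed intervals in days, loosely based on the frequencies in the text.
INTERVAL_DAYS = {
    "generator_run": 30,        # weekly to monthly; monthly used as the outer limit
    "ats_load_transfer": 91,    # at a minimum, quarterly
    "ups_service": 91,          # quarterly service visit
    "other_electrical": 365,    # annual maintenance and inspection
}


def next_due(task: str, last_done: date) -> date:
    """Return the date by which the task should next be performed."""
    return last_done + timedelta(days=INTERVAL_DAYS[task])


def is_overdue(task: str, last_done: date, today: date) -> bool:
    """Return True if today is past the task's next due date."""
    return today > next_due(task, last_done)


print(next_due("ups_service", date(2024, 1, 1)))
print(is_overdue("generator_run", date(2024, 1, 1), date(2024, 2, 15)))
```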
Cooling system: There is a wide range of options for cooling these heat-laden data spaces, from in-row cooling to high-volume, wet-side economizers (no mechanical cooling) to the more common computer room cooling units (CRU/CRAH/CRAC). Cooling systems are often the Achilles' heel for uptime, as redundant components are sometimes overlooked, or dual distribution paths inconspicuously merge for a discrete portion of the supply.
Data centers are, in general, overcooled, which costs more energy and increases runtime and wear and tear on equipment. A warm data center is acceptable, as long as temperature and humidity are maintained within the specifications of the hosted technologies (e.g., temperature ranges of 64 degrees F to 80 degrees F and relative humidity of 40 percent to 60 percent).
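As a rough illustration, checking a reading against the example envelope above could be sketched in Python as follows. The function name is hypothetical, and the default limits are simply the example ranges quoted in the text (64 to 80 degrees F, 40 to 60 percent relative humidity); actual limits should come from the specifications of the hosted equipment.

```python
def within_envelope(temp_f, rh_percent,
                    temp_range=(64.0, 80.0), rh_range=(40.0, 60.0)):
    """Return True if both readings fall inside the allowed ranges (inclusive).

    The defaults are the example ranges from the text, not a standard.
    """
    temp_ok = temp_range[0] <= temp_f <= temp_range[1]
    rh_ok = rh_range[0] <= rh_percent <= rh_range[1]
    return temp_ok and rh_ok


# A warm room is acceptable as long as it stays inside the envelope.
print(within_envelope(78.0, 45.0))
# Too warm for the example envelope, even though humidity is fine.
print(within_envelope(85.0, 45.0))
```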
Given the number of moving components in a cooling system, it stands to reason that the maintenance requirements would be correspondingly higher than for electrical systems. In addition to daily rounds in the critical environments, cooling systems maintenance manuals usually dictate monthly inspections with preventive maintenance tasks occurring quarterly.
Fire protection systems: Fire detection and suppression can vary from one critical space to another. Smaller, less critical spaces may be covered by the building’s wet sprinkler system, while gaseous suppression systems are more common in smaller data centers, network rooms, or building main distribution frames. Rooms protected by gaseous systems typically have signage posted outside the room. To determine if gaseous suppression systems are present, look for cylinders (typically red) tucked off to the side of the room or in the corner (likely near a red or yellow control panel).
Large data centers can have gaseous fire suppression or dual-interlock, pre-action systems. The sprinkler pipes for a pre-action system are filled with air fed by a compressor. To activate, a sprinkler head must open and the fire control panel must alarm, which opens the water valve. Some systems may require sensors in two different zones to trip before the valve opens and water flows (a single sensor would only initiate a trouble alarm). Maintenance includes daily and weekly inspections with annual tests. NFPA 75 and 76 are good sources for code compliance and maintenance requirements. Before performing system maintenance, it is important to put the fire control panel in bypass and disable emergency power-off interlocks: no facility manager wants routine maintenance to drop power to the load or discharge the clean agent gas, which costs thousands of dollars to recharge. Policy and procedures should be drafted dictating this practice.
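The dual-interlock logic described above can be expressed as a small Python sketch. This is a simplified model for training purposes, assuming the cross-zoned variant in which two detection zones must trip before the panel alarms; the function names and state labels are hypothetical and do not represent any particular manufacturer's panel.

```python
def panel_state(zones_tripped, cross_zoned=True):
    """Return 'normal', 'trouble', or 'alarm' from the number of tripped zones.

    With cross-zoning (an assumption here), a single tripped zone only
    raises a trouble condition; two or more zones put the panel in alarm.
    """
    if zones_tripped <= 0:
        return "normal"
    if cross_zoned and zones_tripped < 2:
        return "trouble"
    return "alarm"


def preaction_valve_opens(sprinkler_head_open, zones_tripped, cross_zoned=True):
    """Dual interlock: the water valve opens only when a sprinkler head
    has opened AND the fire control panel is in alarm."""
    return sprinkler_head_open and panel_state(zones_tripped, cross_zoned) == "alarm"


# A damaged sprinkler head alone does not release water.
print(preaction_valve_opens(True, 0))
# One detection zone plus an open head: trouble only, the valve stays closed.
print(preaction_valve_opens(True, 1))
# Open head and two zones tripped: the valve opens.
print(preaction_valve_opens(True, 2))
```

Requiring both conditions is what protects the space from an accidental discharge: neither a knocked sprinkler head nor a single faulty detector is enough to put water in the pipes.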
Critical spaces scattered throughout buildings must be operated and managed with great diligence and deliberation. Assess these environments and talk with management, stakeholders, and IT. Then put processes in place to protect the department, the customers, and the organization.
John Rimer (firstname.lastname@example.org), CFM, is president of FM360, LLC. In more than 20 years of facility management experience, he has implemented and managed facility programs for companies such as Intel, Microsoft, JP Morgan Chase, and Charles Schwab.