Cooling Loads, Human Factors Impact Legacy Data Center Effectiveness
FMs must make sure legacy data centers can keep up with new cooling and staffing demands.
Many issues in a data center revolve around cooling. Energy costs are a major concern. One way to improve energy efficiency is to raise the return temperatures. “Raising the return temperatures of your cooling units raises the efficiency of that cooling device,” points out Smith. “Separating supply and return streams as cleanly as possible accommodates that.”
Another common challenge is getting cooling where it is required to prevent hot spots. One sure sign of an airflow issue is the use of portable fans, a stopgap measure to push cool air toward a problem area or warm air away from it. When facility managers consider permanent solutions to hot spots, they should pay attention to the overall impact of those solutions. For example, simply adding another CRAC unit to increase the volume of cold air going into a data center “is not energy efficient and is sometimes counterproductive,” says Michael Fluegeman, director of engineering and principal of PlanNet Critical Facilities.
Another strategy is to use booster fans in floor or ceiling tiles or in chimney cabinets. But there are challenges with booster fans. “Booster fans may add failure points and are difficult to make fully redundant,” says Fluegeman.
In raised floor environments, the floor itself is a crucial element of the air distribution system.
“Leaks in the floor waste cooling capacity by allowing air to short circuit directly back to the intake without ever doing any work,” says Smith. In addition, such leaks lower overall static pressure, reducing the ability to deliver cool supply air to the equipment mounted at the top of the rack, points out Smith.
Many legacy data centers are supported by power and cooling equipment that also may be used by office or other functions, notes Fluegeman. “Reconfiguring or recircuiting non-data center loads, to get them off data center support systems, not only frees up capacity for the data center but also makes it more reliable.”
To eliminate airflow choke points, Manula recommends that facility managers consider employing a cable remediation program to examine what’s under the flooring. Discarded cabling and other hidden surprises underfoot may be blocking airflow. Manula also stresses the importance of proactive rather than reactive maintenance programs. Facility managers may want to consider replacing critical equipment before it fails, he suggests.
The human factor
Most downtime events in legacy data centers are the result of human error, according to David Boston, national director, global critical environments team North America, BGIS. “Perhaps the greatest challenge FMs face in addressing this is justifying staff size and a shift coverage plan which allow them to mitigate risk,” says Boston.
To justify appropriate staff size and shift coverage, facility managers need to show the boardroom how the avoided cost of a downtime event compares to the investment in human resources needed to prevent one.
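One way to frame that comparison for the boardroom is as simple arithmetic. The sketch below is illustrative only: the function names and every input figure (event frequency, cost per event, fraction of events prevented, cost per staff member) are hypothetical placeholders, not numbers from Boston or any other source, and each organization would need to substitute its own estimates.

```python
def expected_avoided_cost(events_per_year, cost_per_event, fraction_prevented):
    """Annual downtime cost the organization expects to avoid.

    All three inputs are estimates that the facility manager must supply.
    """
    return events_per_year * cost_per_event * fraction_prevented


def annual_staffing_cost(headcount, loaded_cost_per_person):
    """Annual cost of the operations team, including benefits and overhead."""
    return headcount * loaded_cost_per_person


# Hypothetical placeholder inputs, for illustration only.
avoided = expected_avoided_cost(
    events_per_year=0.5,        # assumed: one event every two years
    cost_per_event=4_000_000,   # assumed cost of a single downtime event
    fraction_prevented=0.75,    # assumed share of events staff can head off
)
staffing = annual_staffing_cost(
    headcount=12,                    # upper end of the 11 to 12 range
    loaded_cost_per_person=110_000,  # assumed fully loaded annual cost
)

staffing_is_justified = avoided >= staffing
print(f"Avoided cost: ${avoided:,.0f}  Staffing cost: ${staffing:,.0f}")
print("Staffing justified on these assumptions:", staffing_is_justified)
```

The result is only as good as the estimates behind it, so it may be worth running the comparison across a range of downtime costs and event frequencies rather than relying on a single scenario.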
Boston says that when operations are truly critical, at least two people need to staff each shift, 24x7, which typically amounts to 11 to 12 staff members.
“Nine is the minimum to cover all the shifts (over 7 days and three shifts),” says Boston. “Then you need one to be available to fill in for those on vacation, sick time, etc.” The other one or two positions should focus on procedures and training programs for the group of shift technicians; they also take care of special projects, upgrades, and similar responsibilities so as not to take the core group of shift engineers away from their daily preventive maintenance and task work.
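The arithmetic behind those figures can be sketched as follows. The calculation assumes each technician works five shifts per week, which the article does not state; the function name and parameters are illustrative.

```python
import math


def core_shift_staff(people_per_shift, shifts_per_day, days_per_week,
                     shifts_per_person_per_week):
    """Minimum headcount needed to fill every shift position each week."""
    person_shifts = people_per_shift * shifts_per_day * days_per_week
    # Round up: a fractional technician still means hiring a whole person.
    return math.ceil(person_shifts / shifts_per_person_per_week)


# Two people per shift, three shifts a day, seven days a week.
# Assumption: each person covers five shifts per week.
core = core_shift_staff(
    people_per_shift=2,
    shifts_per_day=3,
    days_per_week=7,
    shifts_per_person_per_week=5,
)

relief = 1                 # fills in for vacation, sick time, etc.
support_range = (1, 2)     # procedures, training, special projects

total_low = core + relief + support_range[0]
total_high = core + relief + support_range[1]
print(f"Core shift staff: {core}")
print(f"Total team size: {total_low} to {total_high}")
```

Under that five-shift assumption, 42 weekly shift positions divided by five rounds up to nine core staff, which matches Boston's minimum; adding the relief and support positions gives the 11 to 12 total.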
“For cooling system and generator incidents, this level of staffing and shift coverage will often result in an incident being resolved before it causes downtime,” says Boston. “For electrical incidents, conditions can be stabilized more quickly and restoration of service begun more expeditiously.”
It’s not just the size of the staff that counts. “Take a look at your staff and see if they need more training,” suggests Manula.
Proper staff size facilitates education. “In an optimum critical operation, every team member spends equal practice time performing every system transfer and in simulating every emergency response,” says Boston. “In this way, they are each prepared for a confident response to an emergency. And, with two people present on each shift, they can consistently employ the ‘read and repeat back’ (pilot/co-pilot) process when executing a procedure to avoid skipping or duplicating a step.”
Boston estimates that critical facilities operations require about 50 to 75 emergency response procedures as well as 75 to 100 system transfer procedures annually. (System transfer procedures are those that isolate a building system before maintenance or repair and then restore it to service afterward.)
If the downtime cost does not warrant 24x7 coverage with two people per shift, Boston suggests that a single day shift operation is appropriate. However, with just five to seven team members, extensive procedure development and training programs will have to be forgone. And the organization will need to accept delayed responses to incidents that occur during evenings and weekends, according to Boston.
Rita Tatum, a contributing editor for Building Operating Management, has more than 30 years of experience covering facility design and technology.