Data Centers: No Time for Downtime
A healthy enterprise data center operating at maximum efficiency has never been more critical than now.
Legacy data centers are energy intensive and, if not properly managed and maintained, their aging equipment can threaten reliability. To remain competitive, legacy data centers must balance the need for reliability with operational and energy efficiencies.
At one time, sustainability and profitability were seen as separate objectives. But that boundary is crumbling.
According to “The Future of Enterprise Data Centers – What’s Next,” published by Gartner, Inc. in 2019, investor, regulator, and shareholder demand is driving 78 percent of the Global 250 corporations “to integrate corporate responsibility into their financial reports,” writes analyst Henrique Cecci. “This represents an increase from 65 percent in 2015.”
Operating, upgrading, or renewing a legacy data center has risks and challenges, of course. But it also has some advantages over brand-new centers, cloud, and even co-location options, provided building systems and equipment are not at the end of their useful lives. Power generation, uninterruptible power supply (UPS), fire suppression, and cooling equipment that is in good operating condition represents a valuable asset that can be maximized.
Some building areas can tolerate reactive maintenance in ongoing efforts to reduce expenses. However, experts argue a proactive approach is crucial whenever organizational data is involved.
“Maintenance, maintenance, maintenance. The number one thing you can do is maintain your gear,” says Steve Smith, director of physical IT networks at Arvest Bank Operations.
Smith believes so strongly in preventive maintenance that he’s switched from four cooling inspections annually to six, “simply to ensure we are staying ahead of potential emergency break/fix items and keeping the equipment operating at peak efficiency.”
For example, condensing units do not take long to get dirty. That dirt not only reduces efficiency by making the equipment work harder under higher pressures, it also can increase failure rates.
Bearings are a common failure point in mechanical equipment, says Tim Kittila, director of data center practice at Parallel Technologies.
“Make sure bearings in all rotating gear are reviewed and checked,” says Kittila.
If the data center has dual-path power or a maintenance window, Kittila also suggests cycling them annually as part of routine maintenance.
“It’s the simple stuff that can take legacy data centers down,” he says.
Most facility managers already are attentive to the needs of the main mechanical and electrical systems. Experts recommend that those managing legacy data centers pay special attention to the connected subsystems as well.
“When you’re cooling hard, you are by default also dehumidifying hard,” says Smith. “Paying attention to the humidifier system is as important as managing the cooling plant. They need each other to create a stable environment for your compute gear. They are dependent on one another for improved cooling efficiency and management of electrostatic discharge risk.”
Can operational reliability be delegated to others? The experts think that may not be the best option for most legacy data centers.
“Standard vendor service provider maintenance is often less effective than expected and may provide a false sense of security,” says Michael Fluegeman, director of engineering and principal at PlanNet Consulting, LLC. “Periodic performance testing (commissioning) is key, to make sure support equipment functions properly in all realistic operating scenarios, including partial failovers, bypass operations, and loss of redundant components.”
Charles Manula, integrated facilities management lead for government at JLL, stresses the importance of legacy data center maintenance, down to inspections underneath perforated floor tiles and server racks.
“Look under obstructions,” Manula says. “Clean under the flooring tiles and fix any holes.”
In tandem with maintenance, facility managers tending legacy data centers should document variances and equipment testing reports and make sure all operational procedures are kept current.
“Maintaining your documentation of operational procedures and keeping them up to date is critical,” says Kittila.
Performance data is also crucial.
“Always have a set of reliable metrics that are as up to the minute as possible,” says Smith. “If you don’t have the budget to build a dashboard, at the very least gather numbers and publish your status weekly. Establishing trends in critical power and cooling loads and of rack unit utilization as a percentage over time will help you plan expansions and upgrades and help validate that your maintenance program is working.”
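For teams without a dashboard, the weekly numbers Smith describes can be produced with a few lines of code. The following is a minimal sketch in Python, not a prescribed tool: the function names, the sample readings, and the use of a simple least-squares slope as the trend measure are all assumptions made for illustration.

```python
from statistics import mean


def utilization_pct(used_units: int, total_units: int) -> float:
    """Rack unit utilization as a percentage of installed capacity."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return 100.0 * used_units / total_units


def weekly_trend(readings: list[float]) -> float:
    """Least-squares slope of equally spaced weekly readings.

    The result is the average change per week, in the same units
    as the readings (for example, kW of critical load per week).
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    xs = range(n)  # week index: 0, 1, 2, ...
    x_bar = mean(xs)
    y_bar = mean(readings)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, readings))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical figures, for illustration only.
critical_load_kw = [120.0, 122.0, 124.0, 126.0]  # one reading per week
print(utilization_pct(315, 420))       # 315 of 420 rack units in use
print(weekly_trend(critical_load_kw))  # change in kW per week
```

A steadily positive slope in critical load or rack utilization signals that expansion planning should begin; a flat or falling slope after a maintenance change is one piece of evidence that the program is working.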
Also, facility managers should stay abreast of factory change notices or recalls.
Operationally, experts recommend regular testing of failover and redundancy, such as monthly on-load generator tests. These tests ensure the legacy center’s assets are performing as expected. But, possibly even more important, the tests ensure that building operations staff is familiar with the process and procedures should a failure occur. This spares them from having to follow a particular protocol for the first time under the stress of a crisis.
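Keeping a regular test schedule is easier when overdue items are flagged automatically. The sketch below, in Python, assumes a simple record of the last test date for each asset; the asset names, the dates, and the 31-day interval standing in for "monthly" are hypothetical choices, not figures from the experts quoted here.

```python
from datetime import date, timedelta


def overdue_tests(
    last_tested: dict[str, date],
    today: date,
    interval_days: int = 31,  # assumed stand-in for a monthly cadence
) -> list[str]:
    """Return the names of assets whose last test is older than the interval."""
    limit = timedelta(days=interval_days)
    return sorted(
        name for name, when in last_tested.items() if today - when > limit
    )


# Hypothetical test log, for illustration only.
log = {
    "generator-A": date(2024, 1, 5),
    "generator-B": date(2024, 2, 20),
}
print(overdue_tests(log, today=date(2024, 3, 1)))
```

Publishing the resulting list alongside the weekly status report gives staff a standing reminder of which failover drills are due.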
Staff members need to be as familiar with the equipment and procedures as possible — human response plays a significant part in the legacy center’s operating effectiveness.
“Human error is responsible for an average of 70 percent of downtime events in data center facilities,” says David Boston, national director, global critical environments team North America, BGIS.
Both facility managers and technology operations leaders should stress to management that the data center’s staffing needs as much investment as its assets.
Managers need to justify staff size and a shift coverage plan that mitigates risk. Boston insists that when operations are truly critical, at least two people need to staff each shift, 24/7.