Don't Overlook These Critical Data Spaces in Your Facilities
Even the smallest spaces that house critical data can't be treated like just another closet. Here's how to make sure critical data spaces stay operational and efficient.
Facility managers are often responsible for critical spaces that can range from a small converted closet that now contains a network rack to a very large data center that requires a map and column labels to navigate. Even the smaller spaces cannot be treated like any other closet or obscure room. In most cases, organizations are critically dependent upon the 1s and 0s flying in and out of these rooms. While the facility department is not responsible for the IT equipment itself, it is typically held accountable for delivering adequate cooling, power, fire suppression, and security. That makes it important for the facility manager to understand the various systems typically found in these critical rooms, the recommended maintenance practices, and the processes needed to ensure business uptime.
Note that the term “critical environment” can include rooms or buildings that support high-priority or dangerous processes, such as those in healthcare, industrial, and power generation facilities. While the focus of this article is data-centric spaces, the principles discussed can be extended to support other critical environments.
Large data centers generally have teams dedicated to managing them and ensuring uptime. Many entities outsource such management to independent third parties that specialize in delivering the requisite services. The smaller footprints that are peppered throughout a portfolio are the ones that tend to fall through the cracks. Unfortunately, the slipping starts as early as the design stage, with the prescribed infrastructure not aligning with the expected uptime.
When asked about the organization’s expectations for uptime of these critical spaces, most facility staff will answer emphatically that the spaces can never go down, knowing that if they do, heads could roll. Sadly, when the infrastructure does not match that expectation, the facility and IT departments are fighting an uphill battle that they will eventually lose: the odds are against them.
The Uptime Institute defined standards by tier level more than 20 years ago, and they still serve as a good baseline for calibrating reliability expectations against the supporting infrastructure. For example, a Tier I data center provides a single, non-redundant path for power and cooling (known as N). This generally includes an uninterruptible power supply (UPS) to handle short utility anomalies and a generator to provide power for longer outages. This bare-bones infrastructure is likely to support many, if not most, of a portfolio’s critical environments. A Tier I data center should deliver 99.671 percent uptime. While that sounds pretty good, it equates to 28.8 hours of downtime per year, which includes planned (e.g., for maintenance and upgrades) and unplanned interruptions. Inevitably, the downtime will occur in the middle of a busy day or during a crucial meeting, resulting in lost productivity, revenue, or credibility. Is 99.671 percent uptime (28.8 hours downtime/year) acceptable to your stakeholders?
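The conversion from an uptime percentage to hours of downtime is simple arithmetic, and it can be worth running for any figure a stakeholder quotes. The following is a minimal sketch, assuming a non-leap year of 8,760 hours; the function name is illustrative and not part of any standard.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored in this sketch


def annual_downtime_hours(uptime_percent: float) -> float:
    """Return the hours per year a space is expected to be down
    at the given uptime percentage (planned plus unplanned)."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


# Tier I figure quoted above
print(f"{annual_downtime_hours(99.671):.1f} hours per year")  # 28.8 hours per year
```

Running the same function on whatever uptime the organization says it expects makes the gap between "it can never go down" and the installed infrastructure concrete.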
The next level is Tier II, which adds redundant components to the single-path power and cooling (N+1). In theory, it should have redundant UPS modules, chillers, pumps, generators, etc. However, many intended Tier II data centers are closer to a Tier 1.5, as not all components are redundant (especially on the cooling side). Tier II should yield 99.749 percent uptime or, conversely, 22 hours of downtime per year. The return on the investment of upgrading from a Tier I to a Tier II is therefore not that significant: Uptime increases only 0.078 percent (roughly 7 hours less downtime per year).
Data centers that fall into Tier III or Tier IV require a far greater investment and promise uptime of 99.982 percent and 99.995 percent, or 1.6 hours and 26.3 minutes of downtime per year, respectively. The reason for this significant improvement over Tier I/II is that Tier III/IV introduce a second power and cooling path. Each path is completely independent of the other and has redundant components (N+1), and the servers and network gear are dual-powered. This allows concurrent maintainability, meaning a component can be taken offline for maintenance without impacting the supported load. (Note that risk increases while a component is offline, because its redundancy is not available; thus, such maintenance is normally scheduled after hours or during low business cycles.) For Tier III, one path is active, carrying the full load, and the second path acts as the alternate (the alternate is likely supporting its own load); thus, in the event of a primary path failure, the power or cooling load would switch to the alternate source. Tier IV edges out Tier III because both of its paths are active, with each path supporting part of the load (e.g., 50 percent on path A and 50 percent on path B). In the event of one path failure, the other path would carry the full load. Tier IV data centers are very costly, so many data centers are built as Tier III. In fact, some entities construct two Tier III facilities in lieu of one Tier IV due to cost and other factors.
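One way to use these figures is to work backward: start from the downtime stakeholders say they can tolerate and find the lowest tier whose expected downtime fits within it. The sketch below assumes the uptime percentages quoted in this article and a year of 8,760 hours; the function and variable names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored

# Uptime percentages as quoted in this article, lowest tier first.
TIER_UPTIME_PERCENT = {
    "Tier I": 99.671,
    "Tier II": 99.749,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}


def downtime_hours(uptime_percent):
    """Expected hours of downtime per year at the given uptime percentage."""
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


def lowest_tier_meeting(max_downtime_hours):
    """Return the lowest tier whose expected annual downtime does not
    exceed the tolerance, or None if no tier in the table satisfies it."""
    for tier, uptime in TIER_UPTIME_PERCENT.items():
        if downtime_hours(uptime) <= max_downtime_hours:
            return tier
    return None


# Tolerances in hours of downtime per year
for tolerance in (24, 4, 1):
    print(f"{tolerance} h/year tolerated -> {lowest_tier_meeting(tolerance)}")
```

A tolerance of 24 hours per year lands on Tier II, 4 hours on Tier III, and 1 hour on Tier IV, which shows how quickly a modest-sounding expectation pushes the required infrastructure into dual-path territory.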
This basic discussion of data center tier levels is meant to help you set management’s and stakeholders’ understanding of the infrastructure required to achieve the desired outcome. It should also highlight the need for the facilities department to operate and administer these critical environments very deliberately to ensure the estimated uptimes are realized and not jeopardized.