How Good is Your Hospital’s Data Center Reliability?
By R. Stephen Spinazzola - May 2008 - Data Centers
The data center is commanding new respect. Nowhere is this more evident than in the health care arena. As technology enables data to be transmitted in real time to an integrated operating room, patient’s bedside or physician’s office, IT infrastructure is becoming a more critical link in basic care delivery. In fact, once a hospital makes the leap into digital imaging, digital pharmacy, clinical communication systems and digital medical records, it’s a new ball game. As a result, today’s business model for health care data centers is undergoing rapid transformation with immense implications. Unfortunately, technological advances are happening so quickly that some health care providers may be unknowingly at risk.
IT departments in many health care organizations are actively assessing opportunities to deploy more clinical systems and employing new options like virtualization, a process that allows one server to replace two or more existing servers to run multiple applications. But is this enough? With patient safety and data center reliability becoming inextricably united, developing a data center strategy to accommodate projected demand is fast becoming a top priority. Four primary issues are driving the need to re-evaluate an organization’s data center infrastructure:
1. Consolidation. The pendulum is swinging back to consolidated/centralized computing. Many institutions are realizing the cost to support the mechanical and electrical infrastructure for distributed/de-centralized computing is prohibitive.
2. Compaction. New computing equipment is constantly being deployed either to replace legacy equipment or to support new missions — and both scenarios can cause problems. New equipment is usually more compact, which allows more updated technology in the same cabinet space. This creates power and cooling issues that can affect the function of the entire data center. As a result, if scalability wasn’t designed into a data center, supporting more applications can reduce the infrastructure’s reliability, although it may appear that sufficient capacity exists.
3. Clinical Computing. Today’s data center has leapt into the “clinical” world. Picture archiving and communications systems (PACS), also known as digital imaging for MRIs, x-rays and CAT scans, generate large image files. These files must be stored in the data center and then communicated to the clinician for diagnosis and to the operating room during surgery. With the addition of digital pharmacy and clinical communications systems, a greater level of reliability, such as Tier III or IV, is immediately required.
4. Disaster Recovery. The deployment of clinical computing systems has changed the way hospital management should look at disaster recovery. Because clinical computing systems need synchronous (or near synchronous) recovery, continuing to outsource disaster recovery introduces difficulties that can interfere with the need to get the data center back online almost immediately. Levels of disaster recovery should be considered as part of a comprehensive disaster recovery plan, which can be quite lengthy. That’s because, in theory, each application has its own disaster recovery plan.
Higher Tiers of Reliability Needed
Not all data centers are created equal. Multiple levels of reliability are available, and the level of reliability required depends on the specific business mission a data center must support. Of course, moving from one tier of reliability to another brings significant increases in design and construction costs.
The standard benchmark used throughout the industry is the tiered classification system developed and defined by The Uptime Institute, a research entity dedicated to providing information and improving data center management. Over the years, the Institute has created a standard measure for data center reliability based on a series of tiered benchmarks. In essence, this four-tier classification system defines the degree of reliability built into the mechanical and electrical infrastructure. These varying levels are typically based on “Need” or “N.” N represents the quantity of components necessary to support the mission. A good analogy is the number of tires on a car. It “needs” four tires (N), but with the spare tire there are five, an arrangement referred to as “N+1.” (See “Levels of Reliability” below.)
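The “N” notation can be sketched as simple arithmetic. The function name, the unit capacities and the 500 kW load below are hypothetical values chosen for illustration; only the N, N+1 and 2(N+1) schemes themselves come from the article and its “Levels of Reliability” sidebar.

```python
import math


def installed_units(load: float, unit_capacity: float, scheme: str = "N") -> int:
    """Return how many components are installed under a redundancy scheme.

    N is the number of units needed to carry the load with no spare.
    """
    n = math.ceil(load / unit_capacity)  # units "needed" to support the mission
    if scheme == "N":
        return n
    if scheme == "N+1":
        return n + 1  # one spare unit
    if scheme == "2(N+1)":
        return 2 * (n + 1)  # two complete N+1 systems
    raise ValueError(f"unknown redundancy scheme: {scheme}")


# The car analogy: four tires are needed, and the spare makes five.
print(installed_units(4, 1, "N+1"))

# Hypothetical example: a 500 kW load served by 250 kW UPS modules.
for scheme in ("N", "N+1", "2(N+1)"):
    print(scheme, installed_units(500, 250, scheme))
```

The count grows quickly with each scheme, which is one reason moving between tiers raises design and construction costs.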
Not all data centers are as reliable as the organization needs them to be. Just as with other building types, problems typically occur when a data center is asked to be more reliable than it was originally designed to be.
Existing hospital data centers are a prime example. Most were originally created to support non-clinical functions such as payroll, insurance processing and eventually some basic patient information. Data centers with this business mission can be supported with a Tier I or II facility. There are costs associated with an outage, but until now there was little benefit to be gained by spending more to increase reliability.
Today, data centers are becoming critical to the actual well-being of the patient. A Tier III or IV facility is a must to mitigate the risk. The days when a good uninterruptible power supply (UPS) and generator would avert disaster are long gone.
One approach for improving reliability is the use of two data centers: A new data center — “Site A” — handles clinical and non-clinical applications, while the existing data center — “Site B” — is used for disaster recovery.
For disaster recovery, a three-tiered model is being developed to address the various levels of mission support deployed in a modern health care data center. These are:
Level I: Critical clinical applications that require synchronous or near-synchronous computing.
Level II: Near critical applications that can be supported with a two-hour downtime.
Level III: Non-critical applications that can be supported with an eight-hour downtime.
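As a rough sketch, the three levels can be treated as a lookup on how much downtime an application can tolerate. The two-hour and eight-hour thresholds come from the list above; treating anything under two hours as Level I, and the tolerances assigned to the sample applications, are assumptions made for illustration only.

```python
def recovery_level(tolerable_downtime_hours: float) -> str:
    """Map an application's tolerable downtime to a disaster recovery level."""
    if tolerable_downtime_hours >= 8:
        return "Level III"  # non-critical: can wait eight hours
    if tolerable_downtime_hours >= 2:
        return "Level II"   # near critical: can wait two hours
    return "Level I"        # assumed: anything tighter needs near-synchronous recovery


# Hypothetical tolerances, in hours, for applications named in the article.
applications = {
    "PACS": 0,
    "Digital pharmacy": 0,
    "Insurance processing": 2,
    "Payroll": 8,
}

for name, hours in applications.items():
    print(f"{name}: {recovery_level(hours)}")
```

In practice each application would be assessed individually, since in theory each has its own disaster recovery plan.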
Most health care facilities have little room on-site to accommodate the needs of a new, modern data center that will support all three levels of applications, which leads to a move off-site.
Key features that a modern “Site A” data center must include are:
• Concurrent maintenance. This is the ability to maintain any component in the mechanical and electrical distribution system without an outage. Once the data center goes clinical, it is linked to the patient 24x7. There are no nights and weekends for the maintenance staff to shut down the systems in order to do repairs.
• Scalability. This is the capability of being easily expanded or upgraded, on demand, without an outage. A well-designed data center must be scalable because of its functional nature.
A Successful Mission
One institution that completed a mission critical data center is Johns Hopkins Hospital, one of the world’s largest health care institutions. Its new mission critical data center addresses the need for consolidation, the deployment of clinical computing systems, compaction and disaster recovery.
The planning process determined the need for a new Tier III, Site A facility capable of supporting 100 cabinets at 2 kW per cabinet and scalable to 200 cabinets at 4 kW per cabinet. The existing on-site data center was converted to Site B. The information systems mission for the data center was to increase its capabilities from insurance processing and financial accounting to the ability to include PACS, digital pharmacy and digital communications systems. As a result, the design is based on a mechanical and electrical solution that meets short-term requirements but is expandable, without an outage, to accommodate future needs.
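The cabinet figures above translate into a total computing load as follows. This is a back-of-the-envelope sketch: it covers the cabinet load only and assumes every cabinet draws its full rated power, and it leaves out cooling and distribution losses, which the article does not quantify.

```python
def total_load_kw(cabinets: int, kw_per_cabinet: float) -> float:
    """Total cabinet load in kW, assuming every cabinet draws its rated power."""
    return cabinets * kw_per_cabinet


day_one = total_load_kw(100, 2)    # initial design point from the article
build_out = total_load_kw(200, 4)  # scalable design point from the article

print(f"Day one:   {day_one:.0f} kW")
print(f"Build-out: {build_out:.0f} kW")
print(f"Growth:    {build_out / day_one:.0f}x")
```

Doubling both the cabinet count and the power per cabinet quadruples the load, which illustrates why the mechanical and electrical systems had to be expandable without an outage.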
Under the direction of Mary Hayes, Johns Hopkins Hospital director of data center services, data center requirements were viewed from a business perspective. This allowed the team to focus the design on critical elements over the lifetime of the facility and avoid costly and unneeded data center capabilities from day one. This also eliminated significant future retrofit expenses by including components that allow for a cost-effective expansion. For example, the UPS system is designed so that it can be expanded without an outage. This required the space to be built for the future UPS up front, but the cost for the future UPS module was deferred until it was needed.
Clearly, expanding a hospital’s data center is a complex challenge. The right solution relies on communicating with IT, understanding the mission that needs to be supported, finding the right real estate, determining the best place to house the systems and facilities, and determining what mechanical and electrical systems are required. Once all this is done, the result will be a data center that predicts and addresses the institution’s needs — even before they are discovered.
Levels of Reliability
Data center reliability “tiers,” or classifications, are defined at four levels by the Uptime Institute. Although the differences in percent availability between tiers may seem minor, the impact on annual anticipated down time is significant.
Tier I: Single path for power and cooling distribution, no redundant components, all systems are “Need” or “N.” This consists of a single utility feed for power, a single uninterruptible power supply (UPS) and a single back-up generator. The mechanical systems do not have redundant components, and maintenance of mechanical and electrical systems requires an outage. The result is 99.671 percent availability with an annual anticipated down time of 28.8 hours.
Tier II: Single path for power and cooling distribution, redundant components. A Tier II electrical system is similar to Tier I, with the addition of N+1 components for UPS and generators, and N+1 components for mechanical systems. Maintenance of mechanical and electrical systems requires an outage. A Tier II data center can provide 99.741 percent availability with an annual anticipated down time of 22.0 hours.
Tier III: Multiple power and cooling distribution paths, but only one active, redundant components, concurrently maintainable. This is similar to Tier II with the addition of a second path for power and cooling. For electrical distribution, this can translate into dual-corded (two power cords) electronic equipment connected to two separate UPS systems and two emergency generator sources. Mechanical systems would have two paths for chilled water. Maintenance of mechanical and electrical systems can be accomplished without an outage. This level will see 99.982 percent availability with annual anticipated down time of 1.6 hours.
Tier IV: Multiple active power and cooling distribution paths, redundant components, fault tolerant. This is similar to Tier III, with the added ability of the systems to sustain at least one worst-case unplanned failure or event and still maintain operation. This is accomplished by having 2(N+1) systems with two active paths, which provides 99.995 percent availability with annual anticipated down time of 0.4 hours.
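The link between percent availability and annual down time is straightforward arithmetic, sketched below for the figures quoted above. The function name is illustrative, and the calculation assumes a 365-day year of 8,760 hours.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours of down time per year for a given percent availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# Availability figures as quoted in the tier descriptions above.
tier_availability = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}

for tier, percent in tier_availability.items():
    hours = annual_downtime_hours(percent)
    print(f"{tier}: {percent} percent -> {hours:.1f} hours per year")
```

The results for Tiers I, III and IV round to the quoted down times. The Tier II figure computes to roughly 22.7 hours rather than 22.0, so the quoted pair appears to be rounded or drawn from slightly different source values.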
To learn more about reliability tiers, visit the Uptime Institute’s Web site.
The Two Data Center Plan
One business model emerging today for hospitals involves the use of two data centers. An off-site data center houses all clinical and non-clinical applications (Site A). The existing on-site data center is then converted to serve as the primary disaster recovery site (Site B). Because the computers and applications are being replicated at two sites, the tier level of Site B can be reduced, as the risk is being addressed at the application level.
R. Stephen Spinazzola, P.E., is the vice president in charge of the Applied Technology Group at RTKL, a large architecture and engineering firm. Based in the firm’s Baltimore office, the Applied Technology Group provides integrated architecture and engineering services demanded by mission critical, technology-intensive operations on an international basis.