High availability in the data centre refers to systems and components that remain continuously operational over long periods. It typically means the systems have been thoroughly tested, are regularly maintained and have redundant components installed to ensure continuous operation.

How does a data centre ensure reliable power? What level of redundancy is necessary for a high-availability data centre? These are two critical issues that weigh on the minds of data centre and IT leaders. Why? Because they understand that uninterrupted power is the lifeblood of their operations. And they know the devastating effects an unplanned outage can have on the well-being of their organizations.

Downtime can result from a power outage, equipment failure, natural disaster, human error, fire, flood or a wide range of other causes. It can lead to lost revenue, customers, productivity, equipment and brand loyalty. As a data centre or IT leader, your goal is to provide continuous operation of your facility under all circumstances. Many factors contribute to data centre reliability. People, processes and equipment all play a huge role in increasing availability.

Data centre managers address reliability by implementing many measures, such as hiring and training the right staff members and developing, implementing and testing proven procedures. They also make sure the data centre infrastructure has built-in redundancy and reliability for power and network connectivity, including generator and UPS backup systems, fire-detection and fire-suppression systems, moisture-detection systems, lightning protection and sophisticated monitoring systems.

To create higher levels of redundancy, for example, you can configure servers to switch responsibilities to a remote server when needed. This process is referred to as failover: a backup method in which a secondary component takes over whenever the primary component becomes unavailable. Secondary components can assume operation during scheduled maintenance or when an unexpected power outage occurs.

Failover techniques make systems more fault-tolerant and are necessary to ensure constant availability of mission-critical operations. When a primary component offloads tasks to a secondary component, a properly configured procedure is seamless to end users.
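
The failover decision described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the `Component` class, the `select_active` function and the server names are hypothetical, and a real deployment would rely on health checks and clustering software instead of a simple flag.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A hypothetical server or device that can carry the workload."""
    name: str
    healthy: bool = True


def select_active(primary: Component, secondary: Component) -> Component:
    """Return the component that should carry the workload right now."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        # Failover: the secondary takes over while the primary is unavailable.
        return secondary
    raise RuntimeError("no healthy component available")


primary = Component("primary-server")
secondary = Component("remote-standby")

print(select_active(primary, secondary).name)  # primary is healthy

primary.healthy = False  # simulate an outage or scheduled maintenance
print(select_active(primary, secondary).name)  # secondary takes over
```

Note that the sketch raises an error when both components are down. A single secondary removes one single point of failure, but it does not remove them all.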

Beyond configuring failover components, high availability depends on sound design. All aspects of data centre infrastructure must be evaluated for durability, beginning with a thorough understanding of each component’s metrics as published by the manufacturer, including capacity limitations and life expectancy.

Let’s examine three system areas that data centre managers should consider when looking to improve reliability.


Redundant Systems and Components

Providing redundant systems and components can help eliminate single points of failure in the IT infrastructure. But each data centre manager must determine the appropriate level of redundancy for their operation. A thorough analysis is needed to arrive at an effective redundancy strategy.

Certainly, incorporating redundancy into a data centre operation is critical. Achieving 100 percent redundancy, however, comes with a hefty price tag. And it’s important to note that high levels of redundancy don’t always mean a system is more reliable. Although this point may seem counterintuitive, increasing component redundancy creates a much more complex infrastructure. As complexity increases, management of the infrastructure becomes more challenging, which raises the risk of human error, itself a common cause of downtime. Working with local data centre experts can help you arrive at the right redundancy strategy for your organization.
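
A rough calculation helps show why each extra layer of redundancy buys less than the one before it. The sketch below uses the standard formula for redundant components in parallel, 1 - (1 - a)^n, where a is the availability of one unit and n is the number of units. It assumes failures are independent, which real facilities rarely achieve, and it ignores the added complexity discussed above. The function name and the 99 percent figure are illustrative assumptions.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` redundant components when any one can carry the load.

    Assumes component failures are independent of one another.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    # The group is down only when every unit is down at the same time.
    return 1.0 - (1.0 - unit_availability) ** units


# Hypothetical component that is available 99 percent of the time.
for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.6f}")
```

Under these assumptions, going from one unit to two lifts availability from roughly 99 percent to 99.99 percent, while a third unit adds only about another 0.0099 percentage points. The cost and complexity of that third unit, however, are just as real as those of the second.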


Backup Systems

Backup systems include properly configured generator units and uninterruptible power supply (UPS) systems. In a generator system, every available unit can be programmed to start automatically during a loss of utility power. As long as sufficient fuel is available, the generators power the entire data centre load until the utility power source is restored.

When utility power is restored, the generators transfer the load back to the utility and stop operating. The transition to and from backup-generator power is seamless when configured properly. The most effective designs include enough generators to carry the full load, plus at least one spare in case any one unit fails.
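
The sizing idea in the last sentence, often called N+1, comes down to simple arithmetic. The sketch below is a simplified illustration: the function name and the load and capacity figures are hypothetical, and a real design would also account for derating, load growth and starting currents.

```python
import math


def generators_required(load_kw: float, unit_capacity_kw: float, spares: int = 1) -> int:
    """Number of generator units needed to carry the load, plus spare units."""
    if load_kw < 0 or unit_capacity_kw <= 0 or spares < 0:
        raise ValueError("load and spares must be non-negative, capacity positive")
    # N: the fewest units whose combined capacity covers the full load.
    needed = math.ceil(load_kw / unit_capacity_kw)
    # Add the spares so that losing a unit does not drop the load.
    return needed + spares


# Hypothetical facility: 1,800 kW load served by 750 kW generator units.
print(generators_required(load_kw=1800, unit_capacity_kw=750))
```

With these example figures, three units are needed to carry the load, so an N+1 design calls for four.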

Redundancy should also be built into the UPS system so that one failing module won’t affect the overall capacity of the system. Both generator and UPS systems can be configured for automatic and manual power transfer. Automatic transfer is critical during unexpected outages. Manual transfers are used for scheduled maintenance and testing of data centre equipment and procedures without interfering with normal operations.
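
The difference between automatic and manual transfer can be expressed as a small decision rule. The sketch below is one possible policy, not a description of any particular transfer switch: the `Source` enum and `choose_source` function are hypothetical, and the rule that an outage always overrides an operator request is an assumption made for illustration.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    UTILITY = "utility"
    BACKUP = "backup"


def choose_source(utility_ok: bool, manual_request: Optional[Source] = None) -> Source:
    """Decide which power source should carry the load."""
    if not utility_ok:
        # Automatic transfer: an unexpected outage always moves the load to backup.
        return Source.BACKUP
    if manual_request is not None:
        # Manual transfer: an operator moves the load for maintenance or testing.
        return manual_request
    return Source.UTILITY


print(choose_source(utility_ok=True))                                # normal operation
print(choose_source(utility_ok=True, manual_request=Source.BACKUP))  # scheduled test
print(choose_source(utility_ok=False))                               # unexpected outage
```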


Detection and Monitoring Systems

Although cyber-attacks get the bulk of publicity, environmental factors can be equally devastating to IT equipment and data centre facilities. To minimize the impact of downtime, a data centre operation must integrate detection systems. These systems can alert you to a problem before it becomes a crippling event.

Detection and monitoring systems will monitor environmental factors such as the following:

  • Temperature: Sensors measure the heat generated by equipment as well as the air-conditioning system’s intake and discharge temperatures.
  • Humidity and moisture: Sensors ensure high moisture levels won’t corrode electronic components and low levels won’t cause static electricity. They also monitor for leaks inside cooling equipment, leaks in pipes and flooding from a disaster.
  • Airflow: Sensors ensure air is properly flowing through racks and to/from the air-conditioning system.
  • Voltage: Sensors detect the presence or absence of line voltage.
  • Power: Monitoring systems measure current coming into the facility and determine when electrical failures occur.
  • Smoke: In addition to advising data centre personnel of a potential fire, smoke alarms can also be configured to report directly to the local fire department.
  • Video surveillance: Real-time surveillance of data centre activities, especially in sensitive areas, provides data centre managers with a first-hand look at what’s going on in the facility, including who’s entering and exiting.
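
At its simplest, this kind of monitoring compares each sensor reading against an acceptable range and raises an alert when a reading falls outside it. The sketch below illustrates the idea. The sensor names and limits are hypothetical values chosen for illustration only, so take real thresholds from equipment specifications and your facility's own requirements.

```python
from typing import Dict, List, NamedTuple


class Limit(NamedTuple):
    low: float
    high: float


# Hypothetical limits for illustration only, not recommended settings.
LIMITS: Dict[str, Limit] = {
    "temperature_c": Limit(18.0, 27.0),
    "relative_humidity_pct": Limit(20.0, 80.0),
    "line_voltage_v": Limit(207.0, 253.0),
}


def check_readings(readings: Dict[str, float]) -> List[str]:
    """Return one alert message for each reading outside its limits."""
    alerts = []
    for name, value in readings.items():
        limit = LIMITS.get(name)
        if limit is None:
            continue  # no limit configured for this sensor
        if value < limit.low:
            alerts.append(f"{name} low: {value}")
        elif value > limit.high:
            alerts.append(f"{name} high: {value}")
    return alerts


sample = {"temperature_c": 31.5, "relative_humidity_pct": 45.0, "line_voltage_v": 0.0}
for alert in check_readings(sample):
    print(alert)
```

In this sample, the temperature reading is above its limit and the line voltage has dropped to zero, so both are reported, while the humidity reading passes without an alert.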

To meet an organization’s requirements and avoid costly consequences, data centres must deliver continuous uptime. Any unplanned downtime, even for just a few minutes, can seriously disrupt your business operations. Even installing the best equipment available on the market cannot guarantee business continuity. A high-availability, reliable data centre requires redundant designs, the right configuration of backup systems and advanced monitoring systems.