Technology Investments and Best Practices for Assuring Data Centre Availability

The growing focus on rapid cost-cutting and energy efficiency has left many business-critical data centres at increased risk of equipment failures and unplanned downtime. While companies began adopting high-density configurations, virtualisation and other strategies intended to boost the capacity of existing IT equipment, many overlooked the need for a robust data centre infrastructure to ensure continuity for business-critical applications.

As a result, Asia Pacific has seen a number of high-profile outages across various industries, most of them costlier than the investments that could have prevented them altogether. In addition to disrupting service and, in some cases, driving customers away, these outages translated into hundreds of thousands of dollars in financial losses and lost future business.

To gain a better understanding of which vulnerabilities contribute to such a high occurrence of costly downtime events, a survey by the Ponemon Institute asked more than 450 data centre professionals to cite the root causes of data centre outages experienced during the past two years.

[Figure: survey results]

Culprit: UPS battery failure

Solution: Regular maintenance checks and assessments can help prolong a battery’s life, as can monitoring solutions that provide comprehensive insight into battery health. It is also prudent to keep charged spares onsite to replace any cells that expire between service visits.

Culprit: UPS capacity exceeded

Solution: To keep UPS systems operating within capacity, data centre professionals will find much value in an integrated monitoring and management solution. Establishing a redundant UPS architecture also enables facility owners to increase the capacity of their backup power system, with the added benefit of eliminating single points of failure.
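As a rough illustration of what such monitoring checks, the following Python sketch compares the measured load against the rated capacity and tests whether the load would still be carried if the largest module were lost. The `UpsModule` class, the function names, the 80 percent warning level and the example figures are all hypothetical and are not taken from the survey or from any vendor product.

```python
from dataclasses import dataclass


@dataclass
class UpsModule:
    name: str
    rated_kw: float  # nameplate capacity of the module in kW (assumed value)


def load_fraction(load_kw: float, modules: list[UpsModule]) -> float:
    """Fraction of the total rated capacity currently in use."""
    total = sum(m.rated_kw for m in modules)
    if total <= 0:
        raise ValueError("no UPS capacity configured")
    return load_kw / total


def survives_single_failure(load_kw: float, modules: list[UpsModule]) -> bool:
    """True if the load still fits after losing the largest module."""
    if len(modules) < 2:
        return False  # a single module is itself a single point of failure
    remaining = sum(m.rated_kw for m in modules) - max(m.rated_kw for m in modules)
    return load_kw <= remaining


def assess(load_kw: float, modules: list[UpsModule], warn_at: float = 0.8) -> list[str]:
    """Collect warnings; warn_at is an assumed threshold, not a standard."""
    warnings = []
    fraction = load_fraction(load_kw, modules)
    if fraction >= warn_at:
        warnings.append(f"load at {fraction:.0%} of rated capacity")
    if not survives_single_failure(load_kw, modules):
        warnings.append("load would not survive the loss of one module")
    return warnings


if __name__ == "__main__":
    fleet = [UpsModule("UPS-A", 100.0), UpsModule("UPS-B", 100.0), UpsModule("UPS-C", 100.0)]
    for load in (180.0, 250.0):
        print(load, assess(load, fleet))
```

With three hypothetical 100 kW modules, a 180 kW load raises no warnings, while a 250 kW load triggers both: it exceeds the warning level and could not be carried by the two remaining modules.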

Culprit: Accidental EPO/human error

Solution: Data centre professionals should observe and enforce the following rules to minimise the potential for errors and accidents:

  • Shield emergency OFF buttons.
  • Strictly enforce food/drinks policies.
  • Avoid contaminants.
  • Document maintenance procedures.
  • Label components accurately.
  • Operate the system consistently.
  • Provide ongoing personnel training.
  • Enforce secure access policies.

Culprit: UPS equipment failure

Solution: UPS equipment can fail for various reasons, depending on system topology. Whatever the cause, it is important to service and maintain the UPS fleet to maximise its life cycle. As with UPS over-capacity, system redundancy is also crucial.

Culprit: Water incursion

Solution: Using a refrigerant-based, row-based cooling solution is a best practice for minimising the risk of cooling-related equipment failures. Unlike water-based systems, refrigerant-based cooling does not rely on an electrically conductive cooling fluid, minimising the risk of system failures in the event of a leak.

Alternatively, integrating a comprehensive leak detection system into the cooling infrastructure can mitigate the risk of system failures due to water incursion. Leak detection systems signal an alarm when moisture reaches potentially hazardous levels.
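The alarm logic amounts to comparing each sensor reading against a threshold. The Python sketch below shows the idea only; the zone names, the 0 to 100 moisture scale and the threshold of 60 are assumptions made for illustration, and a real leak detection system would define its own units and limits.

```python
def zones_in_alarm(readings: dict[str, float], threshold: float = 60.0) -> list[str]:
    """Return the zones whose moisture reading has reached the alarm threshold.

    readings maps a zone name to a moisture level on a hypothetical 0-100 scale.
    """
    return sorted(zone for zone, level in readings.items() if level >= threshold)


if __name__ == "__main__":
    # Illustrative readings only; not real sensor data.
    sample = {
        "CRAC-1 drip tray": 12.0,
        "Row B underfloor": 71.5,
        "Chiller pipe run": 60.0,
    }
    for zone in zones_in_alarm(sample):
        print(f"ALARM: moisture threshold reached in {zone}")
```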

Culprit: Heat-related/CRAC failure

Solution: One way to minimise the risk of heat-related or CRAC failure is to optimise airflow within the data centre by adopting a cold-aisle containment strategy. In this set-up, hot air expelled from the racks cannot re-enter the cooled environment, ensuring the cooling capacity is used as efficiently as possible.

Another way to increase cooling system effectiveness is to use a row-based cooling solution. Row-based cooling solutions can reduce annual cooling-related power consumption by nearly 30 percent.

Culprit: PDU/circuit breaker failure

Solution: To reduce cases of PDU overload, data centre professionals should consider investing in a PDU that has integrated branch circuit monitoring capabilities. Branch circuit monitoring solutions utilize branch circuit sensor modules and individual current transformers to monitor current input/output for the main panel board as well as individual branch circuit breakers.
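To show how such per-branch measurements might be used, the following Python sketch flags branch circuits whose measured current exceeds a chosen fraction of the breaker rating and checks the summed load against the panel rating. The `BranchCircuit` class, the 80 percent limit, and the example ratings and currents are hypothetical, and the panel is treated as single-phase for simplicity.

```python
from dataclasses import dataclass


@dataclass
class BranchCircuit:
    label: str
    breaker_rating_a: float  # breaker rating in amperes (assumed value)
    measured_a: float        # current reported by the branch's current transformer

    @property
    def utilisation(self) -> float:
        """Measured current as a fraction of the breaker rating."""
        return self.measured_a / self.breaker_rating_a


def overloaded_branches(branches: list[BranchCircuit], limit: float = 0.8) -> list[str]:
    """Labels of branches running above the assumed utilisation limit."""
    return [b.label for b in branches if b.utilisation > limit]


def panel_within_rating(branches: list[BranchCircuit], panel_rating_a: float,
                        limit: float = 0.8) -> bool:
    """True if the summed branch current stays within limit * panel rating."""
    total = sum(b.measured_a for b in branches)
    return total <= limit * panel_rating_a


if __name__ == "__main__":
    panel = [
        BranchCircuit("A1", 20.0, 12.0),
        BranchCircuit("A2", 20.0, 17.0),
        BranchCircuit("A3", 30.0, 21.0),
    ]
    print("Branches over limit:", overloaded_branches(panel))
    print("Panel within rating:", panel_within_rating(panel, panel_rating_a=100.0))
```

In this example only branch A2 is flagged, since 17 A on a 20 A breaker is 85 percent utilisation, while the combined 50 A remains within 80 percent of the assumed 100 A panel rating.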

Read the complete whitepaper on the leading culprits for data centre downtime here: http://www.emersonnetworkpower-partner.com/ArticleDocuments/Addressing%20the%20Leading%20Root%20Causes%20of%20Downtime.pdf.aspx.