Alarm fatigue, root cause analysis and human error: It’s time to explore new tools to improve data center management with predictive and prescriptive solutions. Source: Pixza Studio / Shutterstock.com
To meet the new demands being placed on data centers, industry leaders must rethink the way they approach their environment, delivery model and how they can leverage the cloud. This launches our article series on improving data center management by designing for observability, resiliency, and better operations.
New demands require modern organizations to rethink their data center environment, their delivery model and how to leverage the cloud more effectively. In a world of constant connectivity, leaders making decisions about their digital infrastructure must treat uptime, efficiency, performance and observability as critical.
What is a sometimes overlooked component required to protect data center operations? The people.
IDC estimates that human error costs organizations more than $62.4 million every year. A significant share of that human error stems from tedious tasks and manual processes. Further, a recent Uptime Institute study points out that more than 70% of all data center outages are caused by human error and not by a fault in the infrastructure design. What does this cost when it all goes down? Quite a bit. Data center outages are expensive. They occur with and (often) without warning, leaving severe business disruption in their wake. At the same time, increasing dependency on the data center means that outages and downtime are growing costlier over time. According to a 2016 Ponemon study, the average cost of a data center outage steadily increased from $505,502 in 2010 to $740,357. This averages out to about $9,000 per minute, and for larger data center operations the loss can be far higher.
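To put the Ponemon figures in perspective, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted above. The function names are illustrative, and the per-minute figure is the rounded $9,000, so the results are approximations rather than values from the study itself. Under those assumptions, the average outage cost rose by roughly 46%, and the two figures together imply an average outage of about 82 minutes.

```python
# Figures quoted in the article from the 2016 Ponemon study (USD).
COST_2010 = 505_502        # average cost per outage in 2010
COST_LATEST = 740_357      # average cost per outage reported in the 2016 study
COST_PER_MINUTE = 9_000    # rounded per-minute figure quoted in the article


def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, expressed as a percentage."""
    return (new - old) / old * 100


def implied_minutes(total_cost: float, cost_per_minute: float) -> float:
    """Outage duration implied by a total cost and a per-minute cost."""
    return total_cost / cost_per_minute


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Rough cost estimate for an outage of the given length."""
    return minutes * cost_per_minute


if __name__ == "__main__":
    growth = percent_increase(COST_2010, COST_LATEST)
    duration = implied_minutes(COST_LATEST, COST_PER_MINUTE)
    print(f"Increase in average outage cost: {growth:.1f}%")
    print(f"Implied average outage length: {duration:.0f} minutes")
    print(f"Estimated cost of a 30-minute outage: ${outage_cost(30):,.0f}")
```

The same `outage_cost` helper can be pointed at a site-specific per-minute figure, which for larger operations may be well above the average used here.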
To overcome outages and resiliency issues, data center leaders need portfolio-level visibility. Data center managers might manage thousands of edge locations or dozens of mid-sized colocation sites. That visibility becomes even more critical as cloud, edge and data center systems become further distributed.
Therefore, it’s essential to have direct visibility into all data center operations without becoming overwhelmed by the volume of information. Good visibility into the overall portfolio and structure of the digital ecosystem means more concise data, better information and clearer insight into the technical and business aspects of digital infrastructure.
This article series will explore critical new considerations as leaders design the future of digital infrastructure. Specifically, we’ll cover:
- Working with the new cloud, edge and data center balance. Understanding the unique balance between people, data center operations and new remote systems in a more distributed economy.
- Embracing sustainability, efficiency and uptime. Identifying the close link between uptime, resiliency and sustainability based on critical KPIs, and how new solutions ease the burden on people and help create more sustainable and resilient infrastructure.
- Enabling operational efficiency that includes both OT and IT. Deploying solutions that make operational management easier, including new tools that gather data and insights to help minimize downtime risk and operating costs.
- Deploying root cause analytics and predictive and prescriptive solutions. Exploring new tools that enable quicker troubleshooting and root cause analysis, along with solutions that help reduce stress levels for data center operators and technology leaders.
Finally, we’ll also dive into real-world use cases, design considerations and how leaders in the data center space can move away from legacy data center management concepts.
Making the Shift to Support Observability, Resiliency and Improved Operations
It’s not always that easy to shift to better and less expensive ecosystems. There are still fundamental issues facing distributed infrastructure as it supports more users globally. Consider these emerging problems:
- The digitization era has arrived; however, data center managers are managing disaggregated data with no actionable insights. Data is split across different subsystems that aren’t correlated, making it challenging to measure and track KPIs.
- Engineers are having issues identifying problems with unclear root causes. Bringing together data and insights to determine the root causes of operating issues can take a while.
- Compliance and regulatory teams and data center professionals are working with non-standard operations. As issues arise, standard workflow procedures either aren’t available or are not followed.
- Data center leaders are running non-scalable solutions and closed systems. Rapidly growing demands require scalable and rapidly deployable solutions that can keep up with retrofit or new build expansions.
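As a rough illustration of the first two problems, the sketch below shows one simple way to correlate events from otherwise disconnected subsystems: group events from the same site that occur close together in time. Everything in it is hypothetical, including the `Event` fields, the subsystem names and the five-minute window. Time proximity alone does not establish a root cause; it only gives engineers a single, ordered view to start from.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One alarm or reading from a (hypothetical) data center subsystem."""
    site: str
    subsystem: str
    timestamp: datetime
    message: str


def correlate(events, window=timedelta(minutes=5)):
    """Group events per site; an event joins the current group when it
    occurs within `window` of the previous event at the same site."""
    groups = []
    current = []
    for ev in sorted(events, key=lambda e: (e.site, e.timestamp)):
        same_site = bool(current) and ev.site == current[-1].site
        if same_site and ev.timestamp - current[-1].timestamp <= window:
            current.append(ev)
        else:
            if current:
                groups.append(current)
            current = [ev]
    if current:
        groups.append(current)
    return groups


def sample_events():
    """Illustrative data only; not taken from any real system."""
    day = datetime(2024, 1, 1)
    return [
        Event("site-a", "cooling", day.replace(hour=10, minute=3), "supply air temperature high"),
        Event("site-a", "power", day.replace(hour=10, minute=0), "UPS on battery"),
        Event("site-a", "bms", day.replace(hour=10, minute=20), "door sensor open"),
        Event("site-b", "power", day.replace(hour=10, minute=1), "breaker trip"),
    ]


if __name__ == "__main__":
    for group in correlate(sample_events()):
        print(group[0].site, [f"{e.subsystem}: {e.message}" for e in group])
```

In this sample, the power and cooling events at the first site land in the same group, which is the kind of cross-subsystem link that is hard to see when each system is monitored in isolation.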
Managing a data center can be challenging, especially when operating multiple disparate systems with little connection between them. As noted in an IEEE paper, data centers feature complex cyber and physical systems, often making resource management difficult. Therefore, it’s essential to jointly leverage potential solutions and partners to optimize computing and environmental resources in data centers. The paper points explicitly to leveraging systems that help with provisioning and utilization patterns in data centers and proposes a macro-resource management layer to coordinate cyber and physical resources.
Balancing business growth and resource constraints is critical for those who need to react quickly when finding operational solutions.
Download the entire paper, “The Data Center Human Element: Designing for observability, resiliency and better operations,” courtesy of Honeywell, to learn more about improving data center management. In the next article, we’ll explore the journey from legacy to modern data center infrastructure.