The importance of AI to modern observability

By Kicker Comms
April 12, 2021

With thousands of microservices, hundreds of releases per day, and hundreds of thousands of containers, there’s no way that the human eye can cope with the level of complexity that defines modern technology environments. Traditional monitoring, limited to sending alerts after an anomaly occurs, is no longer sufficient.

Today’s digital world demands a seamless user experience, and IT teams need to resolve issues before they escalate. They need to know not simply what problem has occurred but why it is happening. And they need this information immediately. Legacy tools and processes, struggling to cope in the current high-volume data environment, simply can’t deliver on observability.

Enabling true observability requires collecting granular telemetry data, applying intelligence to contextualise it, and making it actionable. Deploying AIOps, which applies machine learning (ML) and data science to IT operations, takes the headache out of incident response. It dramatically improves speed to resolution while reducing alert noise and guesswork.

Here are three reasons why artificial intelligence is so crucial to ensuring modern observability.

1. Managing the variables

With huge variability between use cases, configurations and architectures, AI applied to observability should work side by side with tech teams to keep those variables under control.

AIOps can then automatically correlate events across systems based on their context and relationship data, using pre-trained machine learning models that spare teams steep and costly learning curves. All of this takes place in one curated, unified platform, giving a single pane of glass across the entire stack.
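As a rough illustration of what event correlation means, the sketch below groups alerts into a single incident when they fire close together in time on the same service, or on services that depend on each other. It is a simplified, hypothetical example and not how New Relic or any particular AIOps product works: the Alert type, the correlate function, the service names, the dependency list and the five-minute window are all assumptions made for illustration, and real platforms apply machine learning to signals like these rather than relying on one fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A single alert notification (hypothetical shape, for illustration only)."""
    service: str
    timestamp: float  # seconds since an arbitrary start point
    message: str


def correlate(alerts, dependencies, window_s=300.0):
    """Group alerts into incidents (a simple rule-based sketch, not a real AIOps model).

    Two alerts are linked if they fired within `window_s` seconds of each other
    AND come from the same service or from services joined by an edge in
    `dependencies`. Linked alerts are merged transitively using union-find.
    """
    parent = list(range(len(alerts)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Treat dependencies as undirected edges between services.
    edges = {frozenset(pair) for pair in dependencies}
    for i in range(len(alerts)):
        for j in range(i + 1, len(alerts)):
            a, b = alerts[i], alerts[j]
            related = a.service == b.service or frozenset((a.service, b.service)) in edges
            if related and abs(a.timestamp - b.timestamp) <= window_s:
                parent[find(j)] = find(i)  # merge the two groups

    # Collect alerts by their group root, preserving first-seen order.
    incidents = {}
    for i, alert in enumerate(alerts):
        incidents.setdefault(find(i), []).append(alert)
    return list(incidents.values())


# Hypothetical example: four raw alerts, one known dependency.
alerts = [
    Alert("checkout", 0.0, "p99 latency high"),
    Alert("payments-db", 30.0, "connection pool exhausted"),
    Alert("checkout", 60.0, "error rate above 5%"),
    Alert("search", 45.0, "CPU saturation"),
]
dependencies = [("checkout", "payments-db")]

incidents = correlate(alerts, dependencies)
for n, group in enumerate(incidents, 1):
    print(n, [a.service for a in group])
# 1 ['checkout', 'payments-db', 'checkout']
# 2 ['search']
```

Here four separate notifications collapse into two incidents, which is the kind of alert-noise reduction the article describes. The pairwise comparison is quadratic, so it only suits small batches; it is meant to show the idea, not to scale.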

2. Proactive decision making

AI is not just ‘nice to have’ when it comes to proactive detection and decision making. It equips SRE teams with the tools they need to do their job effectively. As well as getting to the root cause of problems faster, engineers can use AI and machine learning to predict possible issues, drive automation and identify problems long before they escalate.

Business-critical apps can’t afford service disruptions. Downtime can cost millions of dollars, and the resulting reputational damage, lost revenue and lost customers can have a long-term impact.

There can also be lost internal productivity, penalties and legal costs to contend with. A 14-hour Facebook outage in 2019 was estimated to have cost $90 million, and the company’s stock fell by 2% in after-hours trading. AIOps tools only need to avert a single disaster to deliver immediate ROI.

3. Complementing the human touch

The next five years aren’t going to see humans replaced by AI in site reliability engineering (SRE). Humans remain absolutely central. This technology supports human innovation by augmenting SRE teams, complementing their skill sets and helping them make sense of very complex architectures.

AIOps ultimately enables organisations to dedicate more resources and focus to user experience (UX) and innovation, maximising conversion and business growth. They become real-time, data-driven organisations in which performance is continuously measured and improved to exceed customer expectations and beat the competition.

In summary, observability delivers significant competitive advantages. Gartner predicts that large enterprise use of AIOps monitoring tools will increase from 5% in 2018 to 30% in 2023. AIOps offers engineers the ability to harness AI and machine learning to predict possible issues, determine root causes, and intelligently drive automation to resolve them.

It also enables them to focus their skills on higher-value work, such as delivering new features and creating a flawless digital experience.

By Jill Macmurchy, vice president of the customer solution group APJ at New Relic

This article was first published by IT Brief