2022 was the year when we saw an end to the pandemic border closures as the world slowly started to open up again. Travel resumed, events picked up momentum, and the volume of Zoom calls was reduced ever-so-slightly. Although the pandemic wreaked havoc across many industries, it did provide an opportunity for savvy Asia-Pacific (APAC) organisations to build up their digital muscle and expedite projects that may not have seen the light of day had it just been business as usual over the past few years. While it's great that digital projects have gotten off the ground, I've personally observed that there's room for improvement in terms of engineering execution.
Below are four key predictions that I see gaining momentum in 2023 and beyond.
Full Lifecycle Observability – insights in context
Full Lifecycle Observability is about two things: fixing problems and getting ahead of them. By embedding observability earlier in the software lifecycle, engineers can plan for and fix performance problems that they ordinarily wouldn’t recognise until code is running in production. Sounds simple enough, but few organisations have achieved it.
Traditionally, developers write code and ship it, then someone else (i.e. QA and operations teams) is tasked with running it in production and identifying performance issues. With Full Lifecycle Observability, software engineers are responsible for the performance, quality and reliability of their code because they have all the telemetry data in front of them, in context. They are able to understand how their code will behave in production because the data is no longer siloed within a separate team or tool. They have end-to-end observability across all of their environments, so they are able to make informed decisions and catch problems before they reach production. Think about Full Lifecycle Observability as not just observing production, but observing the dev, test and staging environments through one unified lens, accessible to all teams.
Today, observability tends to be focused on the 'operate' phase of the software development lifecycle (SDLC). By embedding observability into the plan, build and deploy stages, as well as the operate stage, businesses will be able to benefit from early insights and achieve Full Lifecycle Observability.
FinOps: cost as a golden signal
Given the current macroeconomic environment, cost optimisation, right down to the cost per transaction, is going to be increasingly important for organisations. If a business is doing thousands of transactions per minute, what is the financial flow-on effect of one small tweak? Is it positive or negative? Today, major decisions are based on the traditional golden signals of latency, throughput and errors. What's missing is cost.
Companies are starting to spin up FinOps departments that are responsible for ensuring the budget is being consumed appropriately. However, in my opinion, that's a step backwards unless that data is shared. We can't tell engineers that they need to be responsible for their code right up to deployment unless they know the cost implications.
By shifting left and enlisting cost as a golden signal, engineers have a cost dimension to consider when planning what to work on and shipping code. But cost can't exist in a vacuum; like every other golden signal, it needs context. Cost alone doesn't tell you much. It tells you what you're spending, but it doesn't tell you what that spend supports. To understand the weight of the cost, you need a third dimension: the relationship that cost has to the business and the importance of the business function that it's supporting.
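To make this concrete, here is a minimal sketch of what treating cost as a fourth golden signal might look like. The service name, figures and the attribution of monthly spend to a single service are illustrative assumptions, not data from any real system.

```python
from dataclasses import dataclass

@dataclass
class ServiceSignals:
    """The three traditional golden signals, plus cost as a fourth."""
    name: str
    latency_ms: float       # e.g. p95 latency
    throughput_rpm: float   # requests per minute
    error_rate: float       # fraction of failed requests
    monthly_cost: float     # infrastructure spend attributed to this service

    @property
    def cost_per_transaction(self) -> float:
        # requests/minute * minutes/month = requests/month
        monthly_requests = self.throughput_rpm * 60 * 24 * 30
        return self.monthly_cost / monthly_requests

# Hypothetical checkout service: $8,640/month at 2,000 requests/minute
checkout = ServiceSignals("checkout", latency_ms=120, throughput_rpm=2_000,
                          error_rate=0.002, monthly_cost=8_640)
print(f"{checkout.name}: ${checkout.cost_per_transaction:.4f} per transaction")
```

Once cost per transaction sits alongside latency, throughput and errors, the "small tweak" question above becomes answerable: re-run the same calculation after the change and compare.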
This is how AWS operates. It has armies of data scientists that look at all these different facets and experiment by tweaking certain levers and analysing the potential cost savings. It then completes a sensitivity analysis to assess the probability of those outcomes unfolding.
If done right, FinOps has the potential to help organisations make data-driven spending decisions. Take the example of Australia Post, which has spent the last 18 months solidifying its FinOps efforts, enabling its teams to maximise business value and gain greater visibility into resource usage to make insightful decisions and increase cost accountability.
Security is the second missing golden signal
Security is another golden signal that's currently largely missing from the start of the development process. Most vendors have built security mechanisms for security professionals, not developers, so engineers are conditioned to outsource the responsibility of secure code to the security team and expect them to catch any vulnerabilities. If we instead provided those signals directly to engineers, with controls and policies in place (for example, blocking a merge unless security thresholds were met), security would become a forethought in the development process rather than an afterthought.
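As a sketch of what such a merge gate might look like, the snippet below checks scanner findings against per-severity thresholds before allowing a merge. The severity limits and finding format are illustrative assumptions; a real pipeline would feed in output from its own scanner.

```python
# Hypothetical merge gate: block a pull request unless security
# thresholds are met. The limits below are illustrative policy choices.
SEVERITY_LIMITS = {"critical": 0, "high": 0, "medium": 5}  # max allowed findings

def merge_allowed(findings: list) -> tuple:
    """Return (allowed, reasons) given scanner findings shaped like
    {"id": "CVE-2021-44228", "severity": "critical"}."""
    counts = {}
    for finding in findings:
        sev = finding["severity"]
        counts[sev] = counts.get(sev, 0) + 1
    reasons = [
        f"{sev}: {counts.get(sev, 0)} found, {limit} allowed"
        for sev, limit in SEVERITY_LIMITS.items()
        if counts.get(sev, 0) > limit
    ]
    return (not reasons, reasons)

allowed, reasons = merge_allowed([{"id": "CVE-2021-44228", "severity": "critical"}])
print(allowed, reasons)
```

Wired into CI as a required check, a gate like this gives engineers the same fast, in-context feedback on security that they already get from failing tests.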
Some might argue that these additional steps might be overbearing for engineering teams, but ultimately, if the code isn't meeting security or even cost thresholds, the result is just a delayed response: instead of fixing issues upfront, developers fix them after they've been deployed. These kinds of security controls would actually streamline software development because, without these safeguards, the issues still have to be fixed. It's just a longer, less efficient process with more risk while internal politics play out. Incidentally, data from the 2022 Observability Forecast shows that an increased focus on security, governance, risk and compliance is the number one trend driving the need for observability globally, at 49%.
Preventative observability (catching a falling knife)
CRISP, or critical path analysis of large-scale microservices architectures, is a concept born within Uber to address minor performance degradations that were very difficult to pinpoint using existing technologies. To address this, Uber developed a tool that uses machine learning to identify latent performance issues in very complex architectures. For example, you might have a transaction that touches 1,000 different services, a handful of which show a minor deviation in performance. If those services are on the critical path, overall performance can be significantly impacted even though, in isolation, each service looks fine. Uber then open-sourced the tool as CRISP.
While CRISP sounds like the holy grail, in reality, it's probably years away from being introduced by commercial vendors. However, some level of machine intelligence that reduces the need for human capital is available across many APAC organisations and, I'd argue, essential.
To prevent problems, engineers need to identify small deviations before they turn into much larger issues. This requires leading indicators that are hard for humans to see just by looking at a chart, and challenging for regular anomaly detection methods to pick up. By introducing technology like CRISP with specialised algorithms and machine learning, we can reduce the human capital required to detect and remediate latent performance issues in large-scale microservices architectures.
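To illustrate the critical-path idea, here is a deliberately simplified sketch in the spirit of CRISP: given a trace of spans, it follows the most expensive chain from the root, which is where a small deviation matters most. The trace structure, service names and durations are invented for illustration; a real implementation would work from actual span start/end times and account for concurrency.

```python
# Minimal critical-path sketch over a trace. Each span records its own
# duration and the names of its child spans.
def critical_path(spans: dict) -> list:
    """Walk from the root, at each step following the child whose
    subtree has the largest total duration."""
    def subtree_cost(name: str) -> float:
        span = spans[name]
        return span["duration_ms"] + max(
            (subtree_cost(child) for child in span["children"]), default=0.0)

    path, current = [], "root"
    while True:
        path.append(current)
        children = spans[current]["children"]
        if not children:
            return path
        current = max(children, key=subtree_cost)

trace = {
    "root":    {"duration_ms": 5,  "children": ["auth", "cart"]},
    "auth":    {"duration_ms": 30, "children": []},
    "cart":    {"duration_ms": 10, "children": ["pricing"]},
    "pricing": {"duration_ms": 45, "children": []},  # a drift here dominates
}
print(critical_path(trace))
```

Here a modest slowdown in "pricing" dominates the end-to-end latency even though every other span looks healthy in isolation, which is exactly the class of problem that is hard to spot by eyeballing per-service charts.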
The good news is that, as revealed by the 2022 Observability Forecast, tech professionals in the Asia-Pacific region were the most likely to view observability as a key enabler for achieving core business goals (58%), compared to 48% surveyed in North America and Europe. Conversely, respondents surveyed in Asia-Pacific were the least likely to say that observability is more for incident response/insurance (15%), compared to 22% surveyed in North America and 24% surveyed in Europe. This is a hopeful sign that the region is ready and willing to adopt observability more widely and realise the benefits it offers.
By Peter Marelas, Chief Architect, APJ at New Relic
This article was first published by Technology For You