Why growing AI adoption requires an AI observability strategy

While AI adoption brings operational efficiency and innovation for organizations, it also introduces the potential for runaway AI costs. How can organizations use AI observability to optimize AI costs?

Growing AI adoption has ushered in a new reality.

As organizations turn to artificial intelligence for operational efficiency and product innovation in multicloud environments, they have to balance the benefits with skyrocketing costs associated with AI. An AI observability strategy—which monitors IT system performance and costs—may help organizations achieve that balance.

The good news is that AI-augmented applications can make organizations massively more productive and efficient. Indeed, according to one recent survey, 72% of employees said AI-enhanced tasks make them more productive. And an O’Reilly Media survey indicated that two-thirds of respondents have already adopted generative AI—a form of AI that uses training data to create text, images, code, or other content in response to users’ natural language queries.

But a downside of increasing AI adoption is that organizations can face skyrocketing costs: AI—particularly generative AI—is computationally intensive, and costs rise as the amount of data the models are trained on grows.

Growing AI adoption brings rising cloud costs

There are three key reasons that AI costs can spiral out of control:

  1. AI consumes additional resources. Running artificial intelligence models and querying data requires massive amounts of computational resources in the cloud, which results in higher cloud costs.
  2. AI requires more compute and storage. Training AI models is resource-intensive and costly because of their heavy computational and storage requirements.
  3. AI performs frequent data transfers. AI applications require frequent data transfers between edge devices and cloud providers, and moving these data volumes incurs additional transfer costs.

For organizations to succeed with their AI adoption, they need to address these sources of skyrocketing costs. They can do so by establishing a solid FinOps strategy. FinOps, where finance meets DevOps, is a public cloud management philosophy that aims to control costs.

Additionally, organizations need to consider AI observability.

What is AI observability?

AI observability is the use of artificial intelligence to capture the performance and cost details generated by various systems in an IT environment. It also arms IT teams with recommendations about how to curb those costs. As a result, AI observability supports cloud FinOps efforts by identifying how AI adoption drives up costs through increased usage of storage and compute resources.

Organizations now recognize that they can’t get the business benefits from AI without paying strict attention to costs. AI observability can help them understand the ROI of their AI investments as AI hype gives way to real adoption.

According to the McKinsey Global Survey, “The state of AI in 2023: Generative AI’s breakout year,” 40% of respondents say their organizations will increase their investment in AI overall because of advances in generative AI. But as a recent Forbes Tech Council article notes, using increasingly more AI and running more data computations comes with costs.

“Like a snowball gathering size as it rolls down a mountain, the more data pipelines you have running, the more problems, headaches and, ultimately, costs you’re likely to have,” said Kunal Agarwal in “Why Optimizing Cost Is Crucial To AI/ML Success.”

Because AI observability monitors resource utilization during all phases of AI operations—from model training and inference to tracking model performance—it enables organizations to strike the best balance between accuracy and resource efficiency and to optimize operational costs.
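As a rough illustration of what per-inference cost tracking can look like, the sketch below tallies token usage against a metered rate. The class name, the flat per-1K-token price, and the token counts are all hypothetical examples, not the API of any particular observability platform.

```python
# Hypothetical sketch of per-inference cost tracking for a generative AI
# workload. Assumes a flat metered price per 1,000 tokens; real cloud
# pricing is usually more nuanced (separate prompt/completion rates, tiers).
from dataclasses import dataclass


@dataclass
class InferenceCostTracker:
    """Accumulates token usage and estimates spend at a metered rate."""

    price_per_1k_tokens: float  # illustrative flat rate, e.g. 0.002
    total_tokens: int = 0
    calls: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Each inference call contributes both its prompt and its output.
        self.total_tokens += prompt_tokens + completion_tokens
        self.calls += 1

    @property
    def total_cost(self) -> float:
        # Estimated spend so far, in the same currency as the rate.
        return self.total_tokens / 1000 * self.price_per_1k_tokens
```

Feeding these per-call numbers into a dashboard alongside latency and accuracy metrics is what lets teams see cost and quality side by side.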

Best practices for optimizing AI costs with AI observability and FinOps

  • Adopt a cloud-based and edge-based approach to AI. Cloud-based AI enables organizations to run AI in the cloud without the hassle of managing, provisioning, or housing servers. Edge-based AI enables organizations to run AI functions on edge devices—such as smartphones, cameras, or even sensors—without having to send the data to the cloud. By adopting a cloud- and edge-based AI approach, teams can benefit from the flexibility, scalability, and pay-per-use model of the cloud while also reducing the latency, bandwidth, and cost of sending AI data to cloud-based operations.
  • Use containerization. Containerization enables organizations to package AI applications and their dependencies into a single unit that can be deployed on any server running a container engine. This optimizes costs by letting organizations run AI applications on dynamic infrastructure instead of provisioning for peak load.
  • Continuously monitor AI models’ performance. Once an organization trains AI models on its data, it is important to monitor algorithm performance over time. Monitoring AI models helps to identify areas of improvement and “drift”—the decline of models’ predictive power as real-world conditions change in ways the models don’t reflect. Over time, models can easily drift from real-world conditions and become less accurate. Teams may need to retrain or adjust models to accommodate new data.
  • Optimize AI models. This task goes hand in hand with continuous monitoring of models. It involves improving the accuracy, efficiency, and reliability of an organization’s AI by using techniques such as data cleaning, model compression, and data observability to ensure the accuracy and freshness of the AI results. Optimizing AI models can help save computational resources, storage space, bandwidth, and energy.
  • Proactively manage the AI lifecycle. Team tasks include creating, deploying, monitoring, and updating AI applications. Managing the AI lifecycle means ensuring that AI applications are always functional, secure, compliant, and relevant by using tools and practices such as logging, auditing, debugging, and patching. Managing an AI lifecycle helps avoid technical issues, ethical dilemmas, legal problems, and business risks. It also helps organizations maintain a competitive edge and customer loyalty.
  • Use generative AI in conjunction with other technologies. Generative AI is a powerful tool, but it is not a silver bullet. The true potential of generative AI comes from using it in conjunction with predictive AI and causal AI. Predictive AI uses machine learning to identify patterns in past events and make predictions about future events. Causal AI is a technique that determines the precise root causes and effects of events or behaviors. Causal AI is critical to feed quality data inputs to the algorithms that underpin generative AI. “Composite AI” brings causal, generative, and predictive AI together to elevate the collective insights of all three. With composite AI, the precision of causal AI meets the forecasting capabilities of predictive AI to provide essential context for generative AI prompts.
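The continuous-monitoring practice above can be sketched in a few lines. This is a minimal, illustrative drift check—windowed accuracy compared against a baseline—and every name and threshold in it is an assumption, not part of any vendor's product; production systems typically use richer statistical tests.

```python
# Minimal sketch of drift detection via rolling accuracy. Assumes the
# model's predictions can eventually be compared against ground-truth
# labels. Window size and tolerance are made-up illustrative values.
from collections import deque


class DriftMonitor:
    """Flags drift when windowed accuracy falls below baseline - tolerance."""

    def __init__(self, baseline_accuracy: float, window: int = 100,
                 tolerance: float = 0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect

    def record(self, prediction, actual) -> None:
        self.outcomes.append(1 if prediction == actual else 0)

    @property
    def accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def drifted(self) -> bool:
        # Wait for a full window before alerting, to avoid noisy flags.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy < self.baseline - self.tolerance
```

A monitor like this would sit in the serving path and feed an alerting pipeline, signaling when a model needs retraining on fresher data.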

Making AI observability and FinOps part of an overall cloud observability strategy

As organizations continue to adopt AI, they can easily face sticker shock when running AI models.

As a result, they need to start considering how to glean AI insights cost-effectively. Organizations that evolve beyond mere AI adoption to cost-effective AI optimization will succeed in this new AI-enabled era. That means proactively monitoring and managing AI models to ensure both data accuracy and cost efficiency.

Read the recent Dynatrace survey, “The state of AI 2024,” to learn more about the importance of composite AI to overcoming common challenges and risks of artificial intelligence.