Dynatrace helped VA modernize cloud operations and eliminate cloud complexity. See how shifting from reactive to proactive cloud operations can help you transform faster.
As organizations expand their cloud footprints, they are combining public, private, and on-premises infrastructures. But modern cloud infrastructure is large, complex, and dynamic — and over time, this cloud complexity can impede innovation. As operational complexity increases exponentially, conventional tools and operations approaches hit a cloud observability wall.
This typically happens when organizations graduate from lift-and-shift cloud migrations and begin building truly cloud-native applications, replacing static virtual machines with dynamic container orchestration and expanding into multicloud environments. Shifting from reactive to proactive cloud operations helps organizations stay ahead of that complexity.
At Dynatrace Perform 2022, David Catanoso, acting director of cloud and edge solutions at the U.S. Department of Veterans Affairs (VA), explains how Dynatrace not only tames this cloud complexity but also gives his team confidence that it can fulfill VA’s future cloud requirements.
Joining Catanoso are Dynatrace’s Peter Putz, senior technical marketing manager, and Michael Kopp, director of product management. Putz and Kopp share insight into the cloud operations challenges facing organizations today and how teams can overcome these obstacles to become more proactive and transform the way they work.
VA’s journey into the cloud
When the federal CIO Cloud-First Initiative came out in 2011, VA decided to start migrating applications to the cloud. “First of all, we wanted to provide better service to our veterans,” Catanoso says. “Second, we wanted to improve our rollout of the DevSecOps capability, as well as improve our agility and our ability to innovate faster.”
VA’s cloud journey was a long one. The agency executed one of the largest email migrations from on-premises Exchange servers to Microsoft Office 365 — moving almost 480,000 mailboxes to the cloud.
“A lot of consultants will tell you to move some small applications first and then get some experience. And then, at some point, you can move some big, complicated applications,” Catanoso says. “We kind of did the opposite. Right out of the gate, we moved some of our biggest, most mission-critical applications to the cloud.”
Despite many challenges, when the project concluded in 2019, the team had the confidence and experience to migrate anything to the cloud.
How Dynatrace solved VA’s cloud complexity
Today, VA uses Dynatrace to monitor over 150 different cloud instances — even hybrid instances of applications.
Dynatrace addresses several of VA’s cloud complexity challenges, such as finding and fixing telework bottlenecks. When the COVID-19 pandemic hit, VA’s remote workforce grew by over 100,000 people almost overnight, creating a new series of problems to quickly diagnose and solve.
As a result, VA had to rapidly scale its on-premises Citrix environment. “The team did a two-part attack on that, where we rapidly added more physical infrastructure, but also expanded the Citrix environment into all five CSP regions that we had available to us in the government clouds from Azure and AWS,” Catanoso explains. “We used Dynatrace to monitor that large increase in servers. We started out by instrumenting 2,000 servers overnight. In 48 hours, we had a total of 6,500 servers monitored.”
Since then, VA has solved many cloud performance problems and kept its complex hybrid environment running without interrupting remote work. Dynatrace’s advanced monitoring capability enabled VA to get through the pandemic successfully while continuing to provide services to the nation’s veterans.
VA also used Dynatrace to instrument its Consolidated Mail Outpatient Pharmacy Application, a mission-critical app with a heavily distributed database hosted in seven locations across the U.S. In the past, severe database issues seriously hurt system performance, causing delayed shipping times for prescriptions.
“That team installed Dynatrace to monitor its applications across that environment in preparation, wanting to understand the application today and its current utilization,” Catanoso says. “This enabled us to fundamentally improve the application, remove performance bottlenecks, and also give us the data we needed to understand how to migrate into the cloud.”
Foundational observability paves the way for proactive cloud operations
While modern cloud systems simplify tasks — such as deploying apps and provisioning new hardware and servers — cloud environments can be surprisingly complex.
Kopp explains how Dynatrace brings all the observability data into context and automatically derives its Smartscape topology using signals — such as logs, events, metrics, and application traces. This allows customers to extend the topology from there as they see fit.
Comprehensive observability across all cloud environments is key. “Your solution needs to give you the tools to monitor all of these different clouds with ease,” Kopp says. “We recently made great strides to improve our analysis views to give our customers even more value.”
These analysis views let Dynatrace present relevant data, in context, for both applications and operations.
How Davis delivers proactive cloud operations
To reach truly proactive cloud operations, you need AI. Davis, Dynatrace’s explainable AI, offers domain-specific answers to cloud-native problems and provides the necessary context to understand how it made that determination.
Davis can also detect problems before they affect your organization. It provides automatic threshold models, so you don’t have to set thresholds yourself. On top of that, Dynatrace has structured models, such as seasonal, trend, and autoregressive component models.
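As a rough illustration of what an automatic, seasonality-aware threshold can look like, the sketch below derives a separate upper bound for each slot of a repeating cycle from historical samples. It is not Davis’s algorithm, which the session does not describe in detail. The function names (`seasonal_thresholds`, `is_anomalous`), the mean-plus-k-standard-deviations rule, and the sample data are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def seasonal_thresholds(history, period, k=3.0):
    """Return one upper threshold per slot of the seasonal cycle.

    history: evenly spaced metric samples, oldest first.
    period:  samples per cycle (e.g. 24 for hourly data with a daily cycle).
    k:       standard deviations above the slot mean that count as anomalous.

    Hypothetical helper for illustration only, not a Dynatrace API.
    """
    thresholds = []
    for slot in range(period):
        # Every period-th sample belongs to the same position in the cycle.
        samples = history[slot::period]
        if len(samples) < 2:
            raise ValueError("need at least two full cycles of history")
        thresholds.append(mean(samples) + k * stdev(samples))
    return thresholds


def is_anomalous(value, index, thresholds):
    """Compare a new sample against the threshold for its slot in the cycle."""
    return value > thresholds[index % len(thresholds)]


# Made-up data: a two-slot cycle alternating between quiet and busy load.
history = [10, 50, 12, 52, 11, 51, 9, 49]
limits = seasonal_thresholds(history, period=2)
```

With this made-up data, a reading of 40 is flagged when it arrives in a quiet slot but not in a busy one, which is the behavior a single static threshold cannot provide.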
“Tying it all together is the topology that is automatically detected and updated all the time,” Kopp explains. “This allows for a causation reasoning engine that isn’t prone to false positives and offers analysis you can easily understand.”
Dynatrace is also adding forecasting alerts that will allow Davis to detect problems before users notice the impact. This enables a site reliability engineering approach to cloud operations, in which organizations improve reliability using service-level objectives (SLOs) and error budgets. For example, you can use an SLO to define quality gates that trigger alerts when system health is expected to degrade.
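The relationship between an SLO, its error budget, and a forward-looking alert can be sketched in a few lines. The example below is a generic site reliability engineering calculation, not Dynatrace’s metric expression language or forecasting model. Every function name, the 30-day period, and the request counts are hypothetical, and the projection assumes traffic and error rate stay constant.

```python
def _check_target(slo_target):
    # An SLO target of 1.0 leaves no error budget to reason about.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")


def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (1.0 untouched, <= 0 exhausted)."""
    _check_target(slo_target)
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def burn_rate(slo_target, window_requests, window_failures):
    """Observed error rate in a recent window, relative to the rate the SLO permits.

    A value of 1.0 spends the whole budget exactly over the SLO period.
    """
    _check_target(slo_target)
    if window_requests <= 0:
        raise ValueError("window_requests must be positive")
    return (window_failures / window_requests) / (1.0 - slo_target)


def days_until_exhausted(remaining, rate, period_days):
    """Project when the budget runs out if the current burn rate holds."""
    if remaining <= 0:
        return 0.0
    if rate <= 0:
        return float("inf")
    return remaining * period_days / rate


def should_alert(remaining, rate, period_days, horizon_days):
    """Quality gate: alert when exhaustion is projected inside the horizon."""
    return days_until_exhausted(remaining, rate, period_days) < horizon_days


# Hypothetical numbers: 99% availability SLO measured over 30 days.
remaining = error_budget_remaining(0.99, total_requests=100_000, failed_requests=250)
rate = burn_rate(0.99, window_requests=1_000, window_failures=30)
alert = should_alert(remaining, rate, period_days=30, horizon_days=10)
```

Here roughly three quarters of the budget is left, but the recent window burns it about three times faster than the SLO allows, so the gate fires before users see an outright SLO violation.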
“We have a rich metric expression language. This allows you to define any SLO you can imagine based on data that’s in the system,” Kopp says. “This facilitates what’s known as configuration as code or monitoring as code. And it allows you to shift left, giving responsibility for things to the application owners as opposed to the ops team — which is typically overburdened with work.” Dynatrace will soon make it possible to bring SLOs back into the context of observability and AI.
Modernize cloud operations with Dynatrace
Dynatrace customers, such as SAP and Kroger, are already achieving proactive cloud operations and delivering unparalleled value across stakeholders. Development teams can innovate faster with higher quality. Operations teams can run more efficiently. And the organization can consistently drive better outcomes.
To learn more, check out the session, “Modernize your cloud operations from reactive to proactive.”