Cloud operations and observability boost resilience for American Family

As more organizations invest in multicloud, improving cloud operations becomes critical. Learn how American Family Insurance implemented Dynatrace to increase resilience.

As more organizations invest in a multicloud strategy, improving cloud operations and observability for increased resilience becomes critical to keep up with the accelerating pace of digital transformation.

When American Family Insurance took the multicloud plunge, they turned to Dynatrace to automate Amazon Web Services (AWS) event ingestion, instrument compute and serverless cloud technologies, and create a single workflow for unified event management.

At Dynatrace Perform 2022, Technology Services Manager Thomas Janik and AWS Monitoring SME Matt Gault, both from American Family, explain how they boosted their cloud operations to increase resilience. Additionally, Dynatrace’s Michał Naleziński, senior product manager, and Rob Jahn, senior technical partner manager, share how Dynatrace helps teams overcome obstacles in these complex environments, become more resilient, and transform faster.

Multicloud observability enables IT teams to focus on what matters most

American Family turned to Dynatrace to help them monitor complex environments without the hassle. Dynatrace combines continuous automation and observability into a single, end-to-end platform. This gives organizations visibility into their hybrid and multicloud infrastructures, providing teams with contextual insights and precise root-cause analysis. With a single source of truth, infrastructure teams can refocus on innovating, improving user experiences, transforming faster, and driving better business outcomes.

Dynatrace OneAgent provides automatic full-stack data capture for dynamic multicloud environments. Additionally, PurePath provides distributed tracing with code-level detail and contextual data at scale. Dynatrace’s most distinctive feature is built into the core of its platform: Davis. This AI engine instantly processes billions of dependencies to deliver precise answers, prioritizing them by business impact and identifying the root cause of issues.

“Dynatrace is enterprise-ready, including automated deployment and support for the latest cloud-native architectures with role-based governance,” Naleziński explains.

American Family turned to observability for cloud operations

When American Family began crafting an application focused on an AWS-first concept, Janik and Gault had three requirements:

  1. A single monitoring platform
  2. Self-service instrumentation capabilities
  3. The ability to drive orchestration within a single workflow tool for multiple event sources into ServiceNow

The team couldn’t do everything within AWS, so they contracted some pieces out to several SaaS providers. American Family went through proofs of concept and cost-benefit analyses to select a monitoring solution. Drawn by capabilities such as application performance management, real user monitoring (RUM), and advanced Session Replay, the insurance company initially decided to move from its on-premises monitoring approach to Dynatrace.

“Dynatrace’s out-of-the-box AWS functionality and future roadmap functionality motivated us to convert to Dynatrace in 2020,” Janik says. After American Family completed its initial conversion to Dynatrace, they needed to automate how their system ingested Amazon CloudWatch metrics.

Step 1: Automate AWS metrics ingestion with Dynatrace

American Family uses OneAgent installations and ingests Amazon CloudWatch metrics into Dynatrace to monitor resources in hundreds of AWS accounts. Now, the team has dashboard capabilities beyond what Amazon CloudWatch provides, including network visibility, ingress and egress metrics, SLO monitoring, and individual user endpoints for synthetic monitors.

“We quickly understood there was a need for automation,” Gault says. The biggest prerequisite was to set up the required identity and access management (IAM) role in each AWS account to give Dynatrace access to CloudWatch metrics. American Family deployed this IAM role through the code pipeline to all existing AWS accounts. Then, the team set up the provisioning process to include this IAM role in any new AWS account.
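As a rough illustration of that prerequisite, the following Python sketch builds the two policy documents such a role typically needs: a trust policy that lets a monitoring account assume the role, and a read-only policy for CloudWatch metrics. This is not American Family's actual pipeline code. The function names, the account ID, and the external ID are hypothetical placeholders, and the cross-account trust with an external ID is an assumption about the setup (how the role is assumed depends on the Dynatrace deployment type). The two CloudWatch actions shown are a minimal subset, and the full permission list should come from the Dynatrace documentation.

```python
import json
from typing import Dict


def build_trust_policy(monitoring_account_id: str, external_id: str) -> Dict:
    """Trust policy allowing another AWS account to assume this role.

    Both arguments are placeholders supplied by the caller (assumption:
    the monitoring side uses cross-account access with an external ID).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{monitoring_account_id}:root"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


def build_metrics_read_policy() -> Dict:
    """Read-only access to CloudWatch metrics (minimal illustrative subset)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "cloudwatch:GetMetricData",
                    "cloudwatch:ListMetrics",
                ],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values only; real IDs come from your own environment.
    trust = build_trust_policy("000000000000", "example-external-id")
    print(json.dumps(trust, indent=2))
    print(json.dumps(build_metrics_read_policy(), indent=2))
```

Generating the documents from code rather than by hand makes it straightforward to embed the same role in both the pipeline for existing accounts and the provisioning process for new ones.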

American Family used Python scripts to call the Dynatrace API, setting up the configuration in Dynatrace for each account. To reduce the manual effort of account reconciliation and running the scripts, they converted the Python scripts to Lambda functions. This gave them a separate Lambda function for every Dynatrace environment in their managed cluster.

“Each Lambda function has an automated account reconciliation process that queries the organizational ID of the landing zone. It compares that list with the previous run to automatically add any new account into the Dynatrace console,” Gault explains. Once the accounts are set up in Dynatrace, the system queries Amazon CloudWatch for new metrics every five minutes. This costs only about $0.01 for every 1,000 metrics.
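The reconciliation step Gault describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's actual Lambda code: the function names are hypothetical, and the four callables stand in for whatever lists the organization's accounts, loads and saves the previous run's state, and calls the Dynatrace API to register an account.

```python
from typing import Callable, Iterable, List


def find_new_accounts(current: Iterable[str], previous: Iterable[str]) -> List[str]:
    """Return account IDs present in the current listing but not the previous run."""
    return sorted(set(current) - set(previous))


def reconcile(
    list_accounts: Callable[[], Iterable[str]],    # hypothetical: lists org accounts
    load_previous: Callable[[], Iterable[str]],    # hypothetical: state from last run
    register_account: Callable[[str], None],       # hypothetical: Dynatrace API call
    save_state: Callable[[List[str]], None],       # hypothetical: persists known accounts
) -> List[str]:
    """Register every account that appeared since the previous run."""
    known = set(load_previous())
    new_accounts = find_new_accounts(list_accounts(), known)
    for account_id in new_accounts:
        register_account(account_id)
        known.add(account_id)
        # Persist after each success so a failure midway is retried next run
        # without re-registering accounts that already succeeded.
        save_state(sorted(known))
    return new_accounts


if __name__ == "__main__":
    state = ["111111111111"]          # in-memory stand-in for stored state
    registered: List[str] = []

    def save(accounts: List[str]) -> None:
        state[:] = accounts

    added = reconcile(
        lambda: ["111111111111", "222222222222"],
        lambda: list(state),
        registered.append,
        save,
    )
    print("newly registered:", added)
```

Keeping the comparison in a pure function and injecting the side effects makes the logic easy to test outside of Lambda. Note that this sketch only adds accounts; handling accounts that have been closed would need a separate step.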

Step 2: Instrument compute and serverless cloud technologies

Once American Family completed event ingestion, they needed to provide a simple, reusable way for application teams to instrument compute and serverless cloud technologies using Dynatrace OneAgent.

“We were early adopters of OneAgent Lambda monitoring. So we published documentation, including a template for application teams, to set up their code to add OneAgent to their Lambda functions,” Gault explains.

To boost its cloud operations, American Family developed AWS Systems Manager (SSM) parameters that they distributed to every AWS account for the layer ARN, which is hosted in the Dynatrace AWS accounts. This allowed them to provide the OneAgent version for each supported Lambda runtime. American Family also set up SSM parameters for the environment variables that OneAgent requires, such as tenant information and the wrapper.
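One way to picture this is as a flat set of named parameters, one per runtime for the layer ARN plus one per required environment variable. The Python sketch below builds such a mapping. Everything specific in it is an assumption for illustration: the `/monitoring/dynatrace` prefix, the function name, the placeholder ARN, and the environment variable names (`AWS_LAMBDA_EXEC_WRAPPER` is a standard AWS Lambda variable; `DT_TENANT` should be verified against the Dynatrace documentation).

```python
from typing import Dict

# Hypothetical naming convention; the article does not say what names were used.
PARAMETER_PREFIX = "/monitoring/dynatrace"


def build_ssm_parameters(
    layer_arns: Dict[str, str],
    env_vars: Dict[str, str],
    prefix: str = PARAMETER_PREFIX,
) -> Dict[str, str]:
    """Map SSM parameter names to values for layer ARNs and env variables.

    layer_arns: Lambda runtime name -> layer ARN for that runtime.
    env_vars:   environment variable name -> value the agent needs.
    """
    params: Dict[str, str] = {}
    for runtime, arn in layer_arns.items():
        params[f"{prefix}/layer-arn/{runtime}"] = arn
    for name, value in env_vars.items():
        params[f"{prefix}/env/{name}"] = value
    return params


if __name__ == "__main__":
    params = build_ssm_parameters(
        # Placeholder ARN, not a real Dynatrace layer.
        layer_arns={
            "python3.9": "arn:aws:lambda:us-east-1:000000000000:layer:example-agent-python:1",
        },
        env_vars={
            "AWS_LAMBDA_EXEC_WRAPPER": "/opt/example-wrapper",  # placeholder value
            "DT_TENANT": "example-tenant",                      # placeholder value
        },
    )
    for name, value in sorted(params.items()):
        print(name, "=", value)
```

Each entry could then be written to an account with the SSM `put_parameter` call, and application templates would resolve the layer ARN for their runtime by name instead of hard-coding it. That indirection is what lets a central team roll out a new OneAgent version without touching every application's template.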

From there, American Family set up templates in both AWS CloudFormation and Terraform for Amazon EC2, ECS, and Kubernetes. They also enabled RUM for Lambda functions, giving PurePath visibility into AWS functions using the X-DTC header. In addition to AWS Lambda, they can now track the user experience on single-page apps. PurePath provides options when OneAgent isn’t available — such as SaaS providers, shared infrastructure, or places where other monitoring agents might be in place.

American Family also implemented agentless monitoring for the user experience on vendor applications. These capabilities provide enterprise-wide transactional tracing across multiple data centers, cloud accounts, and instances.

Step 3: Create a single workflow for unified event management

Next, American Family needed to utilize a single workflow service for event and incident management from multiple sources — such as AWS, Google Cloud Platform, Microsoft Azure, Dynatrace, and other proprietary monitoring services. They had various components sending monitoring data into Dynatrace. From there, Dynatrace drove orchestration that occurred in ServiceNow. Orchestration was handled through ServiceNow because, from an event management standpoint, that was their single source of truth.

American Family created an on-premises synthetic monitor that triggers an automatic JVM restart when it detects a 503 error.

“Once this alert is triggered, Dynatrace takes the information and runs the script through ServiceNow. This will remove the JVM from the load balancer, restart the JVM, and put the JVM back into the rotation. And then the synthetic problem closes in Dynatrace to validate the issue is fixed,” Gault explains. “All this takes place with no impact to the customer,” he adds.
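The sequence Gault describes (drain, restart, restore) can be sketched as a small Python function. This is an illustrative outline, not the script American Family runs through ServiceNow: the function name and the injected callables are hypothetical, and the explicit health check before returning the JVM to rotation is an added assumption standing in for the validation that the closing synthetic problem provides.

```python
from typing import Callable, List


def remediate_jvm(
    node: str,
    remove_from_lb: Callable[[str], None],   # hypothetical load balancer call
    restart_jvm: Callable[[str], None],      # hypothetical restart command
    is_healthy: Callable[[str], bool],       # assumption: a post-restart check
    add_to_lb: Callable[[str], None],        # hypothetical load balancer call
) -> List[str]:
    """Drain a JVM from the load balancer, restart it, and restore it if healthy."""
    steps: List[str] = []

    remove_from_lb(node)
    steps.append("removed from rotation")

    restart_jvm(node)
    steps.append("restarted")

    if not is_healthy(node):
        # Leave the node drained so customers are never routed to a bad JVM.
        steps.append("left out of rotation")
        return steps

    add_to_lb(node)
    steps.append("returned to rotation")
    return steps


if __name__ == "__main__":
    calls: List[str] = []
    result = remediate_jvm(
        "jvm-1",
        lambda n: calls.append(f"remove {n}"),
        lambda n: calls.append(f"restart {n}"),
        lambda n: True,
        lambda n: calls.append(f"add {n}"),
    )
    print(calls)
    print(result)
```

Draining the JVM before the restart is what keeps the fix invisible to customers: traffic only ever reaches instances that are in rotation and responding.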

Full cloud observability for increased resilience

American Family plans to expand monitoring across the enterprise using Dynatrace as a single source of truth.

“Dynatrace works closely with cloud vendors to provide the broadest view of multicloud environments. This includes metrics, logs, distributed tracing, and user experience data,” Dynatrace’s Naleziński says.

Instead of assembling a loosely coupled toolset and displaying observability data on dashboards, Dynatrace keeps information in a unified, all-in-one platform, in context, and tied to business impact. This advanced observability enables faster time to market, greater efficiency, improved cloud operations, and a lower total cost of ownership than general-purpose data analytics solutions.

Dynatrace provides out-of-the-box support for all major cloud platforms and hundreds of technologies. It also supports custom integrations for APIs. Therefore, organizations can extend their capabilities across existing ecosystems and drive automation in development, releases, business processes, and application security.

To learn more about how American Family Insurance leveraged Dynatrace to achieve cloud operations for increased resilience, watch the Dynatrace Perform session “Build a proactive approach to cloud operations for increased resilience.”
