Are you spending too much time manually remediating issues or manually executing remediation runbooks? You’re not alone. But for many, adopting a NoOps approach to software development and operations that incorporates auto-remediation into daily operations is only a dream.
One company, however, has been making that dream a reality. At Perform 2021, Inna Tabak, ITSM Technology Lead at Northbridge Financial Corporation, joined J-P Contreras, Practice Manager of Innovation Services at Dynatrace, to talk about Northbridge Financial Corporation’s journey to NoOps. Using Dynatrace’s integration with ServiceNow and Ansible, Northbridge has automated their problem reporting, root-cause analysis, and remediation into a single workflow to help reduce MTTR and gain back valuable time.
What is NoOps?
TechTarget defines NoOps as “the concept that an IT environment can become so automated and abstracted from the underlying infrastructure that there is no need for a dedicated team to manage software in-house.”
NoOps expands on an existing DevOps approach by integrating tools, streamlining processes, and implementing automation. Through automation, NoOps dramatically helps reduce the time and effort teams spend on configuration and deployment and removes the need to have a dedicated team managing software. Instead, teams can focus on what’s important and maximize development time.
Taking the first steps to NoOps
Based in Toronto, Ontario, Northbridge Financial Corporation is a leading commercial property and casualty insurance group that has protected Canadian businesses for over 100 years. At Northbridge, a future-forward employee experience was a key strategic initiative with four goals for automation: establish complete observability, improve service, introduce a NoOps approach to eliminate risk, and increase collaboration among teams.
To support this initiative, IT put environment monitoring, preventive automation, and self-healing automation on its roadmap.
“In collaboration with Dynatrace, we made three steps towards our goal: CMDB enrichment, incident creation and resolution, and automated remediation,” explained Tabak. “These three steps helped us to achieve visibility, improve service, and deliver a better employee experience.”
The importance of employee experience
Contreras explained that although the financial return on investment comes from preventing outages and problems before they occur, it’s the employee experience resulting from that maintenance approach that is so important. With a reliable IT service, the line-of-business employees at Northbridge Financial experience many benefits, including:
- Having a better level of service with their applications
- Being more confident using those applications, knowing they are reliable
- Being able to focus on their business without having to worry about outages
“Nobody’s really interested in solving problems all day long, doing manual work and having to fight fires,” Contreras said. “Having the opportunity to resolve problems before they turn into something impactful and do it in an automated fashion – ultimately eliminates those inevitable 2:00 a.m. calls where workers are up troubleshooting in the middle of the night.”
Automating work is great for the business, but employees still need to know what’s happening. “With Dynatrace, our teams leveraged tools Northbridge was already using to better connect workers to what was happening,” Contreras said.
“In our engagement with Northbridge, we used the standard Microsoft Teams,” Contreras explained, referring to the solution Northbridge uses for ChatOps. “We sent notifications to that platform as things were occurring in the process, and that made sure a lot of teams in Northbridge were aware of what was happening during these times.”
How Northbridge Financial achieved a successful NoOps approach
Prior to adopting Dynatrace, Northbridge Financial teams were spending more time manually inputting data and incidents than on their day-to-day activities. To achieve its automation goals, Northbridge used Dynatrace to up-level the way the company operates through three steps:
Step 1: CMDB enrichment
Tabak explained that, before engaging with Dynatrace, Northbridge’s CMDB was populated manually and was heavily customized. Dynatrace helped Northbridge assess its CMDB process, remove unnecessary customization, clean up its data, and then adjust the Dynatrace workflow so the existing process would be preserved where needed.
A CMDB integration application was used to load data from Dynatrace into ServiceNow, with teams working with Dynatrace to ensure alignment with the existing CMDB structure. This was a small foundational step, but it provided additional benefits, including giving a broader IT audience better visibility into infrastructure, integrations, and their dependencies.
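To make the idea of aligning monitoring data to an existing CMDB structure more concrete, here is a minimal sketch of that kind of mapping. It is not the integration application Northbridge used: the `MonitoredHost` shape, the CMDB field names, and the `depends_on` relationship type are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoredHost:
    """Hypothetical shape of a host record exported from the monitoring side."""
    entity_id: str
    display_name: str
    ip_addresses: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)


def to_cmdb_record(host: MonitoredHost) -> dict:
    """Map a monitored host onto an illustrative CMDB configuration item."""
    return {
        # Normalize names so repeated loads update one CI instead of duplicating it.
        "name": host.display_name.strip().lower(),
        # Keep the source identifier so the CI can be traced back to monitoring.
        "correlation_id": host.entity_id,
        "ip_address": host.ip_addresses[0] if host.ip_addresses else "",
        "discovery_source": "monitoring",
    }


def to_relationships(host: MonitoredHost) -> list[dict]:
    """Emit one 'depends on' relationship per dependency of the host."""
    return [
        {"parent": host.entity_id, "child": dep, "type": "depends_on"}
        for dep in host.depends_on
    ]
```

Keeping the record mapping and the relationship mapping separate mirrors the two benefits described above: visibility into infrastructure, and visibility into dependencies.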
Step 2: Incident creation and resolution
Before adopting a NoOps approach, all incidents reported by Dynatrace were entered manually into ServiceNow. This resulted in inconsistency and overhead for the support teams.
Now, incidents are automatically received in ServiceNow, properly categorized, and assigned with enough information that support teams can act upon an incident right away. Support teams are notified through ServiceNow notifications and Microsoft Teams channels.
“Setting up incident integration from Dynatrace was easy for initial configuration, but then we needed to do a few tweaks to align to our incident process and priority matrix,” Tabak explained. “Dynatrace helped us to map their attributes to our existing structure, and this resulted in better visibility into system health, and faster incident resolution.”
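The attribute mapping Tabak describes can be pictured as a lookup from problem attributes to incident fields. The sketch below is an assumption-laden illustration, not Northbridge’s configuration: the severity labels, the impact and urgency values, the 3x3 priority matrix, and the assignment group names are all hypothetical.

```python
# Hypothetical priority matrix: (impact, urgency) -> priority, 1 being highest.
PRIORITY_MATRIX = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 4,
    (3, 1): 3, (3, 2): 4, (3, 3): 5,
}

# Hypothetical mapping from a problem's severity label to (impact, urgency).
SEVERITY_TO_IMPACT_URGENCY = {
    "AVAILABILITY": (1, 1),
    "ERROR": (2, 1),
    "PERFORMANCE": (2, 2),
    "RESOURCE": (3, 2),
}

# Hypothetical routing of applications to support groups.
ASSIGNMENT_GROUPS = {"payments": "Payments Support", "web": "Web Platform"}
DEFAULT_GROUP = "Service Desk"


def build_incident(problem: dict) -> dict:
    """Translate a detected problem into categorized, assigned incident fields."""
    # Unknown severities fall back to the lowest impact and urgency.
    impact, urgency = SEVERITY_TO_IMPACT_URGENCY.get(
        problem.get("severity", ""), (3, 3)
    )
    return {
        "short_description": problem["title"],
        "impact": impact,
        "urgency": urgency,
        "priority": PRIORITY_MATRIX[(impact, urgency)],
        "assignment_group": ASSIGNMENT_GROUPS.get(
            problem.get("application", ""), DEFAULT_GROUP
        ),
        # Lets a later "problem closed" event find and resolve this incident.
        "correlation_id": problem["id"],
    }
```

For example, a problem with severity `AVAILABILITY` on the `payments` application would come out as a priority 1 incident assigned to the hypothetical "Payments Support" group, while an unrecognized severity would land at priority 5 with the default group.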
Once Northbridge had integrated Dynatrace with ServiceNow and optimized its incident response processes, the automation could begin.
Step 3: Auto-remediation
“The third step was where we reaped the most benefits,” Tabak said. “This is where the magic happens, and problems are remediated without human intervention.”
Contreras dug into the details of auto-remediation, and how it can be successfully adopted within an organization, starting with three foundational stages: restart processes proactively, escalate to restart hosts, and communicate to appropriate team channels.
“Dynatrace is the trigger for identifying a problem occurring in an environment, using our AI engine, Davis, to send a notification to ServiceNow to create the incident,” Contreras explained. “Dynatrace enriches the incident with the right configuration and severity and makes sure it’s attached to the right assignment group. ServiceNow will then execute the workflow, communicating with Ansible.” Ansible is an open-source software provisioning, configuration, and deployment tool.
Once the workflow completes, Dynatrace automatically closes the problem and resolves the incident, notifying teams through Microsoft Teams. If restarting the process does not resolve the problem, the incident is escalated to a restart of the host.
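The escalation path described here (restart the process, then restart the host, and notify teams along the way) can be sketched as a small function. This is only an illustration under stated assumptions: the four callables are hypothetical stand-ins for the real workflow, playbook, health check, and chat integrations, and the final hand-off to a support team when both restarts fail is an assumption rather than something the session describes.

```python
from typing import Callable


def remediate(
    restart_process: Callable[[], None],
    restart_host: Callable[[], None],
    is_healthy: Callable[[], bool],
    notify: Callable[[str], None],
) -> str:
    """Try the least disruptive action first, escalating only if needed."""
    notify("Problem detected: restarting process")
    restart_process()
    if is_healthy():
        notify("Resolved by process restart: closing incident")
        return "resolved_by_process_restart"

    notify("Process restart did not help: escalating to host restart")
    restart_host()
    if is_healthy():
        notify("Resolved by host restart: closing incident")
        return "resolved_by_host_restart"

    # Assumption: when automation is exhausted, a person takes over.
    notify("Automation exhausted: reassigning incident to support team")
    return "escalated_to_support_team"
```

Passing the actions in as callables keeps the escalation logic testable without touching real systems, and every branch sends a notification so that teams can follow the outcome at each stage.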
The final step in the process is effectively communicating the remediation or the reassignment to restart the host to appropriate team channels. This is where Microsoft Teams is a vital support element to the process, as it’s accessible to everyone from all different groups, enabling the company to communicate effectively and bring everyone together to align on what’s happening in the environment at every stage in the process. Without this, a lack of communication and visibility would lead to a less productive workforce.
Having this automated NoOps approach enables organizations like Northbridge Financial Corporation to focus on what matters, rather than troubleshooting with manual processes. Tool integrations and streamlined processes, coupled with AI and automation, ultimately lead to bigger benefits and enable organizations to achieve their goals.
To hear the full story from Inna Tabak and J-P Contreras, check out their breakout session from Perform.