How do you monitor and maintain a perfect digital experience for your customers when even a single interaction can touch hundreds of technologies? This was the key theme discussed with one of our Dynatrace customers as part of a virtual breakfast Dynatrace recently hosted. Joining the conversation were our Dynatrace colleagues Naima Iqbal, Account Director, and Carl Morphet, Manager, Business Insights EMEA.
How the last year impacted digital experience and the challenges that brought
We opened the conversation with the Lead from a Financial Services Company explaining his role and how, regardless of the technical components, it’s all about putting the customer first and ensuring an excellent digital experience. In their case, this is specifically about the pensions element of their platform which had seen 6-7x as much traffic during the pandemic. “Due to the uncertainty created during the pandemic, customers wanted instant, self-service access to know the value of their account,” the Lead for Service Reliability explained. But how do you meet your KPIs in such circumstances? And even more problematic, how do you do this when the entire operations team is working from home?
The first thing the team did was make sure system performance and responsiveness were front of mind and visible to all stakeholders. The team lead explained how dashboards have been key to achieving this and allowed improvements in performance to be seen not just by the technical teams. The deep-dive data in Dynatrace has then been used to make many system improvements: bottlenecks were highlighted automatically, with enough context for the team to prioritize what to fix first.
One of the key insights that has helped the team from a technical standpoint is the end-to-end visibility all the way from the user’s interaction in the browser, through the front-end microservice layer (which is typically in the cloud), and into the legacy mainframe applications. Achieving a single source of truth in this way reduces the need for different teams to prove their innocence and lets them start fixing the relevant component straight away. These time savings have also carried over into pre-production, where sprints are now completed in 1 or 2 weeks compared to 1 or 2 months, enabling the team to keep up with the incoming volume of work.
Now that everything is stable, how has your day-to-day work improved?
One of the biggest shifts the team mentioned was moving towards a tangible alignment between business and IT. This also extends to the executive committee where some of the more technical committee members have been consuming Dynatrace data from the dashboards previously mentioned, but also through the ServiceNow integration. This creates a much more efficient relationship, where executive members can “self-serve” for data relating to performance and users’ digital experience, without having to tie people up in multiple meetings.
Going beyond general performance and experience, the team is now able to proactively analyze user behavior and usage. One component that has become hugely popular is the chatbot. By integrating key metadata such as the conversation ID into Dynatrace (using session or user action properties), the team gains insight not only into overall usage but also into whether performance affects a user’s likelihood to open a chat.
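As a rough illustration of the chat question above, the sketch below groups sessions into "fast" and "slow" buckets and compares how often a chat was opened in each. It assumes session data has already been exported into plain records; the field names `avg_load_ms` and `conversation_id`, the sample values, and the 2000 ms cut-off are all hypothetical and not taken from the customer's setup or from any Dynatrace export format.

```python
from collections import defaultdict

# Hypothetical exported session records; field names and values are illustrative only.
sessions = [
    {"avg_load_ms": 800, "conversation_id": None},
    {"avg_load_ms": 1200, "conversation_id": "c-101"},
    {"avg_load_ms": 3500, "conversation_id": "c-102"},
    {"avg_load_ms": 4200, "conversation_id": "c-103"},
    {"avg_load_ms": 900, "conversation_id": None},
    {"avg_load_ms": 3900, "conversation_id": None},
]


def chat_open_rate_by_bucket(records, threshold_ms=2000):
    """Return the share of sessions that opened a chat, per performance bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [sessions, sessions with a chat]
    for record in records:
        bucket = "fast" if record["avg_load_ms"] < threshold_ms else "slow"
        counts[bucket][0] += 1
        # A conversation ID on the session means the user opened the chatbot.
        if record["conversation_id"] is not None:
            counts[bucket][1] += 1
    return {bucket: chats / total for bucket, (total, chats) in counts.items()}


rates = chat_open_rate_by_bucket(sessions)
for bucket, rate in sorted(rates.items()):
    print(f"{bucket}: {rate:.0%} of sessions opened a chat")
```

A gap between the two rates would only be a starting point; a real analysis would need far more sessions and a check for other factors before concluding that slow pages drive chat usage.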
Finally, with the increased visibility into performance and user experience, all teams are encouraged to get involved in SLO/KPI creation. If you know what sort of tolerances people have with different parts of the platform before they call you or open a chat, for example, you can set realistic performance goals that come with a tangible reason why they should be hit. This is an area where Carl and the business insights team provide valuable expertise, not just on how to achieve this technically, but also on what would provide a meaningful output to all stakeholders.
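To make the idea of a tolerance-based goal concrete, here is a minimal sketch of how an SLO could be evaluated from raw response times. The 2000 ms tolerance, the 75% target, and the sample durations are made-up numbers for illustration; the function names are hypothetical and this is not how Dynatrace itself calculates SLOs.

```python
def slo_attainment(durations_ms, threshold_ms):
    """Fraction of user actions that completed within the tolerance."""
    if not durations_ms:
        raise ValueError("at least one measurement is required")
    good = sum(1 for d in durations_ms if d <= threshold_ms)
    return good / len(durations_ms)


def error_budget_remaining(attainment, target):
    """Share of the error budget still unspent (negative once the SLO is breached)."""
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1 (exclusive)")
    allowed_bad = 1.0 - target      # how much failure the target permits
    actual_bad = 1.0 - attainment   # how much failure was observed
    return 1.0 - actual_bad / allowed_bad


# Illustrative measurements only: user action durations in milliseconds.
durations = [400, 900, 1500, 2100, 700, 800, 3000, 600, 500, 1000]
attained = slo_attainment(durations, threshold_ms=2000)
budget = error_budget_remaining(attained, target=0.75)
print(f"attainment: {attained:.0%}, error budget left: {budget:.0%}")
```

Tying the threshold to an observed user tolerance, rather than an arbitrary number, is what gives the target its "tangible reason".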
What is the next set of challenges?
In June, which is rapidly approaching, Google will begin tracking three new KPIs in the form of “Core Web Vitals”: Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS). These will become ranking signals, so the company has been working with Carl and the team to get them looking healthy ahead of their official inclusion. “With this new inclusion of very technical metrics, this is creating a stronger collaboration between technical teams and the marketing or SEO teams,” Carl explained.
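For readers less familiar with these metrics, the sketch below shows one way to rate a measurement against the thresholds Google publishes for Core Web Vitals (good / needs improvement / poor). The thresholds are Google's published values; the function name and the sample measurements are hypothetical, and Google assesses a page on the 75th percentile of real-user data rather than on a single reading.

```python
# Google's published Core Web Vitals thresholds:
# (upper bound for "good", upper bound for "needs improvement")
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # Largest Contentful Paint, seconds
    "FID": (100, 300),   # First Input Delay, milliseconds
    "CLS": (0.1, 0.25),  # Cumulative Layout Shift, unitless score
}


def rate_vital(metric, value):
    """Classify a Core Web Vitals measurement as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


# Illustrative measurements only.
sample = {"LCP": 2.1, "FID": 180, "CLS": 0.3}
for metric, value in sample.items():
    print(f"{metric} = {value}: {rate_vital(metric, value)}")
```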
Our guest speaker then explained that diving even deeper into AIOps would be a key focus: not just detecting Problems with the platform but also taking some sort of remediation action. This could mean a corrective action like restarting a service, or going all the way to rolling back a deployment if Dynatrace has detected that as the root cause. To provide confidence in this space, the team would like to move into “chaos engineering” (with a specific focus on Gremlin), as it would not only confirm the level of observability for the platform but also increase mean time between failures (MTBF). This is starting as an activity performed in pre-prod as part of “game days,” to give teams time they can dedicate to this objective and add their own degree of creativity.
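The remediation idea described above can be pictured as a small decision policy that maps a detected root cause to an action. The sketch below is purely illustrative: the `Problem` class, the root-cause labels, and the action strings are hypothetical, and the code does not call Dynatrace, Gremlin, or any deployment tooling. It only returns the action a real automation would go on to carry out.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical, simplified view of a detected problem (not a Dynatrace payload)."""
    entity: str
    root_cause: str  # illustrative labels: "deployment", "service_unresponsive", or anything else


def choose_remediation(problem):
    """Pick a remediation action for a problem; unknown causes go to a human."""
    if problem.root_cause == "deployment":
        # A bad release was identified as the root cause: go all the way to a rollback.
        return f"roll back latest deployment of {problem.entity}"
    if problem.root_cause == "service_unresponsive":
        # A lighter corrective action is enough here.
        return f"restart {problem.entity}"
    # Anything the policy does not recognize is escalated rather than guessed at.
    return f"escalate {problem.entity} to on-call"


print(choose_remediation(Problem("pensions-api", "deployment")))
print(choose_remediation(Problem("valuation-service", "service_unresponsive")))
print(choose_remediation(Problem("mainframe-gateway", "unknown")))
```

Keeping an explicit escalation fallback matters: automated actions should only fire for causes the team has already rehearsed, which is where pre-prod game days help build confidence.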
Finally, “observability-led development” is something the team is working towards over the next 12 months. Essentially, for a new release to be signed off, a certain level of observability must be achieved – with the “chaos engineering” mentioned above helping to prove why this is necessary!
What’s been the key to success?
Culture has played a big part in the success of the company’s digital experience. This means not only adopting Dynatrace into their tooling strategy but also changing the mindset from the top down, so that everyone understands why performance and user experience are key to the success of the platform. It’s then up to the teams at the company to keep lines of communication open and, using Dynatrace, give everyone visibility into key performance data and how it affects the success of the business.
A huge thank you to Naima and Carl, as well as all the attendees on the day, for driving a very thought-provoking discussion!
To learn more about how Dynatrace can help your team master chaos engineering experiments, join us for the on-demand performance clinic, Mastering Chaos Engineering Experiments with Gremlin and Dynatrace today.