We don’t need to go into the details of how COVID-19 is affecting our daily lives, but what should CIOs, digital delivery directors, webmasters, Site Reliability Engineers (SREs), and others be considering during this period? Events are driving an unprecedented amount of e-commerce and online activity: what we’d normally see on Black Friday or Cyber Monday is now the everyday norm. This is likely to continue for the foreseeable future, so let’s look at some things to consider.
1/. Time to transform
As most of you now know, you have no choice but to transform how you operate. Your employees are no longer in your central offices, your VPNs and infrastructure are under stress, and your processes may or may not be up to the task of supporting a distributed remote workforce.
You need to focus on innovation, automating operations, and improving production quality. Disparate tools used by separate teams that do not talk to each other need to be replaced with platforms that provide shared visibility, so your remote teams can work from the same data toward the same goals.
Now is the time to become more proactive and to focus on optimizing customer experience and improving customer care. What are you using to understand your customers? Traditional web analytics provide only so much insight, and they do not tie into your backend processes and services.
Remember that improving customer experience and performance also frees up capacity: the faster an end user can complete a transaction, the sooner the resources serving that session are released, and the more users your existing capacity can serve.
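As a rough illustration of this point, Little’s Law (L = λW) says that the number of transactions in flight equals the arrival rate multiplied by the time each one takes. Rearranged, if your infrastructure can hold a fixed number of concurrent sessions, the throughput it can sustain is that number divided by the transaction time. The sketch below assumes a single concurrency limit is the bottleneck; the 500-session capacity and the timings are hypothetical.

```python
def max_throughput(concurrent_capacity: int, seconds_per_transaction: float) -> float:
    """Little's Law rearranged: sustainable throughput = concurrency / time in system."""
    if seconds_per_transaction <= 0:
        raise ValueError("seconds_per_transaction must be positive")
    return concurrent_capacity / seconds_per_transaction


# Hypothetical figures: the backend can hold 500 concurrent sessions.
capacity = 500
before = max_throughput(capacity, seconds_per_transaction=10.0)  # 50.0 transactions/s
after = max_throughput(capacity, seconds_per_transaction=6.0)    # about 83.3 transactions/s
print(f"before: {before:.1f}/s, after: {after:.1f}/s")
```

Cutting the transaction time from 10 to 6 seconds raises sustainable throughput from 50 to roughly 83 transactions per second without adding hardware.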
2/. Observability is not enough
Let’s talk about the metrics you use to manage your environment. The industry is focused on the three pillars of observability: logs, metrics, and traces. This is a good start, but not nearly enough. You should also be looking at user experience, automated discovery, real-time topology mapping, and Causation-Based AI.
Now is the time to focus on user experience; more than ever, your end users and customers are looking for quality experiences. You need to know what they are doing and when they are doing it. Using machine learning to analyze metrics for trends a week later is too late in an economy where things are changing by the hour.
As your workforce is pushed to work remotely, and you (or your partners) consider shifting workloads to the cloud, things are going to change and break. You don’t want to waste resources tracking these changes down manually when you can automatically discover them and the impact they are having.
You also don’t have time to catalog your infrastructure by hand as it changes. One of the key things you need to keep up with this change, and to understand it, is real-time topology mapping. Your workloads are going to change, and you don’t have the time or resources to analyze things after the fact; you need to stay on top of them in real time so that you can adapt as needed. Your topology is not going to get simpler. It will become more complex, and with complexity comes more risk that your infrastructure will be affected by relationships you were not aware of or had not considered.
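To illustrate why these relationships matter, the sketch below models a topology as a dependency graph and walks it backwards, breadth-first, to find every service that directly or transitively depends on a failed component. The service names and edges are hypothetical, and a real topology map would be discovered and updated automatically rather than hard-coded; this only shows the traversal.

```python
from collections import deque

# Hypothetical topology: each service maps to the services it calls.
TOPOLOGY: dict[str, list[str]] = {
    "web-frontend": ["checkout", "catalog"],
    "checkout": ["payments", "inventory-db"],
    "catalog": ["inventory-db"],
    "payments": [],
    "inventory-db": [],
    "reporting": ["inventory-db"],
}


def impacted_by(failed: str, topology: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency up to its callers.
    callers: dict[str, list[str]] = {name: [] for name in topology}
    for service, deps in topology.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    # Breadth-first search over the inverted graph.
    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(impacted_by("inventory-db", TOPOLOGY)))
# ['catalog', 'checkout', 'reporting', 'web-frontend']
```

Note that `reporting` shows up even though nothing on the customer-facing path calls it; that kind of relationship is easy to miss without a topology map.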
Consider using AI to help connect the dots, but remember that classic machine learning takes time you do not have; investigate a Causation-Based AI approach instead. Deterministic (causation-based) AI approaches provide answers faster and are more practical to deploy: Causation-Based AI can be deployed in minutes, whereas traditional machine learning AI projects can take weeks or months to stand up.
3/. Practical things
Here are some practical things you can do. Some of these apply to the very largest organizations; others apply to the single, jack-of-all-trades webmaster.
Get Visibility Now
You can’t manage what you can’t measure. You don’t have time to build a DIY solution from some combination of ELK (Elasticsearch, Logstash, Kibana) or other open-source tools. Within a week, everything we thought we knew changed. You need visibility within minutes, not days or weeks later.
Aggressively Manage Capacity
Alois Reitbauer recently posted this (https://www.linkedin.com/posts/aloisreitbauer_i-am-not-really-in-the-profession-to-do-something-activity-6645947958128893952-_VSo), and, bless his heart, never was a truer word spoken in these times. Consider aggressively managing your capacity before you start laying off employees. Remember that it is your employees, not idle excess capacity, who will help you innovate and transform.
In the post, Alois outlines a simple technique for identifying unused capacity that could be reclaimed to reduce costs; those savings can then go toward retaining the staff who will help you transform.
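The technique from the post is not reproduced here. As a generic, swapped-in illustration, one common approach is to flag hosts whose 95th-percentile CPU utilization stays below a threshold over a representative period, since they are likely candidates for downsizing or consolidation. The 20% threshold, host names, and sample values are hypothetical; the sketch uses Python’s standard `statistics.quantiles`.

```python
from statistics import quantiles


def p95(samples: list[float]) -> float:
    """95th percentile of utilization samples (needs at least two samples)."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    # "inclusive" keeps the result within the observed min/max.
    return quantiles(samples, n=20, method="inclusive")[-1]


def reclaim_candidates(cpu_by_host: dict[str, list[float]], threshold_pct: float = 20.0) -> list[str]:
    """Hosts whose p95 CPU stays below the threshold, sorted by name."""
    return sorted(host for host, samples in cpu_by_host.items() if p95(samples) < threshold_pct)


# Hypothetical CPU utilization samples (percent).
usage = {
    "batch-01": [5, 8, 6, 7, 9, 4],
    "web-01": [30, 60, 45, 80, 70],
    "spiky-01": [2, 3, 2, 95, 3, 2],
}
print(reclaim_candidates(usage))  # ['batch-01']
```

Using the 95th percentile rather than the mean avoids flagging a host like `spiky-01`, which is idle most of the time but needs its headroom when load spikes.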
Manage Your Design
This is often overlooked, but it is equally important for the largest organizations and the smallest. Simple things like page weight (the amount of data transferred per page) affect not only end-user performance but also your bandwidth and server capacity. The most obvious slowdown happens when your network throughput becomes saturated. Most large organizations use CDNs or cloud providers to mitigate this; however, smaller organizations that don’t use these services can be severely affected. The longer a page takes to load, the longer the web server or load balancer is tied up serving those connections, which can exhaust available connections and lead to failed requests.
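A back-of-the-envelope calculation shows how page weight translates into saturation. The bandwidth needed is page weight in bytes × 8 bits × page views per second, so halving page weight doubles the number of page views a link can carry. The sketch assumes decimal units (1 MB = 10^6 bytes, 1 Mbps = 10^6 bits per second) and ignores protocol overhead, caching, and compression; the 3 MB page and 1,000 Mbps uplink are hypothetical.

```python
def required_mbps(page_weight_mb: float, page_views_per_second: float) -> float:
    """Egress bandwidth needed, in megabits per second (1 byte = 8 bits)."""
    return page_weight_mb * 8 * page_views_per_second


def max_page_views_per_second(link_mbps: float, page_weight_mb: float) -> float:
    """Page views per second a link can carry before it saturates."""
    return link_mbps / (page_weight_mb * 8)


# Hypothetical site: 3 MB pages served over a 1,000 Mbps uplink.
print(required_mbps(3.0, 10))                 # 240.0 Mbps for 10 page views/s
print(max_page_views_per_second(1000, 3.0))   # about 41.7 page views/s
print(max_page_views_per_second(1000, 1.5))   # about 83.3 page views/s after halving page weight
```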
Find large or unnecessary objects (images, large text files, and so on) and either remove them or reduce their size. This won’t solve every capacity issue, but it can help.
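One way to start is a quick scan of the files your web server serves. The sketch below walks a directory, lists files at or above a size threshold (largest first), and, for text-based files, estimates the gzip-compressed size using Python’s standard `gzip` module. The 500,000-byte threshold and the set of text extensions are assumptions; the compressed figure only approximates what HTTP compression would save, and images need a different treatment (resizing or re-encoding).

```python
import gzip
from pathlib import Path
from typing import Optional

# Assumed set of text-like extensions that usually compress well.
TEXT_SUFFIXES = {".html", ".css", ".js", ".json", ".svg", ".txt"}


def large_objects(web_root: str, min_bytes: int = 500_000) -> list[tuple[str, int, Optional[int]]]:
    """List (path, size, gzip_size) for files at or above min_bytes, largest first.

    gzip_size is None for non-text files such as images, because gzip
    rarely helps formats that are already compressed.
    """
    results = []
    for path in Path(web_root).rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size < min_bytes:
            continue
        gz_size = None
        if path.suffix.lower() in TEXT_SUFFIXES:
            # Compress in memory to estimate the transfer size.
            gz_size = len(gzip.compress(path.read_bytes()))
        results.append((str(path), size, gz_size))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

For example, `large_objects("./public")` would report the largest files under a hypothetical `./public` web root.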
Protect Your VPN
Most VPNs were sized for normal usage patterns, and that sizing has now gone out the window. Increase your VPN capacity if you can; if not, try rotating the hours when certain users can access the VPN (where applicable), or consider the cloud. As your employees work from home, consider which workloads (even desktops) can be moved to the cloud. Cloud services offer secure access, and in some cases you can integrate your federated SSO to better manage that access.
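If you rotate access hours, the arithmetic is simple: the number of windows you need is the user count divided by the VPN’s concurrent-connection limit, rounded up. The sketch below assigns users to windows round-robin so that no window exceeds the limit. It assumes every user in a window connects at once (a worst case), and the window labels, limit, and user names are hypothetical; enforcing the schedule would rely on your VPN’s own access policies, which are not modeled here.

```python
from math import ceil


def rotation_schedule(users: list[str], concurrent_limit: int, windows: list[str]) -> dict[str, list[str]]:
    """Spread users across access windows so no window exceeds the VPN limit."""
    if concurrent_limit <= 0:
        raise ValueError("concurrent_limit must be positive")
    needed = ceil(len(users) / concurrent_limit)
    if needed > len(windows):
        raise ValueError(f"need {needed} windows, only {len(windows)} available")
    # Round-robin over only the windows we need keeps every group at or under the limit.
    schedule: dict[str, list[str]] = {w: [] for w in windows[:needed]}
    for i, user in enumerate(users):
        schedule[windows[i % needed]].append(user)
    return schedule


# Hypothetical: 250 users, a 100-connection VPN, four possible windows.
staff = [f"user{i}" for i in range(250)]
slots = ["08:00-11:00", "11:00-14:00", "14:00-17:00", "17:00-20:00"]
for window, members in rotation_schedule(staff, 100, slots).items():
    print(window, len(members))
```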
Now is the time to consider whether there are web UI options that can replace the traditional clients running on your desktops.
Decades-old procurement practices will only hinder your organization’s ability to carry out this transformation. Do you have the time and resources to go through your traditional tender process? Wouldn’t those resources be better focused on developing and executing transformation strategies?
When things are changing by the hour or by the day, taking months to procure new services or infrastructure cannot keep pace with the challenge you are facing. If you are a CIO, consider increasing (not decreasing) discretionary spending limits for your senior managers.
What are some other things?
If you can think of other options and opportunities, feel free to leave those tips in the comments section below.
We will get through this
We will, of course, get through this; however, things are going to change, and if anything, this will spur a period of transformation. Organizations that can transform will do well; those that don’t are going to struggle.
The team here at Dynatrace is ready to have these discussions with you. If you need to get started now (which you likely do), sign up for a free trial of Dynatrace (/signup/) and start gaining the visibility you need to make better decisions faster.