What is DevOps, release management, and how do you do it?
The Art of DevOps: Embark on a mission to continuously deliver assets to the operational battlegrounds safely, securely, and quickly
An introduction to the landscape
This eBook is inspired by the famous 6th century BC Chinese manuscript: The Art of War. Today, we will embark on a mission to continuously deliver assets to the operational battlegrounds safely, securely, and quickly.
Our strategy requires optimal monitoring of the frontlines, maximum situational awareness, and enhanced communication with the troops supporting asset deployment—from development and testing to operations.
While I know most of you are battle veterans and have fought many application development wars, I have conducted some reconnaissance over the years and have recommendations that will put you at a strategic advantage to win today’s war.
- Islands of development—This is the asset development zone. Engineers receive orders to research, build, fix, or change existing units, all of which must be thoroughly tested. In this stage, Shift Left with continuous testing using automation and virtualization to eliminate long test cycles and increase quality. Adding advanced performance monitoring technology to your arsenal can prevent problems from infiltrating your code after check-in.
- Staging grounds—In this phase, you will increase testing to better prepare for battle readiness. To ensure a successful frontline deployment, supplement manual tests with automated testing and arm your environment with advanced performance monitoring.
- Operational battlegrounds—These are the frontlines of production. Here is where deployments are put to the ultimate test: be prepared to quickly identify and prevent casualties resulting from poor performance. Best-in-class communications and advanced performance monitoring are critical here.
You are fighting to deliver the absolute best services and features to your customers quickly, without requiring the war room scenario we all know too well. Continuously delivering the highest quality assets to the front lines—both quickly and reliably—is our mission. We WILL win this war!
Islands of development
Communicating orders via a world-class feedback and delivery loop
The elite DevOps team performs the critical preparation drills needed to research, build, and assemble high-quality assets on the islands of development and deploy them to the staging grounds. The drills provide each development troop member with clear and concise orders.
The critical preparation drills are:
- Document all KPIs and SLAs.
- Prioritize troop orders to maximize the value of each deployment sprint to the operational battlegrounds.
- Optimize communications (between business commanders, officers on the islands of development, staging grounds, and the operational battlegrounds) with a world-class feedback and delivery loop.
For our mission to be successful, it’s imperative for troops to use the same tools, aligned by their functional purpose, across the end-to-end application delivery pipeline. The orders must be tracked in a system that prioritizes and provides full traceability. This enables officers to quantify areas of weakness early, identify accountability, and target training and/or rework.
Good tracking and release management systems provide line of sight to weaker troops and assets if casualties are sustained in battle. Jira Agile, TestTrack, and Caliber are a few examples. With release management and tracking, each developer has more incentive to maintain high-quality assets. Orders can then be tracked with confidence throughout each delivery sprint.
Our development arsenal: Advanced performance monitoring
Now it’s time to look at ways we can mitigate deployment risks. Anything we can do to avoid deploying assets plagued with issues will ultimately save time and money and reduce fatalities on the battlefield.
The earlier we find issues in the application delivery pipeline, the better. One way to reduce risk is to enhance our developers’ IDEs, such as Eclipse, Visual Studio, or IntelliJ, with advanced performance monitoring.
Advanced performance technologies, like Dynatrace Application Performance Monitoring, provide:
- Visual representations of runtime transaction flows
- Sequence diagrams
- KPI degradations (e.g. response times, method use, execution times, database query timings, exceptions, error counts, logging, and number of remote calls)
- Full comparison analysis between prior local builds or operational builds
- Comparison analysis between specific transactions being repaired vs. a fatal transaction collected during an operational incident
With advanced performance technology, we’ll pinpoint issues faster and reduce the need for numerous break points and line-by-line debugging. Developers can conduct end-to-end post-mortem analysis on every local run because call stacks are recorded each time code is run. Not only does this prevent a risky check-in, but designers (i.e. solution engineers and architects) can perform visual architecture validations instead of having to do full code reviews.
Our unit testing arsenal: Advanced performance monitoring
At the next stage on islands of development—asset check-in and build on the CI server—there is another opportunity to use advanced performance monitoring technology.
Completion of the following steps will fully prepare our assets for the upcoming staging grounds. They will provide maximum situational awareness on the condition of the build and on how it will be impacted by future configuration changes.
CI server build
CI servers, such as Jenkins or Bamboo, allow the development force to test and piece together the outbound assets necessary for deployment. During the CI build step, a series of unit testing scripts is executed to exercise as many critical methods as possible. Code coverage tools such as Clover, SonarQube, Team Test, or dotCover report which code the tests exercise, helping to determine the critical areas that still need unit tests.
Advanced performance monitoring
Once unit test automation scripts are executed, the rich intelligence collected by advanced performance monitoring can be put to use. Performance technologies raise alerts or perform automated tasks such as issuing change requests back to the development troops when KPIs or SLAs are violated. Build-scripting tools like Maven, Ant, or MSBuild align and label performance data for each build, enabling actions to be auto-orchestrated.
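A minimal sketch of such a KPI check, assuming the monitoring tool can export per-build KPI values as a simple mapping. The metric names, the limits, and the `find_violations` helper are hypothetical and do not represent any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Upper limit a KPI may reach before the build is flagged."""
    kpi: str
    limit: float


def find_violations(metrics: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return a readable message for every KPI that is above its limit."""
    violations = []
    for t in thresholds:
        value = metrics.get(t.kpi)
        if value is None:
            # A missing KPI is reported so that gaps are not silently ignored.
            violations.append(f"{t.kpi}: no data collected")
        elif value > t.limit:
            violations.append(f"{t.kpi}: {value} exceeds limit {t.limit}")
    return violations


# Hypothetical values collected while the unit tests ran on the CI server.
build_metrics = {"response_time_ms": 420.0, "db_query_count": 35, "exception_count": 0}
limits = [
    Threshold("response_time_ms", 300.0),
    Threshold("db_query_count", 50),
    Threshold("exception_count", 0),
]

for message in find_violations(build_metrics, limits):
    print("VIOLATION:", message)
```

In a real pipeline the build script would fail the build or raise a change request whenever the returned list is not empty.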
Infrastructure configuration and deployment
Finally, it’s time to automate the infrastructure configurations and deployments. Common tools like Chef, Puppet, and Ansible coordinate infrastructure changes across one or more machines and verify that all of the proper performance monitoring settings are configured on each targeted machine.
Determining battle readiness
Troops should now have deployed the build from the islands of development to the staging grounds.
We have outfitted our environments with advanced performance monitoring technology and prepared for testing to evaluate the battle readiness of the build.
In the staging grounds, each build must be exhaustively tested to uncover the issues—with performance, security, disaster recovery, UI, processes, and infrastructure—that can infiltrate a successful operation.
Risk analysis performed early in the islands of development (during the requirements gathering phase of each sprint) drives the testing approach.
Risks must be assessed based upon how each addition or change in the sprint will affect the overall operation. The risk level for each change will determine:
- How comprehensive testing should be
- The build priority of the automated test cases
- The data required to identify additional risks
- The effect of configuration changes
- The number of possibilities to test (e.g. browsers or thick client, operating systems, geographical regions, and native mobile applications)
- Business continuity and disaster recovery scenarios
- Login users, roles, and rights
- Workflow paths
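One way to make the risk assessment above concrete is to score each change and let the score drive test depth and ordering. The sketch below is only an illustration: the likelihood-times-impact score, the cut-off values 15 and 6, and the example changes are all assumptions, not part of the original method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str
    likelihood: int  # 1 (unlikely to fail) to 5 (very likely to fail)
    impact: int      # 1 (minor) to 5 (severe for the overall operation)

    @property
    def risk(self) -> int:
        return self.likelihood * self.impact


def depth_for(change: Change) -> str:
    """Map a risk score to an illustrative level of test coverage."""
    if change.risk >= 15:
        return "full regression, performance, and disaster recovery tests"
    if change.risk >= 6:
        return "targeted regression and performance tests"
    return "smoke tests"


def plan_tests(changes: list[Change]) -> list[tuple[int, str, str]]:
    """Order changes from highest to lowest risk, each with its test depth."""
    ranked = sorted(changes, key=lambda c: c.risk, reverse=True)
    return [(c.risk, c.name, depth_for(c)) for c in ranked]


# Hypothetical changes in one sprint.
sprint_changes = [
    Change("update login roles", likelihood=2, impact=5),
    Change("rename report column", likelihood=1, impact=1),
    Change("new payment workflow", likelihood=4, impact=5),
]

for risk, name, depth in plan_tests(sprint_changes):
    print(f"{risk:>2}  {name}: {depth}")
```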
Automated testing as our supplemental guard
It can be challenging, time consuming, and expensive to construct automated tests. While they bolster our defenses, they also introduce new complexity because they need to be maintained and regularly updated.
We’ll use testing automation as a supplemental guard to our manual testing approach. Test automation reduces issue identification time, and enables troops to focus manual test cases on specific scenarios.
Automated tests require a solid feedback loop between operations and development. As a best practice, we’ll track and manage test cases in a test case manager like Zephyr, Silk Central, or TestRail that integrates with our requirements tracking tool. This will maintain traceability by linking test cases with requirements throughout the development lifecycle.
- Ensure staging grounds infrastructure imitates that of the operational battlegrounds so test results will accurately assess the operations environment.
- Establish new virtual machines using baseline images (when deemed useful).
- Apply version-controlled configuration templates with the same configuration management tools used in the islands of development.
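To check that the staging grounds really imitate the operational battlegrounds, the two configurations can be compared setting by setting. This is a minimal sketch that assumes each environment's configuration is available as a flat key-value mapping; the setting names and values are hypothetical.

```python
def config_drift(staging: dict[str, str], production: dict[str, str]) -> dict[str, tuple]:
    """Return every setting whose value differs, as key -> (staging, production)."""
    drift = {}
    for key in sorted(staging.keys() | production.keys()):
        s, p = staging.get(key), production.get(key)
        if s != p:
            drift[key] = (s, p)  # None means the setting is missing on that side
    return drift


# Hypothetical settings rendered from the version-controlled templates.
staging = {"os_image": "baseline-1", "jvm_heap": "4g", "monitoring_agent": "enabled"}
production = {"os_image": "baseline-1", "jvm_heap": "8g", "monitoring_agent": "enabled", "tls": "1.2"}

for key, (s, p) in config_drift(staging, production).items():
    print(f"{key}: staging={s!r} production={p!r}")
```

An empty result means test results from staging are more likely to reflect what will happen in operations.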
Our testing arsenal: Advanced performance monitoring
After we have established our testing strategy, including the use of automated tests, we are ready to select our testing weapons.
Advanced performance monitoring weapons are also a critical component of a testing arsenal in the staging grounds, lowering MTTR (mean time to resolution) and providing a major strategic advantage.
Performance monitoring services offer comprehensive testing via a global network of machines. These machines create a simulation emulating a vast number of end users, with a diverse set of desktop configurations and mobile devices located around the world. Traffic patterns can be increased to simulate high volumes of concurrent users, showing troops the real-world impact on the build. This automated testing technique, in combination with highly focused manual tests, provides the optimal approach to issue identification.
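The idea of stepping up concurrent traffic can be illustrated locally with a small thread pool. This is only a sketch of the principle, not a replacement for a global testing network: `fake_request` is a hypothetical stand-in that sleeps instead of calling a real application, and the user counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fake_request(user_id: int) -> float:
    """Stand-in for one user transaction; returns its duration in seconds.

    A real test would call the application under test here instead of sleeping.
    """
    start = time.perf_counter()
    time.sleep(0.01)  # simulated server work
    return time.perf_counter() - start


def run_load(concurrent_users: int) -> list[float]:
    """Run one simulated transaction per user, all in parallel."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(fake_request, range(concurrent_users)))


# Step the traffic up to see how response times behave under load.
for users in (5, 20, 50):
    timings = run_load(users)
    print(f"{users:>3} users: mean {mean(timings) * 1000:.1f} ms, max {max(timings) * 1000:.1f} ms")
```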
Intelligence provided by advanced performance monitoring weapons:
- End-to-end call stacks associated with the failed test cases
- Monitored processes and machine states at the time of the issue
- The problem tier, host, process, component, and class, with method-level or event-level accuracy
- Build-to-build comparisons and deltas at either a macro or very granular level
- Positive or negative fluctuations in key performance indicators at the test case level, e.g. UpdateAccount
- Degradation or improvement analysis against prior builds per test case
- Accumulated baselines per test case
- Response times and duration (including background asynchronous timings)
- Overall database counts (including individual SELECT, UPDATE, DELETE counts)
- Error counts
- Exception counts
- CPU timings
- Significant baseline deviations
- Selected method execution timings
- SLA violations
- Remote call counts
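Several items in the list above come down to comparing the same KPIs for the same test case across two builds. A minimal sketch of that comparison follows, assuming KPI values are available per test case and that lower values are better. The build data, the 10% tolerance, and the `compare_builds` helper are hypothetical.

```python
def compare_builds(previous, current, tolerance=0.10):
    """Return per-test-case KPI changes larger than `tolerance` (a fraction).

    Both arguments map test case -> {kpi: value}; lower values are assumed better.
    """
    findings = []
    for test_case, kpis in current.items():
        baseline = previous.get(test_case, {})
        for kpi, value in kpis.items():
            old = baseline.get(kpi)
            if old is None:
                continue  # no baseline yet, nothing to compare against
            if old == 0:
                # A relative change is undefined; any increase from zero is flagged.
                if value > 0:
                    findings.append((test_case, kpi, "degraded", None))
                continue
            change = (value - old) / old
            if abs(change) > tolerance:
                status = "degraded" if change > 0 else "improved"
                findings.append((test_case, kpi, status, round(change * 100, 1)))
    return findings


# Hypothetical KPI values for the UpdateAccount test case in two builds.
build_41 = {"UpdateAccount": {"response_ms": 200, "select_count": 4, "exceptions": 0}}
build_42 = {"UpdateAccount": {"response_ms": 260, "select_count": 4, "exceptions": 0}}

for finding in compare_builds(build_41, build_42):
    print(finding)
```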
Building lines of defense
The final step in the staging grounds is building our lines of defense. The lines of defense will cut the time it takes to promote changes into the operational battlegrounds and reduce the need for war rooms.
First line of defense: automatic alerts
Our first line of defense is automatic alerting, which includes:
- Signaling dangers in the build
- Raising change requests
- Immediately pointing out where problems are
- Aborting build promotions
- Automatically adding change requests into the team’s work queue
- And much more
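The first line of defense could be sketched as a promotion gate: if any problem is detected, the promotion is aborted and a change request lands in the team's work queue. The `WorkQueue` class below is an in-memory stand-in for a real tracking system, and all names and the build identifier are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    build_id: str
    summary: str


@dataclass
class WorkQueue:
    """In-memory stand-in for the team's tracking system."""
    items: list = field(default_factory=list)

    def add(self, request: ChangeRequest) -> None:
        self.items.append(request)


def gate_promotion(build_id: str, problems: list[str], queue: WorkQueue) -> bool:
    """Return True if the build may be promoted; otherwise file change requests."""
    for problem in problems:
        queue.add(ChangeRequest(build_id, problem))
    return not problems


queue = WorkQueue()
promoted = gate_promotion("build-42", ["UpdateAccount response time above SLA"], queue)
print("promote" if promoted else "abort promotion", "-", len(queue.items), "change request(s) filed")
```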
Second line of defense: dashboard command center
A second line of defense is to construct an accurate picture of the build’s operations and make it available to every level of command. We will do this by creating consistent dashboards that offer visibility into the environment. Elite DevOps troops from the islands of development through to the operational battlegrounds should use the shared dashboards to view, communicate, and analyze the same metrics.
Third line of defense: drill-down intelligence
The third line of defense is the ability to isolate an issue down to a specific test case, metric, or even an end-to-end enterprise call stack. This intelligence must be packaged and sent directly to the development troops responsible for the analysis or the change. This is what separates a modern battle strategy from an antiquated one. Implementing all three lines of defense will reduce MTTR and increase our troops’ confidence in moving the build successfully from the staging grounds to the operational battlegrounds.
Maximizing situational awareness
Everything up to this point has been preparation for the operational battlegrounds. This is where the world of users converges into our environment and often pushes assets to unexpected extremes.
Winning the war means happy customers and business commanders. We must have a comprehensive performance monitoring strategy for the operational battlegrounds with laser-accurate detection and rapid resolution techniques. The same performance monitoring tools and metrics we used in development and testing (tools like Dynatrace) should be deployed here.
Our virtual command center—the performance dashboard—maximizes situational awareness by collecting and managing intelligence on:
- All vital infrastructure and software changes in the pipeline
- Operational goals and metrics
- Expected business events through synthetic tests
- Real-time situations
Standard dashboard views must be designed around user scenarios before we start asset deployment to the battlegrounds, but ad hoc capabilities are also a necessity. These capabilities allow our troops to react quickly and gather intelligence in a changing environment.
Strategic intelligence to gather
Active user and user experience metrics
- Active sessions
- Abandon rates
- Number of logins
- User or tenant frustration levels
- User experience vs. competitors’ user experience
Hardware and infrastructure health
- Active infrastructure state
- Process health
- Disaster recovery state
- Errors and alerts
Application and transactional activity
- Baselines and deviations
- Errors and alerts
- Synthetic traffic reports and status
Delivery pipeline statuses
- Recent changes
- Upcoming changes and fix list
Business related metrics and health
- Conversion rates if applicable
- Business and high traffic events
Risks and issues board
Security details and risks
Scheduled outages and dependency board
- 3rd party dependency health
- Enterprise service statuses
- External systems and web services
Resource links board
- Contact numbers
- Project plans and documents
- Up-to-date design documents
- Procedure documents
Winning the war
"You can win the battle but lose the war." Operations can provide crucial intelligence on prioritizing the battles.
For example: what if Sergeant Joe has been diligently working on two query issues over the last few days, but as it turns out the problem is affecting less than a quarter of a percent of the users?
When the entire engineering team works closely together informed by strategic business goals and metrics, we can prioritize sprints based upon business value, and the value of each deployed asset will be greatly increased.
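A simple way to apply this is to rank open issues by the share of users they affect before committing sprint time to them. The sketch below assumes operations can supply an affected-user count per issue; the issue names and numbers are hypothetical.

```python
def prioritize(issues: dict[str, int], total_users: int) -> list[tuple[str, float]]:
    """Sort issues by the percentage of users affected, largest first."""
    ranked = [(name, affected / total_users * 100) for name, affected in issues.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical counts of affected users reported by production monitoring.
open_issues = {"slow report query": 20, "slow search query": 15, "checkout timeout": 1200}

for name, share in prioritize(open_issues, total_users=10_000):
    print(f"{share:5.2f}%  {name}")
```

Here both query issues affect well under a quarter of a percent of users, so the checkout problem would be the better battle to fight first.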
Communication and issue resolution
Communication is the absolute key to winning the war. It’s one thing to simply have data visibility and quite another to get actionable information that can be shared with the entire engineering team. With common tools and metrics, troops can identify an issue on a dashboard and drill down for the root-cause analysis, and the team member responsible can conduct problem diagnosis and resolution.
Common communication procedures can now be established when handling each class of issues, for example:
- Server is down. Identify server name and data center, then contact Senior Network Engineer—Enterprise Servers.
- Application A is experiencing high response times. Drill down to determine specifics, for example that the slowdown occurs only when requests hit the fourth node in the application server cluster. Contact Senior Network Engineer—Linux Servers.
- Credit card calls are failing some of the time. Verify the failures, find the recorded transactional call stacks that fail, and package them up for analysis and rework by Senior Software Engineer—Enterprise Service Bus.
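The procedures above amount to a routing table from issue class to responsible role. A minimal sketch, assuming issues are already classified: the class keys, the fallback contact, and the `route` helper are hypothetical, while the role names follow the examples above.

```python
# Issue class -> responsible role, taken from the example procedures.
ROUTES = {
    "server_down": "Senior Network Engineer, Enterprise Servers",
    "high_response_time": "Senior Network Engineer, Linux Servers",
    "failed_transaction": "Senior Software Engineer, Enterprise Service Bus",
}


def route(issue_class: str, details: dict) -> str:
    """Build the notification for an issue, including its drill-down details."""
    owner = ROUTES.get(issue_class, "on-call engineer")  # hypothetical fallback contact
    context = ", ".join(f"{key}={value}" for key, value in details.items())
    return f"Contact {owner} ({context})"


print(route("high_response_time", {"application": "A", "cluster_node": 4}))
```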
Intelligence for future missions
The Art of DevOps is a continuous delivery approach that accelerates time to market by integrating performance monitoring into the islands of development and staging grounds to eliminate the enemy of poor performance on the operational battlegrounds. It’s been fun sharing some of the insights I’ve collected over the years. I hope you’ve gained valuable intel during this series to aid you in fighting your battles more effectively and embracing diplomatic solutions between development and operations.
I am leaving you with some additional intel to review for your future missions. So, until next time, stay strong, push onwards and upwards.
- Continuous Delivery by Jez Humble and David Farley
- Release It! by Michael T. Nygard
- The Other Side of Innovation by Vijay Govindarajan and Chris Trimble
- The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford
- The Speed of Trust by Stephen M. R. Covey
DevOps tools we love
Here is a collection of tools that foster collaboration amongst Product Management, Development, IT Operations, and Technical Support teams to allow you to build more quality into your products, and support you in establishing better feedback loops.
- Configuration Management—Ansible, Chef, and Puppet
- Test Automation—LoadRunner and Selenium
- Change Controls—JIRA
- Source Control—Git/GitHub
- Build Automation—Ant and Maven
- Virtual Machines—Vagrant, Packer, VeeWee, Docker, and Cloud Foundry
About the Author
Global DevOps Practice Lead, Dynatrace
Brett has twenty-five years of experience, in roles ranging from lead developer, product designer, solution architect, and application manager to senior management, which has given him a unique 360° perspective and earned him the respect of customers and peers alike. Specializing in team dynamics and in delivering complex, mission-critical projects under methodologies such as Agile, Lean, and Waterfall, Brett brings deep and broad expertise in the application life cycle, continuous delivery, and DevOps into a practice at Dynatrace.