What is DevOps? A Beginner's Guide
DevOps is a general collection of flexible software creation and delivery practices that look to close the gap between software development and IT operations.
More and more organizations have adopted DevOps practices to streamline software development, increase developer productivity, and enhance continuous delivery workflows to deliver better software faster.
As DevOps pioneer Patrick Debois noted in 2009, tactics, not just technology solutions, define a successful approach to DevOps that can fundamentally transform IT. But while this tactical focus offers increased flexibility for teams, it can quickly lead to data and communication silos across the organization, which can impair software quality and speed of delivery. Without help, it can be extremely difficult to gain strategic insight into how development teams perform their day-to-day tasks, how to automate DevOps pipelines, and how to architect software for reliability and resiliency in modern cloud-native environments.
That help comes in the form of artificial intelligence (AI). AIOps, the discipline of applying AI and advanced analytics to IT operations, has transformed how organizations manage complex systems. Using these same principles, organizations can take a more intelligent approach to DevOps: an AIOps approach that leverages AI throughout the software development life cycle (SDLC). DevOps, together with complementary technologies and tactics, such as site reliability engineering (SRE), has the potential to transform the business.
To better understand the transformative power of DevOps, we'll explore the basics of DevOps and the growing role of SRE; delve into key DevOps benefits and challenges; discuss DevOps best practices and key DevOps metrics; and examine how AI and automation at every stage of the DevOps lifecycle can transform the way organizations develop and deliver better software faster.
Getting to know DevOps and SRE
What is DevOps?
DevOps is a flexible framework of software development practices organizations use to create and deliver software by aligning and coordinating software development efforts — "Dev" — with IT operations — "Ops."
Development meets IT Ops
The easiest way to conceptualize DevOps is as a continuous loop. Instead of being discrete processes, development and operations become part of an ongoing cycle that includes planning, coding, building, testing, releasing, deploying, operating, and monitoring applications and services.
This continuous workflow approach enables teams to immediately identify and address issues related to both form and function earlier in the process to avoid problems before software is released into production.
The recent development of cloud-native technologies, open-source solutions, and flexible APIs has further enhanced DevOps efficiency. With its roots in Agile development, DevOps is ideally suited to help teams keep pace with accelerating development and release models, such as continuous integration and continuous delivery (CI/CD).
What is SRE? Software resiliency built in
SRE is a software operations practice that manages the details and big-picture concerns of software resiliency to ensure software systems’ availability, latency, performance, and capacity. Site reliability engineers understand the needs of software systems and set up processes and structures to meet those needs.
Google VP of Engineering Ben Sloss coined the term in 2003 when he and his team began to apply software engineering principles to software operations to create more reliable and scalable software systems. Implementing SRE can help organizations reduce friction between development and operations components of DevOps teams — streamlining efficiency and reducing error rates.
SRE complements DevOps practices by offering increased automation to reduce reliance on manual tasks. These practices help users solve their own problems and deliver reliability-by-design earlier in the development process.
Ultimately, SRE helps organizations achieve their operational goals, such as reduced downtime or faster resolutions, by defining and automating service level objectives (SLOs).
How do SRE and DevOps interact?
SRE and DevOps are essentially two sides of the same coin. While DevOps frameworks focus on whole-lifecycle collaboration and breaking down silos, robust SRE helps implement and automate DevOps practices using SLOs and ensures those systems—and the software they produce—are resilient.
According to Andi Grabner, DevOps Activist at Dynatrace, "DevOps and SRE are a balance between speed and safety." While DevOps helps organizations move from left to right along the development and operations lifecycle to boost overall speed, SRE moves right to left to help reduce failure rates earlier in the development cycle.
While it's possible to have DevOps without SRE, these two processes work best as a pair, effectively creating a continuous cycle that delivers ongoing improvements across both directions of the CI/CD pipeline. By combining the two practices, companies can seek to increase automation, speed up delivery, improve software quality, and much more.
With this background information in hand, let’s now dive into the fundamental components of the DevOps mindset to get a better idea of where your team will need to concentrate its efforts to get the most out of DevOps practices.
Fundamentals of DevOps
DevOps is a cultural shift that requires vision, planning, executive buy-in, and tight collaboration to successfully establish a more integrated way of developing and delivering applications. By embracing a few fundamental practices, teams can improve their efficiency and develop a deeper understanding of their workflows, toolsets, and processes so they can release better software more quickly.
Because DevOps is a continuum, these practices should also be continuous and ongoing. This chapter covers the basic tenets or practices that form the fundamentals of adopting a DevOps approach.
Continuous integration (CI) is a software development practice in which developers regularly commit their code to a shared repository. Because microservices architecture is distributed, continuous integration allows developers to own discrete, manageable chunks of code and individual features and work on them in parallel. The distributed nature of these applications allows for frequent updates, often multiple times a day.
However, developers can't just push build updates haphazardly. CI is tightly controlled; new commits trigger the creation of fresh test builds via the build management system. Redundant code is rejected, and breaking changes are minimized once master branches are altered. Incremental changes are encouraged. Additionally, reduced reconciliation prevents mandatory code freezes that commonly stem from conflicts.
Overall, continuous integration enables teams to build and test software faster and more efficiently. By regularly merging code, teams also always have an up-to-date build that speeds up testing and bug fixing, boosts merge confidence, and helps to shorten the development pipeline.
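The gating behavior described above, where every new commit triggers a fresh build that must pass before the change is accepted, can be sketched in a few lines of Python. This is a minimal illustration rather than a real build system: the `Stage` class, the `run_pipeline` function, and the stage names are hypothetical, and the lambdas stand in for real build and test commands.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One step of a hypothetical CI run, such as a build or a test suite."""
    name: str
    check: Callable[[], bool]  # returns True when the stage succeeds


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns (passed, names_of_stages_that_ran).
    """
    ran: List[str] = []
    for stage in stages:
        ran.append(stage.name)
        if not stage.check():
            return False, ran  # later stages never run for a broken commit
    return True, ran


# Illustrative stand-ins for real build and test commands.
good_commit = [
    Stage("build", lambda: True),
    Stage("unit-tests", lambda: True),
    Stage("integration-tests", lambda: True),
]
broken_commit = [
    Stage("build", lambda: True),
    Stage("unit-tests", lambda: False),
    Stage("integration-tests", lambda: True),
]

print(run_pipeline(good_commit))    # all three stages run
print(run_pipeline(broken_commit))  # stops after the failing unit tests
```

Stopping at the first failing stage keeps feedback fast and means a broken commit never reaches the slower, later stages.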
While CI focuses on regular, independent code updates to a central repository, continuous delivery (CD) focuses on releasing completed code blocks to a repository at regular intervals. These blocks of code should always be in a deployable state for testing or release to production.
Continuous delivery is often confused with continuous deployment, the next process in line, which releases finalized code into production. Deployment is the act of making new and updated software available to end users. Accordingly, "CD" primarily denotes "continuous delivery," or both "continuous delivery and deployment," but rarely just continuous deployment.
CD takes code and adds it to a repository, such as GitHub or, in the case of a microservices-based environment, a container registry. The end goal is to increase release consistency by perpetually keeping code in a deployable state. Software development becomes more nimble and more predictable as a result.
Continuous testing and validation
Continuous testing in DevOps is important at every stage of the SDLC. It involves many stakeholders, including the development team, quality assurance, and operational staff. The goal of continuous testing is to evaluate the quality of software as it progresses through each stage of the delivery lifecycle. This not only stops bad code in its tracks but also provides fast and continuous feedback to development teams, giving them the information they need to address any quality concerns.
Whilst continuous testing is important to scale, manual validation of test results derails the software delivery process. This is where continuous validation comes in: it automates the evaluation of test results against your pre-defined service level objectives. Continuous validation complements continuous testing by eliminating manual analysis, whether that's comparing data on dashboards or checking off boxes on a spreadsheet. Instead, DevOps teams can set up techniques like quality gates that automatically enforce predefined quality criteria and prevent bad code from progressing to the next stage.
Continuous monitoring and observability
Though organizations strive for airtight CI/CD processes, there are often opportunities for improvement. Monitoring and observability are key to understanding the viability of code as it progresses through the pipeline. While detecting issues and vulnerabilities is always important, the sheer amount of observability data that modern multicloud apps create means that there's simply no way to manually track everything that's occurring across the software stack.
Traditionally, DevOps monitoring was closely associated with “ops” teams, but it has since expanded across the full software development lifecycle (SDLC) as key stakeholders increasingly require answers to solve their challenges faster. These answers are possible when you have a system that is continually monitoring and analyzing observability data. Continual data capture can be empowering when leveraged intelligently, and this is where the introduction of observability is critical to DevOps.
Observability is more than just collecting metrics and arranging them in dashboards. Having an AI engine that’s working 24/7, 365 days a year to analyze data and provide answers to anomalies and problems helps teams remediate issues faster and make better release decisions. This drives better code quality and better application performance, which translate to better end-user experiences. As software complexity increases, it is becoming harder for DevOps teams to deliver new features and releases faster without sacrificing quality. Therefore, empowering your teams with continuous observability and an AI engine to analyze all the data and provide answers is critical for success.
Another fundamental DevOps practice is continuous security based on testing, monitoring, authorization, and inventory tracking. This is the evolution towards DevSecOps. Simply put, continuous security is the process of making security part of the CI/CD process, covering the full SDLC, by adding an extra layer over the DevOps process and pipelines to ensure your infrastructure and applications don’t have vulnerabilities and risks associated with them. Today, as environments become increasingly complex, a “bolt-on” approach to security is neither scalable nor sustainable, so it is imperative to bake security into your automated processes and enable continual testing across your development lifecycle.
Similar to the shift-left mentality with regard to quality testing, security measures should be baked into planning and creation from day one, and they should occur constantly throughout the development lifecycle, including when software is running in production. Further, any continuous security measures implemented should be automated as appropriate, so as not to hamper efficiency. In terms of culture, security personnel must be regarded as full partners in the DevOps process, on par with developers and operations specialists – hence the shift towards DevSecOps.
Busting down silos is paramount to ensuring good communication and unification across the DevOps pipeline. Effective DevOps execution means establishing a single source of truth — aggregating data from many sources into one collective location. Testers, engineers, QA, and even non-technical stakeholders can gain valuable insights from these bits of information. Under this paradigm, each will contribute in their own way to the creation of software that drives high-level business outcomes.
Seamless cooperation between developers and ops teams is especially critical. DevOps processes coexist in a continuous cycle known as a feedback loop. Different portions of a project are completed and reviewed by stakeholders, and feedback is returned from those steps. Code must be written, tested, validated, delivered, built, and ultimately deployed for end users.
Cross-team collaboration benefits from accelerating this cycle. Accordingly, automation has become a key ingredient in shortening the cycle from end to end — chiefly by reducing the friction caused by multiple parties working at the same time.
Taking DevOps to the next level
To keep pace with innovation and the need to deliver services to market faster, organizations need to move beyond the basics and take their DevOps practices to the next level. Evolving these fundamental practices into elite performing DevOps requires some best practices, which we cover next.
DevOps best practices
Many organizations claim to have a fully functioning DevOps process. But DevOps is more than just a workflow and a few tools your organization can implement once and then move on from. It's helpful to think of it as a philosophy, a culture and mindset, that takes continuous optimization, creativity, and flexibility to maintain. An organization may have implemented organizational changes and tools that lay the foundation for a good DevOps process and still be missing some of the benefits DevOps can offer.
With this journey of improvement in mind, let's explore some DevOps best practices that can take your investment in the fundamentals to the next level and ensure you're making the most of your DevOps strategy.
Automation is a cornerstone of every company’s DevOps strategy. In short, automation reduces toil, helps you accelerate your delivery pipelines across the full SDLC, and enables you to scale your DevOps practice.
Traditionally, processes such as testing, monitoring, error discovery, and remediation involved a little automation and a lot of manual intervention. This worked when small teams worked on monolithic applications. But with modern microservices-based applications, and with digital transformation putting even more pressure on IT, automation is crucial to increase velocity and quality by driving consistent processes across every stage of the DevOps lifecycle. As a result, you can push code to production more frequently and produce consistent, reliable, and secure software whilst saving your DevOps team time.
Monitoring and Observability
Monitoring and observability are essential to incorporate across every stage of the software development lifecycle, from pre-production to production. Whilst automating as many processes as possible increases the efficiency of your DevOps workflows, monitoring and observability provide your teams with visibility into those automated processes to detect and pinpoint the root causes of any problems or bottlenecks.
Many tools provide data and dashboards to track the health of individual systems. But to develop an effective observability strategy that yields actionable answers about systems throughout the DevOps toolchain, you need more than just data on dashboards – you need an intelligent approach.
This is an important practice to implement so your team can identify failures or performance problems before any impact is felt by your customers.
Data can be an IT team's best friend, especially when it comes to testing and delivering code and monitoring services more efficiently. However, processing the massive amount of data created by today's applications is beyond the ability of humans alone. This paves the way for an AI engine that can constantly analyze all the observable data down to the code-level detail, and that gives the development team the power to identify issues, get answers, and quickly remediate problems when they happen.
Harnessing AI as part of your DevOps processes enables you to enhance functionality and automation in development, testing, security, delivery, and release cycles, and to monitor the performance of deployed software constantly and far more efficiently than manual effort allows.
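To make the idea of automated analysis concrete, here is a deliberately simple statistical stand-in, not an AI engine: it flags a measurement as anomalous when it deviates sharply from the samples immediately before it. The function name `find_anomalies`, the window size, the threshold, and the response-time figures are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List


def find_anomalies(samples: List[float], window: int = 5,
                   threshold: float = 3.0) -> List[int]:
    """Return the indexes of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations."""
    anomalies: List[int] = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Flat baseline: any change at all counts as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
response_times_ms = [102, 98, 101, 99, 100, 103, 97, 450, 101, 99]
print(find_anomalies(response_times_ms))  # → [7]
```

A sketch like this also shows why simple rules fall short at scale: once the spike enters the baseline window it inflates the standard deviation and can mask later problems, which is one reason more sophisticated analysis is attractive.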
SREs live and breathe service level objectives (SLOs). Ensuring production service levels are on track requires continuous evaluation of service level indicators (SLIs) against SLOs. But that raises the question: why shouldn’t developers ensure the code they build meets the same production SLOs? This concept of shifting left improves software quality, helps detect issues much earlier in the lifecycle, and prevents code that doesn’t meet production SLOs from progressing to the next stage. The results are fewer SLO violations in production, time and money saved through fewer or no war rooms, and, more importantly, a better chance of meeting every business service level agreement (SLA).
One way to automate this shift-left process is through quality gates, which allow you to automatically compare SLIs from any pipeline tool (such as monitoring and testing) against pre-defined SLOs. If code does not pass the SLO-based quality gate, it cannot progress to the next stage, and the system automatically notifies the development team to remediate the problem.
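An SLO-based quality gate of this kind can be sketched as a comparison of reported SLIs against a list of objectives. The code below is a minimal illustration under stated assumptions: the `Slo` class, the `evaluate_gate` function, the indicator names, and the thresholds are hypothetical and do not come from any particular pipeline tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Slo:
    """A pre-defined objective for one service level indicator (SLI)."""
    sli: str
    threshold: float
    higher_is_better: bool

    def is_met(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def evaluate_gate(slis: Dict[str, float], slos: List[Slo]) -> List[str]:
    """Return the names of violated SLOs; an empty list means the gate passes.

    An SLI that was not reported counts as a violation, so a build cannot
    pass simply because a measurement is missing.
    """
    violations: List[str] = []
    for slo in slos:
        value = slis.get(slo.sli)
        if value is None or not slo.is_met(value):
            violations.append(slo.sli)
    return violations


# Hypothetical objectives and measurements, for illustration only.
SLOS = [
    Slo("availability_pct", 99.9, higher_is_better=True),
    Slo("p95_latency_ms", 300.0, higher_is_better=False),
    Slo("error_rate_pct", 1.0, higher_is_better=False),
]
build_a = {"availability_pct": 99.95, "p95_latency_ms": 240.0, "error_rate_pct": 0.4}
build_b = {"availability_pct": 99.95, "p95_latency_ms": 410.0}

print(evaluate_gate(build_a, SLOS))  # → []
print(evaluate_gate(build_b, SLOS))  # → ['p95_latency_ms', 'error_rate_pct']
```

In a pipeline, a non-empty result would block promotion to the next stage and the list of violated indicators would be sent to the development team.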
Progressive delivery (also referred to as shift-right) focuses on expanding overall CI/CD practices to help deliver applications and services with more control. It allows organizations to precisely manage how and when new features, updates, and fixes are delivered to minimize the potential negative impact to the user base. Some common practices include blue-green deployments, A/B testing, canary deployments, and feature flags:
A blue-green deployment is an application release model that gradually transitions users from the current version of an application or service (the "blue" version) to a slightly different new release (the "green" version) whilst both blue and green are running in production. The change should feel seamless to the user base, and blue can stand by in case an unforeseen problem with green requires a rollback to the earlier, more stable version.
Also known as split testing, A/B testing refers to randomized experimentation processes where two or more versions of some variable (for example, a service, webpage, or page element) are shown to different end users. From there, you can monitor app performance as well as user behavior and satisfaction to determine which option is best for business goals.
Also known as toggles, feature flags are a development practice that allows software and development teams to enable and disable parts of a codebase with a simple switch (or flag). Feature flags help organizations decouple code deployments from feature releases, allowing them to make code changes in production that remain hidden from the users until they are activated. This results in increased deployment speeds, improved system stability, and better cross-team collaboration.
All deployments in production carry risk, even with comprehensive monitoring and testing. One method developers can use to mitigate serious disruption is a canary deployment. The term originates from the time when canaries, which breathe in air much faster than humans, were used to detect toxic gases in coal mines. If the canary died, miners would know to get out before the gas reached them. A canary deployment is a release of software that’s deployed to a small percentage of the whole user base, referred to as the canary. If things run well in your canary, you can then deploy the release to the rest of the user base. If things don’t run well, the impact is much smaller and less disruptive, and you can roll back the software. Canary deployments let you test with actual users, who can provide real feedback, while limiting the impact of any problems, which leads to a better-quality product.
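One common way to implement both a canary release and a percentage-based feature flag is to hash each user into a stable bucket and enable the new code only for buckets below the rollout percentage. The sketch below assumes this hashing approach; the function name `in_canary`, the feature name, and the user IDs are hypothetical.

```python
import hashlib


def in_canary(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Deterministically decide whether a user sees the new release.

    The same user always lands in the same bucket for a given feature, so
    raising rollout_percent only adds users and never removes existing ones.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < rollout_percent


# Start with roughly 5% of users, then widen if the canary stays healthy.
users = [f"user-{i}" for i in range(10_000)]
for percent in (5, 25, 100):
    exposed = sum(in_canary(u, "new-checkout", percent) for u in users)
    print(f"{percent}% rollout exposes {exposed} of {len(users)} users")
```

Setting the percentage to 0 acts as a kill switch, which is how a flag of this kind supports a quick rollback without a new deployment.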
Shift-left and shift-right security
The concepts of shift-left and shift-right also apply to security. First, let’s talk about shift-left.
Ever since DevOps teams started using containers as a way to package applications, and started releasing software at a faster cadence, there has been a desire to automate application security tests and provide test results earlier in the software development lifecycle. By providing test results earlier, software developers can fix security flaws faster and more easily. They don't have to remember a change they made weeks ago that accidentally introduced a security vulnerability, then unravel everything that has been done since.
In addition to providing information earlier, security test results allow automated release decisions to be made earlier. This has been the holy grail for DevSecOps: more automation and less manual work. The result is better, higher-performing, and more secure software, with less work needed from human beings.
How about shift-right security? That’s important too. After several years of “shifting left”, enterprises are realizing they also need to maintain visibility into the production environment. We’ve seen many successful attacks against Kubernetes environments, from the malicious images that were inserted into Docker Hub in 2020 to the attacks against Azure and Tesla by “cryptojackers”. This is why 44% of enterprises say they are planning to adopt new runtime security controls (shift right) over the next 12-24 months.
In a nutshell, here are the reasons why automated security (DevSecOps) can and should shift right into production environments:
Building resiliency with chaos engineering
Chaos engineering is a development discipline that subjects software to failures in a simulated production environment as a way to build resilience into distributed production software systems. This practice builds confidence in software’s ability to withstand unexpected or unlikely circumstances, such as outages, slowdowns, excessive loads, and so on.
Testing the performance of your application under random and extreme circumstances is a helpful exercise to ensure your team delivers durable, reliable, and highly available systems in any given situation. The most realistic way to do this is in production environments, with real users and actual load levels.
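At its smallest scale, the idea behind chaos engineering can be illustrated by deliberately injecting failures into a call and checking that the calling code copes. The following is a local sketch under stated assumptions, not a production chaos tool: `with_faults`, `call_with_retry`, `InjectedFault`, and `fetch_price` are hypothetical names, and the 30% failure rate is arbitrary.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised by the chaos wrapper to simulate a failing dependency."""


def with_faults(func: Callable[[], T], failure_rate: float,
                rng: random.Random) -> Callable[[], T]:
    """Wrap `func` so that roughly `failure_rate` of calls raise InjectedFault."""
    def wrapper() -> T:
        if rng.random() < failure_rate:
            raise InjectedFault("simulated outage")
        return func()
    return wrapper


def call_with_retry(func: Callable[[], T], attempts: int = 3) -> T:
    """Retry a flaky call; re-raise the fault if every attempt fails."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[InjectedFault] = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as error:
            last_error = error
    assert last_error is not None
    raise last_error


def fetch_price() -> int:
    return 42  # stand-in for a call to a downstream service


# Experiment: inject faults into 30% of calls and count how many requests
# still succeed once the caller retries up to three times.
rng = random.Random(7)  # seeded so the experiment is repeatable
flaky = with_faults(fetch_price, failure_rate=0.3, rng=rng)
survived = 0
for _ in range(1000):
    try:
        call_with_retry(flaky)
        survived += 1
    except InjectedFault:
        pass
print(f"{survived} of 1000 requests survived the injected faults")
```

Comparing the survival rate with and without the retry shows whether a resilience mechanism earns its keep, which is the kind of evidence a chaos experiment is meant to produce.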
Adopting a platform approach for your DevOps value stream
There is no shortage of DevOps tools IT teams can use today to execute different parts of the DevOps lifecycle. But as your DevOps approach matures and you look to scale DevOps across multiple applications, toolchain sprawl becomes a reality. What once worked well becomes manual, cumbersome, costly, and reverts to a siloed approach. Imagine having multiple teams trying to use the same tools each for their own applications.
Standardizing on a platform approach that provides automation, intelligence, and observability on top of the regular DevOps processes helps reduce overhead, reduce toil, and improve scale efficiencies. An all-in-one platform approach creates a single source of truth that tears down silos, integrates toolchains, and enables self-service models. This approach helps automate the entire development pipeline and gives developers and operations teams the right tools and data for every stage of the DevOps cycle — from coding to delivery and back again.
Driving futuristic development today
The goal of any business and technology leader is to make the development of these apps and services easier. These emerging best practices include efficient ways to develop critical applications, code, and services. Further, by leveraging these emerging tools, you can deliver a more proactive and prescriptive development architecture capable of meeting today's digital demands.