Introduction to DevOps
Thinking DevOps
General guidance
User stories: Persona; Action; Benefit
Avoid monorepos
Create a new branch for every issue
Work in small batches
A minimum viable product is an experiment
Consider test-driven development
CI/CD requires automated testing
Consider behaviour-driven development
BDD ensures that you're building the right thing; TDD ensures that you're building the thing right
Consider Gherkin syntax
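As a rough illustration of the test-first, behaviour-focused style above, the sketch below writes a scenario in Given/When/Then form as a plain Python test. The `Cart` class and the test name are hypothetical and not from the course; a real Gherkin scenario would live in a feature file rather than in comments.

```python
class Cart:
    """Hypothetical shopping cart, written only to make the test below pass."""

    def __init__(self):
        self._lines = []

    def add(self, name, unit_price, quantity=1):
        # Store each line item as (name, unit price, quantity).
        self._lines.append((name, unit_price, quantity))

    def total(self):
        # Sum unit price times quantity over all line items.
        return sum(price * qty for _, price, qty in self._lines)


def test_customer_sees_total_for_items_in_cart():
    # Given a customer with an empty cart
    cart = Cart()
    # When they add two books priced at 10 each
    cart.add("book", 10, quantity=2)
    # Then the cart total is 20
    assert cart.total() == 20
```

In a TDD workflow the test would be written first and fail, and `Cart` would then be implemented until it passes.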
Cloud-native microservices
"Stateless" microservices actually have state, with each service maintaining its own database
The microservices can be scaled independently
Failing instances are killed and respawned
Designing for failure
Plan to be throttled
Plan to retry (with exponential backoff)
Cache where appropriate
Circuit-breaker pattern
If failure rates pass a threshold, the circuit breaker is tripped from closed to open and further requests to the service are blocked, preventing further strain
After a timeout, the circuit breaker moves to half open, allowing a small number of requests through
It will then transition to either closed or open depending on whether these requests succeed
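The closed, open, and half-open transitions described above can be sketched roughly as follows. This is a minimal, single-threaded illustration: it counts consecutive failures rather than tracking a failure rate, and it lets one trial request through in the half-open state. The class, parameter names, and default values are assumptions, not from the course.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is blocked because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: block the request without touching the service.
                raise CircuitOpenError("request blocked: circuit is open")
            # Timeout elapsed: move to half open and allow a trial request.
            self.state = "half_open"
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial request, or too many failures, (re)opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _record_success(self):
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.state = "closed"
```

A caller would wrap each request to the downstream service in `breaker.call(...)` and treat `CircuitOpenError` as a signal to fall back or fail fast.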
Bulkhead pattern
Segment the application into isolated components (bulkheads) so that it remains functional even if a component fails
Can increase complexity
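One common way to approximate a bulkhead inside a single process is to cap the number of concurrent calls each dependency may use, so that one struggling dependency cannot exhaust shared capacity. The sketch below takes that approach; the `Bulkhead` class and its names are hypothetical, and real systems may instead isolate at the process, container, or thread-pool level.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a compartment has no free capacity."""


class Bulkhead:
    def __init__(self, max_concurrent):
        # Each bulkhead owns its own pool of slots, isolated from the others.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Reject immediately instead of queueing, so callers fail fast.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no capacity left in this compartment")
        try:
            return func(*args, **kwargs)
        finally:
            # Always free the slot, whether the call succeeded or raised.
            self._slots.release()
```

With one `Bulkhead` per downstream dependency, a saturated compartment rejects its own calls while the other compartments keep serving requests.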
Chaos engineering (or monkey testing)
Deliberately kill services to check for robustness
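The "plan to retry (with exponential backoff)" advice in this section can be sketched as below. The function name, parameters, and defaults are assumptions, and the random jitter is an addition not mentioned in the course; it is commonly used so that many clients do not retry in lockstep.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call func, retrying on any exception with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the last error to the caller.
                raise
            # The upper bound doubles each attempt: base, 2*base, 4*base, ...
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Sleep a random time in [0, delay] (jitter, an assumption here).
            sleep(random.uniform(0, delay))
```

In practice the `except` clause would usually be narrowed to errors that are worth retrying, such as throttling or timeout responses.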
Working DevOps
Infrastructure as code
Executable text format
Stored in configuration-management systems
Use version control
Treat servers as cattle, not pets (they should all be treated the same)
Allows identical environments to be run in parallel
Infrastructure should be ephemeral
Applications are packaged in containers
Contains dependencies
Limits side effects
Changes are always made to the image, not to running containers
The course gave the example of Knight Capital, which lost roughly $440 million in under an hour in 2012, and had to be rescued and later acquired, after a manual deployment failed to update one of eight servers.
Continuous integration and continuous delivery
Continuous integration
This means continuously building, testing, and merging to master
Work in small batches
Commit regularly
Pull requests should be automatically built and tested
Can be triggered by CI systems that monitor the version-control system for changes
Continuous delivery
This means continuously deploying to a production-like environment
This ensures that the changes could be deployed to production
Continuous deployment
This involves deploying to production rather than just a production-like environment
CI/CD pipeline components
Code repository
Build server
Integration server
Artifact repository (for binaries)
Automatic configuration and deployment
Summary pipeline
Continuous integration
Push to version control
Automated build and testing
Continuous delivery
Release automation (store any artifacts)
Delivery automation (deploy binaries to a given environment)
Continuous deployment
Production automation (promote deployment to production)
Feature flags can decouple deployment from activation
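A minimal sketch of how a feature flag might separate deployment from activation: the new code path ships to production switched off and is turned on later by changing configuration rather than redeploying. The flag name, the dictionary layout, and the percentage rollout are all hypothetical and not from the course.

```python
import hashlib


def is_enabled(flag_name, user_id, flags):
    """Return True if the flag is on for this user."""
    flag = flags.get(flag_name)
    if not flag or not flag["enabled"]:
        # Unknown or disabled flags keep the new code path dark.
        return False
    # Hash flag and user into a stable bucket from 0 to 99, so a given
    # user gets the same answer on every request.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < flag["rollout_percent"]


# Hypothetical configuration: the code is deployed, but only 20% of users see it.
FLAGS = {"new_checkout": {"enabled": True, "rollout_percent": 20}}


def checkout(user_id):
    if is_enabled("new_checkout", user_id, FLAGS):
        return "new checkout flow"
    return "old checkout flow"
```

Raising `rollout_percent` to 100, or setting `enabled` to `False`, changes behaviour without another deployment.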
Organizing for DevOps
Agile teams should be cross-functional, self-organizing, and organized around business domains
Teammates, not tickets
Conway's law: Complex systems tend to become shaped like the organizational (communication) structures from which they emerge
For example, if a software project involves multiple teams, it may end up with a module per team
There shouldn't be a separate DevOps team (in the same way you wouldn't have a specific Agile team); the whole point is to get rid of silos
Introducing a separate QA team can reduce quality as developers no longer feel that it is their responsibility to check that the code works; it's important not to separate people from the consequences of their actions
The course also recommends keeping teams small, giving a range of 5-10 people. Leading teams gives the same range while recommending large teams, so "small" and "large" are clearly ambiguous.
Measuring DevOps
DevOps metrics
Metrics should emphasize the social aspect (are other people using your code?) rather than the competitive aspect
Prefer mean time to recovery (MTTR) to mean time to failure (MTTF): the end user sees uptime, not whether any particular container is still running
Actionable metrics
Time to market
Overall availability
Time to deploy
Proportion of defects detected before production
Efficient use of infrastructure
Timeliness of performance feedback
These contrast with vanity metrics (e.g., number of website hits) where it isn't clear what action should be taken
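To see why recovery time matters so much for the uptime the end user sees, the sketch below uses the standard steady-state formula availability = MTTF / (MTTF + MTTR). The two scenarios and their numbers are made up for illustration and are not from the course.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# A container that fails every hour but is respawned in one second.
fragile_but_fast = availability(mttf_hours=1, mttr_hours=1 / 3600)

# A server that fails once a year but takes a day to repair by hand.
robust_but_slow = availability(mttf_hours=8760, mttr_hours=24)

print(f"fails hourly, recovers in 1 s: {fragile_but_fast:.4%}")
print(f"fails yearly, recovers in 24 h: {robust_but_slow:.4%}")
```

Under these assumed numbers the frequently failing but quickly recovering service has the higher availability, which is the intuition behind preferring MTTR.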
DevOps vs. Site Reliability Engineering
SRE is "what happens when a software engineer is tasked with what used to be called operations"
Tenets of SRE
Only hire software engineers
SRE teams are separate from development teams
Development teams can deploy straight to production provided that error rates are within the error budget
Developers rotate through operations
SRE maintains the infrastructure; DevOps uses the infrastructure
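The error-budget rule above can be sketched as a simple deployment gate. This assumes the budget is derived from an availability target (a service-level objective) measured over request counts; the function names and the example figures are hypothetical and not from the course.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Number of further failed requests the budget can still absorb."""
    # With a 99.9% target, 0.1% of requests are allowed to fail.
    allowed_failures = (1 - slo_target) * total_requests
    return allowed_failures - failed_requests


def can_deploy(slo_target, total_requests, failed_requests):
    """Developers may deploy only while the error budget is not exhausted."""
    return error_budget_remaining(slo_target, total_requests, failed_requests) >= 0


# Hypothetical month: one million requests against a 99.9% target.
print(can_deploy(0.999, 1_000_000, 400))    # budget left, deploys allowed
print(can_deploy(0.999, 1_000_000, 1_500))  # budget spent, deploys paused
```

Once the budget is spent, releases would pause until reliability recovers, which gives development and operations a shared, measurable incentive.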
Case studies
The course concluded with a set of quizzes that involved applying the ideas discussed to a set of case studies.