Convergence to Kubernetes

Standardisation to Scale

While preparing this content for a conference presentation, I asked our CTO what he thought was interesting about our use of Kubernetes. He replied:

Teams don’t realise how much they haven’t had to do.

His comment was inspired by having recently read Factfulness: smaller but continual improvements are harder to notice, and we consequently fail to recognise the progress we’ve made.

A “steady state” [that] is a superposition of ongoing wavefronts of change.

uSwitch is a fine example of such an idea.

Operating software iceberg. Screenshot of my immaculate Google Slides diagram ;)
Use of Low-level AWS Services (EC2, IAM, STS, Autoscaling etc.) over time. Data covers January 2015 to January 2017.
Two things made this possible:

  • Application-focused abstractions
  • We operate and configure our clusters to minimise coordination

Application-focused abstractions

At the core of Kubernetes are concepts that map closely to the language used by an application developer. For example, you manage versions of your applications as a Deployment. You can run multiple replicas behind a Service and map that to HTTP via Ingress. And, through Custom Resources, it’s possible to extend and specialise this language to your own needs.
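For illustration, the three abstractions compose like this. This is a hedged sketch: the application name, image, and hostname are hypothetical, not taken from our actual configuration.

```yaml
# Hypothetical application: names, image and ports are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quotes-api
spec:
  replicas: 3                       # run multiple replicas of the app
  selector:
    matchLabels:
      app: quotes-api
  template:
    metadata:
      labels:
        app: quotes-api
    spec:
      containers:
        - name: quotes-api
          image: example.com/quotes-api:1.2.3
          ports:
            - containerPort: 8080
---
# A Service load-balances across the replicas above.
apiVersion: v1
kind: Service
metadata:
  name: quotes-api
spec:
  selector:
    app: quotes-api
  ports:
    - port: 80
      targetPort: 8080
---
# An Ingress maps the Service to HTTP traffic for a hostname.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: quotes-api
spec:
  rules:
    - host: quotes.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: quotes-api
                port:
                  number: 80
```

Rolling out a new version is then just a change to the Deployment’s image; the Service and Ingress stay untouched.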

Minimise Necessary Coordination

In the Accelerate book the authors highlight characteristics of loosely-coupled architectures that drive IT performance. We operate and configure our clusters to keep the coordination teams need to a minimum:

  • All clusters are configured with the same Namespaces. These map approximately 1:1 with teams.
  • We use RBAC to control access to Namespaces. All access is authenticated and authorised against our corporate identity in Active Directory.
  • Clusters are auto-scaled and we do as much as we can to optimise node start-up time. It’s still a couple of minutes but it means that, in general, no coordination is needed even when teams need to run large workloads.
  • Applications auto-scale using application-level metrics exported from Prometheus. Application teams can export Queries per Second, Operations per Second etc. and manage the autoscaling of their application in response to that metric. And, because we use the Cluster autoscaler, nodes will be provisioned if demand exceeds our current cluster capacity.
  • We wrote a Go command-line tool called u that standardises the way teams authenticate to Kubernetes and Vault, request temporary AWS credentials, and more.
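The per-team access model in the first two bullets could be sketched as a RoleBinding that grants a team’s directory group access to its Namespace. All names here are hypothetical, and the group would in practice be asserted by the Active Directory integration rather than hard-coded:

```yaml
# Hypothetical binding: grants the "money-team" group edit rights in that
# team's Namespace. Group and Namespace names are illustrative only.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: money-team-edit
  namespace: money                  # the team's Namespace
subjects:
  - kind: Group
    name: money-team                # group asserted by the identity integration
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                        # built-in aggregated "edit" role
  apiGroup: rbac.authorization.k8s.io
```

Because the same Namespaces exist on every cluster, the same binding can be applied everywhere, so access control needs no per-cluster coordination.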
Authenticating to Kubernetes using u command-line tool
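The application-level autoscaling described above can be expressed as a HorizontalPodAutoscaler targeting a custom metric. This is a sketch, not our actual configuration: it assumes an adapter (such as prometheus-adapter) exposes the Prometheus-derived metric through the custom metrics API, and the metric and application names are hypothetical.

```yaml
# Hypothetical autoscaler: scales a Deployment on a per-pod custom metric
# (e.g. requests per second) served by a Prometheus metrics adapter.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: quotes-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quotes-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # illustrative metric name
        target:
          type: AverageValue
          averageValue: "100"              # target per-pod rate
```

If scaling up pushes demand beyond current cluster capacity, the Cluster autoscaler provisions additional nodes, so neither step needs a human in the loop.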
Growth in Namespaces/teams over time
Low-level service use has improved since our convergence to Kubernetes in early 2017
Deploys per person per week. Data covers approximately 1 year to May 2018.

The orthodox view of scaling software development teams states that while adding developers to a team may increase overall productivity, individual developer productivity will in fact decrease.

Plotting the same data, but as the relationship between people and deployments, shows that we’re able to increase our release frequency even as we add more people.

Our ability to release increases as we add people

CTO. Formerly of Forward, ThoughtWorks and more. pingles almost everywhere (GitHub, Twitter etc.)