Kubernetes was designed with stateless microservices in mind. But these days, it also comes with support for stateful applications, which is especially handy if you want to migrate your applications gradually. At first glance, StatefulSets are very similar to standard Kubernetes Deployments, but there are some important differences. In this post, you'll learn what StatefulSets actually are and when and how to use them.
What Are Stateful Applications?
Before we start explaining Kubernetes StatefulSets, you need to understand what stateful means and the difference between stateless and stateful applications. When you think about cloud-native applications, you most likely have a picture of an application that can run in multiple copies and where any copy can be restarted at any time while traffic is being redirected effortlessly to other instances.
In order for this model to work, the application needs to get some data from somewhere, execute some functions, and return the data. It can't store the data itself, and it shouldn't be dependent on other pods. If it were, you wouldn't be able to easily kill that instance without risking data loss. But in general, if an application doesn't store data itself in persistent storage and doesn't need to be started together with other microservices in a specific order, then it's stateless.
Stateful vs Stateless
And as you can probably guess, stateful applications are the opposite. They do need to keep some data in order to work. The most common example of a stateful application is a database. The whole point of, for example, MongoDB or MySQL applications is to store data. Therefore, both MongoDB and MySQL are stateful. You can't simply kill the instance of MongoDB and restart it somewhere else and expect it to work.
First of all, by killing it unexpectedly, the data may get corrupted. And second, you can't simply restart MongoDB somewhere else because you need to first somehow reference the same data for it, which usually means either copying data or attaching the same persistent storage to it.
Using persistent data is not the only thing that can make an application stateful. If your microservice doesn't store any data but needs to be started in a specific order with other microservices, then it's also stateful. Or if you can't simply roll out a new version of the application because you also need to follow specific update procedures, then your application is most likely stateful.
Now that you have that clear, let's talk about Kubernetes StatefulSets.
Traditionally, a normal Kubernetes Deployment assumes that your application is stateless. Therefore, Kubernetes may, at any point, just kill one of your instances and redeploy it elsewhere on the cluster when necessary. If your application is stateful, this could easily create an issue. You would either end up with corrupted data or your application could simply crash and require manual intervention.
Therefore, specifically for stateful applications, Kubernetes offers so-called StatefulSets. These are special Kubernetes objects that will create and manage pods for your stateful application. Unlike in a standard Deployment, StatefulSets are aware that your application is stateful and will therefore treat it accordingly.
Stable And Ordered
Kubernetes StatefulSets provide two main advantages (for stateful applications) over Deployments: a stable identity of the pods and the ability to follow specific Deployment orders.
Stable identity means persistent identity in this case. And persistent pod identity means that when a pod gets rescheduled for whatever reason, it will have the same network identifiers and the same storage assigned to it. So, from the perspective of other pods, it will look like the same pod. This is not the case when using Deployments, and it's very important for the proper working of stateful applications.
We already mentioned that if your application needs to be deployed or updated in a specific order, that's a good indication that it's stateful. In a traditional Deployment, if you'll have multiple pods in one Deployment, they would be deployed in a random order, which in the case of stateful application would probably mean that the application won't start properly. And therefore, this ability to follow a specific order when deploying or updating is built into the StatefulSets.
Enough theory. Let's create some StatefulSets. The YAML definition of StatefulSets is very similar to standard Deployments and in a simple example looks like this:
Once you save the above code in a YAML file, you can deploy it, as usual, using kubectl apply:
You can then validate that everything is working with kubectl get:
OK, your first StatefulSet is up and running. Congratulations. This was, however, a very simple example with only one pod in your StatefulSet. But you'll most likely use StatefulSets with multiple pods to get all the benefits.
Let's spice things up a little to see StatefulSets doing its job. Execute the following command to scale your nginx from one to ten replicas:
Now, if you watch what's happening, you'll see the specific behavior of StatefulSets:
You can see that Kubernetes provisioned all replicas in order, one by one. This is one of the differences between Deployments and StatefulSets. In Deployments, all pods will be deployed in random order, with more than one pod being created at once. In StatefulSets, it happens sequentially, and pods are even numbered and do not get a random hash assigned as part of the name, which is the case in Deployments. Moreover, if at any point one of the replicas fails to start, the whole process will stop. So, for example, Kubernetes will only create example-statefulset-5 after example-statefulset-4 is up and running.
Name Stays the Same
Following the same logic, if something happens to any of the pods, it will be recreated with the same name.
This, again, differs from Deployments, where you'd get another randomly named pod. This is important for stateful applications because, most likely, each pod will hold its own state. Therefore, it's crucial not to mix them up. Also, other microservices that would connect to these pods will probably need to always connect to the same pod even if it dies and is rescheduled.
The same applies to networking. You can always connect to a specific pod by its domain name, like example-statefulset-6.nginx.default.svc.cluster.local, and you'll have a guarantee that you'll always reach the same pod. That's not the case with Deployments.
Kubernetes StatefulSets are really useful. In theory, using them means doing something that Kubernetes wasn't designed to work with in the first place. But it's really hard to have every single application on your cluster stateless. Especially in big environments with dozens or even hundreds of applications, there will always be some microservice that needs to hold some state. In some cases, it simply doesn't make sense to spend time and money on redesigning a stateless application to be stateful if it won't bring much difference or business value.
Release is the simplest way to spin up even the most complicated environments. We specialize in taking your complicated application and data and making reproducible environments on-demand.