Kubernetes StatefulSet - Everything you need to know
9 min read

What is a StatefulSet in Kubernetes?

StatefulSets are Kubernetes objects used to run stateful applications that require both stable network identities and stable, persistent storage. Deployments, in contrast, treat Pods as fungible units: any replica can be created, replaced, or scaled by its controller without regard to identity or local data.

Think of StatefulSets as a way to manage a group of pods that must keep their state: databases, message queues, or any application where an instance's identity and its data matter. Every pod in a StatefulSet gets:

  • An easy-to-predict name: web-0, web-1, web-2, etc.
  • A permanent hostname that doesn't change after rescheduling
  • Persistent storage that remains attached to the same pod
  • Ordered deployment and scaling operations

This makes StatefulSets a natural fit for distributed databases such as PostgreSQL, MongoDB, or Elasticsearch, where each instance needs its own identity and data.


How to Choose Between StatefulSet vs. DaemonSet vs. Deployment

When I first started using Kubernetes, choosing between these workload types was thoroughly confusing. Let me break down each type in a way that reflects what I've run into in practice:

Deployments: Your Go-To for Stateless Applications

Deployments are perfect for applications that don't need to remember anything from one restart to the next. I use them for:

  • Web applications where any pod can serve any request
  • API servers that don't store data locally
  • Image processing services that process and return results

For example, when deploying a Node.js web server, I'd use a Deployment because it doesn't matter which pod handles a request - they are all identical. If a pod dies, Kubernetes creates a new one with a random name and everything keeps working.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      containers:
        - name: nginx
          image: nginx:1.25

StatefulSets: When Identity and Data Matter

StatefulSets are like assigned parking spots - each pod gets its own unique, permanent identity. I learned their importance while deploying a PostgreSQL cluster:

  • Each database instance required its own persistent storage
  • Replication needed predictable hostnames for the pods
  • Scale-up and scale-down had to happen in a specific order

Here is a concrete example: a PostgreSQL cluster, where pod-0 is the primary and pod-1 and pod-2 are replicas. Each needs its own storage, and must maintain its identity even after restarts.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
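Note that `serviceName: postgres` refers to a headless Service that must exist alongside the StatefulSet - it's what gives the pods their stable DNS names (`postgres-0.postgres`, `postgres-1.postgres`, and so on). A minimal sketch of that Service, using the same names as the example above:

```yaml
# Headless Service for the postgres StatefulSet - its name must match
# the StatefulSet's serviceName field
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None      # "headless": per-pod DNS entries instead of a single virtual IP
  selector:
    app: postgres
  ports:
    - port: 5432
      name: postgres
```

With this in place, each pod is reachable at `<pod-name>.postgres.<namespace>.svc.cluster.local`, which is what replication setups rely on.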

DaemonSets: One Per Node Operations

DaemonSets are like having a security guard at every building entrance. I use them when I need exactly one pod on every node. Real-world uses include:

  • Log collectors like Fluentd that need to run on every node
  • Node monitoring agents collecting metrics
  • Network plugins that need to configure each node

For example, I run Fluentd as a DaemonSet so that logs are collected from every node:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluentd:v1.16
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
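One thing the example above glosses over: on clusters whose nodes carry taints (control-plane nodes, for instance), the DaemonSet's pods need matching tolerations to be scheduled there. A sketch of the fragment you'd add to the pod template, assuming the standard control-plane taint:

```yaml
# Fragment of the DaemonSet's pod template - tolerate the control-plane
# taint so log collection also runs on control-plane nodes
spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
```

Without this, `kubectl get pods -o wide` will show the DaemonSet running only on untainted worker nodes.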


How to Create a Redis Cluster Using StatefulSets

Let's set up a Redis cluster with a StatefulSet. Here is the example in its entirety:

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    appendonly yes
    protected-mode no
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 5000
    dir /data
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 3
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
        - name: redis
          image: redis:6.2
          ports:
            - containerPort: 6379
              name: client
            - containerPort: 16379
              name: gossip
          command: ["redis-server", "/conf/redis.conf"]
          volumeMounts:
            - name: conf
              mountPath: /conf
            - name: data
              mountPath: /data
      volumes:
        - name: conf
          configMap:
            name: redis-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
spec:
  clusterIP: None
  selector:
    app: redis-cluster
  ports:
    - port: 6379
      targetPort: 6379
      name: client
    - port: 16379
      targetPort: 16379
      name: gossip

Let's deploy and verify:

Apply the configuration

kubectl apply -f redis-cluster.yaml

Watch the pods being created

kubectl get pods -l app=redis-cluster -w


Verify the PVCs

kubectl get pvc


How to Perform Rolling Updates in StatefulSets

StatefulSets support rolling updates, which replace pods one at a time in reverse ordinal order (highest ordinal first). Here's how it works:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0  # Optional: only update pods with ordinal >= partition
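One handy use of `partition` is a canary rollout: set it so only the highest-ordinal pods pick up the new template, verify them, then lower the partition back to 0 to roll out the rest. A sketch for the 3-replica Redis cluster above:

```yaml
# Canary: with partition: 2, only redis-cluster-2 is updated;
# redis-cluster-0 and redis-cluster-1 keep running the old template
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2
```

Once you're happy with the canary pod, patch the partition down to 0 and the controller finishes the rollout in reverse ordinal order.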

When you update the StatefulSet (for example, changing the Redis version):

Update Redis version

kubectl set image statefulset/redis-cluster redis=redis:7.0

Watch the rolling update

kubectl rollout status statefulset/redis-cluster


How to Debug Common StatefulSet Issues

Common problems and how to debug them:

Pod creation stuck

Check StatefulSet and Pod Events

$ kubectl describe statefulset redis-cluster
$ kubectl describe pod redis-cluster-0

These commands help you understand what's happening with your StatefulSet and its pods in detail.

Check PVC Status and Details

$ kubectl get pvc
$ kubectl describe pvc data-redis-cluster-0

Always verify your PVCs are correctly bound before troubleshooting other issues.

Verify DNS Resolution

$ kubectl exec redis-cluster-0 -- nslookup redis-cluster-0.redis-cluster

DNS resolution is crucial for cluster communication - if this fails, your pods can't talk to each other.

Check Service Endpoints

$ kubectl get endpoints redis-cluster

If endpoints aren't showing up, your service might not be selecting the right pods.

Inspect Volume Mounts

$ kubectl describe pod redis-cluster-0 | grep -A 2 Mounts 
$ kubectl get pv | grep redis-cluster

Make sure your volumes are properly mounted - this is essential for data persistence.

A related task that trips people up is deleting a StatefulSet safely. Work through it in this order:

Scale Down StatefulSet

$ kubectl scale statefulset redis-cluster --replicas=0

Always scale down to 0 first before deletion - it's safer for your data.

Monitor Pod Termination

$ kubectl get pods -w

Wait until all pods are gone before proceeding with deletion.

Delete StatefulSet

$ kubectl delete statefulset redis-cluster

Only delete the StatefulSet after all pods are terminated.

Clean Up PVCs

$ kubectl delete pvc -l app=redis-cluster

Be extra careful with this one - it will permanently delete your data!
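On newer Kubernetes versions you can let the StatefulSet handle PVC cleanup itself via the `persistentVolumeClaimRetentionPolicy` field (added in v1.23, enabled by default from v1.27); treat this as a sketch and verify support on your cluster version first:

```yaml
# StatefulSet spec fragment - automate PVC cleanup
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # remove PVCs when the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs for pods removed by a scale-down
```

With `whenDeleted: Delete`, the manual `kubectl delete pvc` step above becomes unnecessary - which also means a stray `kubectl delete statefulset` takes the data with it, so use it deliberately.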

Common StatefulSet Limitations and Best Practices

Understanding these limitations is essential:

Storage Operations

  • PVC deletion is not automated by default
  • The storage class must support the requested access mode
  • Volume resizing may not be supported by every storage class

Pod Identity

  • Pod names and hostnames cannot be changed
  • DNS names are tied to the StatefulSet's name
  • Pod ordinals are fixed

Scaling Limitations

  • Scaling down can be slow, since pods terminate one at a time
  • Metric-based autoscaling is rarely practical for stateful workloads
  • Rebalancing data after scaling usually requires manual intervention

Update Restrictions

  • Some fields can't be updated after creation
  • Pod template updates affect all pods
  • Volume claim templates are immutable

Frequently Asked Questions About StatefulSets

Can I change a Deployment to a StatefulSet?

No, you can't convert a Deployment directly into a StatefulSet; they are different workload types serving different needs. You need to create a new StatefulSet and migrate the data if that's what your application requires.

Does a StatefulSet automatically create PersistentVolumes?

No, StatefulSets only create PVCs. You need to have a storage provisioner in your cluster that can fulfill these claims by creating PersistentVolumes.
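For the PVCs to bind, the `volumeClaimTemplates` can reference a StorageClass backed by a dynamic provisioner. A sketch - the class name and provisioner here are illustrative and depend entirely on your environment:

```yaml
# Example StorageClass - ebs.csi.aws.com is shown purely as an illustration;
# substitute the CSI driver your cluster actually runs
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # bind when the pod is scheduled
allowVolumeExpansion: true
```

You'd then set `storageClassName: fast-ssd` inside each volume claim template; if `storageClassName` is omitted, the cluster's default StorageClass is used.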

What happens to PVCs when I delete a StatefulSet?

Nothing happens to them by default - the PVCs are left in place, which is a deliberate safeguard against accidental data loss. You have to delete those PVCs manually if you want the persistent storage to go away.

Can StatefulSets span multiple namespaces?

No, a StatefulSet and all its pods must be in the same namespace. However, you can create identical StatefulSets in different namespaces if needed.

How do StatefulSets handle node failures?

If a node fails, the StatefulSet controller creates a replacement pod with the same identity on another node. The pod keeps its name and reattaches to its existing PVC if the storage supports it.

Can I use StatefulSets with ReadWriteMany (RWX) volumes?

Yes, StatefulSets can use RWX volumes, but that's less common; most of the use cases of StatefulSet require ReadWriteOnce (RWO) volumes to guarantee data consistency.

How do I backup data in StatefulSets?

You have several options:

  • Use volume snapshots, if your StorageClass supports them.
  • Run a backup sidecar container in the pod.
  • Use your application's native backup tools, such as pg_dump for PostgreSQL.
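As one concrete (and entirely hypothetical) sketch of the third option, a CronJob can run pg_dump against a pod's stable DNS name on a schedule - the names, database, credentials handling, and backup destination below are placeholders to adapt:

```yaml
# Hypothetical nightly backup using pg_dump against the primary's stable
# DNS name (postgres-0.postgres) - adjust names, auth, and storage for your setup
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"        # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: postgres:15
              command: ["sh", "-c",
                "pg_dump -h postgres-0.postgres -U postgres mydb > /backup/mydb-$(date +%F).sql"]
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-storage    # pre-existing PVC for backups
```

This works precisely because StatefulSet pods have predictable DNS names - you can always target the primary without discovery logic.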

What's the difference between StatefulSet and Deployment with persistent volumes?

While both can use persistent storage, StatefulSets additionally provide:

  • Predictable pod names and DNS entries
  • Ordered deployment and scaling
  • Stable network identities
  • Automated PVC management

Conclusion

StatefulSets are the right choice for running stateful applications in Kubernetes, but they're not always the best fit. Use them when you need stable network identities, ordered deployments, and persistent storage. Remember to:

  • Always use them with headless services
  • Plan your storage needs carefully
  • Test scaling operations before production
  • Implement proper backup strategies

Success with StatefulSets mainly comes down to understanding both their abilities and their limitations. Take the time to design your stateful workloads correctly, and you'll be able to run reliable, scalable stateful applications in Kubernetes.