What is a StatefulSet in Kubernetes?
StatefulSets are Kubernetes objects used to run stateful applications that require both stable network identities and persistent storage. Deployments, in contrast, treat Pods as interchangeable units that their controller can create, replace, or scale freely; a StatefulSet gives each Pod a stable identity and its own persistent storage.
Think of StatefulSets as a way to manage a group of pods that have to keep their state: databases, message queues, or any application in which the identity of each instance and its data matter. Every pod in a StatefulSet gets:
- An easy-to-predict name: web-0, web-1, web-2, etc.
- A permanent hostname that doesn't change after rescheduling
- Persistent storage that remains attached to the same pod
- Ordered deployment and scaling operations
This makes StatefulSets a good fit for databases and data stores such as PostgreSQL, MongoDB, or Elasticsearch, where each instance needs its own identity and data.
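To make the naming scheme concrete, here is a minimal Python sketch that derives the pod names and per-pod DNS names a StatefulSet produces. It assumes a headless Service, the default namespace, and the default cluster domain cluster.local; the function name is made up for illustration and is not part of any Kubernetes client.

```python
def stateful_pod_identities(name, service, replicas, namespace="default"):
    """Return (pod_name, dns_name) for every ordinal of a StatefulSet."""
    identities = []
    for ordinal in range(replicas):
        # Pod names are always <statefulset name>-<ordinal>, starting at 0.
        pod = f"{name}-{ordinal}"
        # Assumption: the cluster uses the default domain "cluster.local".
        dns = f"{pod}.{service}.{namespace}.svc.cluster.local"
        identities.append((pod, dns))
    return identities


for pod, dns in stateful_pod_identities("web", "web", 3):
    print(pod, dns)
```

Because both values depend only on the StatefulSet name, the Service name, and the ordinal, they stay the same when a pod is rescheduled onto another node.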
Steps we'll cover:
- What is a StatefulSet in Kubernetes?
- How to Choose Between StatefulSet vs. DaemonSet vs. Deployment
- How to Create a Redis Cluster Using StatefulSets
- How to Perform Rolling Updates in StatefulSets
- How to Debug Common StatefulSet Issues
- Common StatefulSet Limitations and Best Practices
- Frequently Asked Questions About StatefulSets
How to Choose Between StatefulSet vs. DaemonSet vs. Deployment
When I first started using Kubernetes, choosing between these workload types was thoroughly confusing. Let me break down each type based on what I've run into in practice:
Deployments: Your Go-To for Stateless Applications
Deployments are ideal for applications that don't need to remember anything from one restart to the next. I use them for:
- Web applications where any pod can serve any request
- API servers that don't store data locally
- Image processing services that process and return results
For example, when deploying a stateless web server (nginx in the manifest below), I'd use a Deployment because it doesn't matter which pod handles a request - they are all identical. If a pod dies, Kubernetes creates a new one with a random name and everything keeps working.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
StatefulSets: When Identity and Data Matter
StatefulSets are like assigned parking spots - each pod gets its own unique, permanent identity. I learned their importance while deploying a PostgreSQL cluster:
- Each database instance required its own persistent storage
- Replication needed predictable hostnames for Pods
- Scale-up and scale-down had to happen in a specific order
Here is a concrete example: a PostgreSQL cluster where postgres-0 is the primary and postgres-1 and postgres-2 are replicas. Each pod needs its own storage and must keep its identity across restarts.
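apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres  # headless Service that provides the per-pod DNS names
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          # Note: the official image also needs POSTGRES_PASSWORD set
          # (ideally from a Secret) before it will initialize the database.
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:  # one PVC per pod is created from this template
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi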
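The "specific order" mentioned above can be sketched in a few lines of Python. With the default pod management policy (OrderedReady), the controller creates pods in ascending ordinal order and removes them in descending order, one at a time. The function below is a hypothetical helper that only lists those steps; it does not talk to a cluster.

```python
def scale_plan(name, current, desired):
    """List the (action, pod) steps for scaling a StatefulSet, in order."""
    if desired >= current:
        # Scale-up: lowest missing ordinal first.
        return [("create", f"{name}-{o}") for o in range(current, desired)]
    # Scale-down: highest existing ordinal first.
    return [("delete", f"{name}-{o}") for o in range(current - 1, desired - 1, -1)]


print(scale_plan("postgres", 0, 3))
print(scale_plan("postgres", 3, 1))
```

This is why postgres-0 makes a natural primary: it is the first pod created and the last one removed.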
DaemonSets: One Per Node Operations
DaemonSets are like having a security guard at every building entrance. I use them when I need exactly one pod on every node. Real-world uses include:
- Log collectors like Fluentd that need to run on every node
- Node monitoring agents collecting metrics
- Network plugins that need to configure each node
For example, I run Fluentd as a DaemonSet to collect the logs from every node:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluentd:v1.16
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
Interactive Decision Helper
Still having a hard time choosing the right workload type?
I've created this handy interactive tool based on hundreds of real-world Kubernetes deployments. Answer a few questions about your application, and I'll help you find the right choice:
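The guidance from the three sections above can be condensed into a small Python function. This is only a rough sketch of the decision logic as described in this article; the function name, the three questions, and their order are assumptions for illustration and not the implementation of the interactive tool.

```python
def choose_workload(one_pod_per_node, needs_stable_identity, needs_per_pod_storage):
    """Suggest a workload type from three yes/no answers."""
    if one_pod_per_node:
        # Node-level agents: log collectors, monitoring, network plugins.
        return "DaemonSet"
    if needs_stable_identity or needs_per_pod_storage:
        # Databases, message queues, anything where instance identity matters.
        return "StatefulSet"
    # Interchangeable pods: web servers, stateless APIs, processing workers.
    return "Deployment"


print(choose_workload(False, True, True))    # PostgreSQL cluster
print(choose_workload(True, False, False))   # Fluentd log collector
print(choose_workload(False, False, False))  # stateless web server
```

Real workloads often need more nuance than three booleans, so treat the result as a starting point rather than a verdict.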
How to Create a Redis Cluster Using StatefulSets
Let's set up a Redis cluster with a StatefulSet. Here is the complete manifest:
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    appendonly yes
    protected-mode no
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 5000
    dir /data
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 3
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
        - name: redis
          image: redis:6.2
          ports:
            - containerPort: 6379
              name: client
            - containerPort: 16379
              name: gossip
          command: ["redis-server", "/conf/redis.conf"]
          volumeMounts:
            - name: conf
              mountPath: /conf
            - name: data
              mountPath: /data
      volumes:
        - name: conf
          configMap:
            name: redis-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
spec:
  clusterIP: None
  selector:
    app: redis-cluster
  ports:
    - port: 6379
      targetPort: 6379
      name: client
    - port: 16379
      targetPort: 16379
      name: gossip
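These manifests start three cluster-enabled Redis nodes; joining them into a working cluster afterwards still takes a one-time redis-cli --cluster create run against the pod addresses. Let's deploy and verify: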
# Apply the configuration
kubectl apply -f redis-cluster.yaml
# Watch the pods being created
kubectl get pods -l app=redis-cluster -w
# Verify the PVCs
kubectl get pvc
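When verifying the PVCs, it helps to know which names to expect. Each claim is named after the volumeClaimTemplate, the StatefulSet, and the pod ordinal, which the following minimal sketch spells out (the helper name is hypothetical):

```python
def expected_pvc_names(claim_template, statefulset, replicas):
    """PVC names follow <template name>-<statefulset name>-<ordinal>."""
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


# Values taken from the manifest above: template "data", 3 replicas.
print(expected_pvc_names("data", "redis-cluster", 3))
```

If one of these names is missing from the kubectl get pvc output, the corresponding pod has most likely not been created yet.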
How to Perform Rolling Updates in StatefulSets
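StatefulSets support rolling updates, which replace pods one at a time in reverse ordinal order (highest ordinal first), waiting for each updated pod to be Running and Ready before moving on to the next. Here's the relevant configuration: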
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0 # Optional: Only update pods with ordinal >= partition
When you change the pod template (for example, the Redis image version), the controller rolls the change out pod by pod:
# Update Redis version
kubectl set image statefulset/redis-cluster redis=redis:7.0
# Watch the rolling update
kubectl rollout status statefulset/redis-cluster
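The interaction between update order and the partition field is easy to get backwards, so here is a small Python sketch of it. The function is a hypothetical illustration, not controller code: it only lists which pods a RollingUpdate would touch and in what order.

```python
def rolling_update_order(name, replicas, partition=0):
    """Pods updated by a RollingUpdate: ordinals >= partition, highest first."""
    return [f"{name}-{o}" for o in range(replicas - 1, partition - 1, -1)]


# Default partition (0): every pod is updated, starting from the highest ordinal.
print(rolling_update_order("redis-cluster", 3))
# partition=2: only redis-cluster-2 is updated, which works as a canary.
print(rolling_update_order("redis-cluster", 3, partition=2))
```

Lowering the partition step by step (2, then 1, then 0) gives you a staged rollout in which the lowest ordinals keep the old version until you are confident in the new one.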
How to Debug Common StatefulSet Issues
Common problems and how to debug them:
Pod creation stuck
Check StatefulSet and Pod Events
$ kubectl describe statefulset redis-cluster
$ kubectl describe pod redis-cluster-0
These commands help you understand what's happening with your StatefulSet and its pods in detail.
Check PVC Status and Details
$ kubectl get pvc
$ kubectl describe pvc data-redis-cluster-0
Always verify your PVCs are correctly bound before troubleshooting other issues.
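With more than a handful of claims, scanning the table by eye gets tedious. As a minimal sketch, the check can be scripted against the JSON that kubectl get pvc -o json prints. The sample input below is illustrative and trimmed to the two fields the function reads; it is not real cluster output, and the function name is an assumption.

```python
import json


def unbound_pvcs(pvc_list_json):
    """Return (name, phase) for every PVC whose phase is not "Bound"."""
    items = json.loads(pvc_list_json).get("items", [])
    problems = []
    for item in items:
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            problems.append((item["metadata"]["name"], phase))
    return problems


# Illustrative input only: two claims, one of them still Pending.
sample = json.dumps({"items": [
    {"metadata": {"name": "data-redis-cluster-0"}, "status": {"phase": "Bound"}},
    {"metadata": {"name": "data-redis-cluster-1"}, "status": {"phase": "Pending"}},
]})
print(unbound_pvcs(sample))
```

A claim that stays Pending usually points at the storage side (no matching StorageClass or provisioner), and kubectl describe pvc on that claim shows the related events.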
Verify DNS Resolution
$ kubectl exec redis-cluster-0 -- nslookup redis-cluster-0.redis-cluster
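DNS resolution is crucial for cluster communication - if this fails, your pods can't talk to each other. If the container image doesn't include nslookup, getent hosts redis-cluster-0.redis-cluster usually works as a substitute.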
Check Service Endpoints
$ kubectl get endpoints redis-cluster
If endpoints aren't showing up, your service might not be selecting the right pods.
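What "selecting the right pods" means can be shown in a few lines of Python. For the equality-based selector used in this article, a Service selects a pod only if every key/value pair in the selector also appears in the pod's labels. The function below is a simplified sketch of that rule, not the actual Kubernetes implementation.

```python
def selector_matches(selector, pod_labels):
    """True if every selector key/value pair is present in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "redis-cluster"}

# Matches: extra labels on the pod are fine.
print(selector_matches(service_selector, {"app": "redis-cluster", "tier": "cache"}))
# No match: a typo in the label value leaves the Service without endpoints.
print(selector_matches(service_selector, {"app": "redis-clutser"}))
```

Comparing the Service selector with the output of kubectl get pods --show-labels is usually the quickest way to spot such a mismatch.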
Inspect Volume Mounts
$ kubectl describe pod redis-cluster-0 | grep -A 2 Mounts
$ kubectl get pv | grep redis-cluster
Make sure your volumes are properly mounted - this is essential for data persistence.
Scale Down StatefulSet
$ kubectl scale statefulset redis-cluster --replicas=0
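Scale down to 0 before deleting: deleting a StatefulSet directly gives no guarantee about the order in which its pods terminate, while scaling down removes them gracefully in reverse ordinal order.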
Monitor Pod Termination
$ kubectl get pods -w
Wait until all pods are gone before proceeding with deletion.
Delete StatefulSet
$ kubectl delete statefulset redis-cluster
Only delete the StatefulSet after all pods are terminated.
Clean Up PVCs
$ kubectl delete pvc -l app=redis-cluster
Be extra careful with this one - it will permanently delete your data!
Common StatefulSet Limitations and Best Practices
Keep these limitations in mind:
Storage Operations
- PVCs are not deleted automatically by default
- The storage class must support the requested access mode
- Volume resizing may not be supported by your storage class
Pod Identity
- Pod names and hostnames cannot be changed
- DNS names are tied to the names of the StatefulSet and its headless Service
- Pod ordinals are fixed
Scaling Limitations
- Scale-down can be slow, because pods are removed one at a time
- Metric-based autoscaling needs care: a HorizontalPodAutoscaler can target a StatefulSet, but adding or removing replicas doesn't move any data
- Rebalancing data usually requires manual intervention
Update Restrictions
- Only a few spec fields (such as replicas, template, and updateStrategy) can be changed after creation
- Pod template updates roll out to every pod unless you set a partition
- volumeClaimTemplates are immutable
Frequently Asked Questions About StatefulSets
Can I change a Deployment to a StatefulSet?
No, a Deployment can't be converted in place into a StatefulSet, because they are different workload types that serve different needs. Create a new StatefulSet instead and migrate the data if your application requires it.
Does a StatefulSet automatically create PersistentVolumes?
No, StatefulSets only create PVCs. Something else has to supply the PersistentVolumes that fulfill these claims: typically a dynamic storage provisioner in your cluster, or volumes that an administrator has provisioned in advance.
What happens to PVCs when I delete a StatefulSet?
Deleting a StatefulSet doesn't delete its PVCs by default. This is a safeguard against accidental data loss. If you want the persistent storage gone, you have to delete the PVCs manually.
Can StatefulSets span multiple namespaces?
No, a StatefulSet and all its pods must be in the same namespace. However, you can create identical StatefulSets in different namespaces if needed.
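How do StatefulSets handle node failures?
The StatefulSet controller never runs two pods with the same identity at the same time. When a node becomes unreachable, the affected pod is typically left in a Terminating or Unknown state, and no replacement is created until that pod is confirmed gone (for example, once the node is removed from the cluster or the pod is force-deleted). The replacement then gets the same name and can reattach to its existing PVC if the storage can be attached from another node.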
Can I use StatefulSets with ReadWriteMany (RWX) volumes?
Yes, StatefulSets can use RWX volumes, but that's less common; most StatefulSet use cases rely on ReadWriteOnce (RWO) volumes so that each pod has exclusive access to its own data.
How do I back up data in StatefulSets?
You have several options:
- Use volume snapshots, if your storage driver and StorageClass support them.
- Run a backup sidecar container in the pod.
- Use your application's native backup tools, such as pg_dump for PostgreSQL.
What's the difference between StatefulSet and Deployment with persistent volumes?
While both can use persistent storage, StatefulSets additionally provide:
- Predictable pod names and DNS entries
- Ordered deployment and scaling
- Stable network identities
- Automatic creation of one PVC per pod from volumeClaimTemplates
Conclusion
StatefulSets are the standard way to run stateful applications in Kubernetes, but they aren't the best fit for every workload. Use them when you need stable network identities, ordered deployments, and persistent storage. Remember to:
- Always use them with headless services
- Plan your storage needs carefully
- Test scaling operations before production
- Implement proper backup strategies
Success with StatefulSets mainly comes down to understanding both their abilities and their limitations. Take the time to design your stateful workloads carefully, and you'll be well on your way to running reliable, scalable stateful applications in Kubernetes.