Understanding Kubernetes Jobs

Introduction

Jobs are a core Kubernetes workload resource for running Pod-based tasks that must run to completion. Unlike controllers such as Deployments, which keep Pods running indefinitely, a Job ensures that its task finishes successfully, even if that takes more than one attempt.

This article covers how Jobs operate, how to configure them, which features they provide, and best practices for using them.

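Steps we will cover in this article:

  • Understanding Kubernetes Jobs
  • Configuring Job Specifications
  • Parallelism and Completions
  • Handling Job Failures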

Understanding Kubernetes Jobs

Kubernetes Jobs run tasks to successful completion, which makes them an important tool for managing batch workloads. A Job creates one or more Pods and retries them on failure until the required number complete successfully. Important fields in a Job definition include metadata, spec.template, and restartPolicy (set inside the Pod template), which together define how the Job operates.

The spec.template field holds a Pod template. It has the same schema as a Pod, except that it is nested and has no apiVersion or kind fields. The template matters because it specifies the desired state of every Pod the Job creates during its lifetime. Another critical field is restartPolicy: for Jobs it must be either Never or OnFailure. The Pod default, Always, is not allowed because it would restart containers even after the task has completed successfully.

Here is a working YAML configuration for a Kubernetes Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculation
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 3

This Job computes pi to 2000 decimal places using a Perl container. Here backoffLimit is 3, which means Kubernetes retries the Job up to three times on failure before marking it as failed. Understanding these settings is the basis for using Jobs across a range of operational tasks.

Configuring Job Specifications

A Job manifest must specify a few fields to work properly. Like every Kubernetes object, it needs apiVersion, kind, and metadata, which identify the object and its type. Within the spec, spec.template is the one required field; it describes the Pods the Job creates and runs during its lifetime.

One of the most important settings in a Job spec is restartPolicy, which defines what happens to a Pod when something goes wrong. With restartPolicy set to Never, a failed container is not restarted in place; the Job controller creates a replacement Pod instead, so each attempt leaves its own Pod behind and Job completion is easier to track.

Here is an example of a YAML configuration:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

In this example,

  • apiVersion sets the API version used for the object, here batch/v1.
  • kind sets the object type to Job.
  • metadata contains the name of the Job.
  • spec.template defines the Pods the Job creates; here the Pod runs a Perl command that computes pi.
  • restartPolicy is set to Never, so failed containers are not restarted in place.
  • backoffLimit allows up to four retries before the Job is marked as failed, which improves reliability against transient errors.
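
For comparison, the same Job could use the other permitted restart policy. The following is a minimal sketch, not part of the original example: the name pi-onfailure is hypothetical, and the only functional change is restartPolicy: OnFailure, which asks the kubelet to restart a failed container inside the same Pod rather than having the Job controller create a new Pod.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-onfailure          # hypothetical name, chosen for illustration
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure  # failed containers restart in place, in the same Pod
  backoffLimit: 4               # container restarts still count toward this limit
```

Which policy fits better depends on the workload: Never keeps each failed Pod around for inspection, while OnFailure avoids rescheduling a new Pod for every attempt.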

Parallelism and Completions

Jobs manage batches of work through two settings: parallelism and completions. Parallelism controls how many Pods can run at the same time, while completions sets how many Pods must finish successfully before the Job is considered complete. Tuning these settings lets users balance resource use against how quickly the work finishes.

For example, a non-parallel Job can leave both .spec.parallelism and .spec.completions unset. Both default to 1, so a single Pod is created and the Job completes as soon as that Pod succeeds. Conversely, a parallel Job runs multiple Pods concurrently so that the overall work finishes sooner.

Here’s how to set up these fields:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculation
spec:
  parallelism: 5 # Run 5 Pods in parallel
  completions: 10 # Successful executions should be 10
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

In this example, the Job runs up to 5 Pods in parallel until 10 Pods have completed successfully. The work finishes faster than it would sequentially, and the Job stays robust: failed Pods are retried, here up to the default backoffLimit of 6 because none is set.

Handling Job Failures

Kubernetes Jobs are designed to be resilient, but the tasks they run can still fail. Handling these failures well is important for application reliability. There are two basic mechanisms for managing failures: backoffLimit and podFailurePolicy. The backoffLimit field defines how many times failed Pods are retried before the Job is considered failed. The default of 6 allows several attempts, which improves resilience to transient errors.

For example, a Job can set a backoffLimit of 3:

backoffLimit: 3

With this setting, the Job allows up to three retries across all of its failed Pods before the Job itself is marked as failed.
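
To see where backoffLimit sits in a full manifest and how it behaves, a deliberately failing Job can help. This is a minimal sketch under stated assumptions: the Job name always-fails, the busybox image, and the failing command are all illustrative and do not come from the article.

```yaml
# Hypothetical Job whose container always exits non-zero,
# useful for observing retries and the final Failed status.
apiVersion: batch/v1
kind: Job
metadata:
  name: always-fails            # illustrative name
spec:
  backoffLimit: 3               # field of the Job spec, not of the Pod template
  template:
    spec:
      containers:
      - name: main
        image: busybox:1.36     # assumed image; any image with a shell should work
        command: ["sh", "-c", "echo 'simulated failure'; exit 1"]
      restartPolicy: Never      # each retry runs in a fresh Pod
```

Retries are not immediate: the Job controller waits with an exponentially increasing back-off delay between attempts, so a Job like this usually takes a minute or more to reach its failed state.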

The podFailurePolicy field gives finer-grained control over how Kubernetes handles Pod failures, based on container exit codes or Pod conditions. Each rule specifies an action for the Job to take when a Pod fails in a matching way. It can only be used when the Pod template sets restartPolicy to Never. Here is a simple policy in YAML:

podFailurePolicy:
  rules:
  - action: FailJob
    onExitCodes:
      containerName: main
      operator: In
      values: [1, 2]

In this case, if the container named main exits with code 1 or 2, the Job is marked as failed immediately, without further retries. Together, these mechanisms let Kubernetes handle different types of failure gracefully, which helps keep workloads on the cluster reliable.
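
The policy fragment above only makes sense inside a complete Job, so the following sketch shows one possible placement. The Job name, image, and command are assumptions for illustration, and the second rule (ignoring Pods that failed because of a disruption such as node drain) is an addition not discussed in the article. It also assumes a cluster version in which podFailurePolicy is available.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: failure-policy-demo     # hypothetical name
spec:
  backoffLimit: 6               # still applies to failures no rule short-circuits
  podFailurePolicy:
    rules:
    - action: FailJob           # treat these exit codes as non-retriable
      onExitCodes:
        containerName: main
        operator: In
        values: [1, 2]
    - action: Ignore            # do not count disruptions toward backoffLimit
      onPodConditions:
      - type: DisruptionTarget
  template:
    spec:
      containers:
      - name: main              # must match containerName in the rule above
        image: busybox:1.36     # assumed image
        command: ["sh", "-c", "echo 'doing work'; exit 2"]
      restartPolicy: Never      # required when podFailurePolicy is set
```

Rules are evaluated in order, and the first matching rule decides the action, so more specific rules generally belong before broader ones.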

Conclusions

Kubernetes Jobs are a powerful way to run tasks in a fault-tolerant manner. They drive tasks to completion by handling retries and failures for you. Additional features, such as indexed completion and suspension, make them a key instrument for orchestrating containerized batch work.
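
Indexing and suspension are mentioned here but not shown earlier, so a brief sketch may help. The Job name, image, and command below are hypothetical; the fields completionMode and suspend are part of the batch/v1 Job spec. In Indexed mode each Pod receives its index in the JOB_COMPLETION_INDEX environment variable, and a Job created with suspend set to true creates no Pods until the field is changed to false.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-demo            # hypothetical name
spec:
  completions: 3
  parallelism: 3
  completionMode: Indexed       # Pods get indexes 0, 1, and 2
  suspend: true                 # Job stays idle until this is set to false
  template:
    spec:
      containers:
      - name: worker
        image: busybox:1.36     # assumed image
        # The shell expands the variable injected by the Job controller.
        command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
      restartPolicy: Never
```

A design like this suits work that can be split into a fixed number of shards, with each Pod using its index to pick its share of the input.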