Ansible Roles - Best Practices

This article was updated on December 26, 2024, to include advanced debugging tips, real-world examples, and best practices for creating and managing Ansible Roles, based on my latest experiences with production-grade automation setups.

What are Ansible Roles?

Ansible Roles are a standardized way of organizing reusable automation tasks. You can think of a role as a complete package that includes all the files, variables, and tasks needed to achieve a certain objective, such as installing nginx.

Let me tell you a story from my early days as a DevOps engineer. I worked for an e-commerce company and was tasked with deploying the same nginx configuration across more than 50 servers. The traditional approach would be to SSH into every server, copy the config files, and restart the services. You get the idea: very time-consuming and prone to human error.

That's when I started looking into Ansible Roles.

Here is how I explain Ansible Roles to my junior teammates:

  • Think of Ansible Roles as LEGO pieces.
  • Each role is one particular piece with one job.
  • You combine these pieces (roles) into larger structures called playbooks.
  • Best part? You can reuse the pieces over and over.

Let me give you a real-world example. When I create an nginx role, it typically covers:

  • Installing nginx
  • Setting up configuration files
  • Managing SSL certificates
  • Security hardening
  • Starting and monitoring the service

Once I write this role, I can apply it to any number of servers whenever I need to. And when I need to make changes? I update the role once, and the change reaches every server the next time the playbook runs. This approach has saved me hours of work and spared me quite a few midnight incidents.
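To make the reuse concrete, here is a minimal sketch of a playbook that applies one role to a whole group of hosts. The file name site.yml, the inventory group webservers, and the nginx_port override are assumptions for illustration, not part of my actual setup.

```yaml
# site.yml (hypothetical playbook name)
---
- name: Configure every web server with the same role
  hosts: webservers          # assumed inventory group
  become: yes
  roles:
    - role: nginx
      vars:
        nginx_port: 8080     # illustrative per-play override of a role default
```

Adding a server then means adding it to the inventory group; the role itself does not change.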

TL;DR

After repeatedly breaking production environments and losing sleep over debugging Ansible playbooks, I learned that properly structured roles aren't just a "nice to have"; they are a must for keeping your sanity in modern DevOps.

Key takeaways from my experience:

  • Small, focused roles are better than large monolithic ones.
  • Role naming and variable conventions matter more than you think.
  • Testing roles before production is not optional. I learned that the hard way.

The Story Behind This Post 🕶️

It was 3 AM on a Saturday when I received the page: our e-commerce platform was completely down. The offender? A "small" change I had made to the Ansible role controlling our Nginx configs. That night taught me more about Ansible roles than any tutorial ever has.

After a decade of using Ansible, breaking stuff, fixing stuff, and occasionally getting it right, I'll share what I learned, not from books or documentation but from real-world experience.

How I Structure My Ansible Roles (And Why)

Let me illustrate with a real example from my current project, which manages more than 200 servers:

roles/
  nginx/
    tasks/
      main.yml        # I keep core tasks here
      ssl.yml         # SSL stuff (learned to separate this after a cert mishap)
      security.yml    # Hardening configs (added after a security incident)
    handlers/
      main.yml        # Restart handlers (be careful with these!)
    defaults/
      main.yml        # Default vars (document these well!)
    vars/
      main.yml        # Environment-specific vars
    templates/
      nginx.conf.j2   # The template that caused the 3 AM incident
    files/
      ssl-cert.pem    # Keep these secure!
    meta/
      main.yml        # Dependencies matter more than you think
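
Splitting tasks into ssl.yml and security.yml only helps if main.yml pulls them in. A minimal sketch of how that could look is below; the tag names are assumptions, and the install task is a stand-in for whatever your core tasks are.

```yaml
# roles/nginx/tasks/main.yml (sketch; tag names are assumptions)
---
- name: Install nginx
  package:
    name: nginx
    state: present

- name: Configure SSL
  import_tasks: ssl.yml
  tags: [nginx_ssl]

- name: Apply hardening
  import_tasks: security.yml
  tags: [nginx_security]
```

With tags like these you could run only the SSL part with --tags nginx_ssl when all you need is a certificate change.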

The Nginx Role That Took Down Production

Remember that 3 AM incident I mentioned? Here's the role that caused it, and how I fixed it:

# What I had before (don't do this):
- name: Configure nginx
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: restart nginx

# What I learned to do instead:
- name: Backup existing nginx config
  command: cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.backup
  args:
    creates: /etc/nginx/nginx.conf.backup # Only runs while no backup exists yet

- name: Configure nginx
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    validate: nginx -t -c %s # This saved me many times later
  notify: restart nginx

# Handlers normally run at the end of the play, so trigger the restart now
- name: Apply the pending restart before verifying
  meta: flush_handlers

- name: Verify nginx is running
  uri:
    url: http://localhost
    return_content: yes
  register: nginx_check
  failed_when: "'Welcome' not in nginx_check.content" # Assumes the default welcome page is served
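
If you want the rollback to happen automatically, one option is to wrap the deployment in a block with a rescue section. This is a sketch under the assumption that the backup file from the task above exists; it is not the exact fix I shipped that night.

```yaml
# Sketch: deploy, restart, verify, and roll back if any step fails
- name: Deploy nginx config with rollback
  block:
    - name: Render and validate the new config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: nginx -t -c %s

    - name: Restart nginx with the new config
      service:
        name: nginx
        state: restarted

    - name: Verify nginx answers over HTTP
      uri:
        url: http://localhost
        status_code: 200
  rescue:
    - name: Restore the previous config from the backup
      copy:
        src: /etc/nginx/nginx.conf.backup
        dest: /etc/nginx/nginx.conf
        remote_src: yes

    - name: Restart nginx with the restored config
      service:
        name: nginx
        state: restarted

    - name: Mark the host as failed so the problem stays visible
      fail:
        msg: "nginx deployment failed; previous config was restored"
```

The final fail task matters: without it, a rescued block counts as success and the broken change could go unnoticed.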

Lessons Learned the Hard Way

1. Role Names Matter More Than You Think

I used to name roles like this (and regret it):

# Please don't do this (like I did)
roles/setup-stuff
roles/configure-things
roles/my-nginx-role

# Do this instead (learned after much pain)
roles/nginx_config # Clear purpose
roles/mysql_install # Easy to find
roles/redis_cluster # Self-documenting

2. Variable Management: A Story of Conflict

Last month, we had a production issue because two roles used the same variable name. Here's how to avoid that:

# This caused conflicts (from my early days)
vars:
  port: 80
  user: www-data

# This saved us later
vars:
  nginx_port: 80
  nginx_user: www-data
  nginx_worker_processes: "{{ ansible_processor_vcpus }}"
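
Prefixing avoids collisions, but it does not catch bad values. A small validation task at the top of the role can fail early with a readable message. This is a sketch; the file name validate.yml and the specific checks are assumptions.

```yaml
# roles/nginx/tasks/validate.yml (hypothetical file name)
---
- name: Validate nginx role variables before touching the server
  assert:
    that:
      - nginx_port | int > 0
      - nginx_port | int < 65536
      - nginx_user | length > 0
    fail_msg: "nginx_port must be 1-65535 and nginx_user must not be empty"
    quiet: yes
```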

3. Dependencies - The Hidden Gotcha

One of our roles silently failed because it needed another role which wasn't listed in dependencies. Now I always do:

# meta/main.yml
dependencies:
  - role: common_base
    vars:
      base_packages: ['curl', 'vim']
  - role: security_baseline
    vars:
      security_level: high
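
Dependencies in meta/main.yml always run before the role. When a dependency should be optional, a task-level include is an alternative worth considering. The sketch below assumes a hypothetical toggle variable named nginx_apply_security_baseline.

```yaml
# roles/nginx/tasks/main.yml (sketch; the toggle variable is hypothetical)
- name: Apply the security baseline only where it is wanted
  include_role:
    name: security_baseline
  vars:
    security_level: high
  when: nginx_apply_security_baseline | default(true) | bool
```

The trade-off is visibility: meta dependencies are declared in one place, while include_role calls are spread through the task files.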

Debugging Tips From the Trenches

When things go wrong (and they will), here's what I check first:

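# Raise verbosity from the command line when debugging:
#   ansible-playbook -i inventory playbook.yml -vv
# (ansible_verbosity is a magic variable that only reports the current -v level;
#  setting it under vars does not make the output more verbose)
- hosts: webservers
  roles:
    - role: nginx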

I also use Ansible's callback plugins (for example profile_tasks and profile_roles) and logging to check execution time per role. That helped me spot several performance bottlenecks, such as a role making superfluous API calls that stretched a 2-minute run to 15 minutes.
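
Inside a role, debug tasks with a verbosity threshold keep normal runs quiet and still give you detail when you ask for it. A minimal sketch, assuming the nginx_check result and the nginx_port and nginx_user variables from the earlier examples exist:

```yaml
- name: Show the registered health check result (printed only with -vv or higher)
  debug:
    var: nginx_check
    verbosity: 2

- name: Show the values the role actually resolved
  debug:
    msg: "port={{ nginx_port }} user={{ nginx_user }}"
```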

Creating Ansible Roles Step-by-Step

Now let me walk you through building an Ansible role with a real example. In our case, we are going to create a simple role called cicube_nginx to configure and secure Nginx.

Step 1: Create the Role Structure

First, create the directory structure of the role. This keeps the role organized and easier to maintain.

ansible-galaxy init cicube_nginx

Step 2: Define the Role Tasks

Now, add the tasks to install and configure Nginx. Open the tasks/main.yml file and add the following:

# tasks/main.yml

- name: Install Nginx
  apt:
    name: nginx
    state: present
    update_cache: yes

- name: Configure Nginx
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    validate: nginx -t -c %s
  notify: Restart Nginx
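
The handler only fires when the template changes, so it can help to state explicitly that the service should be running and enabled. A sketch of a task you might append to tasks/main.yml:

```yaml
- name: Ensure Nginx is started and enabled at boot
  service:
    name: nginx
    state: started
    enabled: yes
```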

Step 3: Create a Template for Nginx Configuration

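Create the nginx.conf.j2 file inside the templates directory. This will serve as your Nginx configuration template. Because it is written to /etc/nginx/nginx.conf, it has to be a complete main config, so the server block sits inside events and http contexts. The port and server name come from the role defaults defined in Step 5.

events {}

http {
    server {
        listen {{ nginx_port }};
        server_name {{ nginx_server_name }};
        root /var/www/html;

        location / {
            index index.html;
        }
    }
}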

Step 4: Define Handlers in handlers/main.yml

Handlers are used to perform actions once they have been notified by tasks. Here is how you could define a handler to restart Nginx:

# handlers/main.yml

- name: Restart Nginx
  service:
    name: nginx
    state: restarted
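
A full restart drops open connections. For plain config changes, a reload handler is often enough. The sketch below adds one next to the restart handler; the listen topic name is an assumption, and tasks would notify it with notify: nginx config changed.

```yaml
# handlers/main.yml (sketch of an additional handler)
- name: Reload Nginx
  service:
    name: nginx
    state: reloaded
  listen: nginx config changed   # hypothetical topic name
```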

Step 5: Add Variables in defaults/main.yml

Define default variables for your role in defaults/main.yml:

# defaults/main.yml
---
nginx_port: 80
nginx_server_name: "{{ ansible_hostname }}"
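
Values in defaults/main.yml have the lowest precedence, so they are easy to override per environment without touching the role. A sketch of an inventory group_vars file, assuming a group called webservers and made-up values:

```yaml
# group_vars/webservers.yml (illustrative values)
---
nginx_port: 8080
nginx_server_name: shop.example.com
```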

Step 6: Test the Role

To test your role, create a small playbook that applies it:

# playbook.yml
---
- name: Test cicube_nginx Role
  hosts: webservers
  become: yes
  roles:
    - role: cicube_nginx

Then run it against your inventory:

ansible-playbook -i inventory playbook.yml
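
A successful run only tells you that the tasks completed. As a sketch, a second play can check the result on the hosts themselves. The play below is an assumption about what is worth checking, not part of the original role.

```yaml
# verify.yml (hypothetical file name)
# Worth running first:
#   ansible-playbook -i inventory playbook.yml --syntax-check
#   ansible-playbook -i inventory playbook.yml --check --diff
---
- name: Verify cicube_nginx Role
  hosts: webservers
  become: yes
  gather_facts: no
  tasks:
    - name: Wait until Nginx accepts connections
      wait_for:
        port: "{{ nginx_port | default(80) }}"
        timeout: 30

    - name: Confirm the deployed config passes the Nginx self-test
      command: nginx -t
      changed_when: false
```

Running playbook.yml a second time is also a cheap idempotence check: apart from handlers, nothing should report changed.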

Sample Terminal Output

Here's what the output looks like when we execute the playbook:

PLAY [Test cicube_nginx Role] **************************************************

TASK [Gathering Facts] *********************************************************
ok: [webserver1]
ok: [webserver2]

TASK [cicube_nginx : Install Nginx] ********************************************
changed: [webserver1]
changed: [webserver2]

TASK [cicube_nginx : Configure Nginx] ******************************************
changed: [webserver1]
changed: [webserver2]

RUNNING HANDLER [cicube_nginx : Restart Nginx] *********************************
changed: [webserver1]
changed: [webserver2]

PLAY RECAP *********************************************************************
webserver1 : ok=4 changed=3 unreachable=0 failed=0
webserver2 : ok=4 changed=3 unreachable=0 failed=0

Conclusion

So after many years of working with Ansible roles, countless production incidents, and several sleepless nights, I would say one thing: take the time to structure your roles. It might look like more work now, but I promise your future self and your team will appreciate it.

Remember:

  • Start small and iterate
  • Name things clearly
  • Test prior to production (seriously)
  • Document your variables
  • Keep your roles focused

Above all, learn from your mistakes; I certainly did from mine.