I've used Ansible since 2013 and still maintain some of my original playbooks to this day. They have evolved with Ansible from version 1.4 to the current version (as of this writing, 2.9).
Along the way, as Ansible grew from having dozens to hundreds and now thousands of modules, I've learned a lot about how to make sure my playbooks are maintainable and scalable as my systems grow. Even for simple projects (like the playbook I use to manage my own laptop), it pays dividends to avoid common pitfalls and make decisions that will make the future you thankful instead of regretful.
The three main takeaways from this experience are:
- Stay organized
- Test early and often
- Simplify, optimize
The importance of each lesson I've learned follows in that order, too; it's no use trying to optimize something (point 3) that's already poorly assembled (point 1). Each step builds on the one above, so I'll guide you through each step.
Stay organized
At a bare minimum, you should store your Ansible playbooks in a Git repository. This helps with so many things:
- Once you have a known working state, you can commit the work (ideally, with tags marking major versions, like 1.0.0 for the first stable version and 2.0.0 for an upgrade or rewrite).
- You can always walk back changes if necessary to a previous known-working state (e.g., by using git reset or git checkout <tag>).
- Large-scale changes (e.g., feature additions or a major upgrade) can be worked on in a branch, so you can still maintain the existing playbook and have adequate time to work on major changes.
Storing playbooks in Git also helps with the second important organization technique: run your playbooks from a build server.
Whether you use Ansible Tower, Jenkins, or some other build system, using a central interface for playbook runs gives you consistency and stability. You don't risk having one admin run a playbook one way (e.g., with the wrong version of roles or an old checkout) and someone else running it another way, breaking your servers.
It also helps because it forces you to ensure all your playbook's resources are encapsulated in the playbook's repository and build configuration. Ideally, the entire build (including the job configuration) would be captured in the repository (e.g., through the use of a Jenkinsfile or its equivalent).
Another important aspect of organization is documentation; at a bare minimum, I have a README in every playbook repository with the following contents:
- The playbook's purpose
- Links to relevant resources (CI build status, external documentation, issue tracking, primary contacts)
- Instructions for local testing and development
Even if you have the playbook automated through a build server, it is important to have thorough and correct documentation for how to run the playbook otherwise (e.g., locally in a test environment). I like to make sure my projects are easily approachable, not only for others who might eventually need to work with them but also for myself! I often forget a nuance or dependency when running a playbook, and the README is the perfect place to outline any peculiarities.
Finally, the structure of the Ansible tasks themselves is important, and I like to ensure I have a maintainable structure by having small, readable task files and by extracting related sets of tasks into Ansible roles.
Generally, if an individual playbook reaches around 100 lines of YAML, I'll start breaking it up into separate task files and using include_tasks to include those files. If I find a set of tasks that operates independently and could be broken out into its own Ansible role, I'll work on extracting those tasks and related handlers, variables, and templates.
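As a rough illustration of splitting a long playbook with include_tasks, a top-level playbook might look something like the sketch below. The file names under tasks/ are hypothetical and only stand in for whatever groupings make sense in your project.

```yaml
---
# main.yml: a short top-level playbook that delegates to task files.
# The file names under tasks/ are hypothetical examples.
- hosts: all
  become: true

  tasks:
    - name: Set up user accounts.
      include_tasks: tasks/users.yml

    - name: Install and configure the web server.
      include_tasks: tasks/webserver.yml
```

Each included file then contains a plain list of tasks, which keeps every file small enough to read in one sitting.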
Using roles is the best way to supercharge Ansible playbook maintenance; I often have to do similar tasks in many (if not most) playbooks, like managing user accounts or installing and configuring a web server or database. Abstracting these tasks into Ansible roles means I can maintain one set of tasks to be used among many playbooks, with variables to provide flexibility where needed.
Ansible roles can also be contributed back to the community via Ansible Galaxy if you're able to make them generic and provide the code with an open source license. I've contributed over 100 roles to Galaxy, and they are made better by the fact that thousands of other playbooks (besides my own) rely on them and break if there is a bug in the role.
One final note on roles: If you choose to use external roles (either from Galaxy or a private Git repository), I recommend committing the role to your repository (instead of adding it to a .gitignore file and downloading the role every time you run your playbook) because I like to avoid relying on downloads from Ansible Galaxy for every playbook run. You should still use a requirements.yml file to define role dependencies and define specific versions for the roles so you can choose when to upgrade your dependencies.
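A requirements.yml that pins role versions could look roughly like this. The version numbers are placeholders chosen for illustration, not recommendations; use whichever releases you have actually tested.

```yaml
---
# requirements.yml: pin each external role to a specific version.
# The version numbers below are placeholders, not recommendations.
- src: geerlingguy.apache
  version: "3.1.0"

- src: geerlingguy.docker
  version: "2.6.0"
```

Running ansible-galaxy install -r requirements.yml -p roles/ would then download the pinned roles into a roles/ directory (an assumed location) that you can commit alongside the playbook.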
Test early and often
Ansible allows you to define infrastructure as code. And like any software, it's essential to be able to verify that the code you write does what you expect.
So you should test your Ansible playbooks. When I consider testing for any individual Ansible project I build, I think of a spectrum of CI testing options I can use, going in order from the easiest to the hardest to implement:
- yamllint
- ansible-playbook --syntax-check
- ansible-lint
- Molecule test (integration tests)
- ansible-playbook --check (testing against production)
- Building parallel infrastructure
The first three options (linting and running a syntax check on your playbook) are essentially free; they run very fast and can help you avoid the most common problems with your playbook's task structure and formatting.
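As one possible way to run those three cheap checks on every commit, here is a minimal sketch of a Travis CI configuration. It assumes the playbook is named main.yml and lives at the repository root; both are assumptions for illustration, and any other CI system could run the same three commands.

```yaml
---
# .travis.yml: a hypothetical CI configuration.
# main.yml is an assumed playbook name.
language: python
python: "3.8"

install:
  # Install Ansible and the two linters.
  - pip install ansible yamllint ansible-lint

script:
  # Check YAML formatting across the repository.
  - yamllint .
  # Verify that the playbook parses without running it.
  - ansible-playbook main.yml --syntax-check
  # Check the playbook against ansible-lint's rules.
  - ansible-lint main.yml
```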
Linting and syntax checks provide some value, but unless the playbook is extremely simple, I like to go beyond basic linting and run tests using Molecule. I usually use Molecule's built-in Docker integration to run my playbook against a local Docker instance running the same base OS as my production server. For some of my roles, which I run on different Linux distributions (e.g., CentOS and Debian), I run the Molecule test playbook once for each distro, and sometimes with extra test scenarios for more complex roles.
If you're interested in learning how to test roles with Molecule, I wrote a blog post on the topic a few years ago called Testing your Ansible roles with Molecule. The process for testing full playbooks is similar, and in both cases, the tests can be run within most CI environments (for example, my geerlingguy.apache role runs a set of Molecule tests via Travis CI).
The last two test options, running the playbook in --check mode or building parallel production infrastructure, require more setup work and often go beyond what's necessary for efficient testing processes. But in cases where playbooks manage servers critical to business revenue, they can be necessary.
There are a few other things that are important to watch for when running tests and periodically checking or updating your playbooks:
- Make sure you track (and fix) any DEPRECATION WARNINGs you see in Ansible's output. Usually, you have a year or two before the warning leads to a failure in the latest Ansible version, so the earlier you can update your playbook code, the better.
- Every Ansible version has a porting guide that's extremely helpful when you're updating from one version to the next.
- If you see annoying WARN messages in playbook output when you're using a module like command, and you know you can safely ignore them, you can add warn: no under the args in a task. It's better to squelch these warnings so that more actionable warnings (like deprecation warnings) will be noticed at a glance.
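For the last point, a task that silences a known-harmless warning might look like the sketch below. The paths are hypothetical, and this assumes the Ansible 2.9 era this article describes; newer releases may no longer accept the warn option, so check the porting guide for your version.

```yaml
# Running tar through the command module normally prompts a warning
# suggesting the unarchive module. The paths are hypothetical.
- name: Extract the application archive.
  command: tar -xzf /tmp/example.tar.gz -C /opt/example
  args:
    warn: no
```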
Finally, I like to make sure my CI environments are always running the latest Ansible release (and not locked into a specific version that I know works with my playbooks), because that way I find out right after a new release comes out whether a playbook will break. My build server is locked into a specific Ansible version, which may be one or two versions behind the latest version, so this gives me time to fix any new issues discovered in CI tests before I upgrade my build server to the latest version.
Simplify, optimize
“YAML is not a programming language.”
— Jeff Geerling
Simplicity in your playbooks makes maintenance and future changes a lot easier. Sometimes I'll look at a playbook and be puzzled as to what's happening because there are multiple when and until conditions with a bunch of Python mixed in with Jinja filters.
If I start to see more than one or two chained filters or Python method calls (especially anything having to do with regular expressions), I see that as a prime candidate for rewriting the required functionality as an Ansible module. The module could be maintained in Python and tested independently and would be easier to maintain as strictly Python code rather than mixing in all the Python inline with your YAML task definitions.
So my first point is: Stick to Ansible's modules and simple task definitions as much as possible. Try to use Jinja filters wherever possible, and avoid chaining more than one or two filters on a variable at a time. If you have a lot of complex inline Python or Jinja, it's time to consider refactoring it into a custom Ansible module.
Another common thing I see people do, especially when building out roles the first time, is using complex dict variables where separate "flat" variables may be more flexible.
For example, instead of having an apache role with many options in one giant dict, like this:
apache:
  startservers: 2
  maxclients: 2
Consider using separate flat variables:
apache_startservers: 2
apache_maxclients: 2
The reason for this is simple: Using flat variables allows playbooks to override one particular value easily, without having to redefine the entire dictionary. This is especially helpful when you have dozens (or in some rare cases, hundreds) of default variables in a role.
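To make the difference concrete, here is a sketch of a playbook overriding a single flat variable. The webservers group and the value 10 are assumptions for illustration; the role is assumed to define both apache_startservers and apache_maxclients as defaults.

```yaml
---
# A hypothetical playbook using an apache role with flat default variables.
- hosts: webservers

  vars:
    # Only this value changes; apache_startservers keeps the role's default.
    apache_maxclients: 10

  roles:
    - apache
```

With the dict form, Ansible's default behavior is to replace the whole apache dictionary rather than merge it, so the playbook would have to restate every key just to change one.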
Once the playbook and role code looks good, it's time to start thinking about optimization.
A few of the first things I look at are:
- Can I disable gather_facts? Not every playbook needs all the facts, and it adds a bit of overhead on every run, on every server.
- Can I increase the number of forks Ansible uses? The default is 5, but if I have 50 servers, can I operate on 20 or 25 at a time to vastly reduce the amount of time Ansible takes to run a playbook on all the servers?
- In CI, can I parallelize test scenarios? Instead of running one test, then the next, if I can start all the tests at once, it will make my CI test cycle much faster. If CI is slow, you'll tend to ignore it or not wait until the test run is complete, so it's important to make sure your test cycle is short.
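As a small sketch of the first point, fact gathering can be switched off per play. The task here is a made-up example that needs no facts.

```yaml
---
# A hypothetical play whose task does not rely on any gathered facts.
- hosts: all
  become: true
  gather_facts: false

  tasks:
    - name: Ensure the message of the day is in place.
      copy:
        content: "Managed by Ansible.\n"
        dest: /etc/motd
```

The number of forks, by contrast, is not set in the playbook itself; it can be raised with the forks setting in ansible.cfg or the --forks option to ansible-playbook.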
When I'm looking through tasks in a role or playbook, I also look for a few blatant performance issues that are common with certain modules:
- When using package (or apt, yum, dnf, etc.), if there is more than one package being managed, the list should be passed directly to the name parameter and not via with_items or a loop. This way Ansible can efficiently operate on the whole list in one go instead of doing it package by package.
- When using copy, how many files are being copied? If there is a single file or even a few dozen, it might be fine, but the copy module is very slow if you have hundreds or thousands of files to be copied (better to use a module like synchronize or a different strategy like copying a tarball and expanding it on the server).
- If using lineinfile in a loop, it might be more efficient (and sometimes easier to maintain) to use template instead and control the entire file in one pass.
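For the first of these, passing the list straight to name looks something like the following task. The package names are arbitrary examples.

```yaml
# One task and one package-manager transaction for the whole list.
# The package names are arbitrary examples.
- name: Ensure required packages are installed.
  package:
    name:
      - git
      - curl
      - unzip
    state: present
```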
Once I've gotten most of the low-hanging fruit out of the way, I like to profile my playbook, and Ansible has some built-in tools for this. You can configure extra callback plugins to measure role and task performance by setting the callback_whitelist option under defaults in your ansible.cfg:
[defaults]
callback_whitelist = profile_roles, profile_tasks, timer
Now, when you run your playbook, you get a summary of the slowest roles and tasks at the end:
Monday 10 September 22:31:08 -0500 (0:00:00.851) 0:01:08.824 ******
===============================================================================
geerlingguy.docker ------------------------------------------------------ 9.65s
geerlingguy.security ---------------------------------------------------- 9.33s
geerlingguy.nginx ------------------------------------------------------- 6.65s
geerlingguy.firewall ---------------------------------------------------- 5.39s
geerlingguy.munin-node -------------------------------------------------- 4.51s
copy -------------------------------------------------------------------- 4.34s
geerlingguy.backup ------------------------------------------------------ 4.14s
geerlingguy.htpasswd ---------------------------------------------------- 4.13s
geerlingguy.ntp --------------------------------------------------------- 3.94s
geerlingguy.swap -------------------------------------------------------- 2.71s
template ---------------------------------------------------------------- 2.64s
...
If anything takes more than a few seconds, it might be good to figure out exactly why it's taking so long.
Summary
I hope you learned a few ways you can make your Ansible playbooks more maintainable. As I said in the beginning, each of the three takeaways (stay organized, test, then simplify and optimize) builds on the previous one, so start by making sure you have clean, documented code, then make sure it's well-tested, and finally look at how you can make it even better and faster!
This article is a follow-up to Jeff's presentation, Make your Ansible playbooks flexible, maintainable, and scalable, at AnsibleFest 2018, which you can watch here.