Benjamin Sago / ogham / cairnrefinery / etc…

Provision Vagrant machines in parallel

If you’re running a multi-machine Vagrant setup, you’re probably aware of how long it takes for the entire cluster of machines to start up. Vagrant, by default, sets machines up one after another: it launches one virtual machine, provisions it, then launches the next, provisions that, and so on.

This is a good, simple default mode of operation. But if you’re provisioning these machines with Ansible, a tool that’s deliberately made for running the same commands on hordes of machines at the same time, you might think this is a little inefficient.

Luckily, the Vagrant documentation offers a trick: there’s a way to run Ansible in parallel, against all your machines at once. Here’s what the necessary configuration changes look like; the lines to add are the if machine_name == MACHINES.last check (with its closing end) and the a.limit = 'all' setting:

Vagrantfile
MACHINES = %w[machine-the-first machine-the-second machine-the-third].freeze
Vagrant.configure(2) do |config|

  MACHINES.each do |machine_name|
    config.vm.define(machine_name) do |machine|

      # machine-specific config goes here

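      # Only the last machine gets a provisioner, so Ansible runs just once.
      if machine_name == MACHINES.last
        machine.vm.provision :ansible do |a|
          a.compatibility_mode = '2.0'
          a.playbook = 'playbook.yml'
          # Override Vagrant's default limit (the current machine) to target every machine.
          a.limit = 'all'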
        end
      end

    end
  end
end

The way this works is a bit of a hack. Instead of simply being able to tell Vagrant to “just run in parallel”, this trick works by disabling provisioning for all machines but the last, and then configuring provisioning for the final machine in a way that runs Ansible against all machines.

This approach comes with a downside, however: you can no longer provision one machine individually. Because Vagrant only “sees” the final machine with a provision block, trying to provision the final machine will provision all of them, and trying to provision another machine will just do nothing.
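To make that downside concrete, here is a minimal plain-Ruby sketch that models which machines Ansible would touch when you ask Vagrant to provision a single machine. It does not call Vagrant at all; provision_targets is a hypothetical helper invented for illustration, and it assumes the Vagrantfile shown above.

```ruby
# Plain-Ruby model of the Vagrantfile above; no Vagrant required.
# `provision_targets` is a hypothetical helper, not part of Vagrant.
MACHINES = %w[machine-the-first machine-the-second machine-the-third].freeze

# Returns the machines Ansible would run against when a single machine
# is provisioned by name under the parallel trick.
def provision_targets(name, machines = MACHINES)
  # Only the last machine has a provision block, and its limit is 'all'.
  name == machines.last ? machines.dup : []
end

MACHINES.each do |name|
  puts "#{name}: #{provision_targets(name).inspect}"
end
```

The first two machines map to an empty list and the last one maps to the whole cluster, which is the all-or-nothing behaviour described above.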

So how much time does it save?

Benchmarking

I put Vagrant through its paces by measuring the time taken to run vagrant up (the grey bars) followed by vagrant provision, with and without the parallel trick enabled (the coloured bars).

I used Vagrant 2.2.16 and VMware Fusion 12.1.2 to run the tests. You’ll definitely get different results on your own system, as these numbers are affected by all manner of variables, but the general trend should be the same.

Here are the results:

Time taken by vagrant up (grey bars) and vagrant provision (coloured bars):

Machines   vagrant up   vagrant provision (serial)   vagrant provision (parallel)
8          262 s        560 s                        224 s
7          228 s        510 s                        218 s
6          198 s        457 s                        189 s
5          163 s        366 s                        151 s
4          130 s        276 s                        130 s
3          98 s         217 s                        117 s

The results are pretty conclusive: provisioning in parallel this way provides a significant speedup — far more than I anticipated before running the test — and the more machines you are provisioning in parallel, the more time you save. Vagrant ended up spending more time bringing my machines up than it did provisioning them!
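To put numbers on that trend, the short Ruby sketch below works out the time saved and the speedup factor for each cluster size. The timings are copied from the results above; the constant and helper names are made up for this example.

```ruby
# Provisioning timings in seconds, copied from the benchmark results above.
# Each entry: machine count => [serial provision, parallel provision].
PROVISION_SECONDS = {
  8 => [560, 224],
  7 => [510, 218],
  6 => [457, 189],
  5 => [366, 151],
  4 => [276, 130],
  3 => [217, 117]
}.freeze

# Returns [seconds saved, speedup factor] for one pair of timings.
def savings(serial, parallel)
  [serial - parallel, (serial.to_f / parallel).round(2)]
end

PROVISION_SECONDS.each do |count, (serial, parallel)|
  saved, factor = savings(serial, parallel)
  puts format('%d machines: saved %d s (%.2fx faster)', count, saved, factor)
end
```

With these figures, the saving grows from 100 s at three machines to 336 s at eight, where parallel provisioning is 2.5 times faster than serial.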

Toggling this on and off

If you do provision individual machines often but the thought of shaving several minutes off of your provisioning time simply sounds too good to pass up, it’s possible to configure the Vagrantfile so you can switch this behaviour off and on again with an environment variable.

The changes look like this:

Vagrantfile
MACHINES = %w[machine-the-first machine-the-second machine-the-third].freeze
Vagrant.configure(2) do |config|

  MACHINES.each do |machine_name|
    config.vm.define(machine_name) do |machine|

      # machine-specific config goes here

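      # Without ANSIBLE_QUICK, every machine gets its own provisioner;
      # with it, only the last machine does.
      if machine_name == MACHINES.last || !ENV['ANSIBLE_QUICK']
        machine.vm.provision :ansible do |a|
          a.compatibility_mode = '2.0'
          a.playbook = 'playbook.yml'
          # In quick mode, target every machine instead of just this one.
          a.limit = 'all' if ENV['ANSIBLE_QUICK']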
        end
      end

    end
  end
end

With this in place, you can set the ANSIBLE_QUICK environment variable while provisioning for the first time, taking advantage of parallelism when it’s most useful, while still being able to easily re-provision individual machines later without having to think about it.
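One detail worth knowing is how Ruby treats the variable's value. The sketch below models the toggle in plain Ruby, with no Vagrant involved; quick_mode? and plan are hypothetical helpers, and :self is just a placeholder for Vagrant's default limit (the machine being provisioned).

```ruby
# Plain-Ruby model of the toggle; `quick_mode?` and `plan` are hypothetical
# helpers, not part of Vagrant.
MACHINES = %w[machine-the-first machine-the-second machine-the-third].freeze

# Mirrors the Vagrantfile's check: any set value counts, even '' or '0',
# because every String is truthy in Ruby. Only an unset variable is nil.
def quick_mode?(env = ENV)
  !env['ANSIBLE_QUICK'].nil?
end

# Maps each machine to the Ansible limit it would be provisioned with:
# 'all', :self (Vagrant's default, the machine itself), or nil (no provisioner).
def plan(env = ENV, machines = MACHINES)
  quick = quick_mode?(env)
  machines.to_h do |name|
    limit = if quick
              name == machines.last ? 'all' : nil
            else
              :self
            end
    [name, limit]
  end
end

p plan({})                         # variable unset: every machine provisions itself
p plan({ 'ANSIBLE_QUICK' => '1' }) # only the last machine, with limit 'all'
p plan({ 'ANSIBLE_QUICK' => '0' }) # still quick mode: '0' is truthy
```

In a POSIX shell, that would mean running ANSIBLE_QUICK=1 vagrant up for the first boot, then plain vagrant provision machine-the-first later on. Setting ANSIBLE_QUICK=0 does not switch quick mode off; the variable has to be unset.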