Virtualization for developers, Part 3

Welcome back to the third and final part of our series on creating dynamic, version-controlled virtual development environments using Vagrant, VirtualBox, and Puppet. In this post we'll focus on the final piece of the puzzle: using Puppet itself to provision your virtual environment exactly to your specifications.

Introducing Puppet Manifests and Modules

In Part 2 of this series we showed how Vagrant and Puppet are tied together through the config.vm.provision key in Vagrant's Vagrantfile. To refresh your memory, it looked something like this:

config.vm.provision :puppet do |puppet|
    puppet.options        = "--verbose --debug"
    puppet.manifests_path = "puppet/manifests"
    puppet.module_path    = "puppet/modules"
    puppet.manifest_file  = "site.pp"
    puppet.facter         = {
        "vagrant" => true,
    }
end

Moving on to Puppet itself: when Vagrant provisions your newly booted VM, it runs Puppet with the configuration values from the block above. The entry point to our Puppet manifests is the file named by puppet.manifest_file (here site.pp), which lives in the directory named by puppet.manifests_path (here puppet/manifests). Thus, to get started we need to create a puppet/ directory in our project, along with a puppet/manifests/site.pp file and a puppet/modules directory.
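
As a quick sanity check that the wiring works, a minimal site.pp along the following lines should be enough for a first run. This is only a sketch and not the skeleton app's real file: it assumes the vagrant value passed through puppet.facter in the Vagrantfile arrives as a fact named vagrant, and the message text is purely illustrative.

```puppet
# puppet/manifests/site.pp (minimal sketch for a first provisioning run)

# Facts are exposed as top-scope variables. $::fqdn is a standard fact;
# $::vagrant is assumed to come from the puppet.facter hash in the Vagrantfile.
notice("Provisioning ${::fqdn} (vagrant fact: ${::vagrant})")
```

If the message shows up in the provisioning output, Vagrant is finding your manifest and you can start adding real resources.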

For the purposes of explanation, I am going to reference a PHP skeleton application I created using these concepts. It is freely available on GitHub.

Here's what our bare-bones LAMP stack skeleton app looks like within puppet:

├── manifests
│   ├── nodes
│   │   └── default.pp
│   └── site.pp
└── modules
    ├── app
    │   ├── files
    │   │   ├── config
    │   │   │   ├── development
    │   │   │   │   └── public
    │   │   │   └── ec2
    │   │   │       └── public
    │   │   └── ec2
    │   │       └── aws_assign_ip.sh
    │   ├── manifests
    │   │   ├── codebase.pp
    │   │   ├── database.pp
    │   │   └── webserver.pp
    │   └── templates
    │       └── apache
    │           └── virtualhost
    │               └── vhost.conf.erb
    ├── ec2
    │   ├── files
    │   │   └── ec2-api-tools.zip
    │   ├── manifests
    │   │   └── init.pp
    │   └── templates
    │       └── aws_assign_ip.erb
    └── zendserver
        └── manifests
            ├── apt
            │   └── repo
            │       └── zend.pp
            ├── init.pp
            ├── params.pp
            └── prerequisites.pp

As previously mentioned, the entry point to our Puppet manifests is the puppet/manifests/site.pp file. Before we get too deep, though, let's discuss how Puppet actually works.

The puppet concept

In some ways, Puppet looks like a programming language, and as such it has a tendency to confuse developers who think of puppet manifests in terms of a language. In reality, puppet manifests describe more what has to be done than how it is done. The ultimate result is a tree of dependencies puppet then resolves on the target machine. Confused yet? It takes a little getting used to, but let's try to clear things up.

Puppet manifests contain something Puppet calls types. There are a ton of different types for describing the various requirements of your machine. One example, the exec type, defines a shell command that needs to be executed on the machine. The documentation for this type lists the following parameters (descriptions abbreviated here):

exec { 'resource title':
    command     => # The actual command to execute.  
    creates     => # A file to look for before running 
    cwd         => # The directory from which to run 
    environment => # Any additional environment 
    group       => # The group to run the command as.  
    logoutput   => # Whether to log command output
    onlyif      => # If this parameter is set, then 
    path        => # The search path used for command 
    provider    => # The specific backend to use for 
    refresh     => # How to refresh this command.   
    refreshonly => # The command should only be run...
    returns     => # The expected return code(s). 
    timeout     => # The maximum time the command
    tries       => # The number of times execution
    try_sleep   => # The time to sleep in seconds 
    umask       => # Sets the umask to be used
    unless      => # If this parameter is set, then 
    user        => # The user to run the command as.  
}

Every type you use, wherever it appears, is first given a unique resource name that identifies it in the dependency tree, followed by a series of parameters applicable to the type. For example, the following is taken from puppet/modules/ec2/manifests/init.pp in my skeleton application:

exec { "ec2-api-tools-unzip" :
    command => "/usr/bin/unzip /tmp/ec2-api-tools.zip -d /usr/local",
    creates => "/usr/local/ec2-api-tools-1.6.12.0",
    require => [ File['/tmp/ec2-api-tools.zip'], Package['unzip'], Package['default-jre'] ]
}

This declaration installs the Amazon AWS EC2 command-line tools in the virtual machine using the standard unzip shell command. It can be translated into the following description:

"Execute the shell command and identify it as ec2-api-tools-unzip. This command creates a /usr/local/ec2-api-tools-1.6.12.0 directory, so if that directory already exists, do not run the command again. Before running this shell command, make sure the /tmp/ec2-api-tools.zip File type has been completed, as well as the unzip and default-jre Package types."

Thus, when Puppet processes this particular type it will first make sure it hasn't previously been executed, and it will not execute it until the dependencies listed in the require section have run. This is a key difference between a Puppet manifest and a programming language: in Puppet, where a given type entry appears in a manifest has no bearing on the order in which the types are processed. In our example this exec definition could execute any time after its requirements have been satisfied, regardless of when or where it was actually defined.
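
To make the ordering point concrete, here is a small standalone sketch (not taken from the skeleton app; the resource title report-unzip-version is made up for illustration). The exec is written first in the file, yet it is only applied after the package it requires.

```puppet
# Written first, but applied second because of the require below.
# Note: with no creates/onlyif/unless, this exec runs on every provision.
exec { 'report-unzip-version':
    command   => '/usr/bin/unzip -v',
    logoutput => true,
    require   => Package['unzip'],
}

# Written second, but applied first: the exec above depends on it.
package { 'unzip':
    ensure => present,
}
```

Swapping the two declarations around changes nothing about the result, which is exactly the behaviour that trips up people reading a manifest top to bottom like a script.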

Fundamentally, puppet is simply a collection of these type definitions with their dependencies documented. In fact one could, in theory, place all of the necessary types in a single manifest. However, as a matter of a best practice, these types are typically broken down in a more reasonable fashion through the use of classes and modules.

The puppet manifests

Now that we have at least a basic understanding of puppet types and dependencies, let's take a look at the entry point for our puppet manifests, the puppet/manifests/site.pp file:

info("Configuring '${::fqdn}' (${::site_domain}) using environment '${::environment}'")

# Fix for Puppet working with Vagrant
group { 'puppet': ensure => 'present', }

# Setup global PATH variable
Exec { logoutput => true, path => [
    '/usr/local/bin',
    '/opt/local/bin',
    '/usr/bin',
    '/usr/sbin',
    '/bin',
    '/sbin',
    '/usr/local/zend/bin',
], }

import 'nodes/*.pp'

In this manifest we do a few things. We output a little debugging/info, we use the group type to make sure the puppet group exists (to sidestep an issue with some VMs in Vagrant), and we have an Exec type definition before importing additional scripts from puppet/manifests/nodes/*.pp.

This demonstrates another important notion regarding Puppet manifests. If you recall, in our original exec type example we used the lower-case exec to define the action, whereas in the site.pp example above we use the upper-cased Exec. This is not a typo; the two declarations mean different things to Puppet.

In the first example, the lower case signifies an actual definition and action to perform. Using the capitalized version of the type, however, is akin to defining default values. In this case, we use Exec to define some default paths for shell executions and set the logoutput configuration value to true by default.
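
A short sketch of how the two forms interact may help. The resource titles and commands below are hypothetical and chosen only because they exist on most Linux systems; the point is that a lower-case declaration inherits whatever the capitalized form set, unless it overrides an attribute itself.

```puppet
# Defaults for every exec declared in this scope (and scopes below it).
Exec {
    path      => ['/usr/bin', '/bin'],
    logoutput => true,
}

# Picks up both defaults, so the command does not need a full path.
exec { 'show-hostname':
    command => 'hostname',
}

# An explicit attribute wins over the default, for this resource only.
exec { 'quiet-uptime':
    command   => 'uptime',
    logoutput => false,
}
```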

We're not done yet, though. Let's take a look at the single node manifest, puppet/manifests/nodes/default.pp, which is pulled in by the import statement in site.pp:

node default {
    include apt
    include stdlib
    include git

    case $::environment {
        development: {
            include app::database
            include app::webserver
            include app::codebase

            sysctl::value { 'vm.overcommit_memory': value => '1' }
        }
        ec2: {
            include app::codebase
            include app::webserver
            include app::database
            include ec2
        }
    }

    package { 'unzip' :
        ensure => present
    }

    package { 'vim' :
        ensure => present
    }

    package { 'autoconf' :
        ensure => present
    }

    package { 'make' :
        ensure => present
    }

    sysctl::value { 'fs.file-max': value => '100000' }

    exec { "apt-get clean" :
        command => "/usr/bin/apt-get clean"
    }

    exec { "apt-update":
        command => "/usr/bin/apt-get update",
        require => [ Exec['apt-get clean'] ]
    }
}

While it is outside the scope of this particular series, I will make note of a new construct in this example, the node construct. Puppet is designed to allow you to create a single collection of manifests which represents different types of machines and build configurations. One of the ways a build configuration is chosen is by the name of the machine the manifests are applied to, normally its fully qualified DNS name, which is matched against the node construct (i.e. node 'www.example.com' instead of node default). For our purposes the special default node is sufficient.
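
For the curious, a multi-machine setup might be sketched as follows. The host names are hypothetical, and the split of classes between machines is my own illustration rather than anything the skeleton app does.

```puppet
# Matches one machine by its exact name.
node 'www.example.com' {
    include app::webserver
    include app::codebase
}

# A regular expression can match a whole family of machines.
node /^db\d+\.example\.com$/ {
    include app::database
}

# Fallback for any machine not matched above.
node default {
    include app::database
    include app::webserver
    include app::codebase
}
```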

Next, you can see an example of one of the few control structures available within manifests, the case statement, which functions very similarly to a switch statement in a language such as PHP. Here we look at the $::environment variable and, based on its value, decide which modules and classes to include. Since these manifests are designed to deploy either to a Vagrant VM or to an AWS instance, we do slightly different things in each case, and other things regardless, such as ensuring certain basic packages like unzip are installed.
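
One thing the manifest above does not have is a fallback branch, so an unexpected environment silently gets only the common resources. A sketch of how a default branch could guard against that is shown here; the failure message is my own wording, and the branch bodies are trimmed down for brevity.

```puppet
case $::environment {
    'development': {
        include app::webserver
    }
    'ec2': {
        include app::webserver
        include ec2
    }
    default: {
        # Abort with a clear message instead of half-configuring the box.
        fail("Unsupported environment '${::environment}'")
    }
}
```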

Using puppet modules

Now that we are through the basics of Puppet, let's move on to the construction of classes and modules. You'll notice that our default.pp example above includes a number of classes such as app::webserver. These references tie into the modules defined in the puppet/modules directory, specifically the puppet/modules/app module.

Puppet modules are organized in a standard fashion, and are broken down into the following (basic) structure:

    ├── app
    │   ├── files
    │   │   ├── config
    │   │   │   ├── development
    │   │   │   │   └── public
    │   │   │   └── ec2
    │   │   │       └── public
    │   │   └── ec2
    │   │       └── aws_assign_ip.sh
    │   ├── manifests
    │   │   ├── codebase.pp
    │   │   ├── database.pp
    │   │   └── webserver.pp
    │   └── templates
    │       └── apache
    │           └── virtualhost
    │               └── vhost.conf.erb

At a minimum, a module consists of a manifests/ directory containing files that represent the classes in the module. The init.pp file, if it exists, maps to a class with the same name as the module. Any other class, <modulename>::<classname>, maps to <modulename>/manifests/<classname>.pp.
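
As an illustration of that mapping, consider a hypothetical motd module (it is not part of the skeleton app) made up of two files. The comments show which file each class would live in and how it would be included.

```puppet
# File: puppet/modules/motd/manifests/init.pp
# Loaded by: include motd
class motd {
    include motd::banner
}

# File: puppet/modules/motd/manifests/banner.pp
# Loaded by: include motd::banner
class motd::banner {
    file { '/etc/motd':
        ensure  => file,
        content => "Provisioned by Puppet\n",
    }
}
```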

So in our case, if we are loading for example the app::webserver class, we will be looking at the puppet/modules/app/manifests/webserver.pp file which is as follows:

class app::webserver {

    class { 'composer':
        target_dir    => '/usr/local/bin',
        composer_file => 'composer',
    }

    class { 'apache': }

    class { 'zendserver':
        php_version => $::php_version,
        use_ce      => false
    }

    file { "/usr/local/bin/pear" :
        target  => '/usr/local/zend/bin/pear',
        ensure  => 'link',
        require => [ Class['zendserver'] ]
    }

    apache::vhost { $::site_domain :
        docroot       => "/vagrant/public",
        ssl           => true,
        priority      => '000',
        env_variables => [
            "APPLICATION_ENV $::environment"
        ],
        require       => [ Package['apache'] ]
    }

    exec { "bootstrap-zs-server" :
        command => "/usr/local/zend/bin/zs-manage bootstrap-single-server --acceptEula TRUE -p 'password'; touch /var/local/zs-bootstrapped",
        cwd     => "/usr/local/zend/bin/",
        require => [ Class['zendserver'] ],
        creates => "/var/local/zs-bootstrapped"
    }

    file { "/etc/profile.d/server_env.sh" :
        content => "export APPLICATION_ENV=$::environment",
        owner   => root,
        group   => root,
        mode    => 755
    }

    # Disable the default (catch-all) vhost
    exec { "disable default virtual host from ${name}":
        command => "a2dissite default",
        onlyif  => "test -L ${apache::params::config_dir}/sites-enabled/000-default",
        notify  => Service['apache'],
        require => Package['apache'],
    }
}

While we won't go through every single line of code in this manifest, the job of this class is to set up the web server on the machine. It makes use of a number of third-party modules (discussed in Part 2 of this series) to install and configure Apache, Zend Server, and Composer.

Let's take a look at one more small example, specifically the app::codebase class which is used to initialize the code base for the application and can take care of things such as running composer update, etc.

class app::codebase {

    info("Deploying Codebase for environment $environment")

    file { "/vagrant/public/.htaccess" :
        group  => "www-data",
        owner  => "root",
        mode   => 775,
        source => "puppet:///modules/app/config/$::environment/public/.htaccess"
    }

    composer::exec { 'update-codebase' :
        cmd       => "update",
        cwd       => "/vagrant",
        logoutput => true
    }
}

The reason I wanted to show this specific piece of code was to explain something in the file type used in this class, the source attribute. The source attribute determines where the contents of the file being referenced will be loaded from, which can exist within the puppet module itself in the <module>/files directory. So, in this example, the source of:

puppet:///modules/app/config/$::environment/public/.htaccess

will reference the puppet/modules/app/files/config/<env>/public/.htaccess file. Any changes made to this file are automatically copied to the virtual machine every time the machine is provisioned, and it's an excellent way to keep configuration files version-controlled and managed.
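
The same mapping applies to any other file under the module's files/ directory. As a sketch, the aws_assign_ip.sh script shown in the directory tree earlier could be deployed like this; the destination path, ownership, and mode are assumptions of mine and not necessarily what the skeleton app does.

```puppet
# puppet:///modules/app/ec2/aws_assign_ip.sh resolves to
# puppet/modules/app/files/ec2/aws_assign_ip.sh (note that files/ is implied).
file { '/usr/local/bin/aws_assign_ip.sh':   # hypothetical destination
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0755',
    source => 'puppet:///modules/app/ec2/aws_assign_ip.sh',
}
```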

Conclusion

I'll be the first to admit that this is most certainly a crash course on Puppet in the context of virtualization for developers. We did not cover everything in Puppet by a long shot, but we did cover the basics. Coupled with the skeleton PHP application, you should have a pretty good foundation to start hacking around on your own. Keeping the Puppet docs handy is helpful, and I'm happy to answer any questions you have. Throw us a comment below.