configuration management with salt stack

2013-03-26

I’ve often made the remark that the open source community is a fickle crowd. There is always a new fork, or a new hot development team that everyone is clamoring to be involved in.

I strive for stability, but I’m not going to stick with something for historical reasons. If I find a better tool, I’ll spend a good amount of time working with it before I call it production-ready.

For example:

  • UFS -> ZFS
  • Apache -> Nginx
  • PHP -> Ruby/Python
  • Slackware -> RHEL -> Fedora -> Ubuntu…
  • KDE -> Gnome -> Xfce
  • FreeBSD -> FreeBSD
    • If it ain’t broke, don’t fix it :)

After hearing about Salt, a distributed execution engine, over a year ago, my co-worker and I had been keeping close tabs on it. We attended an LSPE meeting where all sorts of folks talked about their own configuration management experience. The overall message was “We used Puppet, but we felt Salt had a more accommodating community”.

Another good description is “Salt is a configuration management tool done right the first time”.

Once I switched to pkgng, which Salt supported, and we had a handful of new systems to test and deploy with, it quickly became the preferred tool.

My big headaches with Puppet were the following:

  • Out of the box, Puppet did not scale. You had to set up Passenger and Apache to get any reasonable amount of performance
    • I figured this out early, and I never had production scaling issues
  • Ruby made the manifest execution order unpredictable, and the only way to work around that was a chain of dependencies, notifies, and requires that became difficult to keep track of
  • For a while, you had to be careful about which version of Puppet and mod_passenger you were running (and of course, Ruby), but this had stabilized
  • Some folks in charge of Puppet are kind of arrogant, and it shows in how the product is managed. Their business plan and how aggressively they push the commercial version of Puppet were a turn-off.
  • After 2.5, I was never able to actually use the external node database. They came out with PuppetDB to solve this issue, but it was not an officially released tool in FreeBSD’s Ports. I had to pull it down from GitHub and build it manually
    • It was riddled with sloppy Linux-isms (silly things like assuming /bin/sh is actually bash, or not specifying gmake vs. cmake vs. make)
  • There was a disdain for FreeBSD/BSD in general
  • It seemed like every release broke how I used variable lookups. Puppet never really felt stable in that sense, so it was frustrating.

Salt on the other hand:

  • “Batteries Included” mindset. The single py-salt port/package has the master and minion. It works great out of the box and scales very well
    • Having a fast task queueing system like ZeroMQ built in makes a difference
  • It’s FAST. Python is fast, and Salt’s parallel execution is noticeably faster than Puppet ever was
    • It also helps that I switched from Portmaster/Portupgrade to our own pkgng environment
  • Execution of states works the same way, every time.
  • I like how I can have private data per host or group of hosts (pillars) and static system information (grains), and use both in the state files and the template engine.
    • Probably my killer feature. I LOVE that I can keep bind credentials in a pillar and never let them seep into the state files (a minimal sketch of the pattern follows this list)
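
To make that concrete, here is a minimal sketch of the pattern. The pillar key, paths, and file names are made up for illustration; the point is that the secret lives only in a pillar file, while the state and template just reference it (alongside a grain).

ldap.sls (pillar, hypothetical):

# the bind password never appears in the state tree
ldap_bindpw: 'not-a-real-password'

ldap_init.sls (state, hypothetical):

/usr/local/etc/ldap.conf:
  file.managed:
    - source: salt://ldap/ldap.conf.jinja
    - template: jinja
    - mode: 600

ldap.conf.jinja (template, hypothetical):

# pillar lookup for the secret, grain lookup for host-specific data
bindpw {{ pillar['ldap_bindpw'] }}
host ldap.{{ grains['domain'] }}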

The SaltStack team is very receptive to feedback, and I have yet to see Hatch turn down a pull request. Following the mailing list, you constantly see Hatch saying things like “That’s a good idea, let’s work on that”. I filed a feature request myself and it was well received and resolved in a few days.

Basically, there have yet to be any bikesheds, and I like that. That is probably more because Salt is still new and there are no real egos involved yet.

It also helps that there is strong FreeBSD support and there are active patches.

Did I mention it is new?

Yeah, there are some areas that I hope they improve quickly. Logging and validation are two biggies for me. Right now, if you mess up a pillar, your minions will not show ANY pillar data, and they won’t tell you why either. If you muck up a template file, it is difficult to see exactly where.

I really enjoyed Puppet’s validate flag. I have a Jenkins job that quickly and simply validates the basic syntax of my .pp files. This quickly points out most of the issues we hit while building new modules.

Let’s get down to the real core reason why anyone should switch a critical piece of infrastructure software.

Does this help us deploy systems better and faster? Is there more or less accountability between the new and old system? How often am I surprised by version changes?

I don’t really know, not yet at least. Rob likes it a lot more than Puppet, so that virtue by itself doubles the productivity of the system.

What I can do, is show two different ways of doing something.

To start off easy, let’s do the de facto SSH server example, which seems to be the first thing anyone ever does.

The Puppet way:

Layout

environments/production/modules/ssh
├── manifests
│   ├── client.pp
│   ├── init.pp
│   └── server.pp
└── templates
    └── sshd_config.erb

Manifests

class ssh {
    include ssh::server, ssh::client 
}
class ssh::client inherits ssh {
    if $bootstrap::is_public != "yes" {
            file { "/etc/pam.d/sshd":
                owner   => 0,
                group   => 0,
                content => template("ssh/pam/$operatingsystem/sshd.erb"),
            }
    }
}
class ssh::server inherits ssh {
    service { "sshd":
        name    => $operatingsystem ? {
            Ubuntu    => "ssh",
            default   => "sshd",
        },
        enable  => true,
        ensure  => running,
        require => File['/etc/ssh/sshd_config'],
    }

    file { "/etc/ssh/sshd_config":
        ensure  => present,
        owner   => 0,
        group   => 0,
        mode    => 600,
        content => template("ssh/sshd_config.erb"),
        notify  => Service['sshd'],
    }

    group { "local-allow":
        ensure    => present,
        name  => 'local-allow',
        gid       => '1000',
        provider  => $osfamily ? {
            FreeBSD              => pw,
            /(RedHat|Debian)/    => groupadd,
        }
    }
}

sshd_config.erb

<% if scope.lookupvar("bootstrap::is_public") == "yes" %>
ListenAddress 0.0.0.0
<% else %>
ListenAddress <%= ipaddress %>
<% end %>

Protocol 2

<% if operatingsystem != "FreeBSD" %>
SyslogFacility AUTHPRIV
<% end %>

KerberosAuthentication yes
KerberosOrLocalPasswd yes
KerberosTicketCleanup yes

GSSAPIAuthentication yes
GSSAPICleanupCredentials yes

UsePAM yes

AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL

<% if kernel == "FreeBSD" %>
Subsystem   sftp    /usr/libexec/sftp-server
<% elsif kernel == "Linux" %>
Subsystem        sftp    /usr/libexec/openssh/sftp-server
<% else %>
Subsystem       sftp    /usr/libexec/sftp-server
<% end %>

AllowGroups wheel local-allow <% if scope.lookupvar("bootstrap::is_public") == "yes" %>syncwww<% end %>

Pretty simple. We manage a service that gets notified whenever the sshd_config template changes.

What is missing here is the openssh-server package that is required on some Linux distros.

The Salt Way

Layout

basepkgs/
├── init.sls
└── pkg.conf.jinja
groups
├── absent.sls
└── init.sls
ssh
├── absent.sls
├── init.sls
├── keys
│   ├── absent.sls
│   └── init.sls
└── server
    ├── absent.sls
    ├── init.sls
    └── sshd_config.jinja
users
└── init.sls

Right off the bat, you can see I implemented things a bit differently with Salt.

The Puppet way was that each module should be an isolated and independent unit. So an ssh module had to manage the group, keys, users… all of it, because referencing facts or variables across classes was not really encouraged (I did it to a degree anyway, and it bit me a few times).

With Salt, the entire collection of states is part of “top”, so I can be pretty granular (pun intended) with the states.
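
For context, the top file is what ties all of those granular states to minions. A sketch of what mine could look like, using the state directories from the layout above (the targeting is simplified; real top files can also match on globs, grains, and more):

top.sls

base:
  '*':
    - basepkgs
    - groups
    - users
    - ssh
    - ssh.server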

States

ssh_init.sls

{% if grains['kernel'] == 'Linux' %}
openssh-server:
  pkg:
    - installed
{% endif %}

include:
   - ssh.keys

ssh_keys_init.sls

# SSH keys for users
include:
  - ssh

AAAAB3NzaC1yc2EA___THIS__IS__MY__KEY2_3333123w1:
    ssh_auth:
        - present
        - user: mikec
        - enc: ssh-rsa

ssh_server_init.sls

include:
  - ssh

sshd:
  service:
    - running
    - enable: True
    - watch:
      - file: /etc/ssh/sshd_config

/etc/ssh/sshd_config:
  file.managed:
    - source: salt://ssh/server/sshd_config.jinja
    - template: jinja

sshd_config.jinja

Protocol 2

KerberosAuthentication yes
KerberosOrLocalPasswd yes
KerberosTicketCleanup yes

GSSAPIAuthentication yes
GSSAPICleanupCredentials yes

UsePAM yes

AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL

Subsystem   sftp    /usr/libexec/sftp-server

{% if pillar['allowed_users'] %}AllowUsers {% for user in pillar['allowed_users'] %} {{ user }} {% endfor %}{% endif %}

users_init.sls

# one file.directory state per user listed in the pillar
{% for user in pillar['allowed_users'] %}
home-dir-{{ user }}:
   file.directory:
      - name: /home/{{ grains.realm.split('.')[0] }}/{{ user }}
      - user: {{ user }}
      - mode: 700
{% endfor %}

local-allow:
   group.present:
      - gid: 1000
      - system: True

That last bit right there with the AllowUsers directive is a simple example of using Pillars for unique and private minion information in a state.

The Pillar that makes that possible is this:

Pillars

allowed_users:
   - mike
   - rob
   - service_account
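
One detail worth calling out: pillar data lives in its own tree with its own top file, so that list still has to be assigned to minions. Assuming the pillar above is saved as something like ssh.sls under the pillar root (the file name and glob are placeholders), the pillar top file would look like:

base:
  '*':
    - ssh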

That’s a simple SSH module, and you’ll notice my Salt implementation still needs work, since I don’t have any conditionals for OS-specific sshd settings yet.
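
If I were to port the OS conditionals from the ERB template, a rough Jinja sketch (reusing the same kernel grain as the package state earlier) might look like this in sshd_config.jinja:

{% if grains['kernel'] == 'Linux' %}
SyslogFacility AUTHPRIV
Subsystem       sftp    /usr/libexec/openssh/sftp-server
{% else %}
Subsystem       sftp    /usr/libexec/sftp-server
{% endif %}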

Perhaps in another post I’ll show what I’ve done for some more complex states (internal vs. DMZ systems, FTP with LDAP authentication, etc.).