unified infrastructure management with salt

2014-06-16

Preamble

I gave this talk at SaltConf2014, but I felt I still did not convey how killer salt has been at Bay Photo Lab. This post will be sort of the Director’s Cut, where I’ll flesh out the slides.

I’ve worked in a few different environments, and oddly, each position change at LLNL had an entire evolution of systems management.

So, I’ve worked for teams that did everything the manual (i.e., hard) way, and I’ve worked with teams that had many specialists for each discipline of IT.

When I came to Bay Photo Lab, it was effectively me and Rob. We now have two full-time helpdesk employees (Mathew and Christina) who have done a fantastic job, but either way, it has put us in the position to automate all platforms with the same tool:

Salt.

Now, what is amazing about salt is that it’s feature-rich and very powerful on both our open source platforms and Windows. My standards for quality software on Windows may be a bit low (free tools), so when we first started using

About Me

This blog serves as a much better description of who I am. If I have to distill it down to a cute set of viewgraphs, it’s this:

  • Unix/Linux Administrator
  • Open source advocate
  • Right tool for the job advocate
skills = {
  'OS':[
    'bsd',
    'linux',
    'solaris',
    'irix',
  ],
  'DB':[
    'postgresql',
    'mysql',
    'couchdb',
  ]
}

Let’s call that my preferred baseline of skills. People may sort of chuckle at the Irix and Solaris parts (and, I could technically add HP-UX and AIX on there…), but it’s sort of the fun spice of my career.

I noted in my talk that I love a diverse environment. I completely understand the draw and attraction for uniformity (single distribution), and it makes complete sense from an efficiency point of view. However, I think I had the most fun in my career (at LLNL) when every day, I got to work on Solaris, IRIX, BSD and Linux. What made that even more interesting was how I could manage and update those platforms!

At LLNL, I had stood up Ubuntu, Solaris and FreeBSD update mirrors. I discovered Blastwave (community-built packages for Solaris) and Nekoware (same for Irix), and it relieved me of what was, at the time, a very time-consuming part of my job: compiling modern software.

Times have completely changed. It ages me to say that I am a “UNIX Administrator”, because no one looks at it that way. I could be called “DevOps”, or “SRE”, but either way I have a rich background in managing a multitude of platforms.

This is only partially related to Salt, so let’s get back to the history of my skillset.

  • My work at LLNL (in relation to configuration management)
    • Implemented Puppet
    • Used configuration management to implement NNSA security guidelines (SSCL)
    • Had not managed a Windows system in years
dev_ops = {
  'devops':[
    'puppet',
    'splunk',
  ]
}

>>> skills.update(dev_ops)

At LLNL, I had a hard time getting traction with Puppet. It was not well received, and it conflicted with what another group was doing. Here was the issue, though: that group was adamant about what they would (and mostly, would not) support. They only wanted to support Red Hat, and their tooling to secure and harden those systems was essentially a script run at build time. It didn’t continually enforce, nor did it report back… these were things Puppet did very well, and it was simple to adapt it to many platforms.

The issue I faced as a UNIX admin there was that my user base really wanted things like Ubuntu and Fedora. RHEL was really lagging in the area of Python and Ruby, and when people wanted to run Chrome… well, it just wasn’t feasible. I found myself and my team members manually compiling software!

So, I gave users what they wanted, and I satisfied the security requirements. Ta-Da!

Eventually, I found a better fit for me. I moved on to Bay Photo Lab, where I never have to justify useful technology or get into business justification for tools.

  • Joined Bay Photo Lab in 2011
  • Inherited new responsibilities
    • Active Directory
    • MSSQL Servers
    • Windows Desktops
required_responsibility = {
  'Microsoft':[
    'Windows Server',
    'XP',
    'MSSQL'
  ]
}

>>> skills.update(required_responsibility)

>>> print skills
{
  'DB': [
    'postgresql', 
    'mysql', 
    'couchdb'
  ],
  'OS': [
    'bsd', 
    'linux', 
    'solaris', 
    'irix'
  ], 
  'devops': [
    'puppet', 
    'splunk'
  ], 
  'Microsoft': [
    'Windows Server', 
    'XP', 
    'MSSQL'
  ]
}

Here I am, in a fast-paced small company with a whole lot of freedom and also a whole lot more responsibility. I am no longer a specialist; I am more of a generalist.

One other point I want to stress is that I am not a developer. I do, however, have a few badges on Codecademy.

This joke flew like a proper lead balloon during my talk. I’ll leave the cleverness to the real comedians.

About Bay Photo Lab

Bay Photo Lab is an amazing company. We’re a professional photo lab originally from Santa Cruz, CA, but have recently moved to Scotts Valley.

The company has been producing high-quality products for professional photo studios for over 35 years.

Internally I like to joke that sometimes Bay Photo succeeds despite itself. Why? Because the photo industry in general is small, and the quality of software is very poor. Most of it seems to be geared towards small print labs where it’s okay to have one or two desktops (yes, desktops) running the order processing service. Almost none of it is designed to run as a background service. We literally have people watching status bars on desktops they VNC into. So on occasion, as a company, we have to brute force something.

Our Environment

Staff

We’re a mixture of business professionals and production staff. During the holidays, we can have up to 300 production employees.

This is a very different culture than, let’s say, LLNL or Legato, where a large majority of users are not just computer literate, but highly educated (Masters and PhD levels) and able to communicate clearly.

So on occasion, the production staff creates a solution to a problem that is completely organic in nature and sometimes horrific :)

I can only blame our department, myself included, because we are not always in tune with the needs of production. We have been much better over the past two years.

IT

Small IT staff:

  • Digital Operations Manager (Patrick)
  • 2 full-time Desktop Support (Mathew and Christina!)
  • 1 full-time SysAdmin (me)
  • .5 Developer + .5 SysAdmin (Rob)

Some days this feels adequate; other days, not nearly enough.

There are so many projects and aspects of our infrastructure that require dedicated time and attention, and when you’re on the same staff that fights fires on a daily basis, it can be difficult to make progress.

However, we use great tools! Salt is not the only one; I’ve completely fallen in love with Atlassian products. JIRA and Stash are essential tools, and they are the only way I can keep track of progress on the long-running projects.

This goes back to my point of being a right-tool-for-the-job advocate. I love how open source software works, and that it’s essentially “free”. However, if a tool warrants a purchase, I’ll do it. I know I can use things like Redmine, but there is no competition for JIRA in my opinion.

What was funny is that during my Q&A, someone asked how I manage those applications :) They can be a little painful, but I have managed to create salt states that handle the setup, init scripts, and databases. Just now downloading the zip file from Atlassian.

Desktops

Here is the Breakdown:

  • 200+ Workstations
    • 98 XP, 187 Windows 7
      • Customer Service
      • Color Correction
      • Digital Artwork
    • 2 FreeBSD!
      • IT Staff, duh
    • 3 OSX
      • web team

We do have pockets of XP… the users’ workstations have been upgraded at this point, but remember above when I said that the photo industry is filled with poorly written software? Yeah, some things are only “supported” on XP, and we still have to test out a handful of applications before we can migrate them to Windows 7.

And yes, I use FreeBSD as my primary desktop. I have a Windows laptop that I’ll use for Google Hangouts or other unsupported applications.

This works out for me; I can focus really well on my work with very few distractions.

Servers

  • 50+ Servers
    • 35 FreeBSD
    • 8 Windows Server
    • 6 Linux
  • Wide Range of purpose
    • Virtualization cluster (Xen on ScientificLinux)
    • Storage (ZFS + Samba)
    • Internal Tools
      • Wikis
      • Atlassian tools (Stash and JIRA)
      • Log servers + Splunk
      • Internal CA
    • Public facing tools
      • WWW
      • API
      • MailMain
      • FTP
      • LDAP

This is my arena. I don’t handle much desktop support (it has to be a real disaster if I touch someone else’s computer).

Manufacturing Equipment

I can’t say too much here. Most of these are specialty production machines that do one product. They rarely have any sort of network capabilities and are not managed by us.

With that, we probably have 50+ of these systems, and sadly, a lot of them use Windows XP as a controller. They also usually use a hardware key (dongle) for authorization, and the tech support is typically a guy that you call.

Why am I mentioning any of this?

Because Bay Photo is a real manufacturing environment: we make stuff. It is literally the main stage attraction, so it takes precedence.

It was different when I worked at LLNL or Legato, where the “product” was tangible in a different sense.

History

State of IT in 2011

When I started, Bay Photo had very little automation. When I interviewed, Rob was specifically looking for someone with Puppet experience.

What we did have was:

  • F.O.G. server for image deployments (we still use it for Windows desktops and service table systems)
  • Windows Group Policy Objects

Software installation was passed down, in the oral tradition. That is the way of our people.

Other aspects like Nagios and Backups were updated manually.

What We Tried

Puppet, of course. It made total sense, since I had a lot of experience with it.

I’ve gone over why I switched to salt here. The gist was, we felt Salt was faster, cleaner, easier to manage, and had a killer community to work with.

Once we felt that Salt was mature enough, and we had some set of feature parity, we slowly phased Puppet out.

Where We Are Today

Salt is deployed on all platforms, it is part of our base image.

Salt also allowed us to reduce our base FOG images since we can easily deploy the latest LibreOffice, Java, and our internal financial client.

We still have components that require further automation (bacula and icinga), but we are getting there.

The entire IT staff is using salt! This is a big feat; it’s the first time in my career that both Windows and UNIX admins have a standard toolset. We all learn and benefit from each other.

  • Core Desktop Platform:
    • LibreOffice
    • Financial System
    • Java Runtime
    • Department-specific desktop shortcuts

We also use salt daily for:

  • Service control
  • File management
  • Grains are very useful for Inventory management
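
As a rough illustration of the inventory use case, here is a minimal sketch that tallies minions by operating system. It assumes the grain data has already been collected on the master into a dictionary keyed by minion ID, which is the shape that something like `salt.client.LocalClient().cmd('*', 'grains.item', ['os', 'osrelease'])` is expected to return. The function name, minion names, and sample values below are made up for the example.

```python
from collections import Counter


def inventory_by_os(grains_by_minion):
    """Count minions per (os, osrelease) pair.

    grains_by_minion maps a minion ID to a dict of its grains.
    """
    counts = Counter()
    for minion_id, grains in grains_by_minion.items():
        # A minion that did not answer may come back without grain data.
        if not isinstance(grains, dict):
            continue
        key = (grains.get('os', 'unknown'), grains.get('osrelease', 'unknown'))
        counts[key] += 1
    return counts


# Hypothetical sample data; real values would come from the master.
sample = {
    'desk-01': {'os': 'Windows', 'osrelease': '7'},
    'desk-02': {'os': 'Windows', 'osrelease': 'XP'},
    'desk-03': {'os': 'Windows', 'osrelease': '7'},
    'web-01': {'os': 'FreeBSD', 'osrelease': '10.0'},
}

for (os_name, release), count in sorted(inventory_by_os(sample).items()):
    print('{0} {1}: {2}'.format(os_name, release, count))
```

Keeping the counting separate from the Salt call means the same function works on a saved dump of grain data, which is handy when you just want a quick head count of XP machines.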

I wrote a custom grain for Windows profiles. I don’t even want to discuss the use of the VirtualStore on Windows, but to push out a particular configuration file, I had to have salt look up each available c:\Users\ or c:\Documents and Settings\ directory:

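import os
import platform


def user_profiles():
    r"""
    The user_profiles grain lists the contents of:
        Unix:
            /home/DISCDRIVE
        Windows:
            7:
                c:\Users\
            XP:
                c:\Documents and Settings\
    """
    grains = {}
    profile_path = None

    system = platform.system()
    if system == 'Windows':
        release = platform.win32_ver()[0]
        # Raw strings keep the backslashes from being read as escapes
        if release == 'XP':
            profile_path = r'c:\Documents and Settings'
        elif release == '7':
            profile_path = r'c:\Users'
    elif system in ('FreeBSD', 'Linux'):
        # ('FreeBSD' or 'Linux') evaluates to just 'FreeBSD', so test
        # membership instead
        profile_path = '/home/DISCDRIVE'

    # Unknown platform or missing directory: return no grain instead of raising
    if profile_path is None or not os.path.isdir(profile_path):
        return grains

    grains['user_profiles'] = os.listdir(profile_path)

    return grains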

It could be cleaner, but in the end, it allowed me to do this:

include:
  - labworks.reports

labworks:
  pkg.installed:
    - name: labworks

{% for user in grains.user_profiles %}
LW_{{ user }}.ini:
  file.managed:
    - name: c:/Users/{{ user }}/AppData/Local/VirtualStore/Windows/LW.ini.test
    - source: salt://lw/LW.ini.jinja
    - template: jinja
    - context:
        lw_server: biz-2

{% endfor %}
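
One thing the loop above will also pick up is directories that are not real users. On Windows 7, c:\Users typically contains entries such as Public, Default and All Users, and each of those would get its own LW.ini state. A possible refinement, sketched here as an assumption and not as what runs in production, is to filter those names out inside the grain before returning the list. The function name and the skip list are hypothetical.

```python
# Directory names that are not real user profiles.
# This skip list is an assumption; adjust it for your own machines.
SYSTEM_PROFILES = frozenset([
    'all users',
    'default',
    'default user',
    'public',
])


def filter_profiles(entries):
    """Return only the entries that look like real user profiles."""
    users = []
    for name in entries:
        # Windows file names are case-insensitive, so compare in lower case.
        if name.lower() in SYSTEM_PROFILES:
            continue
        # Skip stray files such as desktop.ini.
        if name.lower().endswith('.ini'):
            continue
        users.append(name)
    return sorted(users)


print(filter_profiles(['Public', 'mathew', 'Default', 'christina', 'desktop.ini']))
```

In the grain, this would wrap the `os.listdir()` result, so the state file itself would not need to change.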

What Problems has Salt Solved?

With Salt at the core of our infrastructure, we now treat all platforms the same, and four team members have Salt at their fingertips. This is great for both sides (us cool open source guys, and then those freedom-hating Windows slaves :) ). It provides nice cross-pollination, and we all learn a little bit more about the internals.

Salt’s utility abstraction has allowed me to ignore how each distribution and platform manages services, packages/software, and network information.
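
To make that concrete, here is a toy sketch of the kind of per-platform dispatch that the abstraction saves you from writing by hand. This is not Salt's implementation; the function name and the command table are illustrative assumptions, and the commands are simply the package manager invocations you would otherwise type on each platform. With Salt, the equivalent is a single `pkg.install` call regardless of the target.

```python
# Toy per-platform dispatch table, keyed by OS family.
# Salt handles this internally (and far more thoroughly, Windows included);
# this only illustrates the idea.
INSTALL_COMMANDS = {
    'FreeBSD': ['pkg', 'install', '-y'],
    'Debian': ['apt-get', 'install', '-y'],
    'RedHat': ['yum', 'install', '-y'],
}


def install_command(os_family, package):
    """Build the argument list that installs a package on one platform."""
    try:
        base = INSTALL_COMMANDS[os_family]
    except KeyError:
        raise ValueError('unsupported platform: {0}'.format(os_family))
    # Copy the base command so the table itself is never modified.
    return list(base) + [package]


for family in sorted(INSTALL_COMMANDS):
    print(family, install_command(family, 'vim'))
```

Every new platform means another row in a table like this, plus all the edge cases behind it, which is exactly the work I am happy to leave to Salt.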

What else?

Ownership.

This was a really important piece for me, and one I tried to stress during my talk. Salt is so well managed, that I don’t feel terrified that I’ve rolled it out. The community itself has been fantastic to work with.

The conference itself proved this point fantastically. I met some great people, filed some nice feature requests, and updated the bootstrap tool for FreeBSD.

I don’t like to be the owner of technology; I don’t want it to limit what I can do for the business. I want to have the option of letting another person step up and take on things like Salt, and SaltStack makes that possible.

Fin

Alright, that’s enough.

I think it was great that SaltStack let me talk, and they have been very supportive of what I’ve done (with my blog especially).