using amazon s3 for backups

 2009-04-25

I don’t have a backup system for home (which is where this site and others are hosted), and I have generally relied on duplicating enough of my important stuff between friends and other computers. On top of that, I have a RAID5 setup for my large storage, and home directories and website stuff are on a RAID1 ZFS volume. This doesn’t prevent accidental “oh-no”s, but it does protect me from some hardware failures.

Last year when I upgraded to the new server, I lost a lot of data because I forgot to back up all of my MySQL databases. I like to think I can learn from my mistakes, so a full year later I finally did something about it and signed up for Amazon’s S3 service.

The pricing is pretty nice, and I don’t have all that much data to back up. I figure I’ll use up a few GB in total and keep the monthly price around $1 - $2. That seems worth it for off-site backups.

Now, I have 3 main websites that I need to back up, and one test site that I like to play around with:

http://www.m87-blackhole.org/ <- This is the first domain that I owned, and it’s the site where my family checks out new photos
http://www.willowoakboarding.com/ <- My parents’ site for their boarding ranch. I’m glad they have no concept of an SLA, or that things need to be backed up :)
http://www.mywushublog.com/ <- This site of course, where I claim my own identity on the internet
http://www.evil-genius-network.com/ <- a test domain, but now I run a little OpenID service, For one…

After a quick “FreeBSD s3 backup” Google search, I found Gary Dalton’s blog post: http://dvector.com/oracle/2008/10/18/backing-up-to-amazon-s3/. After reading this post, I formulated my plan of attack:

Sign up for S3, create a “bucket” for each site
Use something to interface with S3 (duplicity)
Automate MySQL and PostgreSQL backups
Create a service account to run both s3 and db backup scripts as
Set up a cron job for backups

So, after I signed up for S3, I had to create the buckets. I couldn’t find a way to do this through my Amazon account settings, so I wrote a little Ruby script.

$ sudo gem install aws-s3
$ vim make-bucket.rb

 
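    #!/usr/local/bin/ruby
    
    require 'rubygems'  # needed on Ruby 1.8 so the aws-s3 gem can be found
    require 'aws/s3'
    
    # Authenticate every later S3 call with the account's credentials.
    AWS::S3::Base.establish_connection!(
      :access_key_id     => 'my-s3-key-id',
      :secret_access_key => 'my-s3-secret-access-key'
    )
    
    # One bucket per site.
    AWS::S3::Bucket.create('mywushublog')
    AWS::S3::Bucket.create('willowoak')
    AWS::S3::Bucket.create('m87-blackhole')
    AWS::S3::Bucket.create('evil-genius-network')

$ chmod +x make-bucket.rb
$ ./make-bucket.rb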

Next, I had to install duplicity and py-boto:

[root@server ~] cd /usr/ports/sysutils/duplicity
[root@server duplicity] make install
...
[root@server duplicity] cd ../../devel/py-boto
[root@server py-boto] make install clean
...
[root@server py-boto]

Next step, create a user (with access to shared data, and website data) to run the backups with the adduser command…

[root@server py-boto] adduser -g shared-data -G www -s /bin/tcsh -w random s3backupuser
...
[root@server py-boto] su - s3backupuser

In tcsh, you can ‘set autolist’ to have the shell automatically show all the possible matches when doing filename/directory expansion.

I had to set my Access Key ID and Secret Access Key in the s3backupuser’s environment, as well as a GnuPG passphrase so the backups are encrypted (and compressed). I mean, I trust Amazon, but not THAT much :)

% vim .cshrc
setenv AWS_ACCESS_KEY_ID my-s3-key-id
setenv AWS_SECRET_ACCESS_KEY my-s3-secret-access-key
setenv PASSPHRASE AVeryRandomPassphraseForGnuPG

Next, I copied the very useful automysqlbackup.sh script into a separate script for each website. I could have just dumped every database that was running, but I wanted to segregate each site’s databases into a different directory. So, I’m complicating my cron job by running multiple backup scripts, but I really want the end result to be easily readable and identifiable by me. So for each site, I created a directory under /u01/backups:

%ll /u01/backups/
total 8
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:46 evil-genius-network
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:47 m87-blackhole
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:46 mywushublog
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:47 willowoak
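
The per-site directories in the listing above could be created in one pass. This is only a sketch: `make_backup_dirs` is a made-up helper name, and it sets the mode to match the listing but does not touch the owner or group.

```sh
#!/bin/sh
# Sketch: create one dump directory per site under a backup root.
# make_backup_dirs is a hypothetical helper, not part of any tool used here.
make_backup_dirs() {
    root=$1
    shift
    for site in "$@"; do
        # -p: no error if the directory already exists
        mkdir -p "$root/$site" || return 1
        # rwxr-x---, the mode shown in the listing
        chmod 750 "$root/$site" || return 1
    done
}
```

It would be called as `make_backup_dirs /u01/backups evil-genius-network m87-blackhole mywushublog willowoak`. The mysql group seen in the listing would still need a separate chgrp.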

Next was the s3-backups.sh script, which is very crude and simple. If I’m really motivated I’ll make it nicer, but I’m lazy, and if I don’t need any more functionality I’ll just leave it. One thing I initially forgot was that I had set my Amazon S3 variables in the user’s .cshrc. That is not a good place for them; it was just handy while I was running the duplicity commands manually. cron runs the script with /bin/sh and never reads .cshrc, so I had to set the variables in the script itself, otherwise the cron job would fail.
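
A missing or misspelled variable only shows up once cron runs the script, so a small guard near the top can make that failure obvious. This is a sketch and not part of the original script; `require_env` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: stop early if any named variable is unset or empty.
require_env() {
    for name in "$@"; do
        # Indirect lookup: read the value of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
}
```

In a backup script it could sit right after the keys are assigned, for example `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE || exit 1`.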

~/bin/s3-backups.sh:

 
#!/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/home/s3/bin

# Amazon S3 keys, and GnuPG keys
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
PASSPHRASE=
export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY
export PASSPHRASE

echo "*************************************************"
echo "*   Backing up Website content....              *"
echo "*                                               *"
echo "*     www.willowoakboarding.com...              *"
duplicity /www/www.willowoakboarding.com s3+http://s3.amazon.com/willowoak/www
echo "*     www.mywushublog.com...                    *"
duplicity /www/www.mywushublog.com s3+http://s3.amazon.com/mywushublog/www
echo "*     www.m87-blackhole.org...                  *"
duplicity /www/www.m87-blackhole.org s3+http://s3.amazon.com/m87-blackhole/www
echo "*************************************************"
echo "*   Backing up databases....                    *"
echo "*                                               *"
echo "*     www.willowoakboarding.com...              *"
duplicity /u01/backups/willowoak s3+http://s3.amazon.com/willowoak/db
echo "*     www.mywushublog.com...                    *"
duplicity /u01/backups/mywushublog s3+http://s3.amazon.com/mywushublog/db
echo "*     www.m87-blackhole.org...                  *"
duplicity /u01/backups/m87-blackhole s3+http://s3.amazon.com/m87-blackhole/db
echo "*************************************************"
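
If the script ever does get the nicer treatment, the six duplicity calls could be driven from a list. The sketch below reuses the paths and URLs from the script above; `backup_all`, the `SITES` list format, and the `DUPLICITY` override are assumptions made for illustration. Unlike the original, it backs up each site’s files and database together instead of all sites first and all databases second.

```sh
#!/bin/sh
# Sketch: the same duplicity calls as above, driven by a list.
# DUPLICITY is a hypothetical override so the loop can be previewed
# with DUPLICITY=echo before anything is uploaded.
DUPLICITY=${DUPLICITY:-duplicity}

# "document-root-name:bucket" pairs, taken from the commands above.
SITES="www.willowoakboarding.com:willowoak
www.mywushublog.com:mywushublog
www.m87-blackhole.org:m87-blackhole"

backup_all() {
    for entry in $SITES; do
        domain=${entry%%:*}   # text before the colon
        bucket=${entry#*:}    # text after the colon
        "$DUPLICITY" "/www/$domain" "s3+http://s3.amazon.com/$bucket/www" || return 1
        "$DUPLICITY" "/u01/backups/$bucket" "s3+http://s3.amazon.com/$bucket/db" || return 1
    done
}
```

Setting `DUPLICITY=echo` and calling `backup_all` prints the six argument lists without contacting S3; a real run would call `backup_all` after the keys are exported.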

And last but not least, a cron job to tie it all together:

% crontab -e
@weekly ~/bin/s3-backups.sh
@weekly ~/bin/mywushublog-mysql-backup.sh
@weekly ~/bin/willowoak-mysql-backup.sh
@weekly ~/bin/m87-blackhole-mysql-backup.sh
@weekly ~/bin/evil-genius-network-mysql-backup.sh
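
One thing to watch: @weekly is shorthand for `0 0 * * 0`, so all five entries start in the same minute, and the upload may begin before the dump scripts have finished writing that week’s files. A possible fix is a single wrapper that runs the dumps first and the upload last. This is a sketch; `run_in_order` and the wrapper file name in the note below are hypothetical.

```sh
#!/bin/sh
# Sketch: run each command in turn and stop at the first failure,
# so a failed dump is not followed by an upload of stale data.
run_in_order() {
    for script in "$@"; do
        "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```

Saved as, say, ~/bin/weekly-backup.sh with a final line such as `run_in_order "$HOME/bin/mywushublog-mysql-backup.sh" "$HOME/bin/willowoak-mysql-backup.sh" "$HOME/bin/m87-blackhole-mysql-backup.sh" "$HOME/bin/evil-genius-network-mysql-backup.sh" "$HOME/bin/s3-backups.sh"`, it would replace the five crontab lines with one.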

I can check the status of a backup by running duplicity with the ‘collection-status’ flag:

%duplicity collection-status s3+http://s3.amazon.com/mywushublog/db
date = "full"
Collection Status
-----------------
Connecting with backend: BotoBackend
Archive dir: None
Found 0 backup chains without signatures.
Found a complete backup chain with matching signature chain:
-------------------------
Chain start time: Sat Apr 25 15:08:02 2009
Chain end time: Sat Apr 25 15:08:02 2009
Number of contained backup sets: 1
Total number of contained volumes: 1
Type of backup set:                            Time:      Num volumes:
Full         Sat Apr 25 15:08:02 2009                 1
-------------------------
No orphaned or incomplete backup sets found.

I can also list the files:

%duplicity list-current-files s3+http://s3.amazon.com/mywushublog/db
date = "full"
Sat Apr 25 15:05:11 2009 .
Sat Apr 25 15:05:10 2009 daily
Sat Apr 25 15:05:10 2009 daily/mywushublog
Sat Apr 25 15:05:10 2009 monthly
Sat Apr 25 15:05:10 2009 weekly
Sat Apr 25 15:05:11 2009 weekly/mywushublog
Sat Apr 25 15:05:11 2009 weekly/mywushublog/mywushublog_week.17.2009-04-25_15h05m.sql.gz
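
Listing files is only half of the check; pulling one back out proves the backup is usable. The sketch below assumes a duplicity version that accepts the `restore` command with `--file-to-restore`, and that the AWS keys and PASSPHRASE are already in the environment. `restore_file` and the `DUPLICITY` override are hypothetical names.

```sh
#!/bin/sh
# Sketch: restore a single file from a duplicity backup.
# DUPLICITY can be set to echo to preview the command line.
DUPLICITY=${DUPLICITY:-duplicity}

restore_file() {
    # $1 = backup URL, $2 = path inside the backup, $3 = local destination
    "$DUPLICITY" restore --file-to-restore "$2" "$1" "$3"
}
```

For the dump listed above, that would be `restore_file s3+http://s3.amazon.com/mywushublog/db weekly/mywushublog/mywushublog_week.17.2009-04-25_15h05m.sql.gz /tmp/mywushublog.sql.gz`.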

Pretty sweet automated backup process. It is a lot cheaper than tapes or additional disk storage. With S3, I also don’t have to worry about buying additional hardware or maintaining a tape library or tape drive (which is what I had a few years ago, and what a headache that was).