Using Amazon S3 for backups


I don’t have a backup system at home (which is where this site and others are hosted), so I have generally relied on duplicating enough of my important stuff between friends and other computers. On top of that, I have a RAID5 setup for my large storage, and home directories and website data are on a RAID1 ZFS volume. This doesn’t protect against accidental “oh-no”s, but it does protect me from some hardware failures.

Last year, when I upgraded to the new server, I lost a lot of data because I forgot to back up all of my MySQL databases. I like to think I can learn from my mistakes, so a full year later I finally did something about it and signed up for Amazon’s S3 service.

The pricing is pretty nice, and I don’t have all that much data to back up. I figure I’ll use a few GB in total and keep the monthly price around $1 - $2. That seems worth it for off-site backups.
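As a rough check on that estimate, here is a minimal sketch of the arithmetic. The rate of $0.15 per GB-month is an assumption, not a quoted price, and transfer and request charges are ignored; `monthly_cost` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough monthly storage cost. RATE is an assumed price per GB-month,
# not a quoted one; transfer and request fees are ignored.
RATE="${RATE:-0.15}"

# monthly_cost GB -> prints the estimated dollars per month
monthly_cost() {
    awk -v gb="$1" -v rate="$RATE" 'BEGIN { printf "%.2f\n", gb * rate }'
}

for gb in 2 5 10; do
    echo "${gb} GB: \$$(monthly_cost "$gb") per month"
done
```

Setting `RATE` in the environment before running the script swaps in a different price without editing it.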

Now, I have 3 main websites that I need to back up, and one test site that I like to play around with.

After a quick “FreeBSD s3 backup” Google search, I found Gary Dalton’s blog post. After reading it, I formulated my plan of attack:

  • Sign up for S3, create a “bucket” for each site

  • Use something to interface with S3 (duplicity)

  • Automate MySQL and PostgreSQL backups

  • Create a service account to run both s3 and db backup scripts as

  • Set up a cron job for backups

So, after I signed up for S3, I had to create the buckets. I couldn’t find a way to do this through my Amazon account settings, so I wrote a little Ruby script.

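$ sudo gem install aws-s3
$ vim make-bucket.rb
    #!/usr/bin/env ruby
    require 'aws/s3'
    AWS::S3::Base.establish_connection!(
      :access_key_id     => 'my-s3-key-id',
      :secret_access_key => 'my-s3-secret-access-key'
    )
    # 'my-bucket-name' is a placeholder; run once per site bucket
    AWS::S3::Bucket.create('my-bucket-name')
$ chmod +x make-bucket.rb
$ ./make-bucket.rb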

Next, I had to install duplicity and py-boto:

[root@server ~] cd /usr/ports/sysutils/duplicity
[root@server duplicity] make install
[root@server duplicity] cd ../../devel/py-boto
[root@server py-boto] make install clean
[root@server py-boto]

Next step: create a user (with access to shared data and website data) to run the backups, using the adduser command:

[root@server py-boto] adduser -g shared-data -G www -s /bin/tcsh -w random s3backupuser
[root@server py-boto] su - s3backupuser

In tcsh, you can ‘set autolist’ to have the shell automatically show all the possible matches when doing filename/directory expansion.

I had to set my Access ID and Access Key in s3backupuser’s environment, as well as a GnuPG passphrase so the backups are encrypted (and compressed). I mean, I trust Amazon, but not THAT much :)

% vim .cshrc
setenv AWS_ACCESS_KEY_ID my-s3-key-id
setenv AWS_SECRET_ACCESS_KEY my-s3-secret-access-key
setenv PASSPHRASE AVeryRandomPassphraseForGnuPG

Next, I copied the very useful script into a separate script for each website. I could have just dumped every database that was running, but I wanted to keep each site’s databases in its own directory. That complicates my cron job, since it now runs multiple backup scripts, but I really want the end result to be easy for me to read and identify. So for each site, I created a directory under /u01/backups:

%ll /u01/backups/
total 8
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:46 evil-genius-network
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:47 m87-blackhole
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:46 mywushublog
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:47 willowoak
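The dump script itself isn’t reproduced here, so the following is only a minimal sketch of the idea, assuming `mysqldump` and `gzip` are on the PATH and that database credentials come from something like a `~/.my.cnf` file. The helper names (`dump_path`, `dump_db`) and the file-name pattern are illustrative, loosely modeled on the layout under /u01/backups.

```shell
#!/bin/sh
# Sketch: dump one database into a per-site directory as a dated,
# gzip-compressed file. Helper names and layout are assumptions.
BACKUP_ROOT="${BACKUP_ROOT:-/u01/backups}"

# dump_path SITE DB STAMP -> prints the target file name
dump_path() {
    echo "${BACKUP_ROOT}/$1/weekly/$2/$2_$3.sql.gz"
}

# dump_db SITE DB -> writes the compressed dump and prints its path
dump_db() {
    stamp=$(date +%Y-%m-%d_%Hh%Mm)
    out=$(dump_path "$1" "$2" "$stamp")
    tmp="${out%.gz}"
    mkdir -p "$(dirname "$out")" || return 1
    # --single-transaction gives a consistent snapshot for InnoDB tables.
    # Dump to a plain file first so a mysqldump failure is not masked.
    mysqldump --single-transaction "$2" > "$tmp" || { rm -f "$tmp"; return 1; }
    gzip -f "$tmp" || return 1
    echo "$out"
}

# Example (not run here): dump_db mywushublog mywushublog
```

Keeping the path logic in its own function makes it easy to check where a dump will land before anything touches the database.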

Next was the script, which is very crude and simple. If I’m really motivated I’ll make it nicer, but I’m lazy, and if I don’t need any more functionality I’ll just leave it. One thing I initially forgot was that I had set my Amazon S3 variables in the user’s .cshrc. That is not a good place for them; it was just handy while I was running the duplicity commands manually. Cron doesn’t read .cshrc, so I had to add the variables to the script itself, otherwise the cron job would fail.
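One way to do that in a Bourne-shell script is to export the variables at the top and stop early if any is missing. This is only a sketch: `require_env` is a hypothetical helper, and the values are the same placeholders used earlier.

```shell
#!/bin/sh
# Credentials duplicity reads from the environment when run from cron.
# Values are placeholders; keep the real script readable only by its owner.
AWS_ACCESS_KEY_ID='my-s3-key-id'
AWS_SECRET_ACCESS_KEY='my-s3-secret-access-key'
PASSPHRASE='AVeryRandomPassphraseForGnuPG'
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE

# require_env NAME... -> returns non-zero if any named variable is empty or unset
require_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
}

require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE || exit 1
```

Failing before duplicity starts gives a clear message in the cron mail instead of an authentication error halfway through.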



# Amazon S3 keys, and GnuPG keys

echo "*************************************************"
echo "*   Backing up Website content....              *"
echo "*                                               *"
echo "*              *"
duplicity /www/ s3+
echo "*                    *"
duplicity /www/ s3+
echo "*                  *"
duplicity /www/ s3+
echo "*************************************************"
echo "*   Backing up databases....                    *"
echo "*                                               *"
echo "*                 *"
duplicity /u01/backups/willowoak s3+
echo "*                    *"
duplicity /u01/backups/mywushublog s3+
echo "*                  *"
duplicity /u01/backups/m87-blackhole s3+
echo "*************************************************"
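The repeated duplicity calls for the database directories could also be driven from a list. A minimal sketch, assuming duplicity’s `s3+http://bucket` URL form; the bucket names here simply reuse the directory names and are hypothetical, and the script only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: one duplicity run per site. Bucket names are assumed to match
# the directory names under /u01/backups; adjust to the real buckets.
SITES="willowoak mywushublog m87-blackhole"

# s3_url BUCKET -> prints the duplicity target URL for that bucket
s3_url() {
    echo "s3+http://$1"
}

# backup_dir SRC BUCKET -> runs duplicity, or prints the command in dry-run mode
backup_dir() {
    if [ "${RUN:-0}" = "1" ]; then
        duplicity "$1" "$(s3_url "$2")"
    else
        echo "duplicity $1 $(s3_url "$2")"
    fi
}

status=0
for site in $SITES; do
    echo "Backing up databases for ${site}..."
    backup_dir "/u01/backups/${site}" "$site" || status=1
done
# status is 1 if any run failed
echo "finished with status ${status}"
```

The dry-run default makes it safe to check the generated commands before letting the script talk to S3.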

And last but not least, a cronjob to tie it all together:

% crontab -e
@weekly ~/bin/
@weekly ~/bin/
@weekly ~/bin/
@weekly ~/bin/
@weekly ~/bin/

I can check the status of a backup by running duplicity with the ‘collection-status’ command:

%duplicity collection-status s3+
date = "full"
Collection Status
Connecting with backend: BotoBackend
Archive dir: None
Found 0 backup chains without signatures.
Found a complete backup chain with matching signature chain:
Chain start time: Sat Apr 25 15:08:02 2009
Chain end time: Sat Apr 25 15:08:02 2009
Number of contained backup sets: 1
Total number of contained volumes: 1
Type of backup set:                            Time:      Num volumes:
Full         Sat Apr 25 15:08:02 2009                 1
No orphaned or incomplete backup sets found.

I can also list the files:

%duplicity list-current-files s3+
date = "full"
Sat Apr 25 15:05:11 2009 .
Sat Apr 25 15:05:10 2009 daily
Sat Apr 25 15:05:10 2009 daily/mywushublog
Sat Apr 25 15:05:10 2009 monthly
Sat Apr 25 15:05:10 2009 weekly
Sat Apr 25 15:05:11 2009 weekly/mywushublog
Sat Apr 25 15:05:11 2009 weekly/mywushublog/mywushublog_week.17.2009-04-25_15h05m.sql.gz
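The collection-status output can also be checked from a script, for example to pull out when the latest chain ended. A minimal sketch that only parses text in the format shown earlier; `chain_end_time` is a hypothetical helper, and other duplicity versions may print this differently.

```shell
#!/bin/sh
# chain_end_time: reads collection-status output on stdin and prints the
# value of the last "Chain end time:" line, or fails if there is none.
chain_end_time() {
    sed -n 's/^Chain end time: *//p' | awk '
        { last = $0 }
        END { if (last == "") exit 1; print last }'
}

# Small self-check using two lines copied from the output shown earlier
sample='Chain start time: Sat Apr 25 15:08:02 2009
Chain end time: Sat Apr 25 15:08:02 2009'
printf '%s\n' "$sample" | chain_end_time
```

In real use the input would come from piping the duplicity collection-status command into `chain_end_time`, and a non-zero exit would mean no complete chain was found.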

Pretty sweet automated backup process. It is a lot cheaper than tapes or additional disk storage. With S3, I also don’t have to worry about buying additional hardware or maintaining a tape library or drive (which is what I had a few years ago; what a headache).