Using Amazon S3 for backups
I don’t have a backup system at home (which is where this site and a few others are hosted), and I have generally relied on duplicating enough of my important stuff between friends and other computers. On top of that, I have a RAID5 setup for my large storage, and home directories and website data live on a mirrored (RAID1) ZFS volume. This doesn’t prevent accidental “oh-no”s, but it does protect me from some hardware failures.
Last year when I upgraded to the new server, I lost a lot of data because I forgot to back up all of my MySQL databases. I like to think I can learn from my mistakes, so a full year later I finally did something about it and signed up for Amazon’s S3 service.
The pricing is pretty nice, and I don’t have all that much data to back up. I figure I’ll use a few GB in total and keep the monthly bill around $1 to $2. That seems well worth it for off-site backups.
Now, I have 3 main websites that I need to back up, and one test site that I like to play around with:
http://www.m87-blackhole.org/ <- This is the first domain that I owned, and it’s the site where my family checks out new photos
http://www.willowoakboarding.com/ <- My parents’ site for their boarding ranch. I’m glad they have no concept of an SLA, or that things need to be backed up :)
http://www.mywushublog.com/ <- This site of course, where I claim my own identity on the internet
http://www.evil-genius-network.com/ <- a test domain, but now I run a little OpenID service, For one…
After a quick “FreeBSD s3 backup” Google search, I found Gary Dalton’s blog post: http://dvector.com/oracle/2008/10/18/backing-up-to-amazon-s3/. After reading this post, I formulated my plan of attack:
Sign up for S3, create a “bucket” for each site
Use something to interface with S3 (duplicity)
Automate MySQL and PostgreSQL backups
Create a service account to run both the S3 and database backup scripts
Set up a cron job for backups
So, after I signed up for S3, I had to create the buckets. I couldn’t find a way to do this through my Amazon account settings, so I wrote a little Ruby script.
    $ sudo gem install aws-s3
    $ vim make-bucket.rb
    #!/usr/local/bin/ruby
    require 'aws/s3'

    # Connect with the S3 access key pair
    AWS::S3::Base.establish_connection!(
      :access_key_id     => 'my-s3-key-id',
      :secret_access_key => 'my-s3-secret-access-key'
    )

    # One bucket per site
    AWS::S3::Bucket.create('mywushublog')
    AWS::S3::Bucket.create('willowoak')
    AWS::S3::Bucket.create('m87-blackhole')
    AWS::S3::Bucket.create('evil-genius-network')
Next, I had to install duplicity and py-boto
    [root@server ~] cd /usr/ports/sysutils/duplicity
    [root@server duplicity] make install
    ...
    [root@server duplicity] cd ../../devel/py-boto
    [root@server py-boto] make install clean
    ...
    [root@server py-boto]
Next step: use the adduser command to create a user (with access to shared data and website data) that will run the backups…
    [root@server py-boto] adduser -g shared-data -G www -s /bin/tcsh -w random s3backupuser
    ...
    [root@server py-boto] su - s3backupuser
In tcsh, you can ‘set autolist’ to have the shell automatically show all the possible matches when doing filename/directory expansion.
I’ll have to set my Access ID and Access Key in the s3backupuser’s environment, as well as a GnuPG passphrase so the backups are encrypted (and compressed). I mean, I trust Amazon, but not THAT much :)
    % vim .cshrc
    setenv AWS_ACCESS_KEY_ID my-s3-key-id
    setenv AWS_SECRET_ACCESS_KEY my-s3-secret-access-key
    setenv PASSPHRASE AVeryRandomPassphraseForGnuPG
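A shell startup file is convenient for interactive runs, but the same three values can also live in a separate file that only the backup user can read and that a /bin/sh script can source. The following is a minimal sketch of that idea, not the setup used in this post: the file name `.s3-backup.env` and the two function names are assumptions made up for illustration, and the values are the same placeholders as above. duplicity reads the GnuPG passphrase from the `PASSPHRASE` variable, so the spelling matters.

```shell
#!/bin/sh
# Sketch: keep the S3 keys and GnuPG passphrase in one file with mode 600.
# CRED_FILE is a hypothetical path; the values are placeholders.
CRED_FILE="${CRED_FILE:-$HOME/.s3-backup.env}"

write_credentials() {
    # Create the file with a restrictive umask, then force mode 600
    # in case the file already existed with looser permissions.
    ( umask 077; cat > "$CRED_FILE" <<'EOF'
AWS_ACCESS_KEY_ID='my-s3-key-id'
AWS_SECRET_ACCESS_KEY='my-s3-secret-access-key'
PASSPHRASE='AVeryRandomPassphraseForGnuPG'
EOF
    ) || return 1
    chmod 600 "$CRED_FILE"
}

load_credentials() {
    # Refuse to continue if the file is missing or unreadable
    [ -r "$CRED_FILE" ] || { echo "cannot read $CRED_FILE" >&2; return 1; }
    . "$CRED_FILE"
    export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE
}
```

A backup script would call `load_credentials` once near the top, before any duplicity command runs.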
Next, I copied the very useful automysqlbackup.sh script into a separate script for each website. I could have just dumped every database that was running, but I wanted to segregate each site’s databases into a different directory. That complicates my cron job, since it now runs multiple backup scripts, but I really want the end result to be easy for me to read and identify. So for each site, I created a directory under /u01/backups:
    % ll /u01/backups/
    total 8
    drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:46 evil-genius-network
    drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:47 m87-blackhole
    drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:46 mywushublog
    drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:47 willowoak
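For reference, a layout like the one in the listing could be created with a short loop. This is only a sketch: the `BACKUP_ROOT` variable and the `make_site_dirs` function are names invented here, and the ownership change is left as a comment because it needs root.

```shell
#!/bin/sh
# Sketch: one dump directory per site, accessible to owner and group only.
# BACKUP_ROOT defaults to the path used in this post; override it to experiment safely.
BACKUP_ROOT="${BACKUP_ROOT:-/u01/backups}"
SITES="evil-genius-network m87-blackhole mywushublog willowoak"

make_site_dirs() {
    for site in $SITES; do
        mkdir -p "$BACKUP_ROOT/$site" || return 1
        chmod 750 "$BACKUP_ROOT/$site" || return 1    # drwxr-x---
        # As root, to match the owner and group shown in the listing:
        # chown s3-backupuser:mysql "$BACKUP_ROOT/$site"
    done
}
```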
Next was the s3-backups.sh script, which is very crude and simple. If I’m really motivated I’ll make it nicer, but I’m lazy, and if I don’t need any more functionality I’ll just leave it. One thing I initially forgot was that I had set my Amazon S3 variables in the user’s .cshrc. That is not a good place for them; it was just handy while I was running the duplicity commands manually. A /bin/sh script started by cron never reads .cshrc, so I had to set the variables in the script itself, otherwise the cron job would fail.
    #!/bin/sh
    PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/home/s3/bin

    # Amazon S3 keys, and GnuPG keys
    AWS_ACCESS_KEY_ID=
    AWS_SECRET_ACCESS_KEY=
    PASSPHRASE=
    export AWS_ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY
    export PASSPHRASE

    # Website content: one duplicity run per site
    echo "*************************************************"
    echo "* Backing up Website content....                *"
    echo "*                                               *"
    echo "* www.willowoakboarding.com...                  *"
    duplicity /www/www.willowoakboarding.com s3+http://s3.amazon.com/willowoak/www
    echo "* www.mywushublog.com...                        *"
    duplicity /www/www.mywushublog.com s3+http://s3.amazon.com/mywushublog/www
    echo "* www.m87-blackhole.org...                      *"
    duplicity /www/www.m87-blackhole.org s3+http://s3.amazon.com/m87-blackhole/www

    # Database dumps written by the per-site MySQL backup scripts
    echo "*************************************************"
    echo "* Backing up databases....                      *"
    echo "*                                               *"
    echo "* www.willowoakboarding.com...                  *"
    duplicity /u01/backups/willowoak s3+http://s3.amazon.com/willowoak/db
    echo "* www.mywushublog.com...                        *"
    duplicity /u01/backups/mywushublog s3+http://s3.amazon.com/mywushublog/db
    echo "* www.m87-blackhole.org...                      *"
    duplicity /u01/backups/m87-blackhole s3+http://s3.amazon.com/m87-blackhole/db
    echo "*************************************************"
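If the script ever does get the nicer rewrite, one possible shape is a loop over a site table, with a failure flag so cron mail shows when an upload went wrong. The sketch below is an illustration under assumptions, not the script used here: `DRY_RUN`, `run`, `backup_all` and the `SITES` table are invented names, and it interleaves the web and database uploads per site instead of doing all web roots first. The paths and S3 URLs are copied from the script above.

```shell
#!/bin/sh
# Sketch: the same six uploads as s3-backups.sh, driven by a site list.
# DRY_RUN and the SITES table are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands
S3_BASE="s3+http://s3.amazon.com"

# One line per site: "<web root directory name> <bucket and dump directory name>"
SITES="www.willowoakboarding.com willowoak
www.mywushublog.com mywushublog
www.m87-blackhole.org m87-blackhole"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        # Keep the command from swallowing the site list on stdin
        "$@" </dev/null
    fi
}

backup_all() {
    failed=0
    while read -r docroot bucket; do
        run duplicity "/www/$docroot" "$S3_BASE/$bucket/www" || failed=1
        run duplicity "/u01/backups/$bucket" "$S3_BASE/$bucket/db" || failed=1
    done <<EOF
$SITES
EOF
    # Non-zero if any upload failed, so the caller can react
    return "$failed"
}

backup_all
```

With the default `DRY_RUN=1` this only prints the duplicity commands; setting `DRY_RUN=0` would actually run them.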
And last but not least, a cronjob to tie it all together:
    % crontab -e
    @weekly ~/bin/s3-backups.sh
    @weekly ~/bin/mywushublog-mysql-backup.sh
    @weekly ~/bin/willowoak-mysql-backup.sh
    @weekly ~/bin/m87-blackhole-mysql-backup.sh
    @weekly ~/bin/evil-genius-network-mysql-backup.sh
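All five entries use @weekly, so cron starts them at the same moment, and the upload may begin before the database dumps have finished writing. If that ordering matters, one option is a single wrapper that runs the dumps first and uploads only when they all succeed. This is a hedged sketch: `run_weekly` and `BIN_DIR` are made-up names, while the script names are the ones from the crontab.

```shell
#!/bin/sh
# Sketch: run the dump scripts one after another, then the upload.
# BIN_DIR and run_weekly are hypothetical; the script names come from the crontab.
BIN_DIR="${BIN_DIR:-$HOME/bin}"
DUMP_SCRIPTS="mywushublog-mysql-backup.sh willowoak-mysql-backup.sh m87-blackhole-mysql-backup.sh evil-genius-network-mysql-backup.sh"

run_weekly() {
    for script in $DUMP_SCRIPTS; do
        # Stop at the first failed dump so a stale or partial dump is not uploaded
        "$BIN_DIR/$script" || { echo "dump failed: $script" >&2; return 1; }
    done
    # Upload only after every dump has finished successfully
    "$BIN_DIR/s3-backups.sh"
}
```

Saved as a script that ends by calling `run_weekly`, this would replace the five crontab lines with a single @weekly entry.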
I can check the status of a backup by running duplicity with the ‘collection-status’ command:
    % duplicity collection-status s3+http://s3.amazon.com/mywushublog/db
    date = "full"
    Collection Status
    -----------------
    Connecting with backend: BotoBackend
    Archive dir: None

    Found 0 backup chains without signatures.

    Found a complete backup chain with matching signature chain:
    -------------------------
    Chain start time: Sat Apr 25 15:08:02 2009
    Chain end time: Sat Apr 25 15:08:02 2009
    Number of contained backup sets: 1
    Total number of contained volumes: 1
     Type of backup set:                            Time:      Num volumes:
                    Full         Sat Apr 25 15:08:02 2009                 1
    -------------------------
    No orphaned or incomplete backup sets found.

I can also list the files:

    % duplicity list-current-files s3+http://s3.amazon.com/mywushublog/db
    date = "full"
    Sat Apr 25 15:05:11 2009 .
    Sat Apr 25 15:05:10 2009 daily
    Sat Apr 25 15:05:10 2009 daily/mywushublog
    Sat Apr 25 15:05:10 2009 monthly
    Sat Apr 25 15:05:10 2009 weekly
    Sat Apr 25 15:05:11 2009 weekly/mywushublog
    Sat Apr 25 15:05:11 2009 weekly/mywushublog/mywushublog_week.17.2009-04-25_15h05m.sql.gz
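Since the status output is plain text, the time of the most recent backup can be pulled out with a one-line filter, for example to include in a cron mail. This is a small sketch that assumes the output keeps the "Chain end time:" line shown above (other duplicity versions may print it differently); `last_backup_time` is a name invented for the example.

```shell
#!/bin/sh
# Sketch: read collection-status output on stdin and print the newest "Chain end time".
# Assumes the line format shown in the status output above.
last_backup_time() {
    sed -n 's/^Chain end time: *//p' | tail -n 1
}
```

It would be used as `duplicity collection-status s3+http://s3.amazon.com/mywushublog/db | last_backup_time`, and prints nothing when no backup chain is found.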
Pretty sweet automated backup process. It is a lot cheaper than tapes or additional disk storage. With S3, I also don’t have to worry about buying additional hardware or maintaining a tape library or drive (which is what I had a few years ago, and what a headache that was).