My transition to Amazon S3 backups


Why Amazon S3?
Let me start with the obvious and not-so-obvious reasons why I moved to an Amazon S3 backup strategy:


Definitely one of the things that caught my attention quickly was the ease of use. I can use the service from command-line programs, through a GUI, or, if nothing fits, with my own code. The best part is that I can write that code in almost any language, since it is all HTTP based.

The Research
As I was doing the research to decide what to use, I found that the service was so new that there weren't many articles about how to set it up, or recommendations on the best setup. Now that I am completing my implementation, I realize it is so easy that it really just comes down to working out a strategy for your own needs and then implementing it.

The Solution
Since I am a Java developer, I decided to go all Java and use jets3t-0.4.0, as recommended at http://blog.eberly.org/2006/10/09/how-automate-your-backup-to-amazon-s3-using-s3sync/. The author there actually went with Ruby, but someone suggested simply using the sync program that comes with jets3t. So I went ahead and experimented in that area; after all, as a Java developer it would be easier for me to analyze errors or modify the code.


Implementation
(PROBLEM #1)
One thing I noticed was that the Ruby implementation supported both https and http. I wondered why the Java one didn't; at least, no one ever talked about it.

So I dug into the code and found that there are a few extra properties you can set. They are not documented, but you can enable https with s3service.https-only: set it to true and all communication with the Amazon servers uses https instead of http. There is a list of other properties you can modify at the end of this page.
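As a rough sketch of how that property could be switched on from the command line: the helper below rewrites a single key in a Java-style properties file. The function name set_prop is hypothetical (it is not part of jets3t), and the sketch assumes the setting lives in a key=value file called jets3t.properties in the current directory, as in the property list at the end of this page.

```shell
#!/bin/sh
# set_prop FILE KEY VALUE
# Hypothetical helper (not part of jets3t): sets KEY=VALUE in a
# Java-style properties file, replacing any existing line for KEY.
set_prop() {
    file=$1; key=$2; value=$3
    touch "$file"
    # Keep every line whose key (the text before the first "=") differs.
    awk -F= -v k="$key" '$1 != k' "$file" > "$file.tmp"
    # Append the new setting and swap the file into place.
    printf '%s=%s\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Assumed file name and location; adjust to wherever your copy lives.
set_prop jets3t.properties s3service.https-only true
```

Running it twice leaves a single s3service.https-only line, so it is safe to call from a setup script.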

After that I tried out the tool and got my hands dirty. Using a directory I didn't care about, I copied a few files, updated some, deleted some, and so on.

Once I understood how things worked, it was time to get things running as a cron job.

(PROBLEM #2)
synchronize.sh assumes that you will be executing the program from the /opt/jets3t-0.4.0 directory, which does not hold when it is run from cron. So I modified the file to include the lines below, right at the beginning of the file (after the #!/bin/sh, obviously):
user@host:/opt/jets3t-0.4.0$ vi synchronize.sh


# ...

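# An optional leading "-cp <dir>" names the jets3t install directory.
if [ "$1" = "-cp" ]; then
   JETS_HOME=$2
   # Drop "-cp" and its value so the remaining arguments are untouched.
   shift
   shift
fi

# Fall back to the current directory when no directory was given.
if [ "$JETS_HOME" = "" ]; then
   echo "Setting default JETS_HOME"
   JETS_HOME=./
else
   echo --------------------------------------
   echo "JETS_HOME=$JETS_HOME"
   echo --------------------------------------
fi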

# ...
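To see how the -cp handling behaves on its own, here is a minimal sketch that wraps the same logic in a function and prints the result. The function name parse_jets_home is hypothetical, and the extra check on the argument count is my addition so that a bare -cp with no directory does not trip the shift.

```shell
#!/bin/sh
# parse_jets_home [-cp DIR] ARGS...
# Hypothetical wrapper around the -cp handling shown above.
# Prints the resolved JETS_HOME on the first line and the
# remaining arguments on the second.
parse_jets_home() {
    JETS_HOME=""
    # Only consume "-cp" when a directory actually follows it.
    if [ "$1" = "-cp" ] && [ $# -ge 2 ]; then
        JETS_HOME=$2
        shift
        shift
    fi
    # Same default as the modified synchronize.sh: the current directory.
    if [ -z "$JETS_HOME" ]; then
        JETS_HOME=./
    fi
    printf '%s\n' "$JETS_HOME"
    printf '%s\n' "$*"
}

parse_jets_home -cp /opt/jets3t-0.4.0 UPLOAD /mnt/backup_test
parse_jets_home UPLOAD /mnt/backup_test
```

The first call reports /opt/jets3t-0.4.0 and leaves UPLOAD /mnt/backup_test for the tool; the second falls back to ./ and passes the arguments through unchanged.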


With this change to synchronize.sh, I can pass in a directory to append to the classpath. I then created a file named s3-backup in /etc/cron.daily:
user@host:/opt/jets3t-0.4.0$ cd /etc/cron.daily
user@host:/etc/cron.daily$ sudo vi s3-backup
#!/bin/sh

JETS_HOME=/opt/jets3t-0.4.0

$JETS_HOME/synchronize.sh -cp $JETS_HOME UPLOAD /mnt/backup_test \
   bucket_name_test/backup_test $JETS_HOME/synchronize.properties


Made the file executable:
user@host:/etc/cron.daily$ sudo chmod +x s3-backup
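Before trusting cron with it, a quick sanity check can help. The sketch below assumes a Debian-style system where /etc/cron.daily is executed by run-parts, which only picks up executable files whose names consist of letters, digits, underscores, and hyphens (so a name like s3-backup.sh would be skipped). The function name check_cron_script is hypothetical.

```shell
#!/bin/sh
# check_cron_script PATH
# Hypothetical helper: reports whether a script looks eligible to be
# run from a run-parts style directory such as /etc/cron.daily.
check_cron_script() {
    path=$1
    name=$(basename "$path")
    # The file must exist and carry the executable bit.
    if [ ! -x "$path" ]; then
        echo "not executable: $path"
        return 1
    fi
    # Debian's run-parts ignores names with characters outside A-Za-z0-9_-
    case $name in
        *[!A-Za-z0-9_-]*)
            echo "name may be skipped by run-parts: $name"
            return 1
            ;;
    esac
    echo "ok: $path"
}

check_cron_script /etc/cron.daily/s3-backup ||
    echo "fix the issue above before relying on cron"
```

On systems that ship run-parts, sudo run-parts --test /etc/cron.daily lists the scripts that would actually be executed, which is a more direct confirmation.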


Now I am ready to go. Every night s3-backup runs and synchronizes my /mnt/backup_test/ with my bucket_name_test bucket, using backup_test as the root directory.

After a couple of days of testing, I moved the file to the /etc/cron.monthly/ directory so that it only synchronizes monthly.



Other stuff:
Here is a list of the properties you can set in the jets3t.properties file, with their default values:

Property                             Default value
s3service.https-only                 false
httpclient.connection-timeout-ms     60000
httpclient.socket-timeout-ms         60000
httpclient.max-connections           10
httpclient.stale-checking-enabled    true
httpclient.tcp-no-delay-enabled      true
http.protocol.expect-continue        true
httpclient.retry-on-errors           true