Monday, December 15, 2014

Incremental backup with rsync and cp with low disk demand

I just created a very simple incremental backup solution for my Zimbra installation that uses disk space efficiently. The idea is simple. First, using rsync, I make a copy of the existing Zimbra installation on another disk:
rsync --delete --delete-excluded -a \
        --exclude zimbra/data/amavisd/ \
        --exclude zimbra/data/clamav/ \
        --exclude zimbra/data/tmp \
        --exclude zimbra/data/mailboxd/imap-inactive-session-cache.data \
        --exclude zimbra/log \
        --exclude zimbra/zmstat \
        /opt/zimbra "${DSTDIR}/"
I excluded some directories from synchronization that are not necessary for restoring Zimbra. Then, using cp, I create a copy of this directory that consists only of hard links to the original files; the content isn't copied:
cd "${DSTDIR}"
cp -al zimbra "zimbra.$(date +%Y%m%d%H%M)"
Note the option -l, which tells cp to hard link files instead of making new copies. Also note that the name of the copy contains the timestamp of when it was created. Here is the content of the directory:
$ ls -l ${DSTDIR}
total 16
drwx------ 7 root   root    4096 Pro  9 15:31 zimbra
drwx------ 7 root   root    4096 Pro  9 15:31 zimbra.201412131551
drwx------ 7 root   root    4096 Pro  9 15:31 zimbra.201412140326
drwx------ 7 root   root    4096 Pro  9 15:31 zimbra.201412150325
The next time rsync runs, it deletes files that no longer exist in the source. For a changed file, it writes a new copy and then removes the old one. Removing the old one only unlinks it from the zimbra directory, so the old version stays saved in the directories made by cp. (This relies on rsync's default behavior; with the --inplace option it would overwrite the shared data instead.) This way you allocate space only for new and changed files, while unchanged files share disk space.
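
To see why the old version survives, here is a minimal sketch you can run in a scratch directory. It assumes GNU cp (for -al) and imitates rsync's default replace-by-rename with a plain mv; the directory and file names are made up for the demo and have nothing to do with Zimbra.

```shell
#!/bin/sh
# Demo in a throwaway directory (all names below are illustrative).
work=$(mktemp -d)
cd "$work" || exit 1

mkdir live
echo "version 1" > live/mail.txt

# Snapshot: hard links only, no file data is copied.
cp -al live snap

# Replace the file the way rsync does by default:
# write a new file, then rename it over the old name.
echo "version 2" > live/mail.txt.tmp
mv live/mail.txt.tmp live/mail.txt

# The live copy has the new content, the snapshot still has the old one.
live_content=$(cat live/mail.txt)
snap_content=$(cat snap/mail.txt)
echo "live: $live_content"
echo "snap: $snap_content"
```

If the file were overwritten in place instead (for example with echo "version 2" > live/mail.txt), both names would show the new content, because they still point to the same data.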

This system uses only the space it needs. It is interesting to note how the du command behaves with hard links. Here is an example:
# du -sh zimbra*
132G      zimbra
3.4G      zimbra.201412131551
3.2G      zimbra.201412140326
114M      zimbra.201412150325
# du -sh zimbra.201412131551
132G      zimbra.201412131551
# du -sh zimbra.201412150325
132G      zimbra.201412150325
In the first case, du counts each hard-linked file only once, so it reports the full 132G for the first directory, zimbra, and for each of the other directories only the space used by files it has not already counted. For example, zimbra.201412131551 holds 3.4G of data that is not shared with zimbra. But when we give du a single directory, it reports the size of that directory by itself, so we see that all the files in zimbra.201412131551 indeed use 132G.
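
The same effect can be reproduced on a small scale. This is only a sketch, assuming GNU (or BusyBox) cp and du; the names a, b and blob are arbitrary, and the exact sizes printed depend on the filesystem.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir a
# 1 MiB of random data so the file occupies real disk blocks.
dd if=/dev/urandom of=a/blob bs=1024 count=1024 2>/dev/null
cp -al a b

# The same inode number means both names share one copy of the data.
ino_a=$(ls -i a/blob | awk '{print $1}')
ino_b=$(ls -i b/blob | awk '{print $1}')
echo "inodes: $ino_a $ino_b"

# One invocation: the shared file is charged to the first directory only.
together=$(du -sk a b)
echo "$together"

# Separate invocation: b is reported at its full size.
alone=$(du -sk b)
echo "$alone"
```

In the combined run, b shows only the small overhead of the directory itself, while on its own it shows roughly the full megabyte.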

And that's basically it. These two commands (rsync and cp) are placed in a script with some additional boilerplate code and everything is run from cron.
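
For illustration, a wrapper along these lines would do the job. This is a sketch under assumptions, not my actual script: the function name backup_snapshot, its two arguments, and the example paths in the comments are hypothetical, and the --exclude options shown earlier are left out for brevity.

```shell
#!/bin/sh
# Hypothetical wrapper: mirror a source directory, then take a
# hard-link snapshot of the mirror. Names are illustrative only.
backup_snapshot() {
    src=$1        # directory to back up, e.g. /opt/zimbra
    dstdir=$2     # directory on the backup disk
    name=$(basename "$src")

    # Mirror the source; --delete removes files that no longer exist there.
    # (The --exclude options from above would be added here.)
    rsync -a --delete --delete-excluded "$src" "$dstdir/" || return 1

    # Snapshot the mirror using hard links only, named with a timestamp.
    cp -al "$dstdir/$name" "$dstdir/$name.$(date +%Y%m%d%H%M)"
}

# Run only when both arguments are given, e.g.:
#   sh backup.sh /opt/zimbra /mnt/backup
if [ "$#" -eq 2 ]; then
    backup_snapshot "$1" "$2"
fi
```

The snapshot is taken only if rsync succeeds, so a failed sync never produces a misleading timestamped copy. From cron, the script would simply be called once per night with the source and destination as arguments.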

