Rsync for incremental backups
Introduction
Archiving files to tape is still considered one of the cheapest ways of making backups. However, with the prices of disk and solid-state storage decreasing rapidly, it won’t be long before users switch to faster disk storage for all their backup needs. The problem is that if you want to do anything more than mirror data to remote storage, there aren’t many good free tools for the job. This writeup explains one interesting way to do incremental backups with snapshot capability using a popular tool called rsync.
Traditional Backup
Traditional backup applications not only support backing up and restoring files, directories, partitions and drives, but also allow incremental backups, which reduce the time taken to back up large file systems. Since it’s not practical to restore the complete data from tape just to apply minor changes to it, most tape backup software stores the differences in a separate file or location on tape, which can be used to patch the last full backup image on the tape. Storing incremental updates this way also allows administrators to maintain multiple versions of data without keeping as many full physical copies. Incremental updates to a backup repository are an extremely valuable feature in highly dynamic environments, where it is important to maintain multiple, frequently taken snapshots of the data.
Rsync
Rsync was one of the first tools I used that can update a copy of data by sending only incremental updates. This dramatically cuts down the time needed to bring a copy up to date. The problem is that although rsync lets you make a copy of data and update it incrementally, it was not designed with tape in mind. In its basic usage it doesn’t keep the incremental updates in a different directory or file the way backup applications do. This limits the number of snapshots one can maintain using rsync alone.
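To make the “incremental update” behaviour concrete, here is a minimal sketch that runs rsync three times against a scratch directory and records what it reports each time. The directory and file names are made up for illustration, and the script assumes rsync is installed; “--itemize-changes” simply asks rsync to list what it changed.

```shell
#!/bin/sh
# Sketch: rsync only transfers what changed since the last run.
# Assumes rsync is installed; the directory and file names are hypothetical.
set -eu

command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir src
echo "one" > src/file1.txt
echo "two" > src/file2.txt

# First run: everything is new, so both files are copied.
first_run=$(rsync -a --itemize-changes src/ dst/)

# Second run: nothing changed, so rsync has nothing to report.
second_run=$(rsync -a --itemize-changes src/ dst/)

# Change one file; the third run should mention only that file.
echo "two, but longer" > src/file2.txt
third_run=$(rsync -a --itemize-changes src/ dst/)

echo "first run:";  echo "$first_run"
echo "second run: ${second_run:-<nothing to transfer>}"
echo "third run:";  echo "$third_run"
```

The second run should report nothing, and the third should list only file2.txt, which is the property that makes rsync attractive for keeping a large copy up to date.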
The cp command
And that brings us to the last piece of the puzzle we need in order to do incremental backups. The “cp” command on many Unix operating systems can create hard links to files instead of copying the actual data. This feature lets you maintain two distinct directories on the same partition (with different names) that point to the same physical set of files. On Linux this can be accomplished with the following command: “cp -al $sourcedir $targetdir”.
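A quick way to convince yourself that “cp -al” links rather than copies is to compare inode numbers. The sketch below assumes GNU cp on Linux; the scratch directory and file names are hypothetical and only serve the demonstration.

```shell
#!/bin/sh
# Sketch: show that "cp -al" creates hard links rather than copying data.
# Assumes GNU cp (Linux); the scratch directory and file names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/sourcedir/subdir1"
echo "test" > "$workdir/sourcedir/subdir1/file3.txt"

# -a recurses and preserves attributes; -l links files instead of copying them.
cp -al "$workdir/sourcedir" "$workdir/targetdir"

# Both paths should report the same inode number and a link count of 2.
src_inode=$(ls -i "$workdir/sourcedir/subdir1/file3.txt" | awk '{print $1}')
dst_inode=$(ls -i "$workdir/targetdir/subdir1/file3.txt" | awk '{print $1}')
link_count=$(ls -l "$workdir/targetdir/subdir1/file3.txt" | awk '{print $2}')

echo "source inode: $src_inode, target inode: $dst_inode, links: $link_count"
```

Directories themselves are created fresh in the target (they cannot be hard-linked), which is why only the regular files share inode numbers.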
Rsync + cp
Original Directory Structure
List of files
==============================
/tmp/test/primary
/tmp/test/primary/file1.txt
/tmp/test/primary/file2.txt
/tmp/test/primary/subdir1/file3.txt
==============================
la:/tmp/test # ls -ila primary/*
358315 -rw-r--r-- 1 root root 5 Aug 21 23:06 primary/file1.txt
358316 -rw-r--r-- 1 root root 5 Aug 21 23:07 primary/file2.txt

primary/subdir1:
total 4
358313 drwxr-xr-x 2 root root 80 Aug 21 23:07 .
358312 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
358314 -rw-r--r-- 1 root root 5 Aug 21 23:07 file3.txt
After a full copy using “cp -rp src target”
la:/tmp/test # cp -rp primary directory1
la:/tmp/test # ls -ila directory1/*
358307 -rw-r--r-- 2 root root 5 Aug 21 23:06 directory1/file1.txt
358308 -rw-r--r-- 2 root root 5 Aug 21 23:07 directory1/file2.txt

directory1/subdir1:
total 4
358305 drwxr-xr-x 2 root root 80 Aug 21 23:07 .
115666 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
358306 -rw-r--r-- 2 root root 5 Aug 21 23:07 file3.txt
After a hard-link copy using “cp -al src target”
la:/tmp/test # cp -al directory1 directory2
la:/tmp/test # ls -ila directory2/*
358307 -rw-r--r-- 2 root root 5 Aug 21 23:06 directory2/file1.txt
358308 -rw-r--r-- 2 root root 5 Aug 21 23:07 directory2/file2.txt

directory2/subdir1:
total 4
358310 drwxr-xr-x 2 root root 80 Aug 21 23:07 .
358309 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
358306 -rw-r--r-- 2 root root 5 Aug 21 23:07 file3.txt
“directory2” has now become a “snapshot” of “directory1” without actually holding a duplicate copy of all the data in “directory1”.
After modifying file3.txt in the primary copy
la:/tmp/test # rsync -rvgoutl primary/* directory1/
building file list ... done
subdir1/
subdir1/file3.txt

wrote 161 bytes read 40 bytes 402.00 bytes/sec
total size is 21 speedup is 0.10
la:/tmp/test # ls -ila directory1/*
358307 -rw-r--r-- 2 root root 5 Aug 21 23:06 directory1/file1.txt
358308 -rw-r--r-- 2 root root 5 Aug 21 23:07 directory1/file2.txt

directory1/subdir1:
total 4
358305 drwxr-xr-x 2 root root 80 Aug 21 23:14 .
115666 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
358322 -rw-r--r-- 1 root root 11 Aug 21 23:14 file3.txt
la:/tmp/test # ls -ila directory2/*
358307 -rw-r--r-- 2 root root 5 Aug 21 23:06 directory2/file1.txt
358308 -rw-r--r-- 2 root root 5 Aug 21 23:07 directory2/file2.txt

directory2/subdir1:
total 4
358310 drwxr-xr-x 2 root root 80 Aug 21 23:07 .
358309 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
358306 -rw-r--r-- 1 root root 5 Aug 21 23:07 file3.txt
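The listings show why the trick works: file3.txt in “directory1” received a new inode (358322), while “directory2” still holds the old one (358306) with the old size. By default rsync writes an updated file to a temporary file and renames it into place, so the hard link in the snapshot keeps pointing at the old data. The whole cycle can be sketched as a small script. It assumes GNU cp and rsync are installed, reuses the directory names from the example in a scratch location, and uses “rsync -a” in place of the longer flag list above. Note that running rsync with “--inplace” would overwrite the shared data and change the snapshot too, so that option should be avoided here.

```shell
#!/bin/sh
# Sketch of one snapshot cycle: freeze the current backup with hard links,
# then let rsync bring the backup up to date.
# Assumes GNU cp and rsync are installed; all paths live in a scratch directory.
set -eu

command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p primary/subdir1
echo "same" > primary/file1.txt
echo "old"  > primary/subdir1/file3.txt

# Initial full copy, then a hard-linked snapshot of that copy.
rsync -a primary/ directory1/
cp -al directory1 directory2

# Modify one file in the primary copy and update the backup.
echo "new content" > primary/subdir1/file3.txt
rsync -a primary/ directory1/

# Helper: print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

backup_content=$(cat directory1/subdir1/file3.txt)
snapshot_content=$(cat directory2/subdir1/file3.txt)
unchanged_backup_inode=$(inode directory1/file1.txt)
unchanged_snapshot_inode=$(inode directory2/file1.txt)

echo "backup has:   $backup_content"
echo "snapshot has: $snapshot_content"
echo "file1.txt inodes: $unchanged_backup_inode (backup), $unchanged_snapshot_inode (snapshot)"
```

The backup should contain the new text, the snapshot the old text, and the untouched file1.txt should still share a single inode between the two directories, so only changed files cost extra disk space.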