August 26, 2005

Rsync for incremental backups


Introduction


Archiving files to tape is still considered one of the cheapest ways of making backups. However, with the prices of disk and solid-state storage decreasing rapidly, it won't be long before users switch to faster disk storage for all their backup needs. The problem is that if you want to do anything more than mirror data to remote storage, there aren't many good free tools for it. This writeup explains one interesting way to do incremental backups with snapshot capability using a popular tool called rsync.


Traditional Backup


Traditional backup applications not only support backing up and restoring files, directories, partitions and drives, but also allow incremental backups to reduce the time taken to back up large file systems. Since it's not practical to restore the complete data from tape just to apply minor changes to it, most tape backup software stores the differences in a separate file or location on tape, which can be used to patch the last full backup image on the tape. Storing incremental updates this way also allows administrators to maintain multiple versions of data without keeping as many physical copies of it. Incremental updates to a backup repository are an extremely valuable feature in highly dynamic environments, where maintaining multiple, frequently taken snapshots of data is important.


Rsync


Rsync was one of the first tools I used that allows one to update copies of data by sending incremental updates. This dramatically cuts down the time needed to update a copy of the data. The problem is that although rsync lets you make a copy of data and incrementally update it, it was not designed with tape in mind. Specifically, in its basic usage it doesn't keep the incremental updates in a different directory or file the way backup applications do. This limits the number of snapshots one can maintain using rsync.
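
As a rough illustration of what "sending incremental updates" means in practice, the sketch below asks rsync to list what it would transfer without changing anything. The helper name pending_changes is made up for this example; -a (archive), -i (itemize changes), -n (dry run) and --delete are standard rsync options.

```shell
# Hypothetical helper: list what rsync would change in $2 to make it match $1.
# Nothing is modified, because -n requests a dry run.
#   $1 = source directory, $2 = destination directory
pending_changes() {
    rsync -ain --delete "$1"/ "$2"/
}
```

Calling it as pending_changes /tmp/test/primary /tmp/test/directory1 should print one line per file or directory that differs between the two trees.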


The cp command


And that brings us to the last piece of the puzzle we need in order to do incremental backups. The "cp" command on most Unix operating systems can create hard links to files instead of copying the actual data. This lets you maintain two different directories on the same partition (with different names) pointing to the same physical set of files. On Linux this can be accomplished with the command "cp -al $sourcedir $targetdir".
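
A minimal sketch of that command in action, assuming GNU cp (for the -l option) and a scratch directory created with mktemp. The names demo_root and inode_of are invented for this illustration.

```shell
# Print the inode number of a file (first column of `ls -i`).
inode_of() {
    ls -di -- "$1" | awk '{print $1}'
}

# Build a tiny tree in a scratch directory.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/source/subdir1"
echo "one" > "$demo_root/source/file1.txt"
echo "three" > "$demo_root/source/subdir1/file3.txt"

# -a recurses and preserves attributes; -l hard-links files instead of copying them.
cp -al "$demo_root/source" "$demo_root/linked"

# Both names should report the same inode number.
inode_of "$demo_root/source/file1.txt"
inode_of "$demo_root/linked/file1.txt"
```

Because hard links cannot cross file systems, the source and target have to live on the same partition, as noted above.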


Rsync + cp


To demonstrate how these two can work together to provide snapshot capability, I ran some tests on my Linux box. The first step was to create a small directory structure to use for this exercise. Running "ls -ila" on the directory shows the inodes (column 1) assigned to each of the files and directories within the test directory I created.


Original Directory Structure



    List of files
    ==============================
    /tmp/test/primary
    /tmp/test/primary/file1.txt
    /tmp/test/primary/file2.txt
    /tmp/test/primary/subdir1/file3.txt
    ==============================

    la:/tmp/test # ls -ila primary/*
    358315 -rw-r--r-- 1 root root 5 Aug 21 23:06 primary/file1.txt
    358316 -rw-r--r-- 1 root root 5 Aug 21 23:07 primary/file2.txt
    primary/subdir1:
    total 4
    358313 drwxr-xr-x 2 root root 80 Aug 21 23:07 .
    358312 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
    358314 -rw-r--r-- 1 root root 5 Aug 21 23:07 file3.txt



The next step is to do a traditional recursive copy from "primary" to "directory1". You can accomplish this with either "cp" or "rsync"; in the following example I used "cp". Notice that when I list the inodes after the cp command, there is a new set of inodes for each of the files and directories in the new directory structure. This means that the traditional "cp" command did an actual recursive copy of the file contents to new locations, and that there now exist two identical copies of each object.


After Copy using "cp -rp src target"



    la:/tmp/test # cp -rp primary directory1
    la:/tmp/test # ls -ila directory1/*
    358307 -rw-r--r-- 2 root root 5 Aug 21 23:06 directory1/file1.txt
    358308 -rw-r--r-- 2 root root 5 Aug 21 23:07 directory1/file2.txt
    directory1/subdir1:
    total 4
    358305 drwxr-xr-x 2 root root 80 Aug 21 23:07 .
    115666 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
    358306 -rw-r--r-- 2 root root 5 Aug 21 23:07 file3.txt
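
The same check can be scripted instead of read off the listing. This is only a sketch: copy_root and copy_result are names made up here, and the -ef test (true when two paths refer to the same inode on the same device) is assumed to be supported by the shell in use, as it is in bash.

```shell
# Make a plain recursive copy and check whether the two files share an inode.
copy_root=$(mktemp -d)
mkdir -p "$copy_root/primary"
echo "data" > "$copy_root/primary/file1.txt"

cp -rp "$copy_root/primary" "$copy_root/directory1"

if [ "$copy_root/primary/file1.txt" -ef "$copy_root/directory1/file1.txt" ]; then
    copy_result="shared inode"
else
    copy_result="separate inodes"
fi
echo "$copy_result"
```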



Now let's see how "cp" behaves when we ask it to create hard links instead of copying. In this example we are copying "directory1" into a new directory "directory2". Notice that the inodes of the files in the new directory are the same as the ones in "directory1". (The directories themselves get new inodes, since directories cannot be hard-linked; only the files are shared.) This means that although there are two logical directories which look alike, the files listed within each of them are the very same files. Any modification made to a file through one directory (without changing its inode) will affect the file in the other directory. This is almost the same as symbolic linking, except that, unlike with symbolic links, the file won't disappear from "directory2" if I delete it from "directory1". In other words, each of these inodes now has multiple owners, which is a little hard to digest at first.


"cp -la src target"



    la:/tmp/test # cp -al directory1 directory2
    la:/tmp/test # ls -ila directory2/*
    358307 -rw-r--r-- 2 root root 5 Aug 21 23:06 directory2/file1.txt
    358308 -rw-r--r-- 2 root root 5 Aug 21 23:07 directory2/file2.txt
    directory2/subdir1:
    total 4
    358310 drwxr-xr-x 2 root root 80 Aug 21 23:07 .
    358309 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
    358306 -rw-r--r-- 2 root root 5 Aug 21 23:07 file3.txt
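
Both properties described above (in-place changes show through every name, and deleting one name leaves the other intact) can be tried on a single file. A small sketch, with link_root, lines_in_b and survivor as made-up names:

```shell
link_root=$(mktemp -d)
echo "hello" > "$link_root/a.txt"
ln "$link_root/a.txt" "$link_root/b.txt"   # second name for the same inode

# Appending writes to the existing inode, so the change is visible via b.txt too.
echo "world" >> "$link_root/a.txt"
lines_in_b=$(wc -l < "$link_root/b.txt")

# Removing one name only drops a link; the data stays reachable via the other.
rm "$link_root/a.txt"
survivor=$(cat "$link_root/b.txt")
echo "$survivor"
```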



So now we know how interesting hard links are, and we know how to create multiple directories that look exactly the same without creating as many copies of the actual data. A little more research on your part would reveal that if you modify "subdir1/file3.txt" in "primary", the only two entries that need updating are "subdir1" and "subdir1/file3.txt". I didn't show the inodes of the "primary" directory in the dumps below, but I do show what the inodes look like after I rsync the changes from "primary" to "directory1". Notice that after the rsync to "directory1", "subdir1/file3.txt" has a new inode and "subdir1" has a new modification time (as expected). This is because rsync by default doesn't overwrite an existing file in place. Instead it writes a fresh copy of each updated file and then replaces the old one. Interestingly, "directory2" still shows the old inode, size and timestamps for the file and directory that were modified.

"directory2" has now become a "snapshot" of "directory1" without actually having a duplicate copy of all the data in "directory1".


After modifying file3.txt in the primary copy



    la:/tmp/test # rsync -rvgoutl primary/* directory1/
    building file list ... done
    subdir1/
    subdir1/file3.txt
    wrote 161 bytes read 40 bytes 402.00 bytes/sec
    total size is 21 speedup is 0.10

    la:/tmp/test # ls -ila directory1/*
    358307 -rw-r--r-- 2 root root 5 Aug 21 23:06 directory1/file1.txt
    358308 -rw-r--r-- 2 root root 5 Aug 21 23:07 directory1/file2.txt
    directory1/subdir1:
    total 4
    358305 drwxr-xr-x 2 root root 80 Aug 21 23:14 .
    115666 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
    358322 -rw-r--r-- 1 root root 11 Aug 21 23:14 file3.txt

    la:/tmp/test # ls -ila directory2/*
    358307 -rw-r--r-- 2 root root 5 Aug 21 23:06 directory2/file1.txt
    358308 -rw-r--r-- 2 root root 5 Aug 21 23:07 directory2/file2.txt
    directory2/subdir1:
    total 4
    358310 drwxr-xr-x 2 root root 80 Aug 21 23:07 .
    358309 drwxr-xr-x 3 root root 136 Aug 21 23:07 ..
    358306 -rw-r--r-- 1 root root 5 Aug 21 23:07 file3.txt
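
The two steps above can be wrapped into one small function. This is a sketch under a few assumptions rather than a finished backup script: snapshot_then_sync is a name invented here, GNU cp is assumed for "cp -al", and rsync is run with -a --delete rather than the exact flags used in the listing. It relies on rsync's default behaviour of replacing changed files instead of rewriting them in place, so the --inplace option must not be used.

```shell
# Hypothetical helper: freeze the current mirror as a hard-link snapshot,
# then bring the mirror up to date from the source.
#   $1 = source directory, $2 = mirror directory, $3 = new snapshot directory
snapshot_then_sync() {
    # Refuse to overwrite an existing snapshot.
    if [ -e "$3" ]; then
        echo "snapshot $3 already exists" >&2
        return 1
    fi
    # Step 1: hard-link copy of the mirror (no file data is duplicated).
    cp -al "$2" "$3" || return 1
    # Step 2: update the mirror. Changed files get new inodes in the mirror,
    # so the snapshot keeps pointing at the old contents.
    rsync -a --delete "$1"/ "$2"/
}
```

With the directories from this writeup, the call would be snapshot_then_sync primary directory1 directory2. Unchanged files stay shared between "directory1" and "directory2", so each extra snapshot costs roughly the size of the files that changed.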




August 14, 2005

Bluetooth on the way back

When the Danish king Harald Blåtand united Denmark and Norway, little did he know that a technology named after him (Blåtand translates to "blue tooth") would have a chance of becoming a cornerstone of the telecommunications industry.

This industry is one of the fastest-growing sectors in today's world, and whether you like it or not, it is constantly changing the world around you.
If it were not for the cell phone industry, we would still be tied to our wired phones, and had it not been for the internet, e-mail would have remained a fantasy.
And in this fast-changing world, one protocol that is growing very rapidly is Bluetooth. Like everything before it, Bluetooth wasn't created in a day. In fact, it went through some rough times before it started catching on again.

The telecom industry today is not very different from what it was thousands of years ago. There are still many different ways to communicate, and some are more popular than others. But human ingenuity has, over time, led to the unification of communication protocols. Though they may look like they are doing the same thing, a telephone is very different from a cellphone, and a cellphone is different from a satellite phone. Yet they all manage to get along very well: if I call your home phone line from a cellphone in the US over a satellite connection, it will still reach you and we will still be able to talk. The internet is another perfect example of this unification, one that brought together computers worldwide.

While people were still fascinated by the internet and wired networks, in the early 90s Ericsson predicted that the day was not far away when computers inside your home would talk wirelessly to other computers and even to other electronic devices like cell phones, digital cameras, keyboards and mice. In 1994 they started an effort to come up with a standard for devices to communicate with each other the way computers can over wired networks. This search for a new, inexpensive communication standard (protocol), which would allow one device to detect the presence of another and communicate with it using low-powered radio signals, was soon joined by 5 companies. Unfortunately, in spite of some early success, the process of defining the standard had slowed down significantly by 1999, when the consortium had over 1200 participating companies. This is when Bluetooth's problems started.

While Bluetooth was still in its infancy, a new protocol, IEEE 802.11, started gaining momentum. This new communication protocol was specifically designed for high-speed communication between computers and networking devices using radio frequencies. This was probably the toughest moment in the history of Bluetooth. Eclipsed by 802.11's success, the Bluetooth standard was on the verge of extinction.

Interestingly, though 802.11 was faster, allowed greater distances and supported many more communication features, its complexity required a device to do more work and send stronger radio signals in order to communicate with others. This forced it to draw much more power. That was not a problem for devices that are plugged into a power outlet, or for laptops that are charged very frequently, but it definitely was a problem for devices like cellphones and digital cameras, which have very small battery capacities and can't be connected to a power outlet for extended periods of time. This, together with the realization that Bluetooth devices are cheap to manufacture, marked the comeback of this unique protocol from the dead. IEEE 802.11 still has a very strong market presence, but Bluetooth has carved out a niche for itself, with a very big fan base.

The most popular Bluetooth device today, and the one that best demonstrates the power and simplicity of this protocol, is the cellphone. Most new cellphones allow users to exchange phone numbers with the click of a button, some allow you to transfer files and photographs to and from your computer, and some even allow you to talk using a hands-free Bluetooth headset. Among the other devices that are quickly catching on are Bluetooth-enabled keyboards and mice, which replace PS/2 and USB wires and give users the freedom to move around without being tied to their computers.

In fact, the day is not far off when you will see Bluetooth in the remote controls of your televisions and VCRs, and maybe one day you will even be able to control your VCR from your cellphone. Bluetooth brings with it a freedom of communication with other devices that is unmatched by anything else in the communication industry today.