Advanced applications of rsync
Over the past 20 years, the use of computer networks has exploded. The growth of the Internet, commensurate and reciprocal investments in national and international backbone infrastructure, and the plummeting price of networking and computing hardware have driven usage. Today, networks are both pervasive and commonplace, and applications still push the envelope of network scale and speed. The Internet may have gotten its start on a handful of tiny workstations, but it and its private analogs now connect countless computers.
Over the same period, UNIX® has grown as well and kept pace with increasingly capable networking software. FTP was among the first tools to share files between systems and remains in widespread use.
rcp, short for “remote copy,” improved on FTP, because it mimicked the traditional
cp utility but copied files from machine to machine.
rdist, based on
rcp, distributed files from one machine to many systems automatically.
Today, the latter utilities are antiques: rcp and rdist were made obsolete because both were inherently insecure.
scp took their place. While FTP remains in wide use, Secure FTP (SFTP), the secure version of FTP, should be used whenever possible. Other options exist, too — WebDAV and BitTorrent™ among them. Of course, the more machines you have, the more difficult it is to keep all in sync — or at least in a known state — and
scp and WebDAV offer no respite, unless you want to script a solution yourself.
The best tool for distributing files is rsync.
rsync can resume a transfer after interruption; it transfers only those portions of a file that differ between source and destination; and
rsync can perform entire or incremental backups. Better yet,
rsync is available on every flavor of UNIX, including Mac OS X, so it’s easy to interconnect virtually any set of systems.
Let’s look at some common uses of
rsync as review, then look at more advanced applications. The demonstration systems employed here are Mac OS X version 10.5 Leopard (a variant of FreeBSD) and Ubuntu Linux® version 8. If you use a different operating system, chances are, most of the examples here are portable; check your machine’s
rsync man page to verify proper operation.
A quick review
rsync copies files from a source to a destination. Unlike
cp, the source and destination of an
rsync operation can be local or remote. For instance, the command in Listing 1 copies the directory /tmp/photos and its entire contents verbatim to a home directory.
Listing 1. Copy the contents of a directory verbatim
$ rsync -n -av /tmp/photos ~
building file list ... done
photos/
photos/Photo 2.jpg
photos/Photo 3.jpg
photos/Photo 6.jpg
photos/Photo 9.jpg
sent 218 bytes received 56 bytes 548.00 bytes/sec
total size is 375409 speedup is 1370.11
The -v option enables verbose messages. The -a option (where a stands for archive) is shorthand for
-rlptgoD (recurse, copy symbolic links as symbolic links, preserve permissions, preserve file times, preserve group, preserve owner, and preserve devices and special files, respectively). Typically,
-a mirrors files; exceptions occur when the destination cannot or does not support the same attributes. For example, copying a directory from UNIX to Windows® does not map perfectly. Some suggestions for unusual cases appear below.
rsync has a lot of options. If you worry that your options or source or destination specifications are incorrect, use
-n to perform a dry run. A dry run previews what will happen to each file but does not move a single byte. When you are confident of all the settings, drop the
-n and proceed.
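To confirm that a dry run really leaves the destination untouched, a small local experiment can help. The sketch below assumes rsync is installed and works only in a throwaway directory created with mktemp; the directory and file names are made up for illustration.

```shell
#!/bin/sh
# Scratch area; the directory and file names are illustrative only.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "hello" > "$work/src/note.txt"

# Dry run: rsync reports what it would copy but writes nothing.
rsync -n -av "$work/src/" "$work/dst/"
dry_run_count=$(($(ls -A "$work/dst" | wc -l)))

# Real run: the same command without -n performs the copy.
rsync -av "$work/src/" "$work/dst/"
real_run_count=$(($(ls -A "$work/dst" | wc -l)))

echo "after dry run: $dry_run_count file(s); after real run: $real_run_count file(s)"
```

Both invocations print the same file list, which is why the summary line at the end, not the listing, tells you whether anything was actually transferred.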
Listing 2 provides an example where
-n is invaluable. The command in Listing 1 and the following command yield different results.
Listing 2. Copy the contents of a named directory
$ rsync -av /tmp/photos/ ~
./
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg
sent 210 bytes received 56 bytes 532.00 bytes/sec
total size is 375409 speedup is 1411.31
What is the difference? The difference is the trailing slash on the source argument. If the source has a trailing slash, the contents of the named directory but not the directory itself are copied. A slash on the end of the destination is immaterial.
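The effect of the trailing slash is easiest to see side by side. The following is a minimal sketch, assuming rsync is installed; it copies the same made-up directory twice, once with and once without the slash, into scratch directories whose names are purely illustrative.

```shell
#!/bin/sh
# Throwaway directories; nothing outside $work is touched.
work=$(mktemp -d)
mkdir -p "$work/photos" "$work/without_slash" "$work/with_slash"
touch "$work/photos/one.jpg" "$work/photos/two.jpg"

# No trailing slash: the directory itself is copied into the destination.
rsync -a "$work/photos" "$work/without_slash"

# Trailing slash: only the directory's contents are copied.
rsync -a "$work/photos/" "$work/with_slash"

# Compare the two resulting layouts.
ls -R "$work/without_slash" "$work/with_slash"
```

The first destination ends up with a photos subdirectory; the second holds the image files directly.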
And Listing 3 provides an example of moving the same directory to another system.
Listing 3. Move a directory to another system
$ rsync -av /tmp/photos example.com:album
created directory album
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg
sent 210 bytes received 56 bytes 21.28 bytes/sec
total size is 375409 speedup is 1411.31
Assuming that you have the same login name on the remote machine, rsync prompts you for a password and, given the proper credential, creates the directory album and copies the images to that directory. By default, rsync uses Secure Shell (SSH) as its transport mechanism; you can reuse your machine aliases and public keys with rsync.
The examples in Listing 2 and Listing 3 demonstrate two of rsync's four modes. The first example was shell mode, also dubbed local mode. The second sample was remote shell mode and is so named because SSH powers the underlying connection and transfers.
rsync has two additional modes. List mode acts like
ls: It lists the contents of source, as shown in Listing 4.
Listing 4. List the contents of a source
$
drwxr-xr-x 238 2009/08/22 18:49:50 photos
-rw-r--r-- 6148 2008/07/03 01:36:18 photos/.DS_Store
-rw-r--r-- 71202 2008/06/18 04:51:36 photos/Photo 2.jpg
-rw-r--r-- 69632 2008/06/18 04:51:45 photos/Photo 3.jpg
-rw-r--r-- 61046 2008/07/14 00:31:17 photos/Photo 6.jpg
-rw-r--r-- 167381 2008/07/14 00:31:56 photos/Photo 9.jpg
The fourth mode is server mode. Here, the
rsync daemon runs perennially on a machine, accepting requests to transfer files. A transfer can send files to the daemon or request files from it. Server mode is ideal for creating a central backup server or project repository.
To differentiate between remote shell mode and server mode, the latter employs two colons (::) in the source and destination names. Assuming that whatever.example.com exists, the next command copies files from the source to a local destination:
rsync -av whatever.example.com::src /tmp
And what exactly is
src? It’s an
rsync module that you define and configure on the daemon’s host. A module has a name, a path that contains its files, and some other parameters, such as
read only, which protects the contents from modification.
To run an
rsync daemon, type:
sudo rsync --daemon
Running the rsync daemon as the superuser, root, is not strictly necessary, but the practice protects other files on your machine. Running as root,
rsync restricts itself to the module’s directory hierarchy (its path) using
chroot. After a
chroot, all other files and directories seem to vanish. If you choose to run the
rsync daemon with your own privileges, choose an unused port and make sure its modules have sufficient permissions to allow download and/or upload. Listing 5 shows a minimal configuration to share some files in your home directory without the need for
sudo. The configuration is stored in file rsyncd.conf.
Listing 5. Simple configuration for sharing files
motd file = /home/strike/rsyncd/rsync.motd_file
pid file = /home/strike/rsyncd/rsyncd.pid
port = 7777
use chroot = no

[demo]
path = /home/strike
comment = Martin home directory
list = no

[dropbox]
path = /home/strike/public/dropbox
comment = A place to leave things for Martin
read only = no

[pickup]
path = /home/strike/public/pickup
comment = Get your files here!
The file has two segments. The first segment — here, the first four lines — configures the operation of the
rsync daemon. (Other options are available, too.) The first line points to a file with a friendly message to identify your server. The second line points to another file to record the process ID of the server. This is a convenience in the event you must manually kill the rsync daemon:
kill -INT `cat /home/strike/rsyncd/rsyncd.pid`
The two files are in a home directory, because this example does not use superuser privileges to run the software. Similarly, the port chosen for the daemon is above 1023, in the range that ordinary users can claim for any application. The fourth line turns off chroot.
The remaining segment is subdivided into small sections, one section per module. Each section, in turn, has a header line and a list of (key-value) pairs to set options for each module. By default, all modules are read only; set
read only = no to allow write operations. Also by default, all modules are listed in the module catalog; set
list = no to hide the module.
To start the daemon, run:
rsync --daemon --config=rsyncd.conf
Now, connect to the daemon from another machine, and omit a module name. You should see this:
rsync --port=7777 mymachine.example.com::
Hello! Welcome to Martin's rsync server.
dropbox A place to leave things for Martin
pickup Get your files here!
If you do not name a module after the colons (
::), the daemon responds with a list of available modules. If you name a module but do not name a specific file or directory within the module, the daemon provides a catalog of the module’s contents, as shown in Listing 6.
Listing 6. Catalog output of a module’s contents
rsync --port=7777 mymachine.example.com::pickup
Hello! Welcome to Martin's rsync server.
drwxr-xr-x 4096 2009/08/23 08:56:19 .
-rw-r--r-- 0 2009/08/23 08:56:19 article21.html
-rw-r--r-- 0 2009/08/23 08:56:19 design.txt
-rw-r--r-- 0 2009/08/23 08:56:19 figure1.png
And naming a module and a file copies the file locally, as shown in Listing 7.
Listing 7. Name a module to copy files locally
rsync --port=7777 mymachine.example.com::pickup/
Hello! Welcome to Martin's rsync server.
drwxr-xr-x 4096 2009/08/23 08:56:19 .
-rw-r--r-- 0 2009/08/23 08:56:19 article21.html
-rw-r--r-- 0 2009/08/23 08:56:19 design.txt
-rw-r--r-- 0 2009/08/23 08:56:19 figure1.png
You can also perform an upload by reversing the source and destination, then pointing to the module for writes, as shown in Listing 8.
Listing 8. Reverse source and destination directories
$ rsync -v --port=7777 application.js mymachine.example.com::dropbox
Hello! Welcome to Martin's rsync server.
application.js
sent 245 bytes received 38 bytes 113.20 bytes/sec
total size is 164 speedup is 0.58
That’s a quick but thorough review. Next, let’s see how you can apply
rsync to daily tasks.
rsync is especially useful for backups. And because it can synchronize a local file with its remote counterpart — and can do that for an entire file system, too — it’s ideal for managing large clusters of machines that must be (at least partially) identical.
Back up your data with rsync
Performing backups on a frequent basis is a critical but typically ignored chore. Perhaps it’s the demands of running a lengthy backup each day or the need to have large external media to store files; whatever the excuse, copying data somewhere for safekeeping should be an everyday practice.
To make the task painless, use
rsync and point to a remote server — perhaps one that your service provider hosts and backs up. Each of your UNIX machines can use the same technique, and it’s ideal for keeping the data on your laptop safe.
Establish SSH keys and an
rsync daemon on the remote machine, and create a backup module to permit writes. Once established, run
rsync to create a daily backup that takes hardly any space, as shown in Listing 9.
Listing 9. Create daily backups
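#!/bin/sh
# This script based on work by Michael Jakl (jakl.michael AT gmail DOTCOM) and used
# with express permission.
HOST=mymachine.example.com
SOURCE=$HOME
PATHTOBACKUP=home-backup
date=`date "+%Y-%m-%dT%H:%M:%S"`
# Copy changed files; hard-link unchanged ones to the previous backup.
# Note: rsync resolves a relative --link-dest against the destination directory,
# so an absolute PATHTOBACKUP on the backup host is the safer choice.
rsync -az --link-dest=$PATHTOBACKUP/current $SOURCE $HOST:$PATHTOBACKUP/back-$date
# Promote the new backup to "current" (rm -f tolerates a missing link on the first run).
ssh $HOST "rm -f $PATHTOBACKUP/current && ln -s back-$date $PATHTOBACKUP/current"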
Replace HOST with the name of your backup host and SOURCE with the directory you want to save. Change PATHTOBACKUP to the name of your module. (You can also embed the three final commands of the script in a loop, dynamically change SOURCE, and back up a series of separate directories on the same system.) Here’s how the backup works:
- To begin, date is set to the current date and time and yields a string like 2009-08-23T12:32:18, which identifies the backup uniquely.
- The rsync command performs the heavy lifting. -az preserves all file information and compresses the transfers. The magic lies in --link-dest=$PATHTOBACKUP/current, which specifies that if a file has not changed, do not copy it to the new backup. Instead, create a hard link from the new backup to the same file in the existing backup. In other words, the new backup only contains files that have changed; the rest are links.
More specifically (and expanding all variables), mymachine.example.com::home-backup/current is the current archive. The new archive for /home/strike is targeted to mymachine.example.com::home-backup/back-2009-08-23T12:32:18. If a file in /home/strike has not changed, the file is represented in the new backup by a hard link to the current archive. Otherwise, the new file is copied to the new archive.
If you touch but a few files or perhaps a handful of directories each day, the additional space required for what is effectively a full backup is paltry. Moreover, because each daily backup (except the very first) is so small, you can keep a long history of the files on hand.
- The last step is to alter the organization of the backups on the remote machine to promote the newly created archive to be the current archive, thereby minimizing the differences to record the next time this script runs. The last command removes the current archive (which is merely a symbolic link) and recreates the same symbolic link pointing to the new archive.
Keep in mind that a hard link to a hard link points to the same file. Hard links are very cheap to create and maintain, so a full backup is simulated using only an incremental scheme.
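The hard-link behavior can be observed without a remote host. The following is a local sketch, assuming rsync is installed; scratch directories stand in for the backup server, the file names are invented for illustration, and absolute paths are used so that --link-dest resolves unambiguously. The small inode helper is a hypothetical convenience, not part of rsync.

```shell
#!/bin/sh
# Local stand-in for the remote backup host; all paths are throwaway.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backups"
echo "never changes" > "$work/src/a.txt"
echo "version 1" > "$work/src/b.txt"

# First backup is a full copy; "current" points at it.
rsync -a "$work/src/" "$work/backups/back-1/"
ln -s back-1 "$work/backups/current"

# Modify one file, then back up again against the previous archive.
echo "version 2 with more text" > "$work/src/b.txt"
rsync -a --link-dest="$work/backups/current" "$work/src/" "$work/backups/back-2/"

# Promote the new archive, as the last command of Listing 9 does.
rm -f "$work/backups/current"
ln -s back-2 "$work/backups/current"

# Print the inode number of a file; two names with the same inode are hard links.
inode() { ls -i "$1" | awk '{print $1}'; }

a_shared=no
[ "$(inode "$work/backups/back-1/a.txt")" = "$(inode "$work/backups/back-2/a.txt")" ] && a_shared=yes
b_shared=no
[ "$(inode "$work/backups/back-1/b.txt")" = "$(inode "$work/backups/back-2/b.txt")" ] && b_shared=yes

echo "a.txt shared between backups: $a_shared"
echo "b.txt shared between backups: $b_shared"
```

The unchanged file should occupy a single inode across both archives, while the modified file gets its own copy in the newer one. Running du -sh on the backups directory gives a rough sense of how little extra space each additional archive costs.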
Other advanced tricks and tips
Once you begin using remote
rsync in daily tasks, you’ll likely find it necessary to keep your daemon running at all times. Linux and UNIX machines have a startup script for
rsync, usually in /etc/init.d/rsync. Check your operating system for a startup script and the utility that enables and disables components. In contrast, if you are running
rsync as a daemon for your own use, or if you do not have access to the startup scripts, you can still start the daemon automatically with cron:
@reboot /usr/bin/rsync --daemon --port=7777 --config=/home/strike/rsyncd/rsyncd.conf
This command launches the daemon each time the machine restarts. Place this line in your crontab file, and save the file.
You saw how a preview with
-n can reveal problems before any occur. You can also monitor the state of your transfers with two options: --progress and --stats. The former renders a progress bar. The latter shows how compression and transmission performed. Further, you can hasten the transfer between two machines with --compress. Rather than send raw data, the data is compressed by the sender and decompressed by the receiver, making the transit across the wire faster: fewer bytes translate to shorter transfer times.
rsync ensures that all files in the source are copied to the destination. This is duplication. If you want a mirror, where the destination is an exact copy of the source, provide
--delete. For example, if the source has files A, B, and C, a standard
rsync copy duplicates A, B, and C to the destination. However, if you delete B from the source and duplicate again, the destination no longer mirrors the source: B lingers in the destination even though it is gone from the source. The --delete option makes the destination a mirror by removing files in the destination that no longer exist in the source.
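The A, B, and C scenario can be replayed locally. This is a minimal sketch, assuming rsync is installed; the scratch directories are created with mktemp and the variable names are illustrative.

```shell
#!/bin/sh
# Scratch source and destination holding the files A, B, and C.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
touch "$work/src/A" "$work/src/B" "$work/src/C"

# Initial duplication: the destination receives A, B, and C.
rsync -a "$work/src/" "$work/dst/"

# Remove B from the source and duplicate again: B stays behind.
rm "$work/src/B"
rsync -a "$work/src/" "$work/dst/"
after_copy=$(echo $(ls "$work/dst"))

# Mirror with --delete: B is removed from the destination as well.
rsync -a --delete "$work/src/" "$work/dst/"
after_mirror=$(echo $(ls "$work/dst"))

echo "plain copy leaves: $after_copy"
echo "--delete leaves:   $after_mirror"
```

Because --delete removes data, pairing it with -n on the first attempt is a sensible precaution.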
Oftentimes, there are files you never want to copy to a backup or an archive. These include scratch files created by editors (usually denoted by a trailing tilde [
~]) and other utilities and a wide variety of files that are nonessential, such as the MP3 files in your home directory that can be recreated if need be. You can exclude files from processing using patterns. You can specify a pattern on the command line or a list of patterns in a text file. You can also combine the patterns with the
--delete-excluded option to remove excluded files from the destination.
To exclude files based on a pattern using the command line, use
--exclude. Remember that if any characters in the pattern have special meaning to the shell, such as
*, wrap the pattern in single quotes:
rsync -a --exclude='*~' /home/strike/data example.com::data
Assuming that the file /home/strike/excludes had a list of patterns like this:
*~
*.old
*.mp3
tmp
you can exclude all files that match any of those patterns with:
rsync -a --exclude-from=/home/strike/excludes /home/strike/data example.com::data
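To check an exclude list before pointing it at real data, you can try it on a scratch tree. The sketch below assumes rsync is installed; it writes the same four patterns to a temporary file and copies a made-up directory, with all file names invented for the test.

```shell
#!/bin/sh
# Scratch tree with one file worth keeping and several that match the patterns.
work=$(mktemp -d)
mkdir -p "$work/data/tmp" "$work/copy"
touch "$work/data/notes.txt" "$work/data/notes.txt~" \
      "$work/data/draft.old" "$work/data/song.mp3" "$work/data/tmp/scratch"

# One pattern per line, matching the list shown above.
printf '%s\n' '*~' '*.old' '*.mp3' 'tmp' > "$work/excludes"

# Copy everything except files and directories matching a pattern.
rsync -a --exclude-from="$work/excludes" "$work/data/" "$work/copy/"

# List what survived the filter.
ls -A "$work/copy"
```

Only notes.txt should reach the copy; the tmp directory is skipped along with everything inside it.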
Sync ’em up
Now that you know about
rsync, you have no excuse to skip a healthy backup regimen. What’s that? Your dog ate your hard disk? (Plausible these days, no?) See, and you said your data would be just fine. Now your valuable files live in FIDOnet.