use eval to run commands generated by awk

Here’s one way to generate a set of commands with awk, and then run them in a loop with eval.

# cat snippet
field1 /mnt/somedir/785/8785/948785 41 /mnt/somedir2/785/8785/948785 1 2
field1 /mnt/somedir/791/8791/948791 2 /mnt/somedir2/791/8791/948791 6 2
field1 /mnt/somedir/924/8924/948924 2 /mnt/somedir2/924/8924/948924 23 2
field1 /mnt/somedir/993/8993/948993 2 /mnt/somedir2/993/8993/948993 19876 2
field1 /mnt/somedir/3/9003/949003 8 /mnt/somedir2/3/9003/949003 273 2
field1 /mnt/somedir/70/9070/949070 341 /mnt/somedir2/70/9070/949070 6 2
field1 /mnt/somedir/517/4517/954517 2 /mnt/somedir2/517/4517/954517 14 2
field1 /mnt/somedir/699/4699/954699 210 /mnt/somedir2/699/4699/954699 1 2
field1 /mnt/somedir/726/4726/954726 1 /mnt/somedir2/726/4726/954726 6 2

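Now use awk to select the rows where the third field is greater than the fifth, and print an rsync command for each one. Pipe the output into a while loop so each command is read as a whole line, then run it with eval. (A for loop over backticks would split the output on whitespace and hand eval one word at a time.)

# awk '{if($3>$5) print "rsync -a --ignore-existing "$2"/ "$4}' snippet | while read -r cmd; do echo "$cmd"; eval "$cmd"; done
rsync -a --ignore-existing /mnt/somedir/785/8785/948785/ /mnt/somedir2/785/8785/948785
rsync -a --ignore-existing /mnt/somedir/70/9070/949070/ /mnt/somedir2/70/9070/949070
rsync -a --ignore-existing /mnt/somedir/699/4699/954699/ /mnt/somedir2/699/4699/954699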
#
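
If the commands all have the same shape, eval can be skipped entirely by reading the fields in the shell and quoting them. The following is only a sketch under a few assumptions: the function name sync_larger and the DRY_RUN variable are made up for illustration, the input has the same six whitespace-separated columns as the snippet above, and paths contain no spaces.

```shell
# DRY_RUN is a hypothetical switch: 1 (default) only prints the commands,
# 0 actually runs rsync.
DRY_RUN=${DRY_RUN:-1}

# sync_larger FILE: for each row whose 3rd field is greater than its 5th,
# rsync the 2nd field (source dir) into the 4th field (destination dir).
sync_larger() {
  local tag src src_count dest dest_count rest
  while read -r tag src src_count dest dest_count rest; do
    # Numeric comparison; malformed rows are silently skipped.
    if [ "$src_count" -gt "$dest_count" ] 2>/dev/null; then
      if [ "$DRY_RUN" -eq 1 ]; then
        echo "rsync -a --ignore-existing $src/ $dest"
      else
        rsync -a --ignore-existing "$src/" "$dest"
      fi
    fi
  done < "$1"
}
```

Run it as `sync_larger snippet` to review the commands first, then as `DRY_RUN=0 sync_larger snippet` to execute them.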

get directory mtime in unix time

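When a script needs to compare the last modification times of directories, stat can print the mtime as a Unix timestamp (seconds since the Epoch). With GNU stat, %Y is the modification time; %Z is the time of the last status change (ctime), which is not the same thing:

# stat -c '%Y' /usr/local/sbin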
1373673278

date prints the current time in the same format:

# date +%s
1373673486

You could use this in a script to do something if a directory is older or newer than some amount of time:

#!/bin/bash
# FILE: sync_usr_local_sbin.sh
# AUTHOR: ForDoDone <fordodone at email.com>
# DATE: 2013-07-12
# NOTES: syncs /usr/local/sbin to hostxyz if it's been modified in the last 5 minutes
#

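now=$(date +%s)

# %Y is the last modification time (mtime) in seconds since the Epoch
uls_lastmtime=$(stat -c '%Y' /usr/local/sbin)

# shell arithmetic is enough here, no need for bc
uls_diff=$((now - uls_lastmtime))

if [ "$uls_diff" -lt 300 ]
then
  rsync -a /usr/local/sbin/ hostxyz:/usr/local/sbin
fi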

Of course, rsync has plenty of options of its own for deciding whether files need to be updated; this is just an example.
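
The same "modified in the last N minutes" check can also be sketched with find instead of stat and arithmetic. This assumes a find that supports -maxdepth and -mmin (GNU and BSD find both do); the function name modified_within is made up for this example.

```shell
# modified_within DIR MINUTES: succeed if DIR's own mtime is less than
# MINUTES minutes old. -maxdepth 0 makes find test only DIR itself.
modified_within() {
  [ -n "$(find "$1" -maxdepth 0 -mmin "-$2" 2>/dev/null)" ]
}
```

It could be used as `modified_within /usr/local/sbin 5 && rsync -a /usr/local/sbin/ hostxyz:/usr/local/sbin`.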

merge directories with rsync

rsync -a --ignore-existing --remove-source-files src/ dest

Files that already exist in the destination are not overwritten, and because they are skipped they are not removed from src either. After the run, the files still left in src are the ones that also exist in dest. Diff them to decide which ones to keep manually, or write a quick one-liner that compares timestamps and keeps the newer version of each.
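
A minimal sketch of that review step, assuming file names without newlines: list every file still in src and report whether the copy in dest is byte-for-byte identical. The function name report_leftovers is hypothetical.

```shell
# report_leftovers SRC DEST: for every file left in SRC, print "same" if
# DEST holds an identical copy, "differs" if it is different or missing.
report_leftovers() {
  local src dest f rel
  src=${1%/}
  dest=${2%/}
  find "$src" -type f | sort | while IFS= read -r f; do
    rel=${f#"$src"/}            # path relative to SRC
    if cmp -s "$f" "$dest/$rel"; then
      echo "same: $rel"
    else
      echo "differs: $rel"
    fi
  done
}
```

Files reported as "same" can simply be deleted from src; the "differs" ones are the ones worth a manual look.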

rsync migration with manifest of transfer

I once had a migration project to move 40TB of data from source to destination NFS volumes. Naturally, I went with rsync. This was basically the command for the initial transfers:

rsync -a --out-format="transfer:%t,%b,%f" --itemize-changes /mnt/srcvol /mnt/destvol >> /log/file.log

Pretty simple, right? The log is a simple CSV file that looks like this:

transfer:2013/05/02 10:16:13,35291,mnt/srcvol/archive/foo/bar/barfoo/IMAGES/1256562131100.jpg

The customer asked for daily updates on progress. I said no problem, and this one-liner takes care of it:

# grep transfer /log/file.log | awk -F "," '{if ($2!=0) {i=i+$2; x++}} END {print "total Gbytes: "i/1073741824"\ntotal files: "x}'
total Gbytes: 1153.29
total files: 123686

In the rsync command above, %t is the timestamp (2013/05/02 10:16:13), %b is the number of bytes transferred (35291), and %f is the whole file path. By adding up the %b column and counting how many values you added, you get both the total bytes transferred and the total number of files transferred. Directories show up as 0 byte transfers, so awk skips them in both the sum and the count. Dividing by 1073741824 (1024*1024*1024) converts bytes to gibibytes (GiB).

I ended up putting it in a shell script and adding a few options: filtering transfers by a particular day or hour, better formatting of the Gbytes number, rate calculation, and the ability to combine logs from multiple data-moving servers.
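
That script is not reproduced here, but a rough sketch of the idea could look like the following. The function name summarize_log and its argument order are assumptions for illustration; it covers only the day/hour filter, the number formatting, and multiple log files, and it relies on the "transfer:%t,%b,%f" log format shown above.

```shell
# summarize_log PREFIX LOGFILE...
# PREFIX is matched against the start of the timestamp, e.g. "2013/05/02"
# for a day or "2013/05/02 10" for an hour; pass "" to include everything.
summarize_log() {
  local day
  day=$1
  shift
  awk -F, -v day="$day" '
    # keep lines starting with "transfer:" + PREFIX, skip 0-byte entries
    index($1, "transfer:" day) == 1 && $2 > 0 { bytes += $2; files++ }
    END { printf "total GiB: %.2f\ntotal files: %d\n", bytes / 1073741824, files }
  ' "$@"
}
```

For example, `summarize_log "2013/05/02" /log/file.log` would report only that day's transfers, and extra log files from other servers can be appended as further arguments.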

rsync on different ssh port

By default ssh runs on TCP port 22. If your ssh server is configured to listen on a non-standard port, you need a special option to make rsync connect to that port. I was writing a quick backup script for this WordPress site and ran into this issue. In my case I was trying to rsync to a remote server with ssh listening on 4590. You have to give rsync a special ssh option:

# rsync -a --rsh='ssh -p 4590' /srv/www/wp-uploads/ backupsite.com:/backups/wp/wp-uploads
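
In a backup script it can be handy to keep the port in one place. Here is a small sketch with a hypothetical helper, ssh_transport, that validates the port and prints the string to hand to rsync's --rsh option (or its short form, -e):

```shell
# ssh_transport PORT: print the remote-shell command for rsync,
# or fail if PORT is empty or not purely numeric.
ssh_transport() {
  case $1 in
    ''|*[!0-9]*) echo "invalid port: $1" >&2; return 1 ;;
  esac
  echo "ssh -p $1"
}
```

It would be used as `rsync -a -e "$(ssh_transport 4590)" /srv/www/wp-uploads/ backupsite.com:/backups/wp/wp-uploads`. As far as I know, rsync also reads the RSYNC_RSH environment variable for the same purpose, so exporting that once at the top of the script should work too.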