Use find and awk to get directories with very long names.
find . -type d | awk '{ if (length($0) > 254) print}'
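Since awk prints the line by default when a condition is true, the same thing can be written a bit more tersely:
find . -type d | awk 'length($0) > 254'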
You can easily diff the output of commands instead of files. In this case hexdump prints thousands of lines, but I’m only interested in the difference:
# diff <(hexdump file1.bin) <(hexdump file2.bin)
1,2c1,2
< 0000000 6a49 b610 0000 0000 5733 7261 4465 4243
< 0000010 0000 0000 0001 0000 9006 4e0b 0b28 000f
---
> 0000000 6a49 b616 0000 0000 5733 7261 4465 4243
> 0000010 0000 0000 0001 0000 9006 4e11 0b28 000f
The hexdumps run via process substitution, the <( ) syntax: each command executes in a subshell, and diff reads each one’s output as if it were a file.
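The same trick works for any pair of commands; for example, to compare two unsorted lists (a generic sketch with made-up file names):
# diff <(sort list1.txt) <(sort list2.txt)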
I’m only interested in the two pieces that are different in each binary file:
# for i in `ls *.bin | sort -nk1.7`; do echo -n "$i: "; hexdump -C $i | grep '33 57 61 72 65 44\|4e 28 0b 0f 00' | awk '{if(NR==1) print $4;if(NR==2) print $12}' | paste - -; done | column -t 2>/dev/null
file0.bin: 1a 15
file1.bin: 19 14
file2.bin: 18 13
file3.bin: 17 12
file4.bin: 16 11
file5.bin: 15 10
file6.bin: 14 0f
file8.bin: 12 0d
file9.bin: 11 0c
file10.bin: 10 0b
file12.bin: 0e 09
file13.bin: 0d 08
file14.bin: 0f 0a
file15.bin: 0b 06
file16.bin: 0a 05
file17.bin: 09 04
file18.bin: 08 03
file19.bin: 07 02
file20.bin: 06 01
file21.bin: 05 00
file22.bin: 0c 07
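For readability, here is the same pipeline split across lines; the two grep patterns are the hex byte sequences sitting next to the fields that change from file to file:
for i in $(ls *.bin | sort -nk1.7); do
    echo -n "$i: "
    hexdump -C "$i" |
        grep '33 57 61 72 65 44\|4e 28 0b 0f 00' |         # the two lines holding the changing bytes
        awk '{if(NR==1) print $4; if(NR==2) print $12}' |  # grab one byte from each line
        paste - -                                          # join the two bytes onto one line
done | column -t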
By default dd is silent. It just copies whatever blocks you want from in to out. In order to see progress, send it a USR1 signal using kill.
Start a useless dd:
# dd if=/dev/zero of=/dev/null
In another terminal find the pid:
# ps aux | grep dd | grep -v grep
root      7784 90.5  0.0   2884   560 pts/9    R+   10:01   0:06 dd if=/dev/zero of=/dev/null
#
# kill -USR1 7784
The original window will now show this:
# dd if=/dev/zero of=/dev/null
14501614+0 records in
14501614+0 records out
7424826368 bytes (7.4 GB) copied, 16.2149 seconds, 458 MB/s
Then you can ctrl+c it to get the final output:
# dd if=/dev/zero of=/dev/null
14501614+0 records in
14501614+0 records out
7424826368 bytes (7.4 GB) copied, 16.2149 seconds, 458 MB/s
16888077+0 records in
16888076+0 records out
8646694912 bytes (8.6 GB) copied, 19.3507 seconds, 447 MB/s
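As an aside, if your dd comes from GNU coreutils 8.24 or later, it can report progress on its own and you can skip the signal dance entirely:
# dd if=/dev/zero of=/dev/null status=progress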
This one-liner will start your dd, then monitor it and print progress every 20 seconds. Once the dd finishes, the loop stops and gives your shell back.
dd if=/dev/zero of=/dev/null & pid=$! && sleep 20s && while true; do i=`ps aux | awk '{print $2}' | grep ^$pid$`; if [ "${i:-a}" != "$pid" ]; then break; fi; kill -USR1 $pid; sleep 20s; done;
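The same loop, unrolled for readability; this sketch swaps the ps/awk/grep pid test for kill -0, a common way to check that a process is still alive:
dd if=/dev/zero of=/dev/null &
pid=$!
sleep 20s
while kill -0 "$pid" 2>/dev/null; do    # still running?
    kill -USR1 "$pid"                   # ask dd for a progress report
    sleep 20s
done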
When printing large numbers, awk uses scientific notation by default. Take this snippet from an example file: the first column is a count of how many times a file is present, the second is the md5sum of that file, and the third is the file’s size in bytes.
# tail -3 md5sums
14737 113136892f2137aa0116093a524ade0b 53
19402 1c7b413c3fa39d0fed40556d2658ac73 44
52818 b7f10e862d0e82f77a86b522159ce3c8 45
#
To sum up the number of files counted in this file, and how much total space they all take up, I do this:
# awk '{i=i+$1;j=j+($3*$1);} END {print i; print j}' md5sums
22412000
1.45255e+13
So awk counted 22412000 files, totaling about 14.5 TB. Let’s make that a little more readable:
# awk '{i=i+$1;j=j+($3*$1);} END {printf ("%d\n", i); printf("%d\n", j)}' md5sums
22412000
2147483647
Um… that’s not right. But 2147483647 is a special number. You should recognize it as the maximum value of a signed 32-bit integer, (2^31)-1. Here printf’s %d conversion simply can’t represent anything larger. Instead, use print, but tell awk what the output format should look like:
awk 'BEGIN {OFMT = "%.0f"} {i=i+$1;j=j+($3*$1);} END {print i; print j}' md5sums
22412000
14525468874034
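Using a floating-point conversion in printf sidesteps the integer cap as well, and should give the same result:
# awk '{i=i+$1;j=j+($3*$1);} END {printf("%d\n", i); printf("%.0f\n", j)}' md5sums
22412000
14525468874034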
Recently, I tried to import a SQL dump created by mysqldump that somehow had a duplicate entry for a primary key. Here’s a sample of the contents:
INSERT INTO `table1` VALUES ('B97bKm',71029594,3,NULL,NULL,'2013-01-22 09:25:39'),('dZfUHQ',804776,1,NULL,NULL,'2012-09-05 16:15:23'),('hWkGsz',70198487,0,NULL,NULL,'2013-01-05 10:55:36'),('n6366s',69480146,1,NULL,NULL,'2012-12-18 03:27:45'),('tBP6Ug',65100805,1,NULL,NULL,'2012-08-29 21:32:39'),('yfpewZ',18724906,0,NULL,NULL,'2013-03-31 17:12:58'),('UNz5qp',8392940,2,NULL,NULL,'2012-11-28 02:00:00'),('9WVpVV',71181566,0,NULL,NULL,'2013-01-25 06:15:03'),('kEPQu5',64972980,9,NULL,NULL,'2012-09-01 06:00:36')
It goes on for another 270,000 entries. I was able to find the duplicate value like this:
# cat /tmp/table1.sql | grep INSERT | sed -e 's/),/\n/g' | sed -e 's/VALUES /\n/' | grep -v INSERT | awk -F, '{print $2}' | sort | uniq -c | awk '{if($1>1) print;}'
2 64590015
#
The primary key value 64590015 had 2 entries. I removed the spurious entry, and subsequently the SQL imported fine.
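Broken out step by step, that pipeline does this (same commands, just reformatted, with the useless cat dropped):
grep INSERT /tmp/table1.sql |
    sed -e 's/),/\n/g' |          # one row per line
    sed -e 's/VALUES /\n/' |      # push the INSERT prefix onto its own line...
    grep -v INSERT |              # ...and drop it
    awk -F, '{print $2}' |        # second comma-separated field: the primary key
    sort | uniq -c |              # count occurrences of each key
    awk '{if($1>1) print;}'       # keep only keys seen more than once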
If you can’t remove an .iso file because it’s mounted by a VM, you can click through many VMs looking for it, or use this sloppy one-liner:
eval `vim-cmd vmsvc/getallvm | grep -v Vmid | awk '{print "echo \"vmName: "$2"\"; vim-cmd vmsvc/device.getdevices "$1 ";"}'` | grep -e ".iso\|vmName" | grep -v fileName | grep -B1 summary
On a PXE boot install server, you can see which clients booted most recently. If you don’t have KVM access to new servers awaiting install, just look at the newest lease info and make an educated guess about which new machine to ssh into in the auto-installer (preseed) environment.
Here’s a snippet from the leases file:
lease 10.101.40.85 {
  starts 3 2013/05/15 19:54:36;
  ends 3 2013/05/15 20:54:36;
  cltt 3 2013/05/15 19:54:36;
  binding state active;
  next binding state free;
  hardware ethernet 00:30:48:5c:cf:34;
  uid "\001\0000H\\\3174";
}
And after some parsing:
# cat /var/lib/dhcp/dhcpd.leases | grep -e lease -e hardware -e start -e end | grep -v format | grep -v written | sed -e '/start/s/\// /g' -e 's/;//g' -e '/starts/s/:/ /g' | paste - - - - | awk '{print $2" "$18" "$6" "$7" "$8" "$9" "$10" "$11" "$14" "$15}' | sort -k 3,3n -k 4,4n -k 5,5n -k 6,6n -k 7,7n -k 8,8n | awk '{print $1" "$2" "$3"/"$4"/"$5" "$6":"$7":"$8" "$9" "$10}' | column -t
10.101.40.127 00:1e:68:9a:e5:ac 2013/04/26 22:02:58 2013/04/26 23:02:58
10.101.40.129 00:1e:68:9a:e5:ac 2013/04/26 23:10:01 2013/04/27 00:10:01
10.101.40.122 00:1e:68:9a:e5:ac 2013/04/26 23:27:57 2013/04/26 23:30:42
10.101.40.118 00:1e:68:9a:ee:69 2013/05/14 16:21:28 2013/05/14 17:21:28
10.101.40.85 00:30:48:5c:cf:34 2013/05/14 16:54:43 2013/05/14 17:54:43
10.101.40.118 00:1e:68:9a:ee:69 2013/05/14 17:14:04 2013/05/14 18:14:04
10.101.40.85 00:30:48:5c:cf:34 2013/05/14 17:24:43 2013/05/14 18:24:43
10.101.40.85 00:30:48:5c:cf:34 2013/05/14 17:54:42 2013/05/14 18:54:42
#
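A simpler variant gets you most of the way there; this is a sketch that assumes every lease block contains starts, ends, and hardware lines in that order, and that no comment line starts with the word lease:
# grep -e '^lease' -e starts -e ends -e hardware /var/lib/dhcp/dhcpd.leases | paste - - - - | awk '{gsub(/;/,""); print $2, $14, $6, $7, $10, $11}' | sort -k3,4 | column -t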
I once had a migration project in which 40TB of data needed to be moved from source to destination NFS volumes. Naturally, I went with rsync. This was basically the command for the initial transfers:
rsync -a --out-format="transfer:%t,%b,%f" --itemize-changes /mnt/srcvol /mnt/destvol >> /log/file.log
Pretty simple, right? The log is a simple CSV file that looked like this:
transfer:2013/05/02 10:16:13,35291,mnt/srcvol/archive/foo/bar/barfoo/IMAGES/1256562131100.jpg
The customer asked for daily updates on progress. I said no problem, and this one liner takes care of it:
# grep transfer /log/file.log | awk -F "," '{if ($2!=0) i=i+$2; x++} END {print "total Gbytes: "i/1073741824"\ntotal files: "x}'
total Gbytes: 1153.29
total files: 123686
In the rsync command above, %t means timestamp (2013/05/02 10:16:13), %b means bytes transferred (35291), and %f means the whole file path. By adding up the %b column of output and counting the lines as you go, you get both the total bytes transferred and the total number of files transferred. Directories show up as 0-byte transfers, so awk skips them when summing bytes. Also, I threw in the divide by 1073741824 (1024*1024*1024), which converts bytes to gibibytes.
I ended up putting it in a shell script and adding options such as, just find transfers for a particular day/hour, better handling for the Gbytes number, rate calculation, and the ability to add logs from multiple data moving servers.
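For instance, limiting the report to a single day only takes a tighter grep (a sketch; the date is just the one from the sample log line above):
# grep 'transfer:2013/05/02' /log/file.log | awk -F "," '{if ($2!=0) i=i+$2; x++} END {print "total Gbytes: "i/1073741824"\ntotal files: "x}'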
If you need to get the last few characters of a string, you can just use awk. This works great because you don’t have to know in advance how long or short the string is. awk first gets the length of the string; in this example the length is 9, so the ‘1’ sits at position 9. It then subtracts 2 from 9 to get position 7 (the ‘3’), and grabs 3 characters starting from there: ‘371’
# echo server371 | awk '{print substr($0,length($0)-2,3)}'
371
# echo anotherserver892 | awk '{print substr($0,length($0)-2,3)}'
892
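To generalize to the last N characters, pass N in as an awk variable (a small sketch; with two arguments, substr runs to the end of the string):
# echo server371 | awk -v n=4 '{print substr($0, length($0)-n+1)}'
r371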