find duplicate entry in sql dump

Recently, I tried to import a SQL dump created by mysqldump that somehow had a duplicate entry for a primary key. Here’s a sample of the contents:

INSERT INTO `table1` VALUES ('B97bKm',71029594,3,NULL,NULL,'2013-01-22 09:25:39'),('dZfUHQ',804776,1,NULL,NULL,'2012-09-05 16:15:23'),('hWkGsz',70198487,0,NULL,NULL,'2013-01-05 10:55:36'),('n6366s',69480146,1,NULL,NULL,'2012-
12-18 03:27:45'),('tBP6Ug',65100805,1,NULL,NULL,'2012-08-29 21:32:39'),('yfpewZ',18724906,0,NULL,NULL,'2013-03-31 17:12:58'),('UNz5qp',8392940,2,NULL,NULL,'2012-11-28 02:00:00'),('9WVpVV',71181566,0,NULL,NULL,'2013-01-25 06:15:03'),('kEP
Qu5',64972980,9,NULL,NULL,'2012-09-01 06:00:36')

It goes on for another 270,000 entries. I was able to find the duplicate value like this:

# cat /tmp/table1.sql | grep INSERT | sed -e 's/),/\n/g' | sed -e 's/VALUES /\n/' | grep -v INSERT | awk -F, '{print $2}' | sort | uniq -c | awk '{if($1>1) print;}'
    2 64590015
#

The primary key value 64590015 had 2 entries. I removed the spurious entry, and subsequently the SQL imported fine.

find exec with grep pipe

If you have to search 62,000 log files for a specific string what’s the best way to do it? This will not work:

# zgrep string www1*/apache2/fordodone.com/201*/*/*/*error*.log.gz

Because shell will expand the list, there will be too many arguments for zgrep to process.

Instead use find to find the list of logfiles. You could redirect to a file, then run a forloop on each one, but we can just use -exec with find to run commands on the log files as we find them. This is nice, because you can process the files, and have output as it chugs along. Either of these works:

# find www1*/apache2/fordodone.com/201*/*/*/ -name '*error*.log.gz' -exec zgrep string {} \;

# find www1*/apache2/fordodone.com/201*/*/*/ -name '*error*.log.gz' -exec sh -c 'zgrep string $0' {} \;

In my head it sounds something like this: “find the files in the matching directories, that are named like ‘*error*.log.gz’, and as you find them, execute a command on them. The command is a new shell command to zgrep for the string in the file you just found.”

The first one works fine, BUT if you need to pipe your zgrep or whatever to some other command you need to execute a sub shell for that.

## do sed substitution after
-exec sh -c 'zgrep string $0 | sed -e \'s/A/B/g\'' {} \;

## read backwards and find first (aka last) occurrence
-exec sh -c 'zcat $0 | tac | grep -m1 string' {} \;

Always use single quotes for the subshell command sh -c , becuase you don’t want the current shell to interpret it, but pass the $0 as a literal so that the subshell can interpret it. The $0 in the subshell refers to the FIRST argument it is passed, which in this case is {}, or the file that find has currently found.

mount windows disk image in linux

I was recently converting a Windows installation from a physical desktop to a virtual machine. Because the installation was an OEM installation the P2V conversion failed the “hardware” check and the target VM was unregistered. The only way to fix was to do a repair install changing the installation from an OEM to a Retail version of Windows. This needed to be done on the physical source desktop. There’s no way I was going to attempt this without a total backup of the system. I could have just copied important files, but I decided a disk clone was more appropriate. I booted off of a Debian rescue cd, mounted a NFS share, and dd’d a copy of the entire drive to a flat file.

# mkdir /mnt/nfsserver/diskclone
# mount nfsserver:/vol/diskclone /mnt/nfsserver/diskclone
# cd /mnt/nfsserver/diskclone
#
# dd if=/dev/sda of=desktop.img
156250000+0 records in
156250000+0 records out
80000000000 bytes (80 GB) copied, 7079.64601 seconds, 11.3 MB/s
#
# ls -l
total 78278496
-rw-r--r-- 1 root root 80000000000 2013-05-20 15:47 desktop.img

I then needed to mount the image and take a look at the contents. A flat disk image file is just a block for block copy of the actual disk, so the first step is to look at the partition table listed in sectors and find the offset. The offset will tell mount where the beginning of the file system is.

# losetup /dev/loop0 desktop.img
#
# fdisk -l -u  /dev/loop0

Disk /dev/loop0: 80.0 GB, 80000000000 bytes
255 heads, 63 sectors/track, 9726 cylinders, total 156250000 sectors
Units = sectors of 1 * 512 = 512 bytes 

      Device Boot      Start         End      Blocks   Id  System
/dev/loop0p1   *          63   156232124    78116031    7  HPFS/NTFS
#
#

Find the offset by multiplying the 63 (start sector) by 512 bytes ( 63*512 ) and the offset is 32256. Now we can tell mount where the start is.

# mkdir /mnt/diskimage
# mount -t ntfs -o offset=32256,ro desktop.img /mnt/diskimage
#

I mounted readonly so that we don’t mess anything up. You may need to install ntfsprogs to be able to read an NTFS file system (apt-get install ntfsprogs) if you don’t have it installed already

remove many empty directories

# find . -depth -mindepth 1 -maxdepth 3 -type d -exec rmdir {} \;

This finds directories, between 1 and 3 levels deep and attempts to remove them. The -depth flag finds the deepest child directories, before finding parents. This is great, because it tries to remove foo/bar/ before it will try to remove foo/. Without removing foo/bar/ first, rmdir foo/ would fail. Because rmdir will fail if there are any contents in a directory, the operation is safe to run without removing any files. You could redirect STDERR to a file, and capture all the directories that are not empty for processing later.

disable Linux/Unix user

You can remove a user and their home directory, however to preserve their login, but disable it the easiest way is to change their shell to one that doesn’t exist, or /bin/false. /bin/nologin leaves a breadcrumb that someone else can tell that a human disabled the account.

# chsh -s /bin/nologin firedemployee

change Baytech RPC MAC address

Why any manufacturer would not burn a MAC address into an interface is beyond me. There are many of these RPC-3 units around that all have the same MAC address ( 00:C0:48:00:56:CE ), because they weren’t set at the factory. To set them:

1) log in on console
2) select 3 for Configuration
3) type in an up carat ^
4) hit enter
5) enter the serial number of the unit (i.e. 02314798 not 02314798-00 )
6) hit enter, enter
7) accept changes Y
8) unit resets and uses the new MAC address

Because the serial numbers should be unique, the MACs will also be unique. This decimal serial number 02314798 translates to this hexadecimal number 23522E, and it’s used to form the last 6 digits of the MAC making the final address 00:C0:48:23:52:2E. They must have set all of them using the fake serial number 00022222 at the factory for some reason.

dhcpd lease information sorted by date and time

When looking on a pxe boot install server, you can see what the newest clients were to boot. If you don’t have KVM access on new servers to be installed, just look at the newest lease info, and make an educated guess about which new one to login to the auto-installer environment (preseed) via ssh.

Here’s a snippet from the leases file:

lease 10.101.40.85 {
  starts 3 2013/05/15 19:54:36;
  ends 3 2013/05/15 20:54:36;
  cltt 3 2013/05/15 19:54:36;
  binding state active;
  next binding state free;
  hardware ethernet 00:30:48:5c:cf:34;
  uid "\001\0000H\\\3174";
}

And after some parsing:

# cat /var/lib/dhcp/dhcpd.leases | grep -e lease -e hardware -e start -e end | grep -v format | grep -v written | sed -e '/start/s/\// /g' -e 's/;//g' -e '/starts/s/:/ /g' | paste - - - - | awk '{print $2" "$18" "$6" "$7" "$8" "$9" "$10" "$11" "$14" "$15}' | sort -k 3,3n -k 4,4n -k 5,5n -k 6,6n -k 7,7n -k 8,8n | awk '{print $1" "$2" "$3"/"$4"/"$5" "$6":"$7":"$8" "$9" "$10}' | column -t


10.101.40.127  00:1e:68:9a:e5:ac  2013/04/26  22:02:58  2013/04/26  23:02:58
10.101.40.129  00:1e:68:9a:e5:ac  2013/04/26  23:10:01  2013/04/27  00:10:01
10.101.40.122  00:1e:68:9a:e5:ac  2013/04/26  23:27:57  2013/04/26  23:30:42
10.101.40.118  00:1e:68:9a:ee:69  2013/05/14  16:21:28  2013/05/14  17:21:28
10.101.40.85   00:30:48:5c:cf:34  2013/05/14  16:54:43  2013/05/14  17:54:43
10.101.40.118  00:1e:68:9a:ee:69  2013/05/14  17:14:04  2013/05/14  18:14:04
10.101.40.85   00:30:48:5c:cf:34  2013/05/14  17:24:43  2013/05/14  18:24:43
10.101.40.85   00:30:48:5c:cf:34  2013/05/14  17:54:42  2013/05/14  18:54:42
#

merge directories with rsync

rsync -a --ignore-existing --remove-source-files src/ dest

Any existing files in the destination will not be overwritten. After it’s done, look in src to see what is also in destination, then diff to see which ones to manually keep, or quickly write a one-liner to compare time stamps and keep newer ones and overwrite older versions.