printing large integers with awk

When printing with awk, it uses scientific notation by default. Take this snippet from an example file. The first column is a count of how many times a file is present, the second column is the md5sum of that file and the third is the number of bytes that the file is.

# tail -3 md5sums
  14737 113136892f2137aa0116093a524ade0b        53
  19402 1c7b413c3fa39d0fed40556d2658ac73        44
  52818 b7f10e862d0e82f77a86b522159ce3c8        45
#

If I wanted to sum up the number of files counted in this file, and how much total space they are all taking up, I do this:

# awk '{i=i+$1;j=j+($3*$1);} END {print i; print j}' md5sums
22412000
1.45255e+13

So awk counted 22412000 files, totaling about 14.5 TB. Let’s make that a little more readable:

# awk '{i=i+$1;j=j+($3*$1);} END {printf ("%d\n", i); printf("%d\n", j)}' md5sums
22412000
2147483647

Um… that’s not right. But 2147483647 is a special number. You should recognize it as the maximum value of a 32 bit unsigned integer or ((2^32)/2)-1. In this case printf doesn’t handle large integers at all. Instead, use print, but tell awk what the output format should look like:

awk 'BEGIN {OFMT = "%.0f"} {i=i+$1;j=j+($3*$1);} END {print i; print j}' md5sums 
22412000
14525468874034