When awk prints a large number with print, it can switch to scientific notation, because print formats such numbers with OFMT, which defaults to "%.6g". Take this snippet from an example file. The first column is a count of how many times a file is present, the second column is the md5sum of that file, and the third is the size of the file in bytes.
# tail -3 md5sums
14737 113136892f2137aa0116093a524ade0b 53
19402 1c7b413c3fa39d0fed40556d2658ac73 44
52818 b7f10e862d0e82f77a86b522159ce3c8 45
#
To sum up the number of files counted in this file, and how much total space they all take up, I do this:
# awk '{i=i+$1;j=j+($3*$1);} END {print i; print j}' md5sums
22412000
1.45255e+13
So awk counted 22412000 files, totaling about 14.5 TB. Let’s make that a little more readable:
# awk '{i=i+$1;j=j+($3*$1);} END {printf ("%d\n", i); printf("%d\n", j)}' md5sums
22412000
2147483647
Um… that’s not right. But 2147483647 is a special number. You should recognize it as the maximum value of a 32-bit signed integer, or 2^31 - 1, which is the same as ((2^32)/2)-1. In this case the %d conversion in printf can't represent anything larger, so the total is clamped to that maximum. Instead, use print, but tell awk what the output format should look like by setting OFMT:
# awk 'BEGIN {OFMT = "%.0f"} {i=i+$1;j=j+($3*$1);} END {print i; print j}' md5sums
22412000
14525468874034
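If you would rather keep printf, a floating-point conversion such as %.0f should sidestep the integer limit as well, since no conversion to a 32-bit integer is involved. The sketch below is only an illustration: the shell variables total, plain and tb are names made up here, and the value is the byte total from the output above rather than a fresh run against md5sums.

```shell
# Stand-in for the summed byte total (value copied from the output above).
total=14525468874034

# %.0f formats the value as floating point with zero decimal places,
# so it never passes through a 32-bit integer conversion.
plain=$(awk -v j="$total" 'BEGIN { printf("%.0f\n", j) }')

# The same value scaled to terabytes (10^12 bytes), one decimal place.
tb=$(awk -v j="$total" 'BEGIN { printf("%.1f TB\n", j / 1e12) }')

echo "$plain"
echo "$tb"
```

The first line printed should match the OFMT result, and the second gives the rounded figure of about 14.5 TB directly.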
I just wanted to thank you for this. I was struggling with achieving exactly this for quite some time until I found your handy tip. THANKS!
Thank you soooo much!