Using find to act on files is very useful, but it gets a bit trickier when the files it finds need different actions depending on their file type. For example, say there are log files named foo.log that get compressed to foo.log.gz after 10 days, so find returns regular text files as well as gzipped text files. Extend your find with an -exec that starts a bash shell, have that shell check the file extension, and run grep or zgrep accordingly. Then pipe the result through awk or whatever else you need to parse out what you want.

# find . -type f -name 'foo.log*' -exec bash -c 'if [[ $0 =~ \.log$ ]]; then grep foobar "$0"; elif [[ $0 =~ \.log\.gz$ ]]; then zgrep foobar "$0"; fi' {} \; | awk '{if(/typea/)a++; if(/typeb/)b++; tot++} END {print "typea: "a" - "a*100/tot"%"; print "typeb: "b" - "b*100/tot"%"; print "typec: "tot-(a+b)" - "(tot-(a+b))*100/tot"%"; print "total: "tot;}'
typea: 5301 - 67.4771%
typeb: 2539 - 32.3192%
typec: 16 - 0.203666%
total: 7856

find and search for string in gzipped and text logs

Find logs dating back 3 weeks and search each one for the string. If the file is gzipped, use zgrep; if it is a regular text log, use grep; if it isn’t a log, do nothing.

# find /mnt/toaster1/logs/app_logs/application1/2014 -type f -mtime -21 -exec bash -c 'if [[ $0 == *.log ]]; then g=grep; elif [[ $0 == *.gz ]]; then g=zgrep; else g=:; fi; $g "foostring" "$0"' {} \;

find exec with grep pipe

If you have to search 62,000 log files for a specific string what’s the best way to do it? This will not work:

# zgrep string www1*/apache2/fordodone.com/201*/*/*/*error*.log.gz

Because the shell expands the wildcards into the full list of file names before running the command, there will be too many arguments for zgrep to process.

Instead, use find to build the list of log files. You could redirect the list to a file and then run a for loop over each entry, but we can just use -exec with find to run commands on the log files as we find them. This is nice because you can process the files and get output as it chugs along. Either of these works:

# find www1*/apache2/fordodone.com/201*/*/*/ -name '*error*.log.gz' -exec zgrep string {} \;

# find www1*/apache2/fordodone.com/201*/*/*/ -name '*error*.log.gz' -exec sh -c 'zgrep string "$0"' {} \;

In my head it sounds something like this: “find the files in the matching directories, that are named like ‘*error*.log.gz’, and as you find them, execute a command on them. The command is a new shell command to zgrep for the string in the file you just found.”

The first one works fine, BUT if you need to pipe your zgrep or whatever to some other command you need to execute a sub shell for that.

## do sed substitution after
-exec sh -c 'zgrep string "$0" | sed -e "s/A/B/g"' {} \;

## read backwards and find first (aka last) occurrence
-exec sh -c 'zcat "$0" | tac | grep -m1 string' {} \;

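Always use single quotes for the subshell command given to sh -c, because you don’t want the current shell to interpret it; the $0 must be passed as a literal so that the subshell can interpret it. The $0 in the subshell refers to the FIRST argument it is passed, which in this case is {}, or the file that find has currently found. Inside the subshell command, write it in double quotes ("$0") so that a file name containing spaces reaches grep or zgrep as a single argument. A single quote cannot be escaped inside a single-quoted string, so use double quotes for anything nested in the command, such as a sed expression.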
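
The dispatch-by-extension search and the quoting rule described above, combined in one script; this is an added example, not code from the original post. It goes beyond the post by batching files with {} + and passing them as "$@" behind a placeholder $0, and the names search_logs, summarize and make_sample_logs, along with the sample log lines, are constructed for illustration.

```shell
#!/bin/sh
# Added example. search_logs, summarize and make_sample_logs are hypothetical
# names; the log contents below are constructed sample data.

# Print lines containing the fixed string $2 from *.log and *.log.gz under $1.
search_logs() {
    # "sh" becomes $0 of the inner shell, the search string its $1, and the
    # file names follow as "$@", so none of them is ever parsed as shell code.
    # {} + hands many files to one shell instead of starting one per file.
    find "$1" -type f \( -name '*.log' -o -name '*.log.gz' \) -exec sh -c '
        pat=$1; shift
        for f in "$@"; do
            case $f in
                *.log.gz) gzip -dc "$f" | grep -F -e "$pat" ;;  # same job as zgrep
                *.log)    grep -F -e "$pat" "$f" ;;
            esac
        done
        exit 0  # "no match" in the last file is not a failure
    ' sh "$2" {} +
}

# Count typea/typeb/other lines as the awk in the post does, but survive zero matches.
summarize() {
    awk '/typea/ {a++} /typeb/ {b++} {tot++}
         END {
             if (tot == 0) { print "total: 0"; exit }
             printf "typea: %d - %.1f%%\n", a, a*100/tot
             printf "typeb: %d - %.1f%%\n", b, b*100/tot
             printf "typec: %d - %.1f%%\n", tot-(a+b), (tot-(a+b))*100/tot
             print "total: " tot
         }'
}

# Build a throwaway directory: a plain log, a gzipped log with a space in its
# name, and a non-log file that must be ignored. Prints the directory path.
make_sample_logs() {
    d=$(mktemp -d) || return 1
    printf '%s\n' 'typea foobar 1' 'typeb foobar 2' 'noise' > "$d/foo.log"
    printf '%s\n' 'typea foobar 3' 'other foobar 4' | gzip -c > "$d/old foo.log.gz"
    printf '%s\n' 'typea foobar 5' > "$d/notes.txt"
    printf '%s\n' "$d"
}

dir=$(make_sample_logs)
search_logs "$dir" foobar | summarize
rm -rf "$dir"
```

Because the search string travels as an argument and is matched with grep -F, it is treated as literal text in both the shell and grep.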