If you have to search 62,000 log files for a specific string, what’s the best way to do it? This will not work:
# zgrep string www1*/apache2/fordodone.com/201*/*/*/*error*.log.gz
Because the shell will expand that glob into a list of 62,000 file names, there will be too many arguments for zgrep to process; the kernel rejects the command with “Argument list too long.”
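On a typical Linux box the limit is a couple of megabytes of argument space. You can check it with getconf; the session below is representative (the exact number varies by system):
# getconf ARG_MAX
2097152
# zgrep string www1*/apache2/fordodone.com/201*/*/*/*error*.log.gz
bash: /usr/bin/zgrep: Argument list too long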
Instead, use find to build the list of logfiles. You could redirect the list to a file, then run a loop over each entry (a sketch of that approach appears after the find examples below), but we can just use -exec with find to run commands on the log files as we find them. This is nice, because you can process the files and see output as it chugs along. Either of these works:
# find www1*/apache2/fordodone.com/201*/*/*/ -name '*error*.log.gz' -exec zgrep string {} \;
# find www1*/apache2/fordodone.com/201*/*/*/ -name '*error*.log.gz' -exec sh -c 'zgrep string "$0"' {} \;
In my head it sounds something like this: “find the files in the matching directories that are named like ‘*error*.log.gz’, and as you find them, execute a command on them. The command is a new shell command to zgrep for the string in the file you just found.”
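For comparison, the two-step approach mentioned above would look something like this. It’s a minimal sketch: logfile_list.txt is just an illustrative name, and a while read loop handles file names more safely than a for loop over the file’s contents:
## build the list once, then loop over each entry
# find www1*/apache2/fordodone.com/201*/*/*/ -name '*error*.log.gz' > logfile_list.txt
# while read -r f; do zgrep string "$f"; done < logfile_list.txt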
The first find one-liner works fine, BUT if you need to pipe your zgrep (or whatever) to some other command, you need to execute a subshell for that.
## do sed substitution after
-exec sh -c 'zgrep string "$0" | sed -e "s/A/B/g"' {} \;
## read backwards and find first (aka last) occurrence
-exec sh -c 'zcat "$0" | tac | grep -m1 string' {} \;
Always use single quotes around the subshell command you give to sh -c, because you don’t want the current shell to interpret it; you want $0 passed as a literal so that the subshell can interpret it. The $0 in the subshell refers to the FIRST argument it is passed after the command string, which in this case is {}, the file that find has currently found.
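A quick way to convince yourself of that $0 behavior (hello here is just a placeholder argument):
# sh -c 'echo $0' hello
hello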