map CPU IDs to threads, cores, and sockets with hyperthreading

The Linux OS uses a predictable method to map hyperthread-enabled CPU threads to CPU IDs. Here's a quick way to map them if you need a reminder:

# egrep 'processor|core id|physical id' /proc/cpuinfo | cut -d : -f 2 | paste - - - | awk '{print "CPU"$1"\tsocket "$2" core "$3}'
CPU0    socket 0 core 0
CPU1    socket 0 core 1
CPU2    socket 0 core 2
CPU3    socket 0 core 3
CPU4    socket 1 core 0
CPU5    socket 1 core 1
CPU6    socket 1 core 2
CPU7    socket 1 core 3
CPU8    socket 0 core 0
CPU9    socket 0 core 1
CPU10   socket 0 core 2
CPU11   socket 0 core 3
CPU12   socket 1 core 0
CPU13   socket 1 core 1
CPU14   socket 1 core 2
CPU15   socket 1 core 3

This is a dual quad-core system with hyperthreading enabled: 2 physical CPUs with 4 cores each and 2 threads per core, so the OS sees 16 CPUs. CPU0 and CPU8 are the 2 threads on the first core (core 0) of the first physical processor (socket 0).
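
If lscpu from util-linux is available (not a given on older systems), its extended output will print the same mapping directly, no awk required:

# lscpu --extended=CPU,SOCKET,CORE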

insert characters into string with sed

The two capture groups grab the first four and the next two characters (the year and the month), and the replacement reinserts them with slashes appended:

# echo 20150429 | sed -e 's/\(.\{4\}\)\(.\{2\}\)/\1\/\2\//'
2015/04/29
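
As a side note, if the string really is a date and you have GNU date handy, it can do the same reformatting (this assumes the input is a valid YYYYMMDD date):

# date -d 20150429 +%Y/%m/%d
2015/04/29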

Start with a single flat directory with thousands of log files…

# ls | head -5
db301.20140216.log.gz
db301.20140217.log.gz
db301.20140218.log.gz
db301.20140219.log.gz
db301.20140220.log.gz

Now move the timestamped files into a directory tree sorted by day:

# for i in *; do j=$(echo "$i" | cut -d . -f 2 | sed -e 's/\(.\{4\}\)\(.\{2\}\)/\1\/\2\//'); mkdir -p "$j" && mv "$i" "$j"; done

check your work

# find . -type f | head -5
./2014/02/16/db301.20140216.log.gz
./2014/02/17/db301.20140217.log.gz
./2014/02/18/db301.20140218.log.gz
./2014/02/19/db301.20140219.log.gz
./2014/02/20/db301.20140220.log.gz

forgot to rename files

# for i in $(find . -mindepth 3 -type d); do pushd "$i"; for j in *; do k=$(echo "$j" | sed -e 's/\(\.[0-9]\{8\}\)//'); mv "$j" "$k"; done; popd; done
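
The sed in the inner loop can also be swapped for bash parameter expansion, which avoids a subshell per file. A minimal sketch of the inner loop, assuming every file in the directory carries a .YYYYMMDD stamp:

# for j in *; do mv "$j" "${j/.[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/}"; done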

check your work

# find . -type f | head -5
./2014/02/16/db301.log.gz
./2014/02/17/db301.log.gz
./2014/02/18/db301.log.gz
./2014/02/19/db301.log.gz
./2014/02/20/db301.log.gz

Using find to act on files is very useful, but it gets a bit trickier when the matched files need different actions based on their filetype. For example, some log files are plain foo.log, but after 10 days they get compressed to foo.log.gz, so you are finding regular text files as well as gzipped text files. Extend the find with an -exec and a bash shell that checks the file extension and runs the appropriate grep or zgrep. Then pipe it through awk or whatever else to parse out what you need.

# find . -type f -name 'foo.log*' -exec bash -c 'if [[ $0 =~ \.log$ ]]; then grep foobar "$0"; elif [[ $0 =~ \.log\.gz$ ]]; then zgrep foobar "$0"; fi' {} \; | awk '{if(/typea/)a++; if(/typeb/)b++; tot++} END {print "typea: "a" - "a*100/tot"%"; print "typeb: "b" - "b*100/tot"%"; print "typec: "tot-(a+b)" - "(tot-(a+b))*100/tot"%"; print "total: "tot;}'
typea: 5301 - 67.4771%
typeb: 2539 - 32.3192%
typec: 16 - 0.203666%
total: 7856

find and search for string in gzipped and text logs

Find logs going back 3 weeks; if a log is gzipped use zgrep, if it is a regular text log use grep, and if it isn't a log at all do nothing; then search for the string in each found file:

# find /mnt/toaster1/logs/app_logs/application1/2014 -type f -mtime -21 -exec bash -c 'if [[ $0 == *.log ]]; then g=grep; elif [[ $0 == *.gz ]]; then g=zgrep; else g=:; fi; $g "foostring" "$0"' {} \;
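
The same extension dispatch reads a bit cleaner as a case statement, if you prefer that over the if/elif chain:

# find /mnt/toaster1/logs/app_logs/application1/2014 -type f -mtime -21 -exec bash -c 'case "$0" in *.log) grep "foostring" "$0";; *.gz) zgrep "foostring" "$0";; esac' {} \;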

use eval to run commands generated by awk

Here’s one way to generate a set of commands with awk, and then run them in a loop with eval.

# cat snippet
field1 /mnt/somedir/785/8785/948785 41 /mnt/somedir2/785/8785/948785 1 2
field1 /mnt/somedir/791/8791/948791 2 /mnt/somedir2/791/8791/948791 6 2
field1 /mnt/somedir/924/8924/948924 2 /mnt/somedir2/924/8924/948924 23 2
field1 /mnt/somedir/993/8993/948993 2 /mnt/somedir2/993/8993/948993 19876 2
field1 /mnt/somedir/3/9003/949003 8 /mnt/somedir2/3/9003/949003 273 2
field1 /mnt/somedir/70/9070/949070 341 /mnt/somedir2/70/9070/949070 6 2
field1 /mnt/somedir/517/4517/954517 2 /mnt/somedir2/517/4517/954517 14 2
field1 /mnt/somedir/699/4699/954699 210 /mnt/somedir2/699/4699/954699 1 2
field1 /mnt/somedir/726/4726/954726 1 /mnt/somedir2/726/4726/954726 6 2

Now use awk to turn that output into the commands you want (here run against the full file, left.compare.sorted), then use a loop and eval to run them.

# awk '{if($3>$5) print "rsync -a --ignore-existing "$2"/ "$4}' left.compare.sorted | while read -r i; do echo "$i"; eval "$i"; done
rsync -a --ignore-existing /mnt/somedir/70/9070/949070/ /mnt/somedir2/70/9070/949070
rsync -a --ignore-existing /mnt/somedir/699/4699/954699/ /mnt/somedir2/699/4699/954699
#
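
If you don't need to echo each command before it runs, one alternative sketch is to skip eval entirely and pipe the generated commands straight into a shell (pass -x to the shell if you still want a trace of each command):

# awk '{if($3>$5) print "rsync -a --ignore-existing "$2"/ "$4}' left.compare.sorted | sh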

find directories owned by root

Find the directories owned by root in a certain part of the tree (the owner is the fifth column of find -ls output, hence the awk test on $5):

# find . -depth -mindepth 1 -maxdepth 3 -type d -ls | awk '$5 ~ /root/ {print}'
  7930    0 drwxr-xr-x  12 root root      115 Oct 11 16:44 ./562
3805069    0 drwxr-xr-x   3 root root       20 Oct 11 16:44 ./562/8562
  7946    0 drwxr-xr-x   5 root root       46 Dec  8 23:52 ./563/6563
  7947    0 drwxr-xr-x   3 root root       21 Oct 21  2008 ./563/6563/456563
3464735    0 drwxr-xr-x   2 root root        6 Sep 26 17:29 ./563/6563/436563
4075144    4 drwxr-xr-x   2 root root     4096 Dec  9 00:39 ./563/6563/2366563

Change all the ownership to www-data:

# find . -depth -mindepth 1 -maxdepth 3 -type d -exec chown www-data: {} \;
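
Since chown accepts multiple arguments, terminating the -exec with + instead of \; batches many directories into each chown invocation, saving a fork per directory:

# find . -depth -mindepth 1 -maxdepth 3 -type d -exec chown www-data: {} +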

You could do this:

# cd .. && chown -R www-data: dirname

But we only suspect the problem at a certain level of the tree, and recursively chowning hundreds of millions of files would be way too slow.