Connect to MongoDB Atlas Replica Set via SSH Tunnels

Using an ssh tunnel to connect to a MongoDB Atlas hosted replica set is not straight forward. There are multiple members of the replica set, access restrictions based on IP, and no SSH access into the Mongo instances themselves. Here’s an example project to use a docker container to connect to a remote, hosted MongoDB replica set with a ssh tunnel through a bounce box with an authorized IP. You’re kidding right? Find the working project here.

list contents of all docker volumes

To list the contents of a docker named volume, run a temporary container and mount the volume into the container, then do a directory listing. Loop over all the volumes to see what each one holds.

~$ for i in `docker volume ls -q`; do echo "volume: ${i}"; docker run --rm -it -v ${i}:/vol alpine:latest ls /vol; echo; done;
volume: 140f898b1c69b85585942aa7f25cf03eba6ac66125d4a122e2fe99455c4a1a3f

volume: 1fa7b49173076a3a1fdb07ea7ce65d7187ff80e8b1a56e2fa667ebbbc0543f3a

volume: 5564a11a1945567ffcc231145c01c806afe13a02b3e0a548f1504a1cd36c9374

volume: 6d0b313416430d2abc0c872b98fd4180bbda4d14560c0a5d98f534f33b792164
app                 ib_buffer_pool      private_key.pem
auto.cnf            ib_logfile0         public_key.pem
ca-key.pem          ib_logfile1         server-cert.pem
ca.pem              ibdata1             server-key.pem
client-cert.pem     mysql               sys
client-key.pem      performance_schema

volume: 8b08da4a38ca8f5924b90220db8c84384d90fe331a953a5aaa1a1944d826fc68

volume: 976239a3528b8b0b074b6b7438552e1d22c4f069cf20582d2250fcc1c068dc4f

volume: aac262a93155286f4c551271d8d2a70f81ed1ca4cd56925e94006248e458895e

volume: b385ebee063b72350fbc1158788cbb43d7da9b37ec95196b74caa6b22b1c115b

volume: c73f2001f829e5574bd4246b2ab7a261a3f4d9a7ef89997765d7bf43883e5c24

volume: ce73f9f85c475b1fd9cf4fede20fd04250ee7702e83db67c29c7118055275c28

volume: foovolume1

volume: efc8a8855ac2c13d83c23573aebfd53e15072ec68d23e2793262f662ea0ae308

volume: foovolume2
auto.cnf                     ibdata1
ca-key.pem                   mysql
ca.pem                       performance_schema
client-cert.pem              private_key.pem
client-key.pem               public_key.pem
ib_buffer_pool               server-key.pem
ib_logfile0                  sys

volume: foovolume3

volume: phpsockettest

volume: foovolume4
auto.cnf            ib_logfile0         public_key.pem
ca-key.pem          ib_logfile1         server-cert.pem
ca.pem              ibdata1             server-key.pem
client-cert.pem     mysql               sys
client-key.pem      performance_schema  
ib_buffer_pool      private_key.pem

asdf   asdf1  asdf2

This installation has some test files, backing files from a few different mysql databases, a unix socket, redis files, and empty volumes.

connect to ssh tunnel on Mac host from inside docker container

In some very edge use cases you may want to temporarily connect to a remote service on a different network from within a docker container. In this example I needed to provide a solution to connect from a docker container on a Mac laptop to a database hosted in a subnet far far away. Here’s a proof of concept.

setup the ssh tunnel from the Mac host:

ssh -L 56789: fordodone@

This starts an ssh tunnel with the TCP port 56789 open on localhost, forwarding through the ssh tunnel to port 3306 on a host with IP address on the remote network

start a container:

docker run --rm -it alpine:latest sh

This runs an interactive shell in a temporary docker container from the alpine:latest image

inside the container add mysql client for testing:

/ # apk add --update --no-cache mysql-client

This fetches a temporary package list and installs mysql-client

create a database connection:

/ # mysql -h docker.for.mac.localhost --port 56789 -udbuser -p

Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 675073323
Server version: 5.6.27-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> 
MySQL [(none)]> show databases;
| Database           |
| information_schema |
| awesome_app        |
| innodb             |
| mysql              |
| performance_schema |
5 rows in set (0.08 sec)

MySQL [(none)]> 

The magic here comes from the special docker.for.mac.localhost hostname. The internal Docker for Mac DNS resolver uses this special entry to return the internal IP address used by the Mac host. Tell the mysql client to use the port-forwarded TCP port --port 56789 and the mysql client inside the container connects through the ssh tunnel to the remote database.

using docker-compose to prototype against different databases

Greenfield projects come along with the huge benefit of not having any existing or legacy code or infrastructure to navigate when trying to design an application. In some ways having a greenfield app land in your lap is the thing a developer’s dreams are made of. Along with the amazing opportunity that comes with a “start from scratch” project, comes a higher level of creative burden. The goals of the final product dictate the software architecture, and in turn the systems infrastructure, both of which have yet to be conceived.

Many times this question (or one similarly themed) arises:

“What database is right for my application?”

Often there is a clear and straightforward answer to the question, but in some cases a savvy software architect might wish to prototype against various types of persistent data stores.

This docker-compose.yml has a node.js container and four data store containers to play around with: MySQL, PostgreSQL, DynamoDB, and MongoDB. They can be run simultaneously, or one at a time, making it perfect for testing these technologies locally during the beginnings of the application software architecture. The final version of your application infrastructure is still a long ways off, but at least it will be easy to test drive different solutions at the outset of the project.

version: '2'
    container_name: my-api-node
    image: node:latest
      - ./:/app/
      - '3000:3000'

    container_name: my-api-mysql
    image: mysql:5.7
    #image: mysql:5.6
      MYSQL_ROOT_PASSWORD: secretpassword
      MYSQL_USER: my-api-node-local
      MYSQL_PASSWORD: secretpassword
      MYSQL_DATABASE: my_api_local
      - my-api-mysql-data:/var/lib/mysql/
      - '3306:3306'

    container_name: my-api-pgsql
    image: postgres:9.6
      POSTGRES_USER: my-api-node-local-pgsqltest
      POSTGRES_PASSWORD: secretpassword
      POSTGRES_DB: my_api_local_pgsqltest
      - my-api-pgsql-data:/var/lib/postgresql/data/
      - '5432:5432'

    container_name: my-api-dynamodb
    image: dwmkerr/dynamodb:latest
      - my-api-dynamodb-data:/data
    command: -sharedDb
      - '8000:8000'

    container_name: my-api-mongo
    image: mongo:3.4
      - my-api-mongo-data:/data/db
      - '27017:27017'


I love Docker. I use Docker a lot. And like any tool, you can do really stupid things with it. A great piece of advice comes to mind when writing a docker-compose project like this one:

“Just because you can, doesn’t mean you should.”

This statement elicits strong emotions from both halves of a syadmin brain. The first shudders at the painful thought of running multiple databases for an application (local or otherwise), and the other shouts “Hold my beer!” Which half will you listen to today?

ssh deploy key for continuous delivery

One pattern I see over and over again when looking at continuous delivery pipelines, is the use of an ssh client and a private key to connect to a remote ssh endpoint. Triggering scripts, restarting services, or moving files around could all be part of your deployment process. Keeping a private ssh key “secured” is critical to limiting authorized access to your resources accessible by ssh. Whether you use your own in-house application (read “unreliable mess of shell scripts”), Travis CI, Bitbucket Pipelines, or some other CD solution, you may find yourself wanting to store a ssh private key for use during deployment.

Bitbucket Pipelines already has a built-in way to store and provide ssh deploy keys, however, this is an example alternative to roll your own. The steps are pretty simple. We create an encrypted ssh private key, it’s corresponding public key, and a 64 character passphrase for the private key. The encrypted private key and public key get checked in to the repository, and the passphrase gets stored as a “secured” Bitbucket Pipelines variable. During build time, the private key gets decrypted into a file using the Bitbucket Pipelines passphrase variable. The ssh client can now use that key to connect to whatever resources you need it to.

set -e

if [ ! `which openssl` ] || [ ! `which ssh-keygen` ] || [ ! `which jq` ] || [ ! `which curl` ]
  echo "need ssh-keygen, openssl, jq, and curl to continue"

# generate random string of 64 characters
echo "generating random string for ssh deploy key passphrase..."
DEPLOY_KEY_PASSPHRASE=`< /dev/urandom LC_CTYPE=C tr -dc A-Za-z0-9#^%@ | head -c ${1:-64}`

# save passphrase in file to be used by openssl
echo "saving passphrase for use with openssl..."
echo -n "${DEPLOY_KEY_PASSPHRASE}" >passphrase.txt

# generate encrypted ssh rsa key using passphrase
echo "generating encrypted ssh private key with passphrase..."
openssl genrsa -out id_rsa_deploy.pem -passout file:passphrase.txt -aes256 4096
chmod 600 id_rsa_deploy.pem

# decrypt ssh rsa key using passphrase
echo "decrypting ssh private key with passphrase to temp file..."
openssl rsa -in id_rsa_deploy.pem -out id_rsa_deploy.tmp -passin file:passphrase.txt
chmod 600 id_rsa_deploy.tmp

# generate public ssh key for use on target deployment server
echo "generating public key from private key..."
ssh-keygen -y -f id_rsa_deploy.tmp >

# remove unencrypted ssh rsa key
echo "removing unencrypted temp file..."
rm id_rsa_deploy.tmp

# ask user for bitbucket credentials
echo -n "enter bitbucket username: "
echo -n "enter bitbucket password: "
read -s BBPASS

# bitbucket API doesn't have "UPSERT" capability for creating(if not exists) or updating(if exists) variables
# get variable if exists
echo "getting variable uuid if variable exists"
DEPLOY_KEY_PASSPHRASE_UUID=`curl -s --user ${BBUSER}:${BBPASS} -X GET -H "Content-Type: application/json" | jq -r '.values[]|select(.key=="DEPLOY_KEY_PASSPHRASE").uuid'`

  # create bitbucket pipeline variable
  echo "DEPLOY_KEY_PASSPHRASE variable does not exist... creating DEPLOY_KEY_PASSPHRASE"
  curl -s --user ${BBUSER}:${BBPASS} -X POST -H "Content-Type: application/json" -d '{"key":"DEPLOY_KEY_PASSPHRASE","value":"'"${DEPLOY_KEY_PASSPHRASE}"'","secured":"true"}'

  # update existing bitbucket pipeline variable by uuid
  # use --globoff to avoid curl interpreting curly braces in the variable uuid
  echo "DEPLOY_KEY_PASSPHRASE variable exists... updating DEPLOY_KEY_PASSPHRASE"
  curl --globoff -s --user ${BBUSER}:${BBPASS} -X PUT -H "Content-Type: application/json" -d '{"key":"DEPLOY_KEY_PASSPHRASE","value":"'"${DEPLOY_KEY_PASSPHRASE}"'","secured":"true","uuid":"'"${DEPLOY_KEY_PASSPHRASE_UUID}"'"}' "${DEPLOY_KEY_PASSPHRASE_UUID}"


# after passphrase is stored in bitbucket remove passphrase file
  echo "DEPLOY_KEY_PASSPHRASE successfully stored on bitbucket, removing passphrase file..."
rm passphrase.txt

echo "add, commit, and push encypted private key and corresponding public key, update ssh targets with new public key"
echo "  ->  git add id_rsa_deploy.pem && git commit -m 'deploy ssh key roll' && git push"

We use bitbucket username and password to authenticate the person running the script, they need access to insert the new ssh deploy key passphrase as a “secured” variable using the bitbucket API. The person running the script never sees the passphrase and doesn’t care what it is. This script can be run easily to update the key pair and passphrase. It’s easy and fast because when you need to roll a compromised key, you should never have to remember that damn openssl command that you have used your entire career, but somehow have never memorized.

Here’s how you could use the key in a Bitbucket Pipelines build container:

set -e

# store passphrase from BitBucket secure variable into file
# file is on /dev/shm tmpfs in memory (don't put secrets on disk)
echo "creating passphrase file from BitBucket secure variable DEPLOY_KEY_PASSPHRASE"
echo -n "${DEPLOY_KEY_PASSPHRASE}" >/dev/shm/passphrase.txt

# use passphrase to decrypt ssh key into tmp file (again in memory backed file system)
echo "writing decrypted ssh key to tmp file"
openssl rsa -in id_rsa_deploy.pem -out /dev/shm/id_rsa_deploy.tmp -passin file:/dev/shm/passphrase.txt
chmod 600 /dev/shm/id_rsa_deploy.tmp

# invoke ssh-agent to manage keys
echo "starting ssh-agent"
eval `ssh-agent -s`

# add ssh key to ssh-agent
echo "adding key to ssh-agent"
ssh-add /dev/shm/id_rsa_deploy.tmp

# remove tmp ssh key and passphrase now that the key is in ssh-agent
echo "cleaning up decrypted key and passphrase file"
rm /dev/shm/id_rsa_deploy.tmp /dev/shm/passphrase.txt

# get ssh host key
echo "getting host keys"
ssh-keyscan -H >> $HOME/.ssh/known_hosts

# test the key
echo "testing key"
ssh "uptime"

It uses a tmpfs memory backed file system to store the key and passphrase, and ssh-agent to add the key to the session. How secure is secured enough? Whether you use the built-in Pipelines ssh deploy key, or this method to roll your own and store a passphrase in a variable, or store the ssh key as a base64 encoded blob in a variable, or however you do it, you essentially have to trust the provider to keep your secrets secret.

There are some changes you could make to all of this, but it’s good boilerplate. Other things to think about:

    rewrite this in python and do automated key rolls once a day with Lambda, storing the dedicated bitbucket user/pass and git key in KMS.
    do you really need ssh-agent?
    you could turn off strict host key checking instead of using ssh-keyscan
    could this be useful for x509 TLS certs?

make Makefile target for help or usage options

Using make and Makefiles with a docker based application development strategy are a great way to track shortcuts and allow team members to easily run common docker or application tasks without having to remember the syntax specifics. Without a “default” target make will attempt to run the first target (the default goal). This may be desirable in some cases, but I find it useful to have make just print out a usage, and require the operator to specify the exact target they need.

DE=docker-compose exec app

.PHONY: help
  @sh -c "echo ; echo 'usage: make <target> ' ; cat Makefile | grep ^[a-z] | sed -e 's/^/            /' -e 's/://' -e 's/help/help (this message)/'; echo"

  $(DC) up -d

  $(DC) stop

  $(DC) rm -v

  $(DC) ps

  $(DC) logs

  $(DE) sh -c "vendor/bin/phpunit"

Now without any arguments make outputs a nice little usage message:

$ make 

usage: make <target> 
            help (this message) 

This assumes a bunch of things like you must be calling make from the correct directory, but is a good working proof of concept.

use tmpfs for docker images

For i/o intensive Docker builds, you may want to configure Docker to use memory backed storage for images and containers. Ephemeral storage has several applications, but in this case our Docker engine is on a temporary EC2 spot instance and participating in a continuous delivery pipeline. In other words, it’s ok to loose the instance and all of the Docker images it has on it. This is for a systemd based system, in this case Ubuntu 16.04.

Create the tmpfs, then reconfigure the Docker systemd unit to use it:

mkdir /mnt/docker-tmp
mount -t tmpfs -o size=25G tmpfs /mnt/docker-tmp
sed -i 's|/mnt/docker|/mnt/docker-tmp|' /etc/systemd/system/docker.service.d/docker-startup.conf
systemctl daemon-reload
systemctl restart docker

This could be part of a bootstrapping script for build instances, or more effectively translated into config management or rolled into an AMI.

percentile apache server request response times

I needed a hack to quickly find the 95th percentile of apache request response times. For example I needed to be able to say that “95% of our apache requests are served in X milliseconds or less.” In the apache2 config the LogFormat directive had %D (the time taken to serve the request, in microseconds) as the last field. Meaning the last field of each log line would be the time it took to serve the request. This would make it easy to pull out with $NF in awk

# PCT=.95; NR=`cat access.log | wc -l `; cat /var/log/apache2/access.log | awk '{print $NF}' | sort -rn | tail -n+$(echo "$NR-($NR*$PCT)" |bc | cut -d. -f1) |head -1

In this case 95% of the apache requests were served in 938 milliseconds or less (WTF?!). Then run on an aggregated group of logs, or change the date/time range to just run for logs on a particular day, or for multiple time periods.

Note: I couldn’t get scale to work here in bc for some reason.

wget use gzip header to received compressed output

This test endpoint returns Content-Type: application/json

Without gzip enabled header:

$ wget -qO test https://testendpoint
$ file test
test: ASCII text, with very long lines, with no line terminators
$ du -b test
7307    test

Setting the gzip enabled header:

$ wget --header="accept-encoding: gzip" -qO test.gz https://testendpoint
$ file test.gz
test.gz: gzip compressed data, from Unix
$ du -b test.gz
1694    test.gz

Telling the server that wget can accept gzip compressed content results in 77% reduction in bytes transferred.