using docker-compose to prototype against different databases

Greenfield projects come with the huge benefit of having no existing or legacy code or infrastructure to navigate when designing an application. In some ways, having a greenfield app land in your lap is the stuff a developer’s dreams are made of. But along with the opportunity of a “start from scratch” project comes a higher creative burden: the goals of the final product dictate the software architecture, and in turn the systems infrastructure, both of which have yet to be conceived.

Many times this question (or one similarly themed) arises:

“What database is right for my application?”

Often there is a clear and straightforward answer to the question, but in some cases a savvy software architect might wish to prototype against various types of persistent data stores.

This docker-compose.yml has a Node.js container and four data store containers to play around with: MySQL, PostgreSQL, DynamoDB (via DynamoDB Local), and MongoDB. They can be run simultaneously, or one at a time, making it handy for testing these technologies locally while the application architecture is still taking shape. The final version of your application infrastructure is still a long way off, but at least it will be easy to test drive different solutions at the outset of the project.

version: '2'
services:
  my-api-node:
    container_name: my-api-node
    image: node:latest
    volumes:
      - ./:/app/
    ports:
      - '3000:3000'

  my-api-mysql:
    container_name: my-api-mysql
    image: mysql:5.7
    #image: mysql:5.6
    environment:
      MYSQL_ROOT_PASSWORD: secretpassword
      MYSQL_USER: my-api-node-local
      MYSQL_PASSWORD: secretpassword
      MYSQL_DATABASE: my_api_local
    volumes:
      - my-api-mysql-data:/var/lib/mysql/
    ports:
      - '3306:3306'

  my-api-pgsql:
    container_name: my-api-pgsql
    image: postgres:9.6
    environment:
      POSTGRES_USER: my-api-node-local-pgsqltest
      POSTGRES_PASSWORD: secretpassword
      POSTGRES_DB: my_api_local_pgsqltest
    volumes:
      - my-api-pgsql-data:/var/lib/postgresql/data/
    ports:
      - '5432:5432'

  my-api-dynamodb:
    container_name: my-api-dynamodb
    image: dwmkerr/dynamodb:latest
    volumes:
      - my-api-dynamodb-data:/data
    command: -sharedDb
    ports:
      - '8000:8000'

  my-api-mongo:
    container_name: my-api-mongo
    image: mongo:3.4
    volumes:
      - my-api-mongo-data:/data/db
    ports:
      - '27017:27017'

volumes:
  my-api-mysql-data:
  my-api-pgsql-data:
  my-api-dynamodb-data:
  my-api-mongo-data:
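
Assuming the file above is saved as docker-compose.yml, spinning things up looks something like this (the service names come from the file; having the psql client installed on the host is an assumption):

$ docker-compose up -d                        # start everything
$ docker-compose up -d my-api-pgsql           # or bring up just one data store
$ docker-compose run --rm my-api-node bash    # shell into the node container to poke at your app
$ psql -h 127.0.0.1 -U my-api-node-local-pgsqltest my_api_local_pgsqltest   # connect from the host via the published port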

I love Docker. I use Docker a lot. And like any tool, you can do really stupid things with it. A great piece of advice comes to mind when writing a docker-compose project like this one:

“Just because you can, doesn’t mean you should.”

This statement elicits strong emotions from both halves of a sysadmin’s brain. One half shudders at the painful thought of running multiple databases for an application (local or otherwise), and the other shouts “Hold my beer!” Which half will you listen to today?

list tables in mysql dump

$ zgrep -o '^CREATE.*' database_backup.sql.gz

CREATE TABLE `aa_migrations` (
CREATE TABLE `abcdefg_bar` (
CREATE TABLE `abcdefg_foo` (
CREATE TABLE `abcdefg_images` (
CREATE TABLE `abcdefg_table12` (
CREATE TABLE `abcdefg_table13` (
CREATE TABLE `abcdefg_table14` (
CREATE TABLE `abcdefg_table15` (
CREATE TABLE `abcdefg_users` (

You could also just count them:

$ zgrep -o '^CREATE.*' database_backup.sql.gz | wc -l
9
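
If you only want the table names without the CREATE TABLE boilerplate, a small sed variant does the trick (it assumes the backtick-quoted names shown in the output above):

$ zcat database_backup.sql.gz | sed -n 's/^CREATE TABLE `\([^`]*\)`.*/\1/p'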

tcpdump mysql queries

If you have access to the MySQL server and query logging is turned on, then you can read the queries straight from the logs. Many production databases do not have logging turned on, simply because there are too many queries to handle, and with hundreds of servers hitting the database at any given time it is hard to pick out the activity of a particular client. To take a look at MySQL queries as they leave a webserver, you can use tcpdump and massage the output to see what queries are being sent from that host.

# tcpdump -i eth0 -l -s 0 -w - dst port 3306 | stdbuf -o0 strings | stdbuf -o0 grep "SELECT\|INSERT\|UPDATE\|FROM\|WHERE\|ORDER\|AND\|LIMIT\|SET\|COMMIT\|ROLLBACK"

Sometimes the query gets broken up into pieces if WHERE or LIMIT is used, and those pieces wind up on separate lines, so we grep for the individual keywords as well. stdbuf forces each stage of the pipeline NOT to buffer its output, i.e. to print in pseudo real time.
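
The same approach works on the database server itself when you only care about one client; just narrow the capture filter to that host (10.0.0.5 is a made-up example IP):

# tcpdump -i eth0 -l -s 0 -w - 'dst port 3306 and src host 10.0.0.5' | stdbuf -o0 strings | stdbuf -o0 grep "SELECT\|INSERT\|UPDATE\|FROM\|WHERE\|ORDER\|AND\|LIMIT\|SET\|COMMIT\|ROLLBACK"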

count rows in mysqldump

Get a rough estimate of the rows in a single table that has been dumped across multiple mysqldump files. mysqldump writes extended INSERT statements with rows separated by “),(”, so splitting on that separator yields roughly one line per row:

# cat table1.*.sql |sed -e 's/),(/\n/g' | wc -l
4699008

Here’s from the actual table:

mysql> select count(id) from table;
+-----------+
| count(id) |
+-----------+
|   4692064 |
+-----------+
1 row in set (0.00 sec)

Counting the row separators in the mysqldump is a good rough estimate. In this case it’s off by about 0.1%, due to pattern interpretation and the database shifting between the dump and the count.

get live config values on MySQL Cluster ndbcluster

Sometimes it’s necessary to double-check the configuration values actually being used by the live ndbd/ndbmtd processes.

Some operational configs for one node:

# ndb_config -r'\n' -f: --nodeid=12 -q id,MaxNoOfConcurrentIndexOperations,MaxNoOfFiredTriggers --config-from-node=12 
12:81920:40000

Memory allocations on data (ndbd) nodes:

# ndb_config -r'\n' -f: --type ndbd -q id,DataMemory,IndexMemory --config-from-node=12
11:51539607552:4294967296
12:51539607552:4294967296
13:51539607552:4294967296
14:51539607552:4294967296
15:51539607552:4294967296
16:51539607552:4294967296
17:51539607552:4294967296
18:51539607552:4294967296
19:51539607552:4294967296
20:51539607552:4294967296
21:51539607552:4294967296
22:51539607552:4294967296
23:51539607552:4294967296
24:51539607552:4294967296
25:51539607552:4294967296
26:51539607552:4294967296

Due to config caching, pending restarts, or a mismatch between management server (ndb_mgmd) nodes, the values in config.ini might be different from what is actually in use by the ndbcluster processes.
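
One way to spot that kind of drift is to pull the same parameters twice: without --config-from-node, ndb_config asks the management server for its view of the configuration; with it, the values come from the running data node itself (node id 12 is the same example node as above). If the two disagree, a restart or config push is likely still pending:

# ndb_config -r'\n' -f: --type ndbd -q id,DataMemory,IndexMemory
# ndb_config -r'\n' -f: --type ndbd -q id,DataMemory,IndexMemory --config-from-node=12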

compare mysql datetime timestamp days ago

To get all entries in a table with a “datetime” field newer than 3 days ago (say, users who last logged in within the past few days):

mysql> select id,timestamp_field from sometable where timestamp_field > ( CURDATE() - INTERVAL 3 DAY );

Note that you could just use a “WHERE … LIKE” match if you were looking for just today, just yesterday, or a particular month. Also note that CURDATE() is midnight today, so the query above really means “since midnight three days ago”; for a strict “right now minus 72 hours”, compare against NOW() - INTERVAL 72 HOUR instead.
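
For example, both of these work (same placeholder table and column names as above):

mysql> select id,timestamp_field from sometable where timestamp_field like concat(curdate(), '%');
mysql> select id,timestamp_field from sometable where timestamp_field > ( NOW() - INTERVAL 72 HOUR );

The first matches anything stamped today; the second is exactly the last 72 hours.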

There’s a multitude of ways to compare timestamps from a point in time (i.e. now()).

Here’s older than 3 months:

 timestamp_field < ( CURDATE() - INTERVAL 3 MONTH )
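
And something in between, say older than 3 days but newer than 3 months (same placeholder column as above):

 timestamp_field BETWEEN ( CURDATE() - INTERVAL 3 MONTH ) AND ( CURDATE() - INTERVAL 3 DAY )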