Creating DE environment 4 - Postgres, MongoDB and DbGate
Introduction
Databases are the heart of data-related projects. They can also be dependencies of other tools, for example Authelia, which was configured previously. In this post I will configure two database management systems:
- PostgreSQL - a powerful, open-source object-relational database system,
- MongoDB - a document database management system used to build highly available and scalable internet applications.
As you may have noticed, PostgreSQL was already deployed in the post about Caddy and Authelia. This is the magic of containers: it is easy to run multiple instances of the same software. The database instances I deploy in this post will serve my hobby projects. They will store data for web apps and APIs, or simply hold data to be transformed in some way. I would rather not keep that data in the same place as sensitive data like credentials, and more software may depend on yet another PostgreSQL instance in the future anyway. I have used PostgreSQL and MongoDB a lot in the past, which is the main reason I consider them must-haves in my environment. Obviously, I won't rely on them forever, since tools should always be selected according to the needs. However, I am sure I will always need PostgreSQL and MongoDB in the stack, so it is good to own them in the "infrastructure as code" way. Deploying both databases with Docker is pretty straightforward, but doing it in a more secure way is less obvious, so that might be an interesting outcome of this post.
It is also useful to have a UI for managing the databases. A lot can be done with command-line tools, but database managers are usually easier and more effective for users who are not seasoned database administrators. Most database systems have dedicated tools: for example, PostgreSQL has pgAdmin and MongoDB has Mongo Express. However, I would like to manage my databases from one place, and there are not many open-source tools that can handle both relational and non-relational databases. Luckily, there is DbGate, an awesome open-source, cross-platform database manager. It also has a web version with a Docker image provided, which is perfect.
Both databases and DbGate will be deployed to the master node.
Let’s get to work!
DbGate
A quick tutorial for deploying DbGate as a container can be found on Docker Hub. The only changes I make to the default docker-compose file are:
- adding credentials for accessing the web app,
- exposing the port instead of mapping it,
- creating a Docker network dbgate-network, because the databases will also be deployed in containers.
The file looks like this:
# composes/dbgate/docker-compose.yaml
version: '3.8'
services:
  dbgate:
    image: dbgate/dbgate
    container_name: dbgate
    restart: always
    expose:
      - 3000
    volumes:
      - dbgate-data:/root/.dbgate
    networks:
      - dbgate-network
      - caddy-network   # so Caddy can reach dbgate:3000 through the reverse proxy
    environment:
      LOGINS: charizard_dbgate
      LOGIN_PASSWORD_charizard_dbgate: ${DBG_PASSWORD}
      CONNECTIONS: con1,con2
      LABEL_con1: Postgres
      SERVER_con1: postgres
      USER_con1: ${PG_USER}
      PASSWORD_con1: ${PG_PASSWORD}
      PORT_con1: 5432
      ENGINE_con1: postgres@dbgate-plugin-postgres
      LABEL_con2: MongoDB
      URL_con2: ${MONGO_AUTH}
      ENGINE_con2: mongo@dbgate-plugin-mongo
volumes:
  dbgate-data:
    driver: local
networks:
  dbgate-network:
    name: dbgate-network
  caddy-network:
    name: caddy-network
    external: true
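The compose file expects these variables to come from a .env file in the same directory (it is copied by the playbook below). A minimal sketch with made-up placeholder values; in particular, the exact format of MONGO_AUTH (a connection string pointing at the mongodb container and its root user) is my assumption and has to match your MongoDB setup:

# composes/dbgate/.env (placeholder values only)
DBG_PASSWORD=replace-with-a-strong-password
PG_USER=replace-with-the-postgres-user
PG_PASSWORD=replace-with-the-postgres-password
# assumed connection string format for the MongoDB root user defined later in this post
MONGO_AUTH=mongodb://replace-admin:replace-password@mongodb:27017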
And the Ansible playbook:
# deploy_dbg.yaml
---
- hosts: data_master_nodes
  vars:
    docker_compose_dir: /docker/dbgate/
  tasks:
    - name: Copy compose source template
      template:
        src: ./composes/dbgate/docker-compose.yaml
        dest: "{{ docker_compose_dir }}"
        mode: 0600
    - name: Copy .env
      template:
        src: ./composes/dbgate/.env
        dest: "{{ docker_compose_dir }}"
        mode: 0600
    - name: Build dbgate
      community.docker.docker_compose:
        project_src: "{{ docker_compose_dir }}"
      register: output
Now let’s execute it with:
ansible-playbook deploy_dbg.yaml -u charizard
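For context, the data_master_nodes group used in hosts comes from the Ansible inventory prepared earlier in the series; a purely hypothetical entry (the hostname and address are made up) could look like this:

# inventory example (hypothetical values)
[data_master_nodes]
master-node ansible_host=192.168.1.10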
The app will be deployed successfully, but it won't work properly yet, because PostgreSQL and MongoDB are not deployed and the port is not exposed to the outside. Whether everything works can be checked after deploying the databases; it only requires adding the DbGate container to the Caddyfile. The service has its own authentication, but putting it behind Authelia as well is a good idea.
dbmonitor.brozen.best {
    forward_auth authelia:9091 {
        uri /api/verify?rd=https://whoareyou.brozen.best/
        copy_headers Remote-User Remote-Groups Remote-Name Remote-Email
    }
    reverse_proxy dbgate:3000
}
Now it should be possible to log in to the web app with the configured credentials and see both the PostgreSQL and MongoDB connections working.
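If the connections do not show up, checking the DbGate container state and its logs on the master node is a good first step:

docker ps --filter name=dbgate
docker logs --tail 20 dbgate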
PostgreSQL
A PostgreSQL container can be deployed in a second with Docker. However, a few tricks help it run more securely and smoothly. The first is having strong credentials; here I again pass them through environment variables in my docker-compose file. The second is using persistent volumes. This is essential, because without them all of the data is lost when the container is removed or recreated. The image I use is the alpine variant, because it is the smallest and Alpine is also known as a secure distribution. I also use the health check feature of docker-compose, which checks whether the database is ready using the pg_isready command. Finally, I use expose instead of ports, because the database does not need to be accessible outside its assigned networks. The compose file looks like this:
# composes/pgsql/docker-compose.yaml
version: "3"
services:
  postgres:
    container_name: postgres
    image: postgres:alpine   # alpine variant, as mentioned above
    environment:
      - POSTGRES_USER=${PG_USER}
      - POSTGRES_PASSWORD=${PG_PASSWORD}
    volumes:
      - pg-data:/var/lib/postgresql/data
    networks:
      - postgres-network
      - dbgate-network
    expose:
      - 5432
    healthcheck:
      test:
        ["CMD", "pg_isready", "-q", "-d", "postgres", "-U", "${PG_USER}"]
      timeout: 45s
      interval: 10s
      retries: 10
    restart: unless-stopped
volumes:
  pg-data:
    name: pg-data
networks:
  postgres-network:
    name: postgres-network
  dbgate-network:
    name: dbgate-network
    external: true
The Ansible playbook looks like this:
# deploy_pg.yaml
---
- hosts: data_master_nodes
  vars:
    docker_compose_dir: /docker/pgsql/
  tasks:
    - name: Copy compose source template
      template:
        src: ./composes/pgsql/docker-compose.yaml
        dest: "{{ docker_compose_dir }}"
        mode: 0600
    - name: Copy .env
      template:
        src: ./composes/pgsql/.env
        dest: "{{ docker_compose_dir }}"
        mode: 0600
    - name: Build postgres
      community.docker.docker_compose:
        project_src: "{{ docker_compose_dir }}"
      register: output
It copies the files to the /docker/pgsql directory and runs docker-compose using the appropriate Ansible module. Now just run it with:
ansible-playbook deploy_pg.yaml -u charizard
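Once the playbook finishes, the healthcheck defined in the compose file can be used to verify that the database is ready, for example by inspecting the container health status on the master node:

# should print "healthy" once pg_isready succeeds
docker inspect --format '{{.State.Health.Status}}' postgres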
Unfortunately, running a PostgreSQL database in a container has its limits. Databases are stateful software, while Docker containers are better suited to stateless workloads. Killing a container in the middle of a transaction could end in tragedy, which is why containerized PostgreSQL is generally not recommended for production environments. Personally, I think it is good enough for hobby projects like mine.
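Given those concerns, making regular dumps is cheap insurance. A minimal sketch run on the master node, assuming the default image behaviour where local connections inside the container do not require a password; the backup path is just an example:

# load the credentials used by the compose file, then dump all databases to a dated file
source /docker/pgsql/.env
docker exec -t postgres pg_dumpall -U "$PG_USER" > /docker/pgsql/backup_$(date +%F).sql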
MongoDB
Building a MongoDB container is also fairly easy. I usually make just a few changes to the default docker-compose provided by the product creators. Obviously, the data volume should be defined to make the container more portable and to avoid data loss, and I again use expose instead of ports. The important thing to note is that authentication in MongoDB is disabled by default. To enable it, the root user environment variables and the --auth command-line option should be included in the compose file.
version: "3"
services:
mongodb:
image: mongo
container_name: mongodb
environment:
- MONGO_INITDB_ROOT_USERNAME=${MONGO_ADMIN}
- MONGO_INITDB_ROOT_PASSWORD=${MONGO_PASS}
volumes:
- mongo-data:/data/db
networks:
- mongodb-network
- dbgate-network
expose:
- 27017
restart: unless-stopped
command: [ --auth ]
volumes:
mongo-data:
name: mongo-data
networks:
mongodb-network:
name: mongodb-network
dbgate-network:
name: dbgate-network
external: true
The Ansible playbook looks like this:
# deploy_mongo.yaml
---
- hosts: data_master_nodes
  vars:
    docker_compose_dir: /docker/mongo/
  tasks:
    - name: Copy compose source template
      template:
        src: ./composes/mongo/docker-compose.yaml
        dest: "{{ docker_compose_dir }}"
        mode: 0600
    - name: Copy .env
      template:
        src: ./composes/mongo/.env
        dest: "{{ docker_compose_dir }}"
        mode: 0600
    - name: Build mongo
      community.docker.docker_compose:
        project_src: "{{ docker_compose_dir }}"
      register: output
And it can be executed as usual:
ansible-playbook deploy_mongo.yaml -u charizard
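After the playbook finishes, it is worth verifying that authentication is really enforced. A quick check from the master node; note that recent mongo images ship the mongosh shell (older ones only the legacy mongo binary), and the credentials are read from the same .env file used by the compose:

# without credentials this command should be rejected with an authentication error
docker exec mongodb mongosh --quiet --eval "db.adminCommand({ listDatabases: 1 })"

# with the root credentials it should list the databases
source /docker/mongo/.env
docker exec mongodb mongosh --quiet -u "$MONGO_ADMIN" -p "$MONGO_PASS" --authenticationDatabase admin --eval "db.adminCommand({ listDatabases: 1 })"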
A MongoDB container usually goes together with a mongo-express container. However, I do not use it, because I will have another administration interface. Mongo is a database too, so running a production instance of it in a Docker container is like playing with fire, but for my simple use case it is good enough in my opinion.
Conclusion
As you can see, deploying database software with Docker is not that complicated. However, keeping databases in containers is not the best option for production; it would be better to run them as properly managed stateful software. There is also a lot of different software for managing databases, and I think DbGate is a brilliant choice here, despite not being hugely popular. The databases are in place, so next let's configure the Prefect orchestration tool, which will use PostgreSQL. Cya in the next post from the series!
References
- https://earthly.dev/blog/postgres-docker/
- https://www.bmc.com/blogs/mongodb-docker-container/
- https://dev.to/ndaidong/how-to-make-your-mongodb-container-more-secure-1646
- https://dev.to/efe136/how-to-enable-mongodb-authentication-with-docker-compose-2nbp
- https://www.mongodb.com/docs/manual/reference/program/mongod/#storage-options