Creating DE environment 1 - general overview and Docker swarm

8 minute read

Introduction

Recently I have been quite an inactive blogger. I was busy working, learning new things, and migrating my server to a new cloud provider. I have a few ideas for blog projects, but most of them require more computational power to work properly. I could simply scale my VPS vertically, but I think scaling it horizontally is more elegant and interesting. Distributed systems and parallel processing are so important in the data engineering field nowadays that it is good to have some practical knowledge about them. I expect this will become a series of short posts, covering:

  1. General overview and setting up Docker swarm
  2. Setting up the Caddy web server and Authelia
  3. Setting up the Dask cluster
  4. Setting up MongoDB, PostgreSQL, and DbGate
  5. Setting up the Prefect orchestration tool

The clean infrastructure will also be crucial for my next projects and posts; I hope it will let me deploy things much faster in the future. I would also like to note that these posts are not tutorials. To be honest, I will use a lot of other tutorials and documentation from the web (of course I will add them to the references, as always). This series of posts is something like a diary for me. The two main tools I will use are Ansible - an open-source IT automation tool - and Docker - an open-source containerization tool. My whole infrastructure will be hosted on OVH. I chose Dask as my parallel computing library because it is more lightweight, which really matters on my cheap instances.

The goal is to build an architecture with 3 nodes:

  • The data-master-node - the swarm manager node for the Dask cluster. It will host containers with all the tools like databases, schedulers, or brokers, plus the Caddy web server with the authentication and authorization service Authelia, and it will act as the master node (scheduler) for Dask. Everything except the Dask services will be deployed as standalone containers.
  • The data-worker-node-1 and the data-worker-node-2 - just Dask workers, responsible for the computations only.

I chose Docker swarm as the deployment tool for my Dask cluster because it is relatively easy to set up. My infrastructure will also be humble, so I think using something like Kubernetes would be overkill.

Let’s get to work!

Initial configuration

Here I would like to show a few obvious steps I would like to perform on every machine. The list contains:

  • installing aptitude - a package manager with several useful features. It is optional, but aptitude makes working with Ansible noticeably easier,
  • setting up passwordless sudo,
  • creating a new user with sudo privileges - avoiding extensive use of the root user is a good practice,
  • setting an alternate SSH port - it is also a good safety practice; you can avoid some automated attack attempts simply by doing this,
  • adding the SSH key setup - using SSH keys instead of passwords is also a good practice,
  • disabling SSH access for the root user - also for safety reasons,
  • installing required packages - I only need ufw to open just the necessary ports, Python 3 with pip to install docker-compose (Ansible requires the pip version for some reason), and jsondiff, which is a dependency of the docker_stack module,
  • configuring the firewall.

The first thing to do is to specify the inventory in the hosts file. As you may have noticed, the SSH port of the machines will change during the initial configuration. That's why I usually create a group for_initial_config with all machines on port 22. This group is used just once. Then I create separate groups for different use cases, but with the alternate SSH port for every machine. The hosts file should look like this:

[for_initial_config]
data-master-node-1_init ansible_port=22 ansible_host=<ip>
data-worker-node-1_init ansible_port=22 ansible_host=<ip>
data-worker-node-2_init ansible_port=22 ansible_host=<ip>

[data_worker_nodes]
data-worker-node-1 ansible_port=1212 ansible_host=<ip>
data-worker-node-2 ansible_port=1212 ansible_host=<ip>

[data_master_nodes]
data-master-node-1 ansible_port=1212 ansible_host=<ip>

Now let’s create an initial_config.yaml playbook.

# initial_config.yaml
---
- hosts: for_initial_config
  become: true
  vars:
    created_username: charizard

  tasks:
    - name: Install aptitude
      apt:
        name: aptitude
        state: latest
        update_cache: true

    - name: Setup passwordless sudo
      lineinfile:
        path: /etc/sudoers
        state: present
        regexp: '^%sudo'
        line: '%sudo ALL=(ALL) NOPASSWD: ALL'
        validate: '/usr/sbin/visudo -cf %s'

    - name: Create a new regular user with sudo privileges
      user:
        name: "{{ created_username }}"
        state: present
        groups: sudo
        append: true
        create_home: true

    - name: Setup alternate SSH port
      lineinfile:
        dest: "/etc/ssh/sshd_config"
        regexp: "^Port"
        line: "Port 1212"
      notify: "Restart sshd"

    - name: Set authorized key for remote user
      ansible.posix.authorized_key:
        user: "{{ created_username }}"
        state: present
        key: "{{ lookup('file', lookup('env','HOME') + '/.ssh/id_rsa.pub') }}"

    - name: Disable ssh access for root user
      lineinfile:
        path: /etc/ssh/sshd_config
        state: present
        regexp: '^#?PermitRootLogin'
        line: 'PermitRootLogin no'

    - name: Update apt and install required system packages
      apt:
        pkg:
          - ufw
          - python3
          - python3-pip
          - jsondiff
        state: latest
        update_cache: true

    - name: UFW - Allow SSH connections
      community.general.ufw:
        rule: allow
        port: '1212'
        proto: tcp

    - name: UFW - Allow all access to tcp port 80
      community.general.ufw:
        rule: allow
        port: '80'
        proto: tcp

    - name: UFW - Allow all access to tcp port 443
      community.general.ufw:
        rule: allow
        port: '443'
        proto: tcp

    - name: UFW - Allow all access to tcp port 2377
      community.general.ufw:
        rule: allow
        port: '2377'
        proto: tcp

    - name: UFW - Allow all access to tcp port 7946
      community.general.ufw:
        rule: allow
        port: '7946'
        proto: tcp

    - name: UFW - Allow all access to udp port 7946
      community.general.ufw:
        rule: allow
        port: '7946'
        proto: udp

    - name: UFW - Allow all access to udp port 4789
      community.general.ufw:
        rule: allow
        port: '4789'
        proto: udp

    - name: UFW - Enable and deny by default
      community.general.ufw:
        state: enabled
        default: deny

  handlers:
    - name: Restart sshd
      service:
        name: sshd
        state: restarted

As you can see, the tasks are pretty straightforward. All the opened ports except HTTP/HTTPS and SSH are required for Docker swarm mode. The playbook can be executed with the following commands:

ansible-playbook initial_config.yaml -l data-master-node-1_init -u root -k
ansible-playbook initial_config.yaml -l data-worker-node-1_init -u root -k
ansible-playbook initial_config.yaml -l data-worker-node-2_init -u root -k

As you can see, I run the playbook separately for every host. Ansible is usually used with passwordless sudo, but this is the initial config in which I set up that very feature. The -k flag is quite important for the first run: it allows the user to log in with the SSH password. Running the playbook against multiple hosts at once is not that easy, because there is just one password prompt. It could also be done by specifying the passwords in the inventory file with the ansible_ssh_pass variable, or by using Ansible Vault. I have just 3 instances, so I can run it manually, which is the safest way in my opinion.
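For reference, the ansible_ssh_pass variant of the inventory mentioned above would look something like this (just a sketch; in a real setup the passwords should be encrypted with Ansible Vault rather than kept in plain text):

```ini
[for_initial_config]
data-master-node-1_init ansible_port=22 ansible_host=<ip> ansible_ssh_pass=<password>
data-worker-node-1_init ansible_port=22 ansible_host=<ip> ansible_ssh_pass=<password>
data-worker-node-2_init ansible_port=22 ansible_host=<ip> ansible_ssh_pass=<password>
```

With this variant the playbook can be run against the whole group at once, without the -k flag.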

Install docker

Installing Docker with Ansible is just like following the Docker installation guide. The one thing to note is installing docker-compose with pip - it is a somewhat strange Ansible requirement. I also create a Docker directory with read, write, and execute permissions for my user, just to keep all services' Docker files there.

# docker_install.yaml
---
- name: install Docker
  hosts: data_worker_nodes:data_master_nodes
  become: true
  tasks:
    - name: Install apt-transport-https
      ansible.builtin.apt:
        name:
          - apt-transport-https
          - ca-certificates
          - lsb-release
          - gnupg
        state: latest
        update_cache: true

    - name: Add signing key
      ansible.builtin.apt_key:
        url: "https://download.docker.com/linux/debian/gpg"
        state: present

    - name: Add repository into sources list
      ansible.builtin.apt_repository:
        repo: "deb https://download.docker.com/linux/debian bullseye stable"
        state: present
        filename: docker
    - name: Install Docker
      ansible.builtin.apt:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
        state: latest
        update_cache: true

    - name: Install python docker-compose sdk for ansible
      ansible.builtin.pip:
        name: docker-compose

    - name: Make sure Docker is active
      service:
        name: docker
        state: started
        enabled: yes

    - name: Add remote user to docker group
      user:
        name: charizard
        groups: "docker"
        append: yes

    - name: Create Docker directory
      file:
        path: /docker
        state: directory
        owner: charizard
        mode: '0766'

Let’s execute this playbook with a command:

ansible-playbook docker_install.yaml -u charizard

Creating the swarm

Docker swarm is a container orchestration tool. It helps with managing multiple containers deployed across multiple host instances. All of its features can be found here. Creating the swarm manually is not such a hard task, but doing it with Ansible is more convenient, especially if there are a lot of nodes. Luckily, Ansible has modules - docker_swarm and docker_node - that make the whole process fast and easy.

The first two tasks are initializing the swarm and setting the join_token_worker variable. The token is required to join other nodes to the swarm. These tasks will be executed on the master node machine, which will be the swarm manager.

# swarm_init.yaml
---
  - name: Init a new swarm with default parameters
    docker_swarm:
      state: present
    register: init_swarm

  - name: Set fact - join token worker
    set_fact:
      join_token_worker: "{{ init_swarm.swarm_facts.JoinTokens.Worker }}"

The second thing to do is to add the worker nodes. This task will run on the worker machines, but it requires two external variables from the master host:

  • the token we obtained previously,
  • the host IP address.

# swarm_join_workers.yaml
---
  - name: Add nodes
    docker_swarm:
      state: join
      join_token: "{{ hostvars[groups['data_master_nodes'][0]].join_token_worker }}"
      remote_addrs: "{{ hostvars[groups['data_master_nodes'][0]].ansible_host }}:2377"

The last thing is labeling the nodes, just to make working with docker-compose easier in the future. This task needs to be executed on the swarm manager machine.

# swarm_label_nodes.yaml
  - name: Give a label for a master
    docker_node:
      hostname: "{{ ansible_hostname }}"
      labels:
        role: master
      labels_state: merge

  - name: Give a label for a data worker 1
    docker_node:
      hostname: "{{ hostvars[groups['data_worker_nodes'][0]].ansible_hostname }}"
      labels:
        role: data-worker
      labels_state: merge

  - name: Give a label for a data worker 2
    docker_node:
      hostname: "{{ hostvars[groups['data_worker_nodes'][1]].ansible_hostname }}"
      labels:
        role: data-worker
      labels_state: merge
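Just to show what the labels are for: in a future compose file, a service can be pinned to the labeled nodes with a placement constraint. A minimal sketch (the service name and image are hypothetical, not part of this setup yet):

```yaml
# docker-compose.yaml (fragment, hypothetical service)
version: "3.8"
services:
  dask-worker:
    image: daskdev/dask:latest
    deploy:
      placement:
        constraints:
          # schedule replicas only on nodes labeled role=data-worker
          - node.labels.role == data-worker
```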

All tasks can be executed from a single playbook:

# swarm_deploy.yaml
---
- hosts: data_master_nodes
  become: true
  tasks:
    - include_tasks: swarm_init.yaml

- hosts: data_worker_nodes
  become: true
  tasks:
    - include_tasks: swarm_join_workers.yaml

- hosts: data_master_nodes
  become: true
  tasks:
    - include_tasks: swarm_label_nodes.yaml

Just run this command:

ansible-playbook swarm_deploy.yaml -u charizard

The swarm is running now. It can be verified by running the following command on the master node:

docker node ls
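If everything went well, all three nodes should be listed as Ready, with the manager marked as Leader. The output should look roughly like this (the IDs and engine versions are illustrative):

```
ID                          HOSTNAME             STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
abcd1234 *                  data-master-node-1   Ready    Active         Leader           20.10.12
efgh5678                    data-worker-node-1   Ready    Active                          20.10.12
ijkl9012                    data-worker-node-2   Ready    Active                          20.10.12
```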

Docker and UFW issue

Docker bypassing UFW rules has been a well-known problem for years. Every port published by Docker can be accessed from the outside, no matter whether you block it in UFW. The problem is serious - it defeats the whole purpose of UFW. Luckily, a not-so-elegant solution exists here; it requires modifying the UFW after.rules file. I won't do this manually, because there is already an awesome repo with a helpful utility script that also supports swarm mode. Let's install it with Ansible:

# ufw_docker.yaml
- hosts: data_master_nodes
  become: true

  tasks:
    - name: Download ufw-docker script
      ansible.builtin.get_url:
        url: https://github.com/chaifeng/ufw-docker/raw/master/ufw-docker
        dest: /usr/local/bin/ufw-docker
        mode: 'u+x'

    - name: Ufw-docker install
      ansible.builtin.command: ufw-docker install

    - name: Reload ufw
      community.general.ufw:
        state: reloaded

The script only needs to be installed on the manager node.

ansible-playbook ufw_docker.yaml -u charizard

Now the simpler firewall tasks can be accomplished with the ufw-docker commands, and the more complicated ones with ufw route allow.
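For example (the container and service names here are hypothetical), exposing a service to the outside world could look like this:

```shell
# Expose TCP port 80 of a standalone container named "caddy"
ufw-docker allow caddy 80/tcp

# The same for a swarm service, via the script's service subcommand
ufw-docker service allow caddy 80/tcp

# The raw UFW equivalent, for more complicated rules
ufw route allow proto tcp from any to any port 80
```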

Conclusion

I think there are a lot of ways to build a simple Docker swarm architecture. Mine is not the most professional one, but I think it is easy to understand and will serve its purpose. Working with Ansible is charming - it is an impressive tool and I think I will use it a lot in the future. Cya in the next post of the series!

References

  1. https://www.digitalocean.com/community/tutorials/how-to-use-ansible-to-automate-initial-server-setup-on-ubuntu-20-04
  2. https://towardsdatascience.com/diy-apache-spark-docker-bb4f11c10d24
  3. https://www.seelk.co/blog/docker-swarm-on-aws-with-ansible/
  4. https://www.howtogeek.com/devops/how-to-use-docker-with-a-ufw-firewall/
  5. https://blog.neuvector.com/article/docker-swarm-container-networking