Single-node dev cluster with the HashiCorp stack

2021-01-13

In this blog post I'll present a way to set up a single-node cluster with the help of HashiCorp's tools (Nomad and Consul) and Docker. This setup is intended for small teams and companies at an early stage of their development, i.e. teams with little or no Ops capacity.


TL;DR: this is a step-by-step tutorial for provisioning a single-node cluster with HashiCorp's Nomad and Consul, based on their official production deployment guides. You can find Ansible automation for it in my GitHub repository: https://github.com/vpenkoff/hashicorp-dev-cluser.


The presented cluster is similar to the one I'm managing at my current company, GO2 Markets GmbH, where we are a team of one.

The cluster gives you an environment to run containerized applications with monitoring, health checking and service discovery. You can scale easily by tweaking the applications' deployment jobs.
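For example, scaling a service up is usually just a matter of raising the count of its task group in the job file and re-submitting the job. A minimal sketch, assuming a hypothetical job file example.nomad with a task group named "cache" (both commands are covered later in this guide):

    # In example.nomad, raise the task group's instance count:
    #   group "cache" {
    #     count = 3   # was 1
    #   }
    $: nomad job plan example.nomad   # dry run: shows the scheduler's placement diff
    $: nomad job run example.nomad    # submit the updated job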

One of the advantages of HashiCorp's tools is their simplicity and the seamless integration between them; the tools follow the UNIX philosophy of doing one thing and doing it well. Another is that setting up a full Kubernetes installation for a production environment is complex and resource-intensive. Of course, there are plenty of managed Kubernetes distributions (RedHat's OpenShift, Amazon's EKS, Google's GKE, Linode's LKE and so on), but they are for companies that can afford them, or that don't want to spend time managing K8s themselves.

In contrast, Nomad and Consul are just single binaries, each of which can run as a server or a client. Moreover, they are far easier to operate, thanks to their well-designed CLI and HTTP interfaces.


CAUTION: this cluster is intended for development environments. It is NOT INTENDED FOR PRODUCTION ENVIRONMENTS.


So, let's get started with a list of steps that we need to perform:
  1. Provision a Linux host

    The easiest way to start is to run Debian, or a Debian-based Linux distribution, on some kind of virtualization. I'm running Debian Buster on KVM/QEMU with libvirt. Feel free to choose whatever is easiest for you: VirtualBox, Vagrant, or directly in the cloud (AWS, Digital Ocean, Google)...
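    For the KVM/QEMU route, a VM like mine can be created with virt-install. This is just a sketch; the VM name, sizes and the ISO path are placeholders to adapt:

    $: virt-install \
         --name hashistack-dev \
         --memory 4096 \
         --vcpus 2 \
         --disk size=20 \
         --os-variant debian10 \
         --cdrom /path/to/debian-10-netinst.iso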

  2. Installing Docker

    Following the official guide, let's first install the packages that allow apt to use a repository over HTTPS:

    $: sudo apt-get install \
         apt-transport-https \
         ca-certificates \
         curl \
         gnupg-agent \
         software-properties-common

    Add Docker's official GPG key:

    $: curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -

    Verify the apt key:

    $: sudo apt-key fingerprint 0EBFCD88

    Add the official Debian repo:

    $: sudo add-apt-repository \
         "deb [arch=amd64] https://download.docker.com/linux/debian \
         $(lsb_release -cs) \
         stable"

    Install the docker engine:

    $: sudo apt-get update
    $: sudo apt-get install docker-ce docker-ce-cli containerd.io
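    To confirm the installation works, the official guide suggests running the hello-world image:

    $: sudo docker run hello-world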
  3. Installing and setting up Consul

    As a reference, this is the deployment guide from HashiCorp. Let's download the binary:

    $: export CONSUL_VERSION="1.9.1"
    $: export CONSUL_URL="https://releases.hashicorp.com/consul"
    $: curl --silent --remote-name \
         ${CONSUL_URL}/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_linux_amd64.zip

    Unzip it, change the owner to root and move the binary to its bin location:

    $: unzip consul_${CONSUL_VERSION}_linux_amd64.zip
    $: sudo chown root:root consul
    $: sudo mv consul /usr/local/bin/

    Let's add a new system user, which will run Consul:

    $: sudo useradd --system --home /etc/consul.d --shell /bin/false consul

    We need to create the data dir for consul:

    $: sudo mkdir --parents /opt/consul
    $: sudo chown --recursive consul:consul /opt/consul

    With the following line we generate an encryption key to ensure the integrity of the communication between Consul agents. Store it somewhere safe, as we are going to need it if we want to connect more agents:

    $: consul keygen

    Now let's configure our agent. We need to create a new directory where the configurations are stored, usually in /etc.

    $: sudo mkdir --parents /etc/consul.d
    $: sudo touch /etc/consul.d/consul.hcl
    $: sudo chown --recursive consul:consul /etc/consul.d
    $: sudo chmod 640 /etc/consul.d/consul.hcl

    Now let's put the following configuration in /etc/consul.d/consul.hcl:

    datacenter = "dc1" data_dir = "/opt/consul" encrypt = "the encryption key mentioned above (the output from consul keygen)" retry_join = ["127.0.0.1"] bind_addr = "127.0.0.1"

    In a similar way, let's configure the server:

    $: sudo touch /etc/consul.d/server.hcl
    $: sudo chown --recursive consul:consul /etc/consul.d
    $: sudo chmod 640 /etc/consul.d/server.hcl

    Add the following configuration in /etc/consul.d/server.hcl:

    datacenter = "dc1" data_dir = "/opt/consul" server = true bootstrap_expect = 1 ui = true client_addr = "127.0.0.1"

    Now the last step is to add consul to systemd. First add the following configuration to /usr/lib/systemd/system/consul.service:

    [Unit] Description="HashiCorp Consul - A service mesh solution" Documentation=https://www.consul.io/ Requires=network-online.target After=network-online.target ConditionFileNotEmpty=/etc/consul.d/consul.hcl [Service] Type=notify User=consul Group=consul ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/ ExecReload=/bin/kill --signal HUP $MAINPID KillMode=process KillSignal=SIGTERM Restart=on-failure LimitNOFILE=65536 [Install] WantedBy=multi-user.target

    Then, let's make systemd register it as a service:

    $: sudo systemctl enable consul
    $: sudo systemctl start consul
    $: sudo systemctl status consul
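    If the service came up cleanly, the agent should report itself as the single alive server member of the cluster. Roughly like this (the node name below is a placeholder):

    $: consul members
    Node    Address         Status  Type    Build  Protocol  DC   Segment
    debian  127.0.0.1:8301  alive   server  1.9.1  2         dc1  <all>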
  4. Installing and configuring dnsmasq

    Let's install dnsmasq, so we can use it as a forwarding DNS proxy to Consul. This way requests between running services can be made by resolving their DNS names, e.g. app1.service.consul. The applications' DNS names are configured in the Nomad job templates and are registered in Consul at deployment time.

    $: sudo apt-get install dnsmasq

    Let's add a new configuration for Consul in the dnsmasq.d directory, /etc/dnsmasq.d/10-consul:

    # Enable forward lookup of the 'consul' domain:
    server=/consul/127.0.0.1#8600

    We need to restart dnsmasq, so that the new configuration takes effect:

    $: sudo systemctl restart dnsmasq
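    To verify the forwarding works, query dnsmasq for Consul's own service record (you may need to install dig first, e.g. via the dnsutils package):

    $: dig @127.0.0.1 consul.service.consul +short
    127.0.0.1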
  5. Installing and setting up Nomad

    Let's download and unzip Nomad's binary now:

    $: export NOMAD_VERSION="1.0.1"
    $: curl --silent --remote-name \
         https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip
    $: unzip nomad_${NOMAD_VERSION}_linux_amd64.zip

    We need to change its owner and put it in /usr/local/bin:

    $: sudo chown root:root nomad
    $: sudo mv nomad /usr/local/bin/

    Let's create the data dir of Nomad:

    $: sudo mkdir --parents /opt/nomad

    Let's create the common configuration:

    $: sudo mkdir --parents /etc/nomad.d
    $: sudo chmod 700 /etc/nomad.d
    $: sudo touch /etc/nomad.d/nomad.hcl

    Copy the following configuration into /etc/nomad.d/nomad.hcl:

    datacenter = "dc1" data_dir = "/opt/nomad" acl { enabled = true }

    Copy and paste the following configuration for the client into /etc/nomad.d/client.hcl:

    bind_addr = "0.0.0.0"

    client {
      enabled = true
    }

    acl {
      enabled = true
    }

    Now copy and paste the configuration for the server into /etc/nomad.d/server.hcl:

    server {
      enabled = true
      bootstrap_expect = 1
    }

    Use the following configuration to register Nomad as a service, e.g. in /etc/systemd/system/nomad.service:

    [Unit]
    Description=Nomad
    Documentation=https://www.nomadproject.io/docs
    Wants=network-online.target
    After=network-online.target

    [Service]
    ExecReload=/bin/kill -HUP $MAINPID
    ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
    KillMode=process
    KillSignal=SIGINT
    LimitNOFILE=infinity
    LimitNPROC=infinity
    Restart=on-failure
    RestartSec=2
    StartLimitBurst=3
    StartLimitIntervalSec=10
    TasksMax=infinity

    [Install]
    WantedBy=multi-user.target

    Now run the following to register it with systemd:

    $: sudo systemctl enable nomad
    $: sudo systemctl start nomad
    $: sudo systemctl status nomad
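    To check that both the server and the client are up (since we enabled ACLs, you may need to pass the management token created in the next step via NOMAD_TOKEN):

    $: nomad server members   # should list this node as the alive leader
    $: nomad node status      # should show one client node in the ready state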

    We are almost done. We need to enable the ACL system, so that only authorized access is possible. Let's first bootstrap Nomad's ACL system:

    $: nomad acl bootstrap
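    The output looks roughly like this (the IDs and timestamps below are placeholders):

    Accessor ID  = 2f34299b-5995-e873-b463-xxxxxxxxxxxx
    Secret ID    = 85310d07-9afa-ef53-0933-xxxxxxxxxxxx
    Name         = Bootstrap Token
    Type         = management
    Global       = true
    Policies     = n/a
    Create Time  = 2021-01-13 10:32:01 +0000 UTC
    Create Index = 7
    Modify Index = 7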

    Store the Secret ID somewhere securely; this is the management token, and if you lose it, you will need to reset the ACL system. Next, save the following policy as dev.policy. It gives a developer the ability to deploy Nomad jobs and read their logs:

    namespace "default" { policy = "read" capabilities = ["submit-job", "dispatch-job", "read-logs"] }

    Now using the management token, we will add the policy above to Nomad and then we will attach it to a new token:

    $: NOMAD_TOKEN=MANAGEMENT_TOKEN nomad acl policy apply -description="Developer Policy" dev-policy dev.policy
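    You can confirm the policy was registered:

    $: NOMAD_TOKEN=MANAGEMENT_TOKEN nomad acl policy list
    Name        Description
    dev-policy  Developer Policy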

    Now let's create a new token for the developer and attach the policy to it:

    $: NOMAD_TOKEN=MANAGEMENT_TOKEN nomad acl token create -name="Developer token" -policy="dev-policy" -type=client

    Now, whoever has the token can deploy jobs to Nomad and see their logs. For example:

    $: NOMAD_ADDR="http://nomad.addr.com" NOMAD_TOKEN="dev-token" nomad status
  6. Deploying Redis on the cluster

    Now that the cluster is up and running, let's deploy Redis on it. First, let's generate a job template:

    $: nomad job init
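    This writes example.nomad to the current directory. Before submitting it, you can ask the scheduler for a dry run of the placement:

    $: NOMAD_ADDR="http://debian-host.addr.com" NOMAD_TOKEN=NOMAD_TOKEN nomad job plan example.nomad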

    The job template already comes with sane defaults and runs Redis as a service, so we don't need to change anything else. Let's deploy it to our cluster:

    $: NOMAD_ADDR="http://debian-host.addr.com" NOMAD_TOKEN=NOMAD_TOKEN nomad job run example.nomad

    HINT: you can export the environment variables NOMAD_ADDR and NOMAD_TOKEN to avoid typing them every time, e.g. by putting them in your ~/.bashrc.
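    For example (the address and token are the placeholders used throughout this guide):

    $: echo 'export NOMAD_ADDR="http://debian-host.addr.com"' >> ~/.bashrc
    $: echo 'export NOMAD_TOKEN="dev-token"' >> ~/.bashrc
    $: source ~/.bashrc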

    Let's check if our service is up and running:

    $: NOMAD_ADDR="http://debian-host.addr.com" NOMAD_TOKEN=NOMAD_TOKEN nomad job status example ID = example Name = example Submit Date = 2021-01-13T05:30:23-05:00 Type = service Priority = 50 Datacenters = dc1 Namespace = default Status = running Periodic = false Parameterized = false Summary Task Group Queued Starting Running Failed Complete Lost cache 0 0 1 0 0 0 Latest Deployment ID = a95ba3bf Status = successful Description = Deployment completed successfully Deployed Task Group Desired Placed Healthy Unhealthy Progress Deadline cache 1 1 1 0 2021-01-13T10:40:39Z Allocations ID Node ID Task Group Version Desired Status Created Modified 2c659851 9706486e cache 0 run running 1m12s ago 56s ago
    Voilà! Our cluster is fully operational and we have deployed our first service.
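    As a bonus, the job has registered its service in Consul (the stock template names it redis-cache; check the service stanza in example.nomad if yours differs), so other applications can already discover it through the dnsmasq forwarder from step 4:

    $: dig @127.0.0.1 redis-cache.service.consul SRV +short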

Summary

In this guide I went step by step through the process of setting up a fully operational cluster with HashiCorp's tools: Nomad and Consul.

If you're too lazy to run everything by hand (like me), here's a GitHub repo containing an Ansible playbook that will bootstrap the cluster for you: https://github.com/vpenkoff/hashicorp-dev-cluser.