DevOps Docs

Cloud Installation Requirements

  • Start up the infrastructure using Terraform
  • Insert public IPs of the VMs into the Ansible inventory files
  • Run an Ansible deploy on the inventory file
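A minimal sketch of this workflow, assuming the Terraform configuration lives in the current directory and the Ansible playbook is named deploy.yml (the playbook name and inventory path are illustrative):

    # Stand up the VMs defined in the Terraform configuration
    terraform init && terraform apply

    # Copy the public IPs that Terraform outputs into the Ansible inventory,
    # then run the deployment against that inventory
    ansible-playbook -i inventories/<customer>/hosts deploy.yml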

Nodes

  • Single Node - 4 core machines with 15GB of memory
  • Cluster Nodes - 2 core machines with 7.5GB of memory

A basic cluster consists of 6 virtual machines:

Name     # of nodes   Contains
LB       1            HAProxy & console
DB       1            Postgres, Prometheus (monitors metrics), and Loki (obtains all of the logs)
Cache    1            Redis Cache
Backend  3 (default)  ClearBlade containers

External Disks

The DB node and single nodes get an additional external disk (starting at 100GB of SSD storage) for database data (such as customer, Prometheus, and Loki data). The external disk can be moved to another VM in the infrastructure and brought up there, which is recommended when the root disk fills up or gets corrupted.

There is also the option to keep all data on a larger root disk (100GB) with no external disk. With this setup, Postgres is pointed at the directory on the root disk where its data folder will live.
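As a rough sketch, attaching the external data disk and pointing the database at it might look like the following on a Linux VM (the device name and mount point are illustrative):

    # Format and mount the attached external disk (device name varies by cloud provider)
    sudo mkfs.ext4 /dev/sdb
    sudo mkdir -p /mnt/data
    sudo mount /dev/sdb /mnt/data
    # Postgres (and the Prometheus/Loki volumes) are then pointed at directories
    # on /mnt/data instead of the root disk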

DevOps Tasks

How to run backups

Backups run through cron jobs that are set up via Ansible. Ansible is used during deployment to perform Linux tuning, install Docker, and install the images and all configuration files. The Ansible code is currently not public for security reasons.

The cron job runs a backend script that performs a pg_dump of the Postgres database into a file. The file is then uploaded to a Google Cloud Storage bucket. A Jenkins job checks every day that the previous day's file is available. These files are retained permanently.
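A hedged sketch of such a cron entry, assuming a dockerized Postgres container named postgres and an example bucket (the container, user, database, schedule, and bucket names are all illustrative):

    # Nightly at 02:00: dump the database and upload it to a Cloud Storage bucket
    0 2 * * * docker exec postgres pg_dump -U clearblade clearblade | gzip > /tmp/backup-$(date +\%F).sql.gz && gsutil cp /tmp/backup-$(date +\%F).sql.gz gs://example-backups/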

How to replicate a database

For larger cluster deployments, a separate VM can replicate the database in a different data center. This VM runs a Postgres instance set up as a streaming replica, identical to the primary database: every change performed on the primary Postgres is streamed over to the replica and executed there. Backups are performed on the replica because the dump locks database tables until it completes, and locking tables on the primary database would hurt performance. The cron job process on the replica is the same as the Ansible-configured backup described above.
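A minimal sketch of bringing up such a streaming replica (the host, replication user, and data directory are illustrative):

    # On the replica VM: clone the primary and start Postgres in standby mode
    pg_basebackup -h <primary-internal-ip> -U replicator -D /var/lib/postgresql/data -R -P
    # -R writes the standby settings (primary_conninfo) so every change on the primary
    # streams to this replica; pg_dump backups are then run against the replica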

How to monitor metrics

The ClearBlade container integrates with Prometheus to export some metrics. Exporters run as Docker containers for each part of the stack: the HAProxy and Postgres exporters periodically query their applications for metrics and convert them into a format that Prometheus can read. Prometheus scrapes the exporters every 5 seconds to retrieve the latest data and stores it in a time-series database that offers a query language for reviewing the metrics.

Promtail is configured to retrieve the logs of all running Docker containers and send them to Loki, which also lives on the database VM. Loki is a storage database built by Grafana that performs well with logs. Loki integrates with Grafana, the frontend used to query and view both Prometheus and Loki data.
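As an illustration, once Prometheus is scraping the exporters, the latest data can also be queried directly over its HTTP API (the host name is illustrative; the up metric reports which scrape targets are healthy):

    # Ask Prometheus (port 9090) whether all exporter targets are up
    curl 'http://<db-node-internal-ip>:9090/api/v1/query?query=up'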

How to update an instance

Ansible is used to update an instance. Update the cb_version variable in the customer's inventory file and rerun the deployment.
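A sketch of the update, assuming the playbook is named deploy.yml (the playbook name, inventory path, and version value are illustrative):

    # In the customer's inventory file, bump the version variable, for example:
    #   cb_version: "9.x.y"
    # Then rerun the deployment against that inventory
    ansible-playbook -i inventories/<customer>/hosts deploy.yml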

How to generate license keys

Encrypt the license key into an Ansible inventory file by encrypting the string on the Ansible command line, then rerun the deployment to apply the update.
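A sketch of the encryption step, assuming the inventory variable is named cb_license_key (the variable name and key value are illustrative):

    # Encrypt the new license key string; paste the resulting !vault block into the inventory file
    ansible-vault encrypt_string 'NEW-LICENSE-KEY' --name 'cb_license_key'
    # Then rerun the deployment so the new key is applied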

A license key can also be entered in the admin panel by inserting the key and clicking Update, but the next Ansible deployment will revert it to the old license key stored in the inventory. Generating license keys in the admin panel on the frontend is therefore NOT recommended.

How to release a version

To start the release job, enter the version number to release in Jenkins. Downstream jobs then build and release the CLI, followed by the console for the default and various customer teams; the console is also dockerized and released to ClearBlade and all customer teams. Bladerunner and the platform must also be built and released. Once everything is released, the backend builds a tag and uploads it to a bucket in the Google Cloud repo. Edge builds for the different architectures, the file-hosting container, and the Edge Linux packages must also be built and released. The releases run on a VM, and the output is a Docker image that runs when deploying a new version. After all of the releases are generated, the release notes are generated with Python scripts in $CBOPS and deployed to docs.clearblade.com.

Data Management

Container versions can be found here.

Customer inventory files (for base images and versions) can be found here.

Roles of each container

  • postgres is the main database
  • loki stores the logs of the Docker containers
  • prometheus stores the metrics with the postgres-exporter
  • promtail gathers all the logs for the containers
  • file-hosting lives on the LB node and is used for over-the-air upgrades for Edges
  • alert-manager handles sending Prometheus alerts to Slack based on defined rules (ex. if a DB disk is full)
  • redis-exporter is the Prometheus exporter retrieving metrics from Redis
  • node-exporter lives on all nodes and gathers CPU, memory, and disk utilizations for the VM
  • docker-status-exporter is custom written to retrieve Docker container start times and per-container CPU and memory usage metrics
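To check which of these containers are running on a given node, a command like the following can be used:

    docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'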

Connections

All connections go through HAProxy, except direct SSH. This includes querying Prometheus and Loki and accessing the backend and the frontend. REST calls are split based on the request path: calls that start with /api, /admin, /codeadmin, or any other backend API route are handled by the backend, and all others are TCP requests on port 443 handled by cb_console. cb_console can also communicate with the backend via REST calls. When SSL is enabled, port 80 only redirects to 443. HTTP calls are routed using a round-robin approach, while persistent TCP connections (such as MQTT) are routed to the backend nodes on a least-connection basis. To balance connections, HAProxy communicates directly with the backend VMs via their internal IPs, as defined in the HAProxy configuration.

The backend nodes communicate with each other based on the hosts file that is built during the Ansible deploy and managed by Docker. Keys are then populated in Redis, and the ClearBlade containers set up direct internal RPC connections to send messages back and forth. ClearBlade containers communicate with Postgres using the internal IPs of the VMs found in the ClearBlade configuration file. These containers open some non-SSH ports via Docker for TCP connections to other nodes. SSH is secured with a private key.
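A hedged HAProxy configuration sketch of this routing (the backend names, ports, internal IPs, and certificate path are illustrative; the real configuration is generated by the Ansible deploy):

    frontend https_in
        bind *:443 ssl crt /etc/haproxy/certs/example.pem   # certificate path is illustrative
        # REST calls whose path starts with a backend API route go to the ClearBlade containers
        acl is_api path_beg /api /admin /codeadmin
        use_backend clearblade_backend if is_api
        default_backend cb_console

    backend clearblade_backend
        balance roundrobin                   # HTTP calls are balanced round-robin
        server backend1 10.0.0.11:9000 check
        server backend2 10.0.0.12:9000 check
        server backend3 10.0.0.13:9000 check

    backend cb_console
        server console 127.0.0.1:8080 check  # console container on the LB node; port illustrative

    frontend mqtt_in
        bind *:1883
        mode tcp
        default_backend mqtt_backend

    backend mqtt_backend
        mode tcp
        balance leastconn                    # persistent MQTT connections use least-connection
        server backend1 10.0.0.11:1883 check
        server backend2 10.0.0.12:1883 check
        server backend3 10.0.0.13:1883 check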

External Ports

Use                       Default port  Description
HTTP                      80            When SSL is enabled, this redirects to 443; when disabled, it provides access to the console
HTTPS                     443           Used to access the console when SSL is enabled
MQTT                      1883          MQTT messaging
MQTT TLS                  1884          MQTT messaging with TLS
Websockets MQTT           8903          Websockets MQTT messaging
Websockets MQTT TLS       8904          Websockets MQTT messaging with TLS
MQTT TLS Auth             8905          Authenticate with the platform via MQTT messaging with TLS
Websockets MQTT TLS Auth  8907          Authenticate with the platform via Websockets MQTT messaging with TLS
Edge RPC                  8950          Incoming connections for Edges
Edge RPC TLS              8951          Incoming connections for Edges with TLS
Prometheus                9090          Access metrics from all containers
Loki                      9091          Access logs from all containers
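As an illustration, on Google Cloud these ports could be opened with a firewall rule such as the following (the rule name and target tag are illustrative; expose only what the customer's security policy allows):

    gcloud compute firewall-rules create clearblade-external \
        --allow tcp:80,tcp:443,tcp:1883,tcp:1884,tcp:8903,tcp:8904,tcp:8905,tcp:8907,tcp:8950,tcp:8951,tcp:9090,tcp:9091 \
        --target-tags clearblade-lb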