Cloud Installation Requirements
- Start up the infrastructure using Terraform
- Insert the public IPs of the VMs into the Ansible inventory files
- Run an Ansible deploy against the inventory file
- Single node - 4-core machine with 15GB of memory
- Cluster nodes - 2-core machines with 7.5GB of memory
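The deploy flow above can be sketched as follows. The inventory group names, file name, and IP addresses are illustrative assumptions, not the real layout, and the playbook name is hypothetical; in practice the IPs would come from `terraform output` rather than being hard-coded.

```shell
#!/bin/sh
# Sketch: Terraform brings up the VMs, their public IPs go into an Ansible
# inventory, then the deploy runs against it. IPs are hard-coded here only
# to keep the sketch self-contained.
LB_IP=203.0.113.10
DB_IP=203.0.113.11

cat > inventory.ini <<EOF
[lb]
${LB_IP}

[db]
${DB_IP}

[backend]
203.0.113.12
203.0.113.13
203.0.113.14
EOF

# Final step (run manually once Ansible and the playbook are available):
#   ansible-playbook -i inventory.ini deploy.yml
```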
A basic cluster consists of 6 virtual machines:

| Name | # of nodes | Contains |
|------|------------|----------|
| LB | 1 | HAProxy & console |
| DB | 1 | Postgres, Prometheus (monitors metrics), and Loki (collects all of the logs) |
| Backend | 3 (default) | ClearBlade containers |
DB and single nodes get an additional external disk (starting at 100GB of SSD storage) for database data (such as customer, Prometheus, and Loki data). If the root disk fills up or gets corrupted, the external disk can be attached to another VM in the infrastructure, which is then started up with the data intact; this is why the external disk is recommended.
There is also the option to keep all data on a bigger root disk (100GB) with no external disk. With this setup, Postgres is pointed at the folder's location on the root disk.
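As a sketch of the external-disk layout: the device name, mount point, and paths below are assumptions, while `data_directory` is the standard PostgreSQL setting for relocating the data folder.

```
# /etc/fstab -- mount the external SSD that holds database data
# (device name and mount point are assumptions)
/dev/sdb1  /mnt/data  ext4  defaults,nofail  0  2

# postgresql.conf -- point Postgres at the external disk
data_directory = '/mnt/data/postgres'
```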
How to run backups
Backups run through cron jobs that are set up via Ansible. During deployment, Ansible performs Linux tuning, installs Docker, and installs the images and all configuration files. The Ansible repository is currently not public for security reasons.
The cron job runs a backend script that performs a pg_dump on the Postgres database into a file, which is then uploaded to a Google Cloud Storage bucket. A Jenkins job checks every day that the previous day's file is available. These files are retained permanently.
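A crontab entry for such a backup might look like the following; the schedule, database name, paths, and bucket name are assumptions. Note that `%` must be escaped as `\%` inside crontab entries.

```
# Nightly at 02:00: dump Postgres, compress, and upload to a GCS bucket
0 2 * * * pg_dump -U clearblade clearblade | gzip > /backups/cb-$(date +\%F).sql.gz && gsutil cp /backups/cb-$(date +\%F).sql.gz gs://example-backups/
```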
How to replicate a database
For larger clusters, a separate VM in a different data center can replicate the database. This replica is identical to the primary Postgres database and is set up as a streaming-replication Postgres instance: all changes performed on the primary are streamed to the replica and executed there. Backups are performed on the replica because the dump locks database tables until it completes, and locking tables on the primary would hurt performance. The cron job process on the replica is the same as the Ansible-configured backup described above.
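Streaming replication is standard PostgreSQL; a minimal sketch follows, where the internal IPs, replication user, and paths are illustrative assumptions.

```
# postgresql.conf on the primary
wal_level = replica
max_wal_senders = 3

# pg_hba.conf on the primary -- let the replica stream WAL over the internal network
host  replication  replicator  10.0.0.20/32  scram-sha-256

# On the replica VM: clone the primary and generate standby settings (-R)
#   pg_basebackup -h 10.0.0.10 -U replicator -D /mnt/data/postgres -R -P
```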
How to monitor metrics
The ClearBlade container integrates with Prometheus to export some metrics. Exporters run as Docker containers for each part of the stack: the HAProxy and Postgres exporters periodically query their applications and convert the metrics into a format Prometheus can read. Prometheus scrapes the exporters every 5 seconds to retrieve the latest data and stores it in a time-series database that offers a query language for reviewing the metrics. Promtail is configured to collect the logs of all running Docker containers and send them to Loki, which also lives on the database VM. Loki is a storage database built by Grafana that performs well with logs. It integrates with Grafana, the front end used to query and view both Prometheus and Loki data.
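A Prometheus scrape config matching the 5-second pull described above might look like this sketch; the target hostnames are assumptions, while the ports are the exporters' upstream defaults (haproxy_exporter 9101, postgres_exporter 9187, node_exporter 9100).

```
# prometheus.yml (sketch)
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: haproxy
    static_configs:
      - targets: ['lb-internal:9101']
  - job_name: postgres
    static_configs:
      - targets: ['db-internal:9187']
  - job_name: node
    static_configs:
      - targets: ['lb-internal:9100', 'db-internal:9100', 'backend-1-internal:9100']
```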
How to update an instance
Ansible is used to update an instance. Update the cb_version variable in the customer's inventory file to the new version number and rerun the deployment.
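The update amounts to a one-line inventory change plus a redeploy. A minimal sketch, where the inventory file name, variable format, and version numbers are assumptions (only the cb_version name comes from the doc):

```shell
#!/bin/sh
# Stand-in inventory so the sketch is self-contained
printf 'cb_version=9.1.0\n' > customer.ini

# Bump the version the deploy will install
sed -i 's/^cb_version=.*/cb_version=9.2.0/' customer.ini

# Then rerun the deployment (run manually):
#   ansible-playbook -i customer.ini deploy.yml
```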
How to generate license keys
Encrypt the license key string on the Ansible command line and place the encrypted value in the Ansible inventory file, then rerun the deployment to apply the update.
Generating the license key in the admin panel on the frontend (by inserting the license key and clicking update) is NOT recommended: the next Ansible deployment will revert it to the old license key stored in the inventory.
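Encrypting a string for an inventory file is typically done with `ansible-vault encrypt_string`; in this sketch the variable name and the ciphertext are placeholders.

```
# Produce the encrypted value on the Ansible control machine:
#   ansible-vault encrypt_string '<new license key>' --name 'cb_license_key'
# Then paste the output into the inventory and rerun the deploy:
cb_license_key: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  6231336561...
```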
How to release a version
To release a new version, enter a version number in the Jenkins release job. Downstream jobs then build and release all of the CLIs, followed by the console for the default and customer-specific teams; the console is also dockerized and released to ClearBlade and all customer teams. Bladerunner and the platform must also be built and released. After everything is released, the backend job builds a tag and uploads it to a bucket in the Google Cloud repo.
Edge builds for different architectures, the file-hosting container, and the Edge Linux packages must also be built and released.
The releases run on a VM and the output is a Docker image that runs when deploying a new version.
After all of the releases are generated, the release notes are generated and deployed on
docs.clearblade.com. The release notes are generated using Python scripts in
Container versions can be found here.
Customer inventory files (for base images and versions) can be found here.
Roles of each container
- postgres is the main database
- loki stores the logs of the Docker containers
- prometheus stores the metrics in the time-series database
- promtail gathers all the logs for the containers
- file-hosting lives in the LB node and is used for over-the-air upgrades for Edges
- alert-manager handles sending Prometheus alerts to Slack based on defined rules (e.g., if a DB disk is full)
- redis-exporter is the Prometheus exporter retrieving metrics from Redis
- node-exporter lives on all nodes and gathers CPU, memory, and disk utilization for the VM
- docker-status-exporter is custom-written to retrieve the start times of Docker containers and per-container CPU and memory metrics
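For the alert-manager container, a Slack route is configured roughly like the following Alertmanager fragment; the webhook URL and channel are placeholders, and the actual rules live in Prometheus alerting rules files.

```
# alertmanager.yml (sketch)
route:
  receiver: slack-ops
receivers:
  - name: slack-ops
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ
        channel: '#ops-alerts'
        send_resolved: true
```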
All connections go through HAProxy, except direct SSH. This includes querying Prometheus and Loki and accessing the back end and the front end.
REST calls are split based on the request path. Calls that start with /codeadmin, or any other backend API route, are handled by the backend. All others are TCP requests on port 443 handled by cb_console, which can also communicate with the backend via REST calls. By default, port 80 redirects to 443 when SSL is turned on.
HTTP calls are routed using a round-robin approach. Persistent TCP connections (such as MQTT) are routed to the backend nodes on a least-connection basis.
To balance connections, HAProxy communicates directly with the backend VMs via their internal IPs, as set in the HAProxy configuration. The backend nodes communicate with each other based on the hosts file that is built during the Ansible deploy and managed by Docker, and their keys are then populated in Redis. The ClearBlade containers then set up direct internal RPC connections and send messages back and forth to each other.
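The routing rules above map to an HAProxy configuration along these lines; the backend names, internal IPs, and backend ports are assumptions, and only the /codeadmin prefix, port numbers on the frontends, and the balancing modes come from this document.

```
# haproxy.cfg (sketch)
frontend https_in
    bind *:443 ssl crt /etc/haproxy/site.pem
    acl is_api path_beg /codeadmin        # backend API routes
    use_backend cb_backend if is_api
    default_backend cb_console

backend cb_backend
    balance roundrobin                    # HTTP calls: round robin
    server be1 10.0.0.11:9000 check
    server be2 10.0.0.12:9000 check
    server be3 10.0.0.13:9000 check

backend cb_console
    server console 10.0.0.10:8080 check

listen mqtt_tls
    bind *:1884 ssl crt /etc/haproxy/site.pem
    mode tcp
    balance leastconn                     # persistent MQTT: least connections
    server be1 10.0.0.11:1883 check
    server be2 10.0.0.12:1883 check
    server be3 10.0.0.13:1883 check
```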
ClearBlade containers communicate with Postgres using the internal IPs of the VMs, which are set in the ClearBlade configuration file. The containers open some non-SSH ports via Docker for TCP connections to other nodes. SSH is secured with a private key.
| Name | Port | Description |
|------|------|-------------|
| HTTP | 80 | When SSL is enabled, this redirects to 443. When disabled, this provides access to the console |
| HTTPS | 443 | Used to access the console when SSL is enabled |
| MQTT TLS | 1884 | MQTT messaging with TLS |
| Websockets MQTT | 8903 | Websockets MQTT messaging |
| Websockets MQTT TLS | 8904 | Websockets MQTT messaging with TLS |
| MQTT TLS Auth | 8905 | Authenticate with the platform via MQTT messaging with TLS |
| Websockets MQTT TLS Auth | 8907 | Authenticate with the platform via Websockets MQTT messaging with TLS |
| Edge RPC | 8950 | Incoming connections for Edges |
| Edge RPC TLS | 8951 | Incoming connections for Edges with TLS |
| Prometheus | 9090 | Access metrics from all containers |
| Loki | 9091 | Access logs from all containers |