Back up and restore a cluster

Learn how to back up and restore a Dynatrace Managed cluster.

Back up a cluster

All important Dynatrace Managed configuration files (naming rules, tags, management zones, alerting profiles, and more) and monitoring data can be backed up automatically on a daily basis. For maximum security, it's a good idea to save your backup files to an off-site location.

The configuration files and internal database are contained in an uncompressed tar archive.
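
As an optional sanity check, you can list the archive contents directly with standard tar options; the path below follows the backup layout used in the restore steps later in this guide:
tar -tvf <path-to-backup>/<UUID>/node_<node_id>/files/backup-001.tar | head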

Each node must be connected to the NFS share, with the NFS disk mounted at the same path on every node. The Dynatrace server process must have read/write permissions to the NFS share. The protocol used to transmit data depends on your configuration; we recommend NFSv4 and advise against CIFS.
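
For example, a minimal sketch of mounting such a share with NFSv4 (the server name and export path are placeholders, not values from your environment):
# Hypothetical NFS server and export path - replace with your own
sudo mkdir -p /mnt/backup
sudo mount -t nfs4 nfs-server.example.com:/exports/dynatrace-backup /mnt/backup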

Notes:

  1. Backup history isn't preserved; Dynatrace Managed keeps only the latest backup.
  2. Transaction storage isn't backed up, so when you restore backups you may see some gaps in deep monitoring data.

Estimated cluster backup size: 1-3 nodes

The overall size of a cluster backup tar archive can be roughly estimated as the size of the cluster's Metrics storage plus twice the Elasticsearch storage used by each node in the cluster. You can find the Metrics storage and Elasticsearch storage values on each node's details page.
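
As a purely hypothetical worked example (the storage figures are assumptions, not values from your environment), for a 3-node cluster:
Metrics storage of the cluster: 200 GB
Elasticsearch storage: 30 GB per node x 3 nodes = 90 GB
Estimated backup size ≈ 200 GB + 2 x 90 GB = 380 GB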

Estimated cluster backup size: 4-6 nodes

Follow the same calculation described above for 1-3 node clusters, then double the result to arrive at a rough estimate of the overall archive size.
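
Continuing the hypothetical figures above: if the same calculation for your cluster yields about 380 GB, the rough estimate after doubling is 2 x 380 GB = 760 GB.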

Restore a cluster

To restore a cluster, follow the steps below.

Before you begin

  • Make sure the machines prepared for the cluster restore have hardware and a disk layout similar to the original cluster, and sufficient capacity to handle the load after the restore.
  • On each target node, mount the NFS backup storage, for example to /mnt/backup, referred to as <path-to-backup>.
  • Ensure the installer has read permissions to the NFS. For example: sudo adduser dynatrace && sudo chown -R dynatrace:dynatrace <path-to-backup>
  • Create your cluster inventory (a sample inventory is shown after this list). You'll need this information during the restore.
    • IDs of nodes in the cluster - the backup of each node is stored in a dedicated directory named after its identifier in the node_<node_id> format, for example node_1, node_5, etc.
    • IPv4 addresses of the new machines.
    • Decide on target machines for each node.
    • Decide which node will become the master (seed) node in the cluster.
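
A minimal sample inventory, reusing the node IDs and addresses from the example later in this guide (your IDs and addresses will differ):
node_1 - 10.176.41.168
node_3 - 10.176.41.169 (seed node)
node_5 - 10.176.41.170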

Copy the installer to target nodes
To restore the cluster, you must use the exact same installer version that was used in the original cluster. Copy the installer from <path-to-backup>/<UUID>/node_<node_id>/ to a local disk on each target node.
For example:
cp <path-to-backup>/<UUID>/node_<node_id>/files/backup-001-dynatrace-managed-installer.sh /tmp/

Launch Dynatrace restore on each node
In parallel, on each node, execute the Dynatrace Managed installer using the following parameters:

  • --restore - switches the installer into the restore mode.
  • --cluster-ip - IPv4 address of the node on which you run the installer.
  • --cluster-nodes - the comma-delimited list of IDs and IP addresses of all nodes in the cluster, including the one on which you run the installer, in the following format <node_id>:<node_ip>,<node_id>:<node_ip>.
  • --seed-ip - IPv4 address of the seed node.
  • --backup-file - the path to the backup *.tar file.

Get the IDs and IP addresses from the inventory you created before you started.

For example:
10.176.41.168 - the IP address of the node to restore
1:10.176.41.168,3:10.176.41.169,5:10.176.41.170 - node IDs and new IP addresses of all nodes in the cluster

sudo /tmp/backup-001-dynatrace-managed-installer.sh \
--restore \
--cluster-ip "10.176.41.168" \
--cluster-nodes "1:10.176.41.168,3:10.176.41.169,5:10.176.41.170" \
--seed-ip "10.176.41.169" \
--backup-file /mnt/backup/bckp/c9dd47f0-87d7-445e-bbeb-26429fac06c6/node_1/files/backup-001.tar

Start the firewall, Cassandra and Elasticsearch
On each node successively, start the firewall, Cassandra and Elasticsearch using the launcher script:

/opt/dynatrace-managed/launcher/firewall.sh start
/opt/dynatrace-managed/launcher/cassandra.sh start
/opt/dynatrace-managed/launcher/elasticsearch.sh start

Verify Cassandra state
On each node, check if Cassandra is running. Execute the command:
<dynatrace-install-dir>/utils/cassandra-nodetool.sh status

All the nodes of the restored cluster should be listed in the response with the following values:
Status = Up
State = Normal
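
For illustration only, an abridged sketch of what the status output might look like (exact columns and values vary by Cassandra version and environment; the addresses are the example ones used in this guide). Each node should be reported as UN, that is, Up and Normal:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load  Tokens  Owns  Host ID  Rack
UN  10.176.41.168  ...   ...     ...   ...      rack1
UN  10.176.41.169  ...   ...     ...   ...      rack1
UN  10.176.41.170  ...   ...     ...   ...      rack1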

Verify Elasticsearch state
On each node, check if Elasticsearch is running. Execute the command:
curl -s -N -XGET 'http://localhost:9200/_cluster/health?pretty' | grep status

You should get the following response:
"status" : "green"
or for one node setup:
"status" : "yellow"

Restore the Elasticsearch database
On each node successively, starting with the seed node, run the following command:
<dynatrace-install-dir>/utils/restore-elasticsearch-data.sh <path-to-backup>/<UUID>

Restore Cassandra data files
On each node successively, starting with the seed node, run the following command:
<dynatrace-install-dir>/utils/restore-cassandra-data.sh <path-to-backup>/<UUID>/node_<node_id>/files/backup-001.tar
Wait until Cassandra has its cluster fully set. Use the command:
<dynatrace-install-dir>/utils/cassandra-nodetool.sh status

You should get the following response:
Status = Up
State = Normal

Start Dynatrace
On each node successively, starting with the seed node, run the following command:
<dynatrace-install-dir>/launcher/dynatrace.sh start
Wait until you can sign in to the Cluster Management Console.

Optional: Remove remaining references to old nodes
If you restored fewer nodes than the original cluster had, remove the nodes marked as Offline in the Cluster Management Console. For more information, see Remove a cluster node.

Switch OneAgents to the new cluster address
If you originally configured the cluster with a DNS name for OneAgents, you only need to update the DNS records as explained in the next step.

Otherwise, you must configure Cluster ActiveGates (or OneAgents if no ActiveGates are used) with the new target address and restart them.

Execute the following cluster API call for each node, replacing <node-id> with the node identifier, <node-ip> with the node IPv4 address, and <Api-Token> with a valid Cluster API token.

curl -ikS -X PUT -d <node-ip> https://<node-ip>:8021/api/v1.0/onpremise/endpoint/publicIp/agents/<node-id>?Api-Token=<Api-Token> -H "accept: application/json" -H "Content-Type: application/json"

You should receive the 200 response as in the example below:

HTTP/1.1 200 OK
Date: Tue, 19 Feb 2019 17:49:06 GMT
X-Robots-Tag: noindex
Server: ruxit server
Content-Length: 0

Optional: Update cluster DNS records
If the cluster restore resulted in changing the IP addresses, update the DNS records.

  • If you use automatic domain and certificate management, execute the following cluster API call for each node, replacing <node-id> with the node identifier, <node-ip> with the node IPv4 address, and <Api-Token> with a valid Cluster API token.
curl -ikS -X PUT -d <node-ip> https://<node-ip>:8021/api/v1.0/onpremise/endpoint/publicIp/domain/<node-id>?Api-Token=<Api-Token> -H "accept: application/json" -H "Content-Type: application/json"

You should receive the 200 response as in the example below:

HTTP/1.1 200 OK
Date: Tue, 19 Feb 2019 17:49:06 GMT
X-Robots-Tag: noindex
Server: ruxit server
Content-Length: 0
  • If you manage your own DNS, update your cluster domain records to point to the new IP addresses.

The cluster is now ready.