Back up and restore a cluster

This page explains how to back up and restore a Dynatrace Managed cluster.

Back up a cluster

All important Dynatrace Managed configuration files (naming rules, tags, management zones, alerting profiles, and more) and monitoring data can be backed up automatically on a daily basis. For maximum security, it's a good idea to save your backup files to an off-site location.

The configuration files and internal database are contained in an uncompressed tar archive.

Each node must be connected to the NFS share, and the share must be mounted at the same directory on each node. The Dynatrace server process needs read/write permissions on the share. The protocol used to transmit data depends on your configuration. We recommend NFSv4. We don't recommend CIFS.
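
As a quick sanity check of the requirement above, the following sketch tests that a directory exists and is readable and writable by the current user. The function name check_backup_dir and the /mnt/backup path in the usage comment are illustrative assumptions, not part of Dynatrace Managed. Run it as the user that the Dynatrace server process runs as.

```shell
# check_backup_dir DIR: succeed only if DIR exists and is readable and
# writable by the current user. (Hypothetical helper, not a Dynatrace tool.)
check_backup_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "no read/write access: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Example (assumed mount point, run on every node):
# check_backup_dir /mnt/backup
```

Running the same check on every node is a simple way to confirm that the share is mounted at the same path everywhere.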

No transaction storage backup

Transaction storage isn't backed up, so when you restore backups you may see some gaps in deep monitoring data.

Cassandra

Dynatrace backs up each node individually and keeps only the latest backup.

Backup characteristics

  • The backup is done daily; it's not incremental.
  • Any data that's replicated between nodes is also stored in the backup (there is no deduplication).
  • Dynatrace excludes the most frequently changing column families (excluded column families are about 80% of the entire storage).
  • In Managed 1.166 or later, Dynatrace also excludes 1-minute and 5-minute resolution data.

Elasticsearch

Dynatrace backs up the entire cluster. While the data is replicated across nodes and there are two replicas in addition to the primary shard, the backup excludes the replicated data.

Backup characteristics

  • The backup is done hourly and is incremental.
    Initially, Dynatrace copies the entire data set and then creates snapshots of the differences. Snapshots are copied hourly. Older snapshots are removed gradually once they are 6 months old. Dynatrace keeps at least one snapshot per month.

  • For a one-node or two-node cluster, Dynatrace stores only one of the two replicas per index. As a result, the ratio of backup size to disk size is higher for one-node and two-node clusters.

Since Dynatrace keeps some of the older snapshots, backup size grows regardless of the current size on disk. The snapshots are incremental, but Elasticsearch merges data segments over time, which results in certain duplicates in the backup.

Restore a cluster

To restore a cluster, follow the steps below.

Before you begin

  • Make sure the machines prepared for the cluster restore have similar hardware and disk layout as the original cluster and sufficient capacity to handle the load after restore.

We recommend that you restore the cluster to the same number of nodes as the backed-up cluster. In exceptional cases, it's possible to restore to a cluster with up to two fewer nodes than the backed-up cluster. You risk losing the cluster configuration if you attempt to restore to a cluster that is more than two nodes short of the original backed-up cluster.

  • On each target node, mount the NFS backup storage, for example to /mnt/backup, referred to as <path-to-backup>.
  • Ensure the installer has read permissions to the NFS. For example: sudo adduser dynatrace && sudo chown -R dynatrace:dynatrace <path-to-backup>
  • Create your cluster inventory. You'll need this information during the restore.
    • IDs of nodes in the cluster - The backup of each node is stored in a dedicated directory named after its identifier, in the format node_<node_id> (for example, node_1, node_5, etc).
    • IPv4 addresses of the new machines.
    • Decide what the target machine for each node will be.
    • Decide which node will become the master (seed) node in the cluster.
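
To help build the inventory, the sketch below lists the node IDs found in a backup directory by reading the node_<node_id> directory names described above. The function name list_node_ids is a hypothetical helper; it assumes the directories sit directly under <path-to-backup>/<UUID>.

```shell
# list_node_ids BACKUP_DIR: print the <node_id> part of every node_<node_id>
# directory directly under BACKUP_DIR, one per line, in numeric order.
list_node_ids() {
    for d in "$1"/node_*/; do
        [ -d "$d" ] || continue        # nothing matched: skip the raw pattern
        name=$(basename "$d")
        echo "${name#node_}"           # strip the "node_" prefix
    done | sort -n
}

# Example:
# list_node_ids <path-to-backup>/<UUID>
```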

Restore from backup

To restore a cluster, follow the steps below:

Copy the installer to target nodes
To restore the cluster, you must use exactly the same installer version that was used for the original cluster. Copy the installer from <path-to-backup>/<UUID>/node_<node_id>/ to a local disk on each target node.
For example:
cp <path-to-backup>/<UUID>/node_<node_id>/files/backup-001-dynatrace-managed-installer.sh /tmp/

Launch Dynatrace restore on each node
In parallel, on each node, execute the Dynatrace Managed installer using the following parameters:

  • --restore - switches the installer into the restore mode.
  • --cluster-ip - IPv4 address of the node on which you run the installer.
  • --cluster-nodes - the comma-delimited list of IDs and IP addresses of all nodes in the cluster, including the one on which you run the installer, in the following format <node_id>:<node_ip>,<node_id>:<node_ip>.
  • --seed-ip - IPv4 address of the seed node.
  • --backup-file - the path to the backup *.tar file.

Get the IDs and IP addresses from the inventory you created before you started.

For example:
10.176.41.168 - The IP address of the node to restore
1: 10.176.41.168, 3: 10.176.41.169, 5: 10.176.41.170 - Node IDs and new IP addresses of all nodes in the cluster

sudo /tmp/backup-001-dynatrace-managed-installer.sh \
--restore \
--cluster-ip "10.176.41.168" \
--cluster-nodes "1:10.176.41.168,3:10.176.41.169,5:10.176.41.170" \
--seed-ip "10.176.41.169" \
--backup-file /mnt/backup/bckp/c9dd47f0-87d7-445e-bbeb-26429fac06c6/node_1/files/backup-001.tar
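
Because the same --cluster-nodes value has to be typed on every node, it can help to generate it once from the inventory. The following is a minimal sketch; build_cluster_nodes is a hypothetical helper, and the input format (one "<node_id> <node_ip>" pair per line) is an assumption made for this example.

```shell
# build_cluster_nodes: read "<node_id> <node_ip>" pairs from stdin, one per
# line, and print them as <node_id>:<node_ip>,<node_id>:<node_ip>,...
build_cluster_nodes() {
    awk 'NF == 2 { printf "%s%s:%s", sep, $1, $2; sep = "," } END { print "" }'
}

# Example with the addresses used above:
# printf '1 10.176.41.168\n3 10.176.41.169\n5 10.176.41.170\n' | build_cluster_nodes
```

The resulting string can be passed unchanged as the --cluster-nodes value on each node.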

Start the firewall, Cassandra and Elasticsearch
On each node successively, start the firewall, Cassandra, and Elasticsearch using the launcher scripts:

/opt/dynatrace-managed/launcher/firewall.sh start
/opt/dynatrace-managed/launcher/cassandra.sh start
/opt/dynatrace-managed/launcher/elasticsearch.sh start

Verify Cassandra state
On each node, check if Cassandra is running. Execute the command:
<dynatrace-install-dir>/utils/cassandra-nodetool.sh status

All the nodes of the restored cluster should be listed in the response with the following values:
Status = Up
State = Normal
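
The check can also be scripted. The sketch below assumes that cassandra-nodetool.sh prints the standard Cassandra nodetool status table, in which each node row starts with a two-letter code (U or D for the status, followed by N, L, J, or M for the state), so a healthy node is marked UN. The function name all_nodes_up_normal is hypothetical.

```shell
# all_nodes_up_normal EXPECTED: read nodetool status output on stdin and
# succeed only if exactly EXPECTED node rows exist and all are marked UN.
all_nodes_up_normal() {
    awk -v expected="$1" '
        $1 ~ /^[UD][NLJM]$/ { total++; if ($1 == "UN") ok++ }
        END { exit !(total == expected && ok == total) }
    '
}

# Example for a three-node cluster:
# <dynatrace-install-dir>/utils/cassandra-nodetool.sh status | all_nodes_up_normal 3
```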

Verify Elasticsearch state
On each node, check if Elasticsearch is running. Execute the command:
curl -s -N -XGET 'http://localhost:9200/_cluster/health?pretty' | grep status

You should get the following response:
"status" : "green"
or, for a one-node setup:
"status" : "yellow"

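The rule above (green, or yellow on a one-node setup) can be expressed as a small script. This is a sketch only: es_status and es_health_ok are hypothetical helper names, and the status is extracted with sed rather than a JSON parser to avoid extra dependencies.

```shell
# es_status: extract the value of the "status" field from the
# _cluster/health JSON response read on stdin.
es_status() {
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p' | head -n 1
}

# es_health_ok STATUS NODE_COUNT: green is always accepted; yellow is
# accepted only when the cluster has a single node.
es_health_ok() {
    case "$1" in
        green)  return 0 ;;
        yellow) [ "$2" -eq 1 ] ;;
        *)      return 1 ;;
    esac
}

# Example for a three-node cluster:
# status=$(curl -s -N -XGET 'http://localhost:9200/_cluster/health?pretty' | es_status)
# es_health_ok "$status" 3 && echo "Elasticsearch is ready"
```
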
Restore the Elasticsearch database
On the master (seed) node, run the following command:
<dynatrace-install-dir>/utils/restore-elasticsearch-data.sh <path-to-backup>/<UUID>

Restore Cassandra data files
On each node successively, starting with the seed node, run the following command:
<dynatrace-install-dir>/utils/restore-cassandra-data.sh <path-to-backup>/<UUID>/node_<node_id>/files/backup-001.tar
Wait until all nodes have joined the Cassandra cluster. To check, use the command:
<dynatrace-install-dir>/utils/cassandra-nodetool.sh status

  • You should get the following response:
    Status = Up
    State = Normal
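
Waiting can be automated with a generic polling loop. The following sketch repeats a check command until it succeeds or a timeout expires. The function name wait_until, the timeout values, and the my_cassandra_check command in the usage comment are all illustrative assumptions.

```shell
# wait_until TIMEOUT INTERVAL CMD...: run CMD every INTERVAL seconds until
# it succeeds or TIMEOUT seconds have elapsed. Returns 0 on success and
# 1 on timeout.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    waited=0
    until "$@"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Example (my_cassandra_check is a placeholder for your own status check):
# wait_until 600 15 my_cassandra_check || echo "Cassandra did not settle in time"
```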

Repair Cassandra (optional)
On one of the nodes, initiate the cluster-wide Cassandra repair:

<dynatrace-install-dir>/utils/cassandra-nodetool.sh repair

The repair ensures data consistency between the nodes. This step may take several hours to complete.

Start Dynatrace
On each node successively, starting with the seed node, run the following command:

<dynatrace-install-dir>/launcher/dynatrace.sh start

Wait until you can sign in to Cluster Management Console.

Remove remaining references to old nodes (optional)
If you restored fewer nodes than were in the original cluster, remove the nodes marked as Offline in the Cluster Management Console. For more information, see Remove a cluster node.

Switch OneAgents to the new cluster address
If you originally configured the cluster with the DNS for OneAgents, you only need to update the DNS records as explained in the next step.

Otherwise, you must configure Cluster ActiveGates (or OneAgents if no ActiveGates are used) with the new target address and restart them.

Execute the following cluster API call for each node, replacing <node-id> with the node identifier, <node-ip> with the node IPv4 address, and <Api-Token> with a valid Cluster API token.

curl -ikS -X PUT -d <node-ip> "https://<node-ip>:8021/api/v1.0/onpremise/endpoint/publicIp/agents/<node-id>?Api-Token=<Api-Token>" -H "accept: application/json" -H "Content-Type: application/json"

You should receive the 200 response as in the example below:

HTTP/1.1 200 OK
Date: Tue, 19 Feb 2019 17:49:06 GMT
X-Robots-Tag: noindex
Server: ruxit server
Content-Length: 0
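
Since the call has to be repeated for every node, it can be driven from the same <node_id>:<node_ip> list used during the restore. The sketch below only builds the URLs and splits the list. The helper names agents_endpoint_url and split_cluster_nodes, and the API_TOKEN variable in the usage comment, are assumptions for this example; the endpoint path is the one shown above.

```shell
# agents_endpoint_url NODE_IP NODE_ID TOKEN: print the URL of the call above.
agents_endpoint_url() {
    printf 'https://%s:8021/api/v1.0/onpremise/endpoint/publicIp/agents/%s?Api-Token=%s\n' \
        "$1" "$2" "$3"
}

# split_cluster_nodes LIST: turn <node_id>:<node_ip>,<node_id>:<node_ip>
# into one "<node_id> <node_ip>" pair per line.
split_cluster_nodes() {
    printf '%s\n' "$1" | tr ',' '\n' | tr ':' ' '
}

# Example loop (API_TOKEN is assumed to hold a valid Cluster API token):
# split_cluster_nodes "1:10.176.41.168,3:10.176.41.169,5:10.176.41.170" |
# while read -r id ip; do
#     curl -ikS -X PUT -d "$ip" "$(agents_endpoint_url "$ip" "$id" "$API_TOKEN")" \
#         -H "accept: application/json" -H "Content-Type: application/json"
# done
```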

Update cluster DNS records (optional)
If the cluster restore resulted in changing the IP addresses, update the DNS records.

  • If you use automatic domain and certificate management, execute the following cluster API call for each node, replacing <node-id> with the node identifier, <node-ip> with the node IPv4 address, and <Api-Token> with a valid API token.
curl -ikS -X PUT -d <node-ip> "https://<node-ip>:8021/api/v1.0/onpremise/endpoint/publicIp/domain/<node-id>?Api-Token=<Api-Token>" -H "accept: application/json" -H "Content-Type: application/json"

You should receive the 200 response as in the example below:

HTTP/1.1 200 OK
Date: Tue, 19 Feb 2019 17:49:06 GMT
X-Robots-Tag: noindex
Server: ruxit server
Content-Length: 0
  • If you use your own DNS, update your cluster domain to a new IP address.

The cluster is now ready.