Back up and restore a cluster

This section describes how to back up and restore a Dynatrace Managed cluster.

Prerequisites

  • The IP addresses of the cluster nodes must be the same at restore time as they were when the backup was created.
  • Storage configured for backup must be readable and writable by all cluster nodes (for example, over the NFS protocol). For reliable backup storage, we recommend an external, dedicated storage volume with a replication mechanism.
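
To confirm access, you can create and then remove a test file on the backup storage from each node, for example (the /mnt/backup mount point is a hypothetical placeholder; use your own backup path):
touch /mnt/backup/dynatrace-backup-write-test && rm /mnt/backup/dynatrace-backup-write-test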

Back up a cluster

All important Dynatrace Managed configuration files (naming rules, tags, management zones, alerting profiles, and more) and monitoring data can be backed up automatically on a daily basis. For maximum security, it's a good idea to save your backup files to an off-site location.

The configuration files and internal database are contained in an uncompressed tar archive.

Each node must be connected to the NFS share, with the NFS disk mounted at the same shared directory on every node, and the Dynatrace Server process must have read/write permissions on it. The protocol used to transmit data depends on your configuration; we recommend NFSv4 and advise against CIFS.
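
For illustration only, assuming a hypothetical NFS server backup-nfs.example.com that exports /exports/dynatrace-backup and a shared mount point of /mnt/backup, the NFSv4 mount on each node could look like this:
# create the shared mount point (same path on every node)
sudo mkdir -p /mnt/backup
# mount the dedicated backup share over NFSv4
sudo mount -t nfs4 backup-nfs.example.com:/exports/dynatrace-backup /mnt/backup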

Notes:

  1. Backup history isn't preserved; Dynatrace Managed keeps only the latest backup.
  2. Transaction storage isn't backed up, so when you restore backups you may see some gaps in deep monitoring data.

Estimated cluster backup size: 1-3 nodes

The overall size of a cluster backup tar archive can be roughly estimated as the size of the cluster's Metrics storage plus twice the amount of Elasticsearch storage used by each node in the cluster. You can find the Metrics storage and Elasticsearch storage amounts on each node's details page.
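
As a rough illustration with hypothetical numbers, a 3-node cluster with 300 GB of Metrics storage and 50 GB of Elasticsearch storage in use on each node would need space for a backup archive of roughly 300 GB + 2 × (3 × 50 GB) = 600 GB.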

Estimated cluster backup size: 4-6 nodes

Follow the same calculation as for 1-3 node clusters, then double the result to arrive at a rough estimate of the overall archive size.

Restore a cluster

To restore a cluster, follow the steps below.

On each node successively, execute the Dynatrace Managed installer using the following arguments:
--restore --backup-file <path-to-backup-file>/backup-001.tar
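
For example, assuming the installer preserved with the backup is named dynatrace-managed-installer.sh and the backup archive is stored under /mnt/backup (both hypothetical; substitute your own installer name and path):
/mnt/backup/dynatrace-managed-installer.sh --restore --backup-file /mnt/backup/backup-001.tar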

Notes:

  • Use the same version of the installer that was used to create the backup (get the installer from <path-to-backup>).
  • We recommend that you restore all nodes in the cluster.

On each node successively, start the firewall using the launcher script:
<full-path-to-Dynatrace-binaries-directory>/launcher/firewall.sh start

On each node successively, start Cassandra using the launcher script:

  • Execute the command:
    <full-path-to-Dynatrace-binaries-directory>/launcher/cassandra.sh start

  • On the last node, check if Cassandra is running using the command:
    <full-path-to-Dynatrace-binaries-directory>/utils/cassandra-nodetool.sh status

  • You should get the following response:
    Status = Up
    State = Normal

On each node successively, run nodetool repair.

  • Execute the command:
    <full-path-to-Dynatrace-binaries-directory>/utils/cassandra-nodetool.sh repair

On each node successively, start Elasticsearch using the launcher script.

  • Execute the command:
    <full-path-to-Dynatrace-binaries-directory>/launcher/elasticsearch.sh start

  • On the last node, check if Elasticsearch is running using the command:
    curl -s -N -XGET 'http://localhost:9200/_cluster/health?pretty' | grep status

  • You should get the following response:
    "status" : "green"
    or, for a single-node setup:
    "status" : "yellow"

Create the dynatrace_repository.

  • On one of the nodes, execute the following command, setting location to the value of the path.repo property from the configuration file <full-path-to-Dynatrace-binaries-directory>/elasticsearch/config/elasticsearch.yml (see the example at the end of this step):
    curl -s -N -XPUT 'http://localhost:9200/_snapshot/dynatrace_repository' -H 'Content-Type: application/json' -d'
    {
        "type": "fs",
        "settings": {
            "location": "enter-here-full-path-to-elasticsearch-backup-location",
            "compress": true
        }
    }'
    
  • You should get the following response:
    {"acknowledged":true}
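
  • To look up the configured path.repo value referenced above, you can, for example, read it directly from the configuration file (adjust the path to your installation):
    grep 'path.repo' <full-path-to-Dynatrace-binaries-directory>/elasticsearch/config/elasticsearch.yml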

Close indices.

  • On one of the nodes execute the command:
    for index in `curl -s -N -XGET 'http://localhost:9200/_cat/indices/?h=index'`; do
      curl -s -N -XPOST "http://localhost:9200/$index/_close"
    done
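
  • You can verify that the indices are now closed using the _cat/indices API (closed indices report a status of close):
    curl -s -N -XGET 'http://localhost:9200/_cat/indices?h=index,status'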
    

Find the latest snapshot.

  • On one of the nodes execute the command:
    curl -s -N -XGET 'http://localhost:9200/_cat/snapshots/dynatrace_repository?h=id&s=end_epoch:desc' | head -n 1
  • In response, you should get a snapshot, for example:
    snapshot_2018-01-04-08-58-utc
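  • To review all available snapshots rather than only the latest one, you can, for example, drop the head filter and request column headers:
    curl -s -N -XGET 'http://localhost:9200/_cat/snapshots/dynatrace_repository?v&s=end_epoch:desc'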

Restore the Elasticsearch database.

  • On one of the nodes execute the command:
    curl -s -N -XPOST 'http://localhost:9200/_snapshot/dynatrace_repository/<put-snapshot-name-here>/_restore'
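  • For example, using the snapshot name returned in the previous step:
    curl -s -N -XPOST 'http://localhost:9200/_snapshot/dynatrace_repository/snapshot_2018-01-04-08-58-utc/_restore'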

Monitor the progress of restoring a snapshot.

  • On one of the nodes execute the command:
    curl -s -N -XGET 'http://localhost:9200/_snapshot/dynatrace_repository/<put-snapshot-name-here>/_status?pretty' | grep state
  • Once the restore has completed, you should get the following response:
    "state" : "SUCCESS"

On each node successively, start Dynatrace Server and the other components using the launcher script:
<full-path-to-Dynatrace-binaries-directory>/launcher/dynatrace.sh start

The cluster is now ready.