
Premium High Availability

Dynatrace Premium High Availability (Premium HA) is a self-contained, out-of-the-box solution that provides near-zero downtime and allows monitoring to continue without data loss in failover scenarios. This solution requires additional licensing for your deployment.

To create a globally distributed high availability deployment (Premium High Availability), you must add a redundant set of nodes to your original Managed cluster deployment. Typically, such high availability deployments span across multiple data centers. Dynatrace Managed enables you to add mirrored nodes located in another data center.

This procedure uses the following terms:

  • DC-1 - The data center where the initial Dynatrace Managed cluster is located.
  • DC-2 - An additional data center designated for Premium High Availability deployment.
  • seed node - Any node within DC-1 that will be used for performing the installation tasks and distribution of configuration.

The procedure migrates and replicates each Dynatrace Managed component separately so that it is prepared for data replication across two data centers. See Overview of Dynatrace Managed components.

Requirements

  • Premium High Availability license. See Calculate Dynatrace monitoring consumption

  • DC-1 cluster release must be 1.222+.

  • DC-1 cluster must have backup disabled before starting the migration procedure. We recommend that you create a fresh cluster backup and disable cluster backup shortly before deploying the additional data center.

  • Migration must not take longer than four weeks. If you expect the migration to exceed four weeks, contact Dynatrace ONE.

  • DC-1 cluster automatic update must be disabled before starting the migration procedure. The cluster must not be upgraded during migration. See Automatic update. Contact Dynatrace ONE if your automatic update option is disabled.

  • Make sure machines are prepared for the cluster in DC-2.

    We recommend

    Since DC-2 will replicate the data of DC-1, we recommend that you designate the same number of nodes with the same hardware, including disk storage.
    All nodes in DC-1 and DC-2 must be time-synchronized. This can be achieved by setting up Network Time Protocol (NTP).

  • Premium High Availability deployment requires at least three nodes in DC-1 and three corresponding nodes in DC-2.

  • Make sure that all nodes in both data centers can communicate with each other. To check if a node in DC-1 is reachable from a host in DC-2, you can execute a health-check REST call. For example, run the following command from a host in DC-2:

shell
curl -k https://<DC-1-node-IP>/rest/health

where <DC-1-node-IP> is the IP address of any node in DC-1. You should receive "RUNNING" in the response if the connection can be established successfully.
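
The time-synchronization requirement can also be spot-checked before installation. A minimal sketch, assuming you can read an epoch timestamp from each node (for example over SSH with `ssh <node-ip> date +%s`); `within_skew` is a hypothetical helper name:

```shell
# Sketch: compare two epoch timestamps and flag drift above a tolerance.
# In practice the second timestamp would come from a remote node, e.g.
#   remote=$(ssh <node-ip> date +%s)
within_skew() {  # within_skew <epoch-a> <epoch-b> <max-drift-seconds>
  d=$(( $1 - $2 ))
  [ "${d#-}" -le "$3" ]
}

# Compare the local clock against itself as a smoke test.
within_skew "$(date +%s)" "$(date +%s)" 2 && echo "clocks in sync"
```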

Preparation

Ensure that your system meets the specified hardware and operating system requirements.

Gather information

The REST API calls in this procedure use several variables. Gather the following information:

  • <seed-node-ip> - The IP address of the seed node from DC-1.
    This can be any node running in the existing data center that will be used for performing the installation tasks and distribution of configuration.

  • <nodes-ips> - The list of IPV4 addresses of new nodes in DC-2.
    Example: "176.16.0.5", "176.16.0.6", "176.16.0.7"

  • <api-token> - A valid Cluster API token (ServiceProviderAPI scope is required).
    You can generate it in the Dynatrace Managed Cluster Management Console. See Cluster API - Authentication.

  • <dynatrace-directory> - The directory where Dynatrace Managed is installed on the seed node.
    The default Dynatrace Managed installation directory is /opt/dynatrace-managed

  • <datacenter-1> - The DC-1 name must be the same as the Cassandra DC name.
    The default Cassandra DC name is datacenter1.

    Get the DC name.

    To get the DC name, execute this command on the seed node before starting migration:

    shell
    sudo <dynatrace-directory>/utils/cassandra-nodetool.sh status

    You will get a response that includes the DC-1 name. Example for a DC named datacenter1:

    plaintext
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
    UN  10.176.42.20   65.54 GB  256     100.0%            f053dd8d-ecf3-7834-b099-68542439817b  rack1
    UN  10.176.42.244  65.47 GB  256     100.0%            2aa7e790-a423-9273-88f9-45bcd158dd6e  rack1
    UN  10.176.42.168  65.47 GB  256     100.0%            48543bca-41f5-26d3-b2fd-6cfdf5c0f3b2  rack1
  • <datacenter-2> - The DC-2 name can be any string that begins and ends with an alphanumeric character and is no longer than 80 characters. Underscores and dashes are allowed within the name. Example: dc-us-east-2.

Set variables

To streamline the numerous REST API calls during the deployment, set environment variables on every node in DC-1 and DC-2.

shell
SEED_IP=<seed-node-ip>
DT_DIR=<dynatrace-directory>
NODES_IPS='[<nodes-ips>]'
API_TOKEN=<api-token>
DC1_NAME=<datacenter-1>
DC2_NAME=<datacenter-2>

For example:

shell
SEED_IP=10.176.37.201
DT_DIR=/opt/dynatrace-managed
NODES_IPS='["10.176.37.218", "10.176.37.227", "10.176.37.120"]'
API_TOKEN=R_SZOpV4RTOmjr9fFmK4x
DC1_NAME=datacenter1
DC2_NAME=dc-us-east-2

Check for custom settings

If your Cassandra or Elasticsearch cluster is configured with custom.settings that enable rack-awareness, contact Dynatrace ONE to apply these custom settings before proceeding with DC-2 installation.

To check whether custom settings are applied, execute the following command on the seed node:

plaintext
ls $DT_DIR/installer/custom.settings

If the custom.settings file exists, you are using custom settings.
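
The check can be wrapped in a small guard so that scripted deployments stop before DC-2 installation when custom settings are present (a sketch; `has_custom_settings` is a hypothetical helper name):

```shell
# Return success if the custom.settings file exists under the given
# Dynatrace Managed installation directory.
has_custom_settings() {  # has_custom_settings <dynatrace-directory>
  [ -f "$1/installer/custom.settings" ]
}

if has_custom_settings "${DT_DIR:-/opt/dynatrace-managed}"; then
  echo "custom settings detected - contact Dynatrace ONE before installing DC-2"
else
  echo "no custom settings - safe to proceed"
fi
```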

Installation

To create a cluster in a second DC, follow this procedure.

  1. Update Elasticsearch
  2. Distribute the installer
  3. Prepare cluster data for replication
  4. Create the data center topology
  5. Open firewall rules
  6. Install second data center nodes
  7. Replicate Cassandra
  8. Replicate Elasticsearch
  9. Migrate the server
  10. Enable the new data center
  11. Reconfigure DC-1
  12. Reconfigure DC-2

API return codes

Each REST API call returns an HTTP status code. Go to the next step only when the returned code is 200. Expect the following return codes:

  • 200 - The current step was executed successfully; go to the next step.
  • 207 - The request is still being processed; repeat the step after a few minutes if there's no response.
  • 40x - Revise your request path and arguments, then repeat the request.
  • 5xx - Contact Dynatrace ONE.
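
When scripting the calls, the mapping above can be encoded in a small helper (a sketch; `classify_status` is a hypothetical name):

```shell
# Map an HTTP status code from a cluster API call to the documented action.
classify_status() {  # classify_status <http-code>
  case "$1" in
    200) echo "proceed" ;;
    207) echo "retry" ;;
    4??) echo "fix-request" ;;
    5??) echo "contact-dynatrace-one" ;;
    *)   echo "unexpected" ;;
  esac
}

# The status code itself can be captured with curl's -w option, e.g.:
#   code=$(curl -ksS -o /dev/null -w '%{http_code}' "https://$SEED_IP/...")
classify_status 207   # prints "retry"
```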

Update Elasticsearch

Update Elasticsearch to the proper version. Execute the following command on each existing DC-1 node successively:

plaintext
sudo nohup $DT_DIR/installer/reconfigure.sh --only els --premium-ha on &

Distribute the installer

In this step, you will copy the node installer to every node in DC-2.

  1. Log into your Dynatrace Managed Cluster Management Console.

  2. In the Dynatrace menu, go to Home to open the Dynatrace Managed deployment status page.

  3. Click Add new cluster node.

  4. Copy the wget command line from the Run this command on the target host text box.

    Do not run the installer script

    The Run this installer script with root rights text box contains a command for the installation script. Ignore this command; do not execute the provided script.

  5. Paste and execute only the wget command line into your terminal window.

Prepare cluster data for replication

In this step, you will prepare data indexes for replication.

Prepare cluster data

Execute the following cluster API call only on the seed node:

plaintext
curl -ikS -X POST https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/clusterReplicationPreparation?Api-Token=$API_TOKEN

If the status code is not 200 and the response does not suggest next steps, contact Dynatrace ONE.

Check cluster preparation status

Execute the following cluster API call only on the seed node:

plaintext
curl -ikS -X GET https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/clusterReplicationPreparation?Api-Token=$API_TOKEN -H "accept: application/json"

If the status code from this call is not 200, try again after a few minutes.
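
The repeated "try again after a few minutes" checks throughout this procedure lend themselves to a small retry loop (a sketch; `retry` is a hypothetical helper, and the attempt count and interval are illustrative):

```shell
# Rerun a command until it succeeds or attempts are exhausted.
retry() {  # retry <attempts> <interval-seconds> <command...>
  n=$1; interval=$2; shift 2
  while [ "$n" -gt 0 ]; do
    "$@" && return 0
    n=$((n - 1))
    [ "$n" -gt 0 ] && sleep "$interval"
  done
  return 1
}

# Example against the status call above (-f makes curl fail on HTTP errors):
#   retry 20 180 curl -ksSf -o /dev/null \
#     "https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/clusterReplicationPreparation?Api-Token=$API_TOKEN"
```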

Create the data center topology

In this step, you will create a configuration that defines which node belongs to which data center.

Execute the following cluster API call only on the seed node:

plaintext
curl -ikS -X POST -d "{\"newDatacenterName\" : \"$DC2_NAME\", \"nodesIp\" :$NODES_IPS}" https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/datacenterTopology?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If the status code is not 200 and the response does not suggest next steps, contact Dynatrace ONE.

Open firewall rules

In this step, you will add firewall rules that open ports for traffic to the new DC-2 nodes.

Open ports

To open ports to traffic from the new DC-2 nodes, execute the following cluster API call only on the seed node:

plaintext
curl -ikS -X POST -d "$NODES_IPS" https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/clusterNodes/currentDc?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If successful, the status code will be 200 and the response body will contain a request ID that you need to check the firewall rules status.

If the status code is not 200 and the response does not suggest next steps, contact Dynatrace ONE.

Verify firewall rules

Set the request ID environment variable on the seed node only. The request ID is from the response in the previous API call.

shell
REQ_ID=<topology-configuration-request-id>
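
If the response body is JSON containing the ID, the variable can also be set directly instead of copying it by hand. The `requestId` field name below is an assumption; verify it against the actual response shape:

```shell
# Sample response body; in practice this would be the output of the POST call.
RESPONSE_BODY='{"requestId":"42"}'

# Hypothetical extraction: pull the value of a "requestId" field with sed.
REQ_ID=$(printf '%s' "$RESPONSE_BODY" | sed -n 's/.*"requestId"[: ]*"\{0,1\}\([^",}]*\)"\{0,1\}.*/\1/p')
echo "$REQ_ID"   # prints "42"
```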

To check the firewall rules status, execute the following cluster API call only on the seed node:

plaintext
curl -ikS https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/clusterNodes/currentDc/$REQ_ID?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If the status code from this call is not 200, try again after a few minutes.

Install second data center nodes

In this step, you will install Dynatrace Managed nodes on all hosts within DC-2 and then check for the presence of the Nodekeeper service, which indicates whether all nodes were successfully installed in DC-2.

Install nodes in DC-2

Execute the following command on every node in DC-2. Follow the on-screen instructions, as this will be a typical node installation.

plaintext
sudo /bin/sh ./managed-installer.sh --install-new-dc --premium-ha on --datacenter $DC2_NAME --seed-auth $API_TOKEN

This operation should take 3 to 5 minutes and the expected result should be similar to this:

plaintext
Installation in new data center completed successfully after 2 minutes 51 seconds.

Check Nodekeeper in DC-2

Execute the following cluster API call only on the seed node when all nodes in DC-2 finish installing:

plaintext
curl -ikS https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/nodekeeper/healthCheck?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If the status code is not 200, try again after a few minutes.

Replicate Cassandra

In this step, you will reconfigure Cassandra in DC-1 and DC-2 for cross-data center replication, trigger data synchronization, rebuild Cassandra data, and verify the Cassandra state.

It may take minutes to hours, depending on your metric storage size.

  1. Replication of Cassandra in DC-1

    In this step, you will reconfigure Cassandra for cross-data center replication.

    Replicate Cassandra

    To start Cassandra replication in the DC-1 data center, execute the following cluster API call only on the seed node:

    plaintext
    curl -ikS -X POST https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/cassandra/currentDc?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

    If successful, the status code will be 200 and the response body will contain a request ID that you need to check replication status. Set the request ID environment variable only on the seed node. The request ID is from the response in the previous API call.

    shell
    REQ_ID=<replication-old-datacenter-request-id>

    If the status code is not 200 and the response does not suggest next steps, contact Dynatrace ONE.

    Check replication status

    To check replication status, execute the following cluster API call only on the seed node:

    plaintext
    curl -ikS -X GET https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/cassandra/currentDc/$REQ_ID?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

    If the status code is not 200, try again after a few minutes.

  2. Replication of Cassandra in DC-2

    In this step, you will reconfigure Cassandra for cross-data center replication and trigger data synchronization.

    Replicate Cassandra

    To start replication of Cassandra in the DC-2 data center, execute the following cluster API call only on the seed node:

    plaintext
    curl -ikS -X POST https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/cassandra/newDc?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

    If successful, the status code will be 200 and the response body will contain a request ID that you need to check replication status. Set the request ID environment variable only on the seed node. The request ID is from the response in the previous API call.

    shell
    REQ_ID=<replication-new-datacenter-request-id>

    If the status code is not 200 and the response does not suggest next steps, contact Dynatrace ONE.

    Check replication status

    To check the replication status, execute the following cluster API call only on the seed node:

    plaintext
    curl -ikS -X GET https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/cassandra/newDc/$REQ_ID?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

    If the status code is not 200, try again after a few minutes.

  3. Rebuild Cassandra data

    Dynatrace version 1.254 and earlier

    In this step, you will rebuild Cassandra and verify the progress by checking the status. Depending on the size of your Cassandra database, this can take several hours.

    Rebuild data

    To rebuild Cassandra, run the following command on each new DC-2 node successively. Use the nohup command to prevent interruption of script execution (such as session disconnect) during important operations.

    plaintext
    sudo nohup $DT_DIR/utils/cassandra-nodetool.sh rebuild -- $DC1_NAME &

    Verify progress and status

    To verify the progress and status, execute the following cluster API call only on the seed node:

    plaintext
    curl -ikS -X GET https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/cassandra/rebuildStatus?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

    If the status code is not 200, try again after approximately 15 minutes. Remember that the rebuild process can be time-consuming.

    Verify Cassandra state

    To verify the Cassandra cluster state, execute the cassandra-nodetool.sh with the status parameter only on the seed node:

    plaintext
    sudo $DT_DIR/utils/cassandra-nodetool.sh status

    The result should look similar to this:

    plaintext
    Datacenter: dc1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
    UN  10.176.41.167  18.82 GB  256     100.0%            3af25127-4f99-4f43-afc3-216d7a2c10f8  rack1
    UN  10.176.41.154  19.44 GB  256     100.0%            5a618559-3a73-42ec-83f0-32d28e08beec  rack1
    UN  10.176.41.43   19.58 GB  256     100.0%            191f3b30-949a-4cf2-b620-68a40eebf31e  rack1

    Datacenter: dc2
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
    UN  10.176.42.54   19.18 GB  256     100.0%            852ce236-a430-400a-92a6-daeed99acf68  rack1
    UN  10.176.42.104  19.12 GB  256     100.0%            84479219-b64d-442c-a807-a832db9aae18  rack1
    UN  10.176.42.234  19.4 GB   256     100.0%            507b377c-5bfc-4667-b251-a9b7c453ed22  rack1

    The Load value should not differ significantly between the nodes and Status should be UN on all nodes.

    Dynatrace version 1.256+

    In this step, you'll rebuild Cassandra and verify the progress by checking the status. Depending on the size of your Cassandra database, this can take several hours.

    Rebuild data

    To rebuild Cassandra data in the DC-2 data center, execute the following cluster API call only on the seed node:

    plaintext
    curl -ikS -X POST https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/cassandra/rebuild?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

    If successful, the status code will be 200. If the status code is not 200 and the response does not suggest next steps, contact Dynatrace ONE.

    Check the rebuild data status

    To check the rebuild data status, execute the following cluster API call only on the seed node:

    plaintext
    curl -ikS -X GET https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/cassandra/rebuild?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

    If the status code is not 200, try again after approximately 15 minutes. Remember that the rebuild process can be time-consuming.

    If the response has an error flag set to true, contact Dynatrace ONE.
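
The "Status should be UN on all nodes" check from the nodetool output can also be scripted by scanning the node lines (a sketch; `all_nodes_un` is a hypothetical helper, shown against a shortened sample):

```shell
# Succeed only if every node line (status/state pair followed by an address)
# in the nodetool status output reports UN (Up/Normal).
all_nodes_un() {
  ! grep -E '^[UD][NLJM] ' | grep -qv '^UN '
}

sample='UN 10.176.42.54 19.18 GB 256 100.0% 852ce236 rack1
DN 10.176.42.104 19.12 GB 256 100.0% 84479219 rack1'
echo "$sample" | all_nodes_un || echo "some nodes are not UN"
```

In practice, pipe in the real output, e.g. `sudo $DT_DIR/utils/cassandra-nodetool.sh status | all_nodes_un`.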

Replicate Elasticsearch

In this step, you will replicate Elasticsearch to the DC-2 data center and verify the configuration and data replication. This step may take minutes or hours, depending on your Elasticsearch storage.

Replicate Elasticsearch to DC-2

To start replication of Elasticsearch to the DC-2 data center, execute the following cluster API call only on the seed node:

plaintext
curl -ikS -X POST https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/elasticsearch?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If successful, the status code will be 200 and the response body will contain a request ID that you need to check replication status. Set the request ID environment variable only on the seed node. The request ID is from the response in the previous API call.

shell
REQ_ID=<replication-elasticsearch-request-id>

If the status code is not 200 and the response does not suggest next steps, contact Dynatrace ONE.

Verify progress and status

To check the replication status of Elasticsearch, execute the following cluster API call only on the seed node:

plaintext
curl -ikS -X GET https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/elasticsearch/$REQ_ID?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If the status code is not 200, try again after a few minutes.

Verify data replication

To verify Elasticsearch data replication, execute the following cluster API call only on the seed node:

plaintext
curl -ikS -X GET https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/elasticsearch/indexMigrationStatus?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If the status code is not 200, try again after a few minutes.

Migrate the server

In this step, you will migrate the server, start ActiveGate, and start NGINX in the DC-2 data center.

Migrate server

Launch the Dynatrace Managed cluster in DC-2 by executing the following cluster API call only on the seed node:

plaintext
curl -ikS -X POST https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/server?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If successful, the status code will be 200 and the response body will contain a request ID that you need to check cluster readiness. Set the request ID environment variable only on the seed node. The request ID is from the response in the previous API call.

shell
REQ_ID=<replication-server-request-id>

If the status code is not 200 and the response does not suggest next steps, contact Dynatrace ONE.

Check cluster readiness

To check if the cluster is ready, execute the following cluster API call only on the seed node:

plaintext
curl -ikS -X GET https://$SEED_IP/api/v1.0/onpremise/multiDc/migration/server/$REQ_ID?Api-Token=$API_TOKEN -H "accept: application/json" -H "Content-Type: application/json"

If the status code is not 200, try again after a few minutes.

Enable the new data center

  1. Enable OneAgent traffic.
    For details, see Enable/Disable a cluster node.
  2. Enable backup in one of the data centers. Your backup is disabled after migration.
    For details, see Back up and restore a cluster.

Reconfigure DC-1

In this step, you will refresh installers in DC-1 that are used to add nodes.
Execute the following command successively on every node only in DC-1.

plaintext
sudo nohup $DT_DIR/installer/reconfigure.sh &

Reconfigure DC-2

In this step, you will refresh authorization tokens in DC-2 that are used to enable OneAgent connectivity to the new data center.
Execute the following command successively on every node only in DC-2.

plaintext
sudo nohup $DT_DIR/installer/reconfigure.sh --fix-ag-token --only ag &