Azure Managed Instance for Apache Cassandra Monitoring

This Prometheus Extension 2.0 lets you monitor and analyze the activity of your Apache Cassandra clusters from both a data and an infrastructure perspective. It visualizes your cluster's health and shows metrics such as CPU, connectivity, request latency, suspension, and garbage collection time. Additionally, with Davis, it automatically detects performance problems and provides precise root cause analysis.

Prerequisites

  • Azure Managed Instance for Apache Cassandra created and running.
  • An Ubuntu virtual machine deployed inside the Azure Virtual Network where the managed instance is present.
  • Prometheus server set up to scrape the Cassandra nodes, with the relabel configuration in place.
  • Environment ActiveGate version 1.231+ with access to the Prometheus server.

Setup

  1. Create an Ubuntu virtual machine in the same virtual network as your Azure Managed Instance for Apache Cassandra.
  2. Ensure Docker is installed on your virtual machine.
  3. Create a file named prometheus.yml on your virtual machine with the contents below.
    Add every Cassandra Node IP address and port 9443 in the static_configs section. The IP addresses can be gathered from the Data Center section of the Azure Portal for your Cassandra Cluster.
    yaml
    static_configs:
      - targets: ["<Node_IP_1>:9443", "<Node_IP_2>:9443", "<Node_IP_N>:9443"]
prometheus.yml
yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: []
      scheme: http
      timeout: 10s
scrape_configs:
  - job_name: prometheus
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - localhost:9090
  - job_name: "mcac"
    scrape_interval: 15s
    scrape_timeout: 15s
    static_configs:
      - targets: ["<Node_IP_1>:9443", "<Node_IP_2>:9443", "<Node_IP_N>:9443"]
    honor_labels: true
    honor_timestamps: false
    scheme: https
    tls_config:
      insecure_skip_verify: true
    metric_relabel_configs:
      #drop metrics we can calculate from prometheus directly
      - source_labels: [__name__]
        regex: .*rate_(mean|1m|5m|15m)
        action: drop
      #save the original name for all metrics
      - source_labels: [__name__]
        regex: (collectd_mcac_.+)
        target_label: prom_name
        replacement: ${1}
      - source_labels: ["prom_name"]
        regex: .+_bucket_(\d+)
        target_label: le
        replacement: ${1}
      - source_labels: ["prom_name"]
        regex: .+_bucket_inf
        target_label: le
        replacement: +Inf
      - source_labels: ["prom_name"]
        regex: .*_histogram_p(\d+)
        target_label: quantile
        replacement: .${1}
      - source_labels: ["prom_name"]
        regex: .*_histogram_min
        target_label: quantile
        replacement: "0"
      - source_labels: ["prom_name"]
        regex: .*_histogram_max
        target_label: quantile
        replacement: "1"
      #Table Metrics *ALL* we can drop
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.table\.(\w+)
        action: drop
      #Table Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
        target_label: table
        replacement: ${3}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
        target_label: keyspace
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
        target_label: __name__
        replacement: mcac_table_${1}
      #Keyspace Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
        target_label: keyspace
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
        target_label: __name__
        replacement: mcac_keyspace_${1}
      #ThreadPool Metrics (one type is repair.task so we just ignore the second part)
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
        target_label: pool_type
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
        target_label: pool_name
        replacement: ${3}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
        target_label: __name__
        replacement: mcac_thread_pools_${1}
      #ClientRequest Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)$
        target_label: request_type
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)$
        target_label: __name__
        replacement: mcac_client_request_${1}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
        target_label: cl
        replacement: ${3}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
        target_label: request_type
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
        target_label: __name__
        replacement: mcac_client_request_${1}_cl
      #Cache Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.cache\.(\w+)\.(\w+)
        target_label: cache_name
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.cache\.(\w+)\.(\w+)
        target_label: __name__
        replacement: mcac_cache_${1}
      #CQL Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.cql\.(\w+)
        target_label: __name__
        replacement: mcac_cql_${1}
      #Dropped Message Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.dropped_message\.(\w+)\.(\w+)
        target_label: message_type
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.dropped_message\.(\w+)\.(\w+)
        target_label: __name__
        replacement: mcac_dropped_message_${1}
      #Streaming Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)\.(.+)$
        target_label: peer_ip
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)\.(.+)$
        target_label: __name__
        replacement: mcac_streaming_${1}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)$
        target_label: __name__
        replacement: mcac_streaming_${1}
      #CommitLog Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.commit_log\.(\w+)
        target_label: __name__
        replacement: mcac_commit_log_${1}
      #Compaction Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.compaction\.(\w+)
        target_label: __name__
        replacement: mcac_compaction_${1}
      #Storage Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.storage\.(\w+)
        target_label: __name__
        replacement: mcac_storage_${1}
      #Batch Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.batch\.(\w+)
        target_label: __name__
        replacement: mcac_batch_${1}
      #Client Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.client\.(\w+)
        target_label: __name__
        replacement: mcac_client_${1}
      #BufferPool Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.buffer_pool\.(\w+)
        target_label: __name__
        replacement: mcac_buffer_pool_${1}
      #Index Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.index\.(\w+)
        target_label: __name__
        replacement: mcac_sstable_index_${1}
      #HintService Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.hinted_hand_off_manager\.([^\-]+)-(\w+)
        target_label: peer_ip
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.hinted_hand_off_manager\.([^\-]+)-(\w+)
        target_label: __name__
        replacement: mcac_hints_${1}
      #HintService Metrics
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
        target_label: peer_ip
        replacement: ${1}
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
        target_label: __name__
        replacement: mcac_hints_hints_delays
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.hints_service\.([^\-]+)
        target_label: __name__
        replacement: mcac_hints_${1}
      # Misc
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.memtable_pool\.(\w+)
        target_label: __name__
        replacement: mcac_memtable_pool_${1}
      - source_labels: ["mcac"]
        regex: com\.datastax\.bdp\.type\.performance_objects\.name\.cql_slow_log\.metrics\.queries_latency
        target_label: __name__
        replacement: mcac_cql_slow_log_query_latency
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.read_coordination\.(.*)
        target_label: read_type
        replacement: $1
      - source_labels: ["mcac"]
        regex: org\.apache\.cassandra\.metrics\.read_coordination\.(.*)
        target_label: __name__
        replacement: mcac_read_coordination_requests
      #GC Metrics
      - source_labels: ["mcac"]
        regex: jvm\.gc\.(\w+)\.(\w+)
        target_label: collector_type
        replacement: ${1}
      - source_labels: ["mcac"]
        regex: jvm\.gc\.(\w+)\.(\w+)
        target_label: __name__
        replacement: mcac_jvm_gc_${2}
      #JVM Metrics
      - source_labels: ["mcac"]
        regex: jvm\.memory\.(\w+)\.(\w+)
        target_label: memory_type
        replacement: ${1}
      - source_labels: ["mcac"]
        regex: jvm\.memory\.(\w+)\.(\w+)
        target_label: __name__
        replacement: mcac_jvm_memory_${2}
      - source_labels: ["mcac"]
        regex: jvm\.memory\.pools\.(\w+)\.(\w+)
        target_label: pool_name
        replacement: ${2}
      - source_labels: ["mcac"]
        regex: jvm\.memory\.pools\.(\w+)\.(\w+)
        target_label: __name__
        replacement: mcac_jvm_memory_pool_${2}
      - source_labels: ["mcac"]
        regex: jvm\.fd\.usage
        target_label: __name__
        replacement: mcac_jvm_fd_usage
      - source_labels: ["mcac"]
        regex: jvm\.buffers\.(\w+)\.(\w+)
        target_label: buffer_type
        replacement: ${1}
      - source_labels: ["mcac"]
        regex: jvm\.buffers\.(\w+)\.(\w+)
        target_label: __name__
        replacement: mcac_jvm_buffer_${2}
      #Append the prom types back to formatted names
      - source_labels: [__name__, "prom_name"]
        regex: (mcac_.*);.*(_micros_bucket|_bucket|_micros_count_total|_count_total|_total|_micros_sum|_sum|_stddev).*
        separator: ;
        target_label: __name__
        replacement: ${1}${2}
      - regex: prom_name
        action: labeldrop
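To make the metric_relabel_configs rules in prometheus.yml easier to follow, here is a minimal Python sketch of what the three "Table Metrics" rules do to a single series. It assumes the default Prometheus replace action with fully anchored regular expressions. The sample mcac label value, the keyspace and table names, and the helper apply_replace_rules are hypothetical and are not part of Prometheus or Dynatrace.

```python
import re

# Regex copied from the "Table Metrics" rules in prometheus.yml.
TABLE_REGEX = r"org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)"

# (source label, regex, target label, replacement) in Prometheus syntax.
TABLE_RULES = [
    ("mcac", TABLE_REGEX, "table", "${3}"),
    ("mcac", TABLE_REGEX, "keyspace", "${2}"),
    ("mcac", TABLE_REGEX, "__name__", "mcac_table_${1}"),
]


def apply_replace_rules(labels, rules):
    """Apply replace-style relabel rules to one label set (illustrative only)."""
    result = dict(labels)
    for source, pattern, target, template in rules:
        # Prometheus anchors relabel regexes, so the whole value must match.
        match = re.fullmatch(pattern, result.get(source, ""))
        if match is None:
            continue  # a rule that does not match leaves the labels untouched
        # Expand ${N} references with the captured groups.
        result[target] = re.sub(
            r"\$\{(\d+)\}",
            lambda ref: match.group(int(ref.group(1))),
            template,
        )
    return result


# Hypothetical series as it might arrive from a Cassandra node.
sample = {
    "__name__": "collectd_mcac_gauge",
    "mcac": "org.apache.cassandra.metrics.table.live_ss_table_count.my_keyspace.my_table",
}
relabeled = apply_replace_rules(sample, TABLE_RULES)
print(relabeled["__name__"], relabeled["keyspace"], relabeled["table"])
```

In this sketch the long dotted mcac value is split into a short metric name plus keyspace and table labels, which is the shape the mcac.* queries later in this page rely on.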
  4. Start your Prometheus server Docker container.
    Important: Change the path in the command below to point to the prometheus.yml file you created in the previous step.

    plaintext
    docker run \
      -d \
      -p 9090:9090 \
      -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
      prom/prometheus
  5. If your virtual machine is not reachable from the internet, install a Dynatrace Environment ActiveGate on your Ubuntu VM.
    Recommended: Set the group property on the installation.
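
Step 3 asks for every Cassandra node IP address with port 9443 in the static_configs section. For clusters with many nodes, the targets line can be generated rather than typed by hand. The following is a minimal sketch in Python; the IP addresses are placeholders, and build_targets_line is a hypothetical helper, not a Prometheus or Azure tool.

```python
import json

CASSANDRA_METRICS_PORT = 9443  # port named in step 3


def build_targets_line(node_ips, port=CASSANDRA_METRICS_PORT):
    """Return a YAML flow-style targets entry for the static_configs section."""
    targets = [f"{ip}:{port}" for ip in node_ips]
    # A JSON list of strings is also a valid YAML flow sequence.
    return "- targets: " + json.dumps(targets)


# Placeholder addresses; replace them with the IPs shown in the Azure Portal.
line = build_targets_line(["10.0.0.4", "10.0.0.5", "10.0.0.6"])
print(line)
# - targets: ["10.0.0.4:9443", "10.0.0.5:9443", "10.0.0.6:9443"]
```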

Enable and configure extension

  1. In the Dynatrace menu, go to Dynatrace Hub.

  2. Search for Azure Managed Instance for Apache Cassandra and enable the extension.

  3. Verify that the Prometheus endpoint publishes the Cassandra metrics. Use either of these queries:

    {__name__=~"mcac.*"}

    http://<Prometheus Server URL>:9090/api/v1/query?query=%7B__name__%3D%7E%22mcac.*%22%7D

  4. Add the endpoint of your Prometheus server to the Extension Monitoring Configuration:

    http://<Prometheus Server URL>:9090/api/v1

    Note: The <Prometheus Server URL> does not need to be public. If you install your ActiveGate on the same VM or same VNet as the Prometheus server, localhost or a private IP can be used.

  5. Select the ActiveGate group on which to enable this extension.

  6. Add a Monitoring Configuration description and select the Feature Sets of the metrics you'd like to collect.

  7. A dashboard named Azure Managed Instance for Apache Cassandra Overview is provided with the extension.
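
The verification in step 3 can also be scripted. The sketch below, in Python, builds the query URL from the selector and checks a response body for at least one returned series. It assumes the standard Prometheus HTTP API response shape (a status field and data.result list); the base URL and the sample payload are hypothetical, and the functions are illustrative helpers only.

```python
from urllib.parse import parse_qs, unquote, urlencode, urlsplit

SELECTOR = '{__name__=~"mcac.*"}'  # selector from step 3


def build_query_url(prometheus_base, selector=SELECTOR):
    """Return the instant-query URL for the given Prometheus base URL."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': selector})}"


def has_series(payload):
    """Return True if a decoded query response contains at least one series."""
    return payload.get("status") == "success" and bool(payload.get("data", {}).get("result"))


url = build_query_url("http://localhost:9090")  # hypothetical base URL
print(url)

# Hypothetical response body, shaped like a Prometheus instant-query result.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"__name__": "mcac_storage_load"}, "value": [0, "1"]}],
    },
}
print(has_series(sample_payload))
```

The generated URL may percent-encode some characters differently from the one shown in step 3 (for example ~ and *), but both decode to the same selector.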

Metrics

Available metrics are listed below.

Note:

  • Metric metadata and dimensions are available using the Data explorer after the extension is enabled.
  • See Apache Cassandra Monitoring Documentation for more information about collected metrics.
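
Besides the Data explorer, metric values can presumably be read programmatically. The following Python sketch only builds a request URL for the Dynatrace Metrics API v2 query endpoint using one of the metric keys listed below. The environment URL is a placeholder, build_metrics_query_url is a hypothetical helper, and the sketch assumes you would send the request with an API token that has permission to read metrics.

```python
from urllib.parse import urlencode


def build_metrics_query_url(environment_url, metric_key, time_from="now-1h"):
    """Return a Metrics API v2 query URL for one metric key (illustrative only)."""
    params = urlencode({"metricSelector": metric_key, "from": time_from})
    return f"{environment_url.rstrip('/')}/api/v2/metrics/query?{params}"


# Placeholder environment URL; the metric key is taken from the tables below.
query_url = build_metrics_query_url(
    "https://example.live.dynatrace.com",
    "com.dynatrace.extension.prometheus.azure_cassandra_storage_load",
)
print(query_url)
```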

Cluster node metrics

Metric Name | Metric Key | Description
Storage Load | com.dynatrace.extension.prometheus.azure_cassandra_storage_load | Size, in bytes, of the on-disk data size this node manages.
Storage Exceptions | com.dynatrace.extension.prometheus.azure_cassandra_storage_exceptions.count | Number of internal exceptions caught. In normal operation, this should be zero.
Commit Log Pending Tasks | com.dynatrace.extension.prometheus.azure_cassandra_commit_log_pending_tasks | Number of commit log messages written but yet to be fsync'd.
Commit Log Completed Tasks Total | com.dynatrace.extension.prometheus.azure_cassandra_commit_log_completed_tasks_total.count | Total number of commit log messages written since start/restart.
Buffer Pool Size | com.dynatrace.extension.prometheus.azure_cassandra_buffer_pool_size | Size, in bytes, of the managed buffer pool.
Buffer Pool Misses Total | com.dynatrace.extension.prometheus.azure_cassandra_buffer_pool_misses_total.count | The number of misses in the pool. The higher this is, the more allocations incurred.
Client Connected Native Clients | com.dynatrace.extension.prometheus.azure_cassandra_client_connected_native_clients | Number of clients connected to this node's native protocol server.
Client Auth Failure Total | com.dynatrace.extension.prometheus.azure_cassandra_client_auth_failure_total.count | Number of clients who experience authentication failures.
Client Auth Success Total | com.dynatrace.extension.prometheus.azure_cassandra_client_auth_success_total.count | Number of clients who successfully authenticate.
Storage Total Hints Total | com.dynatrace.extension.prometheus.azure_cassandra_storage_total_hints_total.count | Number of hint messages written to this node since start/restart. Includes one entry for each host to be hinted per hint.
CQL Prepared Statements Executed Total | com.dynatrace.extension.prometheus.azure_cassandra_cql_prepared_statements_executed_total.count | Number of prepared statements executed.
CQL Regular Statements Executed Total | com.dynatrace.extension.prometheus.azure_cassandra_cql_regular_statements_executed_total.count | Number of non-prepared statements executed.
Dropped Messages Total | com.dynatrace.extension.prometheus.azure_cassandra_dropped_messages_total.count | Number of dropped messages.
JVM GC Count | com.dynatrace.extension.prometheus.azure_cassandra_jvm_gc_count.count | Total number of collections that have occurred.
JVM GC Time | com.dynatrace.extension.prometheus.azure_cassandra_jvm_gc_time.count | Approximate accumulated collection elapsed time in milliseconds.
JVM Memory Used | com.dynatrace.extension.prometheus.azure_cassandra_jvm_memory_used | Amount of used memory in bytes.
JVM Memory Usage | func:com.dynatrace.extension.prometheus.azure_cassandra_jvm_memory_usage | Ratio of used memory to maximum memory.
Thread Pools Active Tasks | com.dynatrace.extension.prometheus.azure_cassandra_thread_pools_active_tasks | Number of tasks being actively worked on by this pool.
Thread Pools Total Blocked Tasks Total | com.dynatrace.extension.prometheus.azure_cassandra_thread_pools_total_blocked_tasks_total.count | Number of tasks that were blocked due to queue saturation.
Thread Pools Completed Tasks | com.dynatrace.extension.prometheus.azure_cassandra_thread_pools_completed_tasks | Number of tasks completed.
Client Request Latency Total | com.dynatrace.extension.prometheus.azure_cassandra_client_request_latency_total.count | Latency of client requests.
Client Request Failures Total | com.dynatrace.extension.prometheus.azure_cassandra_client_request_failures_total.count | Number of transaction failures encountered.
Client Request Unavailables Total | com.dynatrace.extension.prometheus.azure_cassandra_client_request_unavailables_total.count | Number of unavailable exceptions encountered.
Cache Hit Rate | func:com.dynatrace.extension.prometheus.azure_cassandra_cache_hit_rate | All-time cache hit rate.
Cache Capacity | com.dynatrace.extension.prometheus.azure_cassandra_cache_capacity | Cache capacity in bytes.
Cache Misses Total | com.dynatrace.extension.prometheus.azure_cassandra_cache_misses_total.count | Total number of cache misses.
Cache Size | com.dynatrace.extension.prometheus.azure_cassandra_cache_size | Total size of occupied cache, in bytes.

Keyspace metrics

Metric Name | Metric Key | Description
Keyspace All Memtables Live Data Size | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_all_memtables_live_data_size | Total amount of live data stored in the memtables (2i and pending flush memtables included) that resides off-heap, excluding any data structure overhead.
Keyspace Bloom Filter Disk Space Used | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_bloom_filter_disk_space_used | Disk space used by bloom filter (in bytes).
Keyspace Live Disk Space Used | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_live_disk_space_used | Disk space used by SSTables belonging to this table (in bytes).
Keyspace Memtable Columns Count | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_memtable_columns_count.gauge | Total number of columns present in the memtable.
Keyspace Memtable Live Data Size | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_memtable_live_data_size | Total amount of live data stored in the memtable, excluding any data structure overhead.
Keyspace Memtable Switch Count | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_memtable_switch_count.gauge | Number of times that flush has resulted in the memtable being switched out.
Keyspace Pending Compaction | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_pending_compaction | Estimated number of compactions remaining to perform.
Keyspace Pending Flushes | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_pending_flushes | Estimated number of flush tasks pending for this table.
Keyspace Read Total Latency Total | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_read_total_latency_total.count | Read latency.
Keyspace Total Disk Space Used | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_total_disk_space_used | Total disk space used by SSTables belonging to this table, including obsolete ones waiting for GC.
Keyspace Write Total Latency Total | com.dynatrace.extension.prometheus.azure_cassandra_keyspace_write_total_latency_total.count | Write Latency.

Table metrics

Metric Name | Metric Key | Description
Table Bloom Filter Disk Space Used | com.dynatrace.extension.prometheus.azure_cassandra_table_bloom_filter_disk_space_used | Disk space used by bloom filter (in bytes).
Table Bloom Filter False Positives | com.dynatrace.extension.prometheus.azure_cassandra_table_bloom_filter_false_positives | Number of false positives on table's bloom filter.
Table Bloom Filter False Ratio | func:com.dynatrace.extension.prometheus.azure_cassandra_table_bloom_filter_false_ratio | False positive ratio of table's bloom filter.
Table Bytes Flushed Total | com.dynatrace.extension.prometheus.azure_cassandra_table_bytes_flushed_total.count | Total number of bytes flushed since server start/restart.
Table Compaction Bytes Written Total | com.dynatrace.extension.prometheus.azure_cassandra_table_compaction_bytes_written_total.count | Total number of bytes compacted since server start/restart.
Table Compression Ratio | func:com.dynatrace.extension.prometheus.azure_cassandra_table_compression_ratio | Current compression ratio for all SSTables.
Table Dropped Mutations Total | com.dynatrace.extension.prometheus.azure_cassandra_table_dropped_mutations_total.count | Number of dropped mutations on this table.
Table Estimated Partition Count | com.dynatrace.extension.prometheus.azure_cassandra_table_estimated_partition_count.gauge | Approximate number of keys in table.
Table Key Cache Hit Rate | func:com.dynatrace.extension.prometheus.azure_cassandra_table_key_cache_hit_rate | Key cache hit rate for this table.
Table Live Disk Space Used Total | com.dynatrace.extension.prometheus.azure_cassandra_table_live_disk_space_used_total | Disk space used by SSTables belonging to this table (in bytes).
Table Live SSTable Count | com.dynatrace.extension.prometheus.azure_cassandra_table_live_ss_table_count.gauge | Number of SSTables on disk for this table.
Table Memtable Columns Count | com.dynatrace.extension.prometheus.azure_cassandra_table_memtable_columns_count.gauge | Total number of columns present in the memtable.
Table Memtable Live Data Size | com.dynatrace.extension.prometheus.azure_cassandra_table_memtable_live_data_size | Total amount of live data stored in the memtable, excluding any data structure overhead.
Table Memtable Switch Count Total | com.dynatrace.extension.prometheus.azure_cassandra_table_memtable_switch_count_total.count | Number of times that flush has resulted in the memtable being switched out.
Table Pending Compactions | com.dynatrace.extension.prometheus.azure_cassandra_table_pending_compactions | Estimated number of pending compactions for this table.
Table Pending Flushes Total | com.dynatrace.extension.prometheus.azure_cassandra_table_pending_flushes_total.count | Estimated number of pending flushes for this table.
Table Read Total Latency Total | com.dynatrace.extension.prometheus.azure_cassandra_table_read_total_latency_total.count | Read latency for this table.
Table Row Cache Hit Total | com.dynatrace.extension.prometheus.azure_cassandra_table_row_cache_hit_total.count | Number of table row cache hits.
Table Row Cache Miss Total | com.dynatrace.extension.prometheus.azure_cassandra_table_row_cache_miss_total.count | Number of table row cache misses.
Table Total Disk Space Used Total | com.dynatrace.extension.prometheus.azure_cassandra_table_total_disk_space_used_total | Total disk space used by SSTables belonging to this table, including obsolete ones waiting for GC.
Table Write Total Latency Total | com.dynatrace.extension.prometheus.azure_cassandra_table_write_total_latency_total.count | Write latency for this table.