Google AI Platform monitoring
The Dynatrace GCP integration uses data collected from the Google Operations API to continuously monitor the health and performance of Google Cloud Platform services. It combines all relevant data into dashboards and also enables alerting and event tracking.
Prerequisites
Add services and feature sets (optional)
After integration, Dynatrace automatically monitors a number of preset GCP services and feature sets (metrics). In addition to these presets, you can add more services or feature sets to monitoring. For details, see Add or remove services.
For a list of feature sets available for this service, see Metric table.
View metrics
After deploying the integration, you can see metrics from monitored services in the Metrics browser, the Data explorer, and your dashboard tiles.
Metric table
The following feature sets are available for Google AI Platform.
Feature set | Name | Unit | GCP metric identifier |
---|---|---|---|
cloudml_job/default_metrics | Accelerator memory utilization | Percent | ml.googleapis.com/training/accelerator/memory/utilization |
cloudml_job/default_metrics | Accelerator utilization | Percent | ml.googleapis.com/training/accelerator/utilization |
cloudml_job/default_metrics | CPU utilization | Percent | ml.googleapis.com/training/cpu/utilization |
cloudml_job/default_metrics | Memory utilization | Percent | ml.googleapis.com/training/memory/utilization |
cloudml_job/default_metrics | Network bytes received | Byte | ml.googleapis.com/training/network/received_bytes_count |
cloudml_job/default_metrics | Network bytes sent | Byte | ml.googleapis.com/training/network/sent_bytes_count |
cloudml_model_version/default_metrics | Error count | Count | ml.googleapis.com/prediction/error_count |
cloudml_model_version/default_metrics | Latency | MicroSecond | ml.googleapis.com/prediction/latencies |
cloudml_model_version/default_metrics | Accelerator duty cycle | Percent | ml.googleapis.com/prediction/online/accelerator/duty_cycle |
cloudml_model_version/default_metrics | Accelerator memory usage | Byte | ml.googleapis.com/prediction/online/accelerator/memory/bytes_used |
cloudml_model_version/default_metrics | CPU usage | Percent | ml.googleapis.com/prediction/online/cpu/utilization |
cloudml_model_version/default_metrics | Memory usage | Byte | ml.googleapis.com/prediction/online/memory/bytes_used |
cloudml_model_version/default_metrics | Network bytes received | Byte | ml.googleapis.com/prediction/online/network/bytes_received |
cloudml_model_version/default_metrics | Network bytes sent | Byte | ml.googleapis.com/prediction/online/network/bytes_sent |
cloudml_model_version/default_metrics | Replica count | Count | ml.googleapis.com/prediction/online/replicas |
cloudml_model_version/default_metrics | Replica target | Count | ml.googleapis.com/prediction/online/target_replicas |
cloudml_model_version/default_metrics | Prediction count | Count | ml.googleapis.com/prediction/prediction_count |
cloudml_model_version/default_metrics | Response count | Count | ml.googleapis.com/prediction/response_count |
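
The table can also be read programmatically. The following is a minimal sketch, not part of the Dynatrace integration: it copies a few rows from the table, splits each feature set name into its service and feature set parts, and builds a Google Cloud Monitoring filter string for a metric identifier. The helper names (`split_feature_set`, `metrics_by_service`, `monitoring_filter`) are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# A few rows copied from the metric table:
# (feature set, name, unit, GCP metric identifier)
ROWS = [
    ("cloudml_job/default_metrics", "CPU utilization", "Percent",
     "ml.googleapis.com/training/cpu/utilization"),
    ("cloudml_job/default_metrics", "Memory utilization", "Percent",
     "ml.googleapis.com/training/memory/utilization"),
    ("cloudml_model_version/default_metrics", "Error count", "Count",
     "ml.googleapis.com/prediction/error_count"),
    ("cloudml_model_version/default_metrics", "Latency", "MicroSecond",
     "ml.googleapis.com/prediction/latencies"),
]


def split_feature_set(feature_set):
    """Split 'service/feature_set' into its two parts (hypothetical helper)."""
    service, _, name = feature_set.partition("/")
    return service, name


def metrics_by_service(rows):
    """Group GCP metric identifiers by the service part of the feature set."""
    grouped = defaultdict(list)
    for feature_set, _name, _unit, metric_id in rows:
        service, _ = split_feature_set(feature_set)
        grouped[service].append(metric_id)
    return dict(grouped)


def monitoring_filter(metric_id):
    """Build a Cloud Monitoring filter that selects a single metric type."""
    return f'metric.type = "{metric_id}"'


if __name__ == "__main__":
    for service, metric_ids in metrics_by_service(ROWS).items():
        print(service)
        for metric_id in metric_ids:
            print("  " + monitoring_filter(metric_id))
```

A filter string of this form can be used in Google Cloud Monitoring to check that a metric is being reported on the GCP side before looking for it in Dynatrace.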