Troubleshoot issues on Kubernetes/OpenShift
Find out how to troubleshoot issues you might encounter in the following situations.
Debug logs
By default, OneAgent logs are located in /var/log/dynatrace/oneagent.
To debug Dynatrace Operator issues, check the Operator logs with the command for your platform.
Kubernetes:
kubectl -n dynatrace logs -f deployment/dynatrace-operator
OpenShift:
oc -n dynatrace logs -f deployment/dynatrace-operator
You might also want to check the logs from OneAgent pods deployed through Dynatrace Operator.
Kubernetes:
kubectl get pods -n dynatrace
NAME READY STATUS RESTARTS AGE
dynatrace-operator-64865586d4-nk5ng 1/1 Running 0 1d
dynakube-oneagent-<id> 1/1 Running 0 22h
kubectl logs dynakube-oneagent-<id> -n dynatrace
OpenShift:
oc get pods -n dynatrace
NAME READY STATUS RESTARTS AGE
dynatrace-operator-64865586d4-nk5ng 1/1 Running 0 1d
dynakube-classic-8r2kq 1/1 Running 0 22h
oc logs dynakube-classic-8r2kq -n dynatrace
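Because kubectl logs and oc logs need the exact pod name, it can help to pick the OneAgent pod names out of the get pods listing. The following is a minimal sketch that assumes OneAgent pod names contain oneagent or classic, as in the example listings; the function name oneagent_pods is hypothetical, and the naming pattern may differ in your deployment.

```shell
# Print the names of OneAgent pods from `kubectl get pods` (or `oc get pods`) output on stdin.
# Assumption: OneAgent pod names contain "oneagent" or "classic", as in the example listings.
oneagent_pods() {
  # Skip the header row (NR > 1) and match on the NAME column only.
  awk 'NR > 1 && $1 ~ /(oneagent|classic)/ { print $1 }'
}

# Example usage against a live cluster (not run here):
#   kubectl get pods -n dynatrace | oneagent_pods | while read -r pod; do
#     kubectl logs "$pod" -n dynatrace
#   done
```

Matching on the name keeps the helper independent of labels, which this article does not document; if your pods follow a different naming scheme, adjust the pattern.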
Issues occurring when setting up monitoring
Dynatrace Operator
Application-only monitoring
If the pods go into a crash loop when you install OneAgent, increase the CPU and memory resources (requests and limits) of the pods.
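Before raising resources, it can help to confirm which pods are crash-looping and whether a container was killed for exceeding its memory limit. The sketch below filters kubectl get pods output for the CrashLoopBackOff status; the function name crashlooping_pods is hypothetical, and the resource values in the usage comment are illustrative placeholders, not Dynatrace recommendations.

```shell
# Print the names of pods whose STATUS column is CrashLoopBackOff,
# reading `kubectl get pods` output on stdin.
crashlooping_pods() {
  # Column 3 is STATUS in the default layout (without -A, which prepends a NAMESPACE column).
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Example usage against a live cluster (not run here); <...> values are placeholders:
#   kubectl get pods -n <namespace> | crashlooping_pods
#   kubectl describe pod <pod-name> -n <namespace>   # "Reason: OOMKilled" points to the memory limit
#   kubectl set resources deployment <deployment-name> -n <namespace> \
#     --requests=cpu=200m,memory=256Mi --limits=cpu=500m,memory=512Mi
```

Changing the resources on the owning Deployment (rather than on individual pods) makes the new values survive pod restarts.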
DaemonSet
Error when applying the custom resource on GKE
Example error:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.dynatrace.com": Post "https://dynatrace-webhook.dynatrace.svc:443/validate?timeout=2s": context deadline exceeded
If you get this error when applying the custom resource on your GKE cluster, the firewall is blocking requests from the Kubernetes API to the Dynatrace webhook, because the required port (8443) is blocked by default.
The default allowed ports (443 and 10250) on GCP refer to the ports exposed by your nodes and pods, not the ports exposed by any Kubernetes services. For example, if the cluster control plane attempts to access a service on port 443 such as the Dynatrace webhook, but the service is implemented by a pod using port 8443, this is blocked by the firewall.
To fix this, add a firewall rule to explicitly allow ingress to port 8443.
For more information about this issue, see "API request that triggers admission webhook timing out" in the GKE troubleshooting documentation.
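On GKE, such a rule can be created with gcloud compute firewall-rules create. The sketch below assumes you already know the cluster's VPC network, the control plane's IPv4 range, and the network tag of your nodes; all three default values shown are hypothetical placeholders, as is the rule name allow-dynatrace-webhook-8443. Wrapping the call in a function keeps it easy to review before running.

```shell
# Hypothetical placeholder values: replace with your cluster's settings.
NETWORK="${NETWORK:-default}"                              # VPC network of the cluster
CONTROL_PLANE_CIDR="${CONTROL_PLANE_CIDR:-172.16.0.0/28}"  # control plane IPv4 range
NODE_TAG="${NODE_TAG:-gke-my-cluster-node}"                # network tag on the cluster nodes

# Allow the GKE control plane to reach the Dynatrace webhook pods on TCP 8443.
allow_dynatrace_webhook() {
  gcloud compute firewall-rules create allow-dynatrace-webhook-8443 \
    --network="$NETWORK" \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:8443 \
    --source-ranges="$CONTROL_PLANE_CIDR" \
    --target-tags="$NODE_TAG"
}

# Run it (requires an authenticated gcloud CLI with permission to create firewall rules):
#   allow_dynatrace_webhook
```

Limiting the source range to the control plane and the target to the node tag opens port 8443 only for the path the webhook call actually takes, rather than for all ingress traffic.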
CannotPullContainerError
If you get errors like the following on your pods when installing Dynatrace OneAgent, your Docker Hub download rate limit has been exceeded.
CannotPullContainerError: inspect image has been retried [X] time(s): httpReaderSeeker: failed open: unexpected status code
For details, consult the Docker documentation.
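Docker documents a way to check the current pull allowance: request a token for the ratelimitpreview/test repository and send a HEAD request to its manifest, whose response carries ratelimit-limit and ratelimit-remaining headers. The sketch below only parses those headers; the function name pulls_remaining is hypothetical, and the endpoint details in the usage comment should be checked against the current Docker documentation. Anonymous limits are tracked per source IP address, so the check is most meaningful when run from a cluster node.

```shell
# Print the remaining pull count from Docker Hub response headers read on stdin.
# Expects a header line such as "ratelimit-remaining: 76;w=21600".
pulls_remaining() {
  # Split on ":" and ";" so field 2 holds the count, then strip spaces.
  awk -F'[:;]' 'tolower($1) == "ratelimit-remaining" { gsub(/ /, "", $2); print $2 }'
}

# Example usage (needs curl and jq; sends an anonymous request from the current machine):
#   TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
#   curl -s --head -H "Authorization: Bearer $TOKEN" \
#     https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest | pulls_remaining
```

If the header is absent from the response, the function prints nothing.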
Connectivity issues between Dynatrace and your cluster
Configuration and monitoring issues
The Kubernetes Monitoring Statistics extension can help you:
- Troubleshoot your Kubernetes monitoring setup.
- Troubleshoot your Prometheus integration setup.
- Get detailed insights into queries from Dynatrace to the Kubernetes API.
- Receive alerts when your Kubernetes monitoring setup experiences issues.
- Get alerted on slow response times of your Kubernetes API.
Potential issues when changing the monitoring mode
- Changing the monitoring mode from classicFullStack to cloudNativeFullStack affects the host ID calculations for monitored hosts, leading to new IDs being assigned and no connection between old and new entities.
- If you change the monitoring mode from applicationMonitoring or cloudNativeFullStack to classicFullStack or hostMonitoring, you need to restart all the pods that were previously instrumented with applicationMonitoring or cloudNativeFullStack.
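One way to restart previously instrumented workloads is kubectl rollout restart, which recreates the pods of a Deployment, StatefulSet, or DaemonSet. The sketch below restarts every such workload in the namespaces you pass in; the function name restart_instrumented_workloads is hypothetical, using oc through the KUBECTL variable is an assumption about your OpenShift CLI version, and which namespaces were instrumented is something you need to determine for your setup. Pods not managed by one of these controllers are not covered and have to be recreated separately.

```shell
# Restart all Deployments, StatefulSets, and DaemonSets in each namespace given as an argument,
# so their pods are recreated without the previous instrumentation.
# KUBECTL can point to `oc` on OpenShift; it defaults to kubectl.
restart_instrumented_workloads() {
  kubectl_cmd="${KUBECTL:-kubectl}"
  for ns in "$@"; do
    for kind in deployment statefulset daemonset; do
      # With a resource type but no name, rollout restart targets every resource of that type.
      "$kubectl_cmd" rollout restart "$kind" -n "$ns"
    done
  done
}

# Example usage (namespace names are placeholders):
#   restart_instrumented_workloads shop-frontend shop-backend
```

A rolling restart replaces pods gradually according to each workload's update strategy, so it is usually less disruptive than deleting pods by hand.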