Troubleshoot issues on Kubernetes/OpenShift

Find out how to troubleshoot issues you might encounter in the following situations.

Debug logs

By default, OneAgent logs are located in /var/log/dynatrace/oneagent.
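
If you need to inspect them directly, you can list the log directory on a monitored node. A minimal sketch (the exact path can vary by deployment mode):

bash
# Run on the monitored node itself
ls -l /var/log/dynatrace/oneagent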

To debug Dynatrace Operator issues, run one of the following commands (kubectl on Kubernetes, oc on OpenShift):

bash
kubectl -n dynatrace logs -f deployment/dynatrace-operator
bash
oc -n dynatrace logs -f deployment/dynatrace-operator

You might also want to check the logs from OneAgent pods deployed through Dynatrace Operator.

bash
kubectl get pods -n dynatrace
NAME                                  READY   STATUS    RESTARTS   AGE
dynatrace-operator-64865586d4-nk5ng   1/1     Running   0          1d
dynakube-oneagent-<id>                1/1     Running   0          22h
bash
kubectl logs dynakube-oneagent-<id> -n dynatrace
bash
oc get pods -n dynatrace
NAME                                  READY   STATUS    RESTARTS   AGE
dynatrace-operator-64865586d4-nk5ng   1/1     Running   0          1d
dynakube-classic-8r2kq                1/1     Running   0          22h
bash
oc logs dynakube-classic-8r2kq -n dynatrace

Issues occurring when setting up monitoring

Dynatrace Operator

Unable to retrieve the complete list of server APIs

Example error:

unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request

If the Dynatrace Operator pod logs this error, you need to identify and fix the problematic services. To identify them:

  1. Check available resources.
bash
kubectl api-resources
  2. If the command returns the error above, list all API services and make sure there aren't any with AVAILABLE set to False (see the filtered example below).
bash
kubectl get apiservice
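
To narrow the output to failing services, you can pipe the list through a filter, for example:

bash
# Show only API services that are not available
kubectl get apiservice | grep False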

Application-only monitoring

If the pods enter a crash loop when you install OneAgent, you need to increase the pods' memory and CPU limits.
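
As a hedged sketch, assuming the crashing workload is a Deployment named my-app, you could raise its limits with kubectl set resources (names and values are placeholders):

bash
# Raise CPU and memory limits so the injected OneAgent has headroom
kubectl -n <namespace> set resources deployment/my-app --limits=cpu=1,memory=1Gi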

DaemonSet

Deployment seems successful, but the `dynatrace-oneagent` container doesn't show up as ready
bash
kubectl get ds/dynatrace-oneagent --namespace=kube-system
NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
dynatrace-oneagent   1         1         0       1            0           beta.kubernetes.io/os=linux   14m
bash
kubectl logs -f dynatrace-oneagent-abcde --namespace=kube-system
09:46:18 Started agent deployment as Docker image, PID 1234.
09:46:18 Agent installer can only be downloaded from secure location. Your installer URL should start with 'https': REPLACE_WITH_YOUR_URL

Replace the REPLACE_WITH_YOUR_URL value in the dynatrace-oneagent.yml DaemonSet with the Dynatrace OneAgent installer URL.
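
After editing the manifest, re-apply it, for example:

bash
# Re-deploy the DaemonSet after replacing REPLACE_WITH_YOUR_URL with your https installer URL
kubectl apply -f dynatrace-oneagent.yml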

Deployment seems successful, but the `dynatrace/oneagent` image can't be pulled

Example error:

bash
oc get pods
NAME                       READY   STATUS         RESTARTS   AGE
dynatrace-oneagent-abcde   0/1     ErrImagePull   0          3s
bash
oc logs -f dynatrace-oneagent-abcde
Error from server (BadRequest): container "dynatrace-oneagent" in pod "dynatrace-oneagent-abcde" is waiting to start: image can't be pulled

This typically happens when the dynatrace service account hasn't been allowed to pull images from the Red Hat Container Catalog (RHCC).
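
One hedged way to grant that access, assuming you've already created a registry pull secret (here called rhcc-pull-secret) in the project, is to link it to the service account for pulls:

bash
# Link an existing registry pull secret to the dynatrace service account
oc secrets link dynatrace rhcc-pull-secret --for=pull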

Deployment seems successful, but the `dynatrace-oneagent` container doesn't produce meaningful logs

Example error:

bash
kubectl get pods --namespace=kube-system
NAME                       READY   STATUS              RESTARTS   AGE
dynatrace-oneagent-abcde   0/1     ContainerCreating   0          3s
bash
kubectl logs -f dynatrace-oneagent-abcde --namespace=kube-system
Error from server (BadRequest): container "dynatrace-oneagent" in pod "dynatrace-oneagent-abcde" is waiting to start: ContainerCreating
bash
oc get pods
NAME                       READY   STATUS              RESTARTS   AGE
dynatrace-oneagent-abcde   0/1     ContainerCreating   0          3s
bash
oc logs -f dynatrace-oneagent-abcde
Error from server (BadRequest): container "dynatrace-oneagent" in pod "dynatrace-oneagent-abcde" is waiting to start: ContainerCreating

This is typically the case if the container hasn't yet fully started. Simply wait a few more seconds.
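
If you prefer to block until the pod is ready instead of polling, kubectl wait can do that (pod name taken from the example above):

bash
# Wait up to two minutes for the pod to become Ready
kubectl wait --for=condition=Ready pod/dynatrace-oneagent-abcde --namespace=kube-system --timeout=120s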

Deployment seems successful, but the `dynatrace-oneagent` container isn't running
bash
oc process -f dynatrace-oneagent-template.yml ONEAGENT_INSTALLER_SCRIPT_URL="[oneagent-installer-script-url]" | oc apply -f -
daemonset "dynatrace-oneagent" created

Please note that quotes are needed to protect the special shell characters in the OneAgent installer URL.

bash
oc get pods
No resources found.

This is typically the case if the dynatrace service account hasn't been configured to run privileged pods.

bash
oc describe ds/dynatrace-oneagent
Name:           dynatrace-oneagent
Image(s):       dynatrace/oneagent
Selector:       name=dynatrace-oneagent
Node-Selector:  <none>
Labels:         template=dynatrace-oneagent
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
  FirstSeen  LastSeen  Count  From          SubObjectPath  Type     Reason        Message
  ---------  --------  -----  ----          -------------  ----     ------        -------
  6m         3m        17     {daemon-set }                Warning  FailedCreate  Error creating: pods "dynatrace-oneagent-" is forbidden: unable to validate against any security context constraint: [spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used spec.securityContext.hostIPC: Invalid value: true: Host IPC is not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[0].securityContext.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used spec.containers[0].securityContext.hostIPC: Invalid value: true: Host IPC is not allowed to be used]
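
A common fix, sketched here under the assumption that the DaemonSet runs under a service account named dynatrace-oneagent in the dynatrace project, is to grant that account the privileged security context constraint:

bash
# Allow the service account to run privileged pods
oc adm policy add-scc-to-user privileged system:serviceaccount:dynatrace:dynatrace-oneagent
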
Deployment was successful, but monitoring data isn't available in Dynatrace

Example:

bash
kubectl get pods --namespace=kube-system
NAME                       READY   STATUS    RESTARTS   AGE
dynatrace-oneagent-abcde   1/1     Running   0          1m
bash
oc get pods
NAME                       READY   STATUS    RESTARTS   AGE
dynatrace-oneagent-abcde   1/1     Running   0          1m

This is typically caused by a timing issue that occurs when application containers start before OneAgent is fully installed on the node. As a consequence, parts of your application run uninstrumented. To be on the safe side, make sure OneAgent is fully installed before you start your application containers. If your application is already running, restarting its containers has the same effect.
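
For example, assuming your application runs as a Deployment, a rolling restart re-creates its containers so they start with OneAgent already in place (names are placeholders):

bash
# Restart all pods of the application so OneAgent can instrument them
kubectl rollout restart deployment/my-app --namespace=<your-namespace>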

Error when applying the custom resource on GKE

Example error:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.dynatrace.com": Post "https://dynatrace-webhook.dynatrace.svc:443/validate?timeout=2s": context deadline exceeded

If you are getting this error when trying to apply the custom resource on your GKE cluster, the firewall is blocking requests from the Kubernetes API to the Dynatrace Webhook because the required port (8443) is blocked by default.

The default allowed ports (443 and 10250) on GCP refer to the ports exposed by your nodes and pods, not the ports exposed by any Kubernetes services. For example, if the cluster control plane attempts to access a service on port 443 such as the Dynatrace webhook, but the service is implemented by a pod using port 8443, this is blocked by the firewall.

To fix this, add a firewall rule to explicitly allow ingress to port 8443.
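
A hedged sketch with gcloud; the rule name, network, control-plane CIDR, and node tag below are placeholders you need to adapt to your cluster:

bash
# Allow the GKE control plane to reach the webhook pod on port 8443
gcloud compute firewall-rules create allow-dynatrace-webhook \
  --network=<cluster-network> \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:8443 \
  --source-ranges=<control-plane-cidr> \
  --target-tags=<node-tag>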

For more information about this issue, see API request that triggers admission webhook timing out.

`CannotPullContainerError`

If you get errors like the following on your pods when installing Dynatrace OneAgent, your Docker Hub download rate limit has been exceeded.

CannotPullContainerError: inspect image has been retried [X] time(s): httpReaderSeeker: failed open: unexpected status code

For details, consult the Docker documentation.

Connectivity issues between Dynatrace and your cluster

`ImagePullBackOff` error on OneAgent and ActiveGate pods

The underlying host's container runtime doesn't trust the certificate presented by your endpoint.

Note: The skipCertCheck field in the DynaKube YAML does not control this certificate check.

Example error:

desc = failed to pull and unpack image "<environment>/linux/activegate:latest": failed to resolve reference "<environment>/linux/activegate:latest": failed to do request: Head "<environment>/linux/activegate/manifests/latest": x509: certificate signed by unknown authority
Warning  Failed   ...  Error: ErrImagePull
Normal   BackOff  ...  Back-off pulling image "<environment>/linux/activegate:latest"
Warning  Failed   ...  Error: ImagePullBackOff

In this example, if the description on your pod shows x509: certificate signed by unknown authority, you must fix the certificates on your Kubernetes hosts, or use the private repository configuration to store the images.
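
To confirm the diagnosis, you can inspect the certificate chain the endpoint actually presents, for example (replace <environment> with your environment's registry host):

bash
# Print issuer, subject, and validity of the certificate served by the endpoint
openssl s_client -connect <environment>:443 -showcerts </dev/null | openssl x509 -noout -issuer -subject -dates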

There was an error with the TLS handshake

The certificate for the communication is invalid or expired. If you're using a self-signed certificate, check the mitigation procedures for the ActiveGate.

Invalid bearer token

The bearer token is invalid and the request has been rejected by the Kubernetes API. Verify the bearer token and make sure it doesn't contain any whitespace. If you're connecting to a Kubernetes cluster API via a centralized external role-based access control (RBAC) system, consult the documentation of the Kubernetes cluster manager. For Rancher, see the guidelines on the official Rancher website.
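
If you created the token as a Kubernetes secret, a quick way to spot stray whitespace is to decode it and show non-printing characters (secret name and key are placeholders):

bash
# cat -A marks line endings and tabs, making hidden whitespace visible
kubectl -n dynatrace get secret <token-secret> -o jsonpath='{.data.token}' | base64 -d | cat -A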

Could not check credentials. Process is started by other user

There is already a pending request for this integration with an ActiveGate. Wait a couple of minutes and try again.

Internal error occurred: failed calling webhook (...) x509: certificate signed by unknown authority

If you get this error after applying the DynaKube custom resource, your Kubernetes API server may be configured with a proxy. You need to exclude https://dynatrace-webhook.dynatrace.svc from that proxy.
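
To verify the exclusion worked, you can probe the webhook service from inside the cluster; once the proxy no longer intercepts the call, you should get an HTTP response from the webhook rather than a certificate or timeout error. A sketch with a throwaway pod:

bash
# curl the webhook service directly from inside the cluster
kubectl run webhook-check --rm -it --restart=Never --image=curlimages/curl -- \
  -k https://dynatrace-webhook.dynatrace.svc/validate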

Configuration and monitoring issues

The Kubernetes Monitoring Statistics extension can help you:

  • Troubleshoot your Kubernetes monitoring setup.
  • Troubleshoot your Prometheus integration setup.
  • Get detailed insights into queries from Dynatrace to the Kubernetes API.
  • Receive alerts when your Kubernetes monitoring setup experiences issues.
  • Get alerted on slow response times of your Kubernetes API.

Potential issues when changing the monitoring mode

  • Changing the monitoring mode from classicFullStack to cloudNativeFullStack affects the host ID calculation for monitored hosts, leading to new IDs being assigned and no connection between the old and new entities.
  • If you want to change the monitoring mode from applicationMonitoring or cloudNativeFullStack to classicFullStack or hostMonitoring, you need to restart all the pods that were previously instrumented with applicationMonitoring or cloudNativeFullStack.

Related topics
  • Kubernetes/OpenShift monitoring

    Monitor Kubernetes/OpenShift with Dynatrace.