Find out how to troubleshoot issues you might encounter in the following situations.
Debug logs
By default, OneAgent logs are located in /var/log/dynatrace/oneagent.
To debug Dynatrace Operator issues, check the logs of the Dynatrace Operator pod.
You might also want to check the logs from OneAgent pods deployed through Dynatrace Operator.
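If Dynatrace Operator runs in the default `dynatrace` namespace, commands along these lines show both sets of logs (the deployment and pod names are the usual defaults and may differ in your setup):

```bash
# Follow the Dynatrace Operator logs
kubectl -n dynatrace logs -f deployment/dynatrace-operator

# List the OneAgent pods, then read the logs of one of them
kubectl -n dynatrace get pods
kubectl -n dynatrace logs <oneagent-pod-name>
```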
Issues occurring when setting up monitoring
Dynatrace Operator
Unable to retrieve the complete list of server APIs
Example error:
unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
If the Dynatrace Operator pod logs this error, you need to identify and fix the problematic services. To identify them:
- Check available resources.
kubectl api-resources
- If the command returns this error, list all the API services and make sure there aren't any False services (that is, services whose AVAILABLE column shows False).
kubectl get apiservice
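To narrow things down, you can filter for unavailable API services and, once you've confirmed an entry is stale (for example, left behind by an uninstalled metrics adapter), delete it. The service name below is only an illustration taken from the error message above:

```bash
# Show only API services that are not available
kubectl get apiservice | grep False

# Example only: delete a stale APIService after confirming nothing still needs it
kubectl delete apiservice v1beta1.external.metrics.k8s.io
```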
Application-only monitoring
If you get a crash loop on the pods when you install OneAgent, you need to increase the CPU and memory limits of the pods.
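As a rough sketch, raising the resources on the affected application containers might look like this (the values are illustrative, not recommendations):

```yaml
# Container spec excerpt; tune the values to your workload
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
```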
DaemonSet
Deployment seems successful, but the `dynatrace-oneagent` container isn't running
oc process -f dynatrace-oneagent-template.yml ONEAGENT_INSTALLER_SCRIPT_URL="[oneagent-installer-script-url]" | oc apply -f -
daemonset "dynatrace-oneagent" created
Please note that quotes are needed to protect the special shell characters in the OneAgent installer URL.
oc get pods
No resources found.
This is typically the case if the `dynatrace` service account hasn't been configured to run privileged pods.
oc describe ds/dynatrace-oneagent
Name: dynatrace-oneagent
Image(s): dynatrace/oneagent
Selector: name=dynatrace-oneagent
Node-Selector: <none>
Labels: template=dynatrace-oneagent
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------------ -------
6m 3m 17 {daemon-set } Warning FailedCreate Error creating: pods "dynatrace-oneagent-" is forbidden: unable to validate against any security context constraint: [spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used spec.securityContext.hostIPC: Invalid value: true: Host IPC is not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.containers[0].securityContext.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used spec.containers[0].securityContext.hostIPC: Invalid value: true: Host IPC is not allowed to be used]
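In that case, the service account needs permission to run privileged pods. Assuming the DaemonSet uses the `dynatrace` service account mentioned above (adjust the name if yours differs), granting the privileged SCC might look like this:

```bash
# Grant the privileged security context constraint to the service account
oc adm policy add-scc-to-user privileged -z dynatrace
```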
Deployment was successful, but monitoring data isn't available in Dynatrace
This is typically caused by a timing issue where application containers start before OneAgent is fully installed on the system. As a consequence, some parts of your application run uninstrumented. To be on the safe side, make sure OneAgent is fully installed before you start your application containers. If your application is already running, restarting its containers has the same effect.
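For workloads managed by a controller, a restart can be triggered in place (the deployment and namespace names are placeholders):

```bash
# Recreate the pods so they start with OneAgent already present
kubectl rollout restart deployment <your-application> -n <your-namespace>
```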
No pods scheduled on control-plane nodes
Kubernetes version 1.24+
Taints on master and control-plane nodes changed in Kubernetes 1.24+, and the OneAgent DaemonSet is missing the appropriate tolerations in the DynaKube custom resource.
To add the necessary tolerations, edit the DynaKube YAML as follows.
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Exists
Error when applying the custom resource on GKE
Example error:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.dynatrace.com": Post "https://dynatrace-webhook.dynatrace.svc:443/validate?timeout=2s": context deadline exceeded
If you are getting this error when trying to apply the custom resource on your GKE cluster, the firewall is blocking requests from the Kubernetes API to the Dynatrace Webhook because the required port (8443) is blocked by default.
The default allowed ports (443 and 10250) on GCP refer to the ports exposed by your nodes and pods, not the ports exposed by any Kubernetes services. For example, if the cluster control plane attempts to access a service on port 443 such as the Dynatrace webhook, but the service is implemented by a pod using port 8443, this is blocked by the firewall.
To fix this, add a firewall rule to explicitly allow ingress to port 8443.
For more information about this issue, see API request that triggers admission webhook timing out.
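On GKE, such a rule allows ingress from the control plane's IP range to the nodes on TCP port 8443. A sketch with gcloud (the rule name, network, source range, and target tag are placeholders for your cluster's values):

```bash
# Allow the GKE control plane to reach the Dynatrace webhook pod port
gcloud compute firewall-rules create allow-dynatrace-webhook \
  --network <cluster-network> \
  --direction INGRESS \
  --allow tcp:8443 \
  --source-ranges <control-plane-cidr> \
  --target-tags <node-tag>
```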
`CannotPullContainerError` error
If you get errors like this on your pods when installing Dynatrace OneAgent, your Docker download rate limit has been exceeded.
CannotPullContainerError: inspect image has been retried [X] time(s): httpReaderSeeker: failed open: unexpected status code
For details, consult the Docker documentation.
Connectivity issues between Dynatrace and your cluster
`ImagePullBackOff` error on OneAgent and ActiveGate pods
The underlying host's container runtime doesn't contain the certificate presented by your endpoint.
Note: The `skipCertCheck` field in the DynaKube YAML does not control this certificate check.
Example error:
desc = failed to pull and unpack image "<environment>/linux/activegate:latest": failed to resolve reference "<environment>/linux/activegate:latest": failed to do request: Head "<environment>/linux/activegate/manifests/latest": x509: certificate signed by unknown authority
Warning Failed ... Error: ErrImagePull
Normal BackOff ... Back-off pulling image "<environment>/linux/activegate:latest"
Warning Failed ... Error: ImagePullBackOff
In this example, if the description on your pod shows `x509: certificate signed by unknown authority`, you must fix the certificates on your Kubernetes hosts, or use the private repository configuration to store the images.
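To see which certificate chain the registry endpoint actually presents, you can inspect it from an affected host (the hostname is a placeholder for your environment's registry endpoint):

```bash
# Print the certificate chain offered by the endpoint
openssl s_client -connect <environment-host>:443 -showcerts </dev/null
```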
Invalid bearer token
The bearer token is invalid and the request has been rejected by the Kubernetes API. Verify the bearer token and make sure it doesn't contain any whitespace. If you're connecting to the Kubernetes cluster API via a centralized external role-based access control (RBAC) solution, consult the documentation of your Kubernetes cluster manager. For Rancher, see the guidelines on the official Rancher website.
OneAgent unable to connect when using Istio
Applies to cloudNativeFullStack and applicationMonitoring.
Example error in the logs of the OneAgent pods: Initial connect: not successful - retrying after xs.
You can fix this problem by increasing the OneAgent timeout. Add the following feature flag to DynaKube:
Note: Be sure to replace the placeholder (<...>) with the name of your DynaKube custom resource.
kubectl annotate dynakube <name-of-your-DynaKube> feature.dynatrace.com/oneagent-initial-connect-retry-ms=6000 -n dynatrace
Connectivity issues when using Calico
If you use Calico to handle or restrict network connections, you might experience connectivity issues, such as:
- The operator, webhook, and CSI driver pods are constantly restarting
- The operator cannot reach the API
- The CSI driver fails to download OneAgent
- Injection into pods doesn't work
If you experience these or similar problems, use our GitHub sample policies for common problems.
Notes:
- For the `activegate-policy.yaml` and `dynatrace-policies.yaml` policies, if Dynatrace Operator isn't installed in the `dynatrace` namespace (Kubernetes) or project (OpenShift), you need to adapt the metadata and namespace properties in the YAML files accordingly.
- The purpose of the `agent-policy.yaml` and `agent-policy-external-only.yaml` policies is to let OneAgents that are injected into pods open external connections. Only `agent-policy-external-only.yaml` is required, while `agent-policy.yaml` also allows internal connections, such as pod-to-pod connections, where needed.
- Because these policies are needed for all pods where OneAgent injects, you also need to adapt the `podSelector` property of the YAML files (see the sketch after this list).
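A minimal sketch of how the podSelector might be adapted, assuming the pods where OneAgent injects carry an `app: my-app` label (the label is a placeholder for whatever selector fits your workloads):

```yaml
# NetworkPolicy excerpt: select the pods where OneAgent injects
spec:
  podSelector:
    matchLabels:
      app: my-app
```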
Configuration and monitoring issues
The Kubernetes Monitoring Statistics extension can help you:
- Troubleshoot your Kubernetes monitoring setup.
- Troubleshoot your Prometheus integration setup.
- Get detailed insights into queries from Dynatrace to the Kubernetes API.
- Receive alerts when your Kubernetes monitoring setup experiences issues.
- Get alerted on slow response times of your Kubernetes API.
Potential issues when changing the monitoring mode
- Changing the monitoring mode from `classicFullStack` to `cloudNativeFullStack` affects the host ID calculations for monitored hosts, leading to new IDs being assigned and no connection between old and new entities.
- If you want to change the monitoring mode from `applicationMonitoring` or `cloudNativeFullStack` to `classicFullStack` or `hostMonitoring`, you need to restart all the pods that were previously instrumented with `applicationMonitoring` or `cloudNativeFullStack`.