
Connectivity issues between Dynatrace and Kubernetes cluster

This page covers common connectivity issues that can arise when monitoring Kubernetes with Dynatrace. It provides troubleshooting steps for scenarios such as ActiveGate token problems, certificate errors when pulling images or calling the webhook, rejected bearer tokens, and connection failures when using Istio or Calico.

Problem with ActiveGate token

Example error on the ActiveGate deployment status page:

Problem with ActiveGate token (reason:Absent)

Example error in the Dynatrace Operator logs:

```json
{"level":"info","ts":"2022-09-22T06:49:17.351Z","logger":"dynakube-controller","msg":"reconciling DynaKube","namespace":"dynatrace","name":"dynakube"}
{"level":"info","ts":"2022-09-22T06:49:17.502Z","logger":"dynakube-controller","msg":"problem with token detected","dynakube":"dynakube","token":"APIToken","msg":"Token on secret dynatrace:dynakube missing scopes [activeGateTokenManagement.create]"}
```

Example error in the DynaKube status:

```yaml
status:
  ...
  conditions:
    - message: Token on secret dynatrace:dynakube missing scopes [activeGateTokenManagement.create]
      reason: TokenScopeMissing
      status: "False"
      type: APIToken
```

Starting with version 0.9.0, Dynatrace Operator handles the ActiveGate token by default. If you get one of these errors, follow the instructions for your Dynatrace Operator version:

  • For Dynatrace Operator versions earlier than 0.7.0: upgrade to the latest Dynatrace Operator version.
  • For Dynatrace Operator versions 0.7.0 or later but earlier than 0.9.0: create a new API token with the required scopes. For instructions, see Tokens and permissions required: Dynatrace Operator token.
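To check which of these errors you're hitting, you can read the DynaKube status conditions and the Dynatrace Operator logs from the command line. The following is a minimal sketch in POSIX shell, not an official tool. missing_scopes is a hypothetical helper that extracts the scope names from messages like the ones shown above. The commented usage assumes the dynatrace namespace, a DynaKube named dynakube, and an Operator deployment named dynatrace-operator.

```bash
# Hypothetical helper: read Operator log lines or DynaKube condition messages
# on stdin and print each scope listed in "missing scopes [...]",
# one per line, sorted and without duplicates.
missing_scopes() {
  grep -o 'missing scopes \[[^]]*]' |
    sed -e 's/^missing scopes \[//' -e 's/]$//' |
    tr ', ' '\n\n' |
    sed '/^$/d' |
    sort -u
}

# Example usage against a live cluster (namespace, DynaKube name, and
# Operator deployment name are assumptions; adjust them to your setup):
#   kubectl get dynakube dynakube -n dynatrace -o jsonpath='{.status.conditions}' | missing_scopes
#   kubectl logs -n dynatrace deployment/dynatrace-operator | missing_scopes
```

If the helper prints activeGateTokenManagement.create, the token in the DynaKube secret lacks that scope, which matches the examples in this section.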

ImagePullBackOff error on OneAgent and ActiveGate pods

This error occurs when the container runtime on the underlying host doesn't trust the certificate presented by your endpoint.

The skipCertCheck field in the DynaKube YAML doesn't control this certificate check, because images are pulled by the host's container runtime rather than by Dynatrace Operator.

Example error (the error message may vary):

```plaintext
desc = failed to pull and unpack image "<environment>/linux/activegate:latest": failed to resolve reference "<environment>/linux/activegate:latest": failed to do request: Head "<environment>/linux/activegate/manifests/latest": x509: certificate signed by unknown authority
Warning Failed ... Error: ErrImagePull
Normal BackOff ... Back-off pulling image "<environment>/linux/activegate:latest"
Warning Failed ... Error: ImagePullBackOff
```

If your pod description shows x509: certificate signed by unknown authority, as in this example, you must either fix the certificates on your Kubernetes hosts so that the container runtime trusts your endpoint, or use the private repository configuration to store the images.
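To confirm that a failing pull matches this case, you can search the pod description for the x509 error and then inspect the certificate chain that the registry endpoint presents. The following is a minimal sketch, assuming a POSIX shell with kubectl and openssl available. is_unknown_ca_pull_error is a hypothetical helper name, and the pod name and registry host are placeholders.

```bash
# Hypothetical helper: exit 0 if the text on stdin (for example, the output of
# `kubectl describe pod`) shows a pull that failed on an untrusted certificate.
is_unknown_ca_pull_error() {
  grep -q 'x509: certificate signed by unknown authority'
}

# Example usage (pod name and registry host are placeholders):
#   kubectl describe pod <pod-name> -n dynatrace | is_unknown_ca_pull_error && echo "untrusted registry certificate"
# Show the certificate chain presented by the registry endpoint:
#   openssl s_client -connect <registry-host>:443 -servername <registry-host> -showcerts </dev/null
```

The issuer of the last certificate in that chain is the CA that the container runtime on each host has to trust.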

There was an error with the TLS handshake

The certificate used for the connection is invalid or expired. If you're using a self-signed certificate, check the mitigation procedures for the ActiveGate.
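To tell an expired certificate from a self-signed one, you can inspect the certificate file with openssl. The following is a minimal sketch, assuming the certificate is available as a PEM file. cert_status and is_self_signed are hypothetical helper names. Comparing subject and issuer is a common heuristic for spotting self-signed certificates, not a full chain validation.

```bash
# Hypothetical helper: print "valid", "expired", or "unreadable" for a PEM file.
# `openssl x509 -checkend 0` exits 0 if the certificate is still valid right now.
cert_status() {
  if ! openssl x509 -in "$1" -noout >/dev/null 2>&1; then
    echo unreadable
    return 2
  fi
  if openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1; then
    echo valid
  else
    echo expired
  fi
}

# Hypothetical helper: exit 0 if subject and issuer are identical,
# which usually indicates a self-signed certificate.
is_self_signed() {
  subject=$(openssl x509 -in "$1" -noout -subject | sed 's/^[^=]*=//')
  issuer=$(openssl x509 -in "$1" -noout -issuer | sed 's/^[^=]*=//')
  [ "$subject" = "$issuer" ]
}

# Example usage (certificate path is a placeholder):
#   cert_status /path/to/cert.pem
#   openssl x509 -in /path/to/cert.pem -noout -enddate   # prints notAfter=<date>
#   is_self_signed /path/to/cert.pem && echo "self-signed"
```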

Invalid bearer token

The Kubernetes API rejected the request because the bearer token is invalid. Verify the bearer token and make sure it doesn't contain any whitespace. If you connect to the Kubernetes cluster API through a centralized external role-based access control (RBAC) system, consult the documentation of your Kubernetes cluster manager. For Rancher, see the guidelines on the official Rancher website.
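Whitespace often sneaks into a token when it's copied from a terminal or a file, for example as a trailing newline or a space. The following is a minimal POSIX shell sketch for checking and cleaning a token before you enter it again. token_has_whitespace and strip_token are hypothetical helper names. The commented curl call against the API server's /version endpoint is one possible way to test the token directly, with the server URL as a placeholder.

```bash
# Hypothetical helper: exit 0 if the token contains any whitespace.
token_has_whitespace() {
  case "$1" in
    *[[:space:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print the token with all whitespace removed.
strip_token() {
  printf '%s' "$1" | tr -d '[:space:]'
}

# Example usage (RAW_TOKEN and the API server URL are placeholders):
#   token_has_whitespace "$RAW_TOKEN" && echo "token contains whitespace"
#   TOKEN=$(strip_token "$RAW_TOKEN")
#   curl -sS -H "Authorization: Bearer $TOKEN" https://<api-server>/version
# A 401 Unauthorized response means the API server still rejects the token.
```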

Could not check credentials. Process is started by other user

A request for this integration is already pending with an ActiveGate. Wait a couple of minutes and check again.

Internal error occurred: failed calling webhook (…) x509: certificate signed by unknown authority

If you get this error after applying the DynaKube custom resource, your Kubernetes API server might be configured to use a proxy. Exclude https://dynatrace-webhook.dynatrace.svc from that proxy.
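How the proxy is configured depends on your Kubernetes distribution. A common setup passes HTTPS_PROXY and NO_PROXY environment variables to the API server. NO_PROXY entries are usually host names rather than URLs, so the entry to add is the host part of the webhook URL. The following is a minimal sketch, assuming such an environment-variable setup. add_no_proxy_entry is a hypothetical helper that only prints an updated value and doesn't change your cluster.

```bash
# Hypothetical helper: print a NO_PROXY value that includes the given host,
# appending it only if it isn't already one of the comma-separated entries.
add_no_proxy_entry() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;   # already excluded: keep value as-is
    ",,") printf '%s\n' "$2" ;;       # empty value: the host is the only entry
    *) printf '%s\n' "$1,$2" ;;       # otherwise append the host
  esac
}

# Example usage: compute the value to set in the API server's environment
# (where that environment is defined depends on your distribution):
#   add_no_proxy_entry "$NO_PROXY" dynatrace-webhook.dynatrace.svc
```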

OneAgent unable to connect when using Istio

Applies to the cloudNativeFullStack and applicationMonitoring deployment modes.

Example error in the OneAgent pod logs: Initial connect: not successful - retrying after xs.

To fix this problem, increase the OneAgent initial connection retry timeout by adding the following feature flag annotation to your DynaKube (the value is in milliseconds):

```bash
kubectl annotate dynakube <name-of-your-DynaKube> feature.dynatrace.com/oneagent-initial-connect-retry-ms=6000 -n dynatrace
```

Connectivity issues when using Calico

If you use Calico to handle or restrict network connections, you might experience connectivity issues, such as:

  • The operator, webhook, and CSI driver pods are constantly restarting
  • The operator cannot reach the API
  • The CSI driver fails to download OneAgent
  • Injection into pods doesn't work

If you experience these or similar problems, use the sample network policies for common problems from our GitHub repository.

  • For the activegate-policy.yaml and dynatrace-policies.yaml policies, if Dynatrace Operator isn't installed in the dynatrace namespace (Kubernetes) or project (OpenShift), you need to adapt the metadata and namespace properties in the YAML files accordingly.
  • The agent-policy.yaml and agent-policy-external-only.yaml policies let OneAgents injected into pods open external connections. Only agent-policy-external-only.yaml is required; agent-policy.yaml allows internal connections, such as pod-to-pod connections, where needed.
  • Because these policies must apply to every pod into which OneAgent is injected, you also need to adapt the podSelector property in the YAML files.
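If Dynatrace Operator runs in a namespace other than dynatrace, you can generate adapted copies of the sample policies instead of editing them by hand. The following is a minimal sketch, assuming the policy files refer to the namespace through plain namespace: dynatrace lines. retarget_namespace is a hypothetical helper. Any other references to the namespace, and the podSelector labels, still need a manual review.

```bash
# Hypothetical helper: print a copy of a policy file in which every
# "namespace: dynatrace" line points to another namespace instead.
# The original file is left unchanged.
retarget_namespace() {
  sed "s/^\([[:space:]]*namespace:[[:space:]]*\)dynatrace[[:space:]]*\$/\1$2/" "$1"
}

# Example usage (target namespace is a placeholder):
#   retarget_namespace dynatrace-policies.yaml my-namespace > dynatrace-policies.custom.yaml
#   kubectl apply -f dynatrace-policies.custom.yaml
# List pod labels to choose values for the podSelector property:
#   kubectl get pods -n <namespace> --show-labels
```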