Requirements
The Extension must be installed both on an ActiveGate and on the Dynatrace Cluster. See "Deploy an extension" in our full documentation for more details.
- The Alertmanager host must be able to reach the ActiveGate on the port you configured for the listener
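Before configuring Alertmanager, you can check this requirement from the Alertmanager host with a short Python sketch that opens a plain TCP connection to the ActiveGate. It only shows that the port is reachable, not that the receiver answers HTTP correctly. The function name can_reach is illustrative and not part of the extension.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures
        return False
```

For example, can_reach("activegate.example.com", 9393) run on the Alertmanager host should return True; the host name is a placeholder and 9393 is only an example port.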
1. Install the Extension on an Environment ActiveGate
Extract the extension zip file to the plugin_deployment folder at the root of the remote plugin module.
Linux:
- A default installation can be done with the following command:
unzip -o -d /opt/dynatrace/remotepluginmodule/plugin_deployment custom.remote.python.dynatrace_alertmanager_receiver.zip
- Adjust the path if the ActiveGate was installed somewhere other than /opt/dynatrace
Windows:
- Unzip the Extension to:
%PROGRAMFILES%\dynatrace\remotepluginmodule\plugin_deployment
- Adjust the path if the ActiveGate was installed somewhere other than %PROGRAMFILES%\dynatrace
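Where unzip is not available (for example on a Windows ActiveGate), the same extraction can be scripted with Python's standard zipfile module. This is a sketch: default_install_root and deploy_extension are illustrative names, and the function deliberately refuses to create plugin_deployment itself, on the assumption that a missing folder means the remote plugin module is not installed under that root.

```python
import os
import zipfile
from pathlib import Path


def default_install_root() -> Path:
    """Default ActiveGate installation root on this OS."""
    if os.name == "nt":
        # Equivalent of %PROGRAMFILES%\dynatrace
        return Path(os.environ.get("PROGRAMFILES", r"C:\Program Files")) / "dynatrace"
    return Path("/opt/dynatrace")


def deploy_extension(zip_path: str, install_root: Path) -> Path:
    """Extract the extension zip into <install_root>/remotepluginmodule/plugin_deployment."""
    target = install_root / "remotepluginmodule" / "plugin_deployment"
    if not target.is_dir():
        # Assumption: a missing folder means the remote plugin module is not here
        raise FileNotFoundError(f"{target} not found; check the ActiveGate install path")
    with zipfile.ZipFile(zip_path) as archive:
        # Existing files are overwritten, like unzip -o
        archive.extractall(target)
    return target
```

A default installation would call deploy_extension("custom.remote.python.dynatrace_alertmanager_receiver.zip", default_install_root()); pass a different root if the ActiveGate lives elsewhere.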
2. Upload the Extension to the Dynatrace Cluster through the web UI
Upload the same extension zip file to your tenant. The Extension configuration and upload UI is located at:
Settings > Monitored technologies > Custom extensions > Upload extension
Testing the webhook
The webhook can be tested locally on the ActiveGate with a curl command like the following (replace 9393 with the port configured in the extension settings):
curl -i 'http://localhost:9393/webhook' \
  -H 'Content-Type: application/json' \
  -d '{
  "receiver": "dynatrace-receiver",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "TargetDown",
        "job": "kubelet",
        "namespace": "kube-system",
        "prometheus": "kubelet",
        "service": "kubelet",
        "severity": "warning"
      },
      "annotations": {
        "message": "11.11% of the kubelet/kubelet targets in kube-system"
      },
      "startsAt": "2021-03-19T01:35:45.72Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://openshift.com",
      "fingerprint": "e425bb91067b6c9e"
    }
  ],
  "groupKey": "{}:{\"alertname\": \"Test Alert\", \"cluster\": \"Cluster 02\", \"service\": \"Service 01\"}",
  "groupLabels": {
    "alertname": "Test Alert",
    "cluster": "Cluster 02",
    "service": "Service 02"
  },
  "commonLabels": {
    "alertname": "Test Alert",
    "cluster": "Cluster 02",
    "service": "Service 02"
  },
  "commonAnnotations": {
    "annotation_01": "annotation 01",
    "annotation_02": "annotation 03"
  },
  "externalURL": "http://8598cebf58a1:9093"
}'
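To test that problems are also closed, the firing/resolved round trip can be scripted in Python: send a firing alert, wait for the problem to appear in Dynatrace, then send a resolved alert with the same groupKey. This sketch uses only the standard library and reuses the alert fields from the curl example; send_alert and the group key value are hypothetical.

```python
import json
import urllib.request

# Bypass any HTTP(S)_PROXY settings: the target is normally the local listener
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def send_alert(url: str, status: str, group_key: str) -> int:
    """POST an Alertmanager-style webhook payload and return the HTTP status code."""
    labels = {
        "alertname": "TargetDown",
        "job": "kubelet",
        "namespace": "kube-system",
        "prometheus": "kubelet",
        "service": "kubelet",
        "severity": "warning",
    }
    payload = {
        "receiver": "dynatrace-receiver",
        "status": status,  # "firing" or "resolved"
        "alerts": [{
            "status": status,
            "labels": labels,
            "annotations": {"message": "11.11% of the kubelet/kubelet targets in kube-system"},
            "startsAt": "2021-03-19T01:35:45.72Z",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://openshift.com",
            "fingerprint": "e425bb91067b6c9e",
        }],
        # Firing and resolved requests must share the groupKey so the
        # receiver can correlate them
        "groupKey": group_key,
        "groupLabels": {"alertname": "TargetDown"},
        "commonLabels": labels,
        "commonAnnotations": {},
        "externalURL": "http://8598cebf58a1:9093",
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Raises urllib.error.HTTPError for non-2xx responses
    with _OPENER.open(request, timeout=10) as response:
        return response.status
```

For example, call send_alert("http://localhost:9393/webhook", "firing", "test-group-1"), and once the problem is visible in Dynatrace, repeat the call with "resolved".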
Details
A binary called dynatrace-receiver always runs on the ActiveGate machine, listening on the port configured in the extension settings.
This binary is responsible for the following tasks:
1. Listening for POST requests from Alertmanager
2. Periodically retrieving Problem IDs from Dynatrace and correlating them with the events sent
3. Periodically resending events to Dynatrace to keep problems open
4. Periodically deleting stale events (events that have been open for more than 5 days)
5. Maintaining an on-disk cache of Problems (and alerts) and of Custom Devices
  - These caches are thread-safe
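A thread-safe on-disk cache like the one in task 5 can be sketched in Python as a dictionary guarded by a lock and persisted as JSON on every change. The class name DiskCache, the JSON format and the write-then-rename strategy are assumptions for illustration; the receiver's actual cache format is not documented here.

```python
import json
import os
import tempfile
import threading
from pathlib import Path
from typing import Any, Dict, Optional


class DiskCache:
    """A dict guarded by a lock and persisted to a JSON file on every change."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()
        self._data: Dict[str, Any] = {}
        if path.exists():
            # Reload entries written by a previous run
            self._data = json.loads(path.read_text(encoding="utf-8"))

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._flush()

    def delete(self, key: str) -> None:
        with self._lock:
            if key in self._data:
                del self._data[key]
                self._flush()

    def snapshot(self) -> Dict[str, Any]:
        """Shallow copy that periodic jobs can iterate over without holding the lock."""
        with self._lock:
            return dict(self._data)

    def _flush(self) -> None:
        # Caller holds self._lock. Write to a temporary file in the same
        # directory, then swap it in with os.replace (atomic on POSIX),
        # so a crash mid-write never leaves a truncated cache file.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_name = tempfile.mkstemp(dir=str(self._path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(self._data, handle)
        os.replace(tmp_name, self._path)
```

Rewriting the whole file on each change keeps the sketch simple; it is adequate for a few thousand entries but would need batching for much larger caches.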
For task 1, the receiver goes through the following steps when a new request arrives:
- Parse the request and attempt to construct a Custom Device Name
  - If this cannot be done, a default Custom Device Name is used
- Calculate the Custom Device ID from the Custom Device Name and check whether this ID already exists in the local cache
  - This is done without calling the Dynatrace API
- If the Custom Device ID is not in the cache, use the Dynatrace API to create the Custom Device with that ID
- Calculate a hash of the request's GroupKey
  - This hash is later used to correlate open problems with events
- Determine whether this event opens a problem, based on the severity label
- Send the event to Dynatrace and store it in the local Problem cache
- Determine whether this event closes a problem, based on the status field (firing or resolved)
- If the event closes a problem, check the cache for a Problem ID already obtained for this GroupKey
  - If the Problem ID is not cached yet, attempt to get it from Dynatrace using the GroupKey hash
  - If the Problem ID is found, close the Problem with a comment
  - If the Problem ID is not found, nothing can be done, and the event is deleted from the cache
    - This can happen, for instance, if Alertmanager sent the firing alert while the binary was not running, so the receiver only gets the resolved event without a matching firing one
  - After the Problem is closed, the event is deleted from the cache
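The purely local computations in these steps (device name, device ID, GroupKey hash and the open/close decisions) can be sketched as small Python functions. Everything specific here is a hypothetical choice for illustration, not taken from the receiver: the naming rule (namespace/service from the labels), DEFAULT_DEVICE_NAME, the SHA-256 based device ID (a stand-in, not Dynatrace's entity ID scheme) and the set of severities in PROBLEM_SEVERITIES.

```python
import hashlib
from typing import Any, Dict

# Hypothetical values: the receiver's actual naming rule and severity
# mapping are not documented here
DEFAULT_DEVICE_NAME = "alertmanager"
PROBLEM_SEVERITIES = {"critical", "warning"}


def _labels(request: Dict[str, Any]) -> Dict[str, str]:
    """commonLabels overlaid with the labels of the first alert, if any."""
    labels = dict(request.get("commonLabels") or {})
    alerts = request.get("alerts") or []
    if alerts:
        labels.update(alerts[0].get("labels") or {})
    return labels


def device_name(request: Dict[str, Any]) -> str:
    """Custom Device name from the labels, or the default when they are missing."""
    labels = _labels(request)
    namespace, service = labels.get("namespace"), labels.get("service")
    if namespace and service:
        return f"{namespace}/{service}"
    return DEFAULT_DEVICE_NAME


def device_id(name: str) -> str:
    """Deterministic ID computed locally, so cache lookups need no API call."""
    # Stand-in scheme, not Dynatrace's own entity ID algorithm
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:16].upper()


def group_key_hash(request: Dict[str, Any]) -> str:
    """Stable hash of groupKey, identical for the firing and resolved requests."""
    return hashlib.sha256(request["groupKey"].encode("utf-8")).hexdigest()


def opens_problem(request: Dict[str, Any]) -> bool:
    return _labels(request).get("severity") in PROBLEM_SEVERITIES


def closes_problem(request: Dict[str, Any]) -> bool:
    return request.get("status") == "resolved"
```

With the curl payload from the testing section, device_name returns "kube-system/kubelet" (the alert's own labels take precedence over commonLabels) and, under the assumed severity set, opens_problem returns True.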
Tasks 2, 3 and 4 are implemented as cron jobs running inside the same binary:
- Task 2 runs every 2 minutes and updates every cached event that does not yet have a Problem ID, retrieving the ID from Dynatrace
  - This task is important because a Problem can only be closed using its Problem ID
- Task 3 runs every 30 minutes and resends events to Dynatrace before they expire (events expire after 2 hours if they are not refreshed)
- Task 4 runs every 2 hours and deletes events older than 5 days from the cache
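The scheduling described above can be sketched in Python with one daemon thread per job, each sleeping on a shared threading.Event so that all jobs stop promptly at shutdown. The intervals and limits come from the list above; run_every and is_stale are illustrative names, and the real binary may schedule its jobs differently.

```python
import logging
import threading
from datetime import datetime, timedelta
from typing import Callable

# Intervals and limits stated in the list above
PROBLEM_ID_SYNC_INTERVAL = timedelta(minutes=2)  # task 2
RESEND_INTERVAL = timedelta(minutes=30)          # task 3
EVENT_EXPIRY = timedelta(hours=2)                # events expire if not refreshed
CLEANUP_INTERVAL = timedelta(hours=2)            # task 4
MAX_EVENT_AGE = timedelta(days=5)                # task 4 threshold


def is_stale(opened_at: datetime, now: datetime) -> bool:
    """True for events that have been open for more than 5 days."""
    return now - opened_at > MAX_EVENT_AGE


def run_every(interval: timedelta, job: Callable[[], None],
              stop: threading.Event) -> threading.Thread:
    """Call job every interval in a daemon thread until stop is set.

    The first call happens one interval after start.
    """
    def loop() -> None:
        # Event.wait returns True as soon as stop is set, ending the loop
        while not stop.wait(interval.total_seconds()):
            try:
                job()
            except Exception:
                # Log and keep the schedule alive for the next run
                logging.exception("periodic job failed")

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

For example, run_every(PROBLEM_ID_SYNC_INTERVAL, sync_problem_ids, stop), where sync_problem_ids is a hypothetical function that fills in missing Problem IDs. Because RESEND_INTERVAL is well below EVENT_EXPIRY, a single failed resend run does not let an event expire.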
The logs and caches are stored in TEMP_FOLDER/dynatrace-receiver, where TEMP_FOLDER is the system's temporary folder. By default, on Linux, this is /tmp/dynatrace-receiver.
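To locate the logs on a given machine, the folder can be resolved with Python's tempfile module. This assumes the receiver picks its temporary folder the same way the platform's standard lookup does; Python checks the TMPDIR, TEMP and TMP environment variables first, which may differ slightly from the binary's own rules, so treat the result as a starting point.

```python
import tempfile
from pathlib import Path


def receiver_data_dir() -> Path:
    """Expected location of the receiver's logs and caches."""
    # gettempdir() honours TMPDIR, TEMP and TMP before falling back to
    # platform defaults such as /tmp on Linux
    return Path(tempfile.gettempdir()) / "dynatrace-receiver"


print(receiver_data_dir())
```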