CPU cores and memory requirements

Dynatrace AppMon is a sophisticated application that requires resources for data processing and user activity. Default Deployment Sizes describes the hardware specifications needed to achieve a given level of throughput and user activity. If these recommendations are not followed, AppMon cannot achieve these throughput numbers and does not behave as desired.

Why does AppMon have hardware recommendations?

AppMon is a complex and powerful software solution that performs a huge number of computations in the background. This includes processing every single transaction of your environment.

Example 1

Here is a simple example that explains what AppMon is actually doing in the background:

Let’s assume that the server has a load of 1000 transactions/sec. With 1 core, the server has 1ms to complete all the required work for 1 transaction (1000ms x 1 core / 1000 transactions = 1ms/tx). If the box has 24 cores, the server has 24ms to complete 1 transaction (1000ms x 24 cores / 1000 transactions = 24ms/tx). Put differently, this could also mean 1 core = 1 transaction/ms or 24 cores = 24 transactions/ms.

Formula:
1000ms x number of cores / (transactions/second) = max. time per transaction

Figure: Transaction time vs. cores
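
The formula can be checked with a few lines of Python. This is only a sketch of the arithmetic from Example 1; the function name is made up for illustration and is not part of AppMon.

```python
def max_time_per_transaction_ms(cores: int, transactions_per_second: float) -> float:
    """Time budget per transaction: 1000ms x cores / (transactions/second)."""
    if cores <= 0 or transactions_per_second <= 0:
        raise ValueError("cores and transactions_per_second must be positive")
    return 1000.0 * cores / transactions_per_second


# The two cases from Example 1: 1000 transactions/sec on 1 core and on 24 cores.
for cores in (1, 24):
    budget = max_time_per_transaction_ms(cores, 1000)
    print(f"{cores} core(s): {budget:g} ms per transaction")
```

With the numbers from Example 1, this yields a budget of 1ms per transaction on 1 core and 24ms per transaction on 24 cores.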

Frankly, 24 transactions/ms sounds fairly good. But the situation can change immediately under a huge load.

Example 2

The server has a total load of 50 transactions per second. This is really low, but it does not spoil the example. Every transaction needs 1ms to execute. With a total of 50 transactions, a minimum of 50ms is needed to perform all the computation. But a single core can only handle 10 transactions within this timeframe. The server is then no longer able to compute all the steps required to finalize every transaction, and therefore starts to skip transactions. With 24 cores, the load can be distributed over all available cores, which results in much higher throughput and no more skipped transactions.

Figure: Transaction time vs. cores timeline
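
The effect in Example 2 can be sketched numerically. The sketch below assumes that the 50 transactions have to be finished within a 10ms window; that window length is an assumption inferred from "10 transactions" at 1ms each and is not stated explicitly in the text. The function name is hypothetical.

```python
def completed_in_window(transactions: int, cores: int,
                        ms_per_transaction: float, window_ms: float) -> int:
    """Transactions finished inside the window when work is spread evenly over cores."""
    per_core = int(window_ms // ms_per_transaction)  # what one core can finish in time
    return min(transactions, cores * per_core)


WINDOW_MS = 10  # assumption: inferred from "10 transactions" at 1ms each

for cores in (1, 24):
    done = completed_in_window(50, cores, 1, WINDOW_MS)
    print(f"{cores} core(s): {done} completed, {50 - done} skipped")
```

Under this assumption, a single core completes 10 transactions and skips 40, while 24 cores complete all 50.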

Example 3

In this example, AppMon runs on a system with one physical core. You only need to focus on a few thread pools (out of 300+ possible “ready” threads) that use heavy amounts of CPU resources. Assume the scheduler allots time in slices of 50ms and each thread has the same priority. Ignore the OS threads in this example.

1 Real-Time Analysis Thread (R) / 1 Correlation Thread (C) / 1 Session Writer (SW)

Figure: Scheduler overview, 1 core

In the previous example, there are times where certain threads are idle, waiting for the scheduler to assign them CPU time. The wait time is exaggerated, but with three threads and only one physical core, threads inevitably sit idle because CPU resources are limited.

Now consider an AppMon server running on a system with 24 physical cores. Again, you only need to focus on a few thread pools (out of 300+ possible “ready” threads) that use heavy amounts of CPU resources. As before, assume the scheduler allots time in slices of 50ms and each thread has the same priority, and ignore the OS threads.

12 Real-Time Analysis Threads (R) / 12 Correlation Threads (C) / 3 Session Writers (SW)

Figure: Scheduler overview, 24 cores

In the 24-core example, you can see that two positive things happened:

  • The overall throughput increased, since more threads can run on CPU cores in parallel.
  • Time waiting for the CPU has decreased greatly. The only wait time to account for is programmatic wait time, or parking/blocked time due to locking.
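
The two scheduler scenarios can be approximated with a toy round-robin simulation. This is a deliberately simplified sketch, not how a real OS scheduler or AppMon works: it assumes every thread is always ready to run, all threads have equal priority, and time is handed out in fixed 50ms slices. The function name and the thread labels are made up for illustration.

```python
from collections import deque


def simulate_round_robin(threads, cores, slices, slice_ms=50):
    """Return {thread: ms spent waiting} for always-ready threads of equal priority."""
    queue = deque(threads)
    waited = {name: 0 for name in threads}
    for _ in range(slices):
        # The first `cores` threads in the queue get a core for this slice.
        running = [queue.popleft() for _ in range(min(cores, len(queue)))]
        for name in queue:        # threads still queued did not get a core
            waited[name] += slice_ms
        queue.extend(running)     # threads that ran go to the back of the queue
    return waited


# 1 core: 1 Real-Time Analysis thread, 1 Correlation thread, 1 Session Writer.
one_core = simulate_round_robin(["R", "C", "SW"], cores=1, slices=9)

# 24 cores: 12 Real-Time Analysis, 12 Correlation, 3 Session Writer threads.
many_threads = ([f"R{i}" for i in range(1, 13)]
                + [f"C{i}" for i in range(1, 13)]
                + [f"SW{i}" for i in range(1, 4)])
many_cores = simulate_round_robin(many_threads, cores=24, slices=9)

print("1 core, worst wait (ms):", max(one_core.values()))
print("24 cores, worst wait (ms):", max(many_cores.values()))
```

Over the 450ms simulated here, each of the three threads on the single core waits 300ms, while on 24 cores no thread waits longer than 50ms even though nine times as many threads are running.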

Here is a brief overview of the components and computations the server has to complete for every single transaction:

  • Correlation & UEM correlation (if enabled)
    • Transaction stitching (find the correct paths for a node)
    • Garbage collection
    • Time calculations such as elapsed time and duration
    • Correlate Auto Sensor data
    • Calculate APIs
  • Postprocessing
    • PurePath categories
    • Application detection
  • Real time analysis
    • Measures
    • Business transactions
    • Baselining
    • Error detection
  • PureLytics
    • Session storage lead
    • PWH access
    • Analyzers
    • Aggregation
  • Storage
    • Serialization
    • Disk write
  • Incident center
    • Timer tasks
    • Alert notifier
  • Communication
    • Collector & agent communication (Compress/Decompress, Encrypt/Decrypt)
    • Client(s)
  • Web servers
    • Service pages
    • WebUI
    • REST interfaces
  • Frontend
    • PWH access
    • Storage access
    • Triggered analyzers
    • Aggregation
  • Data export
    • Business transactions export
    • PureLytics stream
    • Tasks & Jobs
    • Clean up task
    • Reports

The OS threads and the JVM G1 garbage collector threads (for both Frontend and Backend) are also needed but are not covered here. Those threads scale with the number of CPU cores too. Combining all threads from the Frontend, the Backend, and G1 GC (for both), it is possible to have 300 or more threads running.

I upgraded to the next size and you want me to add more cores. Why does this make sense if my current CPU is not fully utilized?

Basically, our requirements for CPU cores are related to threading or, to be more precise, to the concurrent execution of threads. The product is designed to use available resources as well as it can. Therefore, horsepower in terms of CPU cores is mandatory to guarantee the throughput. The following list shows the main reasons why the product requires cores:

  • The server sometimes has to work under peak load, not only under average load. The recommended number of cores ensures enough headroom and resources for high-load scenarios. See Collect Sizing data for more details.
  • As stated in the previous section, AppMon has a lot of different components which require threads and resources. To optimize the overall throughput, a single component is not allowed to consume all CPU cores. If a component like real-time analysis (RTA) or correlation is facing a resource limit, it cannot grab more cores to get rid of this bottleneck.
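
The second point explains why a server can be bottlenecked even though the overall CPU is not fully utilized. The sketch below illustrates this with invented numbers: the core count, the per-component thread caps, and the demand values are all hypothetical and do not reflect AppMon's actual configuration.

```python
def utilization_report(cores, pool_caps, demand):
    """Return (overall CPU utilization, list of components that hit their cap)."""
    # Each component can use at most as many cores as its thread cap allows.
    used = {name: min(pool_caps[name], demand[name]) for name in pool_caps}
    saturated = [name for name in pool_caps if demand[name] > pool_caps[name]]
    return sum(used.values()) / cores, saturated


# Hypothetical numbers, chosen only to illustrate the effect.
cores = 8
pool_caps = {"correlation": 3, "real_time_analysis": 3, "session_writer": 2}
demand = {"correlation": 5.0, "real_time_analysis": 1.0, "session_writer": 0.5}

overall, saturated = utilization_report(cores, pool_caps, demand)
print(f"overall CPU utilization: {overall:.0%}, saturated: {saturated}")
```

In this made-up scenario the machine shows only about 56% CPU utilization, yet correlation is already at its limit. More cores would allow a larger share per component, which is why the core recommendation applies even when the CPU does not look busy.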

Why is the CPU clock speed also important?

First of all, a higher clock speed means you can compute transactions more quickly. For example, if a transaction requires 10ms to complete on a 2GHz CPU, the same transaction takes roughly 7.7ms on a 2.6GHz CPU, assuming execution time scales inversely with clock speed (10ms x 2.0 / 2.6).

Secondly, the product has a lot of synchronization points. Here a faster CPU can dramatically reduce the execution time in a critical region, which leads to far less wait/sync time for other threads blocked on that region.
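
As a rough back-of-the-envelope check of the clock speed example, the sketch below assumes that execution time scales inversely with clock frequency. This is an idealization: work that is bound by memory or I/O benefits less from a faster clock. The function name is made up for illustration.

```python
def scaled_time_ms(time_ms: float, base_ghz: float, target_ghz: float) -> float:
    """Estimated execution time on a different clock speed, assuming linear scaling."""
    if base_ghz <= 0 or target_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return time_ms * base_ghz / target_ghz


# A transaction that takes 10ms at 2.0GHz, re-estimated for 2.6GHz.
estimate = scaled_time_ms(10, 2.0, 2.6)
print(f"estimated time at 2.6GHz: {estimate:.1f} ms")
```

Under this purely linear assumption, the 10ms transaction comes out at roughly 7.7ms on the 2.6GHz CPU. The same scaling applies to time spent inside a critical region, so threads waiting on that region are released sooner.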

Bare metal vs. virtualization

The previous examples show the product running on a physical machine. AppMon can also run on virtual machines. However, with virtual machines you can tweak settings relating to CPUs, memory, and disks, and these resources can become “over-committed” and shared with other VMs. Even though over-committing resources is usually fine for most software (assuming CPU, memory, and disk metrics are healthy), AppMon requires dedicated resources because the product was designed to be as close to real-time as possible. The previous examples only touch on the components with the most CPU use. There are several other tasks that the AppMon Frontend process and Backend process must complete within milliseconds. See the Virtualization requirements and best practices page for details.

Conclusion

To ensure a stable and well-performing AppMon Server at a given deployment size, it is important to follow our recommendations. Otherwise the AppMon Server cannot perform well under all workloads and scenarios. In addition, when running AppMon in a virtual machine, it is critical to dedicate virtual resources to the VM running the AppMon Server for optimal performance.