Analyzing and Improving Floating-Point Compression for Application Monitoring Time Series
Contributors:
- Markus Weninger
High-volume telemetry from application monitoring systems presents significant data-storage and processing challenges. Lossless floating-point compressors (e.g., Gorilla, Chimp, and Elf) are effective, but their performance is typically evaluated on academic benchmarks from domains such as finance or meteorology. We show that these benchmarks do not reflect application monitoring data, which is often highly repetitive and contains long sequences of non-varying values. This "dataset bias" leads to suboptimal performance whenever production data deviates from an algorithm's core assumptions. This paper tackles this research-to-practice gap with three contributions. First, we analyze a large-scale application monitoring dataset, quantifying the dataset bias and identifying high redundancy as its key characteristic. Second, we propose a multi-layered compression strategy that combines record-level run-length encoding (RLE) with XOR-based compressors, improving compression ratio by ≈23% and yielding speedups of ≈34% over standalone algorithms. Third, we develop a novel, lightweight heuristic that dynamically selects a compressor for each data block based on its statistical profile. This adaptive approach achieves near-optimal compression (within 2% of the optimal ratio) and improves compression ratio by a further 13% or more over the best single RLE-enhanced compressor, with negligible computational overhead. We conclude that data-driven, adaptive strategies are superior to universal algorithms for compressing domain-specific monitoring data.
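To make the second contribution concrete, the following minimal Python sketch shows one plausible way to layer run-length encoding in front of the XOR step that Gorilla-style compressors rely on. It is an illustration under stated assumptions, not the paper's implementation: the function names (`float_bits`, `encode`, `decode`) and the `(residual, run_length)` output layout are hypothetical, runs are formed over single values rather than the paper's records, and the bit-packing of residuals that real compressors perform is omitted.

```python
import struct
from itertools import groupby


def float_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(bits: int) -> float:
    """Inverse of float_bits."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def encode(values):
    """Hypothetical two-layer encoder: RLE first, then XOR between run values.

    Returns a list of (xor_residual, run_length) pairs. A real XOR-based
    compressor would go on to bit-pack the residuals; that step is left out.
    """
    encoded = []
    prev_bits = 0  # assumption: the first value is XORed against zero
    for bits, group in groupby(float_bits(v) for v in values):
        run_length = sum(1 for _ in group)
        encoded.append((bits ^ prev_bits, run_length))
        prev_bits = bits
    return encoded


def decode(encoded):
    """Rebuild the original sequence from (xor_residual, run_length) pairs."""
    values = []
    prev_bits = 0
    for residual, run_length in encoded:
        bits = residual ^ prev_bits
        values.extend([bits_to_float(bits)] * run_length)
        prev_bits = bits
    return values


# A repetitive series, as is typical for monitoring data per the abstract.
series = [0.0, 0.0, 0.0, 1.5, 1.5, 2.0]
pairs = encode(series)
restored = decode(pairs)
```

In this sketch six samples collapse to three pairs, so the XOR stage only has to process one value per run. That is the intuition behind combining the two layers on data with long sequences of non-varying values.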