Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning


Contributors:
  • Kajetan Schweighofer
  • Markus Hofmarcher
  • Philipp Renz
  • Angela Bitto-Nemling
  • Vihang Patil
  • Sepp Hochreiter
NeurIPS Workshop | 2021

In the real world, acting in an environment with a weak policy can be expensive or risky, which hampers real-world applications of reinforcement learning.
Offline Reinforcement Learning (RL) learns policies from a given dataset without interacting with the environment. However, the dataset is the only source of information for an Offline RL algorithm and thus determines the performance of the learned policy. Studies on how dataset characteristics influence different Offline RL algorithms are still lacking. Therefore, we conducted a comprehensive empirical analysis of how dataset characteristics affect the performance of Offline RL algorithms for discrete action environments.

A dataset is characterized by two metrics:
(1) the average dataset return, measured by the Trajectory Quality (TQ)
(2) the coverage of the state-action space, measured by the State-Action Coverage (SACo)

We found that variants of the off-policy Deep Q-Network family require datasets with high SACo to perform well. Algorithms that constrain the learned policy towards the given dataset perform well for datasets with high TQ or SACo. For datasets with high TQ, Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.
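As a rough illustration of how such metrics can be computed from a dataset of trajectories, the sketch below estimates TQ as an average return normalized between a random and a best-known reference policy, and SACo as the number of unique state-action pairs relative to a reference count. The function names, the normalization against `random_return` and `best_return`, and the `reference_unique_pairs` argument are assumptions for illustration and may differ from the exact definitions used in the paper.

```python
import numpy as np

def trajectory_quality(trajectory_returns, random_return, best_return):
    """Average dataset return, normalized between a random and a best reference policy.

    The normalization is an assumption for illustration; the paper's exact
    definition of TQ may differ.
    """
    return (np.mean(trajectory_returns) - random_return) / (best_return - random_return)

def state_action_coverage(dataset, reference_unique_pairs):
    """Number of unique (state, action) pairs relative to a reference count.

    `dataset` is an iterable of (state, action) tuples with hashable states;
    `reference_unique_pairs` is the number of unique pairs in a reference dataset
    (an assumption for illustration).
    """
    unique_pairs = {(s, a) for s, a in dataset}
    return len(unique_pairs) / reference_unique_pairs
```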
