On the Theory of Offline Reinforcement Learning: Data Diversity, Posterior Sampling and Beyond



Thanh Nguyen-Tang (Johns Hopkins University)

Thanh Nguyen-Tang is a postdoctoral research fellow at the Department of Computer Science, Johns Hopkins University, US. His research focuses on characterizing the statistical and computational aspects of machine learning, with main topics including reinforcement learning, robust machine learning, and transfer learning. He completed his PhD at Deakin University, Australia, in 2022.



Short Abstract: We seek to understand what enables sample-efficient learning from historical datasets for sequential decision-making, commonly known as offline reinforcement learning (RL), in the context of (value) function approximation, and which algorithms guarantee sample efficiency. In this paper, we extend our understanding of these important questions by (i) proposing a notion of data diversity that subsumes previous notions of coverage measures in offline RL and (ii) using this notion to study three distinct classes of offline RL algorithms based on version spaces (VS), regularized optimization (RO), and posterior sampling (PS). We establish that, under standard assumptions, VS-based, RO-based, and PS-based algorithms achieve comparable sample efficiency, which recovers the state-of-the-art bounds when specialized to the finite function class and linear model cases. This is quite surprising, given that prior work showed an unfavourable sample complexity for the RO-based algorithm compared to the VS-based algorithm, whereas PS was rarely considered in offline RL due to its explorative nature. Notably, the (model-free) PS-based algorithm we consider is a novel method that we propose.
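
As brief background on the coverage measures mentioned in the abstract (a standard illustration from the offline RL literature, not the paper's own data-diversity definition), a classical coverage notion is the single-policy concentrability coefficient, sketched in LaTeX below; data-diversity notions of the kind studied in the paper are designed to subsume such measures.

% A minimal sketch (background only): the single-policy concentrability
% coefficient, a standard coverage measure in offline RL.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
  C^{\pi} \;=\; \sup_{(s,a)} \frac{d^{\pi}(s,a)}{\mu(s,a)},
\]
where $d^{\pi}(s,a)$ denotes the state-action occupancy measure of a
comparator policy $\pi$ and $\mu(s,a)$ denotes the data-generating
(behavior) distribution of the offline dataset; a finite $C^{\pi}$
means the dataset adequately covers the states and actions visited by $\pi$.
\end{document}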