Maximum-Entropy Exploration Using Intrinsic Rewards
Presenter: Adam Kamoski
Faculty Sponsor: Rahul Kulkarni
School: UMass Boston
Research Area: Physics
ABSTRACT
The problem of efficient exploration has been a central focus of reinforcement learning for decades, yet learning policies that reliably sample novel states remains an open problem. A common approach uses a maximum-entropy objective to encourage broad coverage of the state space. However, the standard methods used to estimate entropy can be computationally expensive. Recent work in the average-reward setting has derived an analytical expression for the entropy, suggesting a more efficient approach. Building on this result, we determine a reward function that implicitly induces an entropy-maximizing policy. We show that our experimental results agree with the analytical predictions, and we demonstrate the framework's general applicability to the efficient-exploration problem.
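For orientation, a minimal sketch of the kind of objective involved, under a common setup that may differ from the presenters' exact construction: write d_pi for the stationary state distribution induced by a policy pi. In the average-reward setting, an intrinsic reward of the form r(s) = -log d_pi(s) makes the long-run average reward equal the entropy of d_pi, so maximizing average reward implicitly maximizes state entropy.

```latex
% Illustrative maximum-entropy exploration objective: maximize the
% entropy of the stationary state distribution d_pi of policy pi.
\max_{\pi}\; H(d_\pi) \;=\; -\sum_{s \in \mathcal{S}} d_\pi(s)\,\log d_\pi(s)

% One standard intrinsic-reward construction (an assumption here, not
% necessarily the presenters' reward function): r(s) = -\log d_\pi(s).
% Its long-run average reward then equals the entropy itself:
\rho(\pi) \;=\; \sum_{s \in \mathcal{S}} d_\pi(s)\,\bigl(-\log d_\pi(s)\bigr)
           \;=\; H(d_\pi)
```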