Hindsight PRIORs for Reward Learning from Human Preferences

Authors: Mudit Verma, Rin Metcalf Susa

Preference-based Reinforcement Learning (PbRL) has shown great promise in learning from binary human preference feedback on an agent's trajectory behaviors, where one of the major goals is to reduce the amount of human feedback queried. While the binary labels are a direct comment on the goodness of a trajectory behavior, credit assignment still needs to be resolved, especially when feedback is limited. We propose PRIor On Rewards (PRIOR), which learns a forward dynamics world model to approximate a priori selective attention over states; this attention serves as a means of performing credit assignment over a given trajectory. Further, we propose an auxiliary objective that redistributes the total predicted return according to these PRIORs as a simple yet effective means of improving reward learning performance. Our experiments on six robot-manipulation and three locomotion PbRL benchmarks demonstrate PRIOR's significant improvements in feedback-sample efficiency and reward recovery. Finally, we present extensive ablations that study our design decisions and the ease of using PRIOR with existing PbRL methods.
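
To make the auxiliary objective concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the per-step prior is already available as a non-negative weight vector (in the abstract it comes from the forward dynamics world model), that preferences are scored with the standard Bradley-Terry cross-entropy over predicted returns, and that the two terms are mixed by a hypothetical coefficient `lam`. All function names, the squared-error form of the auxiliary term, and `lam` are illustrative assumptions.

```python
import numpy as np


def preference_loss(r_hat_a, r_hat_b, label):
    """Bradley-Terry cross-entropy on predicted returns.

    label = 1.0 means segment A is preferred, 0.0 means segment B.
    """
    ret_a, ret_b = r_hat_a.sum(), r_hat_b.sum()
    # log of the softmax normalizer, computed stably
    log_norm = np.logaddexp(ret_a, ret_b)
    log_p_a = ret_a - log_norm
    log_p_b = ret_b - log_norm
    return -(label * log_p_a + (1.0 - label) * log_p_b)


def prior_redistribution_loss(r_hat, prior):
    """Assumed auxiliary term: each step's predicted reward should match
    its prior-weighted share of the total predicted return."""
    weights = prior / prior.sum()      # normalize the prior over steps
    target = weights * r_hat.sum()     # redistribute the predicted return
    return np.mean((r_hat - target) ** 2)


def total_loss(r_hat_a, r_hat_b, label, prior_a, prior_b, lam=0.5):
    """Hypothetical convex mix of the preference and auxiliary terms."""
    aux = (prior_redistribution_loss(r_hat_a, prior_a)
           + prior_redistribution_loss(r_hat_b, prior_b))
    return lam * preference_loss(r_hat_a, r_hat_b, label) + (1.0 - lam) * aux


# Example: two 3-step segments with predicted rewards and priors.
r_a, r_b = np.array([0.1, 0.8, 0.1]), np.array([0.3, 0.3, 0.3])
p_a, p_b = np.array([0.1, 0.8, 0.1]), np.array([1.0, 1.0, 1.0])
loss = total_loss(r_a, r_b, label=1.0, prior_a=p_a, prior_b=p_b)
```

The sketch only evaluates loss values. In training, `r_hat` would come from a differentiable reward model, and whether gradients flow through the redistributed target is a design choice that this example does not settle.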
