Auxiliary tasks

UNREAL paper

Self-Supervised Learning for RL: Auxiliary tasks such as predicting the future conditioned on the past observation(s) and action(s) (Jaderberg et al., 2016; Shelhamer et al., 2016; van den Oord et al., 2018), and predicting the depth image for maze navigation (Mirowski et al., 2016) are a few representative examples of using auxiliary tasks to improve the sample-efficiency of model-free RL algorithms. The future prediction is either done in a pixel space (Jaderberg et al., 2016) or latent space (van den Oord et al., 2018). The sample-efficiency gains from reconstruction-based auxiliary losses have been benchmarked in Jaderberg et al. (2016); Higgins et al. (2017); Yarats et al. (2019). Contrastive learning across has been used to extract reward signals characterized as distance metrics in the latent space by

-- CURL paper

Last update: April 9, 2020