Sample Efficient Actor-Critic with Experience Replay (ACER)

Warning

Under construction! This page is still vastly incomplete.

TL;DR

ACER (Actor-Critic with Experience Replay) extends the parallel actor-critic framework of A3C to the off-policy setting, which lets it learn from an experience replay buffer.

Published November 2016 - Influential (251 citations) - arXiv

Summary

"ACER may be understood as the off-policy counterpart of the A3C method."

Key concepts

  • Uses ?? for variance reduction (GAE?)
  • Uses Generalized Advantage Estimation
  • Uses the off-policy Retrace algorithm to compute Q-value targets
  • Uses parallel training, as in A3C
  • Introduces truncated importance sampling with bias correction
  • Introduces stochastic dueling network architectures
  • Introduces an efficient trust region policy optimization method
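As a rough illustration of how Retrace and truncated importance sampling with bias correction fit together, here is a minimal pure-Python sketch for a discrete action space. It is written from a reading of the paper, not from any reference implementation. The function names (`retrace_targets`, `acer_policy_weights`), the argument layout, and the defaults (`gamma=0.99`, `c=10.0`) are hypothetical choices for illustration. The sketch also assumes that V(x) is the π-weighted average of Q(x, ·) and that the behaviour policy μ gives every action non-zero probability. Note the two different truncation thresholds: the Retrace recursion clips the importance ratio at 1, while the policy-gradient term clips it at c.

```python
from typing import List, Sequence


def retrace_targets(
    rewards: Sequence[float],
    q_taken: Sequence[float],
    values: Sequence[float],
    rho: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> List[float]:
    """Backward recursion for the Retrace target Q^ret(x_i, a_i).

    rewards[i]:      r_i
    q_taken[i]:      Q(x_i, a_i) from the critic
    values[i]:       V(x_i) from the critic
    rho[i]:          pi(a_i | x_i) / mu(a_i | x_i)
    bootstrap_value: V(x_k) for the state after the last step (0.0 if terminal)
    """
    q_ret = bootstrap_value
    targets = [0.0] * len(rewards)
    for i in reversed(range(len(rewards))):
        q_ret = rewards[i] + gamma * q_ret
        targets[i] = q_ret
        # Clip the importance ratio at 1 before passing the correction back in time.
        rho_bar = min(1.0, rho[i])
        q_ret = rho_bar * (q_ret - q_taken[i]) + values[i]
    return targets


def acer_policy_weights(
    pi_probs: Sequence[float],
    mu_probs: Sequence[float],
    q_values: Sequence[float],
    action: int,
    q_ret: float,
    c: float = 10.0,
) -> List[float]:
    """Per-action coefficients w[a] such that the policy gradient is
    sum_a w[a] * grad log pi(a | x).

    Hypothetical helper; assumes a discrete action space and mu_probs[a] > 0.
    """
    # Assumption: V(x) is the expectation of Q(x, .) under the current policy pi.
    v = sum(p * q for p, q in zip(pi_probs, q_values))
    rho_all = [p / m for p, m in zip(pi_probs, mu_probs)]
    weights = [0.0] * len(pi_probs)
    # Term 1: importance weight truncated at c, applied to the sampled action only.
    weights[action] += min(c, rho_all[action]) * (q_ret - v)
    # Term 2: bias correction, an expectation over a ~ pi that is non-zero
    # only for actions whose ratio exceeds c, since [1 - c / rho]_+ = 0 otherwise.
    for a, (p, r) in enumerate(zip(pi_probs, rho_all)):
        if r > c:
            weights[a] += p * (1.0 - c / r) * (q_values[a] - v)
    return weights
```

For example, `retrace_targets([1.0, 1.0], [0.5, 0.5], [0.4, 0.4], [1.0, 1.0], 0.0, gamma=1.0)` gives targets of about `[1.9, 1.0]`. With every ratio set to `0.0`, the trace is cut and the first target falls back to the one-step value `1.0 + 0.4 = 1.4`. In an autodiff framework, one would presumably treat the returned weights as constants and differentiate `sum_a w[a] * log pi(a | x)`. Networks, the replay buffer, and the trust-region step are deliberately left out.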

Legacy

  • Used in PPO (?)

Last update: April 9, 2020