Sample Efficient Actor-Critic with Experience Replay (ACER)
Warning
Under construction! This page is still largely incomplete.
Tldr
ACER (Actor-Critic with Experience Replay) extends the parallel actor-critic training scheme of A3C to the off-policy setting.
Published November 2016 - Influential (251 citations) - arXiv
Summary
"ACER may be understood as the off-policy counterpart of the A3C method."
Key concepts
- Uses the off-policy Retrace algorithm for low-variance, stable action-value estimation (see the sketch after this list)
- Uses parallel training with multiple actor-learners, as in A3C
- Introduces truncated importance sampling with bias correction (sketched below)
- Introduces stochastic dueling network architectures for continuous control (sketched below)
- Introduces an efficient trust region policy optimization scheme (sketched below)
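
To make the Retrace bullet concrete, here is a minimal NumPy sketch of how the Q^ret target can be computed backward over one replayed trajectory. The function name, array layout, and the omission of terminal-state masking are simplifications of my own, not the paper's code.

```python
import numpy as np

def retrace_targets(rewards, q_sampled, values, rho, gamma=0.99, c=1.0):
    """Backward recursion for the Retrace target Q^ret over one trajectory.

    rewards[t]    reward r_t
    q_sampled[t]  critic estimate Q(x_t, a_t)
    values[t]     V(x_t); length T + 1, the last entry bootstraps the tail
    rho[t]        importance ratio pi(a_t | x_t) / mu(a_t | x_t)
    c             truncation constant for the Retrace weights
    """
    T = len(rewards)
    q_ret = np.zeros(T)
    running = values[-1]  # bootstrap Q^ret from V(x_T)
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        q_ret[t] = running
        # truncated importance weight keeps the recursion low variance
        rho_bar = min(c, rho[t])
        running = rho_bar * (q_ret[t] - q_sampled[t]) + values[t]
    return q_ret
```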
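The truncated importance sampling bullet can likewise be sketched for a single discrete-action step. Following the paper's notation (rho_bar = min{c, rho}, with c = 10 as a typical choice), the illustrative function below returns the weight on the sampled action's score function plus the per-action weights of the bias-correction term; it is not the authors' code.

```python
import numpy as np

def acer_policy_weights(pi, mu, q, v, a_t, q_ret, c=10.0):
    """Weights for ACER's truncated-importance-sampling gradient estimator.

    pi, mu  current and behaviour policy probabilities over actions, shape [A]
    q       critic estimates Q(x_t, .), shape [A]
    v       state value V(x_t)
    a_t     index of the action actually taken under mu
    q_ret   Retrace target for (x_t, a_t)
    """
    rho = pi / (mu + 1e-8)      # importance ratios rho_t(a)
    rho_bar = min(c, rho[a_t])  # truncation bounds the variance
    # main term: multiplies grad log pi(a_t | x_t)
    main_weight = rho_bar * (q_ret - v)
    # bias-correction term: expectation under pi, active only where rho > c,
    # which compensates for the truncation above
    correction_weights = pi * np.maximum(0.0, 1.0 - c / rho) * (q - v)
    return main_weight, correction_weights
```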
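For continuous actions, the stochastic dueling network estimates Q by combining the state value with advantages averaged over actions sampled from the current policy. A tiny sketch, with assumed names:

```python
import numpy as np

def stochastic_dueling_q(v, adv_taken, adv_sampled):
    """Q~(x_t, a_t) = V(x_t) + A(x_t, a_t) - (1/n) * sum_i A(x_t, u_i),
    where the u_i are drawn from the current policy pi(. | x_t).

    v            scalar V(x_t)
    adv_taken    A(x_t, a_t) for the action being evaluated
    adv_sampled  A(x_t, u_i) for n sampled actions, shape [n]
    """
    return v + adv_taken - np.mean(adv_sampled)
```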
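The efficient trust region step has a closed form: the proposed gradient g (taken with respect to the policy's distribution statistics) is projected onto the constraint k^T z <= delta, where k is the gradient of the KL divergence from the average policy to the current policy. A hedged sketch of that projection (the function name is assumed):

```python
import numpy as np

def trust_region_projection(g, k, delta):
    """Solve  min_z 0.5 * ||g - z||^2  s.t.  k^T z <= delta  in closed form:
    z* = g - max(0, (k^T g - delta) / ||k||^2) * k.

    g      ACER loss gradient w.r.t. the policy distribution statistics
    k      gradient of KL(average policy || current policy) w.r.t. the same
    delta  trust region radius
    """
    scale = max(0.0, (np.dot(k, g) - delta) / (np.dot(k, k) + 1e-8))
    return g - scale * k
```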
Legacy
- Served as a comparison baseline in the PPO paper's Atari experiments