Sample Efficient Actor-Critic with Experience Replay (ACER)

Warning

Under construction! This page is still vastly incomplete.

Tldr

ACER (Actor-Critic with Experience Replay) extends the parallel actor-critic training described in A3C to the off-policy setting, so that past experience can be replayed and reused.

Published November 2016 - Influential (251 citations) - arXiv

Summary

"ACER may be understood as the off-policy counterpart of the A3C method."

Key concepts

Uses ?? for variance reduction (GAE?)
Uses Generalized Advantage Estimation
Uses the off-policy Retrace algorithm
Uses parallel training as in A3C
Introduces truncated importance sampling with bias correction
Introduces stochastic dueling network architectures
Introduces efficient trust region policy optimization
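
To make the Retrace and truncated importance sampling items above more concrete, here is a minimal sketch of how Retrace-style Q-value targets could be computed backwards over one replayed trajectory. It is an illustration under stated assumptions, not the reference implementation: the function name `retrace_targets`, its argument names, and the default truncation level `c=1.0` are all hypothetical choices made for this example. The inputs are assumed to be the per-step rewards, the critic's estimates Q(x_t, a_t) for the actions actually taken, the state values V(x_t), and the importance ratios ρ_t = π(a_t|x_t) / μ(a_t|x_t) between the current policy π and the behaviour policy μ that generated the data.

```python
import numpy as np


def retrace_targets(rewards, q_taken, values, rho, bootstrap_value,
                    gamma=0.99, c=1.0):
    """Compute Retrace-style targets for one trajectory (illustrative sketch).

    rewards[t]      : reward received after taking a_t in x_t
    q_taken[t]      : critic estimate Q(x_t, a_t) for the action taken
    values[t]       : critic estimate V(x_t)
    rho[t]          : importance ratio pi(a_t | x_t) / mu(a_t | x_t)
    bootstrap_value : V of the state after the last step (0.0 if terminal)
    c               : truncation level for the importance ratios (assumed 1.0)
    """
    rewards = np.asarray(rewards, dtype=float)
    q_taken = np.asarray(q_taken, dtype=float)
    values = np.asarray(values, dtype=float)
    # Truncated importance weights: rho_bar_t = min(c, rho_t).
    rho_bar = np.minimum(c, np.asarray(rho, dtype=float))

    targets = np.zeros_like(rewards)
    q_ret = float(bootstrap_value)
    # Walk the trajectory backwards, as in an n-step return computation.
    for t in reversed(range(len(rewards))):
        q_ret = rewards[t] + gamma * q_ret
        targets[t] = q_ret
        # Off-policy correction: keep only a truncated share of the
        # difference between the return and the critic, then fall back
        # on the critic's state value.
        q_ret = rho_bar[t] * (q_ret - q_taken[t]) + values[t]
    return targets
```

Two limiting cases help check the intuition. When every ratio is zero (the current policy would never have taken the replayed actions), the targets collapse to one-step bootstrapped targets r_t + γ V(x_{t+1}). When every ratio is at least `c = 1` and Q(x_t, a_t) equals V(x_t), the targets are the ordinary discounted returns. Truncating the ratios keeps the correction bounded, which is the variance-control role the list above attributes to truncated importance sampling.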

Legacy

Used in PPO (?)

Last update: April 9, 2020