Policy Gradient Methods

Policy gradient methods are one family of policy-based methods.

The existing variants applicable to both continuous and discrete domains, such as the on-policy asynchronous advantage actor critic (A3C) of Mnih et al. (2016), are sample inefficient.

-- ACER paper (Wang et al., 2017, "Sample Efficient Actor-Critic with Experience Replay")
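
One way to read "sample inefficient" in the quote above: an on-policy method uses each sampled transition for a single gradient step and then discards it, whereas ACER adds experience replay so samples can be reused. The sketch below is not A3C or ACER. It is a minimal on-policy policy gradient (REINFORCE with a running-average baseline) on a hypothetical two-armed bandit, written only to show the "sample once, update once, discard" pattern. The arm means, learning rate, and function names are assumptions made for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_on_policy_bandit(steps=2000, lr=0.1, seed=0):
    """On-policy policy gradient on a toy 2-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    true_means = [0.2, 0.8]   # hypothetical mean reward of each arm
    prefs = [0.0, 0.0]        # policy parameters (softmax preferences)
    baseline = 0.0            # running average reward, reduces variance
    samples_used = 0

    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # Sample an action from the CURRENT policy (this is what makes it on-policy).
        action = 0 if rng.random() < probs[0] else 1
        reward = rng.gauss(true_means[action], 0.1)
        samples_used += 1

        # For a softmax policy, d/d prefs[a] log pi(action) = 1{a == action} - probs[a].
        advantage = reward - baseline
        for a in range(len(prefs)):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * advantage * grad_log_pi

        # Update the baseline; the sample itself is never reused after this point.
        baseline += (reward - baseline) / t

    return softmax(prefs), samples_used


if __name__ == "__main__":
    final_probs, n = run_on_policy_bandit()
    print("final action probabilities:", final_probs)
    print("environment samples consumed:", n)
```

Every update here costs one fresh environment sample, so the number of gradient steps can never exceed the number of samples collected. A replay-based method keeps past samples in a buffer and corrects for the policy mismatch when reusing them, which is the direction ACER takes.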

David Silver's lecture on policy gradient (PG) methods: https://www.youtube.com/watch?v=KHZVXao4qXs


Last update: April 9, 2020