# Policy Gradient Methods
Part of the family of policy-based methods.
> The existing variants applicable to both continuous and discrete domains, such as the on-policy asynchronous advantage actor critic (A3C) of Mnih et al. (2016), are sample inefficient.
>
> -- ACER (Wang et al., 2017)
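For reference, the quantity all of these methods estimate is the score-function form of the policy gradient (a standard statement, assuming an episodic objective $J(\theta)$ and a stochastic policy $\pi_\theta$):

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a)\right]
$$

REINFORCE replaces $Q^{\pi_\theta}(s, a)$ with the sampled return, while actor-critic methods such as A3C replace it with a learned advantage estimate to reduce variance.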
## Links
- The best explanation of policy gradient is probably the lecture from David Silver at UCL: https://www.youtube.com/watch?v=KHZVXao4qXs
- This post highlights how policy gradient can be seen as a way to do supervised learning without a true label: https://amoudgl.github.io/blog/policy-gradient/ (see the sketch below)
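To make that view concrete, here is a minimal sketch (my own toy example, not taken from the post) of REINFORCE on a hypothetical two-armed bandit in plain NumPy. The update is a reward-weighted log-likelihood (cross-entropy-style) step toward the sampled action, with the reward standing in for the missing label:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: arm 1 pays more on average.
TRUE_MEANS = np.array([0.2, 0.8])

theta = np.zeros(2)  # logits of the softmax policy


def softmax(x):
    z = np.exp(x - x.max())  # shift for numerical stability
    return z / z.sum()


for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)          # sample an action from pi_theta
    r = rng.normal(TRUE_MEANS[a], 0.1)  # environment reward; no "true label"

    # Score function: grad of log pi(a) w.r.t. the logits = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    # REINFORCE step (no baseline): ascend r * log pi(a), i.e. a
    # reward-weighted cross-entropy update toward the sampled action.
    theta += 0.05 * r * grad_log_pi

print("learned policy:", softmax(theta))  # should strongly favor arm 1
```

If the reward were a fixed 0/1 label indicating the "correct" action, this would reduce to ordinary supervised cross-entropy training; policy gradient simply lets the (noisy, delayed) reward play that weighting role instead.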