David Silver at UCL
https://www.davidsilver.uk/teaching/
Lecture 7: Policy Gradient Methods
https://www.youtube.com/watch?v=KHZVXao4qXs
https://www.davidsilver.uk/wp-content/uploads/2020/03/pg.pdf
Why use policy-based methods?
"The TD error is an unbiaised estimate of the advantage function"
Last update: April 9, 2020