
David Silver at UCL

https://www.davidsilver.uk/teaching/

Lecture 7: Policy Gradient Methods

https://www.youtube.com/watch?v=KHZVXao4qXs

https://www.davidsilver.uk/wp-content/uploads/2020/03/pg.pdf

Why use policy-based methods?

"The TD error is an unbiased estimate of the advantage function"
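
A short derivation may help show why the quoted statement holds. It is a sketch under one key assumption: the TD error is computed with the true value function $V^{\pi}$, not a learned approximation. The symbols ($\delta^{\pi}$ for the TD error, $\gamma$ for the discount factor, $Q^{\pi}$ and $A^{\pi}$ for the action-value and advantage functions) follow standard notation and are not defined elsewhere in these notes.

```latex
\begin{align*}
\delta^{\pi} &= r + \gamma V^{\pi}(s') - V^{\pi}(s) \\
\mathbb{E}_{\pi}\!\left[\delta^{\pi} \mid s, a\right]
  &= \mathbb{E}_{\pi}\!\left[r + \gamma V^{\pi}(s') \mid s, a\right] - V^{\pi}(s) \\
  &= Q^{\pi}(s, a) - V^{\pi}(s) \\
  &= A^{\pi}(s, a)
\end{align*}
```

If $V^{\pi}$ is replaced by an approximation, the second step no longer yields $Q^{\pi}(s, a)$ exactly, so the TD error is in general a biased estimate of the advantage.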

Last update: April 9, 2020