RL Insights David Silver at UCL

    David Silver at UCL

    https://www.davidsilver.uk/teaching/

    Lecture 7: Policy Gradient Methods

    https://www.youtube.com/watch?v=KHZVXao4qXs

    https://www.davidsilver.uk/wp-content/uploads/2020/03/pg.pdf

    Why use policy-based methods?

    "The TD error is an unbiased estimate of the advantage function"
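
The quoted claim can be checked numerically. It holds when the TD error is computed with the true value function of the policy: delta = r + gamma * V(s') - V(s), and averaging delta over next states for a fixed (s, a) gives Q(s, a) - V(s) = A(s, a). With an approximate critic the estimate is generally biased. The sketch below is only an illustration: the two-state MDP, the policy `PI`, and all function names are hypothetical and do not come from the lecture.

```python
import random

GAMMA = 0.9

# Hypothetical toy MDP (an assumption for illustration):
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {0: [(0.7, 0, 1.0), (0.3, 1, 0.0)],
        1: [(0.2, 0, 0.0), (0.8, 1, 2.0)]},
    1: {0: [(0.5, 0, -1.0), (0.5, 1, 1.0)],
        1: [(1.0, 0, 0.5)]},
}

# Hypothetical fixed stochastic policy: PI[s][a] = probability of action a in state s.
PI = {0: [0.5, 0.5], 1: [0.9, 0.1]}


def q_from_v(v, s, a):
    """Q(s, a) = E[r + gamma * V(s') | s, a], computed exactly from the model."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])


def evaluate_policy(sweeps=1000):
    """Iterative policy evaluation; converges to the true V of PI since gamma < 1."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        v = {s: sum(PI[s][a] * q_from_v(v, s, a) for a in P[s]) for s in P}
    return v


def mean_td_error(v, s, a, n, rng):
    """Average of n sampled TD errors r + gamma * V(s') - V(s) for a fixed (s, a)."""
    outcomes = P[s][a]
    weights = [p for p, _, _ in outcomes]
    total = 0.0
    for _ in range(n):
        _, s2, r = rng.choices(outcomes, weights=weights)[0]
        total += r + GAMMA * v[s2] - v[s]
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    v = evaluate_policy()
    for s in P:
        for a in P[s]:
            advantage = q_from_v(v, s, a) - v[s]
            estimate = mean_td_error(v, s, a, 50_000, rng)
            print(f"s={s} a={a}  A={advantage:+.3f}  mean TD error={estimate:+.3f}")
```

For each state-action pair, the sampled mean should approach the exact advantage as the number of samples grows. A single TD error is still a noisy sample, so unbiased here does not mean low variance.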


    Last update: April 9, 2020
    Copyright © 2020 Florian Laurent