Advanced AI: Deep Reinforcement Learning in Python

5 Hours

52 Lessons (5h)

  • Introduction and Logistics
    Introduction and Outline (9:57)
    Where to get the Code (3:14)
    How to Succeed in this Course (8:45)
  • Background Review
    Review Intro (2:41)
    Review of Markov Decision Processes (7:47)
    Review of Dynamic Programming (4:12)
    Review of Monte Carlo Methods (3:55)
    Review of Temporal Difference Learning (4:41)
    Review of Approximation Methods for Reinforcement Learning (2:19)
    Review of Deep Learning (6:47)
  • OpenAI Gym and Basic Reinforcement Learning Techniques
    OpenAI Gym Tutorial (5:43)
    Random Search (5:48)
    Saving a Video (2:18)
    CartPole with Bins (Theory) (3:51)
    CartPole with Bins (Code) (6:25)
    RBF Neural Networks
    RBF Networks with Mountain Car (Code) (5:28)
    RBF Networks with CartPole (Theory) (1:54)
    RBF Networks with CartPole (Code) (3:11)
    Theano Warmup (3:04)
    Tensorflow Warmup (2:25)
    Plugging in a Neural Network (3:39)
    OpenAI Gym Section Summary (3:28)
  • TD Lambda
    N-Step Methods (3:14)
    N-Step in Code (3:40)
    TD Lambda (7:36)
    TD Lambda in Code (3:00)
    TD Lambda Summary (2:21)
  • Policy Gradients
    Policy Gradient Methods (11:38)
    Policy Gradient in TensorFlow for CartPole (7:19)
    Policy Gradient in Theano for CartPole (4:14)
    Continuous Action Spaces (4:16)
    Mountain Car Continuous Specifics (4:12)
    Mountain Car Continuous Theano (7:31)
    Mountain Car Continuous Tensorflow (8:07)
    Mountain Car Continuous Tensorflow (v2) (6:11)
    Mountain Car Continuous Theano (v2) (7:31)
    Policy Gradient Section Summary (1:36)
  • Deep Q-Learning
    Deep Q-Learning Intro (3:52)
    Deep Q-Learning Techniques (9:13)
    Deep Q-Learning in Tensorflow for CartPole (5:09)
    Deep Q-Learning in Theano for CartPole (4:48)
    Additional Implementation Details for Atari (5:36)
    Deep Q-Learning in Tensorflow for Breakout (5:58)
    Deep Q-Learning in Theano for Breakout (6:42)
    Partially Observable MDPs (4:52)
    Deep Q-Learning Section Summary (4:45)
    Course Summary (4:57)
  • Appendix
    Environment Setup (17:32)
    How to Code by Yourself (part 1) (15:54)
    How to Code by Yourself (part 2) (9:23)
    Where to get Udemy coupons and FREE deep learning material (2:20)

The Complete Guide to Mastering AI Using Deep Learning & Neural Networks

Lazy Programmer

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These interfaces help non-verbal and non-mobile people communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and has built various high-throughput web services around that data. He has created new big data pipelines using Hadoop/Pig/MapReduce. He has also built machine learning models to predict click-through rate and news feed recommender systems, using linear regression, Bayesian Bandits, and collaborative filtering, and validated the results using A/B testing.

He has taught data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics to undergraduate and graduate students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (JavaScript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.


This course is all about the application of deep learning and neural networks to reinforcement learning. The combination of deep learning with reinforcement learning led to AlphaGo beating a world champion in the strategy game Go, has contributed to progress on self-driving cars, and has produced machines that can play video games at a superhuman level. Unlike supervised and unsupervised learning algorithms, reinforcement learning agents have an impetus: they want to reach a goal. In this course, you'll work with more complex environments, specifically those provided by the OpenAI Gym.

  • Access 52 lectures & 5 hours of content 24/7
  • Extend your knowledge of temporal difference learning by looking at the TD Lambda algorithm
  • Explore a special type of neural network called the RBF network
  • Look at the policy gradient method
  • Examine Deep Q-Learning
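
To give a flavor of the TD Lambda topic listed above, here is a minimal sketch of tabular TD(lambda) with accumulating eligibility traces. It is not taken from the course code: the function name `td_lambda_random_walk`, the 5-state random-walk task, and all hyperparameter values are assumptions chosen only for illustration, and it uses nothing beyond the Python standard library.

```python
import random


def td_lambda_random_walk(episodes=2000, alpha=0.05, gamma=1.0, lam=0.8, seed=0):
    """Estimate state values on a toy 5-state random walk with TD(lambda).

    Hypothetical illustration: the agent starts in the middle state and moves
    left or right with equal probability. Stepping off the right end gives a
    reward of 1, stepping off the left end gives 0, and both end the episode.
    """
    rng = random.Random(seed)
    n_states = 5
    values = [0.0] * n_states            # one value estimate per non-terminal state

    for _ in range(episodes):
        traces = [0.0] * n_states        # eligibility traces are reset every episode
        state = n_states // 2            # start in the middle state

        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            next_value = 0.0 if done else values[next_state]

            # One-step TD error for the transition just taken.
            td_error = reward + gamma * next_value - values[state]

            # Accumulating trace: mark the current state as recently visited.
            traces[state] += 1.0

            # Credit every state in proportion to its trace, then decay the traces.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam

            if done:
                break
            state = next_state

    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

For this toy task the exact values are 1/6, 2/6, ..., 5/6 from left to right, so the estimates should increase from the leftmost state to the rightmost. Setting `lam=0.0` reduces the update to one-step TD, while values closer to 1 spread each TD error further back along the visited states.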


Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but you are expected to know calculus, probability, object-oriented programming, Python, NumPy, linear regression, and gradient descent; how to build feedforward and convolutional neural networks in Theano and TensorFlow; Markov Decision Processes; and how to implement Dynamic Programming, Monte Carlo, and Temporal Difference methods
  • All code for this course is available for download here, in the directory rl2
  • Internet required
  • Unredeemed licenses can be returned for store credit within 30 days of purchase. Once your license is redeemed, all sales are final.