Welcome to Reinforcement Learning and Dynamic Programming


An intelligent system is expected to generate policies autonomously in order to achieve a goal, typically to maximize a given reward function or minimize a given cost function. Reinforcement learning is a set of machine learning methods that can produce such policies. To learn optimal actions in an environment that it does not fully know, an intelligent system can use reinforcement learning algorithms to derive optimal policies from its own experience. Reinforcement learning techniques have been applied successfully in various engineering fields, including robotics (DeepMind’s walking robot) and game playing (AlphaGo and TD-Gammon). Developed independently of reinforcement learning, dynamic programming is a set of algorithms in optimal control theory that generate policies under the assumption that the environment is fully known to the intelligent system. Dynamic programming therefore provides an essential foundation for studying reinforcement learning. The course aims to build a fundamental understanding of both methods, based on their close relationship to each other and on their applications to similar problems.

The course consists of the following topics:

• a short introduction to machine learning
• an introduction to Markov decision processes
• basics of dynamic programming
• basics of reinforcement learning
• approximate dynamic programming for high-dimensional problems
• approximate reinforcement learning for high-dimensional problems
• an introduction to quantum decision models

Throughout the course, the following textbook will be used:

Reinforcement Learning and Dynamic Programming Using Function Approximators, Lucian Busoniu, Robert Babuska, Bart De Schutter, Damien Ernst, CRC Press, 2010

MATLAB code for the numerical applications in the book is available online at http://rlbook.busoniu.net/.

The goal of this course is to enable doctoral students to apply reinforcement learning and dynamic programming techniques to high-dimensional machine learning problems.


Keywords: machine learning, reinforcement learning, dynamic programming, Markov decision processes, policy iteration, value iteration, policy search, approximate reinforcement learning, approximate dynamic programming, approximate policy iteration, approximate value iteration, approximate policy search, quantum decision theory.


Prerequisite: mathematics at the level of a Master's degree in science or engineering

Organizers: Rafael Wisniewski, Øzkan Karabacak, Zheng-Hua Tan



Time: September 30th to October 4th, 2019


Zip code: 


Number of seats:

Deadline: September 9th, 2019