5  Introduction

Reinforcement learning is simultaneously

  1. a problem,
  2. a class of solution methods for that problem, and
  3. the field that studies this problem and these methods.

The problem of reinforcement learning can be formulated as a Markov decision process.
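
As a minimal sketch of that formulation, a finite Markov decision process can be written down as a set of states, a set of actions, transition probabilities with rewards, and a discount factor. The Python below is illustrative only: the `MDP` class, the `discounted_return` helper, the two-state `toy` example, and all of its numbers are hypothetical and are not taken from the text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """A finite Markov decision process (illustrative container, not a library API)."""
    states: tuple
    actions: tuple
    # transitions[(state, action)] is a list of (probability, next_state, reward)
    transitions: dict
    gamma: float  # discount factor, 0 <= gamma <= 1

    def step(self, state, action, rng):
        """Sample (next_state, reward) for taking `action` in `state`."""
        outcomes = self.transitions[(state, action)]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        return next_state, reward


# Hypothetical two-state example; every number here is made up.
toy = MDP(
    states=("low", "high"),
    actions=("wait", "work"),
    transitions={
        ("low", "wait"): [(1.0, "low", 0.0)],
        ("low", "work"): [(0.8, "high", 1.0), (0.2, "low", 0.0)],
        ("high", "wait"): [(1.0, "high", 0.5)],
        ("high", "work"): [(0.9, "high", 2.0), (0.1, "low", 0.0)],
    },
    gamma=0.9,
)


def discounted_return(mdp, policy, start, steps, seed=0):
    """Run the agent-environment loop for `steps` steps and sum discounted rewards."""
    rng = random.Random(seed)
    state, total, discount = start, 0.0, 1.0
    for _ in range(steps):
        action = policy(state)                        # agent chooses an action
        state, reward = mdp.step(state, action, rng)  # environment responds
        total += discount * reward
        discount *= mdp.gamma
    return total


always_wait = lambda state: "wait"
print(discounted_return(toy, always_wait, "high", steps=3))
```

Under the policy that always chooses `wait` from state `high`, every transition in this toy example is deterministic with reward 0.5, so three steps give 0.5 · (1 + 0.9 + 0.81) = 1.355, up to floating-point rounding. The point of the sketch is only the shape of the interaction: the agent picks an action, the environment returns a next state and a reward, and the objective is the discounted sum of those rewards.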

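Reinforcement learning differs from both supervised and unsupervised learning. It learns from a reward signal obtained by interacting with an environment, not from labeled examples (as in supervised learning) and not by searching for hidden structure in unlabeled data (as in unsupervised learning).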

The features of reinforcement learning include:

The elements of reinforcement learning include:

The limitations of reinforcement learning include:

Exercise 5.1 (Self-Play) Suppose, instead of playing against a random opponent, the reinforcement learning algorithm described above played against itself, with both sides learning. What do you think would happen in this case? Would it learn a different policy for selecting moves?

Exercise 5.2 (Symmetries) Many tic-tac-toe positions appear different but are really the same because of symmetries. How might we amend the learning process described above to take advantage of this? In what ways would this change improve the learning process? Now think again. Suppose the opponent did not take advantage of symmetries. In that case, should we? Is it true, then, that symmetrically equivalent positions should necessarily have the same value?