6 Introduction

Reinforcement learning is simultaneously:

a problem,
a class of solution methods, and
the field that studies this problem and these methods.

The problem of reinforcement learning can be formulated as a Markov decision process.
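
As a minimal sketch of what such a formulation looks like, the snippet below encodes a small Markov decision process as a table of transitions and runs an agent-environment loop on it. The two states, the action names, the probabilities, and the rewards are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical two-state MDP, for illustration only.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 0.0)],
        "work": [(0.8, "high", 2.0), (0.2, "low", 2.0)],
    },
}


def step(state, action, rng):
    """Sample (next_state, reward) from the transition distribution."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def random_policy(state, rng):
    """Pick uniformly among the actions available in this state."""
    return rng.choice(sorted(TRANSITIONS[state]))


def run_episode(policy, start="high", steps=10, seed=0):
    """Let the agent interact for a fixed number of steps; return total reward."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for _ in range(steps):
        action = policy(state, rng)
        state, reward = step(state, action, rng)
        total += reward
    return total


if __name__ == "__main__":
    print(run_episode(random_policy))
```

The next state and reward depend only on the current state and action, which is the Markov property that the formulation relies on.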

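Reinforcement learning differs from supervised learning, which learns from labeled examples, and from unsupervised learning, which looks for hidden structure in unlabeled data: it learns from interaction in order to maximize a reward signal.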

The features of reinforcement learning include:

the trade-off between exploration and exploitation
explicit consideration of the whole problem of a goal-directed agent interacting with an uncertain environment
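
The exploration-exploitation trade-off can be illustrated with an epsilon-greedy rule on a multi-armed bandit. This is only one simple way to balance the two, not the only one. The function name, the Gaussian reward noise, and the default parameters below are assumptions made for the sketch.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Estimate arm values while choosing arms with an epsilon-greedy rule.

    true_means are hypothetical expected rewards, unknown to the agent.
    Returns (estimates, counts) after `steps` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # current value estimate per arm
    counts = [0] * n_arms       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            # exploit: pick the arm that currently looks best
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # assumed unit-variance noise
        counts[arm] += 1
        # incremental sample-average update of the estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = epsilon_greedy_bandit([0.0, 0.5, 1.0])
    print(estimates, counts)
```

With epsilon set to 0 the agent only exploits and may lock onto a poor arm. With epsilon set to 1 it only explores and never uses what it has learned.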

The elements of reinforcement learning include:

policy
reward signal
value function
model of the environment (optional)
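
To make the four elements concrete, here is a small sketch on a hypothetical five-state corridor, where the agent starts at state 0 and the goal is state 4. The corridor, the function names, and the parameter values are assumptions for illustration. The update is a simple undiscounted temporal-difference rule, and the real environment is assumed to behave exactly as the model predicts.

```python
import random

N_STATES = 5         # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (+1, -1)   # move right or left; ties are broken toward the first entry


def model(state, action):
    """Model of the environment: predicts the next state (deterministic here)."""
    return min(max(state + action, 0), N_STATES - 1)


def reward_signal(next_state):
    """Reward signal: +1 on reaching the goal, 0 otherwise."""
    return 1.0 if next_state == N_STATES - 1 else 0.0


def policy(state, values, epsilon, rng):
    """Policy: mostly greedy one-step lookahead using the model and the values."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)

    def lookahead(action):
        nxt = model(state, action)
        return reward_signal(nxt) + values[nxt]

    return max(ACTIONS, key=lookahead)


def learn(episodes=200, alpha=0.1, epsilon=0.1, seed=0):
    """Learn a value function: estimated total future reward from each state."""
    rng = random.Random(seed)
    values = [0.0] * N_STATES  # the goal state keeps value 0 (nothing follows it)
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = policy(state, values, epsilon, rng)
            next_state = model(state, action)  # assumed to match the real dynamics
            target = reward_signal(next_state) + values[next_state]
            values[state] += alpha * (target - values[state])
            state = next_state
    return values


if __name__ == "__main__":
    print(learn())
```

The model is the optional part. A model-free variant would drop the lookahead in `policy` and learn action values from experienced transitions instead.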

The limitations of reinforcement learning include:

reinforcement learning methods rely heavily on the concept of state
evolutionary methods can be effective when the space of policies is small, or is structured so that good policies are common or easy to find

Exercise 6.1 (Self-Play) Suppose, instead of playing against a random opponent, the reinforcement learning algorithm described above played against itself, with both sides learning. What do you think would happen in this case? Would it learn a different policy for selecting moves?

Exercise 6.2 (Symmetries) Many tic-tac-toe positions appear different but are really the same because of symmetries. How might we amend the learning process described above to take advantage of this? In what ways would this change improve the learning process? Now think again. Suppose the opponent did not take advantage of symmetries. In that case, should we? Is it true, then, that symmetrically equivalent positions should necessarily have the same value?
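
As a starting point for thinking about Exercise 6.2, the sketch below shows one possible way to map symmetric tic-tac-toe positions to a single representative, so that a value table could store one entry per equivalence class. The board encoding (a tuple of 9 cells in row-major order) and the function names are assumptions. Whether sharing values this way is appropriate when the opponent does not respect symmetries is exactly what the exercise asks.

```python
def rotate(board):
    """Rotate a 3x3 board (tuple of 9 cells, row-major) 90 degrees clockwise."""
    return tuple(board[(2 - c) * 3 + r] for r in range(3) for c in range(3))


def reflect(board):
    """Mirror the board left to right."""
    return tuple(board[r * 3 + (2 - c)] for r in range(3) for c in range(3))


def symmetries(board):
    """Return the 8 boards produced by the rotations and reflections of a square."""
    result = []
    current = tuple(board)
    for _ in range(4):
        result.append(current)
        result.append(reflect(current))
        current = rotate(current)
    return result


def canonical(board):
    """Pick one representative per symmetry class (the smallest in tuple order)."""
    return min(symmetries(board))


if __name__ == "__main__":
    top_left = ("X",) + (" ",) * 8
    bottom_right = (" ",) * 8 + ("X",)
    print(canonical(top_left) == canonical(bottom_right))
```

A value table keyed by `canonical(board)` updates all symmetric variants of a position at once, which reduces the number of entries to learn.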