Introduction

Sequential Decision Making

There are many tasks in which an agent needs to "sequentially" decide which of many possible actions it should take. For example, consider the following cliff problem:

Grid with Cliff

There is a 4x12 grid. The task is simple: the agent (that little elf-looking guy) needs to move from the starting point at [3, 0] to the goal at [3, 11] without falling off the cliff.
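To make the setup concrete, here is a minimal Python sketch of the grid. The grid size, start, and goal come from the description above. The cliff cells, the action names, and the rule that the agent stays put when it walks into a wall are assumptions made for illustration, and `step` is a hypothetical helper rather than part of any library.

```python
# Sketch of the 4x12 grid. Positions are (row, column) pairs.
ROWS, COLS = 4, 12
START = (3, 0)
GOAL = (3, 11)

# Assumption: the cliff fills the bottom row between the start and the goal.
CLIFF = {(3, col) for col in range(1, 11)}

# The four moves available at every step, as (row change, column change).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(position, action):
    """Return (new_position, fell_off_cliff, reached_goal) for one move."""
    d_row, d_col = ACTIONS[action]
    # Assumption: a move that would leave the grid keeps the agent at the edge.
    row = min(max(position[0] + d_row, 0), ROWS - 1)
    col = min(max(position[1] + d_col, 0), COLS - 1)
    new_position = (row, col)
    return new_position, new_position in CLIFF, new_position == GOAL


print(step(START, "up"))     # ((2, 0), False, False)
print(step(START, "right"))  # ((3, 1), True, False)
```

Under these assumptions, moving right from the start is an immediate fall, so the agent has to go up before it can make progress toward the goal.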

Clearly, to solve the challenge, the agent must "decide" at each step which direction to move in, out of all possible directions. The sequence of directions chosen by the agent on its journey from the initial to the final position constitutes a solution path. There can be many such solution paths, not all of them necessarily optimal[1]. One such path is shown below:

Solution
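A solution path can also be written down as a plain list of moves. The sketch below checks two hypothetical paths, which are not necessarily the one pictured. It assumes the cliff occupies the bottom row between the start and the goal, and `follow` is an illustrative helper name.

```python
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, col) for col in range(1, 11)}  # assumed cliff cells
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def follow(path, start=START):
    """Apply each move in turn and return the visited cells.

    Returns None if the agent falls off the cliff or leaves the grid.
    """
    row, col = start
    visited = [start]
    for move in path:
        d_row, d_col = MOVES[move]
        row, col = row + d_row, col + d_col
        if not (0 <= row < 4 and 0 <= col < 12) or (row, col) in CLIFF:
            return None
        visited.append((row, col))
    return visited


# Two candidate solutions: one hugs the cliff edge, one stays far from it.
short_path = ["up"] + ["right"] * 11 + ["down"]
safe_path = ["up"] * 3 + ["right"] * 11 + ["down"] * 3

for name, path in [("short", short_path), ("safe", safe_path)]:
    cells = follow(path)
    reached = cells is not None and cells[-1] == GOAL
    print(name, "reaches goal:", reached, "steps:", len(path))
```

Both paths reach the goal. The first needs 13 moves and the second 17, so which one counts as better depends on how optimality is defined.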

Reinforcement Learning is one way to solve this kind of 'sequential decision making' problem.

Methods of Solving Sequential Decision Making Problems

Explicit Programming
Supervised learning
Optimization
Planning
Reinforcement Learning

What is Reinforcement Learning (RL)?

In layman's terms, RL is a way to learn sequential decision making from experience (or trial and error).

RL has the following characteristics:

No supervision is available during training. Learning happens via trial & error.
Decisions are to be taken one after the other. These are called 'sequential decisions'.
A decision taken at the very beginning can lead to a sub-optimal solution at the end, because its consequences may only show up much later. This is called 'delayed reward'.
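These characteristics can be seen in a rough Python sketch of trial and error on the cliff grid. The agent below does not learn anything. It only picks random moves and receives a number after each one, with nobody telling it which move was correct. The reward values (-1 per step, -100 for falling), the cliff cells, and the `run_episode` helper are all assumptions made for illustration.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, col) for col in range(1, 11)}   # assumed cliff cells
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def run_episode(rng, max_steps=200):
    """Act at random until the agent falls, reaches the goal, or runs out of steps."""
    position, rewards = START, []
    for _ in range(max_steps):
        d_row, d_col = rng.choice(MOVES)
        # Assumption: walking into a wall leaves the agent at the edge.
        row = min(max(position[0] + d_row, 0), ROWS - 1)
        col = min(max(position[1] + d_col, 0), COLS - 1)
        position = (row, col)
        if position in CLIFF:
            rewards.append(-100)  # assumed penalty for falling
            break
        rewards.append(-1)        # assumed cost of every ordinary step
        if position == GOAL:
            break
    return position, rewards


rng = random.Random(0)  # fixed seed so runs are repeatable
for episode in range(5):
    final_position, rewards = run_episode(rng)
    print(f"episode {episode}: ended at {final_position}, "
          f"steps={len(rewards)}, total reward={sum(rewards)}")
```

The total reward of an episode is only known once the episode ends, which is why the effect of an early move is hard to judge at the moment it is made. An RL method would use many such episodes to work out which decisions tend to lead to better totals.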

[1] Optimality can be defined in different ways, such as least time taken, fewest steps taken, or greatest distance kept from the cliff.