Introduction
Sequential Decision Making
There are many tasks in which an agent needs to "sequentially" decide which of several possible actions it should take. For example, consider the following cliff problem:
There is a 4x12 grid. The task is simple: the agent (that little elf-looking guy) needs to move from the starting point at [3, 0] to the goal at [3, 11] without falling off the cliff.
Clearly, to solve the challenge, the agent must "decide" at each step which direction to move in, out of all possible directions. The sequence of directions chosen by the agent at each step, during its journey from the initial to the final position, constitutes a solution path. There can be many such solution paths (not all of them necessarily optimal[1]). One such path is shown below:
Reinforcement Learning is one way to solve this kind of 'sequential decision making' problem.
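To make the cliff problem above concrete, here is a minimal Python sketch of the grid and of one hand-written solution path. The text only gives the grid size, the start and the goal; the cliff location (the bottom-row cells between start and goal, as in the commonly used "Cliff Walking" setup), the four move names, and the helper functions `step` and `follow` are assumptions made for illustration.

```python
# Grid layout taken from the text: 4 rows x 12 columns, [row, col] indexing.
N_ROWS, N_COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
# Assumption: the cliff fills the bottom row between start and goal.
CLIFF = {(3, c) for c in range(1, 11)}

# Each action is a (row delta, column delta) pair (hypothetical names).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(pos, action):
    """Apply one move; a move that would leave the grid keeps the agent in place."""
    dr, dc = MOVES[action]
    row = min(max(pos[0] + dr, 0), N_ROWS - 1)
    col = min(max(pos[1] + dc, 0), N_COLS - 1)
    return (row, col)


def follow(path):
    """Replay a sequence of actions from START and report how it ends."""
    pos = START
    for action in path:
        pos = step(pos, action)
        if pos in CLIFF:
            return pos, "fell off the cliff"
    outcome = "reached the goal" if pos == GOAL else "stopped short"
    return pos, outcome


# One solution path: step up, walk right along row 2, step down onto the goal.
safe_path = ["up"] + ["right"] * 11 + ["down"]
print(follow(safe_path))   # ((3, 11), 'reached the goal')
print(follow(["right"]))   # ((3, 1), 'fell off the cliff')
```

Each element of `safe_path` is one sequential decision. Writing the path out by hand like this works only because the layout is known in advance.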
Methods of Solving Sequential Decision Making Problems
- Explicit Programming
- Supervised Learning
- Optimization
- Planning
- Reinforcement Learning
What is Reinforcement Learning (RL)?
In layman's terms, RL is a way to learn sequential decision making from experience (or trial & error):
- No supervision is available during training. Learning happens via trial & error.
- Decisions are to be taken one after the other. These are called 'sequential decisions'.
- A decision taken at the very beginning can lead to a sub-optimal solution at the end, because its consequences may only show up many steps later. We call this 'delayed reward'.
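The three points above can be illustrated with a small sketch that uses tabular Q-learning, one common RL algorithm that the text itself does not name, on the cliff grid. Everything specific here is an assumption for illustration: the cliff cells, the rewards (-1 per move, -100 for falling), ending the episode on a fall, and the hyperparameters `alpha`, `gamma` and `epsilon`.

```python
import random

N_ROWS, N_COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, c) for c in range(1, 11)}       # assumed cliff cells
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def env_step(pos, action):
    """Return (next_pos, reward, done). The reward values are assumptions."""
    dr, dc = MOVES[action]
    row = min(max(pos[0] + dr, 0), N_ROWS - 1)
    col = min(max(pos[1] + dc, 0), N_COLS - 1)
    nxt = (row, col)
    if nxt in CLIFF:
        return nxt, -100.0, True             # falling ends the episode
    return nxt, -1.0, nxt == GOAL


def greedy(q, pos):
    """Index of the highest-valued action in this cell."""
    return max(range(len(MOVES)), key=lambda a: q[pos][a])


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(MOVES)
         for r in range(N_ROWS) for c in range(N_COLS)}
    for _ in range(episodes):
        pos, done = START, False
        while not done:
            # Trial and error: mostly exploit, occasionally try a random move.
            if rng.random() < epsilon:
                action = rng.randrange(len(MOVES))
            else:
                action = greedy(q, pos)
            nxt, reward, done = env_step(pos, action)
            # Delayed reward: later outcomes flow back through max(q[nxt]).
            target = reward if done else reward + gamma * max(q[nxt])
            q[pos][action] += alpha * (target - q[pos][action])
            pos = nxt
    return q


def greedy_path(q, max_steps=100):
    """Follow the learned greedy policy from START, without exploration."""
    pos, path = START, [START]
    for _ in range(max_steps):
        pos, _reward, done = env_step(pos, greedy(q, pos))
        path.append(pos)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
print(len(path) - 1, "moves, ends at", path[-1])
```

No one tells the agent which move is correct (no supervision). It only sees a reward after each move, and the update rule passes the eventual outcome back to earlier decisions, which is how the delayed reward gets credited.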