What is predicted and controlled in reinforcement learning?

In reinforcement learning, I have seen many terms that refer to control and prediction, such as Monte Carlo prediction and Monte Carlo control.

But what are we actually predicting and controlling?


The difference between prediction and control has to do with the goal regarding the policy. The policy describes how the agent acts depending on the current state, and in the literature it is often written as $\pi(a|s)$, the probability of taking action $a$ when in state $s$.

So, my question is: for prediction, predict what?

A prediction task in RL is where the policy is supplied, and the goal is to measure how well it performs. That is, to predict the expected total reward from any given state assuming the function $\pi(a|s)$ is fixed.
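As a minimal sketch of what prediction looks like in practice, the code below runs first-visit Monte Carlo prediction for a fixed random policy. The toy corridor environment, the `step` function, `random_policy`, and all parameter values are assumptions made up for illustration; they do not come from the question or from any library.

```python
import random
from collections import defaultdict

N_STATES = 4          # hypothetical corridor with non-terminal states 0..3
LEFT, RIGHT = 0, 1


def step(state, action):
    """Toy dynamics: return (next_state, reward, done)."""
    nxt = state + 1 if action == RIGHT else state - 1
    if nxt == N_STATES:      # left the corridor on the right: reward 1
        return nxt, 1.0, True
    if nxt < 0:              # left the corridor on the left: reward 0
        return nxt, 0.0, True
    return nxt, 0.0, False


def random_policy(state, rng):
    """The fixed policy being evaluated: pi(a|s) = 0.5 for both actions."""
    return rng.choice((LEFT, RIGHT))


def mc_prediction(policy, n_episodes=10000, gamma=1.0, seed=0):
    """Estimate v_pi(s) by averaging first-visit returns over sampled episodes."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(n_episodes):
        state = rng.randrange(N_STATES)   # random start so every state gets visited
        episode = []
        done = False
        while not done:
            action = policy(state, rng)
            nxt, reward, done = step(state, action)
            episode.append((state, reward))
            state = nxt
        # Walk backwards accumulating the return; the last write per state
        # corresponds to its earliest (first) visit in the episode.
        g = 0.0
        first_visit_return = {}
        for s, r in reversed(episode):
            g = r + gamma * g
            first_visit_return[s] = g
        for s, g_s in first_visit_return.items():
            returns_sum[s] += g_s
            returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in range(N_STATES)}
```

Calling `mc_prediction(random_policy)` returns a dictionary of estimated state values. For this particular corridor with `gamma=1.0`, the true value of state `s` under the random policy is `(s + 1) / 5`, so the estimates should land close to 0.2, 0.4, 0.6 and 0.8. Note that the policy is never changed here; it is only measured.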

For control, control what?

A control task in RL is where the policy is not fixed, and the goal is to find the optimal policy. That is, to find the policy $\pi(a|s)$ that maximises the expected total reward from any given state.
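In symbols, the two tasks can be stated as follows. The return $G_t$, the discount factor $\gamma$, and the value function $v_\pi$ are standard textbook notation assumed here for illustration.

$$
v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right],
\qquad
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
$$

Prediction estimates $v_\pi$ for a given, fixed $\pi$. Control searches for a policy $\pi_*$ such that

$$
v_{\pi_*}(s) \ge v_\pi(s) \quad \text{for all states } s \text{ and all policies } \pi .
$$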

A control algorithm based on value functions (of which Monte Carlo control is one example) usually works by also solving the prediction problem, i.e. it predicts the values of acting in different ways, and adjusts the policy to choose the best actions at each step. As a result, the output of value-based algorithms is usually an approximately optimal policy together with the expected future rewards for following that policy.
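The loop of predicting action values and then adjusting the policy can be sketched as on-policy, first-visit Monte Carlo control with an epsilon-greedy policy. This is one possible variant, not the only form of Monte Carlo control. The toy corridor environment, the function names, and the values of `gamma` and `epsilon` are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

N_STATES = 4          # hypothetical corridor with non-terminal states 0..3
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action):
    """Toy dynamics: return (next_state, reward, done)."""
    nxt = state + 1 if action == RIGHT else state - 1
    if nxt == N_STATES:      # left the corridor on the right: reward 1
        return nxt, 1.0, True
    if nxt < 0:              # left the corridor on the left: reward 0
        return nxt, 0.0, True
    return nxt, 0.0, False


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def mc_control(n_episodes=5000, gamma=0.9, epsilon=0.1, seed=0):
    """Return (greedy_policy, q) learned from sampled episodes."""
    rng = random.Random(seed)
    q = defaultdict(float)       # predicted action values Q(s, a)
    counts = defaultdict(int)    # number of first-visit returns seen per (s, a)
    for _ in range(n_episodes):
        state = rng.randrange(N_STATES)
        episode = []
        done = False
        while not done:
            # The policy is not fixed: it is derived from the current Q estimates.
            action = epsilon_greedy(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = nxt
        # Prediction step: update Q towards the observed first-visit returns.
        g = 0.0
        first_visit_return = {}
        for s, a, r in reversed(episode):
            g = r + gamma * g
            first_visit_return[(s, a)] = g
        for sa, g_sa in first_visit_return.items():
            counts[sa] += 1
            q[sa] += (g_sa - q[sa]) / counts[sa]   # incremental sample average
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
    return policy, q
```

Calling `policy, q = mc_control()` yields both outputs mentioned above: a greedy policy (in this corridor, moving right in every state is optimal) and the action-value estimates for the epsilon-greedy policy that was followed during learning.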

Source: Link, Question Author: GoingMyWay, Answer Author: Neil Slater
