Reinforcement Learning: An Introduction.

Second edition, in progress. Richard S. Sutton and Andrew G. Barto, (c) 2012, pp. 67-68.

Solving a reinforcement learning task means, roughly, finding a policy

that achieves a lot of reward over the long run. For finite MDPs, we

can precisely define an optimal policy in the following way. Value

functions define a partial ordering over policies. A policy π is

defined to be better than or equal to a policy π′ if its expected

return is greater than or equal to that of π′, for all states. In other

words, π ≥ π′ if and only if vπ(s) ≥ vπ′(s) for all s ∈ S. There is

always at least one policy that is better than or equal to all other

policies. This is an optimal policy.

Why is there always at least one policy that is better than or equal to all other policies?

**Answer**

Just past the quoted part, the same paragraph actually tells you what this policy is: it is the one that takes the best action in every state. In an MDP, the action taken in one state does not affect the returns available from the other states, so we can maximize the value state by state. More carefully: given any two policies π and π′, the combined policy that follows π in states where vπ(s) ≥ vπ′(s) and π′ elsewhere is at least as good as both in every state; since a finite MDP has only finitely many deterministic policies, repeating this argument yields a policy that is better than or equal to all others.
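You can check this directly on a small example. The sketch below uses a hypothetical two-state, two-action MDP (my own, not from the book): it enumerates all four deterministic policies, evaluates each one by iterative policy evaluation, and confirms that one policy's value function weakly dominates every other policy's in every state.

```python
import itertools

# Hypothetical MDP (illustration only): 2 states, 2 actions,
# deterministic dynamics.  step[s][a] = (next_state, reward).
step = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 2.0), 1: (1, 0.0)},
}
gamma = 0.9  # discount factor

def evaluate(policy, iters=1000):
    """Iterative policy evaluation for a deterministic policy
    (a dict mapping state -> action)."""
    v = {s: 0.0 for s in step}
    for _ in range(iters):
        v = {s: step[s][policy[s]][1] + gamma * v[step[s][policy[s]][0]]
             for s in step}
    return v

# All deterministic policies and their value functions.
policies = [dict(zip(step, acts))
            for acts in itertools.product([0, 1], repeat=len(step))]
values = [evaluate(p) for p in policies]

def dominates(v1, v2):
    """True if v1(s) >= v2(s) in every state (up to numerical tolerance)."""
    return all(v1[s] >= v2[s] - 1e-9 for s in step)

# At least one policy is better than or equal to all the others.
best = [i for i, v in enumerate(values)
        if all(dominates(v, u) for u in values)]
print(policies[best[0]], values[best[0]])
```

Here the dominant policy is the one that cycles between the two states, collecting the rewards of 1 and 2 forever; its value exceeds every other policy's value in both states, matching the claim in the quote.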

**Attribution**

*Source: Link, Question Author: sh1ng, Answer Author: Don Reba*