Why was the letter Q chosen in the name of Q-learning?
Most letters are chosen as abbreviations, such as $\pi$ standing for policy and $v$ standing for value. But I don’t think Q is an abbreviation of any word.
I’m sorry to disappoint everyone, but Q doesn’t stand for anything 🙂
Q-learning was proposed by Watkins in his 1989 PhD thesis; see p. 96. The Q in the equation on that page is updated in a certain way at each step. Q is the expected return from taking an action in a given state; see the definition of Q on p. 46. Here "return" is meant in the economic or game-theoretic sense, i.e. the discounted, probability-weighted sum of rewards, not in the computer science sense of a return from a function.
Notice how he had already used P for probability and R for reward, so he grabbed Q for the return. That’s it. There’s no deeper meaning behind the choice of the letter Q.
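For reference, here is how the "expected return" and the per-step update are usually written in modern textbook notation. This is a sketch, not a transcription of the thesis: the symbols $\gamma$ (discount factor), $\alpha$ (learning rate), $r_t$ (reward at step $t$), $s_t$ and $a_t$ (state and action at step $t$) are my assumptions and may differ from the ones Watkins uses on the pages cited above.

$$Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a,\ \pi\right]$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

The first line is the discounted, probability-weighted sum of rewards described above; the second is the standard Q-learning step that nudges the current estimate toward the observed reward plus the discounted best estimate for the next state.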