How exactly to compute Deep Q-Learning Loss Function?

I have a question about how exactly the loss function of a Deep Q-Learning network is computed during training. I am using a two-layer feedforward network with ReLU hidden layers and a linear output layer.

  1. Let’s suppose I have 4 possible actions. Thus, the output of my
    network for the current state $s_t$ is $Q(s_t) \in \mathbb{R}^4$.
    To make it more concrete, let’s assume $Q(s_t) = [1.3, 0.4, 4.3, 1.5]$.
  2. Now I take the action $a_t = 2$ corresponding to the value 4.3, i.e.
    the 3rd action, and reach a new state $s_{t+1}$.
  3. Next, I compute the forward pass with state $s_{t+1}$ and let’s say I
    obtain the following values at the output layer: $Q(s_{t+1}) = [9.1, 2.4, 0.1, 0.3]$. Also let’s say the reward is $r_t = 2$ and $\gamma = 1.0$.
  4. Is the loss given by:

    $$\mathcal{L} = (11.1 - 4.3)^2$$

    OR

    $$\mathcal{L} = \frac{1}{4}\sum_{i=0}^{3}\Big([11.1, 11.1, 11.1, 11.1] - [1.3, 0.4, 4.3, 1.5]\Big)_i^2$$

    OR

    $$\mathcal{L} = \frac{1}{4}\sum_{i=0}^{3}\Big([11.1, 4.4, 2.1, 2.3] - [1.3, 0.4, 4.3, 1.5]\Big)_i^2$$

Thank you, and sorry I had to write this out in such a basic way… I am confused by all the notation. (I think the correct answer is the second one…)
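
For reference (this just spells out the arithmetic behind the numbers used in the options), the targets above come from the quantities in steps 1–3:

$$r_t + \gamma \max_{a} Q(s_{t+1}, a) = 2 + 1.0 \times 9.1 = 11.1$$

and, applied elementwise to the whole output vector,

$$r_t + \gamma\, Q(s_{t+1}) = 2 + 1.0 \times [9.1, 2.4, 0.1, 0.3] = [11.1, 4.4, 2.1, 2.3].$$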

Answer

After reviewing the equations a few more times, I think the correct loss is the following:

$$\mathcal{L} = (11.1 - 4.3)^2$$

My reasoning is that the Q-learning update rule, in the general case, only updates the Q-value of one specific state-action pair:

$$Q(s,a) = r + \gamma \max_{a'} Q(s', a')$$

This equation means that the update happens only for one specific state-action pair, and for the neural Q-network that means the loss is calculated only for the one output unit that corresponds to the chosen action.

In the example provided, $Q(s,a) = 4.3$ and the target is $r + \gamma \max_{a'} Q(s', a') = 2 + 1.0 \times 9.1 = 11.1$.
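
For concreteness, here is a minimal PyTorch-style sketch of this single-action loss. It is not part of the original answer; the state dimension, hidden-layer sizes, and variable names are illustrative assumptions, and the transition is a placeholder that mirrors the question's setup only in spirit.

```python
import torch
import torch.nn as nn

# Hypothetical Q-network roughly matching the question's description:
# ReLU hidden layers and a linear output layer with one Q-value per action.
# The state dimension (8) and hidden size (64) are made-up values.
q_net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),
)

# A single transition (s_t, a_t, r_t, s_{t+1}); the states are random placeholders.
state      = torch.randn(1, 8)
next_state = torch.randn(1, 8)
action     = torch.tensor([[2]])     # the 3rd action, as in the question
reward     = torch.tensor([2.0])
gamma      = 1.0

q_values = q_net(state)                           # shape (1, 4), e.g. [1.3, 0.4, 4.3, 1.5]
q_sa     = q_values.gather(1, action).squeeze(1)  # Q(s_t, a_t), e.g. 4.3

with torch.no_grad():                             # the target is treated as a constant
    target = reward + gamma * q_net(next_state).max(dim=1).values  # e.g. 2 + 1.0 * 9.1 = 11.1

loss = nn.functional.mse_loss(q_sa, target)       # (target - Q(s_t, a_t))^2, chosen action only
loss.backward()                                   # gradients flow only through Q(s_t, a_t)
```

Many implementations express the same idea by copying the network's current prediction into a full target vector and overwriting only the chosen action's entry with $r + \gamma \max_{a'} Q(s', a')$; a vector-wise MSE then reduces to the single term above, because the differences for the other actions are zero.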

Attribution
Source: Link, Question Author: A.D, Answer Author: A.D
