Difference between Advantage Actor Critic and TD Actor Critic?

I have a question concerning actor critic methods in reinforcement learning.

In these slides (https://hadovanhasselt.files.wordpress.com/2016/01/pg1.pdf) different types of actor-critics are explained. Advantage actor critic and TD actor critic are mentioned in the last slide:

[Image: the last slide of the linked deck]

But the slide “Estimating the advantage function (2)” says that the advantage function can be approximated by the TD error. The update rule then includes the TD error in the same way as in TD actor critic.

So are advantage actor critic and TD actor critic actually the same? Or is there a difference I don’t see?

Answer

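The advantage can be approximated by the TD error: if V is the true value function, the expected TD error for a state-action pair equals its advantage. This approximation is especially helpful if you want to update θ after each transition, because it needs only a single sample and a state-value estimate.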
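To make the per-transition case concrete, here is a minimal sketch of a one-step update that uses the TD error in place of the advantage. It assumes a tabular setting with a softmax policy over preferences theta[s][a] and a state-value table v[s]; the function names, the step sizes and the tabular layout are illustrative choices, not something taken from the slides.

```python
import math


def td_error(reward, v_s, v_next, gamma=0.99, done=False):
    """One-step TD error: delta = r + gamma * V(s') - V(s)."""
    # On a terminal transition there is no bootstrap term.
    target = reward if done else reward + gamma * v_next
    return target - v_s


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, v, s, a, reward, s_next, done,
                      gamma=0.99, alpha_actor=0.1, alpha_critic=0.1):
    """Update actor (theta) and critic (v) in place from one transition.

    Assumption: theta[s][a] are softmax preferences and v[s] is a
    tabular state-value estimate (hypothetical layout for illustration).
    """
    delta = td_error(reward, v[s], v[s_next], gamma, done)
    # Critic: move V(s) toward the one-step target.
    v[s] += alpha_critic * delta
    # Actor: delta stands in for the advantage A(s, a).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        # Gradient of log softmax w.r.t. preference b.
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_actor * delta * grad_log_pi
    return delta
```

A positive delta raises the preference for the action that was taken and lowers the others; a negative delta does the opposite. No Q-function is needed, which is why this form suits updates after every transition.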

For batch approaches, you can first estimate Q_w(S, A), e.g. by means of fitted Q-iteration, and then derive V(S) from it. The difference Q_w(S, A) - V(S) gives you the advantage function itself rather than a single-sample estimate, so the policy gradient update may be much more stable, because it will be closer to the actual advantage function.
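The batch variant can be sketched in the same tabular setting. This example assumes the Q estimate is already available as a table q[s][a] (fitted Q-iteration itself is not shown) and that V(S) is taken as the expectation of Q under the current softmax policy; both the layout and that choice of V are assumptions made for illustration.

```python
import math


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def state_values(q, policy):
    """V(s) = sum_a pi(a|s) * Q(s, a) for tabular q and policy."""
    return [sum(p * qa for p, qa in zip(policy[s], q[s]))
            for s in range(len(q))]


def advantages(q, policy):
    """A(s, a) = Q(s, a) - V(s) for every state-action pair."""
    v = state_values(q, policy)
    return [[qa - v[s] for qa in q[s]] for s in range(len(q))]


def batch_policy_gradient(theta, q, batch):
    """Average of A(s, a) * grad log pi(a|s) over (s, a) pairs in batch."""
    policy = [softmax(row) for row in theta]
    adv = advantages(q, policy)
    grad = [[0.0] * len(row) for row in theta]
    for s, a in batch:
        for b in range(len(theta[s])):
            # Gradient of log softmax w.r.t. preference b.
            grad_log_pi = (1.0 if b == a else 0.0) - policy[s][b]
            grad[s][b] += adv[s][a] * grad_log_pi / len(batch)
    return grad
```

Here the advantage comes from the learned Q table rather than from one sampled reward, so the same (s, a) pair always contributes the same value to the gradient. The price is that a Q estimate has to be fitted first.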

Attribution
Source: Link, Question Author: needRhelp, Answer Author: Karel Macek
