Determine the optimum learning rate for gradient descent in linear regression

How can one determine the optimum learning rate for gradient descent? I’m thinking I could automatically adjust it whenever the cost function returns a greater value than in the previous iteration (a sign the algorithm will not converge), but I’m not sure what new value it should take.
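The adjustment rule described above can be sketched for least-squares linear regression: if a step increases the cost, reject it and shrink the rate; otherwise accept it and grow the rate slightly. The shrink/grow factors and starting rate here are illustrative choices, not values from the question.

```python
import numpy as np

def gd_adaptive(X, y, lr=0.1, shrink=0.5, grow=1.05, iters=200):
    """Gradient descent for least squares with a simple backtracking rule:
    if the cost went up, undo the step and halve the rate; otherwise
    accept the step and grow the rate a little."""
    w = np.zeros(X.shape[1])
    cost = lambda w: 0.5 * np.mean((X @ w - y) ** 2)
    prev = cost(w)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / len(y)
        w_new = w - lr * grad
        c = cost(w_new)
        if c > prev:
            lr *= shrink           # cost increased: reject step, shrink rate
        else:
            w, prev, lr = w_new, c, lr * grow
    return w

# fit y = 1 + 2*x on noiseless data
X = np.column_stack([np.ones(50), np.linspace(0.0, 1.0, 50)])
y = X @ np.array([1.0, 2.0])
w = gd_adaptive(X, y)
```

Halving on failure and growing by a few percent on success is a common, conservative pairing; more principled variants use a backtracking line search with a sufficient-decrease (Armijo) condition.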


(Years later) Look up the Barzilai–Borwein step-size method; it has a nice 3-page description. The author says

this approach works well, even for large dimensional problems

but it performs terribly in his applet on the 2-d Rosenbrock function.
If anyone uses Barzilai-Borwein, please comment.
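For reference, the Barzilai–Borwein idea is to set the step size from the last two iterates and gradients, alpha_k = (s·s)/(s·d) with s = x_k − x_{k−1} and d = g_k − g_{k−1}. A minimal sketch (the first-step rate `lr0` and the test quadratic are illustrative assumptions):

```python
import numpy as np

def bb_descent(grad, x0, lr0=1e-3, iters=100):
    """Gradient descent with the Barzilai-Borwein step size
    alpha = (s . s) / (s . d), s = x_k - x_{k-1}, d = g_k - g_{k-1}."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - lr0 * g_prev          # first step uses a fixed small rate
    for _ in range(iters):
        g = grad(x)
        s, d = x - x_prev, g - g_prev
        denom = s @ d
        alpha = (s @ s) / denom if denom != 0 else lr0
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# minimize a badly conditioned quadratic 0.5*(x1^2 + 100*x2^2)
g = lambda x: np.array([x[0], 100.0 * x[1]])
x_star = bb_descent(g, [1.0, 1.0])
```

The cost is not monotone along BB iterates, which is consistent with its erratic behavior on nonconvex functions like Rosenbrock; convergence is proven for strictly convex quadratics.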

Source: Link, Question Author: Rad’Val, Answer Author: denis