How can one determine the optimum learning rate for gradient descent? I’m thinking that I could automatically adjust it whenever the cost function returns a greater value than in the previous iteration (meaning the algorithm is not converging), but I’m not really sure what new value it should take.

**Answer**

(Years later) Look up the Barzilai-Borwein step-size method; onmyphd.com has a nice 3-page description. The author says this approach works well even for high-dimensional problems, but it performs terribly on his applet of the 2-D Rosenbrock function.

If anyone uses Barzilai-Borwein, please comment.
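
For reference, here is a minimal sketch of gradient descent with the first Barzilai-Borwein step size (BB1), $\alpha_k = |s_{k-1}^\top y_{k-1}| / (y_{k-1}^\top y_{k-1})$, where $s$ is the change in position and $y$ the change in gradient. The quadratic test problem and all names below are illustrative, not from the source:

```python
import numpy as np

def bb_gradient_descent(grad, x0, alpha0=1e-3, iters=100):
    """Gradient descent with the Barzilai-Borwein (BB1) step size.

    grad:   function returning the gradient at a point
    x0:     starting point (NumPy array)
    alpha0: step size for the very first iteration
            (BB needs two previous points before it can start)
    """
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - alpha0 * g_prev            # plain first step
    for _ in range(iters):
        g = grad(x)
        s = x - x_prev                      # change in position
        y = g - g_prev                      # change in gradient
        denom = y @ y
        # BB1 step; fall back to alpha0 if the gradient did not change
        alpha = abs(s @ y) / denom if denom > 0 else alpha0
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# Illustrative example: minimize f(x) = 0.5 * x^T A x - b^T x,
# whose gradient is A x - b and whose minimizer solves A x = b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 2.0])
x_star = bb_gradient_descent(lambda x: A @ x - b, np.zeros(2))
```

On strongly convex quadratics like this one, the BB step converges without any line search, which is the appeal of the method; on non-convex problems such as the Rosenbrock function it is usually paired with a safeguard or line search.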

*Attribution — Source: Link, Question Author: Rad’Val, Answer Author: denis*