Why is gradient descent required?

When the cost function is differentiable, we could take partial derivatives with respect to every parameter, set them to zero, and solve the resulting equations to find where the cost function is minimized.
Also, I think it is possible for the derivatives to be zero at multiple points, so we could check all such points and pick out the global minimum.
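To make the idea in the question concrete, here is the standard worked example for linear regression, where the "solve the equations" approach does have a closed form. Setting the gradient of the squared-error cost to zero,

$$J(\theta) = \|X\theta - y\|^2, \qquad \nabla_\theta J = 2X^\top(X\theta - y) = 0,$$

yields the normal equations

$$X^\top X \theta = X^\top y \quad\Longrightarrow\quad \theta = (X^\top X)^{-1} X^\top y,$$

so the minimizer can be written down directly, at the price of inverting $X^\top X$.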

Why is gradient descent used instead?


Even in the case of, say, linear models, where you have an analytical solution, it may still be best to use such an iterative solver.

As an example, if we consider linear regression, the explicit solution requires inverting a matrix, an operation with complexity $O(N^3)$. This becomes prohibitive in the context of big data.
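A minimal sketch of the two approaches side by side, on a toy 1-D dataset (the data, learning rate, and iteration count here are illustrative choices, not from the answer). In one dimension the closed form avoids an explicit matrix inverse, but in general the normal equations require inverting $X^\top X$, which is where the $O(N^3)$ cost comes from:

```python
# Toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
n = len(xs)

# Closed-form least-squares solution (the 1-D special case of the
# normal equations).
mean_x = sum(xs) / n
mean_y = sum(ys) / n
w_exact = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
          sum((x - mean_x) ** 2 for x in xs)
b_exact = mean_y - w_exact * mean_x

# The same fit by gradient descent on the mean-squared-error cost:
# each step only needs a pass over the data, never a matrix inverse.
w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    w -= lr * grad_w
    b -= lr * grad_b

print(w_exact, b_exact)  # exactly 2.0, 1.0
print(w, b)              # close to 2.0, 1.0
```

Both routes recover the same line; the iterative one trades exactness per step for much cheaper steps, which is the point of the answer.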

Also, a lot of problems in machine learning are convex, so using gradients ensures that we will reach the global minimum.
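A quick sketch of why convexity makes this safe, using a made-up convex cost $f(x) = (x - 3)^2$ (the function, learning rate, and starting points are illustrative): gradient descent lands on the unique global minimum no matter where it starts.

```python
def grad(x):
    # Gradient of the convex cost f(x) = (x - 3)**2.
    return 2.0 * (x - 3.0)

# From any starting point, descent converges to the single minimum x = 3.
for x0 in (-100.0, 0.0, 50.0):
    x, lr = x0, 0.1
    for _ in range(200):
        x -= lr * grad(x)
    print(x)  # ~3.0 regardless of the start
```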

As already pointed out, there are still relevant non-convex problems, like neural networks, where gradient methods (backpropagation) provide an efficient solver. Again, this is especially relevant in deep learning.
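For the non-convex case, a toy sketch (my own illustrative function, not from the answer): the cost $(x^2 - 1)^2$ has two minima, at $x = \pm 1$. Gradient descent still converges efficiently, but to whichever local minimum the starting point leads to, which is qualitatively the situation when training neural networks.

```python
def grad(x):
    # Gradient of the non-convex cost f(x) = (x**2 - 1)**2.
    return 4.0 * x * (x * x - 1.0)

def descend(x, lr=0.05, steps=500):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(0.5))   # -> near  1.0
print(descend(-0.5))  # -> near -1.0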

Source: Link, Question Author: Niranjan Kotha, Answer Author: Danica
