I have been reading a lot about convolutional neural networks and was wondering how they avoid the vanishing gradient problem. I know deep belief networks stack single-level auto-encoders or other pre-trained shallow networks and can thus avoid this problem, but I don’t know how it is avoided in CNNs.

According to Wikipedia:

> “despite the above-mentioned ‘vanishing gradient problem,’ the superior processing power of GPUs makes plain back-propagation feasible for deep feedforward neural networks with many layers.”

I don’t understand why GPU processing would remove this problem.

**Answer**

The vanishing gradient problem forces us to use small learning rates with gradient descent, which then needs many small steps to converge. That is a problem if you have a slow computer that takes a long time for each step. If you have a fast GPU that can perform many more steps in a day, it is less of a problem.

There are several ways to tackle the vanishing gradient problem. I would guess that the largest effect for CNNs came from switching from sigmoid nonlinear units to rectified linear units. If you consider a simple neural network whose error $E$ depends on weight $w_{ij}$ only through $y_j$, where

$$y_j = f\left( \sum_i w_{ij} x_i \right),$$

its gradient is

$$\begin{align}
\frac{\partial}{\partial w_{ij}} E
&= \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial w_{ij}} \\
&= \frac{\partial E}{\partial y_j} \cdot f'\left(\sum_i w_{ij} x_i\right) x_i.
\end{align}$$
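For concreteness, here is a minimal NumPy sketch of this gradient for a single unit; the function names and example values are my own, not from the answer:

```python
import numpy as np

# y_j = f(sum_i w_ij x_i) for one unit j, with f the logistic sigmoid.
def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_prime(u):
    s = sigmoid(u)
    return s * (1.0 - s)

x = np.array([0.5, -1.2, 3.0])  # inputs x_i (arbitrary example values)
w = np.array([0.1, 0.4, -0.3])  # weights w_ij into unit j

u = w @ x                       # pre-activation: sum_i w_ij x_i
y = sigmoid(u)                  # y_j = f(u)

dE_dy = 1.0                     # pretend upstream gradient dE/dy_j
dE_dw = dE_dy * sigmoid_prime(u) * x  # chain rule: dE/dy_j * f'(u) * x_i

print("f'(u) =", sigmoid_prime(u))   # this factor can shrink the gradient
print("dE/dw =", dE_dw)
```

Whenever $f'(u)$ is tiny, the whole gradient is tiny no matter how large the upstream error signal is, and stacking many such factors across layers is exactly the vanishing gradient problem.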

If $f$ is the logistic sigmoid function, $f'$ will be close to zero whenever the input is large in magnitude, whether strongly positive or strongly negative, because the sigmoid saturates at both ends. If $f$ is a rectified linear unit,

$$f(u) = \max\left(0, u\right),$$

the derivative is zero only for negative inputs and exactly 1 for positive inputs, so the gradient passes through every active unit without shrinking. Another important contribution comes from properly initializing the weights. This paper looks like a good source for understanding the challenges in more detail (although I haven’t read it yet):

http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
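To see both effects together, here is a self-contained sketch (my own construction, not code from the answer or the linked paper): it backpropagates a gradient through a deep chain of layers whose weights use a Glorot-style scale of $\sqrt{2/(\text{fan-in} + \text{fan-out})}$, once with sigmoid units and once with rectified linear units:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 100, 50               # layer width and number of layers

def glorot(fan_in, fan_out):
    # Glorot/Xavier-style initialization scale.
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                      size=(fan_out, fan_in))

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def bottom_gradient_norm(f, f_prime):
    """Run a forward pass through `depth` layers, then backpropagate a
    unit gradient and return its norm at the bottom layer."""
    Ws = [glorot(n, n) for _ in range(depth)]
    h, pre = rng.normal(size=n), []
    for W in Ws:
        u = W @ h
        pre.append(u)
        h = f(u)
    g = np.ones(n)                    # pretend dE/dh at the top layer
    for W, u in zip(reversed(Ws), reversed(pre)):
        g = W.T @ (g * f_prime(u))    # chain rule through one layer
    return np.linalg.norm(g)

print("sigmoid:", bottom_gradient_norm(
    sigmoid, lambda u: sigmoid(u) * (1.0 - sigmoid(u))))
print("relu:   ", bottom_gradient_norm(
    lambda u: np.maximum(0.0, u), lambda u: (u > 0).astype(float)))
```

With the sigmoid, every backward step multiplies the gradient by $f'(u) \le 1/4$, so its norm collapses geometrically with depth; with the rectified linear unit the factor is exactly 1 on active units, and the bottom-layer gradient comes out many orders of magnitude larger.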

**Attribution**
*Source: Link, Question Author: Aly, Answer Author: Lucas*