# Can neural network (e.g., convolutional neural network) have negative weights?

Is it possible to have negative weights (after enough epochs) for deep convolutional neural networks when we use ReLU for all the activation layers?

Yes, negative weights are possible (and in practice common), for at least two reasons:

1. Regularization of the parameters (a.k.a. weight decay): variation in the parameter values is what makes prediction possible, and if the parameters are centered around zero (i.e., their mean is close to zero), then their $\ell_2$ norm (a standard regularizer) is low.
2. Although the gradient of a layer's output with respect to the layer's parameters depends on the input to the layer (which is always non-negative, assuming the previous layer passes its outputs through a ReLU), the gradient of the error (propagated back from the layers closer to the final output) may be positive or negative, making it possible for SGD to drive some parameter values negative on the next gradient step. More specifically, let $I$, $O$, and $w$ denote the input, output, and parameters of a layer in a neural network, and let $E$ be the final error of the network induced by some training sample. The gradient of the error with respect to $w$ is computed as $\frac{\partial E}{\partial w} = \sum_{k=1}^{K} \frac{\partial E}{\partial O_k} \cdot \frac{\partial O_k}{\partial w}$; note that $O_k = O, \forall k$ (see picture below):
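To see the second point concretely, here is a minimal sketch (a toy example of my own, not from the question): one linear unit whose inputs are non-negative, as if produced by a preceding ReLU layer, trained with plain SGD on a squared error. Even though the inputs $x$ are always $\geq 0$, the upstream error gradient $\partial E / \partial O$ can be negative, so the weight can cross zero. The targets and learning rate below are arbitrary choices made so the optimum weight is negative.

```python
import numpy as np

rng = np.random.default_rng(0)

w = 0.5                          # weight, initialized positive
lr = 0.1                         # learning rate

for _ in range(50):
    x = rng.random(8)            # ReLU-like inputs: always >= 0
    o = w * x                    # layer output
    target = -x                  # targets chosen so the optimal w is -1
    # squared error E = 0.5 * sum((o - target)^2)
    dE_dO = o - target           # upstream gradient: can be either sign
    dE_dw = np.sum(dE_dO * x)    # chain rule; x >= 0, but dE_dO is signed
    w -= lr * dE_dw              # SGD step

print(w)                         # ends up negative (close to -1)
```

The key line is `dE_dw = np.sum(dE_dO * x)`: the factor `x` is non-negative, but the sign of the product is set by `dE_dO`, so repeated SGD steps are free to push `w` below zero.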