What would be the output distribution of a ReLU activation?

Suppose my data has a normal distribution and I am using a neural network as a model, in which I apply the ReLU non-linearity. What does the output distribution of the ReLU look like?


Suppose $X \sim N(\mu, \sigma^2)$. What is the distribution of $Y = \operatorname{ReLU}(X) = \max\{0, X\}$?

Also, it would be great if anyone could add a visualization (hand-drawn would be fine!); it would help me understand better. Furthermore, any comments or figures on how an affine transformation changes the distribution, and how a ReLU applied afterward changes it again, would greatly help!
Here is how I am thinking about it:

Please correct me if I am wrong!


Your question seems to boil down to the following:

Suppose $X \sim N(\mu, \sigma^2)$.
What is the distribution of $Y = \operatorname{ReLU}(X) = \max\{0, X\}$?

Let $F_X$ and $F_Y$ denote the cumulative distribution functions of $X$ and $Y$, respectively.
Let $\Phi$ be the standard normal cumulative distribution function:
$$
\Phi(z) = \int_{-\infty}^z \frac{1}{\sqrt{2 \pi}} e^{-t^2 / 2} \, dt,
$$
so that
$$
F_X(x) = P(X \leq x) = \Phi\left(\frac{x - \mu}{\sigma}\right)
$$
for all $x \in \mathbb{R}$.
If $y \in \mathbb{R}$, then
$$
\begin{aligned}
F_Y(y)
&= P(Y \leq y) \\
&= P(\max\{0, X\} \leq y) \\
&= P(0 \leq y, X \leq y) &&\text{(*)} \\
&= \begin{cases}
0, & \text{if $y < 0$}, \\
P(X \leq y), & \text{if $y \geq 0$}
\end{cases} \\
&= \begin{cases}
0, & \text{if $y < 0$}, \\
F_X(y), & \text{if $y \geq 0$}
\end{cases} \\
&= \begin{cases}
0, & \text{if $y < 0$}, \\
\Phi\left(\frac{y - \mu}{\sigma}\right), & \text{if $y \geq 0$}.
\end{cases}
\end{aligned}
$$

(*) Here we used the fact that $\max\{a, b\} \leq c$ if and only if $a \leq c$ and $b \leq c$ (for any $a, b, c \in \mathbb{R}$).
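
As a sanity check on the formula for $F_Y$, here is a minimal Python sketch that compares it with a Monte Carlo estimate. The function names `relu_normal_cdf` and `empirical_cdf`, the parameter values, the seed, and the sample size are all illustrative assumptions, not part of the question.

```python
import random
from statistics import NormalDist


def relu_normal_cdf(y, mu, sigma):
    """F_Y(y) for Y = max(0, X) with X ~ N(mu, sigma^2)."""
    if y < 0:
        return 0.0  # Y is never negative
    return NormalDist(mu, sigma).cdf(y)  # equals Phi((y - mu) / sigma)


def empirical_cdf(samples, y):
    """Fraction of samples that are <= y."""
    return sum(s <= y for s in samples) / len(samples)


# Illustrative parameters (assumed for this sketch only).
mu, sigma = 0.5, 1.0
rng = random.Random(0)
relu_samples = [max(0.0, rng.gauss(mu, sigma)) for _ in range(100_000)]

for y in (-1.0, 0.0, 0.5, 2.0):
    print(f"y={y:+.1f}  formula={relu_normal_cdf(y, mu, sigma):.4f}  "
          f"simulated={empirical_cdf(relu_samples, y):.4f}")
```

The two columns should agree up to sampling noise, including at $y = 0$, where the simulated value is the fraction of samples that the ReLU clipped to zero.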

It’s worth emphasizing that $F_Y$ is the cumulative distribution function of $Y$, not a density function.

I don’t know if this distribution has a name off the top of my head, but knowing the cumulative distribution function allows you to say everything there is to say about the distribution of $Y$.


Here is a plot of the cumulative distribution function of $Y$ for various distributions of $X$:

[Figure: cumulative distribution function of $Y$ for several normal distributions of $X$]

Note: the distribution of $Y$ is neither discrete nor continuous!
You can see that the distribution of $Y$ is not continuous, since continuous distributions have continuous cumulative distribution functions, whereas $F_Y$ jumps from $0$ to $\Phi(-\mu/\sigma)$ at $y = 0$. And $Y$ is not discrete, because discrete distributions have piecewise constant cumulative distribution functions, whereas $F_Y$ is strictly increasing on $[0, \infty)$.
In particular, this means that $Y$ does not have a density function.
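
To make the mixed nature of $Y$ concrete, the sketch below checks two consequences of the cumulative distribution function by simulation: the point mass $P(Y = 0) = F_Y(0) = \Phi(-\mu/\sigma)$, and the mean $E[Y] = \mu \Phi(\mu/\sigma) + \sigma \varphi(\mu/\sigma)$, where $\varphi$ is the standard normal density. The mean formula is a standard result that is not derived in this answer; the helper names and the parameter values are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # .cdf is Phi, .pdf is the standard normal density


def prob_zero(mu, sigma):
    """P(Y = 0) = P(X <= 0) = Phi(-mu / sigma), the jump of F_Y at 0."""
    return STD_NORMAL.cdf(-mu / sigma)


def relu_normal_mean(mu, sigma):
    """E[Y] = mu * Phi(mu / sigma) + sigma * phi(mu / sigma)."""
    z = mu / sigma
    return mu * STD_NORMAL.cdf(z) + sigma * STD_NORMAL.pdf(z)


# Illustrative parameters (assumed for this sketch only).
mu, sigma = -0.3, 1.2
rng = random.Random(1)
ys = [max(0.0, rng.gauss(mu, sigma)) for _ in range(200_000)]

frac_zero = sum(y == 0.0 for y in ys) / len(ys)  # discrete part
sample_mean = sum(ys) / len(ys)

print(f"P(Y = 0): formula={prob_zero(mu, sigma):.4f}  simulated={frac_zero:.4f}")
print(f"E[Y]:     formula={relu_normal_mean(mu, sigma):.4f}  simulated={sample_mean:.4f}")
```

A positive fraction of the samples is exactly zero (the discrete part), while the remaining samples are spread over $(0, \infty)$ like the positive part of the original normal distribution (the continuous part).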

Effect of Affine Transformations

Suppose the input to a layer of your neural network is a $p$-dimensional random vector $\mathbf{X} = (X_1, \ldots, X_p) \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ (multivariate normal with mean $\boldsymbol{\mu} \in \mathbb{R}^p$ and covariance matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{p \times p}$).
Suppose that the next layer consists of $q$ units $\mathbf{Y} = (Y_1, \ldots, Y_q) \in \mathbb{R}^q$ given by an affine transformation followed by ReLU:
$$
Y_i = \operatorname{ReLU}\left(b_i + \sum_{j=1}^p w_{i, j} X_j\right).
$$

Let $\mathbf{X}^\prime = (X_1^\prime, \ldots, X_q^\prime)$ denote the pre-activations:
$$
X_i^\prime = b_i + \sum_{j=1}^p w_{i, j} X_j.
$$

More concisely,
$$
\mathbf{X}^\prime = \mathbf{b} + \mathbf{W} \mathbf{X},
$$
where $\mathbf{b} = (b_1, \ldots, b_q)$ and $\mathbf{W}$ is the $q \times p$ matrix with entries $w_{i, j}$.
Since $\mathbf{X}$ is multivariate normal, so is $\mathbf{X}^\prime$, and we have
$$
\mathbf{X}^\prime \sim N_q(\mathbf{b} + \mathbf{W}\boldsymbol{\mu}, \mathbf{W} \boldsymbol{\Sigma} \mathbf{W}^\top).
$$

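In particular, each component $X_i^\prime$ of $\mathbf{X}^\prime$ is itself univariate normal, with a mean and variance that can be read off from the joint mean vector and covariance matrix: $X_i^\prime \sim N(b_i + \mathbf{w}_i^\top \boldsymbol{\mu}, \mathbf{w}_i^\top \boldsymbol{\Sigma} \mathbf{w}_i)$, where $\mathbf{w}_i^\top$ denotes the $i$th row of $\mathbf{W}$.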
Then we can apply the argument at the top of this answer to figure out the distribution of each activation $Y_i = \operatorname{ReLU}(X_i^\prime)$.
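
The recipe above can be sketched in plain Python: compute the mean and variance of each pre-activation from $\mathbf{W}$, $\mathbf{b}$, $\boldsymbol{\mu}$, and $\boldsymbol{\Sigma}$, then compare the predicted probability that each unit outputs exactly zero, $\Phi(-m_i / s_i)$ with $m_i$ and $s_i^2$ the mean and variance of $X_i^\prime$, against a simulation. Everything concrete here is an assumption made for illustration: the helper names, the small example with $p = 2$ and $q = 3$, and the use of a Cholesky factor to draw correlated normal samples.

```python
import math
import random
from statistics import NormalDist


def preactivation_params(W, b, mu, Sigma):
    """(mean, variance) of each X'_i: b_i + w_i . mu and w_i^T Sigma w_i."""
    p = len(mu)
    params = []
    for w_i, b_i in zip(W, b):
        mean = b_i + sum(w_i[j] * mu[j] for j in range(p))
        var = sum(w_i[j] * Sigma[j][k] * w_i[k]
                  for j in range(p) for k in range(p))
        params.append((mean, var))
    return params


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def relu_layer_sample(W, b, mu, L, rng):
    """Draw X ~ N(mu, L L^T) and return Y = ReLU(b + W X)."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    x = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
         for i in range(len(mu))]
    return [max(0.0, b_i + sum(w * x_j for w, x_j in zip(w_i, x)))
            for w_i, b_i in zip(W, b)]


# Illustrative layer with p = 2 inputs and q = 3 units (assumed values).
mu = [1.0, -0.5]
Sigma = [[1.0, 0.3], [0.3, 0.5]]
W = [[1.0, 0.0], [0.5, -1.0], [-2.0, 1.0]]
b = [0.0, 0.2, 0.5]

params = preactivation_params(W, b, mu, Sigma)
predicted_zero = [NormalDist().cdf(-m / math.sqrt(v)) for m, v in params]

rng = random.Random(2)
L = cholesky(Sigma)
n = 50_000
zero_counts = [0] * len(W)
for _ in range(n):
    for i, y_i in enumerate(relu_layer_sample(W, b, mu, L, rng)):
        zero_counts[i] += (y_i == 0.0)
simulated_zero = [c / n for c in zero_counts]

for i, ((m, v), pz, sz) in enumerate(zip(params, predicted_zero, simulated_zero)):
    print(f"unit {i}: mean={m:.2f} var={v:.2f}  "
          f"P(Y_i=0) predicted={pz:.4f} simulated={sz:.4f}")
```

Note that this only describes each $Y_i$ marginally; the units are generally dependent because they share the same input $\mathbf{X}$.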

Source : Link , Question Author : Anu , Answer Author : Sycorax
