# Expected value of softmax transformation of Gaussian random vector

Let $\mathbf w_1,\mathbf w_2,\ldots,\mathbf w_n \in \mathbb R^p$ and $\mathbf v \in \mathbb R^n$ be fixed vectors, and let $\mathbf x \sim \mathcal N_p(\boldsymbol{\mu}, \mathbf{\Sigma})$ be a $p$-dimensional Gaussian random vector. Consider the random variable $sm(\mathbf{x})$ defined by

$$sm(\mathbf{x}) := \frac{1}{1 + \sum_{i=1}^n\exp(-(\mathbf{w}_i^T\mathbf{x} + v_i))} = \frac{1}{1 -n + \sum_{i=1}^n\dfrac{1}{s(\mathbf w_i^T\mathbf x + v_i)}},$$

where $s(a) := (1 + \exp(-a))^{-1}$ is the sigmoid function.
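The second equality follows from $1/s(a) = 1 + \exp(-a)$, so that $\sum_{i=1}^n 1/s(a_i) = n + \sum_{i=1}^n \exp(-a_i)$. As a quick sanity check, here is a minimal sketch that evaluates both forms on arbitrary illustrative values $a_i = \mathbf w_i^T\mathbf x + v_i$ (the function names and the numbers are hypothetical, chosen only for this example):

```python
import math

def sigmoid(a):
    """Logistic sigmoid s(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def sm_direct(a):
    """First form: 1 / (1 + sum_i exp(-a_i)), with a_i = w_i^T x + v_i."""
    return 1.0 / (1.0 + sum(math.exp(-ai) for ai in a))

def sm_via_sigmoids(a):
    """Second form: 1 / (1 - n + sum_i 1 / s(a_i))."""
    n = len(a)
    return 1.0 / (1.0 - n + sum(1.0 / sigmoid(ai) for ai in a))

# Arbitrary illustrative values of w_i^T x + v_i (not taken from the question).
a = [0.3, -1.2, 2.0]
print(sm_direct(a), sm_via_sigmoids(a))  # the two values agree up to rounding
```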

Question: How can one (approximately) compute the expected value of $sm(\mathbf x)$?

Observations: $\mathbf w_i^T \mathbf x + v_i\sim \mathcal N_1 (\mathbf w_i^T\boldsymbol{\mu} + v_i, \mathbf w_i^T \mathbf \Sigma \mathbf w_i)$, and so by virtue of this post, one has

$$\mathbb E [s(\mathbf w_i^T \mathbf x + v_i)] \approx \Phi\left(\frac{\lambda(\mathbf w_i^T\boldsymbol{\mu}+ v_i)}{(1 + \lambda^2 \mathbf w_i^T \mathbf \Sigma \mathbf w_i)^{1/2}}\right),$$

for a suitably tuned $\lambda > 0$ (e.g., $\lambda^2 = \pi / 8$, which matches the slopes of $s(a)$ and $\Phi(\lambda a)$ at $a = 0$, is claimed to be “good enough”), where $\Phi(z) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^z\exp(-t^2/2)dt$ is the CDF of the standard Gaussian $\mathcal N_1(0,1)$.
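To get a feel for how good this scalar approximation is, here is a minimal sketch comparing it with a trapezoidal-rule reference for $a \sim \mathcal N_1(m, \sigma^2)$, where $m$ stands for $\mathbf w_i^T\boldsymbol{\mu} + v_i$ and $\sigma^2$ for $\mathbf w_i^T \mathbf \Sigma \mathbf w_i$. It assumes the slope-matching choice $\lambda = \sqrt{\pi/8}$ and applies $\lambda$ to the whole mean $m$; the function names and test values are hypothetical.

```python
import math

# Assumption: slope-matching scaling, lambda^2 = pi / 8.
LAMBDA = math.sqrt(math.pi / 8.0)

def sigmoid(a):
    """Numerically stable logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def Phi(z):
    """CDF of the standard Gaussian, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_sigmoid_probit(m, var, lam=LAMBDA):
    """Probit approximation: Phi(lam * m / sqrt(1 + lam^2 * var))."""
    return Phi(lam * m / math.sqrt(1.0 + lam * lam * var))

def expected_sigmoid_quadrature(m, var, num=20001, width=10.0):
    """Reference value: trapezoidal rule for the integral of s(a) * N(a; m, var)."""
    sd = math.sqrt(var)
    lo = m - width * sd
    h = 2.0 * width * sd / (num - 1)
    total = 0.0
    for k in range(num):
        a = lo + k * h
        weight = 0.5 if k in (0, num - 1) else 1.0
        pdf = math.exp(-0.5 * ((a - m) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * sigmoid(a) * pdf
    return total * h

# Illustrative (m, var) pairs, not taken from the question.
for m, var in [(0.0, 1.0), (1.0, 4.0), (-2.0, 0.5)]:
    print(m, var, expected_sigmoid_probit(m, var), expected_sigmoid_quadrature(m, var))
```

Since the expectation averages both $s(a)$ and $\Phi(\lambda a)$ against the same density, the error in the expectation cannot exceed the largest pointwise gap between the two functions.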

Thus one can get a (potentially very crude) approximation by replacing each $s(\cdot)$ in the second expression for $sm(\mathbf x)$ with its expectation:

$$\mathbb E [sm(\mathbf x)] \approx \frac{1}{1 -n + \sum_{i=1}^n\dfrac{1}{\mathbb E [s(\mathbf w_i^T\mathbf x + v_i)]}} \approx \frac{1}{1 -n + \sum_{i=1}^n\dfrac{1}{\Phi\left(\dfrac{\lambda(\mathbf w_i^T\boldsymbol{\mu}+ v_i)}{(1 + \lambda^2 \mathbf w_i^T \mathbf \Sigma \mathbf w_i)^{1/2}}\right)}}.$$
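One way to judge how crude this is would be to compare it against a plain Monte Carlo estimate. The following is a rough sketch under stated assumptions: $\lambda = \sqrt{\pi/8}$, with $\lambda$ applied to the full mean $\mathbf w_i^T\boldsymbol{\mu} + v_i$; samples of $\mathbf x$ are drawn through a hand-written Cholesky factor of $\mathbf \Sigma$; and all names and numerical values are hypothetical, chosen only for illustration.

```python
import math
import random

LAMBDA = math.sqrt(math.pi / 8.0)  # assumption: lambda^2 = pi / 8

def Phi(z):
    """CDF of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def quad_form(w, Sigma):
    """w^T Sigma w."""
    p = len(w)
    return sum(w[i] * Sigma[i][j] * w[j] for i in range(p) for j in range(p))

def cholesky(Sigma):
    """Lower-triangular L with L L^T = Sigma (Sigma must be positive definite)."""
    p = len(Sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = Sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sm(x, W, v):
    """sm(x) = 1 / (1 + sum_i exp(-(w_i^T x + v_i)))."""
    return 1.0 / (1.0 + sum(math.exp(-(dot(w, x) + vi)) for w, vi in zip(W, v)))

def sm_mean_plugin(mu, Sigma, W, v, lam=LAMBDA):
    """Plug-in approximation: each s(.) replaced by its probit-approximated mean."""
    total = 0.0
    for w, vi in zip(W, v):
        m = dot(w, mu) + vi
        var = quad_form(w, Sigma)
        total += 1.0 / Phi(lam * m / math.sqrt(1.0 + lam * lam * var))
    return 1.0 / (1.0 - len(W) + total)

def sm_mean_monte_carlo(mu, Sigma, W, v, num_samples=50_000, seed=0):
    """Monte Carlo reference: average sm(x) over x = mu + L z, z ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(Sigma)
    p = len(mu)
    acc = 0.0
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        x = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        acc += sm(x, W, v)
    return acc / num_samples

# Hypothetical example with p = 2 and n = 3.
mu = [0.5, 1.0]
Sigma = [[1.0, 0.3], [0.3, 0.5]]
W = [[1.0, -0.5], [0.2, 0.8], [-0.7, 0.4]]
v = [0.2, -0.1, 0.5]
print("plug-in:    ", sm_mean_plugin(mu, Sigma, W, v))
print("Monte Carlo:", sm_mean_monte_carlo(mu, Sigma, W, v))
```

The plug-in step moves the expectation inside a nonlinear function of the $n$ sigmoids and ignores the correlation between the terms $\mathbf w_i^T\mathbf x + v_i$, so the gap to the Monte Carlo value can grow with $n$ and with the variances $\mathbf w_i^T \mathbf \Sigma \mathbf w_i$. For $n = 1$ the plug-in formula reduces exactly to the scalar probit approximation.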