Approximating the mathematical expectation of the argmax of a Gaussian random vector

Let $X = (X_1, \dots, X_n) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ be a Gaussian random vector and $I = \arg\max_{1 \le i \le n} X_i$.

$I$ has probability mass function

$\mathbb{P}(I = i) = \mathbb{P}\Big(X_i = \max_{1 \le j \le n} X_j\Big) = \mathbb{P}\Big(X_i - \max_{j \ne i} X_j > 0\Big)$

and mathematical expectation

$\mathbb{E}I = \sum\limits_{i = 1}^{n} i\,\mathbb{P}(I = i)$

Generally speaking, for large $n$ and an arbitrary covariance matrix $\boldsymbol{\Sigma}$, computing $\mathbb{E}I$ is very difficult because it requires the numerical evaluation of high-dimensional Gaussian orthant integrals. So, apart from the IID and INID cases (diagonal covariance matrix $\boldsymbol{\Sigma}$), banded covariance matrices, and degenerate cases such as $\mu_j \gg \mu_i$ for all $i \ne j$, under which conditions on $\boldsymbol{\Sigma}$ (e.g. correlation decay) can we get simple, easy-to-evaluate numerical approximations to $\mathbb{E}I$ (and $\mathbb{V}I$ as well)?
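For reference, the INID case is tractable precisely because conditioning on the value of $X_i$ collapses the orthant integral to a one-dimensional integral: with independent $X_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$ and $\phi$, $\Phi$ the standard normal pdf and cdf,

\begin{align*}
\mathbb{P}(I = i) = \int_{-\infty}^{\infty} \frac{1}{\sigma_i}\,\phi\!\left(\frac{x - \mu_i}{\sigma_i}\right) \prod_{j \ne i} \Phi\!\left(\frac{x - \mu_j}{\sigma_j}\right) \mathrm{d}x
\end{align*}

For general $\boldsymbol{\Sigma}$ no such reduction is available, and one is left with $(n-1)$-dimensional orthant probabilities.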

The covariance matrices $\boldsymbol{\Sigma}$ I’m interested in look like this:

[image: covariance matrix]

Until now, I’ve not been able to find anything about this problem.

Related questions:

https://mathoverflow.net/questions/153039/maximal-component-of-a-multivariate-gaussian-distribution
Expectation of the softmax transform for Gaussian multivariate variables

Answer

You can use the law of large numbers to approximate your expectation pretty easily.
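That is, draw samples of $X$, take the argmax of each draw, and average. A minimal sketch (function name and example parameters are mine, not from the answer):

```python
import numpy as np

def approx_EI(mu, Sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[I] and V[I], where I = argmax_i X_i
    (1-based, to match the notation in the question)."""
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(mu, Sigma, size=n_samples)
    I = X.argmax(axis=1) + 1  # 1-based index of the maximum in each row
    return I.mean(), I.var()

# Example: exchangeable components, so E[I] should be near (n + 1) / 2 = 3
mu = np.zeros(5)
Sigma = 0.5 * np.eye(5) + 0.5 * np.ones((5, 5))
EI, VI = approx_EI(mu, Sigma)
```

By the law of large numbers the estimate converges at the usual $O(n_{\text{samples}}^{-1/2})$ Monte Carlo rate, regardless of the structure of $\boldsymbol{\Sigma}$.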

Edit: Analytically, you can multiply a bunch of normal CDF evaluations together, treating the pairwise comparison events as independent (they all involve $X_i$, so this is an approximation in general). For $i > 0$,

\begin{align*}
\mathbb{P}\left( I = i \right) &= \mathbb{P}\Big( X_i = \max_{1 \le j \le n} X_j \Big) \\
&\approx \prod_{j \ne i} \mathbb{P}\left( X_i > X_j \right) \\
&= \prod_{j \ne i} \mathbb{P}\left( Y_j \le 0 \right)
\end{align*}

where $Y_j = X_j - X_i \sim \text{Normal}(\mu_j - \mu_i,\; \Sigma_{jj} + \Sigma_{ii} - 2\Sigma_{ij})$.

If we code $i = 0$ as the event that no data point is clearly a maximum, then $\mathbb{P}\left( I = 0 \right) = 1 - \sum_{k=1}^n \mathbb{P}\left( I = k \right)$
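The product approximation above can be sketched as follows (function name is mine; SciPy is assumed for the normal CDF, and the probabilities are renormalized per the $i = 0$ remark):

```python
import numpy as np
from scipy.stats import norm

def argmax_pmf_approx(mu, Sigma):
    """Product-of-CDFs approximation to P(I = i).
    For each i, multiplies P(X_i > X_j) over j != i, where
    Y_j = X_j - X_i ~ N(mu_j - mu_i, Sigma_jj + Sigma_ii - 2 Sigma_ij).
    The comparison events all share X_i, so this is only approximate."""
    mu = np.asarray(mu, dtype=float)
    n = len(mu)
    p = np.ones(n)
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            var = Sigma[j, j] + Sigma[i, i] - 2 * Sigma[i, j]
            # P(Y_j <= 0) for Y_j ~ N(mu_j - mu_i, var)
            p[i] *= norm.cdf(0.0, loc=mu[j] - mu[i], scale=np.sqrt(var))
    return p

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.eye(3)
p = argmax_pmf_approx(mu, Sigma)
EI_approx = np.sum(np.arange(1, 4) * p) / np.sum(p)  # renormalized E[I]
```

Since the component with the largest mean dominates here, the approximate pmf should put most of its mass on $i = 3$.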

Attribution
Source: Link, Question Author: Community, Answer Author: Taylor
