# Distribution of ‘unmixed’ parts based on order of the mix

Suppose I have paired observations drawn i.i.d. as $X_i \sim \mathcal{N}\left(0,\sigma_x^2\right), Y_i \sim \mathcal{N}\left(0,\sigma_y^2\right),$ for $i=1,2,\ldots,n$. Let $Z_i = X_i + Y_i,$ and denote by $Z_{i_j}$ the $j$th largest observed value of $Z$. What is the (conditional) distribution of $X_{i_j}$? (or equivalently, that of $Y_{i_j}$)

That is, what is the distribution of $X_i$ conditional on $Z_i$ being the $j$th largest of $n$ observed values of $Z$?

I am guessing that as $\rho = \frac{\sigma_x}{\sigma_y} \to 0$, the distribution of $X_{i_j}$ converges to just the unconditional distribution of $X$, while as $\rho \to \infty$, the distribution of $X_{i_j}$ converges to the unconditional distribution of the $j$th order statistic of $X$. In the middle, though, I am uncertain.

Observe that the random variable $i_j$ is a function of $\mathbf{Z} = (Z_1, \ldots, Z_n)$ only. For an $n$-vector $\mathbf{z}$, we write $i_j(\mathbf{z})$ for the index of the $j$th largest coordinate. Also let $P_z(A) = P(X_1 \in A \mid Z_1 = z)$ denote the conditional distribution of $X_1$ given $Z_1 = z$.

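If we break probabilities down according to the value of $i_j$ and disintegrate w.r.t. $\mathbf{Z}$, we get

$$\begin{aligned}
P(X_{i_j} \in A) &= \sum_k P(X_k \in A, \, i_j = k) \\
&= \sum_k \int_{(i_j(\mathbf{z}) = k)} P(X_k \in A \mid \mathbf{Z} = \mathbf{z}) \, P(\mathbf{Z} \in d\mathbf{z}) \\
&= \sum_k \int_{(i_j(\mathbf{z}) = k)} P(X_k \in A \mid Z_k = z_k) \, P(\mathbf{Z} \in d\mathbf{z}) \\
&= \sum_k \int_{(i_j(\mathbf{z}) = k)} P_{z_k}(A) \, P(\mathbf{Z} \in d\mathbf{z}) \\
&= \int P_z(A) \, P(Z_{i_j} \in dz).
\end{aligned}$$
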
This argument is quite general and relies only on the stated i.i.d. assumptions, and $Z_k$ could be any given function of $(X_k, Y_k)$.

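Under the assumptions of normal distributions (taking $\sigma_y = 1$, with $X_k$ and $Y_k$ independent) and $Z_k$ being the sum, the conditional distribution of $X_1$ given $Z_1 = z$ is

$$X_1 \mid Z_1 = z \;\sim\; \mathcal{N}\left(\frac{\sigma_x^2}{1+\sigma_x^2}\, z, \; \frac{\sigma_x^2}{1+\sigma_x^2}\right),$$
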
and @probabilityislogic shows how to compute the distribution of $Z_{i_j}$, hence we have explicit expressions for both distributions that enter the last integral above. Whether the integral can be computed analytically is another question. It might be possible, but off the top of my head I can’t tell. For asymptotic analysis when $\sigma_x \to 0$ or $\sigma_x \to \infty$ it might not be necessary.
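Even without a closed form, the mixture $\int P_z(A)\,P(Z_{i_j} \in dz)$ can be evaluated numerically. The following is a minimal sketch, not a definitive implementation. It assumes $\sigma_y = 1$ and independent $X_k$, $Y_k$, so that $X_1 \mid Z_1 = z$ is normal with mean $\frac{\sigma_x^2}{1+\sigma_x^2} z$ and variance $\frac{\sigma_x^2}{1+\sigma_x^2}$, and it uses the standard density of the $j$th largest of $n$ i.i.d. $\mathcal{N}(0, 1+\sigma_x^2)$ variables for $Z_{i_j}$. The function names, the grid size, and the integration range are illustrative choices, and the plain Riemann sum may need a finer grid when $\sigma_x$ is large.

```python
import math
import numpy as np


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (broadcasts over arrays)."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def normal_cdf(x, var):
    """CDF of N(0, var), vectorised over a NumPy array via math.erf."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(x / math.sqrt(2.0 * var)))


def density_of_selected_x(x, j, n, sigma_x, grid_size=4001):
    """Approximate density at x of X paired with the j-th largest Z.

    Assumes sigma_y = 1 and X, Y independent (hypothetical helper).
    """
    var_z = sigma_x ** 2 + 1.0
    shrink = sigma_x ** 2 / var_z      # conditional mean of X given Z = z is shrink * z
    cond_var = sigma_x ** 2 / var_z    # conditional variance of X given Z = z

    # Symmetric grid covering +/- 10 standard deviations of Z.
    half_width = 10.0 * math.sqrt(var_z)
    z = np.linspace(-half_width, half_width, grid_size)
    dz = z[1] - z[0]

    # Density of the j-th largest of n i.i.d. N(0, var_z) variables.
    F = normal_cdf(z, var_z)
    coef = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
    f_order = coef * F ** (n - j) * (1.0 - F) ** (j - 1) * normal_pdf(z, 0.0, var_z)

    # Mix the conditional density of X given Z = z over the law of Z_{i_j}.
    integrand = normal_pdf(x, shrink * z, cond_var) * f_order
    return float(np.sum(integrand) * dz)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0, 2.0):
        print(x, density_of_selected_x(x, j=1, n=10, sigma_x=1.0))
```

A quick sanity check is the case $n = 1$, where no selection takes place and the result should agree with the $\mathcal{N}(0, \sigma_x^2)$ density.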

The intuition behind the derivation is that this is a conditional independence argument: given $Z_k = z$, the variables $X_k$ and $i_j$ are independent.
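The conditional independence can be checked by simulation. The sketch below is only an illustration under the same assumptions ($\sigma_y = 1$, independent normal $X_k$ and $Y_k$); the function name, seed, and parameter values are arbitrary. If the argument holds, the residual $X_{i_j} - \frac{\sigma_x^2}{1+\sigma_x^2} Z_{i_j}$ should have mean close to zero, variance close to $\frac{\sigma_x^2}{1+\sigma_x^2}$, and be essentially uncorrelated with $Z_{i_j}$, regardless of $j$.

```python
import numpy as np


def simulate_selected_pairs(j, n, sigma_x, reps, seed=0):
    """Draw `reps` samples of (X_{i_j}, Z_{i_j}) with sigma_y = 1 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, size=(reps, n))
    y = rng.normal(0.0, 1.0, size=(reps, n))
    z = x + y
    # Column index of the j-th largest Z within each replication.
    idx = np.argsort(-z, axis=1)[:, j - 1]
    rows = np.arange(reps)
    return x[rows, idx], z[rows, idx]


sigma_x = 1.5
x_sel, z_sel = simulate_selected_pairs(j=2, n=10, sigma_x=sigma_x, reps=200_000)

shrink = sigma_x ** 2 / (1.0 + sigma_x ** 2)    # slope of E[X | Z = z] in z
cond_var = sigma_x ** 2 / (1.0 + sigma_x ** 2)  # Var[X | Z = z]

# Residual of X after removing its conditional mean given the paired Z.
resid = x_sel - shrink * z_sel

print("mean of X_{i_j}:", x_sel.mean(), "vs shrink * mean of Z_{i_j}:", shrink * z_sel.mean())
print("residual variance:", resid.var(), "vs conditional variance:", cond_var)
print("correlation of residual with Z_{i_j}:", np.corrcoef(resid, z_sel)[0, 1])
```

Matching moments do not prove the full distributional identity, but a clear mismatch here would indicate an error in the derivation or in the code.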