Distribution of ‘unmixed’ parts based on order of the mix

Suppose I have paired observations drawn i.i.d. as X_i \sim \mathcal{N}\left(0,\sigma_x^2\right), Y_i \sim \mathcal{N}\left(0,\sigma_y^2\right), for i=1,2,\ldots,n. Let Z_i = X_i + Y_i, and denote by Z_{i_j} the jth largest observed value of Z. What is the (conditional) distribution of X_{i_j}? (or equivalently, that of Y_{i_j})

That is, what is the distribution of X_i conditional on Z_i being the jth largest of n observed values of Z?

I am guessing that as \rho = \frac{\sigma_x}{\sigma_y} \to 0, the distribution of X_{i_j} converges to just the unconditional distribution of X, while as \rho \to \infty, the distribution of X_{i_j} converges to the unconditional distribution of the jth order statistic of X. In the middle, though, I am uncertain.


Observe that the random variable i_j is a function of \mathbf{Z} = (Z_1, \ldots, Z_n) only. For an n-vector, \mathbf{z}, we write i_j(\mathbf{z}) for the index of the jth largest coordinate. Let also P_z(A) = P(X_1 \in A \mid Z_1 = z) denote the conditional distribution of X_1 given Z_1.

If we break probabilities down according to the value of i_j and disintegrate w.r.t. \mathbf{Z} we get

\begin{align*}
P(X_{i_j} \in A) & = \sum_{k} P(X_k \in A, i_j = k) \\
& = \sum_k \int_{(i_j(\mathbf{z}) = k)} P(X_k \in A \mid \mathbf{Z} = \mathbf{z}) \, P(\mathbf{Z} \in d\mathbf{z}) \\
& = \sum_k \int_{(i_j(\mathbf{z}) = k)} P(X_k \in A \mid Z_k = z_k) \, P(\mathbf{Z} \in d\mathbf{z}) \\
& = \sum_k \int_{(i_j(\mathbf{z}) = k)} P_{z_k}(A) \, P(\mathbf{Z} \in d\mathbf{z}) \\
& = \int P_{z}(A) \, P(Z_{i_j} \in dz)
\end{align*}

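This argument is quite general and relies only on the stated i.i.d. assumptions: the third equality uses that the pairs (X_k, Y_k) are independent across k, so that X_k depends on \mathbf{Z} only through Z_k, and the fourth uses that the pairs are identically distributed, so that the same kernel P_z applies for every k. In particular, Z_k could be any given function of (X_k, Y_k).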

Under the assumptions of normal distributions (taking \sigma_y = 1) and Z_k being the sum, the conditional distribution of X_1 given Z_1 = z is
N\left(\frac{\sigma_x^2}{1+\sigma_x^2} z, \sigma_x^2\left(1 - \frac{\sigma_x^2}{1+\sigma_x^2}\right)\right)
and @probabilityislogic shows how to compute the distribution of Z_{i_j}, hence we have explicit expressions for both the distributions that enter in the last integral above. Whether the integral can be computed analytically is another question. You might be able to, but off the top of my head I can’t tell if it is possible. For asymptotic analysis when \sigma_x \to 0 or \sigma_x \to \infty it might not be necessary.
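As a sanity check on the last integral, here is a minimal simulation sketch using only the Python standard library. The function names `simulate_x_at_rank` and `mixture_moments` are made up for this illustration. The moment formulas in `mixture_moments` are what the laws of total expectation and total variance give when applied to the mixture of normals above; they are not a closed form for the full distribution. Both the direct estimate and the mixture-based estimate come from the same simulated sample, so this only checks that the two sides of the identity agree up to Monte Carlo noise.

```python
import random
import statistics


def simulate_x_at_rank(n, j, sigma_x, sigma_y=1.0, trials=20000, seed=0):
    """Simulate (X_{i_j}, Z_{i_j}): the pair whose sum Z is the jth largest of n."""
    rng = random.Random(seed)
    xs_at_rank, zs_at_rank = [], []
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
        y = [rng.gauss(0.0, sigma_y) for _ in range(n)]
        z = [a + b for a, b in zip(x, y)]
        # Index of the jth largest Z (j = 1 is the maximum).
        k = sorted(range(n), key=lambda i: z[i], reverse=True)[j - 1]
        xs_at_rank.append(x[k])
        zs_at_rank.append(z[k])
    return xs_at_rank, zs_at_rank


def mixture_moments(zs_at_rank, sigma_x, sigma_y=1.0):
    """Mean and variance of X_{i_j} implied by the mixture representation."""
    b = sigma_x**2 / (sigma_x**2 + sigma_y**2)  # slope of E(X | Z = z) in z
    cond_var = sigma_x**2 * (1.0 - b)           # Var(X | Z = z), free of z
    mean = b * statistics.fmean(zs_at_rank)
    var = cond_var + b**2 * statistics.pvariance(zs_at_rank)
    return mean, var


if __name__ == "__main__":
    xs, zs = simulate_x_at_rank(n=5, j=1, sigma_x=1.0)
    print("direct :", statistics.fmean(xs), statistics.pvariance(xs))
    print("mixture:", *mixture_moments(zs, sigma_x=1.0))
```

Rerunning with a very small or very large `sigma_x` gives a rough feel for the two limiting regimes mentioned in the question.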

The intuition behind the computation above is that this is a conditional independence argument: given Z_{k} = z, the variables X_{k} and i_j are independent.
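Under the normal assumptions (still with \sigma_y = 1) the mixture formula can be read as a distributional identity, sketched below. The symbols b and \varepsilon are introduced here only for illustration; the two moment lines follow from the laws of total expectation and total variance.

```latex
% b and \varepsilon are illustrative symbols, not part of the notation above.
\begin{align*}
b & = \frac{\sigma_x^2}{1+\sigma_x^2}, \qquad
\varepsilon \sim N\left(0, \sigma_x^2(1-b)\right) \text{ independent of } Z_{i_j}, \\
X_{i_j} & \overset{d}{=} b \, Z_{i_j} + \varepsilon, \\
E\left(X_{i_j}\right) & = b \, E\left(Z_{i_j}\right), \\
V\left(X_{i_j}\right) & = \sigma_x^2(1-b) + b^2 \, V\left(Z_{i_j}\right).
\end{align*}
```

This seems consistent with the guess in the question: as \sigma_x \to 0 we have b \to 0 and the \varepsilon term, with variance close to \sigma_x^2, dominates, while as \sigma_x \to \infty we have b \to 1 and X_{i_j} behaves like Z_{i_j}, which is then itself dominated by the X component.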

Source : Link , Question Author : shabbychef , Answer Author : NRH
