Finding most likely permutation

[Hoping that this is the right Stackexchange site; inspired from a true story seen at work]

Joe has a measuring instrument and n objects to be measured (say, a scale and n weights). He measures each one, obtaining a list of measurements X=[x1,xn]Rn.

Later on, he sends me the objects. I want to find the correspondence between each object and its respective measured value xi, but Joe forgot to number the objects or to sort them in any way that allows me to find which one is the i-th object. I therefore measure them again with a similar instrument, obtaining a list of values Y=[y1,yn]Rn.

If our instruments were perfectly accurate, then Y would be a permutation of X. However, our instruments are not perfect; while they both have perfect trueness, they have imperfect precision. In other words, if we measure the same object many times, the average value of the repeated measurements tends to the true value, but the results have (known) standard deviations σJ and σI (for Joe’s instrument and for mine, respectively). Therefore, the values in X will in general be different from the values in Y.

In the limiting case where all values are distinct from each other (that is, minxi,xjX{|xixj|}σJ and similarly \displaystyle\min_{y_i,y_j\in Y}\{|y_i-y_j|\}\gg\sigma_I), finding the correct permutation (that is, the correspondence between a value in X and the corresponding value in Y) is trivial. When this is not the case, however, how would one go to find the most likely permutation from X to Y from the data available?

Bonus questions: does the answer change if I no longer assume perfect trueness? Is the case \sigma_J=\sigma_I easier?

EDIT Forgot to ask: how do I compute the probability of a given permutation, that is, the probability that it is the correct one among the space of n! possible permutations? Is there a simple (preferably closed-form) expression for the probability of the optimal permutation (which appears to be the one corresponding to sorting both vectors, see whuber’s solution below – at least if the errors are normally distributed)?

EDIT 2 Per Aksakal observation (see comments to the question): assume that all true weights are strictly distinct (the measurements, both for me and for Joe, can be non-distinct values due to measurement error).


Provided the measurement errors are independent and identically Normally distributed for each instrument, the solution is to match the two sets of measurements in sorted order. Although this is intuitively obvious (comments posted shortly after the question was posted state this solution), it remains to prove it.

To this end, let the first set of measurements in sorted order be x_1\le x_2\le \cdots \le x_n and let the second set of measurements in sorted order be y_1\le y_2\le \cdots \le y_n. Let the error distributions have zero means and variances \sigma^2 for the X instrument and \tau^2 for the Y instrument. (I find this notation a little more congenial than the subscripting in the question.)

To find the most likely permutation, we solve the maximum likelihood problem. Its parameters are (a) the n true weights \theta_i corresponding to the objects measured by each x_i and (b) the permutation s that makes y_{s(i)} the second measurement of object i. Insofar as the likelihood depends on (\theta) and s, the likelihood of these observations is proportional to the exponential of

\mathcal{L}(\theta,s) = -\frac{1}{2}\sum_{i=1}^n \left(\frac{x_i-\theta_i}{\sigma}\right)^2 + \left(\frac{y_{s(i)}-\theta_i}{\tau}\right)^2.

For any given s, this expression (and therefore its exponential) is maximized term by term by taking

\hat\theta_i = \frac{\tau^2 x_i + \sigma^2 y_{s(i)}}{\sigma^2 + \tau^2}.

For these optimal values of \theta, the value of -2\mathcal{L} (which we wish to minimize) is

-2\mathcal{L}(\hat\theta,s) = \frac{1}{\sigma^2+\tau^2}\sum_{i=1}^n \left(x_i – y_{s(i)}\right)^2.

When each squared expression is expanded we obtain (a) a sum of the x_i^2, (b) a sum of the y_{s(i)}^2 (which equals the sum of the y_i^2 because s is a permutation), and (c) the cross terms,

-2\sum_{i=1}^n x_i y_{s(i)}.

The Rearrangement Inequality states that such sums of products are maximized (thereby maximizing \mathcal{L}(\hat\theta, s)) when the y_{s(i)} are in increasing order, QED.

This analysis relies on the Normality assumption. Although it can be relaxed, some distributional assumption is needed, as @fblundun perceptively points out in a comment to the question.

Source : Link , Question Author : GioMott , Answer Author : whuber

Leave a Comment