Generate uniformly distributed weights that sum to unity?

It is common to use weights in applications like mixture modeling and to linearly combine basis functions. Weights w_i must often obey w_i ≥ 0 and \sum_{i} w_i=1. I’d like to randomly choose a weight vector \mathbf{w} = (w_1, w_2, …) from a uniform distribution of such vectors.

It may be tempting to use w_i = \frac{\omega_i}{\sum_{j} \omega_j} where \omega_i \sim U(0, 1), however as discussed in the comments below, the distribution of \mathbf{w} is not uniform.

However, given the constraint \sum_{i} w_i=1, it seems that the underlying dimensionality of the problem is n-1, and that it should be possible to choose a \mathbf{w} by choosing n-1 parameters according to some distribution and then computing the corresponding \mathbf{w} from those parameters (because once n-1 of the weights are specified, the remaining weight is fully determined).

The problem appears to be similar to the sphere point picking problem (but, rather than picking 3-vectors whose ℓ_2 norm is unity, I want to pick n-vectors whose ℓ_1 norm is unity).

Thanks!

Answer

Choose \mathbf{x} \in [0,1]^{n-1} uniformly (by means of n-1 uniform reals in the interval [0,1]). Sort the coefficients so that 0 \le x_1 \le \cdots \le x_{n-1}. Set

\mathbf{w} = (x_1, x_2-x_1, x_3 – x_2, \ldots, x_{n-1} – x_{n-2}, 1 – x_{n-1}).

Because we can recover the sorted x_i by means of the partial sums of the w_i, the mapping \mathbf{x} \to \mathbf{w} is (n-1)! to 1; in particular, its image is the n-1 simplex in \mathbb{R}^n. Because (a) each swap in a sort is a linear transformation, (b) the preceding formula is linear, and (c) linear transformations preserve uniformity of distributions, the uniformity of \mathbf{x} implies the uniformity of \mathbf{w} on the n-1 simplex. In particular, note that the marginals of \mathbf{w} are not necessarily independent.

3D point plot

This 3D point plot shows the results of 2000 iterations of this algorithm for n=3. The points are confined to the simplex and are approximately uniformly distributed over it.


Because the execution time of this algorithm is O(n \log(n)) \gg O(n), it is inefficient for large n. But this does answer the question! A better way (in general) to generate uniformly distributed values on the n-1-simplex is to draw n uniform reals (x_1, \ldots, x_n) on the interval [0,1], compute

y_i = -\log(x_i)

(which makes each y_i positive with probability 1, whence their sum is almost surely nonzero) and set

\mathbf w = (y_1, y_2, \ldots, y_n) / (y_1 + y_2 + \cdots + y_n).

This works because each y_i has a \Gamma(1) distribution, which implies \mathbf w has a Dirichlet(1,1,1) distribution–and that is uniform.

[3D point plot 2]

Attribution
Source : Link , Question Author : Chris , Answer Author : whuber

Leave a Comment