# Distribution of argmax of beta-distributed random variables

Let $x_i \sim \text{Beta}(\alpha_i, \beta_i)$ for $i \in I$. Let $j = \operatorname*{argmax}_{i \in I} x_i$ (ties broken arbitrarily). What is the distribution of $j$ in terms of $\alpha$ and $\beta$? Is there an efficient way to compute it, beyond pure sampling?

When the $x_i$ are independent for $1 \le i \le d$ with distribution functions $F_i$ and density functions $f_i$, respectively, the chance that $x_j$ is the largest is (by the very definition of the distribution functions)

$$\begin{aligned} \Pr(x_j=\max(x_i,i\in\mathcal I)) &= \Pr(x_1 \le x_j,\, x_2 \le x_j,\, \ldots,\, x_d\le x_j) \\ &= E\left[F_1(x_j)F_2(x_j)\cdots F_{j-1}(x_j)\,(1)\,F_{j+1}(x_j) \cdots F_d(x_j)\right] \\ &= \int_{\mathbb{R}}\left[F_1(x_j)\cdots F_{j-1}(x_j)\ F_{j+1}(x_j) \cdots F_d(x_j)\right]f_j(x_j)\,\mathrm{d}x_j. \end{aligned}$$

Provided none of the $\alpha_i$ and $\beta_i$ are really tiny, this is straightforward to obtain through numerical integration, as shown in the R function beta.argmax below. (When there is some possibility of tiny values, fussier code will be needed because the regions of highest density may have density values that overflow double precision arithmetic. As a practical matter, “tiny” means closer to $0$ than to $1$.)
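As a quick sanity check of the integral identity (a minimal sketch, not part of the original code), take $d=3$ independent Beta(1,1) = Uniform(0,1) variables: by symmetry each index should be the argmax with probability exactly $1/3$, and the integral reproduces this because the integrand reduces to $x^2$ on $[0,1]$.

```r
# Sanity check with d = 3 independent Beta(1,1) = Uniform(0,1) variables.
# Here F_i(x) = x and f_j(x) = 1, so the integrand F_2(x) F_3(x) f_1(x)
# is x^2 on [0, 1], whose integral is 1/3 -- matching the symmetry argument.
h <- function(x) pbeta(x, 1, 1) * pbeta(x, 1, 1) * dbeta(x, 1, 1)
p1 <- integrate(h, 0, 1)$value
stopifnot(abs(p1 - 1/3) < 1e-6)
```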

As an example of its use, I generated $d=8$ values for $\alpha_i$ and $\beta_i$.

```r
d <- 8
set.seed(17)
alpha <- rexp(d) + 0.1
beta <- rexp(d) + 0.1
```


I then computed the probability distribution and double-checked it with a simulation of 100,000 iterations:

```r
p <- beta.argmax(alpha, beta, stop.on.error=FALSE) # The calculation

x <- matrix(rbeta(d * 1e5, alpha, beta), nrow=d)   # The simulated x_i
p.hat <- tabulate(apply(x, 2, which.max), nbins=d) # Summary of the argmaxes

(signif(rbind(Calculated=p, Simulated=p.hat/sum(p.hat)), 3)) # Comparison
chisq.test(p.hat, p=p)                                       # Formal comparison
```


The output is

```
             [,1]   [,2]    [,3]  [,4]  [,5]   [,6]   [,7]  [,8]
Calculated 0.0247 0.0218 0.00230 0.124 0.451 0.0318 0.0341 0.311
Simulated  0.0245 0.0217 0.00225 0.125 0.451 0.0312 0.0346 0.311

	Chi-squared test for given probabilities

data:  p.hat
X-squared = 2.468, df = 7, p-value = 0.9295
```


The agreement between calculation and simulation, shown in the first array, is excellent, as confirmed by the chi-squared test that follows it.

I did other tests with $d$ as large as $200$, keeping all $\alpha_i$ and $\beta_i$ above $0.5$, and the simulations have remained consistent with the calculations. For even larger values of $d$ the results worsen, indicating numerical problems. (I tested up to $d=500$.) These were cured (at some cost in computation time, which reached one minute) by tightening the error tolerances in the numerical integration.
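To illustrate what tightening the tolerances looks like (the specific values below are illustrative, not the settings from my tests): integrate's default relative tolerance is about $1.2\times 10^{-4}$ (.Machine$double.eps^0.25), and stricter settings can be passed straight through the `...` argument of beta.argmax.

```r
# Illustrative only: tightening integrate()'s tolerances. The same settings
# can be forwarded through beta.argmax's `...` argument, e.g.
# beta.argmax(alpha, beta, rel.tol = 1e-12, subdivisions = 2000L).
f <- function(x) pbeta(x, 2, 3) * dbeta(x, 3, 2)  # a typical smooth integrand
loose <- integrate(f, 0, 1)$value                 # default rel.tol ~ 1.2e-4
tight <- integrate(f, 0, 1, rel.tol = 1e-12, subdivisions = 2000L)$value
stopifnot(abs(loose - tight) < 1e-4)              # both agree here; large d
                                                  # is where the defaults fail
```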

Here is the code.

```r
beta.argmax <- function(alpha, beta, ...) {
  # Integration limits covering essentially all the probability mass.
  lower <- min(qbeta(1e-9, alpha, beta))
  upper <- max(qbeta(1 - 1e-9, alpha, beta))
  p <- rep(NA_real_, length(alpha))
  for (i in seq_along(p)) {
    ff <- function(x) dbeta(x, alpha[i], beta[i], log=TRUE)  # log f_i(x)
    f <- Vectorize(function(x) sum(pbeta(x, alpha[-i], beta[-i], log.p=TRUE)))
    h <- function(x) exp(ff(x) + f(x))  # the integrand, assembled in log space
    p[i] <- integrate(h, lower, upper, ...)$value
  }
  cat(sum(p), "\n") # Optional check: see how close to 1.000000 the sum is
  p / sum(p)
}
```