Correlated Bernoulli trials, multivariate Bernoulli distribution?

I’m simplifying a research question that I have at work. Imagine that I have 5 coins and let’s call heads a success. These are VERY biased coins with probability of success $p = 0.1$. Now, if the coins were independent, then getting the probability of at least one head is very simple: $1 - (1 - 1/10)^5$. In my scenario, my Bernoulli trials (coin tosses) are not independent. The only information I have access to is the probability of success (each one is $p = 0.1$) and the theoretical Pearson correlations among the binary variables.
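For reference, the independent baseline can be checked numerically. This is only a small sketch; the function name is made up for illustration.

```python
def p_at_least_one_independent(p: float, n: int) -> float:
    """P(at least one success) among n independent Bernoulli(p) trials."""
    # Complement of "all n trials fail", which has probability (1 - p)^n.
    return 1.0 - (1.0 - p) ** n


# Five coins, each with success probability 0.1.
print(p_at_least_one_independent(0.1, 5))
```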

Is there any way to calculate the probability of one success or more only with this information? I’m trying to avoid a simulation-based approach because these theoretical results will be used to guide the accuracy of a simulation study. I have been looking into the multivariate Bernoulli distribution but I don’t think that I can fully specify it only with correlations and marginal probabilities of success. A friend of mine recommended constructing a Gaussian copula with Bernoulli marginals (using the R package copula) and then using the pMvdc() function on a large sample to get the probability I want, but I’m not exactly sure how to go about it.


No, this is impossible whenever you have three or more coins.

The case of two coins

Let us first see why it works for two coins as this provides some intuition about what breaks down in the case of more coins.

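Let $X$ and $Y$ denote the Bernoulli distributed variables corresponding to the two coins, $X \sim \mathrm{Ber}(p)$, $Y \sim \mathrm{Ber}(q)$. First, recall that the correlation of $X$ and $Y$ is

$$\mathrm{corr}(X, Y) = \frac{E[XY] - E[X]E[Y]}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}},$$
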
and since you know the marginals, you know $E[X]$, $E[Y]$, $\mathrm{Var}(X)$, and $\mathrm{Var}(Y)$, so by knowing the correlation, you also know $E[XY]$. Now, $XY = 1$ if and only if both $X = 1$ and $Y = 1$, so

$$E[XY] = P(X = 1, Y = 1).$$

By knowing the marginals, you know $p = P(X=1, Y=0) + P(X=1, Y=1)$ and $q = P(X=0, Y=1) + P(X=1, Y=1)$. Since we just found that you know $P(X=1, Y=1)$, this means that you also know $P(X=1, Y=0)$ and $P(X=0, Y=1)$, and therefore $P(X=0, Y=0)$ as well. But now you’re done, as the probability you are looking for is

$$P(X=1, Y=0) + P(X=0, Y=1) + P(X=1, Y=1) = 1 - P(X=0, Y=0).$$

Now, I personally find all of this easier to see with a picture. Let $P_{ij} = P(X = i, Y = j)$. Then we may picture the various probabilities as forming a square:

Here, we saw that knowing the correlation meant that you could deduce $P_{11}$, marked red, and that knowing the marginals, you knew the sum for each edge (one of which is indicated with a blue rectangle).
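The two-coin argument can be sketched in a few lines of Python. The function names below are hypothetical, and the values $p = q = 0.1$ and correlation $0.2$ are chosen purely for illustration. The sketch solves the correlation formula for $E[XY] = P_{11}$, fills in the rest of the square from the marginals, and rejects correlations that no joint distribution can attain.

```python
import math


def two_coin_joint(p, q, rho):
    """Joint pmf of two Bernoulli variables from marginals p, q and correlation rho.

    Returns a dict {(i, j): P(X=i, Y=j)}.
    Raises ValueError if some cell would be negative, i.e. rho is not attainable.
    """
    # corr = (E[XY] - pq) / sqrt(p(1-p) q(1-q)), solved for E[XY] = P(X=1, Y=1).
    p11 = rho * math.sqrt(p * (1 - p) * q * (1 - q)) + p * q
    joint = {
        (1, 1): p11,
        (1, 0): p - p11,          # from p = P10 + P11
        (0, 1): q - p11,          # from q = P01 + P11
        (0, 0): 1 - p - q + p11,  # all four cells sum to 1
    }
    if min(joint.values()) < -1e-12:
        raise ValueError("no joint distribution has these marginals and this correlation")
    return joint


def p_at_least_one(joint):
    """P(at least one success) = 1 - P(X=0, Y=0)."""
    return 1 - joint[(0, 0)]


# Illustrative values only: two Ber(0.1) coins with correlation 0.2.
print(p_at_least_one(two_coin_joint(0.1, 0.1, 0.2)))
```

With correlation 0 this reduces to the independent value $1 - (1-p)(1-q)$, which is a quick sanity check.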

The case of three coins

This will not go as easily for three coins, and intuitively it is not hard to see why. By knowing the marginals and the correlations, you know a total of $6 = 3 + 3$ parameters. The joint distribution has $2^3 = 8$ outcomes, and since the probabilities sum to 1, knowing 7 of them determines the last one. Now, $7 > 6$, so it seems reasonable that one could cook up two different joint distributions whose marginals and correlations are the same, and that one could permute the probabilities until the ones you are looking for differ.
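The counting heuristic is easy to tabulate for any number of coins. The helper names below are made up for illustration. The sketch compares the number of known quantities (marginals plus pairwise correlations) with the number of free parameters of the joint distribution.

```python
from math import comb


def known_quantities(n):
    """n marginal probabilities plus one correlation per pair of coins."""
    return n + comb(n, 2)


def free_parameters(n):
    """2^n outcome probabilities, minus one because they sum to 1."""
    return 2 ** n - 1


# Columns: number of coins, known quantities, free parameters.
for n in range(2, 6):
    print(n, known_quantities(n), free_parameters(n))
```

The two counts agree only for two coins; from three coins on, the joint distribution has more free parameters than the marginals and correlations supply.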

Let $X$, $Y$, and $Z$ be the three variables, and let

$$P_{ijk} = P(X = i, Y = j, Z = k).$$

In this case, the picture from above becomes the following:

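[Figure: a cube whose vertices are the probabilities $P_{ijk}$, with one face marked by a blue plane and several coloured edges.]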

The dimensions have been bumped by one: the red vertex has become several coloured edges, and the edge covered by a blue rectangle has become an entire face. Here, the blue plane indicates that by knowing the marginal, you know the sum of the probabilities within; for instance, for the face $X = 0$,

$$P(X = 0) = P_{000} + P_{001} + P_{010} + P_{011},$$

and similarly for all other faces in the cube. The coloured edges indicate that by knowing the correlations, you know the sum of the two probabilities connected by the edge. For example, by knowing $\mathrm{corr}(X, Y)$, you know $E[XY]$ (exactly as above), and

$$E[XY] = P(X = 1, Y = 1) = P_{110} + P_{111}.$$

So, this puts some limitations on the possible joint distributions, but now we’ve reduced the problem to the combinatorial exercise of putting numbers on the vertices of a cube. Without further ado, let us provide two joint distributions whose marginals and correlations are the same:

[Figure: two cubes with numbers on their vertices, one cube for each joint distribution.]

Here, divide all numbers by 100 to obtain a probability distribution. To see that these work and have the same marginals/correlations, simply note that the sum of probabilities on each face is 1/2 (meaning that the variables are $\mathrm{Ber}(1/2)$), and that the sums for the vertices on the coloured edges agree in both cases (in this particular case, all correlations are in fact the same, but that doesn’t have to be the case in general).

Finally, the probabilities of getting at least one head, $1 - P_{000}$ and $1 - P'_{000}$ (with the prime denoting the second distribution), are different in the two cases, which is what we wanted to prove.
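A pair of this kind can also be checked mechanically. The sketch below does not use the numbers from the answer's picture; it builds its own illustrative pair, starting from three independent fair coins and shifting a small mass `eps` between vertices of even and odd parity. Every cube edge joins one even and one odd vertex, so all edge sums, and therefore all face sums, are unchanged. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product((0, 1), repeat=3))


def marginal(joint, axis):
    """P(the variable at position `axis` equals 1): a face sum of the cube."""
    return sum(pr for o, pr in joint.items() if o[axis] == 1)


def pair_moment(joint, a, b):
    """E[X_a X_b] = P(X_a = 1, X_b = 1): an edge sum of the cube."""
    return sum(pr for o, pr in joint.items() if o[a] == 1 and o[b] == 1)


def summary(joint):
    """Marginals and pairwise product moments; together they fix the correlations."""
    margs = tuple(marginal(joint, a) for a in range(3))
    moments = tuple(pair_moment(joint, a, b) for a, b in ((0, 1), (0, 2), (1, 2)))
    return margs, moments


# First joint distribution: three independent fair coins.
first = {o: Fraction(1, 8) for o in OUTCOMES}

# Second joint distribution: add eps on even-parity vertices, subtract it on odd ones.
eps = Fraction(1, 16)
second = {o: pr + eps * (-1) ** sum(o) for o, pr in first.items()}

print(summary(first) == summary(second))              # same marginals and moments
print(1 - first[(0, 0, 0)], 1 - second[(0, 0, 0)])    # P(at least one head) in each
```

Both distributions have $\mathrm{Ber}(1/2)$ marginals and zero pairwise correlations, yet the probability of at least one head is $7/8$ in the first and $13/16$ in the second.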

For me, coming up with these examples came down to putting numbers on the cube to produce one example, and then simply modifying P111 and letting the changes propagate.

Edit: This is the point where I realized that you are actually working with fixed marginals and know that each variable is $\mathrm{Ber}(1/10)$; if the picture above makes sense, it is possible to tweak it until you have the desired marginals.
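One way to carry out that tweak, sketched under the assumption that all three pairwise correlations share a common value `rho` (the value $1/5$ below is purely illustrative): with marginals $p$ and pairwise moments $m = E[X_i X_j]$ fixed, the only remaining freedom is $t = P_{111}$, and inclusion-exclusion gives $P(\text{at least one head}) = 3p - 3m + t$. The hypothetical helper below returns the range of that probability over all admissible $t$.

```python
from fractions import Fraction


def at_least_one_range(p, rho):
    """Range of P(at least one success) for three Ber(p) coins whose pairwise
    correlations all equal rho (an assumption made for this sketch)."""
    # Equal marginals: sqrt(Var(X_i) Var(X_j)) = p(1 - p), so E[X_i X_j] is
    m = rho * p * (1 - p) + p * p
    # Free parameter t = P(1,1,1). Every cell probability must be nonnegative:
    #   cells with two ones:  m - t            >= 0
    #   cells with one one:   p - 2m + t       >= 0
    #   the cell (0,0,0):     1 - 3p + 3m - t  >= 0
    t_min = max(Fraction(0), 2 * m - p)
    t_max = min(m, 1 - 3 * p + 3 * m)
    if t_min > t_max:
        raise ValueError("no joint distribution has these marginals and correlations")
    base = 3 * p - 3 * m  # inclusion-exclusion: P(at least one) = 3p - 3m + t
    return base + t_min, base + t_max


low, high = at_least_one_range(Fraction(1, 10), Fraction(1, 5))
print(low, high)
```

For these illustrative inputs the probability can lie anywhere between $27/125 = 0.216$ and $61/250 = 0.244$, so the marginals and correlations alone do not pin it down.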

Four or more coins

Finally, when we have more than three coins it should not be surprising that we can cook up examples that fail, as we now have an even bigger discrepancy between the number of parameters required to describe the joint distribution and those provided to us by marginals and correlations.

Concretely, for any number of coins greater than three, you could simply consider the examples whose first three coins behave as in the two examples above and for which the outcomes of the remaining coins are independent of all other coins.
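This extension can be sketched as follows for five $\mathrm{Ber}(1/10)$ coins. The three-coin distributions used here are illustrative ones (common pairwise moment $m = 7/250$, with $P_{111} = 0$ in one and $P_{111} = 7/250$ in the other), not the ones from the picture, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def three_coin_joint(p, m, t):
    """Joint pmf of three Ber(p) coins with E[X_i X_j] = m for all pairs and P(1,1,1) = t."""
    by_number_of_ones = (1 - 3 * p + 3 * m - t, p - 2 * m + t, m - t, t)
    return {o: by_number_of_ones[sum(o)] for o in product((0, 1), repeat=3)}


def extend_with_independent(joint3, p, extra):
    """Append `extra` coins that are Ber(p) and independent of everything else."""
    joint = {}
    for head, pr in joint3.items():
        for tail in product((0, 1), repeat=extra):
            weight = pr
            for bit in tail:
                weight *= p if bit else 1 - p
            joint[head + tail] = weight
    return joint


def marginals(joint, n):
    """Success probability of each of the n coins."""
    return tuple(sum(pr for o, pr in joint.items() if o[i] == 1) for i in range(n))


p, m = Fraction(1, 10), Fraction(7, 250)
a = extend_with_independent(three_coin_joint(p, m, Fraction(0)), p, 2)
b = extend_with_independent(three_coin_joint(p, m, m), p, 2)

print(marginals(a, 5) == marginals(b, 5))          # identical marginals
print(1 - a[(0,) * 5], 1 - b[(0,) * 5])            # P(at least one head) in each
```

The two five-coin distributions share all marginals and pairwise moments, since the first three coins agree on those and the last two are independent in both, but the probability of at least one head differs.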

Source : Link , Question Author : S. Punky , Answer Author : fuglede
