Assume you had an alien year with an unknown length N. If you have a random sample of said aliens and some of them share birthdays, can you use this data to estimate the length of the year?
For example, in a sample of 100, you could have two triplets (ie. two birthdays each shared by three aliens) and five pairs and eighty-four singletons. In estimating N, the absolute minimum is 91 and the maximum is unbounded, but how would I find a reasonable expected value?
Assumptions include things like “all birthdays are equally likely”.
Unlike another question answered here, there is are known collisions in the room. Any sufficiently long year will have a strong likelihood of no collisions for a room of aliens. But very long years will have low odds of any collisions, and short years will have low odds of few collisions, thus providing a (theoretical) range for most-likely year lengths.
The expectation value of a distribution is calculated as E(X)=∑pixi. For this problem, we want to calculate the distribution of N given some collision criteria, or find E(N)=∑∞n=0pnn given some collision criteria, where pn=P(N=n).
Suppose you have some collision criteria as stated above, and let qn be the probability that the collision criteria is met given the length of the year is n. Then qn can be found by simply dividing the number of ways the collision criteria can be met by the number of ways birthdays can be arranged in general. Once qn is found for each possible n, then the only piece that is missing is translating qn to pn.
If we assume that pn is proportional to qn, then pn=αqn. Since ∑∞n=0pn=1, α∑∞n=0qn=1 and α=1∑∞n=0qn. Therefore, we just need a formula for qn to solve this problem.
For your example, let us first find the number of ways the collision criteria can happen given N=n. The first alien singleton can land on any day, so there are n possibilities. The next singleton can land on any day but the birthday of the first alien, so there are n−1 possibilities. Completing this for the first 84 singletons, we get n(n−1)(n−2)...(n−83) possible ways this can happen. Notice we also have 5 pairs and 2 triplets, so the “first” alien for each group must not land on the singleton pairs either. This leads to a n(n−1)(n−2)...(n−84−5−2+1) ways these aliens don’t collide (the clumsy syntax is for easier generalization later).
Next, the second alien for a given pair or triplet has 91 choices, the next has 90, etc., the total number of ways this can happen given the birthdays of the first 91 aliens is 91(91−1)(91−2)...(91−7+1). The remaining members of the triplets must fall on birthdays of the pairs, and the probability of that happening is 7∗6. We multiply the probabilities for these all together to get a total number of possible ways for the collision criteria to be met as:
At this point the pattern is clear, if we have a singletons, b pairs, and c triplets, we replace 84 with a, 5 with b, and 2 with c to get a generalized formula. I think it is also clear that the number of possible ways for the birthdays to be arranged in general is nm, where m is the total number of aliens in the problem. Therefore, the probability of meeting the collision criteria is the number of ways to meet the collision criteria divided by the number of ways the aliens could be born, or qn=rnnm.
Another interesting thing appeared in the formula of rn. Let yn=n(n−1)...(n−(a+b+c)+1)=n!(n−(a+b+c))!, and let zn be the remaining portion of rn so that rn=ynzn. Note that zn is independent of n, so we can simply write zn=z as a constant! Since pn=qn/∑∞i=0qi, and qn=zynnm, we can actually factor z out of the sum in the denominator. At this point, it cancels out with the portion from the numerator to get pn=ynnm/∑∞i=0(yiim). We can simplify yn further if we let s=a+b+c (or this can be thought of as the number of unique birthdays in the group of aliens), so that we get:
Now we have a (fairly) simple formula for pn, and therefore a (fairly) simple formula for E(N), where the only assumption made was that P(N=n) is proportional to qn (the probability of meeting the collision criteria given that N=n). I think this is a fair assumption to make, and someone smarter than me might even be able to prove that this assumption is associated with P(N=n) following a multinomial distribution. At this point we can calculate E(N) using numerical methods or make some approximation assumptions, as pn will approach 0 as n approaches ∞.