I recently started reading Gelman and Hill’s “Data Analysis Using Regression and Multilevel/Hierarchical Models”, and my question is based on it:

The sample contains 6 observations on proportions: $p_{1}, p_{2}, \dots, p_{6}$

Each $p_{i}$ has mean $\pi_{i}$ and variance $\frac{\pi_{i}(1-\pi_{i})}{n_i}$, where $n_{i}$ is the number of observations used to compute proportion $p_{i}$.

The test statistic is $T_{i} = $ sample standard deviation of these proportions.

The book says that the expected value of the sample variance of the six proportions, $p_{1}, p_{2}, \dots, p_{6}$, is $(1/6)\sum_{i=1}^{6} \pi_{i}(1-\pi_{i})/n_{i}$. I understand all this.

What I want to know is the distribution of $T_{i}$ and its variance. I would appreciate it if someone could tell me what it is, or point me to a book or article that contains this information.

Thanks a ton.

**Answer**

The exact distribution of each proportion is $p_i \sim \text{Bin}(n_i, \pi_i)/n_i$, so each proportion can take the values $p_i = 0, \frac{1}{n_i}, \frac{2}{n_i}, \dots, \frac{n_i-1}{n_i}, 1$. The resulting distribution of the sample standard deviation $T$ is a complicated discrete distribution. Letting $\boldsymbol{p} \equiv (p_1, p_2, \dots, p_6)$, and assuming the proportions are independent, it can be written in its most basic form as:

$$F_T(t) \equiv \mathbb{P}(T \leqslant t) = \sum_{\boldsymbol{p} \in \mathcal{P}(t)} \prod_{i=1}^6 \text{Bin}(n_i p_i | n_i, \pi_i),$$

where $\mathcal{P}(t) \equiv \{ \boldsymbol{p} \mid T \leqslant t \}$ is the set of all proportion vectors that lead to a sample standard deviation no greater than $t$. There is really no way to simplify this in the general case. Getting an exact probability from this distribution would require you to enumerate the proportion vectors that yield a sample standard deviation in the range of interest, and then sum the binomial products over that enumerated set. The number of vectors to enumerate is $\prod_{i=1}^6 (n_i + 1)$, so it would be an onerous calculation for even moderately large values of $n_1, \dots, n_6$.
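To make the enumeration concrete, here is a minimal brute-force sketch in Python. The function names (`binom_pmf`, `sample_sd`, `exact_sd_cdf`) and the example values of $n_i$ and $\pi_i$ are made up for illustration and are not from the book. It also assumes the six proportions are independent and that the sample standard deviation uses the usual $n-1$ divisor.

```python
from itertools import product
from math import comb, sqrt


def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Bin(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)


def sample_sd(xs):
    """Sample standard deviation with the (len(xs) - 1) divisor."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def exact_sd_cdf(t, ns, pis):
    """Exact P(T <= t) by enumerating every vector of success counts.

    Assumes independent binomial counts; the cost grows like
    prod(n_i + 1), so this is only feasible for small n_i.
    """
    total = 0.0
    for ks in product(*(range(n + 1) for n in ns)):
        props = [k / n for k, n in zip(ks, ns)]
        if sample_sd(props) <= t:
            # Probability of this particular vector of proportions.
            prob = 1.0
            for k, n, pi in zip(ks, ns, pis):
                prob *= binom_pmf(k, n, pi)
            total += prob
    return total


# Hypothetical inputs, chosen small so the enumeration stays cheap
# (4 * 4 * 5 * 5 * 6 * 6 = 14400 vectors).
ns = [3, 3, 4, 4, 5, 5]
pis = [0.5, 0.5, 0.4, 0.4, 0.6, 0.6]
print(exact_sd_cdf(0.2, ns, pis))
```

With sample sizes in the hundreds, the same loop would have to visit on the order of $10^{12}$ or more vectors, which is why the exact route quickly becomes impractical.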

Obviously, the expression for $F_T$ is not a very helpful form. All it really tells you is that you need to enumerate the outcomes of interest and then sum their probabilities. That is why it would be unusual to calculate exact probabilities in this case; it is much easier to appeal to an asymptotic form for the distribution of the sample variance.

**Attribution**

*Source: Link, Question Author: Curious2learn, Answer Author: Ben*