Leaving aside the obvious issue of the low power of the chi-square test in this sort of circumstance, imagine doing a chi-square goodness-of-fit test for some density with unspecified parameters by binning the data.
For concreteness, say an exponential distribution with unknown mean and a sample size of, say, 100.
In order to get a reasonable number of expected observations per bin, some account would need to be taken of the data (e.g., if we chose to put 6 bins below the mean and 4 above it, that would still be using data-based bin boundaries).
But this use of bins based on seeing the data would presumably affect the distribution of the test statistic under the null.
I have seen plenty of discussion of the fact that, if the parameters are estimated by maximum likelihood from the binned data, you lose 1 d.f. per estimated parameter (an issue dating right back to Fisher vs. Karl Pearson), but I don't recall reading anything about finding the bin boundaries themselves based on the data. (If you estimate the parameters from the unbinned data, then with k bins the distribution of the test statistic lies somewhere between a \chi^2_{k-1} and a \chi^2_{k-p-1}.)
Does this data-based choice of bins substantively impact the significance level or power? Do some approaches to choosing the bins matter more than others? If there is much of an effect, is it something that goes away in large samples?
If it does have a substantive impact, this would seem to make the use of a chi-squared test when parameters are unknown almost useless in many cases (in spite of still being advocated in quite a few texts), unless you had a good a priori estimate of the parameter.
Discussion of the issues or pointers to references (preferably with a mention of their conclusions) would be useful.
Edit, pretty much an aside to the main question:
It occurs to me that there are potential solutions for the specific case of the exponential* (and the uniform, come to think of it), but I am still interested in the more general issue of the impact of choosing bin boundaries.
* For example, for the exponential one might use the smallest observation (say it is equal to m) to get a very rough idea of where to place the bins (since the smallest of n observations is exponential with mean μ/n), and then test the remaining n−1 differences (x_i − m) for exponentiality. Of course that might yield a very poor estimate of μ, and hence poor bin choices, though I suppose one might use the argument recursively, taking the lowest two or three observations from which to choose reasonable bins and then testing the differences of the remaining observations above the largest of those smallest order statistics for exponentiality.
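As a rough numerical probe of the main question, here is a small Monte Carlo sketch (the sample size, number of bins, and the particular choice of equiprobable bins under the fitted exponential are all illustrative assumptions of mine): it estimates the mean from the unbinned data, builds data-based bin boundaries from that estimate, and tracks how often a naive \chi^2_{k-2} reference distribution rejects a true null.

```python
# Monte Carlo sketch: data-based bins (equiprobable under the fitted
# exponential) with the mean estimated from the unbinned data.  How far
# does the empirical size drift from the nominal 5% when we compare the
# Pearson statistic against a chi^2_{k-2} reference?  Settings illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, reps, mu = 100, 8, 2000, 1.0

rejections = 0
for _ in range(reps):
    x = rng.exponential(mu, size=n)
    mu_hat = x.mean()                      # MLE of the mean from unbinned data
    # data-based cells: equiprobable under Exp(mu_hat)
    edges = stats.expon.ppf(np.arange(1, k) / k, scale=mu_hat)
    counts = np.histogram(x, bins=np.r_[0, edges, np.inf])[0]
    expected = np.full(k, n / k)           # n/k per cell by construction
    X2 = ((counts - expected) ** 2 / expected).sum()
    # naive reference: chi^2 with k - 1 - 1 degrees of freedom
    rejections += X2 > stats.chi2.ppf(0.95, k - 2)

print(f"empirical size at nominal 5%: {rejections / reps:.3f}")
```

Any drift of the printed empirical size away from 0.05 is exactly the kind of effect the question is asking about; the simulation makes no claim beyond this one illustrative setting.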
Answer
The basic results of chi-square goodness-of-fit testing can be understood hierarchically.
Level 0. The classical Pearson chi-square test statistic for testing a multinomial sample against a fixed probability vector p is
X^2(p) = \sum_{i=1}^k \frac{(X^{(n)}_i - n p_i)^2}{n p_i} \stackrel{d}{\to} \chi^2_{k-1} \>,
where X^{(n)}_i denotes the number of outcomes in the ith cell out of a sample of size n. This can be fruitfully viewed as the squared norm of the vector \mathbf Y_n = (Y^{(n)}_1, \ldots, Y^{(n)}_k), where Y^{(n)}_i = (X^{(n)}_i - n p_i)/\sqrt{n p_i}, which, by the multivariate central limit theorem, converges in distribution as
\mathbf Y_n \stackrel{d}{\to} \mathcal N(0, \mathbf I - \sqrt{p}\sqrt{p}^T) \>.
From this we see that X^2 = \|\mathbf Y_n\|^2 \stackrel{d}{\to} \chi^2_{k-1}, since \mathbf I - \sqrt{p}\sqrt{p}^T is idempotent of rank k-1.
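A quick numerical sanity check of this Level 0 convergence (the probability vector and simulation sizes below are arbitrary choices of mine):

```python
# Simulate Pearson's X^2 under a fixed multinomial null and compare its
# empirical 95th percentile with that of chi^2_{k-1}.  Illustrative sketch.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p = np.array([0.1, 0.2, 0.3, 0.4])     # fixed null probability vector
k, n, reps = len(p), 500, 5000

X2 = np.empty(reps)
for r in range(reps):
    counts = rng.multinomial(n, p)
    X2[r] = ((counts - n * p) ** 2 / (n * p)).sum()

emp = np.quantile(X2, 0.95)
theo = stats.chi2.ppf(0.95, k - 1)
print(f"empirical 95th pct: {emp:.2f}, chi2_(k-1) 95th pct: {theo:.2f}")
```

The two printed quantiles should be close, reflecting the \chi^2_{k-1} limit.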
Level 1. At the next level of the hierarchy, we consider composite hypotheses with multinomial samples. Since the exact p of interest is unknown under the null hypothesis, we have to estimate it. If the null hypothesis is composite and composed of a linear subspace of dimension m, then maximum likelihood estimates (or other efficient estimators) of the p_i can be used as "plug-in" estimators. Then, the statistic
X^2_1 = \sum_{i=1}^k \frac{(X^{(n)}_i - n \hat{p}_i)^2}{n \hat{p}_i} \stackrel{d}{\to} \chi_{k-m-1}^2 \>,
under the null hypothesis.
Level 2. Consider the case of goodness-of-fit testing of a parametric model where the cells are fixed and known in advance. For example, suppose we have a sample from an exponential distribution with rate \lambda and from it produce a multinomial sample by binning over k cells. Then the above result still holds, provided that we use efficient estimates (e.g., MLEs) of the bin probabilities themselves using only the observed frequencies.
If the number of parameters for the distribution is m (e.g., m = 1 in the exponential case), then
X^2_2 = \sum_{i=1}^k \frac{(X^{(n)}_i - n \hat{p}_i)^2}{n \hat{p}_i} \stackrel{d}{\to} \chi_{k-m-1}^2 \>,
where here \hat{p}_i can be taken to be the MLEs of the cell probabilities of the fixed, known cells corresponding to the given distribution of interest.
Level 3. But, wait! If we have a sample Z_1,\ldots,Z_n \sim F_\lambda, why shouldn't we estimate \lambda efficiently first, and then use a chi-square statistic with our fixed, known cells? Well, we can, but in general we no longer get a chi-square distribution for the corresponding statistic. In fact, Chernoff and Lehmann (1954) showed that using MLEs to estimate the parameters and then plugging them back in to get estimates of the cell probabilities results in a non-chi-square distribution, in general. Under suitable regularity conditions, the distribution is (stochastically) between a \chi_{k-m-1}^2 and a \chi_{k-1}^2 random variable, with the distribution depending on the parameters.
Intuitively, this means that the limiting distribution of \mathbf Y_n is \mathcal N(0, \mathbf I - \sqrt{p_\lambda}\sqrt{p_\lambda}^T - \mathbf A(\lambda)) for some positive semidefinite matrix \mathbf A(\lambda) reflecting the extra efficiency of the unbinned estimator.
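The Chernoff and Lehmann effect shows up readily in simulation; here is a sketch under my own illustrative settings (fixed cells equiprobable under Exp(1), the mean estimated by the unbinned MLE), comparing the mean of the statistic against the means of the two bracketing chi-square laws:

```python
# Monte Carlo sketch of the Chernoff-Lehmann effect: with fixed cells but
# the mean estimated by the *unbinned* MLE, X^2 is stochastically between
# chi^2_{k-m-1} and chi^2_{k-1}.  Settings are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k, reps = 200, 6, 3000
edges = stats.expon.ppf(np.arange(1, k) / k)  # fixed cells, equiprobable under Exp(1)
bins = np.r_[0, edges, np.inf]

X2 = np.empty(reps)
for r in range(reps):
    x = rng.exponential(1.0, size=n)
    mu_hat = x.mean()                          # unbinned MLE of the mean
    p_hat = np.diff(stats.expon.cdf(bins, scale=mu_hat))
    counts = np.histogram(x, bins=bins)[0]
    X2[r] = ((counts - n * p_hat) ** 2 / (n * p_hat)).sum()

mean_lo, mean_hi = k - 1 - 1, k - 1            # means of chi2_{k-m-1}, chi2_{k-1}
print(f"mean of X2: {X2.mean():.2f}; bracketing means: {mean_lo}, {mean_hi}")
```

With m = 1 estimated parameter, the printed mean should land between the two bracketing values, consistent with the in-between limiting distribution.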
We haven't even talked about random cells yet, and we're already in a bit of a tight spot! There are two ways out: One is to retreat back to Level 2, or at the very least not use efficient estimators (like MLEs) of the underlying parameter \lambda. The second approach is to try to undo the effects of \mathbf A(\lambda) in such a way as to recover a chi-square distribution.
There are several ways of going the latter route. They basically amount to premultiplying \mathbf Y_n by the “right” matrix \mathbf B(\hat{\lambda}). Then, the quadratic form
\mathbf Y_n^T \mathbf B^T \mathbf B \mathbf Y_n \stackrel{d}{\to} \chi_{k-1}^2 \>,
where k is the number of cells.
Examples are the Rao–Robson–Nikulin statistic and the Dzhaparidze–Nikulin statistic.
Level 4. Random cells. In the case of random cells, under certain regularity conditions, we end up in the same situation as in Level 3 if we take the route of modifying the Pearson chi-square statistic. Location-scale families, in particular, behave very nicely. One common approach is to take our k cells each to have probability 1/k, nominally. So, our random cells are intervals of the form \hat{I}_j = \hat \mu + \hat\sigma I_{0,j}, where I_{0,j} = [F^{-1}((j-1)/k), F^{-1}(j/k)). This result has been further extended to the case where the number of random cells grows with the sample size.
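A sketch of that random-cell construction for a normal (location-scale) model; the data-generating parameters and the choice of k are illustrative assumptions of mine:

```python
# Level 4 sketch: random cells for a location-scale family (normal here),
# taken equiprobable under the fitted distribution, i.e. the intervals
# I_hat_j = mu_hat + sigma_hat * [F^{-1}((j-1)/k), F^{-1}(j/k)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 150, 5
x = rng.normal(2.0, 3.0, size=n)

mu_hat, sigma_hat = x.mean(), x.std(ddof=0)        # MLEs for the normal model
base_edges = stats.norm.ppf(np.arange(1, k) / k)   # F^{-1}(j/k), standard member
edges = mu_hat + sigma_hat * base_edges            # random cell boundaries
counts = np.histogram(x, bins=np.r_[-np.inf, edges, np.inf])[0]
print("cell counts:", counts, "| nominal probability per cell:", 1 / k)
```

Each cell has nominal probability 1/k under the fitted model, which is the setup the Rao-Robson-Nikulin style modifications then operate on.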
References

A. W. van der Vaart (1998), Asymptotic Statistics, Cambridge University Press. Chapter 17: Chi-Square Tests.

H. Chernoff and E. L. Lehmann (1954), The use of maximum likelihood estimates in \chi^2 tests for goodness of fit, Ann. Math. Statist., vol. 25, no. 3, 579–586.

F. C. Drost (1989), Generalized chi-square goodness-of-fit tests for location-scale models when the number of classes tends to infinity, Ann. Statist., vol. 17, no. 3, 1285–1300.

M. S. Nikulin (1973), Chi-square test for continuous distribution with shift and scale parameters, Theory of Probability and Its Applications, vol. 19, no. 3, 559–568.

K. O. Dzaparidze and M. S. Nikulin (1973), On a modification of the standard statistics of Pearson, Theory of Probability and Its Applications, vol. 19, no. 4, 851–853.

K. C. Rao and D. S. Robson (1974), A chisquare statistic for goodness of fit tests within exponential family, Comm. Statist., vol 3., no. 12, 1139–1153.

N. Balakrishnan, V. Voinov and M. S. Nikulin (2013), Chi-Squared Goodness of Fit Tests With Applications, Academic Press.
Attribution
Source: Link, Question Author: Glen_b, Answer Author: cardinal