I am currently trying to understand the Likelihood Principle and, frankly, I don’t get it at all. So I will write all my questions as a list, even if some of them are pretty basic.
- What exactly does the phrase “all of the information” mean in the context of this principle (as in “all of the information in a sample is contained in the likelihood function”)?
- Is the principle somehow connected to the easily proven fact that $p(x|y)\propto p(y|x)p(x)$? Is the “likelihood” in the principle the same thing as $p(y|x)$, or not?
- How can a mathematical theorem be “controversial”? My (weak) understanding of math is that a theorem is either proven or not proven. Into which category does the Likelihood Principle fall?
- How is the Likelihood Principle important for Bayesian inference, which is based on the formula $p(x|y)\propto p(y|x)p(x)$?
The likelihood principle has been stated in many different ways, with varying meaning and clarity. A.W.F. Edwards’s book *Likelihood* is both an excellent introduction to many aspects of likelihood and still in print. This is how Edwards defines the likelihood principle:
“Within the framework of a statistical model, all of the information which the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of those hypotheses.” (Edwards 1972, 1992 p. 30)
Now to the answers.
“All of the information in the sample”, as you quote it, is simply an inadequate expression of the relevant part of the likelihood principle. Edwards says it much better: the model matters, and the relevant information is the information about the relative merits of hypotheses. Note that the likelihood ratio only makes sense when the hypotheses in question come from the same statistical model and are mutually exclusive. In effect, they have to be points on the same likelihood function for the ratio to be useful.
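Here is a minimal Python sketch of Edwards’s formulation, under illustrative assumptions: the data are 9 successes and 3 failures, and the two competing hypotheses are the success probabilities θ = 3/4 and θ = 1/2 (the numbers and function names are hypothetical). The same data could have come from a binomial design (12 trials fixed in advance) or a negative-binomial design (sampling until the 3rd failure). Within each model the two hypotheses are points on the same likelihood function, and because the two likelihood functions differ only by a constant factor, the likelihood ratio comes out identical under both designs.

```python
from fractions import Fraction
from math import comb

# Hypothetical helper names and illustrative numbers: 9 successes, 3 failures.
SUCCESSES, FAILURES = 9, 3


def binomial_likelihood(theta):
    """P(9 successes | n = 12 trials fixed in advance, success probability theta)."""
    n = SUCCESSES + FAILURES
    return comb(n, SUCCESSES) * theta**SUCCESSES * (1 - theta) ** FAILURES


def negative_binomial_likelihood(theta):
    """P(9 successes before the 3rd failure | success probability theta)."""
    return (
        comb(SUCCESSES + FAILURES - 1, SUCCESSES)
        * theta**SUCCESSES
        * (1 - theta) ** FAILURES
    )


# Two hypotheses within each model: theta = 3/4 versus theta = 1/2 (exact fractions).
h1, h0 = Fraction(3, 4), Fraction(1, 2)

lr_binomial = binomial_likelihood(h1) / binomial_likelihood(h0)
lr_negative_binomial = negative_binomial_likelihood(h1) / negative_binomial_likelihood(h0)

# The likelihood functions differ only by the constant comb(12, 9) / comb(11, 9) = 4,
# which cancels in the ratio, so both designs give the same likelihood ratio.
print(lr_binomial, lr_negative_binomial)  # 19683/4096 19683/4096
```

The ratio, about 4.8 in favour of θ = 3/4, is the same whichever stopping rule produced the data, which is what the principle says it should be.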
The likelihood principle is related to Bayes’ theorem, as you can see, but it can be proved without reference to Bayes’ theorem. Yes: in your notation, $p(y|x)$, viewed as a function of $x$ with the observed data $y$ held fixed, is (proportional to) a likelihood, as long as $y$ is the data and $x$ is a hypothesis (which might just be a hypothesised parameter value).
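In the question’s notation, with $y$ the observed data and $x$ a hypothesis (or parameter value), Bayes’ theorem makes the connection explicit. The first line shows that any factor $c(y)$ that does not involve $x$ cancels, which is why a likelihood only needs to be known up to proportionality. The second line, the odds form, shows that the data’s entire contribution to comparing two hypotheses $x_1$ and $x_2$ is their likelihood ratio.

```latex
\begin{aligned}
p(x \mid y) &= \frac{p(y \mid x)\,p(x)}{\int p(y \mid x')\,p(x')\,dx'}
             = \frac{c(y)\,p(y \mid x)\,p(x)}{\int c(y)\,p(y \mid x')\,p(x')\,dx'}, \\[4pt]
\frac{p(x_1 \mid y)}{p(x_2 \mid y)} &=
  \underbrace{\frac{p(y \mid x_1)}{p(y \mid x_2)}}_{\text{likelihood ratio}}
  \times \frac{p(x_1)}{p(x_2)} .
\end{aligned}
```

Posterior odds are the likelihood ratio times the prior odds, so the prior and the likelihood ratio are the only ingredients.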
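The likelihood principle is controversial because its proof has been contested. The best-known proof, Birnbaum’s, derives it from two other principles of inference (the sufficiency and conditionality principles), so the dispute concerns both whether those premises should be accepted and whether the derivation from them is valid. In my opinion the disproofs are faulty, but nonetheless the principle is controversial. (At a different level, it can be said that the likelihood principle is controversial because it implies that frequentist methods of inference are in some ways faulty. Some people don’t like that.) The likelihood principle has been proved, but its scope of relevance may be more constrained than its critics imagine.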
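One concrete source of friction with frequentist methods is the stopping-rule example. The minimal Python sketch below uses illustrative numbers (9 successes, 3 failures, null hypothesis θ = 1/2 against the one-sided alternative θ > 1/2; the function names are hypothetical). A fixed-n binomial design and a stop-at-the-3rd-failure negative-binomial design give proportional likelihood functions, so by the likelihood principle they carry the same evidence about θ. Their exact one-sided p-values nevertheless differ, because a p-value sums probabilities over outcomes that were not observed, and which outcomes were possible depends on the stopping rule.

```python
from fractions import Fraction
from math import comb

# Hypothetical helper names and illustrative numbers: 9 successes, 3 failures; H0: theta = 1/2.
SUCCESSES, FAILURES = 9, 3
THETA0 = Fraction(1, 2)


def binomial_p_value():
    """One-sided P(X >= 9) with X ~ Binomial(12, theta0): n fixed in advance."""
    n = SUCCESSES + FAILURES
    return sum(
        comb(n, k) * THETA0**k * (1 - THETA0) ** (n - k)
        for k in range(SUCCESSES, n + 1)
    )


def negative_binomial_p_value():
    """One-sided P(K >= 9), where K counts successes before the 3rd failure.

    K has unbounded support, so take the complement of P(K = 0) + ... + P(K = 8).
    """
    below = sum(
        comb(k + FAILURES - 1, k) * THETA0**k * (1 - THETA0) ** FAILURES
        for k in range(SUCCESSES)
    )
    return 1 - below


p_binomial = binomial_p_value()  # exact Fraction
p_negative_binomial = negative_binomial_p_value()  # exact Fraction
print(float(p_binomial), float(p_negative_binomial))  # 0.072998046875 0.03271484375
```

At the conventional 0.05 level the negative-binomial analysis rejects the null and the binomial analysis does not, even though the likelihood functions are proportional. Whether that counts against p-values or against the likelihood principle is one of the points in dispute.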
The likelihood principle is important for Bayesian methods because the data enter Bayes’ theorem by way of the likelihoods. Most Bayesian methods comply with the likelihood principle, but not all: for example, a Jeffreys prior depends on the sampling model, so it can differ between two experiments whose likelihood functions are proportional. Some people, like Edwards and Royall, contend that inferences can be made on the basis of likelihood functions without using Bayes’ theorem (“pure likelihood inference”). That is controversial as well. In fact, it is probably more controversial than the likelihood principle, because Bayesians tend to agree with frequentists that pure likelihood methods are inappropriate. (My enemy’s enemy…)
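A minimal Python sketch of the Bayesian side, under illustrative assumptions (9 successes and 3 failures, and a uniform prior on θ approximated on a grid of 999 points; the grid size and function names are hypothetical choices). Because the binomial and negative-binomial likelihoods differ only by a constant factor, normalising prior × likelihood gives the same posterior under either stopping rule, so a Bayesian analysis with a design-independent prior complies with the likelihood principle.

```python
from math import comb

# Hypothetical helper names and illustrative numbers: 9 successes, 3 failures.
SUCCESSES, FAILURES = 9, 3
GRID = [i / 1000 for i in range(1, 1000)]  # theta values strictly inside (0, 1)


def binomial_likelihood(theta):
    """Likelihood under a design with n = 12 trials fixed in advance."""
    n = SUCCESSES + FAILURES
    return comb(n, SUCCESSES) * theta**SUCCESSES * (1 - theta) ** FAILURES


def negative_binomial_likelihood(theta):
    """Likelihood under a design that stops at the 3rd failure."""
    return (
        comb(SUCCESSES + FAILURES - 1, SUCCESSES)
        * theta**SUCCESSES
        * (1 - theta) ** FAILURES
    )


def uniform_prior(theta):
    """Flat prior density on (0, 1); it does not depend on the sampling design."""
    return 1.0


def grid_posterior(likelihood, prior):
    """Posterior on GRID: prior times likelihood, normalised to sum to 1."""
    unnormalised = [prior(t) * likelihood(t) for t in GRID]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


post_binomial = grid_posterior(binomial_likelihood, uniform_prior)
post_negative_binomial = grid_posterior(negative_binomial_likelihood, uniform_prior)

# The constant comb(12, 9) / comb(11, 9) cancels in the normalisation,
# so the two posteriors agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(post_binomial, post_negative_binomial))
posterior_mean = sum(t * p for t, p in zip(GRID, post_binomial))
print(max_diff, round(posterior_mean, 3))
```

With a uniform prior the exact posterior is Beta(10, 4) under either design, with mean 10/14 ≈ 0.714, and the grid values approximate it.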