Why even have non-informative priors? They don’t provide information about θ. So why use them? Why not only use informative priors? For example, suppose θ∈[0,1]. Then is θ∼U(0,1) a non-informative prior for θ?
The debate about non-informative priors has been going on for ages, at least since the end of the 19th century, with criticism by Bertrand and de Morgan of the lack of invariance of Laplace’s uniform priors (the same criticism reported by Stéphane Laurent in the above comments). This lack of invariance sounded like a death blow for the Bayesian approach. While some Bayesians desperately tried to cling to specific distributions, using less than formal arguments, others had a vision of a larger picture in which priors could be used in situations where there was hardly any prior information beyond the shape of the likelihood itself.
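To make the invariance criticism concrete, here is a minimal sketch in Python. The transform φ = θ² and the function names are illustrative choices of mine, not part of the historical argument. If θ∼U(0,1), the prior induced on φ = θ² is not uniform on [0,1], so "uniform" depends on the parameterisation one happens to pick.

```python
import math

def cdf_phi_from_uniform_theta(phi):
    """P(theta**2 <= phi) when theta ~ U(0, 1), i.e. P(theta <= sqrt(phi))."""
    return math.sqrt(phi)

def cdf_phi_from_uniform_phi(phi):
    """P(phi' <= phi) when phi' itself is given a U(0, 1) prior."""
    return phi

# Illustrative thresholds (arbitrary choices).
for phi in (0.01, 0.25, 0.81):
    induced = cdf_phi_from_uniform_theta(phi)
    direct = cdf_phi_from_uniform_phi(phi)
    print(f"P(phi <= {phi}): induced by uniform theta = {induced:.2f}, "
          f"uniform on phi = {direct:.2f}")
```

The two probabilities disagree at every threshold (0.50 against 0.25 at φ = 0.25, for instance), so the answer to "is U(0,1) non-informative?" changes with the parameterisation.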
This vision is best represented by Jeffreys’ distributions, where the information matrix of the sampling model, I(θ), is turned into a prior distribution

π(θ) ∝ |I(θ)|^{1/2},

which is most often improper, i.e. does not integrate to a finite value. The label “non-informative” associated with Jeffreys’ priors is rather unfortunate, as they represent an input from the statistician, hence are informative about something! Similarly, “objective” has an authoritative weight I dislike… I thus prefer the label “reference prior”, used for instance by José Bernardo.
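As an illustration of how Jeffreys’ construction escapes the invariance criticism, here is a hedged sketch for a single Bernoulli observation. The Bernoulli model, the transform φ = θ², the finite-difference step and all function names are assumptions made for this example. The Fisher information is computed numerically as the expected squared score in each parameterisation, and the prior built directly in φ is compared with the prior built in θ and carried over by the usual change of variables.

```python
import math

def fisher_information(success_prob, param, eps=1e-6):
    """Fisher information of one Bernoulli draw, via a finite-difference score.

    success_prob maps the parameter value to P(X = 1).
    """
    info = 0.0
    for x in (0, 1):
        def loglik(value):
            p = success_prob(value)
            return math.log(p if x == 1 else 1.0 - p)
        # Central-difference approximation of the score d/dparam log f(x | param).
        score = (loglik(param + eps) - loglik(param - eps)) / (2 * eps)
        p = success_prob(param)
        info += (p if x == 1 else 1.0 - p) * score ** 2
    return info

def jeffreys_unnormalised(success_prob, param):
    """Unnormalised Jeffreys prior: square root of the Fisher information."""
    return math.sqrt(fisher_information(success_prob, param))

theta = 0.3          # arbitrary evaluation point
phi = theta ** 2     # same point in the phi = theta**2 parameterisation

# Jeffreys prior built directly in phi, where P(X = 1) = sqrt(phi).
direct = jeffreys_unnormalised(math.sqrt, phi)

# Jeffreys prior built in theta, then transported: density * |d theta / d phi|.
transported = jeffreys_unnormalised(lambda t: t, theta) / (2 * math.sqrt(phi))

print(f"direct in phi: {direct:.4f}, transported from theta: {transported:.4f}")
```

Up to numerical error the two values coincide, which is the invariance property the uniform prior lacks. In this particular model I(θ) = 1/(θ(1−θ)), so the Jeffreys prior is the Beta(1/2,1/2) distribution and happens to be proper.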
Those priors indeed give a reference: one can compute the reference estimator/test/prediction from them, and compare it with one’s own estimator/test/prediction obtained from a different prior motivated by subjective and objective items of information. To answer the question “why not use only informative priors?” directly: there is actually no answer. A prior distribution is a choice made by the statistician, neither a state of Nature nor a hidden variable. In other words, there is no “best prior” that one “should use”, because it is in the nature of statistical inference that there is no “best answer”.
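A small sketch of what "reference" means in practice, under assumptions of my own: binomial data, for which the Jeffreys prior is Beta(1/2,1/2), a made-up sample of 14 successes in 20 trials, and a hypothetical informative Beta(2,8) prior standing in for someone's opinion. The function name is illustrative as well.

```python
def beta_posterior_mean(successes, trials, a, b):
    """Posterior mean of theta under a Beta(a, b) prior and binomial data."""
    return (successes + a) / (trials + a + b)

# Hypothetical data: 14 successes out of 20 trials.
successes, trials = 14, 20

# Reference answer: Jeffreys prior for the binomial model, Beta(1/2, 1/2).
reference = beta_posterior_mean(successes, trials, 0.5, 0.5)

# One's own answer: a hypothetical informative prior favouring small theta.
informative = beta_posterior_mean(successes, trials, 2.0, 8.0)

print(f"reference (Jeffreys) posterior mean:   {reference:.3f}")
print(f"informative Beta(2, 8) posterior mean: {informative:.3f}")
```

Neither number is "the right one"; the gap between them simply shows how much of the second answer comes from the prior opinion rather than from the likelihood.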
Hence my defence of the noninformative/reference choice! It provides the same range of inferential tools as other priors, but gives answers that are inspired only by the shape of the likelihood function, rather than induced by some opinion about the range of the unknown parameters.