# Inference for the skeptical (but not math-averse) reader

I just watched a lecture on statistical inference (“comparing proportions and means”), part of an intro to stats online course. The material made as little sense to me as it always does (by now I must have seen this stuff dozens of times, spread out over the last three decades).

I’m looking for a book on “basic Stats-101” (point estimation, estimate assessment, statistical inference, hypothesis testing, study design) that takes seriously the problem of convincing a skeptical reader…

Below I give some examples of the type of question that the author I’m searching for would take seriously and know how to address convincingly.

But first let me take a minute to stress that in this post I’m not asking these questions. Please, do not answer them! I give them just as examples, and by way of a “litmus test” (for the type of author I’m searching for).

1. If a “proportion” is simply the mean of a Boolean variable (i.e. one that takes only the values 0 and 1), why are different procedures taught for doing statistical inference with “proportions” and with “means”?

2. If the normal distribution is so robust that assuming normality gives good results even when the data are not quite normally distributed, and if the t-distribution is so normal-looking, why all the fuss about using the t-distribution instead of the normal?

3. What exactly are “degrees of freedom”, and why do we worry about them?

4. What does it mean to speak of the “true” value of a parameter, considering that we are just using distributions that happen to look similar to the data?

5. How come “exploratory data analysis” is a good thing, while “data snooping” is an evil thing?
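(For concreteness about the premise of question 1, and emphatically not as an answer to it: the claim that a proportion is just the mean of a 0/1 variable takes two lines to check. A minimal sketch in Python with NumPy; the data are made up.)

```python
import numpy as np

# A Boolean variable coded as 0/1: say, 1 = "success", 0 = "failure"
x = np.array([1, 0, 1, 1, 0, 1, 0, 1])

# The sample proportion of successes is literally the sample mean
p_hat = x.sum() / len(x)  # 5 successes out of 8 observations
print(p_hat == x.mean())  # True
```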

As I’ve said, I’m put off by the attitude that’s implied by a neglect of such questions. It’s not the “epistemological stance” that I want to see in someone who’s teaching me something. I’m looking for authors who respect the reader’s skepticism and rationality, and who know how to address them (without necessarily going off into pages and pages of formalisms and technicalities).

I realize that this is a tall order, and maybe especially so when it comes to statistics. Therefore, I don’t expect that many authors will have succeeded at it. But at the moment I’d be content with finding just one.

Let me add that I’m not math-averse. On the contrary, I love math. (I’m comfortable with analysis [aka “advanced calculus”], linear algebra, probability theory, even basic measure theory.)

That said, my interest at the moment is in “applied”, “practical”, “everyday”, “real-world” statistics (as opposed to theoretical niceties). (But I don’t want a cookbook either!)

FWIW, I have read the first few chapters of *Data analysis using regression and multilevel/hierarchical models* by Gelman and Hill, and I like the authors’ tone. Their focus is practical, but they go into the theory when needed. They also often step back to assess standard practices critically, and offer candid opinions that appeal to a skeptical reader’s common sense. Unfortunately, these authors have not written a book devoted to the subject I’m asking about in this post (“Stats 101” stuff, as described above). I’m also aware that one of these authors (Gelman) co-authored the highly regarded *Bayesian data analysis*, but, again, that is not what I’m looking for at the moment.

EDIT:

Dikran Marsupial raises the following objection:

> I don’t think there is necessarily anything wrong with neglecting questions; there comes a point where addressing every question detracts from the exposition of the basic concepts, which is often more important (especially in a stats 101 book!).

I agree with that. It would be more accurate for me to say that I’m looking for a “second look at basic stats.” In fact, with this as my motivation, I looked at the textbooks used in graduate courses on inference (say), and found that they too neglected questions like the ones I’ve listed. If anything, they seemed even less inclined to delve into such questions (so that they can focus on matters like the conditions for some convergence-or-other of this-or-that…).

The problem is that the more advanced books are addressed to a radically different population of readers, one where the “skepticism of the outsider” has been drastically depleted. IOW, those who are taking graduate-level statistics are past the point of being bothered by the questions that bother me. They’re not skeptical about any of this stuff anymore. (How did they get over the skepticism hump? Maybe some were never too critical in the first place, especially if they learned their stats fairly early on; I know I was not a particularly critical freshman myself, for example, though I did not take stats then. Others may have had teachers who filled in where their textbooks fell short. Some may have been clever enough to figure out the answers to such questions for themselves. Who knows.)

You have already got some good suggestions. Here are some more. First, two blogs that I read sporadically, where questions such as the ones you ask are sometimes discussed. Since they are blogs, you could even ask questions there and get some very good answers! Here they are:

http://andrewgelman.com/ (Andrew Gelman)

http://errorstatistics.com/ (Deborah Mayo)

Box, Hunter & Hunter: *Statistics for experimenters*.

As the title says, this is a (“first”, but really, really … second) course for people who would like to design their own experiments and then analyze them. Very high on the “why” part.

Then there is D. R. Cox, *Principles of Statistical Inference*, another very good book about the “why”, not the “how”.

And, since you ask why means and proportions are treated differently, here is a book which does not do that: Freedman, Pisani & Purves, *Statistics* (4th ed.):
http://www.amazon.com/Statistics-4th-David-Freedman/dp/0393929728/ref=sr_1_1?s=books&ie=UTF8&qid=1373395118&sr=1-1&keywords=freedman+statistics

Low on maths, high on principles.