# Is it appropriate to plot the mean in a histogram?

Is it “okay” to add a vertical line to a histogram to visualize the mean value?

It seems okay to me, but I’ve never seen this in textbooks and the like, so I’m wondering if there’s some sort of convention not to do that?

The graph is for a term paper; I just want to make sure I don’t accidentally break some super important unspoken stats rule. 🙂

Of course, why not?

Here’s an example (one of dozens I found with a simple Google search):

(Image source: the Measuring Usability blog.)

I’ve seen means, means plus or minus a standard deviation, various quantiles (like median, quartiles, 10th and 90th percentiles) all displayed in various ways.
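As a rough sketch of how such markers can be drawn in base R: the data below are simulated (an assumption, not the data behind any plot here), and the colours and line types are arbitrary choices.

```r
# Simulated right-skewed data; purely hypothetical
set.seed(1)
x <- rgamma(200, shape = 2, rate = 0.5)

m <- mean(x)
s <- sd(x)
q <- quantile(x, probs = c(0.25, 0.5, 0.75))  # quartiles; the middle one is the median

hist(x, breaks = 30, main = "Histogram with summary markers", xlab = "x")
abline(v = m, col = "darkred", lwd = 2)                # mean
abline(v = c(m - s, m + s), col = "darkred", lty = 2)  # mean plus or minus one SD
abline(v = q, col = "steelblue", lty = 3)              # quartiles
legend("topright",
       legend = c("mean", "mean +/- SD", "quartiles"),
       col = c("darkred", "darkred", "steelblue"),
       lty = c(1, 2, 3), lwd = c(2, 1, 1), bty = "n")
```

With this many lines on one plot, a legend or caption is what keeps it readable.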

Instead of drawing a line right across the plot, you might mark information along the bottom of it – like so:

There’s an example (one of many to be found) with a boxplot across the top instead of at the bottom, here.

Sometimes people mark in the data:

(I have jittered the data locations slightly because the values were rounded to integers and you couldn’t see the relative density well.)

There’s an example of this kind, done in Stata, on this page (see the third one here)

Histograms are better with a little extra information; they can be misleading on their own.

You just need to take care to explain what your plot consists of! (You’d want a better title and x-axis label than I used here, for starters, plus a figure caption explaining what you have marked on the plot.)

One last plot:

My plots are generated in R.

Edit:

As @gung surmised, `abline(v=mean...` was used to draw the mean-line across the plot and `rug` was used to draw the data values (though I actually used `rug(jitter(...` because the data was rounded to integers).
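A minimal sketch of those two calls together, using simulated integer-rounded values as a stand-in for the real data (the data, title, and colour are assumptions for illustration only):

```r
set.seed(42)
# Hypothetical measurements rounded to integers, standing in for the real data
x <- round(rnorm(150, mean = 60, sd = 10))

hist(x, breaks = 30, main = "Histogram with mean line and rug",
     xlab = "x (rounded to integers)")
abline(v = mean(x), col = "darkred", lwd = 2)  # vertical line at the mean
rug(jitter(x))  # jitter so tied integer values do not print on top of each other
```

Without `jitter`, every observation sharing an integer value would collapse into a single tick, which hides the relative density.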

Here’s a way to do the boxplot in between the histogram and the axis:

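``````
hist(Davis2[,2],n=30)
# at and boxwex here are illustrative values, with at about -0.5 * boxwex
boxplot(Davis2[,2], add=TRUE, horizontal=TRUE, at=-0.75, boxwex=1.5)
``````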

I’m not going to list what everything there is for, but you can check the arguments in the help (`?boxplot`) to find out what they’re for, and play with them yourself.

However, it’s not a general solution – I don’t guarantee it will always work as well as it does here (note I already changed the `at` and `boxwex` options*). If you don’t write an intelligent function to take care of everything, it’s necessary to pay attention to what everything does to make sure it’s doing what you want.
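One possible shape for such a function, as a sketch only: `hist_with_boxplot` and its `box_frac` argument are hypothetical names, the data are simulated, and the defaults follow the rule of thumb in the footnote (`at` about -0.5 times `boxwex`, with `boxwex` at roughly 0.04 to 0.05 times the height of the histogram). It assumes a frequency-scale histogram and uses the tallest bin count as a stand-in for the upper y-limit.

```r
# hist_with_boxplot() is a hypothetical helper, not part of base R or any package.
# Assumes the histogram is on the frequency (count) scale.
hist_with_boxplot <- function(x, breaks = 30, box_frac = 0.045, ...) {
  h <- hist(x, breaks = breaks, ...)   # draw the histogram and keep its bin counts
  boxwex <- box_frac * max(h$counts)   # scale the box to the histogram's height
  at <- -0.5 * boxwex                  # place the box just below the zero line
  boxplot(x, add = TRUE, horizontal = TRUE, at = at, boxwex = boxwex,
          axes = FALSE)                # axes = FALSE avoids redrawing axes and frame
  invisible(list(at = at, boxwex = boxwex))
}

set.seed(7)
x <- c(rnorm(200, mean = 65, sd = 12), 150, 130)  # simulated data with two outliers
pos <- hist_with_boxplot(x, main = "Histogram with boxplot underneath", xlab = "x")
```

The function returns the `at` and `boxwex` values it used, so they can be checked and adjusted if the box ends up overlapping the bars or being clipped.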

Here’s how to create the data I used (I was trying to show how Theil regression was really able to handle several influential outliers). It just happened to be data I was playing with when I first answered this question.

``````
library("car")
# sex=c("F","F") is an assumption; the other values are as given
add <- data.frame(sex=c("F","F"),
       weight=c(150,130),height=c(NA,NA),repwt=c(55,50),repht=c(NA,NA))
Davis2 <- rbind(Davis,add)
``````

\* An appropriate value for `at` is around -0.5 times the value of `boxwex`; that would be a good default if you write a function to do it. `boxwex` would need to be scaled in a way that relates to the y-scale (height) of the boxplot; I’d suggest 0.04 to 0.05 times the upper y-limit might often be okay.