I made 100 measurements of a certain quantity, calculated the mean and standard deviation (with MySQL), and got mean=0.58, SD=0.34.
The SD seemed too high relative to the mean, so I made 1000 measurements. This time I got mean=0.572, SD=0.33.
I got frustrated by the high standard deviation, so I made 10,000 measurements. I got mean=0.5711, SD=0.34.
I thought this might be a bug in MySQL, so I tried the Excel functions instead, but got the same results.
Why does the standard deviation remain high even though I do so many measurements?
The standard deviation is a measure of the “spread” of your data. The analogy I like to use is target shooting. If you’re an accurate shooter, your shots cluster very tightly around the bullseye (small standard deviation). If you’re not accurate, they are more spread out (large standard deviation).
Some data is fundamentally “all over the place”, and some is fundamentally tightly clustered about the mean.
If you take more measurements, you get a more accurate picture of the spread. You shouldn’t expect less spread, just less error in your estimate of a fundamental characteristic of the data.
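A quick simulation makes this concrete. The sketch below assumes, purely for illustration, that the measured quantity is normally distributed with a true mean of 0.57 and a true SD of 0.34 (numbers picked to resemble the question; the real distribution is unknown). Drawing 100, 1000 and 10,000 simulated measurements, the sample SD stays close to 0.34 every time, much like the results in the question:

```python
import random
import statistics

# Assumption: the quantity is normally distributed with these parameters.
# They are illustrative values chosen to resemble the question, not real data.
TRUE_MEAN = 0.57
TRUE_SD = 0.34


def sample_mean_and_sd(n, rng):
    """Draw n simulated measurements and return their mean and sample SD."""
    data = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    return statistics.mean(data), statistics.stdev(data)


rng = random.Random(42)  # fixed seed so the run is reproducible
results = {n: sample_mean_and_sd(n, rng) for n in (100, 1000, 10000)}

for n, (mean, sd) in results.items():
    # The SD does not shrink toward zero as n grows; it settles near TRUE_SD.
    print(f"n={n:>6}: mean={mean:.4f}  SD={sd:.4f}")
```

More data pins the SD down more precisely, but it converges to the spread that is built into the data, not to zero.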
If you have an inaccurate shooter take five shots, and an accurate shooter take five shots, you will get only a rough idea of their accuracy. Maybe the inaccurate shooter got lucky a few times, so the pattern is tighter than you would expect from him over the long haul. Similarly, maybe you caught the accurate shooter at a bad time and just happened to get two bad shots in the five, skewing the results.
If, instead, you have them each take a thousand shots, then you will be much more confident that you are getting a good look at their actual accuracy. It’s not the shooter’s accuracy that changes as you get more data; it’s your confidence in the picture you are getting of their accuracy.
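The five-shots versus thousand-shots idea can also be simulated. This is a minimal sketch assuming a hypothetical shooter whose shots land normally distributed around the bullseye with a true spread (SD) of 1.0. It runs 200 sessions of each size and looks at how much the estimated spread varies from session to session:

```python
import random
import statistics

# Hypothetical shooter: shots land normally around the bullseye with this
# true spread. The value 1.0 is illustrative only.
TRUE_SD = 1.0


def estimated_spread(shots, rng):
    """Estimate the shooter's spread (sample SD) from one session."""
    positions = [rng.gauss(0.0, TRUE_SD) for _ in range(shots)]
    return statistics.stdev(positions)


def spread_of_estimates(shots, sessions, rng):
    """Repeat many sessions; return (min, max, SD) of the spread estimates."""
    estimates = [estimated_spread(shots, rng) for _ in range(sessions)]
    return min(estimates), max(estimates), statistics.stdev(estimates)


rng = random.Random(0)  # fixed seed so the run is reproducible
small = spread_of_estimates(5, 200, rng)
large = spread_of_estimates(1000, 200, rng)

print(f"5 shots per session:    estimates range {small[0]:.2f} to {small[1]:.2f}")
print(f"1000 shots per session: estimates range {large[0]:.2f} to {large[1]:.2f}")
```

Both kinds of session estimate the same true spread; the 1000-shot sessions simply agree with each other far more closely, which is the extra confidence described above.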