Let’s say I am calculating heights (in cm) and the numbers must be higher than zero.
Here is the sample list:
0.77132064 0.02075195 0.63364823 0.74880388 0.49850701 0.22479665 0.19806286 0.76053071 0.16911084 0.08833981
Mean: 0.41138725956196015
Std: 0.2860541519582141
In this example, according to the normal distribution, about 99.7% of the values should lie within ±3 standard deviations of the mean. However, the mean minus just two standard deviations is already negative:
Mean - 2 x Std = 0.41138725956196015 - 2 x 0.2860541519582141 = -0.160721044354468
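As a quick check, the figures above can be reproduced with Python's standard library. This is only a minimal sketch; it assumes the quoted Std is the population standard deviation (dividing by n rather than n - 1), since that is the convention that matches the number given. The variable names are made up for illustration.

```python
import statistics

# Sample list from the question.
heights = [
    0.77132064, 0.02075195, 0.63364823, 0.74880388, 0.49850701,
    0.22479665, 0.19806286, 0.76053071, 0.16911084, 0.08833981,
]

mean = statistics.fmean(heights)
# Assumption: population standard deviation (divide by n), not the sample one.
std = statistics.pstdev(heights)

# Lower edge of the "mean +/- 2 std" interval.
lower_2sd = mean - 2 * std

print("mean:", mean)
print("std:", std)
print("mean - 2*std:", lower_2sd)
print("smallest observed value:", min(heights))
```

Every observed value is positive, yet the lower edge of the two-standard-deviation interval is negative, which is exactly the mismatch described above.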
My numbers must be positive, though. I could simply ignore the negative range, but I doubt this is the correct way to calculate probabilities using the standard deviation.
Can someone help me understand whether I am using this in the correct way? Or do I need to choose a different method?
Well, to be honest, math is math. It doesn’t matter whether it is a normal distribution or not. If it works with unsigned numbers, it should work with positive numbers as well! Am I wrong?
EDIT1: Added histogram
To be clearer, I have added a histogram of my real data.
EDIT2: Some values
Mean: 0.007041500928135767
Percentile 50: 0.0052000000000000934
Percentile 90: 0.015500000000000047
Std: 0.0063790857035425025
Var: 4.06873389299246e-05
Answer
If your numbers can only be positive, then modeling them as a normal distribution may not be desirable depending on your use case, because the normal distribution is supported on all real numbers.
Perhaps you would want to model height as an exponential distribution, or maybe a truncated normal distribution?
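To make the support issue concrete, here is a rough sketch using only Python's standard library. It plugs the mean and standard deviation of the sample list from the question into a normal distribution, computes how much probability that normal assigns to negative heights, and then renormalizes the CDF so that all mass sits above zero (a normal truncated at 0). Note the assumptions: the truncated distribution here simply reuses the untruncated mean and standard deviation as its underlying parameters, which is not a proper fit, and `truncated_cdf` is a hypothetical helper name.

```python
from statistics import NormalDist

# Mean and Std of the sample list, as quoted in the question.
mu = 0.41138725956196015
sigma = 0.2860541519582141

normal = NormalDist(mu, sigma)

# Probability the plain normal model assigns to impossible (negative) heights.
mass_below_zero = normal.cdf(0.0)


def truncated_cdf(x, lower=0.0):
    """CDF of the same normal restricted to values above `lower`.

    Assumption: mu and sigma are the parameters of the underlying
    (untruncated) normal, not re-estimated for the truncated model.
    """
    if x <= lower:
        return 0.0
    below = normal.cdf(lower)
    return (normal.cdf(x) - below) / (1.0 - below)


print("P(height < 0) under the normal model:", mass_below_zero)
print("P(height <= mean) under the truncated model:", truncated_cdf(mu))
```

The plain normal puts a noticeable share of its probability below zero for this sample, while the truncated version assigns none, at the cost of rescaling everything above zero.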
EDIT: After seeing your data, it really looks like it might fit an exponential distribution well! You could estimate the λ parameter by taking, for example, a maximum likelihood approach.
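For an exponential distribution, the maximum likelihood estimate of λ is one over the sample mean. The sketch below applies that to the summary statistics posted in EDIT2 and compares the median and 90th percentile implied by the fit with the observed ones. It works from the posted mean only, since the raw data is not shown, and the function names are made up for illustration.

```python
import math


def exponential_mle_rate(sample_mean):
    """MLE of the exponential rate parameter: lambda_hat = 1 / sample mean."""
    if sample_mean <= 0:
        raise ValueError("sample mean must be positive")
    return 1.0 / sample_mean


def exponential_quantile(rate, p):
    """Quantile function of Exp(rate): x such that P(X <= x) = p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p) / rate


# Summary statistics quoted in EDIT2 of the question.
observed_mean = 0.007041500928135767
observed_p50 = 0.0052000000000000934
observed_p90 = 0.015500000000000047
observed_std = 0.0063790857035425025

rate = exponential_mle_rate(observed_mean)
fitted_p50 = exponential_quantile(rate, 0.5)
fitted_p90 = exponential_quantile(rate, 0.9)
fitted_std = 1.0 / rate  # for an exponential, std equals the mean

print("lambda_hat:", rate)
print("median  fitted vs observed:", fitted_p50, observed_p50)
print("90th pc fitted vs observed:", fitted_p90, observed_p90)
print("std     fitted vs observed:", fitted_std, observed_std)
```

With these numbers the fitted median comes out at roughly 0.0049 and the fitted 90th percentile at roughly 0.0162, both close to the observed 0.0052 and 0.0155, and the observed standard deviation is close to the mean, as an exponential would predict. That is a rough sanity check only, not a formal goodness-of-fit test.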
Attribution
Source: Link, Question Author: Don Coder, Answer Author: Kevin Li