How to fit data that looks like a gaussian? [duplicate]

I am quite new to statistics, so please forgive me for using probably the wrong vocabulary.

I have some data that looks (to me) like a gaussian when plotted.

The data is an extract from a jpeg image. It’s a vertical line of pixels taken from the image, and only the red channel (of RGB) is used.

Here is the full data (27 data points):

> r
 [1] 0.003921569 0.031372549 0.023529412 0.015686275 0.003921569 0.027450980
 [7] 0.003921569 0.015686275 0.031372549 0.105882353 0.305882353 0.490196078
[13] 0.560784314 0.615686275 0.592156863 0.505882353 0.364705882 0.227450980
[19] 0.050980392 0.031372549 0.019607843 0.054901961 0.031372549 0.015686275
[25] 0.027450980 0.003921569 0.011764706

> dput(r)
c(0.00392156862745098, 0.0313725490196078, 0.0235294117647059, 
0.0156862745098039, 0.00392156862745098, 0.0274509803921569, 
0.00392156862745098, 0.0156862745098039, 0.0313725490196078, 
0.105882352941176, 0.305882352941176, 0.490196078431373, 0.56078431372549, 
0.615686274509804, 0.592156862745098, 0.505882352941176, 0.364705882352941, 
0.227450980392157, 0.0509803921568627, 0.0313725490196078, 0.0196078431372549, 
0.0549019607843137, 0.0313725490196078, 0.0156862745098039, 0.0274509803921569, 
0.00392156862745098, 0.0117647058823529)
plot(r)

I would like to find a gaussian that is as close as possible to the plot/data.

I tried with normalmixEM from the R package mixtools.

> fit = normalmixEM(r)

but this seems to fit a mixture of two gaussians by default.

I tried to specify that there is only one gaussian using the parameter k:

> fit = normalmixEM(r, k = 1)
Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  : 
  arbmean and arbvar cannot both be FALSE

How can I fit the data?

Answer

There’s a difference between fitting a gaussian distribution and fitting a gaussian density curve. What normalmixEM is doing is the former. What you want is (I guess) the latter.

Fitting a distribution is, roughly speaking, what you’d do if you made a histogram of your data, and tried to see what sort of shape it had. What you’re doing, instead, is simply plotting a curve. That curve happens to have a hump in the middle, like what you get by plotting a gaussian density function.
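
As a small illustration of that distinction (a sketch only, assuming r is the vector from the question, rebuilt here as 8-bit values divided by 255; shuffled is just an illustrative name): the summaries a distribution fit relies on ignore the order of the values, so they are unchanged when the data is shuffled, whereas the hump you see in the plot exists only because of the order.

```r
# Pixel intensities from the question; they equal 8-bit values divided by 255.
r <- c(1, 8, 6, 4, 1, 7, 1, 4, 8, 27, 78, 125, 143, 157, 151, 129, 93, 58,
       13, 8, 5, 14, 8, 4, 7, 1, 3) / 255
x <- seq_along(r)

# Distribution view: r is treated as an unordered sample of intensities.
# These summaries describe how large the values are, not where the hump is.
print(c(mean = mean(r), sd = sd(r)))

# Shuffling destroys the hump but leaves the sample summaries unchanged.
set.seed(1)
shuffled <- sample(r)
print(all.equal(mean(r), mean(shuffled)))
print(all.equal(sd(r), sd(shuffled)))

# Curve view: r is a function of position x, and the hump has a location.
peak_index <- x[which.max(r)]
print(peak_index)
```

So a distribution-fitting tool never sees the pixel index at all, which is why it can't recover the position or width of the hump along the line.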

To get what you want, you can use something like optim to fit the curve to your data. The following code will use nonlinear least-squares to find the three parameters giving the best-fitting gaussian curve: m is the gaussian mean, sd is the standard deviation, and k is an arbitrary scaling parameter (needed because the gaussian density is constrained to integrate to 1, whereas your data isn’t).

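# x is the position of each value along the extracted line
x <- seq_along(r)

# Objective: sum of squared residuals between r and a scaled gaussian curve
f <- function(par)
{
    m <- par[1]   # centre of the hump
    sd <- par[2]  # width (standard deviation)
    k <- par[3]   # scaling; in this form it is the height of the peak
    rhat <- k * exp(-0.5 * ((x - m)/sd)^2)
    sum((r - rhat)^2)
}

# Starting values: m = 15, sd = 2, k = 1
optim(c(15, 2, 1), f, method="BFGS", control=list(reltol=1e-9))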
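
Once optim returns, you will probably want to pull out the estimates and draw the fitted curve over the data. The following is a minimal, self-contained sketch of one way to do that; the helper names gauss, sse, est and xs are illustrative and not part of the code above, and r is rebuilt from 8-bit values divided by 255.

```r
# Pixel intensities from the question; they equal 8-bit values divided by 255.
r <- c(1, 8, 6, 4, 1, 7, 1, 4, 8, 27, 78, 125, 143, 157, 151, 129, 93, 58,
       13, 8, 5, 14, 8, 4, 7, 1, 3) / 255
x <- seq_along(r)

# Scaled gaussian curve: m is the centre, s the width, k the peak height.
gauss <- function(x, m, s, k) k * exp(-0.5 * ((x - m) / s)^2)

# Sum of squared residuals, the same least-squares objective as above.
sse <- function(par) sum((r - gauss(x, par[1], par[2], par[3]))^2)

fit <- optim(c(15, 2, 1), sse, method = "BFGS", control = list(reltol = 1e-9))

# optim returns the estimates in $par and a status code in $convergence
# (0 means it reported success). The width only enters squared, so its
# sign is arbitrary; abs() gives the conventional positive value.
est <- c(m = fit$par[1], s = abs(fit$par[2]), k = fit$par[3])
print(est)
print(fit$convergence)

# Overlay the fitted curve on the data, evaluated on a finer grid.
xs <- seq(min(x), max(x), length.out = 200)
plot(x, r, xlab = "pixel index", ylab = "red intensity")
lines(xs, gauss(xs, est[["m"]], est[["s"]], est[["k"]]))
```

If the curve visibly misses the data, try different starting values; least-squares fits like this can be sensitive to them.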

Attribution
Source: Link, Question Author: Timothée HENRY, Answer Author: Hong Ooi