# How to fit data that looks like a Gaussian? [duplicate]

I am quite new to statistics, so please forgive me if I use the wrong vocabulary.

I have some data that looks (to me) like a Gaussian when plotted.

The data is extracted from a JPEG image: it is a single vertical line of pixels taken from the image, and only the red channel (of RGB) is used.

Here is the full data (27 data points):

``````
> r
 0.003921569 0.031372549 0.023529412 0.015686275 0.003921569 0.027450980
 0.003921569 0.015686275 0.031372549 0.105882353 0.305882353 0.490196078
 0.560784314 0.615686275 0.592156863 0.505882353 0.364705882 0.227450980
 0.050980392 0.031372549 0.019607843 0.054901961 0.031372549 0.015686275
 0.027450980 0.003921569 0.011764706

> dput(r)
c(0.00392156862745098, 0.0313725490196078, 0.0235294117647059,
0.0156862745098039, 0.00392156862745098, 0.0274509803921569,
0.00392156862745098, 0.0156862745098039, 0.0313725490196078,
0.105882352941176, 0.305882352941176, 0.490196078431373, 0.56078431372549,
0.615686274509804, 0.592156862745098, 0.505882352941176, 0.364705882352941,
0.227450980392157, 0.0509803921568627, 0.0313725490196078, 0.0196078431372549,
0.0549019607843137, 0.0313725490196078, 0.0156862745098039, 0.0274509803921569,
0.00392156862745098, 0.0117647058823529)
plot(r)
``````

I would like to find a Gaussian that is as close as possible to the plot/data.

I tried `normalmixEM` from the R package `mixtools`.

``````
> fit = normalmixEM(r)
``````

However, this seems to fit a mixture of two Gaussians by default.

I tried to specify that there is only one Gaussian using the parameter `k`:

``````
> fit = normalmixEM(r, k = 1)
Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
arbmean and arbvar cannot both be FALSE
``````

How can I fit the data?

There’s a difference between fitting a Gaussian distribution and fitting a Gaussian density curve. What `normalmixEM` does is the former. What you want is (I guess) the latter.

Fitting a distribution is, roughly speaking, what you’d do if you made a histogram of your data and tried to see what sort of shape it had. What you’re doing, instead, is simply plotting a curve: each value is a height at a given position along the line of pixels. That curve happens to have a hump in the middle, like the one you get by plotting a Gaussian density function.
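To make the distinction concrete, here is a small sketch (not part of the original answer) of what a distribution fit actually looks at. It treats the 27 values in `r` as an unordered sample, so shuffling them changes nothing; `r_shuffled` is a name introduced here purely for illustration.

```r
# The data from the question
r <- c(0.00392156862745098, 0.0313725490196078, 0.0235294117647059,
       0.0156862745098039, 0.00392156862745098, 0.0274509803921569,
       0.00392156862745098, 0.0156862745098039, 0.0313725490196078,
       0.105882352941176, 0.305882352941176, 0.490196078431373, 0.56078431372549,
       0.615686274509804, 0.592156862745098, 0.505882352941176, 0.364705882352941,
       0.227450980392157, 0.0509803921568627, 0.0313725490196078, 0.0196078431372549,
       0.0549019607843137, 0.0313725490196078, 0.0156862745098039, 0.0274509803921569,
       0.00392156862745098, 0.0117647058823529)

# A single Gaussian *distribution* fitted to r is summarised by the
# sample mean and standard deviation of the values themselves.
c(mean = mean(r), sd = sd(r))

# Shuffling destroys the hump in plot(r) but leaves that fit unchanged,
# because the positions of the points play no role in it.
set.seed(1)  # arbitrary seed, only for reproducibility
r_shuffled <- sample(r)
same_fit <- isTRUE(all.equal(c(mean(r), sd(r)),
                             c(mean(r_shuffled), sd(r_shuffled))))
same_fit
```

The hump you see in `plot(r)` lives in the relationship between position and value, which is why a curve fit rather than a distribution fit is the right tool here.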

To get what you want, you can use something like `optim` to fit the curve to your data. The following code uses nonlinear least squares to find the three parameters giving the best-fitting Gaussian curve: `m` is the Gaussian mean, `s` is the standard deviation, and `k` is an arbitrary scaling parameter (needed because the Gaussian density is constrained to integrate to 1, whereas your data isn’t).
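Written out with the same symbols, the curve being fitted and the quantity being minimised are as follows. Here $r_i$ is the $i$-th data value and $x_i$ is its position, assumed to be simply its index from 1 to 27:

$$
\hat{r}_i = k \exp\!\left(-\frac{1}{2}\left(\frac{x_i - m}{s}\right)^{2}\right),
\qquad
\min_{m,\,s,\,k}\ \sum_{i=1}^{27} \left(r_i - \hat{r}_i\right)^{2}
$$

In this form $k$ is the height of the curve at its peak $x = m$; the usual normalising factor $1/(s\sqrt{2\pi})$ of the Gaussian density is absorbed into it.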

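``````
x <- seq_along(r)

# Sum of squared residuals between the data and a scaled Gaussian curve
f <- function(par)
{
    m <- par[1]  # centre of the hump
    s <- par[2]  # width (standard deviation)
    k <- par[3]  # height scaling
    rhat <- k * exp(-0.5 * ((x - m)/s)^2)
    sum((r - rhat)^2)
}

# Starting values: m = 15, s = 2, k = 1
optim(c(15, 2, 1), f, method="BFGS", control=list(reltol=1e-9))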
``````
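As an alternative sketch (not from the original answer), the same least-squares problem can be handed to `nls` from base R's `stats` package, which takes the model as a formula. The starting values are the same guesses as above, and convergence depends on them being reasonably close. The names `fit`, `p` and `xx` are introduced here for illustration.

```r
# The data from the question
r <- c(0.00392156862745098, 0.0313725490196078, 0.0235294117647059,
       0.0156862745098039, 0.00392156862745098, 0.0274509803921569,
       0.00392156862745098, 0.0156862745098039, 0.0313725490196078,
       0.105882352941176, 0.305882352941176, 0.490196078431373, 0.56078431372549,
       0.615686274509804, 0.592156862745098, 0.505882352941176, 0.364705882352941,
       0.227450980392157, 0.0509803921568627, 0.0313725490196078, 0.0196078431372549,
       0.0549019607843137, 0.0313725490196078, 0.0156862745098039, 0.0274509803921569,
       0.00392156862745098, 0.0117647058823529)
x <- seq_along(r)

# Same scaled-Gaussian model, fitted by nonlinear least squares with nls()
fit <- nls(r ~ k * exp(-0.5 * ((x - m)/s)^2),
           start = list(m = 15, s = 2, k = 1))
p <- coef(fit)  # named vector with elements m, s and k
p

# Overlay the fitted curve on the data, evaluated on a finer grid
plot(x, r)
xx <- seq(min(x), max(x), length.out = 200)
lines(xx, p["k"] * exp(-0.5 * ((xx - p["m"]) / p["s"])^2))
```

The `optim` version is more flexible if you later want a different loss function, while `nls` additionally reports standard errors for the parameters via `summary(fit)`.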