I have a set of numbers which are assumed to be coming from a Poisson distribution. The set has some outliers also and because of that, maximum likelihood estimates are badly affected. I heard that robust estimation procedures can help in such a situation. Can any one explain how to do this? I am not a statistics student.

I found that the

`glmrob`

function in R can be used for this. (I am quite new to R). But I could not figure out, how to use that despite reading the manual pages. In particular I am unable to understand how to get a`forumula`

which is the first argument to the glmrob function.Thanks.

**Answer**

@cardinal has telegraphed an answer in comments. Let’s flesh it out. His point is that although general linear models (such as implemented by `lm`

and, in this case, `glmRob`

) appear intended to evaluate relationships among variables, they can be powerful tools for studying a single variable, too. The trick relies on the fact that **regressing data against a constant is just another way of estimating its average value (“location”)**.

As an example, generate some Poisson-distributed data:

```
set.seed(17)
x <- rpois(10, lambda=2)
```

In this case, `R`

will produce the vector (1,5,2,3,2,2,1,1,3,1) of values for `x`

from a Poisson distribution of mean 2. Estimate its location with `glmRob`

:

```
library(robust)
glmrob(x ~ 1, family=poisson())
```

The response tells us the *intercept* is estimated at 0.7268. Of course, anyone using a statistical method needs to know how it works: when you use generalized linear models with the Poisson family, the standard “link” function is the logarithm. This means the intercept is the *logarithm* of the estimated location. So we compute

```
exp(0.7268)
```

The result, 2.0685, is comfortably close to 2: the procedure seems to work. To *see* what it is doing, plot the data:

```
plot(x, ylim=c(0, max(x)))
abline(exp(0.7268), 0, col="red")
```

The fitted line is purely horizontal and therefore estimates the middle of the vertical values: our data. That’s all that’s going on.

To check robustness, let’s create a bad outlier by tacking a few zeros onto the first value of `x`

:

```
x[1] <- 100
```

This time, for greater flexibility in post-processing, we will save the output of `glmRob`

:

```
m <- glmrob(x ~ 1, family=poisson())
```

To obtain the estimated average we can request

```
exp(m$coefficients)
```

The value this time equals 2.496: a little off, but not too far off, given that the average value of `x`

(obtained as `mean(x)`

) is 12. That is the sense in which this procedure is “robust.” More information can be obtained via

```
summary(m)
```

Its output shows us, among other things, that the weight associated with the outlying value of 100 in `x[1]`

is just 0.02179, almost 0, pinpointing the suspected outlier.

**Attribution***Source : Link , Question Author : suresh , Answer Author : whuber*