For a linear model the OLS solution provides the best linear unbiased estimator for the parameters.

Of course, we can trade some bias for lower variance, e.g. with ridge regression. But my question is about having no bias: are there any other estimators in somewhat common use that are unbiased but have higher variance than the OLS-estimated parameters?

If I had a huge data set, I could of course sub-sample it and estimate the parameters with less data, increasing the variance. I assume this could hypothetically be useful.
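The sub-sampling idea can be sketched in a few lines of R (a toy example, not from the original post; the 10% sampling fraction is an arbitrary choice): the sub-sample mean is still unbiased, but its variance grows by the inverse of the sampling fraction.

```r
# Toy sketch of sub-sampling (hypothetical example): the mean of a random
# subset is still unbiased, but its variance sigma^2/m exceeds sigma^2/n
# because the subset size m is smaller than n.
set.seed(1)
n <- 1e4
y <- rnorm(n)              # standard normal data, so the true mean is 0
sub <- sample(y, n / 10)   # keep an arbitrary 10% of the observations
mean(y)                    # full-sample estimate, variance 1/n
mean(sub)                  # sub-sample estimate, variance 1/(n/10) = 10/n
```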

This is somewhat of a rhetorical question, because when I have read about BLUE estimators, a worse alternative is never provided. I guess that providing worse alternatives could also help people better understand the power of BLUE estimators.

**Answer**

One example that comes to mind is a GLS estimator that weights observations unequally even though that is not necessary when the Gauss-Markov assumptions are met (which the statistician may not know to be the case, and hence may still apply GLS).

Consider, for illustration, the case of a regression of $y_i$, $i=1,\dots,n$, on a constant (this readily generalizes to general GLS estimators). Here, $\{y_i\}$ is assumed to be a random sample from a population with mean $\mu$ and variance $\sigma^2$.

Then, we know that OLS is just $\hat\beta=\bar y$, the sample mean. To emphasize the point that each observation is weighted with weight $1/n$, write this as

$$\hat\beta=\sum_{i=1}^n\frac{1}{n}y_i.$$

It is well known that $\operatorname{Var}(\hat\beta)=\sigma^2/n$.

Now, consider another estimator which can be written as

$$\tilde\beta=\sum_{i=1}^n w_i y_i,$$

where the weights are such that $\sum_i w_i=1$. This ensures that the estimator is unbiased, as

$$E\left(\sum_{i=1}^n w_i y_i\right)=\sum_{i=1}^n w_i E(y_i)=\sum_{i=1}^n w_i\mu=\mu.$$
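As a quick sanity check of this unbiasedness claim, a small Monte Carlo in R (a sketch; $\mu=2$ and $\epsilon=0.5$ are arbitrary choices, not part of the derivation):

```r
# Monte Carlo check that the weighted estimator is unbiased
# (mu = 2 and epsilon = 0.5 are arbitrary illustration values)
set.seed(42)
n <- 100; mu <- 2; epsilon <- 0.5
w <- c(rep((1 + epsilon)/n, n/2), rep((1 - epsilon)/n, n/2))
est <- replicate(2e4, crossprod(w, rnorm(n, mean = mu))[1])
mean(est)   # close to mu = 2
```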

Its variance will exceed that of OLS unless $w_i=1/n$ for all $i$ (in which case it of course reduces to OLS). This can, for instance, be shown via a Lagrangian:

$$\mathcal{L}=\operatorname{Var}(\tilde\beta)-\lambda\left(\sum_i w_i-1\right)=\sum_i w_i^2\sigma^2-\lambda\left(\sum_i w_i-1\right),$$

Setting the partial derivatives with respect to $w_i$ to zero yields $2\sigma^2 w_i-\lambda=0$ for all $i$, while $\partial\mathcal{L}/\partial\lambda=0$ yields $\sum_i w_i-1=0$. Solving the first set of conditions for $\lambda$ and equating them gives $w_i=w_j$; combined with the requirement that the weights sum to one, this implies that $w_i=1/n$ minimizes the variance.
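The variance comparison can also be verified numerically: since the $y_i$ are independent, $\operatorname{Var}(\tilde\beta)=\sigma^2\sum_i w_i^2$, which is minimized at $w_i=1/n$. A minimal sketch (with $\epsilon=0.5$ as an arbitrary choice):

```r
# Var(tilde beta) = sigma^2 * sum(w_i^2) for independent y_i; compare with
# the OLS variance sigma^2 / n (epsilon = 0.5 is an arbitrary choice)
n <- 100
epsilon <- 0.5
w <- c(rep((1 + epsilon)/n, n/2), rep((1 - epsilon)/n, n/2))
sum(w)     # = 1, so the estimator is unbiased
sum(w^2)   # = 0.0125, the weighted estimator's variance when sigma^2 = 1
1 / n      # = 0.01, the OLS variance when sigma^2 = 1
```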

Here is a graphical illustration from a little simulation, created with the code below:

EDIT: In response to @kjetilbhalvorsen's and @RichardHardy's suggestions, I also include in the plot the median of the $y_i$, the MLE of the location parameter of a $t(4)$ distribution (for which I get warnings `In log(s) : NaNs produced` that I did not check further), and Huber's estimator.

We observe that all estimators appear to be unbiased. However, the estimator that puts weights $w_i=(1\pm\epsilon)/n$ on the two halves of the sample is more variable, as are the median, the MLE of the $t$-distribution, and Huber's estimator (the latter only slightly so; see also here).

That the latter three are outperformed by the OLS solution does not follow immediately from the BLUE property (at least not obviously to me), as it is not clear that they are linear estimators (nor do I know whether the MLE and Huber's estimator are unbiased).

```
library(MASS)  # for fitdistr() and huber()

set.seed(1)    # added for reproducibility
n <- 100
reps <- 1e6
epsilon <- 0.5

# weight (1+epsilon)/n for the first half of the sample, (1-epsilon)/n for the second
w <- c(rep((1 + epsilon)/n, n/2), rep((1 - epsilon)/n, n/2))

ols <- weightedestimator <- lad <- mle.t4 <- huberest <- rep(NA, reps)
for (i in 1:reps)
{
  y <- rnorm(n)
  ols[i] <- mean(y)                                  # OLS: the sample mean
  weightedestimator[i] <- crossprod(w, y)            # unequally weighted, unbiased
  lad[i] <- median(y)                                # least absolute deviations
  mle.t4[i] <- fitdistr(y, "t", df = 4)$estimate[1]  # MLE of t(4) location
  huberest[i] <- huber(y)$mu                         # Huber's robust estimator
}

plot(density(ols), col = "purple", lwd = 3,
     main = "Kernel estimate of the density of OLS and other estimators", xlab = "")
lines(density(weightedestimator), col = "lightblue2", lwd = 3)
lines(density(lad), col = "salmon", lwd = 3)
lines(density(mle.t4), col = "green", lwd = 3)
lines(density(huberest), col = "#949413", lwd = 3)
abline(v = 0, lty = 2)
legend("topright", c("OLS", "weighted", "median", "MLE t, 4 df", "Huber"),
       col = c("purple", "lightblue2", "salmon", "green", "#949413"), lwd = 3)
```

**Attribution**
*Source: Link, Question Author: Gumeo, Answer Author: Christoph Hanck*