Why are rlm() regression coefficient estimates different from lm() in R?

I am using rlm() from the R MASS package to fit a multiple linear regression. It works well for a number of samples, but for one particular model I am getting quasi-null coefficients:

Call: rlm(formula = Y ~ X1 + X2 + X3 + X4, data = mymodel, maxit = 50, na.action = na.omit)

Residuals:
       Min         1Q     Median         3Q        Max 
-7.981e+01 -6.022e-03 -1.696e-04  8.458e-03  7.706e+01 

Coefficients:
             Value    Std. Error t value 
(Intercept)    0.0002   0.0001     1.8418
X1             0.0004   0.0000    13.4478
X2            -0.0004   0.0000   -23.1100
X3            -0.0001   0.0002    -0.5511
X4             0.0006   0.0001     8.1489

Residual standard error: 0.01086 on 49052 degrees of freedom
  (83 observations deleted due to missingness)

For comparison, these are the coefficients calculated by lm():

Call:
lm(formula = Y ~ X1 + X2 + X3 + X4, data = mymodel, na.action = na.omit)

Residuals:
    Min      1Q  Median      3Q     Max 
-76.784  -0.459   0.017   0.538  78.665 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.016633   0.011622  -1.431    0.152    
X1            0.046897   0.004172  11.240  < 2e-16 ***
X2           -0.054944   0.002184 -25.155  < 2e-16 ***
X3            0.022627   0.019496   1.161    0.246    
X4            0.051336   0.009952   5.159  2.5e-07 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 2.574 on 49052 degrees of freedom
  (83 observations deleted due to missingness)
Multiple R-squared: 0.0182, Adjusted R-squared: 0.01812 
F-statistic: 227.3 on 4 and 49052 DF,  p-value: < 2.2e-16 

The lm() diagnostic plot doesn't show any particularly influential outlier, as measured by Cook's distance:

[Figure: lm diagnostic plot (residuals vs. leverage with Cook's distance)]


For reference, after confirming the results based on Macro's answer: the R command to set the tuning parameter $k$ of the Huber estimator (here $k = 100$) is:

rlm(y ~ x, psi = psi.huber, k = 100)
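As a sanity check, the sketch below (on simulated data, not the asker's model) shows why a very large k recovers the lm() fit: virtually all scaled residuals satisfy |x| <= k, so the Huber loss is quadratic everywhere and the two fits coincide.

```r
# Hedged sketch on simulated data: with k = 100, the Huber loss behaves
# like least squares, so rlm() should reproduce lm() almost exactly.
library(MASS)

set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)  # well-behaved normal errors

coef(lm(y ~ x))
coef(rlm(y ~ x, psi = psi.huber, k = 100))  # near-identical to lm()
```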


The difference is that rlm() fits models using your choice of a number of different $M$-estimators, while lm() uses ordinary least squares.

In general the $M$-estimator for a regression coefficient minimizes

$$ \sum_{i=1}^{n} \rho \left( \frac{Y_i - {\bf X}_{i} {\boldsymbol \beta}}{\sigma} \right) $$

as a function of ${\boldsymbol \beta}$, where $Y_i$ is the $i$th response and ${\bf X}_{i}$ is the vector of predictors for individual $i$. Least squares is a special case of this, with $$ \rho(x) = x^2. $$ However, the default setting for rlm(), which you appear to be using, is the Huber $M$-estimator, which uses

$$ \rho(x) = \begin{cases} \frac{1}{2} x^2 &\mbox{if } |x| \leq k\\
k |x| - \frac{1}{2} k^2 & \mbox{if } |x| > k, \end{cases} $$

where $k$ is a constant. The default in rlm() is $k = 1.345$. These two estimators are minimizing different criteria, so it is no surprise that the estimates are different.
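To make the piecewise definition concrete, here is the Huber $\rho$ written out directly in R (huber_rho is an illustrative name for this sketch, not a function in MASS):

```r
# Huber rho: quadratic (least-squares-like) for small residuals,
# linear for large ones, so outliers contribute far less to the criterion.
huber_rho <- function(x, k = 1.345) {
  ifelse(abs(x) <= k,
         0.5 * x^2,               # |x| <= k: same shape as least squares
         k * abs(x) - 0.5 * k^2)  # |x| >  k: linear growth down-weights outliers
}

huber_rho(1)   # 0.5, identical to the least-squares contribution x^2 / 2
huber_rho(10)  # about 12.55, versus 50 for the quadratic x^2 / 2
```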

Edit: From the QQ plot shown above, it looks like you have a very long-tailed error distribution. This is exactly the kind of situation the Huber $M$-estimator is designed for, and in that situation it can give quite different estimates:

When the errors are normally distributed, the estimates will be pretty similar, since under the normal distribution most scaled residuals fall in the $|x| \leq k$ region, where the Huber $\rho$ is equivalent to least squares. In the long-tailed situation you have, many residuals fall into the $|x| > k$ region, where the criterion departs from OLS, and that explains the discrepancy.
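The effect of the long tails can be sketched on simulated data (this is an illustration under assumed t-distributed errors, not the asker's model):

```r
# Hedged sketch: with heavy-tailed errors, many scaled residuals land in
# the |x| > k region, so the Huber fit can differ noticeably from OLS.
library(MASS)

set.seed(42)
x <- rnorm(500)
y <- 1 + 2 * x + rt(500, df = 1.5)  # true slope 2, very heavy-tailed errors

coef(lm(y ~ x))   # OLS: can be pulled around by the extreme errors
coef(rlm(y ~ x))  # Huber M-estimate with the default k = 1.345
```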

Source: Link, Question Author: Robert Kubrick, Answer Author: Macro
