I am trying to predict age as a function of a set of DNA methylation markers. These predictors are continuous between 0 and 100. When performing OLS regression, I can see that variance increases with age.
Thus, I decided to fit a weighted regression model. However, I am having trouble deciding how to define the weights for my model. I have used the fGLS method, like so:
OLSressq < OLSres^2 # Square residuals lnOLSressq < log(OLSressq) # Take natural log of squared residuals aux < lm(lnOLSressq~X) # Run auxillary model ghat < fitted(aux) # Predict g^ hhat < exp(ghat) # Create h^ fGLS < lm(Y~X, weights = 1/hhat) # Weight is 1/h^
And these were my results:
Call: lm(formula = Y ~ X, weights = 1/hhat) Weighted Residuals: Min 1Q Median 3Q Max 4.9288 1.2491 0.1325 1.2626 5.1452 Coefficients: Estimate Std. Error t value Pr(>t) (Intercept) 23.1009494 5.2299867 4.417 1.64e05 *** XASPA 0.1441404 0.0474738 3.036 0.00271 ** XPDE4C 0.6421385 0.0812891 7.899 1.83e13 *** XELOVL2 0.2040382 0.0866564 2.355 0.01951 * XELOVL2sq 0.0088532 0.0009381 9.438 < 2e16 *** XEDARADD 0.1965472 0.0348989 5.632 5.98e08 ***  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.762 on 200 degrees of freedom Multiple Rsquared: 0.9687, Adjusted Rsquared: 0.9679 Fstatistic: 1239 on 5 and 200 DF, pvalue: < 2.2e16
However, before figuring out how to perform the fGLS method, I was playing around with different weights just to see what would happen. I used 1/(squared residuals of OLS model) as weights and ended up with this:
Call: lm(formula = Y ~ X, weights = 1/OLSressq) Weighted Residuals: Min 1Q Median 3Q Max 1.0893 0.9916 0.7855 0.9998 2.0238 Coefficients: Estimate Std. Error t value Pr(>t) (Intercept) 30.8756737 1.1355861 27.19 <2e16 *** XASPA 0.1956188 0.0116329 16.82 <2e16 *** XPDE4C 0.6168490 0.0102149 60.39 <2e16 *** XELOVL2 0.1596969 0.0116723 13.68 <2e16 *** XELOVL2sq 0.0078459 0.0001593 49.26 <2e16 *** XEDARADD 0.2492048 0.0068751 36.25 <2e16 ***  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1 on 200 degrees of freedom Multiple Rsquared: 1, Adjusted Rsquared: 1 Fstatistic: 1.133e+06 on 5 and 200 DF, pvalue: < 2.2e16
Since the residual standard error is smaller, R² equals 1 (is that even possible?) and the F statistic is a lot higher, I am tempted to assume this model is better than what I achieved through the fGLS method. However, it seems to me that randomly picking weights through trial and error should always yield worse results than when you actually mathematically try to estimate the correct weights.
Can someone give me some advice on which weights to use for my model?
I have also read here and there that you cannot interpret R² in the same way you would when performing OLS regression. But then how should it be interpreted and can I still use it to somehow compare my WLS model to my OLS model?
Answer
There are two issues here

You would, ideally, use weights inversely proportional to the variance of the individual Yi. So says the GaussMarkov Theorem.

You don’t know the variance of the individual Yi
If you have deterministic weights wi, you are in the situation that WLS/GLS are designed for. One traditional example is when each observation is an average of multiple measurements, and wi the number of measurements.
If you have weights that depend on the data through a small number of parameters, you can treat them as fixed and use them in WLS/GLS even though they aren’t fixed. For example, you could estimate σ2(μ) as a function of the fitted μ and use wi=1/σ2(μi) — this seems to be what you are doing in the first example. This is also what happens in linear mixed models, where the weights for the fixedeffects part of the model depend on the variance components, which are estimated from the data.
In this scenario it is possible to prove that although there is some randomness in the weights, it does not affect the largesample distribution of the resulting ˆβ. It’s ok to treat the wi as if they were known in advance.
If you have weights that are not nearly deterministic, the whole thing breaks down and the randomness in the weights becomes important for both bias and variance. That’s what happens in your second example, when you use wi=1/r2i. It’s an obvious thing to think of, but it doesn’t work. The estimating equations (normal equations, score equations) for ˆβ are
∑ixiwi(yi−xiβ)=0
With that choice of weights, you get
∑ixi(yi−xiβ)(yi−xiˆβ∗)2=0
where ˆβ∗ is the unweighted estimate. If the new estimate is close to the old one (which should be true for large data sets, because both are consistent), you’d end up with equations like
∑ixi1(yi−xiβ)=0
which divides by a variable with mean zero, a bad sign.
So:
It’s ok to estimate the weights if you have a good mean model (so that the squared residuals are approximately unbiased for the variance) and as long as you don’t overfit them. If you do overfit them, you will get a bad estimate of β and inaccurate standard errors.
Attribution
Source : Link , Question Author : I. Smeers , Answer Author : Thomas Lumley