Reviewer questioning my stats, need a second opinion (multiple linear regression)

I just got the reviews for my first article, and one of the reviewers is questioning my stats, which has made me doubt them. I cross-posted on reddit, and one redditor suggested I come here for a second opinion (http://tinyurl.com/pqzt).

Here’s a quick rundown of my study: 150 patients. I was interested in how well exposure to a certain toxin (main IV, predictor) could predict scores on a clinical questionnaire (DV). I used a standard linear regression model (“enter” method) and included other predictors that are known to affect scores on my DV (age, education, disease duration and motor disability). All variables are on a continuous scale. I found that my model is significant, as is my main predictor (toxin exposure), along with age and education. There is no collinearity problem according to the VIF index. If it helps, here are the histogram and the plots of the residuals: http://imgur.com/vsb

One reviewer is questioning my regression model because he says that, since about 70 of the 150 people had an exposure value of 0 (non-exposed), my model only fits those who were exposed to the toxin (about 80). My understanding of the regression model and the number of degrees of freedom in the ANOVA table (140ish) makes me think he is wrong. He also said that the preliminary correlations I ran were not fit for the whole sample because 70 people had an exposure value of “0”, even though I used the whole sample for the analysis.

I played around with my data to get a better grasp of the problem. When I ran the same regression model (enter method) on only those with exposure (n=80), my predictor “toxin” fell short of significance. I believe a lack of statistical power could be to blame (6 predictors with an “n” of 80, combined with the “weak” effect size). Then I went back to the whole sample and added a dichotomous variable (exposed or not exposed), but neither the exposure status (yes or no) nor the total exposure (scale from 0 to 100) was significant with the “enter” method (all variables entered simultaneously). However, using a stepwise method, the toxin exposure level variable (main predictor) was significant again.

So please, can you confirm that I am right/wrong, and do you have any advice on how to write this to the editor/reviewer so my paper doesn’t get rejected in the second round? Thanks!

Answer

If you think there is a discontinuity in the effect of the exposure at an exposure (toxin level) of zero, you can test a more general hypothesis using at least 2 predictors: an indicator of toxin > 0 and something like log(toxin + 1). The 2 d.f. “chunk” test for the combined effects of these two predictors tests the null hypothesis that toxin level is not associated with the outcome, allowing for a discontinuity at zero. You can get the chunk test using a general contrast with 2 d.f. or by omitting both variables and doing the “difference in R²” test.
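As a rough illustration of the 2 d.f. chunk test, here is a minimal sketch in Python using only NumPy. It compares a reduced model (covariates only) against a full model that adds the exposure indicator and log(toxin + 1), using the nested-model F statistic, which is algebraically the same as the “difference in R²” test. The data are simulated, and the function names (`rss`, `chunk_test`), the effect sizes, and the four stand-in covariates are all assumptions made for the example, not the poster's data or code.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X (X includes the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chunk_test(y, toxin, covariates):
    """F statistic for jointly adding 1[toxin > 0] and log(toxin + 1).

    Returns (f_stat, df_num, df_den). The null hypothesis is that neither
    toxin term is associated with y, given the covariates.
    """
    n = len(y)
    intercept = np.ones((n, 1))
    reduced = np.hstack([intercept, covariates])      # model without toxin terms
    exposed = (toxin > 0).astype(float)[:, None]      # allows a jump at zero
    log_toxin = np.log(toxin + 1.0)[:, None]          # dose effect among the exposed
    full = np.hstack([reduced, exposed, log_toxin])   # model with both toxin terms

    rss_reduced = rss(reduced, y)
    rss_full = rss(full, y)
    df_num = full.shape[1] - reduced.shape[1]         # 2 d.f. for the chunk
    df_den = n - full.shape[1]                        # residual d.f. of the full model
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# Simulated data (hypothetical): 150 patients, roughly half unexposed.
rng = np.random.default_rng(0)
n = 150
toxin = np.where(rng.random(n) < 0.47, 0.0, rng.uniform(1, 100, n))
covariates = rng.normal(size=(n, 4))  # stand-ins for age, education, duration, disability
y = (
    10
    + 2.0 * (toxin > 0)
    + 1.5 * np.log(toxin + 1)
    + covariates @ np.array([0.5, -0.3, 0.2, 0.1])
    + rng.normal(size=n)
)

f_stat, df_num, df_den = chunk_test(y, toxin, covariates)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```

The p-value comes from the upper tail of the F distribution with (`df_num`, `df_den`) degrees of freedom. Note that the whole sample of 150 is used, and the unexposed patients contribute to estimating the covariate effects and the jump at zero.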

The reviewer is incorrect.

It is very important to make sure that you have chosen the right model for the clinical outcome score. You are assuming the score is a continuous variable without a great number of ties, and that the residuals from the model have a Gaussian distribution.

Avoid any removal of variables on the basis of P-values.

Attribution
Source: Link, Question Author: nightale, Answer Author: Frank Harrell