I am not sure where this question belongs: Cross Validated or The Workplace. But my question is vaguely related to statistics.

This question (or I guess questions) arose while I was working as a “data science intern”. I was building a linear regression model and examining the residual plot, and I saw clear signs of heteroskedasticity. I remember that heteroskedasticity distorts many inferential procedures, such as confidence intervals and t-tests, because the usual standard errors are no longer valid. So I used weighted least squares, following what I had learned in college. My manager saw that and advised me not to do it because “I was making things complicated”, which was not a very convincing reason to me at all.
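A minimal sketch of what weighted least squares does, on simulated data. Everything here is assumed for illustration: the helper `wls`, the sample size, and an error standard deviation proportional to $x$. WLS minimises $\sum_i w_i (y_i - x_i^\top \beta)^2$, which is the same as running OLS on rows rescaled by $\sqrt{w_i}$; with $w_i = 1/\sigma_i^2$ it gives efficient estimates and valid standard errors. The weights are known here only because the data are simulated; on real data they would have to be estimated.

```python
import numpy as np

def wls(X, y, w):
    """Hypothetical helper: weighted least squares.

    Minimises sum_i w_i * (y_i - X[i] @ beta)**2 by running ordinary
    least squares on rows rescaled by sqrt(w_i).
    """
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
sigma = 0.5 * x                         # assumed: error SD grows with x (heteroskedastic)
y = 2.0 + 3.0 * x + rng.normal(0, sigma)
X = np.column_stack([np.ones(n), x])    # design matrix with an intercept column

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_wls = wls(X, y, 1.0 / sigma**2)    # weights = 1 / error variance
print("OLS estimate:", beta_ols)
print("WLS estimate:", beta_wls)
```

Both estimates are unbiased in this setup; the benefit of WLS is that, with correct weights, it is more precise and its standard errors (and hence confidence intervals and t-tests) are trustworthy.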

Another example would be “removing an explanatory variable because its p-value is insignificant”. To me, this advice just does not make sense from a logical point of view. According to what I have learned, an insignificant p-value could be due to different reasons: chance, using the wrong model, violating the assumptions, etc.
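To illustrate the “chance” point, here is a small simulation. It is only a sketch: the sample size of 30, the true slope of 0.3, and the unit noise are hypothetical choices. The explanatory variable genuinely affects the response in every simulated dataset, yet the slope's t-test frequently fails to reach significance, so dropping the variable on that basis would discard a real effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_slope, n_sims = 30, 0.3, 2000   # hypothetical settings
not_significant = 0
for _ in range(n_sims):
    x = rng.normal(size=n)
    y = 1.0 + true_slope * x + rng.normal(size=n)     # x really does affect y
    X = np.column_stack([np.ones(n), x])
    beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = rss[0] / (n - 2)                          # residual variance estimate
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # standard error of the slope
    if abs(beta[1] / se) < 2.048:   # two-sided 5% critical value of t with 28 df
        not_significant += 1
share = not_significant / n_sims
print(f"slope 'insignificant' in {share:.0%} of samples despite a real effect")
```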

Yet another example: I used k-fold cross-validation to evaluate my models. According to the results, $\mathrm{CV}_{\text{model 1}}$ is way better than $\mathrm{CV}_{\text{model 2}}$. But model 1 has a lower $R^2$, and the reason has something to do with the intercept. My supervisor, though, seems to prefer model 2 because it has the higher $R^2$. His reasons (for example, that $R^2$ is robust, or that cross-validation is a machine learning approach, not a statistical one) just do not seem convincing enough to change my mind.
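The question doesn't say exactly how the intercept affects $R^2$, so the following is only one common mechanism, sketched on hypothetical data. When a model is fitted without an intercept, software such as R's `summary.lm` reports an uncentred $R^2$ (relative to $\sum y_i^2$ rather than $\sum (y_i - \bar y)^2$), which can make a much worse model look better. The helpers `fit` and `kfold_mse` are illustrative, not from any library.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def kfold_mse(X, y, k=5, seed=0):
    """Average held-out mean squared error over k folds (hypothetical helper)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        beta = fit(X[train], y[train])
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 1, n)
y = 10 + 0.5 * x + rng.normal(0, 1, n)   # assumed data with a large true intercept

X1 = np.column_stack([np.ones(n), x])    # model 1: with intercept
X2 = x[:, None]                          # model 2: through the origin

rss1 = np.sum((y - X1 @ fit(X1, y)) ** 2)
rss2 = np.sum((y - X2 @ fit(X2, y)) ** 2)
r2_model1 = 1 - rss1 / np.sum((y - y.mean()) ** 2)  # usual centred R^2
r2_model2 = 1 - rss2 / np.sum(y ** 2)               # uncentred R^2 used without an intercept
cv1, cv2 = kfold_mse(X1, y), kfold_mse(X2, y)
print(f"R^2: model 1 = {r2_model1:.2f}, model 2 = {r2_model2:.2f}")
print(f"5-fold CV MSE: model 1 = {cv1:.2f}, model 2 = {cv2:.2f}")
```

On this data the no-intercept model shows the higher $R^2$ but a far larger held-out error. The two $R^2$ values are computed against different baselines and are not comparable, whereas the cross-validated errors are.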

As someone who just graduated from college, I am very confused. I am very passionate about applying correct statistics to solve real-world problems, but I don’t know which of the following is true:

- The statistics I learned myself is just wrong, so I am the one making mistakes.
- There is a huge difference between theoretical statistics and building models in companies: although the theory is right, people just don’t follow it in practice.
- The manager is not using statistics correctly.

Update (4/17/2017): I have decided to pursue a Ph.D. in statistics. Thank you all for your replies.

**Answer**

In a nutshell, you’re right and he’s wrong. The tragedy of data analysis is that a lot of people do it, but only a minority of people do it well, partly due to a weak education in data analysis and partly due to apathy. Turn a critical eye to almost any published research article that doesn’t have a statistician or a machine-learning expert on the author list and you’ll quickly spot such elementary mistakes as interpreting $p$-values as the probability that the null hypothesis is true.
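A quick simulation shows why a $p$-value is not the probability that the null hypothesis is true. It is a sketch under assumed numbers: 90% of tested null hypotheses are true, and real effects are 2 standard errors in size. Among results with $p < 0.05$, the share with a true null comes out far above 5%.

```python
import numpy as np

rng = np.random.default_rng(3)
n_studies = 100_000
null_true = rng.random(n_studies) < 0.9   # assumed: 90% of tested nulls are true
effect = np.where(null_true, 0.0, 2.0)    # assumed effect size (in SE units) when H0 is false
z = rng.normal(effect, 1.0)               # observed z-statistic for each study
significant = np.abs(z) > 1.96            # equivalent to two-sided p < 0.05
share_null = null_true[significant].mean()
print(f"P(H0 true | p < 0.05) is about {share_null:.2f}, not 0.05")
```

With these assumptions the printed value should be close to 0.47, because the answer depends on how many nulls are true and on the power of the tests, neither of which the $p$-value itself encodes.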

I think the only thing to do, when confronted with this kind of situation, is to carefully explain what’s wrong with the wrongheaded practice, with an example or two.

**Attribution**: *Source: Link, Question Author: 3x89g2, Answer Author: Kodiologist*