I am currently in a linear regression class, but I can’t shake the feeling that what I am learning is no longer relevant in either modern statistics or machine learning. Why is so much time spent on inference for simple or multiple linear regression when so many interesting datasets these days violate many of the unrealistic assumptions of linear regression? Why not instead teach inference on more flexible, modern tools like regression using support vector machines or Gaussian processes? Though more complicated than finding a hyperplane in a space, wouldn’t this give students a much better background with which to tackle modern-day problems?

**Answer**

It is true that the assumptions of linear regression aren’t realistic. However, this is true of all statistical models. “All models are wrong, but some are useful.”

I guess you’re under the impression that there’s no reason to use linear regression when you could use a more complex model. This isn’t true: in general, more complex models are more vulnerable to overfitting, and they use more computational resources, which matters if, e.g., you’re trying to do statistics on an embedded processor or a web server. Simpler models are also easier to understand and interpret; by contrast, complex machine-learning models such as neural networks tend to end up as black boxes, more or less.
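To make the overfitting point concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the simulated data (a truly linear signal plus Gaussian noise), the sample sizes, the choice of a degree-8 polynomial as the "more complex" model, and the helper name `train_test_mse`. It fits both a straight line and the polynomial to a small training set and reports the error on the training data and on fresh data.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def train_test_mse(x_tr, y_tr, x_te, y_te, degree):
    """Least-squares polynomial fit of the given degree.

    Returns (training MSE, test MSE). Hypothetical helper for this sketch.
    """
    coefs = P.polyfit(x_tr, y_tr, degree)
    mse_tr = float(np.mean((y_tr - P.polyval(x_tr, coefs)) ** 2))
    mse_te = float(np.mean((y_te - P.polyval(x_te, coefs)) ** 2))
    return mse_tr, mse_te


rng = np.random.default_rng(0)

# Assumed data-generating process: y = 1 + 2x + noise, so the
# linear model is correctly specified here.
x_tr = np.linspace(-1.0, 1.0, 15)
y_tr = 1.0 + 2.0 * x_tr + rng.normal(0.0, 0.5, size=x_tr.size)
x_te = rng.uniform(-1.0, 1.0, size=1000)
y_te = 1.0 + 2.0 * x_te + rng.normal(0.0, 0.5, size=x_te.size)

train_lin, test_lin = train_test_mse(x_tr, y_tr, x_te, y_te, degree=1)
train_flex, test_flex = train_test_mse(x_tr, y_tr, x_te, y_te, degree=8)

print(f"degree 1: train MSE = {train_lin:.3f}, test MSE = {test_lin:.3f}")
print(f"degree 8: train MSE = {train_flex:.3f}, test MSE = {test_flex:.3f}")
```

The flexible model can never do worse than the line on the training data, because the line is a special case of it. On the fresh data, though, it will typically do worse in a setup like this one, since it has spent its extra degrees of freedom fitting noise. How big the gap is depends on the noise level and sample size, so treat the numbers as illustrative only.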

Even if linear regression someday becomes no longer practically useful (which seems extremely unlikely in the foreseeable future), it will still be theoretically important, because more complex models tend to build on linear regression as a foundation. For example, in order to understand a regularized mixed-effects logistic regression, you need to understand plain old linear regression first.
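As one small illustration of "building on linear regression", consider ridge regression, which is just least squares with a penalty on the size of the coefficients. The sketch below is my own illustration, not something from the course in question: the function name `ridge_fit`, the simulated data, and the penalty value are all assumptions, and for simplicity there is no intercept term. With the penalty set to zero, the ridge solution reduces to the ordinary least-squares solution.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y.

    Hypothetical helper; penalizes every coefficient (no intercept here).
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(1)

# Assumed simulated data with three predictors and known coefficients.
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)

# Ordinary least squares via NumPy's solver.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge with no penalty, and ridge with an (arbitrarily chosen) penalty.
beta_ridge0 = ridge_fit(X, y, lam=0.0)
beta_ridge = ridge_fit(X, y, lam=10.0)

print("OLS:            ", beta_ols)
print("ridge, lam = 0: ", beta_ridge0)
print("ridge, lam = 10:", beta_ridge)
```

If you already understand the normal equations for linear regression, the ridge estimator is a one-term modification of them, and the shrinkage of the coefficients toward zero as the penalty grows is easy to reason about. Without that foundation, the same formula is much harder to interpret.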

This isn’t to say that more complex, newer, and shinier models aren’t useful or important. Many of them are. But the simpler models are more widely applicable and hence more important, and clearly make sense to present first if you’re going to present a variety of models. There are a lot of bad data analyses conducted these days by people who call themselves “data scientists” or something but don’t even know the foundational stuff, like what a confidence interval really is. Don’t be a statistic!

**Attribution**

*Source: Link, Question Author: Community, Answer Author: Kodiologist*