I have trained a linear regression model using a set of variables/features, and the model performs well. However, I have realized that no variable has a good correlation with the predicted variable. How is this possible?

**Answer**

A pair of variables may show high partial correlation (the correlation after accounting for the impact of other variables) but low, or even zero, marginal correlation (pairwise correlation).

This means that the pairwise correlation between a response y and some predictor x may be of little value in identifying variables with (linear) “predictive” value among a collection of other variables.

Consider the following data:

```
y x
1 6 6
2 12 12
3 18 18
4 24 24
5 1 42
6 7 48
7 13 54
8 19 60
```

The correlation between y and x is 0. If I draw the least squares line, it’s perfectly horizontal and the R² is naturally going to be 0.
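A quick way to verify this is to compute the Pearson correlation directly. This is a minimal sketch in Python/NumPy (not from the original answer, which shows only the data):

```python
import numpy as np

y = np.array([6, 12, 18, 24, 1, 7, 13, 19], dtype=float)
x = np.array([6, 12, 18, 24, 42, 48, 54, 60], dtype=float)

# Pearson correlation between x and y: the positive trend within
# each group of four rows is exactly cancelled by the offset
# between groups, so the pairwise correlation comes out as 0.
r = np.corrcoef(x, y)[0, 1]
print(r)
```

Since the fitted slope of the least squares line is proportional to this correlation, a horizontal line (and hence R² = 0) follows immediately.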

But when you add a new variable g, which indicates which of two groups the observations came from, x becomes extremely informative:

```
y x g
1 6 6 0
2 12 12 0
3 18 18 0
4 24 24 0
5 1 42 1
6 7 48 1
7 13 54 1
8 19 60 1
```

The R² of a linear regression model with both the x and g variables in it will be 1.
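This is easy to check by fitting the two-predictor model by ordinary least squares. A minimal sketch (assuming Python/NumPy; the exact fitting code is not part of the original answer):

```python
import numpy as np

y = np.array([6, 12, 18, 24, 1, 7, 13, 19], dtype=float)
x = np.array([6, 12, 18, 24, 42, 48, 54, 60], dtype=float)
g = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Design matrix: intercept, x, and the group indicator g
X = np.column_stack([np.ones_like(x), x, g])

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R² from the residual and total sums of squares
resid = y - X @ beta
r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
print(beta, r2)
```

The data were constructed so that y = x − 41·g exactly, so the fit recovers coefficients near (0, 1, −41) and R² = 1: once g absorbs the offset between the two groups, x predicts y perfectly.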

It’s possible for this sort of thing to happen with every one of the variables in the model: each can have small pairwise correlation with the response, and yet the model with all of them included can be very good at predicting the response.


**Attribution**
*Source: Link, Question Author: Zaratruta, Answer Author: amoeba*