Using Pearson’s correlation coefficient as an optimization objective in machine learning

In machine learning (for regression problems), I often see mean squared error (MSE) or mean absolute error (MAE) used as the error function to minimize (plus a regularization term). I am wondering whether there are situations where using the correlation coefficient would be more appropriate. If such situations exist, then:

  1. In what situations is the correlation coefficient a better metric than MSE/MAE?
  2. In these situations, is MSE/MAE still a good proxy cost function to use?
  3. Is it possible to maximize the correlation coefficient directly? Is it a stable objective function to use?

I couldn’t find cases where the correlation coefficient is used directly as the objective function in optimization. I would appreciate it if people could point me to information in this area.


Maximizing correlation is useful when the output is highly noisy, in other words when the relationship between inputs and outputs is very weak. In that case, minimizing MSE tends to push the predictions toward zero (more precisely, toward the mean of the training targets), so that the prediction error is about the same as the variance of the training output.
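A small numerical check of this shrinkage effect, as a sketch only: the synthetic data, the signal strength of 0.1, and the function name `ols_shrinkage_demo` are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def ols_shrinkage_demo(n=5000, signal=0.1, seed=0):
    """Fit y = a*x + b by least squares on weak-signal data and
    compare the spread of the predictions with the spread of y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = signal * x + rng.normal(size=n)   # weak signal, unit-variance noise
    a, b = np.polyfit(x, y, 1)            # MSE-optimal slope and intercept
    pred = a * x + b
    return {
        "var_y": y.var(),
        "var_pred": pred.var(),
        "mse": np.mean((pred - y) ** 2),
        "corr": np.corrcoef(pred, y)[0, 1],
    }

if __name__ == "__main__":
    for name, value in ols_shrinkage_demo().items():
        print(f"{name}: {value:.4f}")
```

With settings like these, the predictions should have a much smaller variance than the targets and the MSE should sit close to the target variance, even though the predictions are still positively correlated with the targets.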

Using correlation directly as the objective function is possible with a (full-batch) gradient descent approach: simply minimize the negative correlation. However, I do not know how to optimize it with SGD, because the cost function and its gradient involve the outputs of all training samples.
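To make the full-batch idea concrete, here is a minimal NumPy sketch that minimizes the negative Pearson correlation for a linear model. The linear model, the learning rate, the step count, and the function names are assumptions made for illustration; the answer does not prescribe any of them.

```python
import numpy as np

def neg_pearson(pred, y):
    """Negative Pearson correlation between predictions and targets."""
    pc = pred - pred.mean()
    yc = y - y.mean()
    return -(pc @ yc) / (np.linalg.norm(pc) * np.linalg.norm(yc))

def neg_pearson_grad(pred, y):
    """Gradient of neg_pearson with respect to the prediction vector.
    Note that it needs every sample's prediction at once."""
    pc = pred - pred.mean()
    yc = y - y.mean()
    npc, nyc = np.linalg.norm(pc), np.linalg.norm(yc)
    rho = (pc @ yc) / (npc * nyc)
    return -(yc / nyc - rho * pc / npc) / npc

def fit_linear_by_correlation(X, y, lr=0.1, steps=500, seed=0):
    """Full-batch gradient descent on the negative correlation of X @ w with y."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(steps):
        pred = X @ w
        w -= lr * (X.T @ neg_pearson_grad(pred, y))  # chain rule through pred = X @ w
    return w
```

Because correlation ignores the scale and offset of the predictions, the fitted `w` is only determined up to a positive factor, and no bias term is needed in this sketch.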

Another way to maximize correlation is to minimize MSE while constraining the output variance to equal the variance of the training output. However, the constraint also involves all outputs, so there is (in my opinion) no way to take advantage of an SGD optimizer.
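A short derivation of why the variance constraint turns MSE minimization into correlation maximization. The symbols are my own notation (predictions $\hat{y}_i$, means $\bar{\hat{y}}$ and $\bar{y}$, standard deviations $\sigma_{\hat{y}}$ and $\sigma_y$, correlation $\rho$), and it additionally assumes the prediction mean matches the target mean, which a free bias term can arrange:

```latex
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2
  = (\bar{\hat{y}} - \bar{y})^2 + \sigma_{\hat{y}}^2 + \sigma_y^2
    - 2\rho\,\sigma_{\hat{y}}\sigma_y, \\
\text{with } \bar{\hat{y}} = \bar{y},\ \sigma_{\hat{y}} = \sigma_y:\quad
\mathrm{MSE} &= 2\sigma_y^2\,(1 - \rho).
\end{aligned}
```

Under those two conditions, MSE is a decreasing function of $\rho$ alone, so the constrained MSE minimizer is also the correlation maximizer.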

If the top layer of the neural network is a linear output layer, we can first minimize MSE and then adjust the weights and bias of that linear layer to maximize the correlation. The adjustment can be done similarly to CCA (canonical correlation analysis).
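One possible reading of this adjustment, sketched in NumPy: with a single output, the CCA-style direction for the last layer is proportional to Cov(H)^{-1} Cov(H, y), where `H` holds the activations feeding the linear output layer. The array `H`, the small `ridge` term, and the function name `refit_output_layer` are hypothetical and chosen for illustration; this is not necessarily the exact procedure the answer had in mind.

```python
import numpy as np

def refit_output_layer(H, y, ridge=1e-8):
    """Return (w, b) for a linear output layer H @ w + b whose output has
    maximal Pearson correlation with y on the given data.

    H: (n_samples, n_hidden) activations of the last hidden layer.
    y: (n_samples,) training targets.
    """
    Hc = H - H.mean(axis=0)
    yc = y - y.mean()
    cov_hh = Hc.T @ Hc / len(y)
    cov_hy = Hc.T @ yc / len(y)
    # A tiny ridge term keeps the solve stable if hidden units are collinear.
    w = np.linalg.solve(cov_hh + ridge * np.eye(H.shape[1]), cov_hy)
    # Correlation does not depend on the bias; this choice matches the target mean.
    b = y.mean() - H.mean(axis=0) @ w
    return w, b
```

Since correlation is unchanged by positive rescaling and shifting of the output, any positive multiple of `w` with any bias gives the same correlation; the scaling used here is simply the least-squares one.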

Source: Link, Question Author: aha, Answer Author: Bo Tian
