Suppose you want to estimate a linear model with n observations of the response and p predictors plus an intercept (p+1 coefficients):

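$$E(y_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}$$

One way to do this is through the OLS solution, i.e. choose the coefficients so that the sum of squared errors is minimized:

$$(\beta_0,\beta_1,\cdots,\beta_p)^T = \underset{\beta_0,\beta_1,\cdots,\beta_p}{\arg \min} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2$$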

Alternatively, you could use another loss function, like the sum of the absolute deviations, so that:

$$(\beta_0,\beta_1,\cdots,\beta_p)^T = \underset{\beta_0,\beta_1,\cdots,\beta_p}{\arg \min} \sum_{i=1}^{n} \left| y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right|$$
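For concreteness, here is a minimal sketch of how the two estimators might be computed with NumPy. The function names, the simulated data, and the use of iteratively reweighted least squares (IRLS) for the absolute-deviation fit are my own assumptions for illustration; IRLS only approximates the least-absolute-deviations solution, and an exact fit would normally use linear programming.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def fit_lad(X, y, n_iter=200, eps=1e-8):
    """Approximate least-absolute-deviations fit via IRLS (a sketch, not exact)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = fit_ols(X, y)  # start from the OLS solution
    for _ in range(n_iter):
        r = y - A @ beta
        # Weights 1/|r_i| turn the weighted squared loss into an absolute loss;
        # eps guards against division by zero.
        sw = np.sqrt(1.0 / np.maximum(np.abs(r), eps))
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta

def residuals(beta, X, y):
    return y - (beta[0] + X @ beta[1:])

# Simulated data with heavy-tailed noise (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.standard_t(df=2, size=200)

beta_ols = fit_ols(X, y)
beta_lad = fit_lad(X, y)

# Each fit evaluated under both loss functions.
sse_ols = float(np.sum(residuals(beta_ols, X, y) ** 2))
sse_lad = float(np.sum(residuals(beta_lad, X, y) ** 2))
sad_ols = float(np.sum(np.abs(residuals(beta_ols, X, y))))
sad_lad = float(np.sum(np.abs(residuals(beta_lad, X, y))))
```

By construction, each fit does at least as well as the other on its own loss: `sse_ols` cannot exceed `sse_lad`, and `sad_lad` should not exceed `sad_ols` (up to the IRLS approximation).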

Suppose you have found the parameters for the two models and want to choose the model with the smallest value of the loss function. How can you compare the minimum values attained by the loss functions in general (i.e. not just in this specific case; we could also try other $L_p$-based loss functions)? There seems to be a difference in the scale of the functions: one deals with squared residuals while the other does not.
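To make the scale concern concrete, here is a small sketch for the intercept-only special case (p = 0, an assumption made purely for illustration). In that case the squared-error loss is minimized by the mean and the absolute-error loss by the median. The data values and function names are hypothetical.

```python
import statistics

def min_sse(y):
    """Minimum sum of squared errors for an intercept-only model (attained at the mean)."""
    m = statistics.fmean(y)
    return sum((v - m) ** 2 for v in y)

def min_sad(y):
    """Minimum sum of absolute deviations for an intercept-only model (attained at the median)."""
    m = statistics.median(y)
    return sum(abs(v - m) for v in y)

y = [1.0, 2.0, 3.0, 4.0, 10.0]
y_rescaled = [0.1 * v for v in y]  # same data in different units

original = (min_sse(y), min_sad(y))                    # (50.0, 11.0)
rescaled = (min_sse(y_rescaled), min_sad(y_rescaled))  # approximately (0.5, 1.1)
```

Rescaling the response by a factor c multiplies the squared-error minimum by c² and the absolute-error minimum by |c|, so which raw minimum is smaller depends on the units of y: here the squared-error minimum is the larger one in the original units and the smaller one after rescaling.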

**Answer**

*(Converting my comment into an answer.)*

I think you cannot compare the fits that come from different loss functions, because they are answers to different questions. Once you decide that a given loss function is the appropriate one for your situation, the fit follows from that decision. You cannot then use the fit to validate the choice of loss function without the argument becoming circular. If you have some other criterion under which both fits can be evaluated, you could use that, but you need to have defined it in advance.
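As a rough sketch of that last point: if a single external criterion is fixed before fitting, both fits can be scored on it. The example below assumes, purely for illustration, that the criterion is mean absolute error on held-out data; the criterion, the simulated data, and the helper names (`design`, `fit_lp`) are hypothetical choices, and the IRLS routine is only an approximate solver for the $L_p$ loss.

```python
import numpy as np

def design(X):
    """Prepend an intercept column."""
    return np.column_stack([np.ones(X.shape[0]), X])

def fit_lp(X, y, p, n_iter=200, eps=1e-8):
    """IRLS sketch for minimizing sum |r_i|^p with 1 <= p <= 2 (p = 2 is plain OLS)."""
    A = design(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    if p == 2:
        return beta
    for _ in range(n_iter):
        r = np.maximum(np.abs(y - A @ beta), eps)
        sw = r ** ((p - 2) / 2)  # square root of the IRLS weights |r_i|^(p-2)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta

# Simulated data (hypothetical), split into training and held-out parts.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 0.5 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_t(df=3, size=300)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

# Criterion declared in advance: held-out mean absolute error (an assumption).
def criterion(beta):
    return float(np.mean(np.abs(y_te - design(X_te) @ beta)))

fits = {"ols": fit_lp(X_tr, y_tr, p=2), "lad": fit_lp(X_tr, y_tr, p=1)}
scores = {name: criterion(beta) for name, beta in fits.items()}
best = min(scores, key=scores.get)
```

Both scores are now in the same units, so the comparison is meaningful, but only relative to the criterion chosen beforehand. A different criterion (say, held-out squared error) could rank the fits differently.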

**Attribution**

*Source: Link, Question Author: Comp_Warrior, Answer Author: gung – Reinstate Monica*