When performing a linear regression in R, I came across the following terms.

```r
NBA_test = read.csv("NBA_test.csv")
PointsPredictions = predict(PointsReg4, newdata = NBA_test)
SSE = sum((PointsPredictions - NBA_test$PTS)^2)
SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2)
R2 = 1 - SSE/SST
```

In this case I am predicting the number of points. I understand what is meant by SSE (sum of squared errors), but what actually are SST and R squared? Also, what is the difference between $R^2$ and RMSE?

**Answer**

Assume that you have $n$ observations $y_i$ and an estimator that produces the estimates $\hat{y}_i$.

The mean squared error is $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, and the root mean squared error is its square root, $RMSE = \sqrt{MSE}$.
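As a minimal sketch of these two formulas in R, the snippet below computes the MSE and the RMSE by hand. The vectors `y` and `y_hat` are made-up toy values chosen for illustration, not data from the question.

```r
# Toy data (hypothetical values, for illustration only)
y     <- c(3, 5, 7, 9)          # observed values y_i
y_hat <- c(2.5, 5.5, 6.5, 9.5)  # estimates of y_i

n    <- length(y)
mse  <- sum((y - y_hat)^2) / n  # mean squared error: 0.25
rmse <- sqrt(mse)               # root mean squared error: 0.5
```

Every estimate here is off by exactly 0.5, so the RMSE of 0.5 matches the intuitive "typical error".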

The $R^2$ is equal to $R^2 = 1 - \frac{SSE}{TSS}$, where $SSE$ is the sum of squared errors, $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, and by definition this is equal to $SSE = n \times MSE$.

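The $TSS$ is the total sum of squares (the quantity the question calls `SST`, there computed with the mean of the training data as the baseline) and is equal to $TSS = \sum_{i=1}^{n}(y_i - \bar{y})^2$, where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. So $R^2 = 1 - \frac{n \times MSE}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$.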
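The relation between SSE, TSS, MSE and R squared can be checked numerically. This is only a sketch on made-up toy vectors (`y` and `y_hat` are hypothetical, not the NBA data); it computes R squared both from SSE and from the MSE and shows that the two agree.

```r
# Toy data (hypothetical values, for illustration only)
y     <- c(3, 5, 7, 9)          # observed values y_i
y_hat <- c(2.5, 5.5, 6.5, 9.5)  # estimates of y_i

n   <- length(y)
sse <- sum((y - y_hat)^2)       # sum of squared errors: 1
tss <- sum((y - mean(y))^2)     # total sum of squares: 20
mse <- sse / n                  # so that sse == n * mse

r2          <- 1 - sse / tss      # 0.95
r2_from_mse <- 1 - n * mse / tss  # same value, written via the MSE
```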

For a regression with an intercept, $R^2$ is between 0 and 1 on the data used to fit the model, and from its definition $R^2 = 1 - \frac{SSE}{TSS}$ we can find an interpretation: $\frac{SSE}{TSS}$ is the sum of squared errors divided by the total sum of squares, so it is the fraction of the total sum of squares that is contained in the error term. One minus this is the fraction of the total sum of squares that is not in the error, or **$R^2$ is the fraction of the total sum of squares that is ‘explained by’ the regression**.
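To see this on a fitted model, here is a small sketch that uses R's built-in `mtcars` data set (chosen only for illustration, it has nothing to do with the NBA data). It fits a regression with an intercept, computes one minus SSE over TSS by hand, and compares the result with the value reported by `summary()`.

```r
# Fit a simple regression with an intercept on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)

y     <- mtcars$mpg
y_hat <- fitted(fit)             # in-sample estimates

sse <- sum((y - y_hat)^2)        # part of the variation left in the errors
tss <- sum((y - mean(y))^2)      # total variation around the mean

r2_manual <- 1 - sse / tss             # fraction 'explained by' the regression
r2_lm     <- summary(fit)$r.squared    # value computed by lm's summary

all.equal(r2_manual, r2_lm)      # TRUE
```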

The RMSE is a measure of the average deviation of the estimates from the observed values (this is what @user3796494 also said).
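One practical consequence of these definitions is that the RMSE is expressed in the units of the observations, while R squared is a unitless ratio. The sketch below illustrates this on made-up toy vectors (the helper functions `rmse` and `r2` are hypothetical names defined here, not part of base R): multiplying observations and estimates by 10 multiplies the RMSE by 10 but leaves R squared unchanged.

```r
# Helper functions (hypothetical names, written directly from the definitions)
rmse <- function(obs, est) sqrt(mean((obs - est)^2))
r2   <- function(obs, est) 1 - sum((obs - est)^2) / sum((obs - mean(obs))^2)

# Toy data (hypothetical values, for illustration only)
y     <- c(3, 5, 7, 9)
y_hat <- c(2.5, 5.5, 6.5, 9.5)

rmse_original <- rmse(y, y_hat)            # 0.5
rmse_rescaled <- rmse(10 * y, 10 * y_hat)  # 5: scales with the units

r2_original <- r2(y, y_hat)                # 0.95
r2_rescaled <- r2(10 * y, 10 * y_hat)      # 0.95: unaffected by the units
```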

For $R^2$ you can also take a look at "Can the coefficient of determination $R^2$ be more than one? What is its upper bound?".

**Attribution** *Source: Link, Question Author: user3796494, Answer Author: Community*