# Difference between R-squared and RMSE in linear regression

When performing a linear regression in R, I came across the following terms.

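```r
NBA_test = read.csv("NBA_test.csv")
PointsPredictions = predict(PointsReg4, newdata = NBA_test)
# Sum of squared errors of the model's predictions on the test set
SSE = sum((PointsPredictions - NBA_test$PTS)^2)
# Total sum of squares: squared deviations from the mean of NBA$PTS (a constant baseline)
SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2)
R2 = 1 - SSE/SST
```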


In this case I am predicting the number of points. I understand what is meant by SSE (sum of squared errors), but what exactly are SST and R squared? Also, what is the difference between R2 and RMSE?

Assume that you have $n$ observations $y_i$ and that you have an estimator that estimates the values $\hat{y}_i$.

The mean squared error is $MSE=\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$, and the root mean squared error is its square root, $RMSE=\sqrt{MSE}$.
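To make the two formulas concrete, here is a minimal sketch in Python (standard library only; the same arithmetic carries over directly to R). The function names `mse` and `rmse` and the toy vectors are hypothetical, chosen only for illustration.

```python
import math

def mse(y, y_hat):
    """Mean squared error: average of the squared residuals y_i - y_hat_i."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and of equal length")
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y, y_hat))

# Toy numbers (made up for illustration)
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]
print(mse(y, y_hat))   # 0.5  (squared residuals 1, 0, 1, 0)
print(rmse(y, y_hat))  # square root of 0.5
```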

The $R^2$ is equal to $R^2=1-\frac{SSE}{TSS}$, where $SSE$ is the sum of squared errors, $SSE=\sum_{i=1}^n (y_i - \hat{y}_i)^2$; by definition, $SSE=n \times MSE$.

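The $TSS$ is the total sum of squares (called SST in your code) and is equal to $TSS=\sum_{i=1}^n (y_i - \bar{y} )^2$, where $\bar{y}=\frac{1}{n}\sum_{i=1}^n y_i$. So $R^2=1-\frac{n \times MSE} {\sum_{i=1}^n (y_i - \bar{y} )^2}$.

Note that in your code the mean is `mean(NBA$PTS)`, taken from `NBA` rather than from `NBA_test`, so SST there is the sum of squared errors of a baseline that always predicts that mean.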
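A minimal sketch of $R^2$ computed directly from the definitions of $SSE$ and $TSS$, in Python with made-up numbers; `r_squared` is a hypothetical helper, not a library function. It also checks that writing the numerator as $n \times MSE$ gives the same value.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/TSS, with TSS taken around the mean of y itself."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    if tss == 0:
        raise ValueError("TSS is zero: all observations are equal")
    return 1 - sse / tss

# Toy data (made up)
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]

n = len(y)
mse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / n
y_bar = sum(y) / n
tss = sum((yi - y_bar) ** 2 for yi in y)

print(r_squared(y, y_hat))  # 0.9  (SSE = 2, TSS = 20)
print(1 - n * mse / tss)    # same value, using SSE = n * MSE
```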

For a least-squares regression with an intercept, evaluated on the data it was fitted to, $R^2$ lies between 0 and 1 (on new data, such as a test set, SSE can exceed TSS and $R^2$ can then be negative). Its definition $R^2=1-\frac{SSE}{TSS}$ suggests an interpretation: $\frac{SSE}{TSS}$ is the fraction of the total sum of squares that is contained in the error term, so one minus this is the fraction of the total sum of squares that is not in the error. In other words, $R^2$ is the fraction of the total sum of squares that is ‘explained by’ the regression.
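The decomposition behind this interpretation can be checked numerically. The sketch below (Python, made-up data, hypothetical helper names) fits a one-predictor least-squares line with an intercept and verifies that, on the training data, $TSS = SSR + SSE$, where $SSR=\sum_i(\hat{y}_i-\bar{y})^2$ is the explained sum of squares, so $R^2 = SSR/TSS$ lies in $[0,1]$. It then mimics the question's test-set computation (training mean as the baseline) on toy test points where the line extrapolates badly, which gives a negative $R^2$.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x (one predictor plus an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sums_of_squares(y, y_hat, y_bar):
    """Return (SSE, SSR, TSS) around the reference mean y_bar."""
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    ssr = sum((yhi - y_bar) ** 2 for yhi in y_hat)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return sse, ssr, tss

# Made-up training data
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(x_train, y_train)
y_bar_train = sum(y_train) / len(y_train)
fitted = [b0 + b1 * xi for xi in x_train]

sse, ssr, tss = sums_of_squares(y_train, fitted, y_bar_train)
r2_in_sample = 1 - sse / tss
# With an intercept, TSS = SSR + SSE on the fitting data, so both values agree
print(r2_in_sample, ssr / tss)

# Made-up test data on which the fitted line extrapolates badly
x_test = [6.0, 7.0, 8.0]
y_test = [4.0, 5.0, 6.0]
pred = [b0 + b1 * xi for xi in x_test]
sse_test = sum((yi - pi) ** 2 for yi, pi in zip(y_test, pred))
# Baseline: always predict the training mean, as SST does in the question's R code
sst_test = sum((y_bar_train - yi) ** 2 for yi in y_test)
r2_test = 1 - sse_test / sst_test
print(r2_test)  # negative: the model does worse than the constant baseline here
```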

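The RMSE is a measure of the typical size of the deviations of the estimates from the observed values (this is what @user3796494 also said), expressed in the same units as $y$ (here, points). $R^2$, by contrast, is unitless: it compares SSE with TSS rather than measuring the error on the scale of $y$.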
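One way to see the difference is to rescale both the observations and the estimates, as a change of units would. The sketch below, in Python with made-up numbers and hypothetical helpers, shows that the RMSE scales with the data while $R^2$ stays the same.

```python
import math

def rmse(y, y_hat):
    """Root mean squared error, in the units of y."""
    return math.sqrt(sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / len(y))

def r_squared(y, y_hat):
    """1 - SSE/TSS; unitless because the units cancel in the ratio."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / tss

y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]
scale = 10.0  # e.g. a change of units
y_scaled = [scale * v for v in y]
y_hat_scaled = [scale * v for v in y_hat]

print(rmse(y, y_hat), rmse(y_scaled, y_hat_scaled))            # second is 10x the first
print(r_squared(y, y_hat), r_squared(y_scaled, y_hat_scaled))  # unchanged
```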

For more on $R^2$, you can also take a look at the question “Can the coefficient of determination $R^2$ be more than one? What is its upper bound?”