I’m reading a paper and the author wrote:
The effect of A, B, C on Y was studied through the use of multiple regression analysis. A, B, C were entered into the regression equation with Y as the dependent variable. The analysis of variance is presented in Table 3. The effect of B on Y was significant, with B correlating .27 with Y.

English is not my mother tongue and I got really confused here.
First, he said he would run a regression analysis, then he showed us the analysis of variance. Why?
And then he wrote about the correlation coefficient; isn’t that from a correlation analysis? Or can this word also be used to describe the regression slope?
Answer
First, he said he would run a regression analysis, then he showed us the analysis of variance. Why?
Analysis of variance (ANOVA) is just a technique that compares the variance explained by the model against the variance not explained by the model. Since a regression model has both an explained and an unexplained component, it’s natural that ANOVA can be applied to it, and in many software packages ANOVA results are routinely reported alongside linear regression. Regression is also a very versatile technique: in fact, both the t-test and ANOVA can be expressed in regression form; they are just special cases of regression.
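To make the “special case” point concrete, here is a minimal sketch (my illustration, not from the original post; it assumes Stata’s built-in auto dataset, which the output below appears to use):

. sysuse auto, clear
. ttest mpg, by(foreign)     // two-sample t-test of mpg across the two groups
. regress mpg foreign        // the same comparison expressed as a regression

The t-statistic on the foreign coefficient matches the two-sample t-statistic (up to sign), because regressing on a single dummy variable is exactly the two-group mean comparison.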
For example, here is a sample regression output. The outcome is miles per gallon of some cars and the independent variable is whether the car was domestic or foreign:
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   13.18
       Model |  378.153515     1  378.153515           Prob > F      =  0.0005
    Residual |  2065.30594    72  28.6848048           R-squared     =  0.1548
-------------+------------------------------           Adj R-squared =  0.1430
       Total |  2443.45946    73  33.4720474           Root MSE      =  5.3558

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   1.foreign |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
       _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
------------------------------------------------------------------------------
You can see the ANOVA table reported at the top left. The overall F-statistic is 13.18, with a p-value of 0.0005, indicating that the model is predictive. And here is the output of a standalone ANOVA:
                           Number of obs =      74     R-squared     =  0.1548
                           Root MSE      = 5.35582     Adj R-squared =  0.1430

                  Source |  Partial SS    df       MS          F     Prob > F
              -----------+---------------------------------------------------
                   Model |  378.153515     1  378.153515     13.18     0.0005
                         |
                 foreign |  378.153515     1  378.153515     13.18     0.0005
                         |
                Residual |  2065.30594    72  28.6848048
              -----------+---------------------------------------------------
                   Total |  2443.45946    73  33.4720474
Notice that you can recover the same F-statistic and p-value there.
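For reference, both outputs above can be reproduced along these lines (a sketch; the original post does not show the commands, and I am assuming the built-in auto dataset):

. sysuse auto, clear
. regress mpg i.foreign    // the regression output above
. anova mpg foreign        // the ANOVA output above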
And then he wrote about the correlation coefficient; isn’t that from a correlation analysis? Or can this word also be used to describe the regression slope?
Assuming the analysis involved only B and Y, technically I would not agree with the word choice. In most cases, the slope and the correlation coefficient cannot be used interchangeably. There is one special case in which the two are the same: when both the independent and the dependent variable are standardized (that is, expressed as z-scores).
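The reason is the standard simple-regression identity (a textbook result, not spelled out in the original post):

$$\hat\beta = r_{xy}\,\frac{s_y}{s_x},$$

where $r_{xy}$ is the correlation between $x$ and $y$, and $s_x$, $s_y$ are their sample standard deviations. Standardizing both variables forces $s_x = s_y = 1$, so the slope reduces to the correlation coefficient.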
For example, let’s correlate miles per gallon and the price of the car:
             |    price      mpg
-------------+------------------
       price |   1.0000
         mpg |  -0.4686   1.0000
And here is the same computation using the standardized variables; you can see that the correlation coefficient remains unchanged:
             |  sdprice    sdmpg
-------------+------------------
     sdprice |   1.0000
       sdmpg |  -0.4686   1.0000
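In case it helps, the standardized variables can be created with egen’s std() function (a sketch; the original post does not show this step):

. egen sdprice = std(price)
. egen sdmpg = std(mpg)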
Now, here are the two regression models. First, using the original variables:
. reg mpg price
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   20.26
       Model |  536.541807     1  536.541807           Prob > F      =  0.0000
    Residual |  1906.91765    72  26.4849674           R-squared     =  0.2196
-------------+------------------------------           Adj R-squared =  0.2087
       Total |  2443.45946    73  33.4720474           Root MSE      =  5.1464

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.0009192   .0002042    -4.50   0.000    -.0013263   -.0005121
       _cons |   26.96417   1.393952    19.34   0.000     24.18538    29.74297
------------------------------------------------------------------------------
… and here is the one with standardized variables:
. reg sdmpg sdprice
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   20.26
       Model |  16.0295482     1  16.0295482           Prob > F      =  0.0000
    Residual |  56.9704514    72  .791256269           R-squared     =  0.2196
-------------+------------------------------           Adj R-squared =  0.2087
       Total |  72.9999996    73  .999999994           Root MSE      =  .88953

------------------------------------------------------------------------------
       sdmpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     sdprice |  -.4685967   .1041111    -4.50   0.000    -.6761384   -.2610549
       _cons |   7.22e-09   .1034053     0.00   1.000    -.2061347    .2061347
------------------------------------------------------------------------------
As you can see, the slope with the original variables is -0.0009192, while the slope with the standardized variables is -0.4686, which is also the correlation coefficient.
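You can verify this with the identity above. The sample standard deviation of mpg is $\sqrt{33.4720474} \approx 5.786$ (the Total MS in the tables), and the standard deviation of price in this dataset is about 2949.5 (not shown in the output above, so treat it as a stated property of the data):

$$\hat\beta = -0.4686 \times \frac{5.786}{2949.5} \approx -0.0009192,$$

which matches the unstandardized slope.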
So, unless A, B, C, and Y are standardized, I would not agree with the article’s “correlating.” Instead, I’d simply say that a one-unit increase in B is associated with a 0.27 increase in the mean of Y.
In more complicated situations, where more than one independent variable is involved, even this special case no longer holds: with multiple predictors, the standardized coefficients are adjusted for one another and generally differ from the pairwise correlations.
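You can see this for yourself with regress’s beta option, which reports standardized coefficients (a sketch, again assuming the auto data):

. correlate mpg price weight            // pairwise correlations with mpg
. regress mpg price weight, beta        // standardized coefficients

The standardized coefficients reported by the second command will generally differ from the pairwise correlations of price and weight with mpg reported by the first.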
Attribution
Source: Link, Question Author: yue86231, Answer Author: Penguin_Knight