Maybe it’s just that I’m tired, but I’m having trouble trying to understand the Forward Stagewise Regression algorithm. From “Elements of Statistical Learning” page 60:

Forward-stagewise regression (FS) is even more constrained than forward-stepwise regression. It starts like forward-stepwise regression, with an intercept equal to [the mean of] y , and centered predictors with coeﬃcients initially all 0.

At each step the algorithm identiﬁes the variable most correlated with the

current residual. It then computes the simple linear regression coeﬃcient

of the residual on this chosen variable, and then adds it to the current co-

eﬃcient for that variable. This is continued till none of the variables have

correlation with the residuals—i.e. the least-squares ﬁt when N > p.So, is this the algorithm?:

`b[1]=mean(y) b[2..n]=0 r=(y-X*b) index, maxCorr = max(transpose(r)*X) while(abs(maxCorr) > someThreshold) b[index]=b[index]+regress(r,X[1..n][index]) r=(y-X*b) index, maxCorr = max(transpose(r)*X)`

Where b is a column-vector of the coefficients, X is a matrix of inputs, and y is a column-vector of outputs. I.e. y=X*b+error.

Asking because this algorithm gives me only a few non-zero coefficients on the dataset I’m testing it on (with threshold=.0001), and the prediction accuracy isn’t very good at all.

**Answer**

They authors do a poor job of explaining the algorithm in their book. If you look at equations 1.6 and 1.7 in their paper, it becomes clearer. The paper has a slightly different formulation (it builds the residual rather than the coefficient vector), but the key point is that it reaches a least squares fit very in very small steps (this is why the book mentions the algorithm can take “many more than p steps” to finish). You could either replace “regress(…)” with some small number, or you could multiply it by something like 0.05. Play around with it and see what works.

Also, your threshold seems small. r’*X is going to give numbers proportional to but much larger than the actual correlations (e.g. for the diabetes data in the paper the correlations are ~70-900).

**Attribution***Source : Link , Question Author : ektrules , Answer Author : Kevin*