How to scale new observations for making predictions when the model was fitted with scaled data?

I understand the concept of scaling the data matrix for use in a linear regression model. For example, in R you could use: scaled.data <- scale(data, scale=TRUE)

My only question is: for new observations for which I want to predict the output values, how are they correctly scaled? Would it be scaled.new <- (new - mean(data)) / std(data)?


The short answer to your question is yes: that expression for the scaled new observations is correct (except that you want sd instead of std).

It may be worth noting that scale has optional arguments which you could use: scaled.new <- scale(new, center = mean(data), scale = sd(data))

Also, the object returned by scale (call it scaled.data) has attributes holding the numeric centering and scaling values used (if any), which you could reuse: scaled.new <- scale(new, attr(scaled.data, "scaled:center"), attr(scaled.data, "scaled:scale"))

The advantage of the attribute-based approach appears when the original data has more than one column, so there are multiple means and/or standard deviations to keep track of.
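As a minimal sketch of the multi-column case, the example below scales two made-up predictors, fits lm on the scaled values, and then scales new observations with the centers and scales stored on the training object. The data and the names train, y, scaled.train, new, and scaled.new are all hypothetical and chosen only for illustration.

```r
set.seed(1)

# Hypothetical training data: two predictors on very different scales
train <- data.frame(x1 = rnorm(50, mean = 10, sd = 2),
                    x2 = rnorm(50, mean = 100, sd = 25))
y <- 3 + 0.5 * train$x1 - 0.02 * train$x2 + rnorm(50, sd = 0.1)

# scale() returns a matrix and stores the per-column means and
# standard deviations it used as attributes
scaled.train <- scale(train)
centers <- attr(scaled.train, "scaled:center")
scales  <- attr(scaled.train, "scaled:scale")

# Fit the model on the scaled predictors
fit <- lm(y ~ ., data = data.frame(y = y, scaled.train))

# New observations: reuse the TRAINING centers and scales,
# not statistics computed from the new data
new <- data.frame(x1 = c(9, 12), x2 = c(80, 130))
scaled.new <- scale(new, center = centers, scale = scales)

# predict() needs a data frame with the same column names as in the fit
pred <- predict(fit, newdata = as.data.frame(scaled.new))
```

Because centers and scales are vectors with one entry per column, each column of new is shifted and divided by its own training mean and standard deviation.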

