How to scale new observations for making predictions when the model was fitted with scaled data?

I understand the concept of scaling the data matrix to use in a linear regression model. For example, in R you could use:

scaled.data <- scale(data, scale=TRUE)

My only question is, for new observations for which I want to predict the output values, how are they correctly scaled? Would it be, scaled.new <- (new - mean(data)) / std(data)?

Answer

The short answer to your question is yes: that expression for scaled.new is correct, except that R's standard deviation function is sd, not std.
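As a minimal sketch of that formula for a single numeric variable, the snippet below scales two new values with the training mean and standard deviation. The numbers in data and new are made up purely for illustration.

```r
# Hypothetical training values and new observations (illustrative only)
data <- c(2, 4, 6, 8, 10)
new  <- c(5, 12)

# Scale the new observations with the TRAINING mean and standard deviation,
# not with statistics computed from the new observations themselves
scaled.new <- (new - mean(data)) / sd(data)

# The same result via scale(), which returns a one-column matrix
scaled.new.2 <- as.numeric(scale(new, center = mean(data), scale = sd(data)))
```

The key point is that mean(data) and sd(data) come from the data used to fit the model, so new observations end up on the same scale as the fitted coefficients.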

It may be worth noting that scale has optional center and scale arguments, which you could use directly when data is a single numeric column:

scaled.new <- scale(new, center = mean(data), scale = sd(data))

Also, the object returned by scale (scaled.data) carries attributes holding the centering and scaling values that were actually used (if any), which you can pass on when scaling the new observations:

scaled.new <- scale(new, attr(scaled.data, "scaled:center"), attr(scaled.data, "scaled:scale"))

The advantage of this approach shows up when the original data has more than one column: scale stores one mean and one standard deviation per column, so each column of new is centered and scaled with the values from the matching column of data.
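To illustrate the multi-column case, here is a small sketch that scales a two-column matrix, reuses the stored attributes for new observations, and then predicts from a linear model fitted on the scaled data. The column names x1 and x2, the response y, and all numbers are hypothetical and only serve the example.

```r
# Hypothetical training predictors, response, and new observations
data <- cbind(x1 = c(1, 2, 3, 4, 5), x2 = c(10, 30, 20, 50, 40))
y    <- c(1.1, 2.3, 2.9, 4.8, 5.2)
new  <- cbind(x1 = c(3, 6), x2 = c(30, 60))

# Scale the training data; the per-column means and sds are kept as attributes
scaled.data <- scale(data, scale = TRUE)
ctr <- attr(scaled.data, "scaled:center")
scl <- attr(scaled.data, "scaled:scale")

# Apply the training centering and scaling to the new observations
scaled.new <- scale(new, center = ctr, scale = scl)

# Equivalent manual computation, column by column
manual <- sweep(sweep(new, 2, colMeans(data), "-"), 2, apply(data, 2, sd), "/")

# Fit on the scaled training data and predict for the scaled new observations
fit  <- lm(y ~ x1 + x2, data = data.frame(y = y, scaled.data))
pred <- predict(fit, newdata = as.data.frame(scaled.new))
```

Because the first new row equals the training column means, it is mapped to zero in both columns, which is a quick sanity check that the training statistics were the ones applied.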

Attribution
Source: Link, Question Author: SamuelNLP, Answer Author: user20637
