# Finding the fitted and predicted values for a statistical model

Let’s say I have the following data and am running a regression model:

``````df=data.frame(income=c(5,3,47,8,6,5),
won=c(0,0,1,1,1,0),
age=c(18,18,23,50,19,39),
home=c(0,0,1,0,0,1))
``````

On one hand, I run a linear model to predict on income:

``````md1 = lm(income ~ age + home + home, data=df)
``````

Second, I run a logit model to predict on the won variable:

``````md2 = glm(factor(won) ~ age + home, data=df, family=binomial(link="logit"))
``````

For both models, I wonder how I can generate a table or data frame with the predictor response category, fitted value, and the model predicted value.

So for the linear model, something like:

``````age  fitted_income  predicted_income
18    3              5
23    3              3
50    4              2
19    5              5
39    6              4

home   fitted_income    predicted_income
0       5               6
1       3               9
``````

Or perhaps it should be for each data point. So for x_i data point, the fitted and predicted values are:

``````id   age  fitted_income  predicted_income
1     18    3              5
2     23    3              3
3     50    4              2
4     19    5              5
5     39    6              4
``````
1. From a statistical standpoint, is such an undertaking useful? Why or why not?

2. How can this be done in R? (looked at names(md1) and found what I can pull from the model, but haven’t proceeded past that)

Thanks!

You have to be a bit careful with model objects in R. For example, whilst the fitted values and the predictions of the training data should be the same in the `glm()` model case, they are not the same when you use the correct extractor functions:

``````R> fitted(md2)
1         2         3         4         5         6
0.4208590 0.4208590 0.4193888 0.7274819 0.4308001 0.5806112
R> predict(md2)
1          2          3          4          5          6
-0.3192480 -0.3192480 -0.3252830  0.9818840 -0.2785876  0.3252830
``````

That is because the default for `predict.glm()` is to return predictions on the scale of the linear predictor. To get the fitted values we want to apply the inverse of the link function to those values. `fitted()` does that for us, and we can get the correct values using `predict()` as well:

``````R> predict(md2, type = "response")
1         2         3         4         5         6
0.4208590 0.4208590 0.4193888 0.7274819 0.4308001 0.5806112
``````

Likewise with `residuals()` (or `resid()`); the values stored in `md2\$residuals` are the working residuals are are unlikely to be what you want. The `resid()` method allows you to specify the type of residual you want and has a useful default.

For the `glm()` model, something like this will suffice:

``````R> data.frame(Age = df\$age, Won = df\$won, Fitted = fitted(md2))
Age Won    Fitted
1  18   0 0.4208590
2  18   0 0.4208590
3  23   1 0.4193888
4  50   1 0.7274819
5  19   1 0.4308001
6  39   0 0.5806112
``````

Something similar can be done for the `lm()` model:

``````R> data.frame(Age = df\$age, Income = df\$income, Fitted = fitted(md1))
Age Income    Fitted
1  18      5  7.893273
2  18      3  7.893273
3  23     47 28.320749
4  50      8 -1.389725
5  19      6  7.603179
6  39      5 23.679251
``````