I’m new to the R language. I would like to know how to simulate from a multiple linear regression model that fulfills all four assumptions of the regression.
OK, thank you.
Let's say I want to simulate data based on this data set:
y <- c(18.73,14.52,17.43,14.54,13.44,24.39,13.34,22.71,12.68,19.32,30.16,27.09,25.40,26.05,33.49,35.62,26.07,36.78,34.95,43.67)
x1 <- c(610,950,720,840,980,530,680,540,890,730,670,770,880,1000,760,590,910,650,810,500)
x2 <- c(1,1,3,2,1,1,3,3,2,2,1,3,3,2,2,2,3,3,1,2)
fit <- lm(y ~ x1 + x2)
summary(fit)
Then I get the output:
Call:
lm(formula = y ~ x1 + x2)
Residuals:
     Min       1Q   Median       3Q      Max
-13.2805  -7.5169  -0.9231   7.2556  12.8209
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.85352   11.33229   3.782  0.00149 **
x1          -0.02534    0.01293  -1.960  0.06662 .
x2           0.33188    2.41657   0.137  0.89238
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.679 on 17 degrees of freedom
Multiple R-squared: 0.1869, Adjusted R-squared: 0.09127
F-statistic: 1.954 on 2 and 17 DF, p-value: 0.1722
My question is: how do I simulate new data that mimic the original data above?
Answer

If you don’t have them already, start by setting up some predictors, $x_1$, $x_2$, …

Choose the population (‘true’) coefficients of your predictors, the $\beta_i$’s, including $\beta_0$, the intercept.

Choose the error variance, $\sigma^2$, or equivalently its square root, $\sigma$.

Generate the error term, $\varepsilon$, as an independent random normal vector with mean 0 and variance $\sigma^2$.

Let $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$.
Then you can regress $y$ on your $x$’s.
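Written out per observation, the recipe above amounts to the following model. This is only a restatement; the index $i = 1, \dots, n$ over observations is notation added here and does not appear in the steps themselves.

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$$

Assuming the "four assumptions" in the question are the usual linearity, independence, constant variance, and normality of the errors, data generated this way satisfy them by construction: the mean of $y$ is linear in the $x$’s, and the errors are drawn independently from one normal distribution with a single fixed $\sigma$.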
For example, in R you could do something like:
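x1 <- 11:30                 # predictors
x2 <- runif(20,5,95)
x3 <- rbinom(20,1,.5)
b0 <- 17                    # population ('true') coefficients
b1 <- 0.5
b2 <- 0.037
b3 <- 5.2
sigma <- 1.4                # error standard deviation
eps <- rnorm(x1,0,sigma)    # independent normal errors; a vector first argument means length(x1) draws
y <- b0 + b1*x1 + b2*x2 + b3*x3 + eps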
This produces a single simulation of $y$ from the model. Then running
summary(lm(y~x1+x2+x3))
gives
Call:
lm(formula = y ~ x1 + x2 + x3)
Residuals:
    Min      1Q  Median      3Q     Max
-2.6967 -0.4970  0.1152  0.7536  1.6511
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.28141    1.32102  12.325 1.40e-09 ***
x1           0.55939    0.04850  11.533 3.65e-09 ***
x2           0.01715    0.01578   1.087    0.293
x3           4.91783    0.66547   7.390 1.53e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.241 on 16 degrees of freedom
Multiple R-squared: 0.9343, Adjusted R-squared: 0.9219
F-statistic: 75.79 on 3 and 16 DF, p-value: 1.131e-09
You can simplify this procedure in several ways, but I figured spelling it out would help to begin with.
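One possible simplification, sketched here as an illustration rather than as the only way to do it, is to collect the predictors in a design matrix and the coefficients in a vector, so that $y$ is a single matrix product plus noise. The names `X`, `beta`, `n`, and `fit_sim`, and the seed, are choices made for this sketch; the coefficient values are the ones used in the code above.

```r
set.seed(42)                      # arbitrary seed, only so the sketch is reproducible

n <- 20
# Design matrix: a column of 1s for the intercept, then the predictors
X <- cbind(intercept = 1,
           x1 = 11:30,
           x2 = runif(n, 5, 95),
           x3 = rbinom(n, 1, 0.5))

beta  <- c(17, 0.5, 0.037, 5.2)   # b0, b1, b2, b3
sigma <- 1.4                      # error standard deviation

# y = X beta + eps, with eps drawn independently from N(0, sigma^2)
y <- drop(X %*% beta) + rnorm(n, mean = 0, sd = sigma)

# "- 1" drops lm's automatic intercept because X already contains one
fit_sim <- lm(y ~ X - 1)
coef(fit_sim)
```

Adding another predictor then only means adding a column to `X` and an entry to `beta`.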
If you want to simulate a new random $y$ with the same population coefficients, just rerun the last two lines of the procedure above (generating a new random eps and y), which correspond to the last two steps of the algorithm.
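To mimic the data set in the question, one option is to plug the fitted estimates in as if they were the population values and then repeat the last two steps. The sketch below does that; treating the estimates as the truth is an assumption of the sketch, and `beta_hat`, `sigma_hat`, `sim_one`, the seed, and the choice of 1000 repetitions are names and values introduced here for illustration.

```r
# Data and fit from the question
y  <- c(18.73,14.52,17.43,14.54,13.44,24.39,13.34,22.71,12.68,19.32,
        30.16,27.09,25.40,26.05,33.49,35.62,26.07,36.78,34.95,43.67)
x1 <- c(610,950,720,840,980,530,680,540,890,730,
        670,770,880,1000,760,590,910,650,810,500)
x2 <- c(1,1,3,2,1,1,3,3,2,2,1,3,3,2,2,2,3,3,1,2)
fit <- lm(y ~ x1 + x2)

# Assumption: use the estimates as the "population" coefficients and sigma
beta_hat  <- coef(fit)
sigma_hat <- summary(fit)$sigma   # residual standard error

# One new response vector: same x's, same coefficients, fresh normal errors
sim_one <- function() {
  eps <- rnorm(length(y), mean = 0, sd = sigma_hat)
  beta_hat[1] + beta_hat[2] * x1 + beta_hat[3] * x2 + eps
}

set.seed(123)                     # arbitrary seed, for reproducibility only
y_new <- sim_one()
summary(lm(y_new ~ x1 + x2))

# Repeating the last two steps many times: one simulated y per column
y_sims <- replicate(1000, sim_one())
slopes <- apply(y_sims, 2, function(ys) coef(lm(ys ~ x1 + x2))["x1"])
mean(slopes)                      # averages out close to beta_hat["x1"]
```

Each simulated data set keeps the original `x1` and `x2` and differs only in the error draw, so the refitted coefficients scatter around the values used to generate the data. R's built-in `simulate(fit)` draws responses from a fitted `lm` in much the same way.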
Attribution
Source: Link, Question Author: Nor Hisham Haron, Answer Author: Glen_b