# [R-sig-teaching] Simulating Data with predefined reg-coefficients and R2

Achaz von Hardenberg fauna at pngp.it
Fri Nov 21 01:01:17 CET 2008

```
Thanks to Greg for a nice solution to the question posed by Markus.
Now I am going to complicate things a bit...
What if, besides the regression coefficients (b), I also have their
associated standard errors (b +/- se)?
Is it possible to generate data which, in a multiple regression,
will yield not only predefined R^2 and b values but also their
associated predefined s.e. values?

achaz

Dr. Achaz von Hardenberg
------------------------------------------------------------------------
Centro Studi Fauna Alpina - Alpine Wildlife Research Centre
Servizio Sanitario e della Ricerca Scientifica
Parco Nazionale Gran Paradiso, Degioz, 11, 11010-Valsavarenche (Ao),
Italy

E-mail: achaz.hardenberg at pngp.it
fauna at pngp.it
Skype: achazhardenberg
Tel.: +39.0165.905783
Fax: +39.0165.905506
Mobile: +39.328.8736291
------------------------------------------------------------------------

On 19 Nov 2008, at 17:29, Greg Snow wrote:

> Try this:
>
>
> # generate x's
>
> x1 <- sample(100, 100, TRUE)
> x2 <- sample(100, 100, TRUE)
>
> # generate yhat with b0=1, b1=2, b2=3
>
> yhat <- 1 + 2*x1 + 3*x2
>
> # compute ssr
>
> ssr <- sum( (yhat-mean(yhat))^2 )
>
> # generate errors
>
> e <- rnorm(100)
> e <- resid( lm( e ~ x1 + x2 ) )
>
> # to get R^2 of 0.8, ssr/(ssr+sse)=0.8 so sse=0.2/0.8*ssr
>
> e <- e* sqrt(0.2/0.8*ssr/(sum(e^2)))
>
> # now for y
>
> y <- yhat + e
>
> # put into a data frame and test
>
> mydata <- data.frame( y=y, x1=x1, x2=x2 )
> fit <- lm(y ~ x1 + x2, data=mydata )
> summary(fit)
>
>
> Now just change the values that you want changed to match your
> situation.  It does not matter how the x's are generated, so
> include more, include polynomials, include interactions, etc.
>
> Hope this helps,
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-sig-teaching-bounces at r-project.org [mailto:r-sig-teaching-bounces at r-project.org] On Behalf Of markus
>> Sent: Wednesday, November 19, 2008 1:19 AM
>> To: r-sig-teaching at r-project.org
>> Subject: [R-sig-teaching] Simulating Data with predefined reg-coefficients and R2
>>
>> Hi all at the R-teaching mailing list,
>> I am currently preparing my first R-based regression course. Along
>> the way I encountered the following problem:
>>
>> I want to simulate multivariate data that has some specific predefined
>> attributes. For example, I want to produce a predictor matrix (X) and a
>> response vector (y) that will yield a given vector of regression
>> coefficients (b) and a given R2 when I perform a multiple linear
>> regression on the dataset. This can be described by the well-known
>> equation y = X*b + e.
>> As a next step I also want to simulate polynomial relationships, but I
>> think that should not work very differently.
>>
>> I already searched the web and found some hints, but no clear answer.
>> There is a PDF out there from John H. Walker (Teaching Regression with
>> simulation), which, however, does not discuss this particular topic. I
>> also have a paper from K. Baumann, 'Chance Correlation in variable subset
>> regression: Influence of the objective function, selection mechanism and
>> Ensemble averaging', QCS, 2005. There an 'autoregressive process' is used
>> to simulate such data.
>>
>> Now my question is:
>> Is it really that difficult to simulate such data? Is there perhaps a
>> package in R facilitating at least parts of this work?
>>
>> Thanks in advance for the help,
>> Markus
>>
>> _______________________________________________
>> R-sig-teaching at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
>

```
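
A minimal sketch, not from the thread, that wraps Greg's recipe in a function so the same steps work for any design matrix, including the polynomial terms Markus mentions (Greg notes that it does not matter how the x's are generated). The name `sim_exact` and the quadratic example are illustrative assumptions. The function assumes `X` already contains an intercept column, because the rescaled errors are only guaranteed to have mean zero when they are orthogonal to a column of ones.

```r
# Hypothetical helper: Greg Snow's recipe for an arbitrary design matrix.
# Assumes X includes an intercept column (a column of ones).
sim_exact <- function(X, b, R2) {
  yhat <- drop(X %*% b)                    # exact fitted values
  ssr <- sum((yhat - mean(yhat))^2)
  e <- resid(lm(rnorm(nrow(X)) ~ X - 1))   # errors orthogonal to every column of X
  e <- e * sqrt((1 - R2) / R2 * ssr / sum(e^2))  # sse = (1 - R2) / R2 * ssr
  yhat + e
}

set.seed(3)
x <- runif(100, 0, 10)
X <- cbind(1, x, x^2)                      # intercept, linear and quadratic terms
y <- sim_exact(X, b = c(1, 2, -0.5), R2 = 0.6)
fit <- lm(y ~ x + I(x^2))
coef(fit)                                  # 1, 2, -0.5 (up to rounding)
summary(fit)$r.squared                     # 0.6 (up to rounding)
```

Because the errors are projected off every column of `X` before rescaling, `lm()` returns `b` and the requested R^2 regardless of the x values drawn.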
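
On Achaz's question, the standard OLS identities below (textbook results, not taken from the thread) suggest that the standard errors are not free once b and R^2 have been fixed with Greg's construction. Here p is the number of predictors, S_jj is the sum of squared deviations of x_j, and R_j^2 is the R^2 from regressing x_j on the other predictors:

```latex
\begin{aligned}
\mathrm{SSE} &= \frac{1-R^2}{R^2}\,\mathrm{SSR},
  \qquad \hat\sigma^2 = \frac{\mathrm{SSE}}{n-p-1},\\
\operatorname{se}(b_j) &= \frac{\hat\sigma}{\sqrt{S_{jj}\,(1-R_j^2)}},
  \qquad S_{jj} = \sum_{i=1}^{n} (x_{ij}-\bar{x}_j)^2,\\
p = 1:\quad t_1^2 &= \frac{b_1^2}{\operatorname{se}(b_1)^2}
  = \frac{(n-2)\,R^2}{1-R^2}
  \qquad \left(\text{using } \mathrm{SSR} = b_1^2 S_{11},\; R_1^2 = 0\right).
\end{aligned}
```

So with a single predictor, se(b_1) is fixed by b_1, R^2 and n, and the only lever left is the sample size. With several predictors, the spread of each x_j and the correlations among them give extra freedom, although SSR (and therefore SSE) changes along with them.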
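
A small sketch of the single-predictor case, assuming the standard identity t^2 = (n - 2) R^2 / (1 - R^2) for the slope's t statistic: solving it for n gives the sample size at which Greg's construction returns a chosen standard error. The helper `n_for_se` is hypothetical. In general the solution is not an integer, so after rounding the target is only met approximately; the inputs below were picked so that n comes out at exactly 102.

```r
# Hypothetical helper: sample size n at which the slope SE equals `se`
# when lm() must return slope b1 and the given R2 (one predictor only).
n_for_se <- function(b1, R2, se) 2 + (b1 / se)^2 * (1 - R2) / R2

set.seed(2)
n <- round(n_for_se(b1 = 2, R2 = 0.8, se = 0.1))  # 102 for these inputs
x <- sample(100, n, TRUE)
yhat <- 1 + 2 * x
ssr <- sum((yhat - mean(yhat))^2)
e <- resid(lm(rnorm(n) ~ x))                 # errors orthogonal to 1 and x
e <- e * sqrt(0.2 / 0.8 * ssr / sum(e^2))    # gives R^2 = 0.8
y <- yhat + e
fit1 <- lm(y ~ x)
summary(fit1)$coefficients["x", "Std. Error"]  # 0.1, whatever x turned out to be
```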
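
For two predictors, a hedged sketch of one way to hit target standard errors as well, under assumptions not made in the thread: the x's are generated with an exact sample covariance (base R only), x1 has standard deviation 1, and n is chosen by the user. With two predictors both slope variances share the factor 1/(1 - r^2), where r is the correlation of x1 and x2, so se(b1)/se(b2) equals sd(x2)/sd(x1); r then follows from a quadratic. `sim_two_se` is a hypothetical name, and some combinations of targets have no solution for a given n, in which case the function stops.

```r
# Hypothetical sketch: simulate y, x1, x2 so that lm(y ~ x1 + x2) returns
# coefficients b = c(b0, b1, b2), the given R2 and slope SEs se = c(se1, se2).
sim_two_se <- function(n, b, R2, se) {
  stopifnot(length(b) == 3, length(se) == 2, n > 3, R2 > 0, R2 < 1)
  k <- (1 - R2) / (R2 * (n - 3))  # se1^2 = k * Q / (1 - r^2), Q = b' Sigma b
  s <- se[1] / se[2]              # sd(x2) when sd(x1) = 1
  # se1^2 (1 - r^2) = k (b1^2 + 2 r s b1 b2 + s^2 b2^2), a quadratic in r
  qa <- se[1]^2
  qb <- 2 * k * s * b[2] * b[3]
  qc <- k * (b[2]^2 + s^2 * b[3]^2) - se[1]^2
  disc <- qb^2 - 4 * qa * qc
  if (disc < 0) stop("target SEs not attainable for this n")
  r <- (-qb + c(-1, 1) * sqrt(disc)) / (2 * qa)
  r <- r[abs(r) < 1]
  if (length(r) == 0) stop("no admissible correlation for these targets")
  r <- r[which.min(abs(r))]       # prefer the weaker correlation
  Sigma <- matrix(c(1, r * s, r * s, s^2), 2, 2)
  # predictors with zero sample means and sample covariance exactly Sigma
  Z <- scale(matrix(rnorm(2 * n), n, 2), scale = FALSE)
  X <- Z %*% solve(chol(cov(Z))) %*% chol(Sigma)
  # Greg's recipe: exact fitted values plus orthogonal errors scaled to R2
  yhat <- drop(b[1] + X %*% b[-1])
  ssr <- sum((yhat - mean(yhat))^2)
  e <- resid(lm(rnorm(n) ~ X))
  e <- e * sqrt((1 - R2) / R2 * ssr / sum(e^2))
  data.frame(y = yhat + e, x1 = X[, 1], x2 = X[, 2])
}

set.seed(4)
d <- sim_two_se(n = 100, b = c(1, 2, 3), R2 = 0.8, se = c(0.4, 0.2))
fit2 <- lm(y ~ x1 + x2, data = d)
summary(fit2)   # slopes 2 and 3 with SEs 0.4 and 0.2, R-squared 0.8
```

The intercept's standard error is not controlled here; it depends on the means of the x's, which this sketch leaves at zero. With more than two predictors the same idea would need a numerical search over the spreads and correlations rather than a closed-form quadratic.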