[R] Generate a series of new vars that correlate with an existing var

Nguyen Dinh Nguyen n.nguyen at garvan.org.au
Wed Apr 4 00:51:48 CEST 2007


Dear Greg,
Thanks a million!
"As good as it gets" :)
All the best
Nguyen

-----Original Message-----
From: Greg Snow [mailto:Greg.Snow at intermountainmail.org] 
Sent: Wednesday, April 04, 2007 1:46 AM
To: Nguyen Dinh Nguyen; r-help at stat.math.ethz.ch
Subject: RE: [R] Generate a series of new vars that correlate with an existing var

Here is one way to do it:

# create the initial x variable
x1 <- rnorm(100, 15, 5)

# x2, x3, and x4 in a matrix; these will be modified to meet the criteria
x234 <- scale(matrix( rnorm(300), ncol=3 ))

# put all into 1 matrix for simplicity
x1234 <- cbind(scale(x1),x234)

# find the current correlation matrix
c1 <- var(x1234)

# invert the Cholesky factor of c1; multiplying by it makes the columns uncorrelated
chol1 <- solve(chol(c1))

newx <-  x1234 %*% chol1 

# check that the columns are now uncorrelated and that x1 is unchanged
zapsmall(cor(newx))
all.equal( x1234[,1], newx[,1] )

# create the new correlation structure (the zeros can be replaced with other r values)
newc <- matrix( 
c(1  , 0.4, 0.5, 0.6, 
  0.4, 1  , 0  , 0  ,
  0.5, 0  , 1  , 0  ,
  0.6, 0  , 0  , 1  ), ncol=4 )

# check that it is positive definite (all eigenvalues must be > 0)
eigen(newc)

chol2 <- chol(newc)

finalx <- newx %*% chol2 * sd(x1) + mean(x1)

# verify success
mean(x1)
colMeans(finalx)

sd(x1)
apply(finalx, 2, sd)

zapsmall(cor(finalx))
pairs(finalx)

all.equal(x1, finalx[,1])
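The same steps can be wrapped in a small function for any number of new variables. This is only a sketch, not part of the original reply: the name make_correlated() is hypothetical, and, like the zeros in newc above, it assumes the new variables should be mutually uncorrelated. Under that assumption the target matrix is positive definite only when the squared correlations sum to less than 1 (0.16 + 0.25 + 0.36 = 0.77 for the example), so the "and so on" cannot continue indefinitely.

```r
# Hypothetical helper (not from the original post) that generalises the
# steps above.
#   x : the existing variable
#   r : target correlations of each new variable with x
# Assumption: the new variables are mutually uncorrelated (zeros in newc).
make_correlated <- function(x, r) {
  k <- length(r)
  n <- length(x)

  # target correlation matrix: x first, new variables uncorrelated with each other
  target <- diag(k + 1)
  target[1, -1] <- r
  target[-1, 1] <- r

  # with those zeros, target is positive definite iff sum(r^2) < 1
  ev <- eigen(target, symmetric = TRUE, only.values = TRUE)$values
  if (min(ev) <= 0)
    stop("target correlation matrix is not positive definite")

  # standardised x plus k columns of standardised random noise
  z <- cbind(scale(x), scale(matrix(rnorm(n * k), ncol = k)))

  # whiten: sample correlation becomes the identity; column 1 is unchanged
  # because chol() returns an upper-triangular factor with a leading 1
  z <- z %*% solve(chol(var(z)))

  # impose the target correlations, then restore the mean and SD of x
  out <- z %*% chol(target) * sd(x) + mean(x)
  colnames(out) <- paste0("X", seq_len(k + 1))
  out
}

# usage, mirroring the original question
set.seed(1)
x1 <- rnorm(100, 15, 5)
X <- make_correlated(x1, c(0.4, 0.5, 0.6))
zapsmall(cor(X))
apply(X, 2, sd)
```

As in the code above, the correlations, means and SDs are matched exactly in the sample, not just in expectation, because the noise columns are first made exactly uncorrelated with x1.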


Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Nguyen 
> Dinh Nguyen
> Sent: Sunday, April 01, 2007 7:47 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Generate a series of new vars that correlate with 
> an existing var
> 
> Dear R helpers,
> I have a var (let's call it X1) with an approximately Normal 
> distribution (say, mean=15, SD=5).
> I want to generate a series of additional vars X2, X3, 
> X4... such that the correlation between X2 and X1 is 0.4, X3 and 
> X1 is 0.5, X4 and X1 is 0.6, and so on, with the condition that all 
> variables X2, X3, X4... have the same mean and SD as X1.
> Any help would be appreciated.
> Regards
> Nguyen
> 
>


