[R] Problem when creating matrix of values based on covariance matrix

Michael Dewey info at aghmed.fsnet.co.uk
Sun Aug 12 15:54:28 CEST 2012


At 15:17 11/08/2012, Boel Brynedal wrote:
>Hi,
>
>I want to simulate a data set with a covariance structure similar to
>that of my observed data, and have calculated a covariance matrix
>(dimensions 8368*8368). So far I've tried two approaches to simulating data:
>rmvnorm from the mvtnorm package, and by using the Cholesky
>decomposition 
>(http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
>The problem is that the resulting covariance structure in my simulated
>data is very different from the original supplied covariance matrix.

It is, of course, not guaranteed to be the same, as you are only
sampling from the distribution. In your example below you draw a
sample of size 1000 from an 8368-variable distribution, so I suspect
the sample covariance is almost sure to differ from the supplied one,
although I am surprised by how much it differs.
What happens if you increase the sample size?
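
As a rough illustration of the sample-size point, here is a minimal
sketch in R. The 3 x 3 matrix Sigma is a made-up stand-in for CEUcovar
(which is not available here), and max_abs_err is a hypothetical helper
name. It draws samples of increasing size and records the largest
absolute difference between the sample covariance and Sigma.

```r
set.seed(1)

# Hypothetical 3 x 3 covariance matrix standing in for CEUcovar
Sigma <- matrix(c(0.19, 0.01, 0.00,
                  0.01, 0.19, 0.01,
                  0.00, 0.01, 0.19), nrow = 3)

R <- chol(Sigma)  # upper triangular factor, t(R) %*% R equals Sigma

# Largest absolute entry of (sample covariance - Sigma) for n draws
max_abs_err <- function(n) {
  Z <- matrix(rnorm(n * 3), nrow = n)  # n rows of independent N(0, 1)
  X <- Z %*% R                         # each row now has covariance Sigma
  max(abs(cov(X) - Sigma))
}

sizes <- c(100, 10000, 1000000)
errs <- sapply(sizes, max_abs_err)
print(data.frame(n = sizes, max_abs_err = errs))
```

The error should shrink roughly like 1/sqrt(n), so sampling noise alone
would be expected to give differences on the scale of the entries of
Sigma, not differences of many orders of magnitude.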

>Let's just look at some of the values:
>
> > cov8[1:4,1:4] # covariance of simulated data
>             X1          X2         X3         X4
>X1 34515296.00    99956.69   369538.1  1749086.6
>X2    99956.69 34515296.00  2145289.9  -624961.1
>X3   369538.08  2145289.93 34515296.0  -163716.5
>X4  1749086.62  -624961.09  -163716.5 34515296.0
> > CEUcovar[1:4,1:4]
>              [,1]         [,2]          [,3]         [,4]
>[1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
>[2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
>[3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
>[4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
>
>So the distribution of the observed covariance is very narrow compared
>to the simulated data.
>
>None of the eigenvalues of the observed covariance matrix are
>negative, and it appears to be a positive definite matrix. Here is
>what I did to create the simulated data:
>
>Chol <- chol(CEUcovar)
>Z <- matrix(rnorm(20351 * 8368), 8368)
>X <- t(Chol) %*% Z
>sample8 <- data.frame(as.matrix(t(X)))
> > dim(sample8)
>[1] 20351  8368
>cov8=cov(sample8,method='spearman')
>
>[earlier I've also tried sample8 <- rmvnorm(1000,
>mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with
>equally 'bad' results, much larger covariance values in the simulated data ]
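
One thing that may explain the scale of the numbers: with
method='spearman', cov() works on the ranks of each column, not on the
values themselves. For n untied observations the variance of the ranks
is n * (n + 1) / 12, and for n = 20351 that is exactly 34515296, the
diagonal value shown in cov8 above. The sketch below is only an
illustration under assumptions: Sigma is a made-up 2 x 2 stand-in for
CEUcovar and n is kept small. It repeats the Cholesky construction from
the post and compares the default (Pearson) covariance with the
Spearman one.

```r
set.seed(42)
n <- 2000

# Hypothetical 2 x 2 covariance matrix standing in for CEUcovar
Sigma <- matrix(c(0.19, 0.01,
                  0.01, 0.19), nrow = 2)

Chol <- chol(Sigma)                  # upper triangular factor
Z <- matrix(rnorm(n * 2), nrow = 2)  # 2 x n independent N(0, 1) draws
X <- t(t(Chol) %*% Z)                # n x 2, same construction as the post

cov_pearson  <- cov(X)                       # on the scale of Sigma
cov_spearman <- cov(X, method = "spearman")  # covariance of the ranks

# Variance of the ranks 1..n when there are no ties
rank_var <- n * (n + 1) / 12

print(cov_pearson)
print(diag(cov_spearman))
print(rank_var)
```

If this is what happened, calling cov(sample8) with the default method
should give values on the same scale as CEUcovar.
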
>
>Any ideas of WHY the simulated data have such a different covariance?
>Any experience with similar issues? Would be happy to supply the
>covariance matrix if anyone wants to give it a try.
>Any suggestions? Anything apparent that I left out or neglected?
>
>Any advice would be highly appreciated.
>Best,
>Bo

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html


