[R] Problem when creating matrix of values based on covariance matrix

Boel Brynedal brynedal at gmail.com
Sat Aug 11 16:41:13 CEST 2012


Hi, thanks for the reply.
I am not assuming that the supplied covariance vector in any way
captures the 'true' covariance matrix of the population, but thats not
what I am after either. I just want to simulate data that has a
similar covariance as that covariance matrix. And the numbers are so
hugely different! Could sampling error cause that? Could the
covariance structure be too complicated to simulate?

And yes, I should probably post this on a stat-list, true.

Thanks,
Bo

2012/8/11 Bert Gunter <gunter.berton at gene.com>:
> Sampling error?   Do you realize how large a sample size you would
> need to precisely estimate an 8000 x 8000 covariance matrix? Probably
> exceeds the number of stars in our galaxy...
>
> Numerical issues may also play a role, but I am too ignorant on this
> aspect to offer advice.
>
> Finally, this is really not an R question, so you would probably do
> better to post on a stats site like stats.stackexchange.com rather
> than here.
>
> -- Bert
>
> On Sat, Aug 11, 2012 at 7:17 AM, Boel Brynedal <brynedal at gmail.com> wrote:
>> Hi,
>>
>> I want to simulate a data set with similar covariance structure as my
>> observed data, and have calculated a covariance matrix (dimensions
>> 8368*8368). So far I've tried two approaches to simulating data:
>> rmvnorm from the mvtnorm package, and by using the Cholesky
>> decomposition (http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
>> The problem is that the resulting covariance structure in my simulated
>> data is very different from the original supplied covariance vector.
>> Lets just look at some of the values:
>>
>>> cov8[1:4,1:4] # covariance of simulated data
>>             X1          X2         X3         X4
>> X1 34515296.00    99956.69   369538.1  1749086.6
>> X2    99956.69 34515296.00  2145289.9  -624961.1
>> X3   369538.08  2145289.93 34515296.0  -163716.5
>> X4  1749086.62  -624961.09  -163716.5 34515296.0
>>> CEUcovar[1:4,1:4]
>>              [,1]         [,2]          [,3]         [,4]
>> [1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
>> [2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
>> [3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
>> [4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
>>
>> So the distribution of the observed covariance is very narrow compared
>> to the simulated data.
>>
>> None of the eigenvalues of the observed covariance matrix are
>> negative, and it appears to be a positive definite matrix. Here is
>> what I did to create the simulated data:
>>
>> Chol <- chol(CEUcovar)
>> Z <- matrix(rnorm(20351 * 8368), 8368)
>> X <- t(Chol) %*% Z
>> sample8 <- data.frame(as.matrix(t(X)))
>>> dim(sample8)
>> [1] 20351  8368
>> cov8=cov(sample8,method='spearman')
>>
>> [earlier I've also tried sample8 <- rmvnorm(1000,
>> mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with as
>> 'bad' results, much larger covariance values in the simulated data ]
>>
>> Any ideas of WHY the simulated data have such a different covariance?
>> Any experience with similar issues? Would be happy to supply the
>> covariance matrix if anyone wants to give it a try.
>> Any suggestions? Anything apparent that I left our or neglected?
>>
>> Any advice would be highly appreciated.
>> Best,
>> Bo
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



More information about the R-help mailing list