[R] Problem when creating matrix of values based on covariance matrix
Boel Brynedal
brynedal at gmail.com
Sat Aug 11 16:41:13 CEST 2012
Hi, thanks for the reply.
I am not assuming that the supplied covariance vector in any way
captures the 'true' covariance matrix of the population, but thats not
what I am after either. I just want to simulate data that has a
similar covariance as that covariance matrix. And the numbers are so
hugely different! Could sampling error cause that? Could the
covariance structure be too complicated to simulate?
And yes, I should probably post this on a stat-list, true.
Thanks,
Bo
2012/8/11 Bert Gunter <gunter.berton at gene.com>:
> Sampling error? Do you realize how large a sample size you would
> need to precisely estimate an 8000 x 8000 covariance matrix? Probably
> exceeds the number of stars in our galaxy...
>
> Numerical issues may also play a role, but I am too ignorant on this
> aspect to offer advice.
>
> Finally, this is really not an R question, so you would probably do
> better to post on a stats site like stats.stackexchange.com rather
> than here.
>
> -- Bert
>
> On Sat, Aug 11, 2012 at 7:17 AM, Boel Brynedal <brynedal at gmail.com> wrote:
>> Hi,
>>
>> I want to simulate a data set with similar covariance structure as my
>> observed data, and have calculated a covariance matrix (dimensions
>> 8368*8368). So far I've tried two approaches to simulating data:
>> rmvnorm from the mvtnorm package, and by using the Cholesky
>> decomposition (http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
>> The problem is that the resulting covariance structure in my simulated
>> data is very different from the original supplied covariance vector.
>> Lets just look at some of the values:
>>
>>> cov8[1:4,1:4] # covariance of simulated data
>> X1 X2 X3 X4
>> X1 34515296.00 99956.69 369538.1 1749086.6
>> X2 99956.69 34515296.00 2145289.9 -624961.1
>> X3 369538.08 2145289.93 34515296.0 -163716.5
>> X4 1749086.62 -624961.09 -163716.5 34515296.0
>>> CEUcovar[1:4,1:4]
>> [,1] [,2] [,3] [,4]
>> [1,] 0.1873402987 0.001837229 0.0009009272 0.010324521
>> [2,] 0.0018372286 0.188665853 0.0124216535 -0.001755035
>> [3,] 0.0009009272 0.012421654 0.1867835412 -0.000142395
>> [4,] 0.0103245214 -0.001755035 -0.0001423950 0.192883488
>>
>> So the distribution of the observed covariance is very narrow compared
>> to the simulated data.
>>
>> None of the eigenvalues of the observed covariance matrix are
>> negative, and it appears to be a positive definite matrix. Here is
>> what I did to create the simulated data:
>>
>> Chol <- chol(CEUcovar)
>> Z <- matrix(rnorm(20351 * 8368), 8368)
>> X <- t(Chol) %*% Z
>> sample8 <- data.frame(as.matrix(t(X)))
>>> dim(sample8)
>> [1] 20351 8368
>> cov8=cov(sample8,method='spearman')
>>
>> [earlier I've also tried sample8 <- rmvnorm(1000,
>> mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with as
>> 'bad' results, much larger covariance values in the simulated data ]
>>
>> Any ideas of WHY the simulated data have such a different covariance?
>> Any experience with similar issues? Would be happy to supply the
>> covariance matrix if anyone wants to give it a try.
>> Any suggestions? Anything apparent that I left our or neglected?
>>
>> Any advice would be highly appreciated.
>> Best,
>> Bo
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
More information about the R-help
mailing list