[R] Problem when creating matrix of values based on covariance matrix

William Dunlap wdunlap at tibco.com
Mon Aug 13 19:14:14 CEST 2012


There is also the chance that your sampling code is not correct.
Have you tried it out on, say, 5 dimensional data with increasing
numbers of samples? 
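
A minimal sketch of such a check follows. It assumes a made-up 5 x 5
covariance matrix; Sigma, simulate() and the sample sizes are
illustrative names and values, not taken from the original post. The
sampling recipe is the Cholesky one quoted below, and the comparison
uses cov() with its default Pearson method.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only

p <- 5
A <- matrix(rnorm(p * p), p)
Sigma <- crossprod(A) + diag(p)    # made-up positive definite target

# Same recipe as in the quoted post: X = t(chol(Sigma)) %*% Z
simulate <- function(n, Sigma) {
  R <- chol(Sigma)                 # upper triangular, t(R) %*% R == Sigma
  Z <- matrix(rnorm(n * ncol(Sigma)), ncol(Sigma))  # p x n standard normals
  t(t(R) %*% Z)                    # n x p, one sample per row
}

# Largest absolute difference between sample and target covariance
ns <- c(100, 1000, 10000, 100000)
err <- sapply(ns, function(n) max(abs(cov(simulate(n, Sigma)) - Sigma)))
print(data.frame(n = ns, max_abs_error = err))

# For comparison: covariance of ranks instead of values
n <- 1000
rank_cov <- cov(simulate(n, Sigma), method = "spearman")
print(diag(rank_cov))              # compare with the next line
print(n * (n + 1) / 12)
```

If the sampling code is right, max_abs_error should shrink roughly
like 1/sqrt(n). It may also be worth checking how the covariance of
the simulated data is computed: with method = "spearman", cov() works
on ranks, and the variance of the ranks 1..n is n(n+1)/12 whatever
Sigma is. For n = 20351 that is 34515296, the value on the diagonal
of cov8 quoted below.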

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Dewey
> Sent: Sunday, August 12, 2012 6:54 AM
> To: Boel Brynedal; R-help Mailing List
> Subject: Re: [R] Problem when creating matrix of values based on covariance matrix
> 
> At 15:17 11/08/2012, Boel Brynedal wrote:
> >Hi,
> >
> >I want to simulate a data set with similar covariance structure as my
> >observed data, and have calculated a covariance matrix (dimensions
> >8368*8368). So far I've tried two approaches to simulating data:
> >rmvnorm from the mvtnorm package, and by using the Cholesky
> >decomposition
> >(http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
> >The problem is that the resulting covariance structure in my simulated
> >data is very different from the original supplied covariance matrix.
> 
> It is, of course, not guaranteed to be the same as you are only
> sampling from the distribution. In your example below you draw a
> sample of size 1000 from an 8368-variable distribution, so I suspect it
> is almost sure to be different although I am surprised how different.
> What happens if you increase the sample size?
> 
> >Let's just look at some of the values:
> >
> > > cov8[1:4,1:4] # covariance of simulated data
> >             X1          X2         X3         X4
> >X1 34515296.00    99956.69   369538.1  1749086.6
> >X2    99956.69 34515296.00  2145289.9  -624961.1
> >X3   369538.08  2145289.93 34515296.0  -163716.5
> >X4  1749086.62  -624961.09  -163716.5 34515296.0
> > > CEUcovar[1:4,1:4]
> >              [,1]         [,2]          [,3]         [,4]
> >[1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
> >[2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
> >[3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
> >[4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
> >
> >So the distribution of the observed covariance is very narrow compared
> >to the simulated data.
> >
> >None of the eigenvalues of the observed covariance matrix are
> >negative, and it appears to be a positive definite matrix. Here is
> >what I did to create the simulated data:
> >
> >Chol <- chol(CEUcovar)
> >Z <- matrix(rnorm(20351 * 8368), 8368)
> >X <- t(Chol) %*% Z
> >sample8 <- data.frame(as.matrix(t(X)))
> > > dim(sample8)
> >[1] 20351  8368
> >cov8=cov(sample8,method='spearman')
> >
> >[earlier I've also tried sample8 <- rmvnorm(1000,
> >mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with as
> >'bad' results, much larger covariance values in the simulated data ]
> >
> >Any ideas of WHY the simulated data have such a different covariance?
> >Any experience with similar issues? Would be happy to supply the
> >covariance matrix if anyone wants to give it a try.
> >Any suggestions? Anything apparent that I left out or neglected?
> >
> >Any advice would be highly appreciated.
> >Best,
> >Bo
> 
> Michael Dewey
> info at aghmed.fsnet.co.uk
> http://www.aghmed.fsnet.co.uk/home.html
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


