[R] Generate multivariate normal data with a random correlation matrix

Eik Vettorazzi E.Vettorazzi at uke.uni-hamburg.de
Wed Feb 9 17:53:25 CET 2011


Hi Rick,
you can generate an arbitrary matrix X and calculate X'X, which is at
least positive semi-definite (and definite, when X has full rank)

X<-matrix(rnorm(16),4)
 covmat<-t(X)%*%X
#check
 !any(eigen(covmat)$value<0)
#corresponding correlation matrix
cov2cor(covmat)

Random multivariate can be generated with 'mvrnorm' from the MASS-package.


hth.


Am 09.02.2011 17:19, schrieb Rick DeShon:
> Hi All.
> 
> I'd like to generate a sample of n observations from a k dimensional
> multivariate normal distribution with a random correlation matrix.
> 
> My solution:
> 1) The lower (or upper) triangle of the correlation matrix has
> n.tri=(d/2)(d+1)-d entries.
> 2) Take a uniform sample of n.tri possible correlations (runi(n.tr,-.99,.99)
> 3) Populate a triangle of the matrix with the sampled correlations
> 4) Mirror the triangle to populate the other triangle forming a
> symmetric matrix, cormat
> 5) Sample n observations from a multivariate normal distribution with
> mean vector=0 and varcov=cormat
> 
> 
> Problem:
> This approach violates the triangle inequality property of correlation
> matrices.  So, the matrix I've constructed is certainly a valid matrix
> but it is not a valid correlation matrix and it blows up when you
> submit it to a random number generator such as rmnorm.  With a small
> matrix you sometimes get lucky and generate a valid correlation matrix
> but as you increase d the probability of obtaining a valid correlation
> matrix drops off quickly.
> 
> So, any ideas on how to construct a correlation matrix with random
> entries that cover the range (or most of the range) or the correlation
> [-1,1]?
> 
> Here's the code I've used that won't work.
> ************************************************
> library(mnormt)
> n <- 1000
> d <- 50
> 
> n.tri <- ((d*(d+1))/2)-d
> r       <- runif(n.tri, min=-.5, max=.5)
> 
> cormat <- diag(c)
> count1=1
> for (i in 1:c){
>        for (j in 1:c){
>                if (i<j) {
>                                cormat[i,j]=r[count1]
>                                cormat[j,i]=cormat[i,j]
>                                count1=count1+1
>                             }
>        }
> }
> eigen(cormat)     # if negative eigenvalue, then the matrix violates
> the triangle inequality
> 
> x <-  rmnorm(n, rep(0, c), cormat)  # Sample the data
> 
> 
> 
> Thanks in advance,
> 
> Rick DeShon
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790



More information about the R-help mailing list