# [R] Correlated Multivariate Distribution Generator

Enrico Schumann enricoschumann at yahoo.de
Thu Jul 28 07:40:47 CEST 2011

```Just a pointer, not a solution: if you use the method I described, it all
comes down to the initial correlation matrix that you specify for the
Gaussian case. If you had the "correct" initial correlation matrix, it would
lead to the desired linear correlation in your binomials. (But it is not
guaranteed that such a "correct" matrix even exists.)

You could now try to adjust the initial matrix such that the obtained linear
correlation in your final variates is as desired. There are a number of
techniques, but I cannot judge for I have never used any of them. There is,
eg, an approach called NORTA (stands for "normals to anything", or something
like that); you way want to look for that.

But to use your specific example:

N <- 200000
C <- c(   1,   0.9,  0.8,  0.8,
0.9,     1,  0.8,  0.8 ,
0.8,   0.8,    1,  0.9 ,
0.8,   0.8,  0.9,     1 )
dim(C) <- c(4,4)
C <- 2*sin(C*pi/6)  # a small correction
X <- rnorm(N*4); dim(X) <- c(N,4)
X <- X %*% chol(C)
U <- pnorm(X)
B <- qnbinom(U, size = 10, prob = 0.2)

> cor(B)
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.9061185 0.8096429 0.8083266
[2,] 0.9061185 1.0000000 0.8098284 0.8087700
[3,] 0.8096429 0.8098284 1.0000000 0.9058436
[4,] 0.8083266 0.8087700 0.9058436 1.0000000

> cor(B, method = "spearman")
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.8993782 0.7988458 0.7983378
[2,] 0.8993782 1.0000000 0.7988966 0.7985780
[3,] 0.7988458 0.7988966 1.0000000 0.8993137
[4,] 0.7983378 0.7985780 0.8993137 1.0000000

Given the sample size here, I dare say one would never "spot the difference"
in smaller sample (for these parameters).

Regards,
Enrico

> -----Ursprüngliche Nachricht-----
> Von: Yue Yu [mailto:parn.yy at gmail.com]
> Gesendet: Mittwoch, 27. Juli 2011 21:18
> An: enricoschumann at yahoo.de; Doran, Harold
> Cc: r-help at r-project.org
> Betreff: Re: [R] Correlated Multivariate Distribution Generator
>
> Thanks, Enrico and Harold.
>
> I have searched the archives for this topic, but all I can
> find is about the multivariate normal or uniform generator,
> which doesn't help in my case.
>
> Harold's transformation is good for getting the correlation,
> but the negative binomial distribution will be changed after
> transformation, won't keep the desired parameters r and p.
>
> Enrico's method is brilliant for rank correlation, is there
> any algorithm for linear correlation? Even some references
> for multivariate negative binomial generator will help.
>
> Your help are really appreciated
>
> Best,
>
> Yue Yu
>
> On Wed, Jul 27, 2011 at 11:31, Doran, Harold <HDoran at air.org> wrote:
> > Do you mean something like this?
> >
> >> cors <- matrix(c(1, .9, .8, .8, .9, 1, .8, .8, .8, .8, 1,
> .9, .8, .8,
> >> .9, 1), 4) L <- chol(cors) N <- 1000 dat <-
> cbind(rnbinom(N, mu = 4,
> >> size = 1), rnbinom(N, mu = 4, size = 1), rnbinom(N, mu = 4, size =
> >> 1), rnbinom(N, mu = 4, size = 1)) result <- dat %*% L
> >> cor(dat)
> >           [,1]        [,2]        [,3]       [,4] [1,] 1.00000000
> > 0.04227103  0.00358846 0.02860722 [2,] 0.04227103  1.00000000
> > -0.03122457 0.03070221 [3,] 0.00358846 -0.03122457  1.00000000
> > 0.04621639 [4,] 0.02860722  0.03070221  0.04621639 1.00000000
> >
> >> cor(result)
> >          [,1]      [,2]      [,3]      [,4] [1,] 1.0000000
> 0.9021538
> > 0.8183199 0.8158886 [2,] 0.9021538 1.0000000 0.8114415
> 0.8152286 [3,]
> > 0.8183199 0.8114415 1.0000000 0.9145778 [4,] 0.8158886 0.8152286
> > 0.9145778 1.0000000
> >
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org
> >> [mailto:r-help-bounces at r-project.org] On Behalf Of Yue Yu
> >> Sent: Tuesday, July 26, 2011 11:01 PM
> >> To: r-help at r-project.org
> >> Subject: [R] Correlated Multivariate Distribution Generator
> >>
> >> Dear R User,
> >>
> >> I am wondering if there is a way to generate correlated
> multivariate
> >> non-normal distribution?
> >>
> >> For example, I want to generate four correlated negative binomial
> >> series with parameters r=10, p=0.2, based on the correlation
> >> coefficient matrix
> >> |   1   0.9  0.8  0.8 |
> >> | 0.9     1  0.8  0.8 |
> >> | 0.8  0.8     1  0.9 |
> >> | 0.8  0.8  0.9     1 |
> >>
> >> Thank a lot.
> >>
> >> Best,
> >>
> >> Yue Yu
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >

```