[R] generating correlated random variables from different distributions
Greg Snow
Greg.Snow at imail.org
Tue May 4 22:20:59 CEST 2010
Transforming variables will generally change the correlation, so your method will give you correlated variables, but not exactly at the correlations you specify (though with some trial and error you may be able to get close). If you are happy with those results, then your problem is solved, if you need more control over the relationship then something like the Gibbs sample may be what you need.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: Richard and Barbara Males [mailto:rbmales at gmail.com]
> Sent: Sunday, May 02, 2010 1:37 PM
> To: Greg Snow
> Cc: r-help at r-project.org
> Subject: Re: [R] generating correlated random variables from different
> distributions
>
> Thank you for your reply. The application is a Monte Carlo simulation
> in environmental planning. Different possible remediation measures
> have different costs, and produce different results. For example, a
> $20,000 plan may add 10 acres of wetlands and 12 acres of bird
> habitat. The desire is to describe the uncertainty in the cost and
> the outputs (acres of wetlands, acres of bird habitat) by
> distributions. The cost may be described by a normal distribution,
> mean $20k, $5k SD, and the 12 acres of birds may be described by a
> uniform distribution (10 to 14). [These are just examples, not
> representative of a real problem]. We may know (or think) that
> wetlands and bird habitat are positively correlated (0.6), and that
> there is a stronger correlation of both with cost (0.85). So the
> effort is to generate, through MCS, values at each iteration of cost,
> acres of wetland, and acres of bird habitat, such that the resultant
> values give the same correlation, and the values of cost, bird habitat
> and wetland habitat return the input distributions. The overall
> desire is compare different remediation measures, taking into account
> uncertainty in costs and results.
>
> One possible approach (although I have not tried it yet, but will do
> so in the near future) is to generate, for each iteration, three
> independent (0,1) random variables, correlate them via the Cholesky
> approach, and use them as input to the inverse normal, inverse
> uniform, etc. to get the three variables for each iteration. The
> primary distributions of interest are normal, uniform, triangular,
> gamma, and arbitrary cdf, so this approach seems plausible in that
> inverse distributions are readily available.
>
> Thanks in advance.
>
> Dick Males
> Cincinnati, OH, USA
>
> On Thu, Apr 29, 2010 at 12:31 PM, Greg Snow <Greg.Snow at imail.org>
> wrote:
> > The method you are using (multiply by cholesky) works for normal
> distributions, but not necessarily for others (if you want different
> means/sd, then add/multiply after transforming).
> >
> > For other distributions this process can sometimes give the
> correlation you want, but may change the variable(s) to no longer have
> the desired distribution.
> >
> > The short answer to your question is "It Depends", the full long
> answer could fill a full semester course. If you tell us more of your
> goal we may be able to give a more useful answer. The copula package
> is one possibility. If you know the conditional distribution of each
> variable given the others then you can use gibbs sampling.
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> >> project.org] On Behalf Of Richard and Barbara Males
> >> Sent: Thursday, April 29, 2010 9:18 AM
> >> To: r-help at r-project.org
> >> Subject: [R] generating correlated random variables from different
> >> distributions
> >>
> >> I need to generate a set of correlated random variables for a Monte
> >> Carlo simulation. The solutions I have found
> >> (http://www.stat.uiuc.edu/stat428/cndata.html,
> >> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers),
> using
> >> Cholesky Decomposition, seem to work only if the variables come from
> >> the same distribution with the same parameters. My situation is
> that
> >> each variable may be described by a different distribution (or
> >> different parameters of the same distribution). This approach does
> >> not seem to work, see code and results below. Am I missing
> something
> >> here? My math/statistics is not very good, will I need to generate
> >> correlated uniform random variables on (0,1) and then use the
> inverse
> >> distributions to get the desired results I am looking for? That is
> >> acceptable, but I would prefer to just generate the individual
> >> distributions and then correlate them. Any advice much appreciated.
> >> Thanks in advance
> >>
> >> R. Males
> >> Cincinnati, Ohio, USA
> >>
> >> Sample Code:
> >> # Testing Correlated Random Variables
> >>
> >> # reference
> >> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers
> >> # reference http://www.stat.uiuc.edu/stat428/cndata.html
> >> # create the correlation matrix
> >> corMat=matrix(c(1,0.6,0.3,0.6,1,0.5,0.3,0.5,1),3,3)
> >> cholMat=chol(corMat)
> >> # create the matrix of random variables
> >> set.seed(1000)
> >> nValues=10000
> >>
> >> # generate some random values
> >>
> >> matNormalAllSame=cbind(rnorm(nValues),rnorm(nValues),rnorm(nValues))
> >>
> matNormalDifferent=cbind(rnorm(nValues,1,1.5),rnorm(nValues,2,0.5),rnor
> >> m(nValues,6,1.8))
> >>
> matUniformAllSame=cbind(runif(nValues),runif(nValues),runif(nValues))
> >>
> matUniformDifferent=cbind(runif(nValues,1,1.5),runif(nValues,2,3.5),run
> >> if(nValues,6,10.8))
> >>
> >> # bind to a matrix
> >> print("correlation Matrix")
> >> print(corMat)
> >> print("Cholesky Decomposition")
> >> print (cholMat)
> >>
> >> # test same normal
> >>
> >> resultMatNormalAllSame=matNormalAllSame%*%cholMat
> >> print("correlation matNormalAllSame")
> >> print(cor(resultMatNormalAllSame))
> >>
> >> # test different normal
> >>
> >> resultMatNormalDifferent=matNormalDifferent%*%cholMat
> >> print("correlation matNormalDifferent")
> >> print(cor(resultMatNormalDifferent))
> >>
> >> # test same uniform
> >> resultMatUniformAllSame=matUniformAllSame%*%cholMat
> >> print("correlation matUniformAllSame")
> >> print(cor(resultMatUniformAllSame))
> >>
> >> # test different uniform
> >> resultMatUniformDifferent=matUniformDifferent%*%cholMat
> >> print("correlation matUniformDifferent")
> >> print(cor(resultMatUniformDifferent))
> >>
> >> and results
> >>
> >> [1] "correlation Matrix"
> >> [,1] [,2] [,3]
> >> [1,] 1.0 0.6 0.3
> >> [2,] 0.6 1.0 0.5
> >> [3,] 0.3 0.5 1.0
> >> [1] "Cholesky Decomposition"
> >> [,1] [,2] [,3]
> >> [1,] 1 0.6 0.3000000
> >> [2,] 0 0.8 0.4000000
> >> [3,] 0 0.0 0.8660254
> >> [1] "correlation matNormalAllSame" <== ok
> >> [,1] [,2] [,3]
> >> [1,] 1.0000000 0.6036468 0.3013823
> >> [2,] 0.6036468 1.0000000 0.5005440
> >> [3,] 0.3013823 0.5005440 1.0000000
> >> [1] "correlation matNormalDifferent" <== no good
> >> [,1] [,2] [,3]
> >> [1,] 1.0000000 0.9141472 0.2676162
> >> [2,] 0.9141472 1.0000000 0.2959178
> >> [3,] 0.2676162 0.2959178 1.0000000
> >> [1] "correlation matUniformAllSame" <== ok
> >> [,1] [,2] [,3]
> >> [1,] 1.0000000 0.5971519 0.2959195
> >> [2,] 0.5971519 1.0000000 0.5011267
> >> [3,] 0.2959195 0.5011267 1.0000000
> >> [1] "correlation matUniformDifferent" <== no good
> >> [,1] [,2] [,3]
> >> [1,] 1.0000000 0.2312000 0.0351460
> >> [2,] 0.2312000 1.0000000 0.1526293
> >> [3,] 0.0351460 0.1526293 1.0000000
> >> >
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-
> >> guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
More information about the R-help
mailing list