[R] Create correlated data with skew

Dimitris Rizopoulos Dimitris.Rizopoulos at med.kuleuven.be
Tue Sep 18 20:49:27 CEST 2007


Quoting bbolker <bolker at ufl.edu>:

>
> Mike Lawrence wrote:
>>
>> Hi all,
>>
>> I understand that it is simple to create data with a specific
>> correlation (say, .5) using mvrnorm from the MASS library:
>>
>>  > library(MASS)
>>  > set.seed(1)
>>  >
>>  > a=mvrnorm(
>> + 	n=10
>> + 	,mu=rep(0,2)
>> + 	,Sigma=matrix(c(1,.5,.5,1),2,2)
>> + 	,empirical=T
>> + )
>>  > a
>>              [,1]         [,2]
>> [1,] -1.0008380 -1.233467875
>> [2,] -0.1588633 -0.003410001
>> [3,]  1.2054727 -0.620558768
>> [4,]  1.9580971  2.389495155
>> [5,] -0.9447473 -0.141852055
>> [6,]  0.6236799 -0.826952659
>> [7,]  0.1421782  0.452217611
>> [8,] -0.9050954  0.330991444
>> [9,] -0.7261632  0.217740460
>> [10,] -0.1937206 -0.564203311
>>  > cor(a)
>>       [,1] [,2]
>> [1,]  1.0  0.5
>> [2,]  0.5  1.0
>>
>>
>> But I'm looking to create data where the variables are non-normally
>> distributed (i.e. somewhat skewed). Any suggestions?
>>
>> Mike
>>
>> --
>> Mike Lawrence
>> Graduate Student, Department of Psychology, Dalhousie University
>>
>> Website: http://memetic.ca
>>
>> Public calendar: http://icalx.com/public/informavore/Public
>>
>> "The road to wisdom? Well, it's plain and simple to express:
>> Err and err and err again, but less and less and less."
>> 	- Piet Hein
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> The simplest (?) solution is probably to exponentiate your MVN data,
> leading to a bivariate log-normal distribution.  The hard part is
> specifying the parameters of the lognormal in terms of the desired
> variance-covariance matrix.   Variances are not too bad, but correlation
> may not be solvable.  (Of course, if you don't care much about the
> precise characteristics of the simulated data and/or are willing to
> use some trial and error to get the desired variance/correlation you
> don't have to deal with this.)
> See e.g.
>
> http://www.stuart.iit.edu/faculty/workingpapers/thomopoulos/SomeMeasuresontheStandardBivariateLognormalDistribution.doc
>
>  for some of the relevant formulas.
>
>   good luck
>     Ben Bolker
>
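
A minimal sketch of the log-normal route quoted above, assuming the standard result that if (Z1, Z2) is bivariate normal with standard deviations s1, s2 and correlation rho, then the Pearson correlation of exp(Z1) and exp(Z2) is (exp(rho*s1*s2) - 1) / sqrt((exp(s1^2) - 1) * (exp(s2^2) - 1)). Inverting this gives the rho needed for a target correlation r. The helper name lnorm_rho and the values s1 = s2 = 0.5 are my own choices for illustration, not something from the thread:

```r
library(MASS)

# Hypothetical helper (not part of MASS): correlation on the normal scale
# such that the exponentiated variables have Pearson correlation r.
lnorm_rho <- function(r, s1 = 1, s2 = 1) {
  rho <- log(1 + r * sqrt((exp(s1^2) - 1) * (exp(s2^2) - 1))) / (s1 * s2)
  # Not every target is attainable; this is the "may not be solvable" case.
  if (!is.finite(rho) || abs(rho) > 1)
    stop("target correlation is not attainable for these sigmas")
  rho
}

set.seed(1)
s1 <- 0.5; s2 <- 0.5                      # assumed normal-scale std. deviations
rho <- lnorm_rho(0.5, s1, s2)             # target correlation 0.5 after exp()
Sigma <- matrix(c(s1^2, rho * s1 * s2,
                  rho * s1 * s2, s2^2), 2, 2)
z <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = Sigma)
y <- exp(z)                               # right-skewed, log-normal margins
cor(y)                                    # off-diagonal should be near 0.5
```

Larger s1 and s2 give more skew but shrink the range of attainable correlations, especially negative ones, so the helper stops rather than returning an invalid rho.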


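Another possibility is to use copulas (package copula), e.g.,

library(copula)

# Clayton copula with dependence parameter 2
cop <- claytonCopula(2)
# bivariate distribution with right-skewed gamma margins
x <- mvdc(cop, margins = c("gamma", "gamma"),
     paramMargins = list(list(shape = 3, rate = 2), list(shape = 2, rate = 4)))
# older versions of copula used rmvdc(x, 1000) instead
x.samp <- rMvdc(1000, x)


For the Clayton copula with parameter theta, Kendall's tau equals
theta / (theta + 2), so for parameter 2 the correlation (in terms of
Kendall's tau) is 0.5:

cor(x.samp, method = "kendall")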
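
To go the other way, from a target Kendall's tau to the copula parameter, something like the following sketch may help. It assumes a reasonably recent version of the copula package, where iTau(), tau() and rMvdc() are available; the skew() helper is my own and only computes a simple moment-based sample skewness:

```r
library(copula)

set.seed(1)

# Copula parameter for a target Kendall's tau; for the Clayton family
# this is theta = 2 * tau / (1 - tau), i.e. 2 for tau = 0.5.
theta <- iTau(claytonCopula(), tau = 0.5)
cop <- claytonCopula(theta)

# Same gamma margins as in the example above.
x <- mvdc(cop, margins = c("gamma", "gamma"),
          paramMargins = list(list(shape = 3, rate = 2),
                              list(shape = 2, rate = 4)))
x.samp <- rMvdc(1000, x)

tau(cop)                                  # theoretical Kendall's tau
tau.hat <- cor(x.samp, method = "kendall")[1, 2]
tau.hat                                   # sample estimate, close to 0.5

# Hypothetical helper: moment-based sample skewness.
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
sk <- apply(x.samp, 2, skew)
sk                                        # both positive (right skew)
```

Note that the copula fixes the rank correlation, not the Pearson correlation; the latter also depends on the chosen margins.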


Best,
Dimitris

-- 
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
      http://www.student.kuleuven.be/~m0390867/dimitris.htm








